

Ket Software
Data Engineer (Python & PySpark)
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer (Python & PySpark) located in Westlake, TX, and Raleigh, NC. The contract length and pay rate are unknown. The role requires 5+ years of experience and expertise in Python, PySpark, SQL, and AWS services.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
November 29, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Raleigh, NC
-
🧠 - Skills detailed
#Libraries #Data Processing #Cloud #Snowflake #Database Performance #Distributed Computing #Python #Data Transformations #S3 (Amazon Simple Storage Service) #ETL (Extract, Transform, Load) #Docker #SQL (Structured Query Language) #Data Science #Data Pipeline #AWS Glue #PySpark #Jenkins #Data Quality #DevOps #Data Analysis #Lambda (AWS Lambda) #Data Extraction #AWS (Amazon Web Services) #Deployment #Hadoop #Pandas #Terraform #SQL Queries #Apache Spark #Data Manipulation #NumPy #Scala #Datasets #Data Engineering #Spark (Apache Spark)
Role description
Role: Data Engineer (Python & PySpark Focus)
Location: Westlake, TX and Raleigh, NC
Job Description:
We are seeking a highly motivated and experienced Data Engineer to join our team, focusing on building, optimizing, and deploying robust, scalable data solutions. The ideal candidate will possess deep expertise in Python and PySpark to drive complex data transformations and support high-volume, performance-critical simulation initiatives.
Key Responsibilities
• Design, build, and maintain high-performance ETL/ELT data pipelines using Python and PySpark (an illustrative sketch follows this list).
• Apply expertise in Python's data analysis libraries, including Pandas and NumPy, to perform complex data manipulation, cleansing, and transformation.
• Develop and manage data processing jobs leveraging PySpark for distributed computing across large-scale datasets.
• Implement DevOps practices and tooling (e.g., Docker, Jenkins, Terraform, CloudFormation) for the automated deployment and orchestration of Python applications and data pipelines.
• Collaborate with data scientists and analysts to ensure data quality, availability, and consistency for advanced modeling and reporting.
• Utilize AWS or other cloud services (e.g., S3, Glue, EMR, Snowflake) to architect and maintain cloud-based data ecosystems.
• Write and optimize complex SQL queries for data extraction, integrity checks, and performance tuning.
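Illustrative example (not part of the formal requirements): a minimal sketch of the kind of PySpark ETL job described in the responsibilities above. The S3 paths, table name, and column names are hypothetical placeholders chosen purely for illustration.
    from pyspark.sql import SparkSession, functions as F

    # Hypothetical S3 locations; real paths would come from the project's configuration.
    SOURCE_PATH = "s3://example-bucket/raw/trades/"
    TARGET_PATH = "s3://example-bucket/curated/trades_daily/"

    spark = SparkSession.builder.appName("trades-daily-etl").getOrCreate()

    # Extract: read the raw dataset (Parquet assumed here).
    raw = spark.read.parquet(SOURCE_PATH)

    # Transform: basic cleansing and a derived column, computed in a distributed fashion.
    cleaned = (
        raw.dropna(subset=["trade_id", "trade_date"])         # drop incomplete records
           .withColumn("trade_date", F.to_date("trade_date"))  # normalize the date column
           .withColumn("notional", F.col("price") * F.col("quantity"))
    )

    # A SQL step, since complex aggregations are often easier to express and review as SQL.
    cleaned.createOrReplaceTempView("trades")
    daily = spark.sql("""
        SELECT trade_date, symbol,
               SUM(notional) AS total_notional,
               COUNT(*)      AS trade_count
        FROM trades
        GROUP BY trade_date, symbol
    """)

    # Load: write the curated output back to S3, partitioned for downstream consumers.
    daily.write.mode("overwrite").partitionBy("trade_date").parquet(TARGET_PATH)

    spark.stop()
In practice a job like this would typically run on AWS Glue or EMR and be deployed through the CI/CD tooling mentioned above; the schema, paths, and business logic here are placeholders only.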
Required Technical Skills
• 5+ years of experience in Data Engineering or a related technical field.
• Expert-level proficiency in Python, including a strong command of core concepts and specialized data libraries (Pandas, NumPy).
• Solid hands-on experience with PySpark for building scalable data workflows.
• Strong background in DevOps principles and tools for deploying Python-based data applications (e.g., containerization, CI/CD).
• Experience with cloud platforms (AWS strongly preferred) and associated data services (e.g., AWS Glue, S3, Lambda, Snowflake).
• Advanced knowledge of SQL and experience with modern data warehousing and database performance tuning.
• Familiarity with distributed data processing technologies (e.g., Apache Spark, Hadoop).






