

ClifyX
Data Engineer (W2 Only)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer with 8+ years of experience, offering a 12+ month contract in Chicago, IL (Hybrid). Key skills include PySpark, ETL/ELT, cloud platforms (AWS, Azure, GCP), and data modeling.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
October 17, 2025
🕒 - Duration
More than 6 months
-
🏝️ - Location
Hybrid
-
📄 - Contract
W2 Contractor
-
🔒 - Security
Unknown
-
📍 - Location detailed
Chicago, IL
-
🧠 - Skills detailed
#Databases #Databricks #PySpark #Big Data #Data Ingestion #Monitoring #Data Engineering #Data Modeling #Azure #Documentation #AWS EMR (Amazon Elastic MapReduce) #Datasets #ADLS (Azure Data Lake Storage) #Scala #Data Quality #Data Science #Dataflow #Data Architecture #Data Vault #Vault #Azure Databricks #Data Pipeline #Data Accuracy #Snowflake #Quality Assurance #S3 (Amazon Simple Storage Service) #GCP (Google Cloud Platform) #Cloud #AWS (Amazon Web Services) #Data Transformations #ETL (Extract, Transform, Load) #Spark (Apache Spark) #Data Processing #ML (Machine Learning)
Role description
Hello,
Greetings from Clifyx.
Visa: GC/USC (Green Card or US Citizen)
Title: Data Engineer
Location: Chicago, IL (Hybrid - Onsite)
Duration: 12+ months contract
Minimum experience: 8+ years
Job Description:
Data Pipeline Development: Design, develop, test, and deploy robust and scalable data pipelines using PySpark for data ingestion, transformation, and loading (ETL/ELT) from various sources (e.g., S3, ADLS, databases, APIs, streaming data); a brief illustrative sketch follows this list.
Big Data Processing: Utilize PySpark to process large datasets efficiently, handling complex data transformations, aggregations, and data quality checks.
Performance Optimization: Optimize PySpark jobs for performance, efficiency, and cost-effectiveness, identifying and resolving bottlenecks.
Data Modeling: Collaborate with data architects and analysts to design and implement efficient data models (e.g., star schema, snowflake schema, data vault) for analytical and reporting purposes.
Cloud Integration: Work with cloud platforms (AWS, Azure, GCP) and their respective big data services (e.g., AWS EMR, Azure Databricks, GCP Dataflow/Dataproc) to deploy and manage PySpark applications; a strong understanding of the medallion architecture is expected.
Collaboration: Work closely with data scientists, machine learning engineers, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
Testing and Quality Assurance: Implement comprehensive unit, integration, and end-to-end tests for data pipelines to ensure data accuracy and reliability.
Monitoring and Support: Monitor production data pipelines, troubleshoot issues, and provide ongoing support to ensure data availability and integrity.
Documentation: Create and maintain clear and concise documentation for data pipelines, data models, and processes.
Innovation: Stay up-to-date with the latest advancements in big data technologies, PySpark, and cloud services, and recommend new tools and approaches.
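For illustration only, the sketch below shows the kind of PySpark ETL pipeline these responsibilities describe: ingest raw data from object storage, apply transformations and a basic data quality check, and write curated output. The paths, column names, and quality rule are hypothetical placeholders, not details from this posting.

```python
# Minimal PySpark ETL sketch (illustrative only; paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest: read raw order events from object storage (placeholder path).
raw = spark.read.json("s3a://example-raw-zone/orders/")

# Transform: deduplicate, enforce types, and drop invalid records.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
)

# Aggregate: daily revenue and distinct order counts.
daily_revenue = (
    clean.groupBy(F.to_date("order_ts").alias("order_date"))
         .agg(
             F.sum("amount").alias("revenue"),
             F.countDistinct("order_id").alias("orders"),
         )
)

# Data quality check: fail fast if any day shows negative revenue.
if daily_revenue.filter(F.col("revenue") < 0).count() > 0:
    raise ValueError("Data quality check failed: negative daily revenue detected")

# Load: write curated output as partitioned Parquet (placeholder path).
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-curated-zone/daily_revenue/"
)

spark.stop()
```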
Thanks & Best Regards,
Vishal Swami – Clifyx (US IT Recruiter)
Contact Number- 908-279-1295
LinkedIn: linkedin.com/in/vishal-swami-790a72179
Headquarters: South Plainfield, NJ – 07080