

HARAMAIN SYSTEMS INC.
Data Engineer: W2 Onsite Role
Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer in Jersey City, NJ, offering a long-term W2 contract. Requirements include 5+ years of Data Engineering experience, proficiency in PySpark, Python, and SQL, and experience with AI/ML workflows. US healthcare domain experience is a plus.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: February 3, 2026
Duration: Unknown
Location: On-site
Contract: W2 Contractor
Security: Unknown
Location detailed: Jersey City, NJ
Skills detailed: #FHIR (Fast Healthcare Interoperability Resources) #Data Ingestion #Deployment #MLflow #ETL (Extract, Transform, Load) #Spark (Apache Spark) #MySQL #GCP (Google Cloud Platform) #SQL Queries #Data Pipeline #Data Processing #Scala #Azure #Databricks #Data Science #Data Engineering #Data Modeling #Data Mart #SQL (Structured Query Language) #PostgreSQL #Automation #Data Quality #Python #Airflow #SQL Server #ML (Machine Learning) #PySpark #MongoDB #Libraries #AWS (Amazon Web Services) #Databases #Programming #NoSQL #Cloud #Model Deployment #AI (Artificial Intelligence)
Role description
Note: This is a W2 role; C2C is not an option.
Role: Data Engineer
Location: Jersey City, NJ (onsite)
Long-term contract
About the Role
We are looking for a highly skilled Senior Data Engineer with strong expertise in PySpark, Python, SQL, and database technologies, along with exposure to Data Science and AI/ML techniques. The ideal candidate will design and optimize scalable data pipelines, collaborate with cross-functional teams, and contribute to the development of analytical and machine learning-driven solutions.
Key Responsibilities
Data Engineering & Pipeline Development
• Design, develop, and optimize large-scale ETL/ELT pipelines using PySpark and distributed data processing frameworks (a minimal sketch follows this list).
• Build high-performance data ingestion workflows from diverse structured and unstructured sources.
• Implement scalable data models, data marts, and warehousing solutions.
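To make the PySpark expectation concrete, here is a minimal ETL sketch: read raw records, clean and aggregate them, and write a Parquet mart. The file paths, table, and column names (claims.csv, claim_id, member_id, claim_amount) are hypothetical placeholders for illustration, not details from this posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal illustrative ETL job; paths and column names are hypothetical.
spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()

# Extract: read raw CSV data (schema inference kept simple for the sketch).
raw = spark.read.option("header", True).csv("/data/raw/claims.csv")

# Transform: basic cleanup, then an aggregate per member.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("claim_amount", F.col("claim_amount").cast("double"))
       .filter(F.col("claim_amount").isNotNull())
)
per_member = cleaned.groupBy("member_id").agg(
    F.sum("claim_amount").alias("total_claims"),
    F.count("claim_id").alias("claim_count"),
)

# Load: write Parquet for downstream marts.
per_member.write.mode("overwrite").parquet("/data/marts/claims_by_member")

spark.stop()

In a real pipeline the schema would be declared explicitly rather than inferred, and writes would typically be partitioned by a business key or date.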
Programming & Database Expertise
• Write clean, modular, and optimized Python code for data processing and automation.
• Develop complex SQL queries, stored procedures, and performance-tuned database operations (see the query sketch after this list).
• Work with relational and NoSQL databases (e.g., MySQL, PostgreSQL, SQL Server, MongoDB).
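As a small, self-contained illustration of performance-minded SQL from Python, the sketch below uses the standard-library sqlite3 module: it creates an index on the grouping column and runs a parameterized aggregate. The claims table and its rows are invented for the example; production work would target the engines named above.

import sqlite3

# Self-contained illustration; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, member_id TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO claims (member_id, amount) VALUES (?, ?)",
    [("m1", 120.0), ("m1", 75.5), ("m2", 300.0)],
)

# An index on the filter/group column lets the planner avoid a full scan.
cur.execute("CREATE INDEX idx_claims_member ON claims (member_id)")

# Parameterized aggregate query: claim totals per member above a threshold.
cur.execute(
    """
    SELECT member_id, SUM(amount) AS total
    FROM claims
    GROUP BY member_id
    HAVING total > ?
    """,
    (100.0,),
)
print(cur.fetchall())  # e.g. [('m1', 195.5), ('m2', 300.0)]
conn.close()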
Data Science + AI/ML Collaboration
• Partner with Data Science teams to productionize ML models and enable ML-driven pipelines (an experiment-tracking example follows this list).
• Contribute to model deployment, feature engineering, and ML workflow optimization.
• Integrate ML models into scalable data platforms.
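MLflow appears in the posting's skills list as one way to track and package models for later deployment; a minimal sketch of logging a trained model might look like the following. The toy dataset, run name, and artifact path are assumptions made for illustration.

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training run; features and labels are hypothetical.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

with mlflow.start_run(run_name="sketch-run"):
    model = LogisticRegression()
    model.fit(X, y)

    # Track a parameter and a metric alongside the serialized model,
    # so the model can be retrieved and served from the tracking store.
    mlflow.log_param("C", model.C)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")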
Architecture & Best Practices
• Ensure data quality, reliability, lineage, and governance across data workflows (a small quality-gate sketch follows this list).
• Drive best practices in coding, testing, CI/CD, and cloud-based deployments.
• Work with cross-functional teams to translate business requirements into robust data solutions.
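A data-quality gate can be as simple as a validation function that fails a batch before it reaches downstream consumers. The field names and rules below are hypothetical; real checks would typically live in the pipeline framework itself.

# Hypothetical, minimal data-quality gate for a batch of records.
def check_quality(records, required_fields=("claim_id", "member_id", "amount")):
    errors = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-null.
        for field in required_fields:
            if rec.get(field) is None:
                errors.append(f"row {i}: missing {field}")
        # Uniqueness: claim_id must not repeat within the batch.
        cid = rec.get("claim_id")
        if cid is not None and cid in seen_ids:
            errors.append(f"row {i}: duplicate claim_id {cid}")
        seen_ids.add(cid)
    if errors:
        raise ValueError("data quality check failed: " + "; ".join(errors))
    return True

check_quality([
    {"claim_id": 1, "member_id": "m1", "amount": 120.0},
    {"claim_id": 2, "member_id": "m2", "amount": 75.5},
])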
Required Skills & Qualifications
• 5+ years of experience in Data Engineering with strong hands-on work in PySpark.
• Strong proficiency in Python, including libraries for data processing.
• Advanced knowledge of SQL and performance optimization techniques.
• Experience with distributed data systems (Spark, Databricks, Hive, or similar).
• Exposure to AI/ML workflows, including model deployment or MLOps.
• Solid understanding of data modeling, warehousing concepts, and ETL/ELT architectures.
Good to Have
• US Healthcare domain experience (HIPAA, claims data, EHR/EMR, HL7, FHIR, etc.).
• Experience with cloud platforms (Azure, AWS, GCP).
• Knowledge of MLflow, Airflow, or similar tools (a minimal Airflow sketch follows).
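Since Airflow is called out as a desirable tool, here is a minimal sketch of a two-task DAG wiring an extract step into a transform step. The DAG id, schedule, and task bodies are invented for illustration; the schedule= argument assumes Airflow 2.4+ (older releases use schedule_interval=).

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; real tasks would call pipeline code.
def extract():
    print("extracting raw data")

def transform():
    print("transforming extracted data")

# `schedule=` assumes Airflow 2.4+; older releases use `schedule_interval=`.
with DAG(
    dag_id="claims_etl_sketch",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task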