HARAMAIN SYSTEMS INC.

Data Engineer: W2 Onsite Role

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer in Jersey City, NJ, on a long-term contract. Requirements include 5+ years of Data Engineering experience, proficiency in PySpark, Python, and SQL, and experience with AI/ML workflows. US healthcare domain experience is a plus.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 3, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
W2 Contractor
-
🔒 - Security
Unknown
-
📍 - Location detailed
Jersey City, NJ
-
🧠 - Skills detailed
#FHIR (Fast Healthcare Interoperability Resources) #Data Ingestion #Deployment #MLflow #ETL (Extract, Transform, Load) #Spark (Apache Spark) #MySQL #GCP (Google Cloud Platform) #SQL Queries #Data Pipeline #Data Processing #Scala #Azure #Databricks #Data Science #Data Engineering #Data Modeling #Data Mart #SQL (Structured Query Language) #PostgreSQL #Automation #Data Quality #Python #Airflow #SQL Server #ML (Machine Learning) #PySpark #MongoDB #Libraries #AWS (Amazon Web Services) #Databases #Programming #NoSQL #Cloud #Model Deployment #AI (Artificial Intelligence)
Role description
Note: This is a W2 role; C2C is not an option.

Role: Data Engineer
Location: Jersey City, NJ (onsite)
Duration: Long-term contract

About the Role
We are looking for a highly skilled Senior Data Engineer with strong expertise in PySpark, Python, SQL, and database technologies, along with exposure to Data Science and AI/ML techniques. The ideal candidate will design and optimize scalable data pipelines, collaborate with cross-functional teams, and contribute to the development of analytical and machine learning-driven solutions.

Key Responsibilities

Data Engineering & Pipeline Development
• Design, develop, and optimize large-scale ETL/ELT pipelines using PySpark and distributed data processing frameworks (a minimal pipeline sketch follows this description).
• Build high-performance data ingestion workflows from diverse structured and unstructured sources.
• Implement scalable data models, data marts, and warehousing solutions.

Programming & Database Expertise
• Write clean, modular, and optimized Python code for data processing and automation.
• Develop complex SQL queries, stored procedures, and performance-tuned database operations.
• Work with relational and NoSQL databases (e.g., MySQL, PostgreSQL, SQL Server, MongoDB).

Data Science + AI/ML Collaboration
• Partner with Data Science teams to productionize ML models and enable ML-driven pipelines (see the MLflow sketch after this description).
• Contribute to model deployment, feature engineering, and ML workflow optimization.
• Integrate ML models into scalable data platforms.

Architecture & Best Practices
• Ensure data quality, reliability, lineage, and governance across data workflows.
• Drive best practices in coding, testing, CI/CD, and cloud-based deployments.
• Work with cross-functional teams to translate business requirements into robust data solutions.

Required Skills & Qualifications
• 5+ years of experience in Data Engineering with strong hands-on work in PySpark.
• Strong proficiency in Python, including libraries for data processing.
• Advanced knowledge of SQL and performance-optimization techniques.
• Experience with distributed data systems (Spark, Databricks, Hive, or similar).
• Exposure to AI/ML workflows, including model deployment or MLOps.
• Solid understanding of data modeling, warehousing concepts, and ETL/ELT architectures.

Good to Have
• US healthcare domain experience (HIPAA, claims data, EHR/EMR, HL7, FHIR, etc.).
• Experience with cloud platforms (Azure, AWS, GCP).
• Knowledge of MLflow, Airflow, or similar tools (a brief Airflow DAG sketch also follows).
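To make the pipeline responsibilities above concrete, here is a minimal PySpark sketch of an ingest-transform-load job. It is illustrative only: the storage paths, the claims dataset, and the column names are hypothetical assumptions, not details from the posting.

```python
# Minimal sketch of the kind of PySpark ETL pipeline described above.
# All paths, dataset names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl_sketch").getOrCreate()

# Ingest: read raw claims data (hypothetical source path).
raw = spark.read.parquet("/data/raw/claims")

# Transform: deduplicate, normalize types, and apply a simple
# data-quality filter.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("service_date", F.to_date("service_date"))
       .filter(F.col("claim_amount") > 0)
)

# Load: write a partitioned, analytics-ready table.
(cleaned.write
        .mode("overwrite")
        .partitionBy("service_date")
        .parquet("/data/curated/claims"))
```

A production version of this job would add schema enforcement, incremental loads, and lineage/data-quality checks, in line with the governance responsibilities listed above.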
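The collaboration section mentions productionizing ML models, and the "Good to Have" list names MLflow. Below is a hedged sketch of what that tracking and model-logging handoff can look like; the experiment name, model choice, and toy dataset are placeholders, not details from the posting.

```python
# Illustrative MLflow sketch: track a run and log a fitted model so it
# can later be registered or served. Names and data are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy stand-in for real features engineered by the data pipeline.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

mlflow.set_experiment("claims-risk-sketch")  # hypothetical experiment
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the model artifact for downstream deployment.
    mlflow.sklearn.log_model(model, "model")
```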
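Airflow is also listed as a nice-to-have. The sketch below wires a two-task DAG with an ingest step ahead of a transform step; the DAG id, schedule, and task bodies are hypothetical, and the imports assume Airflow 2.x.

```python
# Hypothetical Airflow 2.x DAG sketch: orchestrate ingest -> transform.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: pull raw data from source systems.
    print("ingesting raw data")

def transform():
    # Placeholder: clean and validate the ingested data.
    print("transforming and validating")

with DAG(
    dag_id="claims_pipeline_sketch",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",  # parameter name per Airflow 2.4+ (assumption)
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> transform_task  # ingest runs before transform
```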