

Data Engineer
Featured Role | Apply direct with Data Freelance Hub
Country: United States
Currency: $ USD
Day rate: -
Date discovered: September 16, 2025
Project duration: Unknown
Location type: Remote
Contract type: W2 Contractor
Security clearance: Unknown
Location detailed: New York City Metropolitan Area
Skills detailed: #Python #GCP (Google Cloud Platform) #Metadata #Agile #Deep Learning #Cloud #Data Processing #PyTorch #AWS (Amazon Web Services) #Data Engineering #Public Cloud #Compliance #Azure #TensorFlow #ML (Machine Learning)
Role description
W2 ONLY - we are not able to provide sponsorship
REMOTE
What You'll Do
• Address issues such as ML training pipelines not emitting metadata about upstream dependencies, and training data being gathered directly within pipelines without declared upstreams (see the lineage-event sketch after this list).
• Provide insight into the data used to train a model to ensure compliance with legal and regulatory obligations.
• Add support to OpenLineage for data endpoint types (e.g., model training, feature generation, model inference, feature registration).
• Improve ML pipeline lineage instrumentation by connecting feature lineage to upstream pipeline lineage.
• Help user teams migrate their existing implementations to the new orchestration with lineage.
• Work autonomously, providing proactive updates on progress and obstacles encountered in creating the lineage features between data and training pipelines.
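For context, declaring upstreams for a training run typically means emitting a lineage event from the pipeline itself. The sketch below is a minimal illustration, assuming the openlineage-python client; the endpoint URL, namespaces, and job/dataset names are hypothetical and not taken from this posting.

import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Hypothetical lineage backend endpoint (e.g., a local Marquez instance).
client = OpenLineageClient(url="http://localhost:5000")

run = Run(runId=str(uuid.uuid4()))
job = Job(namespace="ml-platform", name="train_ranking_model")  # hypothetical job name

# Declare upstream feature tables as inputs and the trained model artifact as an
# output, so the training run no longer consumes data without declared upstreams.
event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=run,
    job=job,
    producer="https://example.com/ml-pipelines",  # hypothetical producer URI
    inputs=[Dataset(namespace="warehouse", name="features.user_engagement")],
    outputs=[Dataset(namespace="models", name="ranking_model_v2")],
)
client.emit(event)

Connecting feature lineage to upstream pipeline lineage then largely comes down to reusing the same dataset identifiers (namespace and name) across the feature-generation job's outputs and the training job's inputs.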
Who You Are
• You have 3+ years of hands-on experience implementing production ML infrastructure at scale in Python.
• You have 3+ years of experience working with a public cloud provider such as GCP, AWS, or Azure; GCP is preferred.
• You have knowledge of deep learning fundamentals, algorithms, and open-source tools such as Flyte, PyTorch, Ray, Kubeflow, TensorFlow, and Hugging Face.
• You have a general understanding of data processing for ML.
• You have experience with agile software processes and modular code design following industry standards.