

Data Engineer
Featured Role | Apply direct with Data Freelance Hub
This is a Data Engineer contract role; the contract length and pay rate are not specified. It requires 2+ years of experience in data engineering or Python development, proficiency in distributed systems such as Spark, and strong communication skills. Remote work is available.
Country: United States
Currency: $ USD
Day rate: 360
Date discovered: July 1, 2025
Project duration: Unknown
Location type: Remote
Contract type: Unknown
Security clearance: Unknown
Location detailed: Cupertino, CA
Skills detailed: #Flask #Data Quality #ML (Machine Learning) #Python #AI (Artificial Intelligence) #Datasets #Automation #Spark (Apache Spark) #Airflow #Data Science #Jenkins #Scala #Data Engineering #API (Application Programming Interface) #Docker #Computer Science #Data Pipeline
Role description
ABOUT THIS FEATURED OPPORTUNITY
Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential, so engineers must be able to assess data quality and relevance even without full visibility into the end use case.
We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes or rework can pose serious risks to project timelines, so precision and accountability are critical. The ideal candidate will be highly responsive, reliable, and thorough in communication, and must be available to work 9am-4pm PST, even if located in a different state.
THE OPPORTUNITY FOR YOU
• Work on 3-4 projects to start, scaling up to 6-10 during peak season
• Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
• Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
• Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
• Influence how large-scale datasets are prepared for training models across an enterprise AI org
KEY SUCCESS FACTORS
• 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
• Proficiency in distributed systems (e.g., Spark), and solid understanding of multithreading vs. multiprocessing
• Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
• Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity
NICE TO HAVES
• Familiarity with Airflow, Spark, or Flask for scalable API/UI development
• Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
• Exposure to LLMs, multimodal data, or generative AI workflows
• Prior involvement in designing tools to automate or scale ML data pipelines
• Ability to collaborate in a high-volume, high-trust environment; your work will power some of the most impactful ML use cases in the organization