Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub

This role is for a Data Engineer with an unspecified contract length and a listed day rate of $360 USD. It requires 2+ years of experience in data engineering or Python development, proficiency in distributed systems such as Spark, and strong communication skills. Remote work is available.

🌎 - Country

United States

💱 - Currency

$ USD

💰 - Day rate

360

🗓️ - Date discovered

July 1, 2025

🕒 - Project duration

Unknown

🏝️ - Location type

Remote

📄 - Contract type

Unknown

🔒 - Security clearance

Unknown

📍 - Location detailed

Cupertino, CA

🧠 - Skills detailed

#Flask #Data Quality #ML (Machine Learning) #Python #AI (Artificial Intelligence) #Datasets #Automation #Spark (Apache Spark) #Airflow #Data Science #Jenkins #Scala #Data Engineering #API (Application Programming Interface) #Docker #Computer Science #Data Pipeline

Role description

ABOUT THIS FEATURED OPPORTUNITY

Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential: engineers must be able to assess data quality and relevance even without full visibility into the end use case.

We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes or rework can pose serious risks to project timelines, so precision and accountability are critical.

The ideal candidate will be highly responsive, reliable, and thorough in communication, and must be available to work 9am-4pm PST, even if located in a different state.

THE OPPORTUNITY FOR YOU

• Work on 3-4 projects to start, scaling up to 6-10 during peak season
• Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
• Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
• Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
• Influence how large-scale datasets are prepared for training models across an enterprise AI org

KEY SUCCESS FACTORS

• 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
• Proficiency in distributed systems (e.g., Spark), and solid understanding of multithreading vs. multiprocessing
• Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
• Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity

NICE TO HAVES

• Familiarity with Airflow, Spark, or Flask for scalable API/UI development
• Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
• Exposure to LLMs, multimodal data, or generative AI workflows
• Prior involvement in designing tools to automate or scale ML data pipelines
• Ability to collaborate in a high-volume, high-trust environment; your work will power some of the most impactful ML use cases in the organization
