

Avacend Inc
Sr Data Engineering (Foundation Models) - Only W2
Featured Role | Apply directly with Data Freelance Hub
This role is for a Sr Data Engineer position focused on building and scaling distributed data pipelines for machine-generated data. Contract length and pay rate are unspecified; the position is W2 only. Requires 5+ years of software engineering, strong Python and Apache Spark skills, and experience with time series or high-volume event data.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: February 19, 2026
Duration: Unknown
Location: Unknown
Contract: W2 Contractor
Security: Unknown
Location detailed: San Jose, CA
Skills detailed: #Kubernetes #Spark (Apache Spark) #Debugging #PySpark #Scala #ML (Machine Learning) #Python #Apache Spark #Data Pipeline #Data Engineering #Data Quality #Cloud #Datasets #Kafka (Apache Kafka) #Time Series
Role description
Data Engineering (Foundation Models)
We are building next-generation foundation models for machine-generated data, including time series, logs, and large-scale event streams. We're looking for a strong data engineer to design and scale the data pipelines that power model training and production systems.
This role sits at the intersection of distributed systems, data infrastructure, and machine learning.
What You'll Do
- Build and scale distributed data pipelines for large-scale time series and log data
- Design reliable, high-performance Spark/Python workflows for model training datasets
- Analyze and resolve performance bottlenecks (latency, memory, skew, throughput)
- Improve data quality, validation, and reproducibility for ML workloads
- Partner with ML engineers and researchers to accelerate foundation model development
- Measure and optimize application and transaction performance in production systems
Minimum Requirements
- 5+ years of software engineering experience
- Strong proficiency in Python
- Hands-on experience with Apache Spark (PySpark or Scala)
- Experience building large-scale data pipelines in distributed environments
- Experience working with time series, logs, or high-volume event data
- Strong debugging and performance optimization skills
Nice to Have
- Experience supporting ML or large model training workflows
- Familiarity with sequence modeling or time series data systems
- Experience with streaming systems (Kafka, Spark Streaming)
- Experience with cloud-native or Kubernetes-based platforms
What We're Looking For
You are pragmatic, data-driven, and comfortable operating in ambiguous, research-heavy environments. You can debug distributed systems, reason about data correctness at scale, and build infrastructure that researchers trust.






