

Avacend Inc
Sr Data Engineering (Foundation Models) - Only W2
Featured Role | Apply directly with Data Freelance Hub
This role is for a Sr Data Engineer position focused on building and scaling distributed data pipelines for machine-generated data. Contract length and pay rate are unspecified; the position is W2 only. Requires 5+ years of software engineering, strong Python and Apache Spark skills, and experience with time series or high-volume event data.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: February 19, 2026
Duration: Unknown
Location: Unknown
Contract: W2 Contractor
Security: Unknown
Location detailed: San Jose, CA
Skills detailed: #Kubernetes #Spark (Apache Spark) #Debugging #PySpark #Scala #ML (Machine Learning) #Python #Apache Spark #Data Pipeline #Data Engineering #Data Quality #Cloud #Datasets #Kafka (Apache Kafka) #Time Series
Role description
Data Engineering (Foundation Models)
We are building next-generation foundation models for machine-generated data, including time series, logs, and large-scale event streams. We're looking for a strong data engineer to design and scale the data pipelines that power model training and production systems.
This role sits at the intersection of distributed systems, data infrastructure, and machine learning.
What You'll Do
- Build and scale distributed data pipelines for large-scale time series and log data
- Design reliable, high-performance Spark/Python workflows for model training datasets
- Analyze and resolve performance bottlenecks (latency, memory, skew, throughput)
- Improve data quality, validation, and reproducibility for ML workloads
- Partner with ML engineers and researchers to accelerate foundation model development
- Measure and optimize application and transaction performance in production systems
Minimum Requirements
- 5+ years of software engineering experience
- Strong proficiency in Python
- Hands-on experience with Apache Spark (PySpark or Scala)
- Experience building large-scale data pipelines in distributed environments
- Experience working with time series, logs, or high-volume event data
- Strong debugging and performance optimization skills
Nice to Have
- Experience supporting ML or large model training workflows
- Familiarity with sequence modeling or time series data systems
- Experience with streaming systems (Kafka, Spark Streaming)
- Experience with cloud-native or Kubernetes-based platforms
What We're Looking For
You are pragmatic, data-driven, and comfortable operating in ambiguous, research-heavy environments. You can debug distributed systems, reason about data correctness at scale, and build infrastructure that researchers trust.






