

BayOne Solutions
Agentic Data Engineer
This role is for an Agentic Data Engineer on a freelance contract, at a day rate of $272 USD. Key skills include agentic AI engineering, Python data engineering, and familiarity with scientific data structures. A technical degree or equivalent experience is required.
🌎 Country: United States
💱 Currency: USD ($)
💰 Day rate: 272
🗓️ Date: June 16, 2026
🕒 Duration: Unknown
🏝️ Location: Unknown
📄 Contract: Unknown
🔒 Security: Unknown
📍 Location detailed: United States
🧠 Skills detailed: #Data Engineering #Cloud #GCP (Google Cloud Platform) #Docker #FastAPI #AI (Artificial Intelligence) #Data Processing #Databases #Datasets #Unit Testing #AWS (Amazon Web Services) #Python #ETL (Extract, Transform, Load) #Data Manipulation #Data Ingestion
Role description
Key Responsibilities:
● Build an agentic data ingestion pipeline.
● Triage and prioritize incoming requests to ingest specific datasets.
● Clean and organize the data. Build the first-pass cleaning and organization steps into the agentic flow.
● Validate cross-modal linkage. Add automated checks that catch ingested data that does not link correctly across modalities, and flag low-quality or mismatched records.
● Version every dataset. Retain and make prior versions addressable.
● Preserve raw data and provenance. Make agent workflows log validation and transformation steps so lineage is traceable.
● Make agents usable across teams. Move beyond bespoke steps towards agents that teams can reliably use as a shared, deployed service.
● Collaborate with AI, software engineering, and computational biology groups to co-define data standards and conventions.
Qualifications & Requirements:
Core (Required)
● Agentic AI engineering: Demonstrated experience building multi-agent or LLM workflows with tools/frameworks such as LangGraph or LlamaIndex, including tool/function calling and asynchronous task execution.
● Python data engineering: Strong Python for data manipulation, working with APIs and databases, and handling heterogeneous data formats.
● Data versioning and provenance: Familiarity with dataset versioning approaches (e.g. DVC, lakeFS, or equivalent).
● Working knowledge of scientific data structures: Comfortable with, or willing to learn, common omics data formats such as AnnData, H5AD, and TileDB.
● Basic understanding of omics: No deep bioinformatics expertise required; just a basic understanding of the different modalities (e.g., the difference between RNA-seq, scRNA-seq, and WES, or between genomics, transcriptomics, proteomics, and metabolomics).
● Unit testing: Comfortable writing unit and functional tests to ensure data processing workflows are reliable and reproducible.
● Education: Degree in a technical field or equivalent practical experience.
Nice to have
● Experience deploying agent workflows as a shared service (e.g., FastAPI or MCP endpoints).
● Exposure to cloud (AWS, GCP) and containerization (Docker).
● Familiarity with workflow managers such as Nextflow or Snakemake.






