BayOne Solutions

Agentic Data Engineer

This role is for an "Agentic Data Engineer" on a freelance contract, offering a competitive pay rate. Key skills include agentic AI engineering, Python data engineering, and familiarity with scientific data structures. A technical degree or equivalent experience is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
272
🗓️ - Date
June 16, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Data Engineering #Cloud #GCP (Google Cloud Platform) #Docker #FastAPI #AI (Artificial Intelligence) #Data Processing #Databases #Datasets #Unit Testing #AWS (Amazon Web Services) #Python #ETL (Extract, Transform, Load) #Data Manipulation #Data Ingestion
Role description
Key Responsibilities:
● Build an agentic data ingestion pipeline.
● Triage and prioritize incoming requests to ingest specific datasets.
● Clean and organize the data. Build the first-pass cleaning and organization steps into the agentic flow.
● Validate cross-modal linkage. Add automated checks that catch when ingested data does not connect correctly and flag low-quality or mismatched records.
● Version every dataset. Retain prior versions and keep them addressable.
● Preserve raw data and provenance. Make agent workflows log validation and transformation steps so lineage is traceable.
● Make agents usable across teams. Move beyond bespoke steps towards agents that teams can reliably use as a shared, deployed service.
● Collaborate with AI, software engineering, and computational biology groups to co-define data standards and conventions.

Qualifications & Requirements:
Core (Required)
● Agentic AI engineering: Demonstrated experience building multi-agent workflows or LLM workflows using tools/frameworks such as LangGraph or LlamaIndex, including tool/function calling and asynchronous task execution.
● Python data engineering: Strong Python for data manipulation, working with APIs and databases, and handling heterogeneous data formats.
● Data versioning and provenance: Familiarity with dataset versioning approaches (e.g., DVC, lakeFS, or equivalent).
● Working knowledge of scientific data structures: Comfortable with, or willing to learn, common omics data formats such as AnnData, H5AD, and TileDB.
● Basic understanding of omics: No deep bioinformatics expertise required; just a basic understanding of the different modalities (e.g., RNA-seq vs. scRNA-seq vs. WES; genomics vs. transcriptomics vs. proteomics vs. metabolomics).
● Unit testing: Comfortable writing unit and functional tests to ensure data processing workflows are reliable and reproducible.
● Education: Degree in a technical field or equivalent practical experience.

Nice to have
● Experience deploying agent workflows as a shared service (e.g., FastAPI or MCP endpoints).
● Exposure to cloud (AWS, GCP) and containerization (Docker).
● Familiarity with workflow managers such as Nextflow or Snakemake.