

Optomi
Data Engineer
This role is for a Data Engineer; the contract length and pay rate are listed as "unknown." The role is based in Tysons Corner, VA or Rockville, MD, and key skills include Apache Spark, SQL, Python, and experience with LLM-powered systems.
🌎 - Country: United States
💱 - Currency: $ USD
💰 - Day rate: 600
🗓️ - Date: June 19, 2026
🕒 - Duration: Unknown
🏝️ - Location: On-site
📄 - Contract: Unknown
🔒 - Security: Unknown
📍 - Location detailed: Washington DC-Baltimore Area
🧠 - Skills detailed
#AWS (Amazon Web Services) #Big Data #Monitoring #Python #Lambda (AWS Lambda) #PySpark #Automation #Presto #Data Pipeline #AWS S3 (Amazon Simple Storage Service) #Data Engineering #S3 (Amazon Simple Storage Service) #EC2 #Dataiku #Anomaly Detection #Data Lake #Spark (Apache Spark) #SQL (Structured Query Language) #Datasets #Documentation #Apache Spark #API (Application Programming Interface) #Cloud #Data Catalog #SageMaker #AI (Artificial Intelligence) #Trino #Data Quality #Langchain #ETL (Extract, Transform, Load)
Role description
Open to candidates in Tysons Corner, VA or Rockville, MD
Data Engineer
The Data Engineer works with moderate supervision across two equally weighted domains: (1) developing large-scale data pipelines that process market events in a cloud environment, and (2) designing and developing agentic AI systems, including LLM-powered regulatory data assistants, MCP (Model Context Protocol) servers, and agent-harness architectures. The position contributes to overall product quality throughout the software development lifecycle.
Qualifications:
• Experience building data pipelines using Apache Spark (PySpark preferred) and SQL
• Experience with SQL query engines (Hive, Trino/Presto, or similar) and cloud data platforms (AWS S3, EMR, Lambda)
• Practical experience building LLM-powered agent systems
• Hands-on experience with at least one agent framework: LangChain, LangGraph, AWS Strands, or equivalent
• Proficiency in Python for data engineering and automation
Responsibilities:
• Build and maintain ETL/ELT pipelines using Apache Spark, Hive, and Trino across S3-based data lake environments.
• Develop and optimize SQL for large-scale datasets, including window functions, multi-table joins, and complex aggregations.
• Design and build big data systems (EMR on EC2, EMR on EKS) and develop solutions on analytical platforms (SageMaker, Domino, Dataiku).
• Participate in data quality monitoring, anomaly detection, and production incident investigation.
• Develop AI agent systems using Amazon Bedrock and agent frameworks (Strands Agents SDK, LangChain/LangGraph, or equivalent).
• Build agent-harness architectures combining LLM reasoning with deterministic execution, including skill- and RAG-based SQL generation and structured output validation.
• Implement agent memory, context management, and tool integration (MCP servers, API connectors, data catalog lookups) across data platforms.
• Build evaluation frameworks for agent accuracy, including paraphrase robustness, routing precision, and structural consistency.
• Stay informed of advances in LLM frameworks (LangGraph, Google ADK, AWS Strands) and emerging AI capabilities.
• Write clean, well-tested code and contribute to CI/CD pipelines and infrastructure-as-code on AWS.
• Ensure secure handling of sensitive and regulated data across both data pipelines and AI agent outputs, including auditable execution traces.
• Adhere to organizational standards for secure software development practices, governance, and technology policies.
• Partner across teams, communicate technical information effectively to both technical and non-technical stakeholders, and maintain clear documentation.
• Actively learn from senior team members and contribute to continuous improvement initiatives, fostering collaboration, innovation, accountability, and technical excellence.