Optomi

Data Engineer

This is a Data Engineer role with an unspecified contract length and pay rate, located in Tysons Corner, VA or Rockville, MD. Key skills include Apache Spark, SQL, Python, and experience with LLM-powered systems.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
600
-
🗓️ - Date
June 19, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Washington DC-Baltimore Area
-
🧠 - Skills detailed
#AWS (Amazon Web Services) #Big Data #Monitoring #Python #Lambda (AWS Lambda) #PySpark #Automation #Presto #Data Pipeline #AWS S3 (Amazon Simple Storage Service) #Data Engineering #S3 (Amazon Simple Storage Service) #EC2 #Dataiku #Anomaly Detection #Data Lake #Spark (Apache Spark) #SQL (Structured Query Language) #Datasets #Documentation #Apache Spark #API (Application Programming Interface) #Cloud #Data Catalog #SageMaker #AI (Artificial Intelligence) #Trino #Data Quality #Langchain #ETL (Extract, Transform, Load)
Role description
Open to Tysons Corner, VA or Rockville, MD locations

Data Engineer

The Data Engineer works with moderate supervision across two equally weighted domains: (1) large-scale data pipeline development processing market events in a cloud environment, and (2) design and development of agentic AI systems including LLM-powered regulatory data assistants, MCP servers, and agent harness architectures. This position contributes to overall product quality throughout the software development lifecycle.

Qualifications:
• Experience building data pipelines using Apache Spark (PySpark preferred) and SQL
• Experience with SQL query engines (Hive, Trino/Presto, or similar) and cloud data platforms (AWS S3, EMR, Lambda)
• Practical experience building LLM-powered agent systems
• Hands-on experience with at least one agent framework: LangChain, LangGraph, AWS Strands, or equivalent
• Proficiency in Python for data engineering and automation

Responsibilities:
• Build and maintain ETL/ELT pipelines using Apache Spark, Hive, and Trino across S3-based data lake environments.
• Develop and optimize SQL for large-scale datasets, including window functions, multi-table joins, and complex aggregations.
• Build and engineer big data systems (EMR-on-EC2, EMR-on-EKS) and develop solutions on analytical platforms (SageMaker, Domino, Dataiku).
• Participate in data quality monitoring, anomaly detection, and production incident investigation.
• Develop AI agent systems using AWS Bedrock and agent frameworks (Strands Agents SDK, LangChain/LangGraph, or equivalent).
• Build agent-harness architectures combining LLM reasoning with deterministic execution, including skill- and RAG-based SQL generation and structured output validation.
• Implement agent memory, context management, and tool integration (MCP servers, API connectors, data catalog lookups) across data platforms.
• Build evaluation frameworks for agent accuracy, including paraphrase robustness, routing precision, and structural consistency.
• Stay informed of advances in LLM frameworks (LangGraph, Google ADK, AWS Strands) and emerging AI capabilities.
• Write clean, well-tested code and contribute to CI/CD pipelines and infrastructure-as-code on AWS.
• Ensure secure handling of sensitive and regulated data across both data pipelines and AI agent outputs, including auditable execution traces.
• Adhere to organizational standards for secure software development practices, governance, and technology policies.
• Partner across teams, communicate technical information effectively to both technical and non-technical stakeholders, and maintain clear documentation.
• Actively learn from senior team members and contribute to continuous improvement initiatives, fostering collaboration, innovation, accountability, and technical excellence.