

Tential Solutions
GenAI Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a GenAI Data Engineer; the contract length and pay rate are unspecified. Key skills required include Python, SQL, Apache Spark, and experience with LLM systems. A Bachelor's degree and two years of relevant experience are required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
May 7, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Rockville, CT
-
🧠 - Skills detailed
#Docker #Data Science #Datasets #Cloud #ChatGPT #Automation #ETL (Extract, Transform, Load) #Lambda (AWS Lambda) #Langchain #AI (Artificial Intelligence) #Debugging #Computer Science #Python #AWS (Amazon Web Services) #Data Catalog #Data Quality #Security #SQL (Structured Query Language) #EC2 #Big Data #Kubernetes #Apache Spark #AWS S3 (Amazon Simple Storage Service) #GitHub #Logging #Data Engineering #Documentation #Data Pipeline #S3 (Amazon Simple Storage Service) #Data Lake #Spark (Apache Spark) #Trino #Dataiku #Infrastructure as Code (IaC) #SageMaker #Anomaly Detection #Monitoring
Role description
The Data Engineer works with moderate supervision across two equally weighted domains: (1) large-scale data pipeline development, processing market events in a cloud environment, and (2) design and development of agentic AI systems, including LLM-powered regulatory data assistants, MCP servers, and agent harness architectures. The role is heavily focused on building GenAI tools and intelligent agents while remaining hands-on with Python, SQL, and big data technologies. The engineer will support both proof-of-concept initiatives and production-grade systems, helping scale solutions across the organization. This position contributes to overall product quality throughout the software development lifecycle and operates within a highly regulated environment requiring strong security, governance, and auditability. The ideal candidate is a hybrid engineer who understands both large-scale data platforms and modern LLM-based systems and can translate ideas into practical, reusable solutions.
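For a concrete sense of the first domain, here is a minimal PySpark sketch of a batch pipeline over an S3-based data lake. The bucket paths, schema, and aggregation are illustrative assumptions, not details taken from this role.

```python
# Hypothetical sketch: batch ETL over market events in an S3 data lake.
# Bucket names, columns, and the aggregation are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("market-events-etl").getOrCreate()

# Read raw market events (Parquet assumed) from a landing zone.
events = spark.read.parquet("s3://example-landing/market-events/")

# Basic cleansing: drop exact duplicates and rows missing a key timestamp.
cleaned = events.dropDuplicates(["event_id"]).filter(F.col("event_time").isNotNull())

# Aggregate per symbol per day, then write back partitioned by date.
daily = (
    cleaned.groupBy(F.to_date("event_time").alias("event_date"), "symbol")
    .agg(F.count("*").alias("event_count"), F.sum("notional").alias("total_notional"))
)
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/market-events-daily/"
)
```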
Responsibilities
• Build and maintain ETL and ELT pipelines using Apache Spark, Hive, and Trino across S3-based data lake environments
• Develop and optimize SQL for large-scale datasets, including window functions, multi-table joins, and complex aggregations
• Build and engineer big data systems using EMR on EC2 and EMR on EKS, and develop solutions on analytical platforms such as SageMaker, Domino, and Dataiku
• Participate in data quality monitoring, anomaly detection, and production incident investigation
• Build and productionize LLM-powered agents using AWS Bedrock and agent frameworks such as LangChain, LangGraph, AWS Strands, or similar
• Design agent harness architectures that combine LLM reasoning with deterministic execution, including RAG-based SQL generation and structured output validation (see the sketch after this list)
• Implement agent memory, context management, and tool integration including MCP servers, APIs, and data catalog lookups
• Build evaluation frameworks for agent accuracy including paraphrase robustness, routing precision, and structural consistency
• Stay informed of advances in LLM frameworks and emerging AI capabilities
• Support proof-of-concept AI initiatives and scale successful solutions across teams
• Write clean, well-tested code and contribute to CI/CD pipelines and infrastructure as code on AWS
• Ensure secure handling of sensitive regulatory data with auditable execution traces across both data pipelines and AI outputs
• Partner across teams, communicate technical concepts clearly, and maintain documentation
• Actively learn from senior engineers and contribute to continuous improvement of processes and engineering practices
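To make the structured-output-validation responsibility concrete (see the RAG-based SQL generation bullet above), here is a hypothetical sketch: the model proposes SQL as JSON, and deterministic code validates it before anything executes. The Bedrock model ID, prompt, and schema are assumptions, not the team's actual harness.

```python
# Hypothetical agent-harness step: validate LLM-proposed SQL before execution.
# Model ID, prompt, and schema are illustrative assumptions.
import json

import boto3
from pydantic import BaseModel, ValidationError, field_validator

class SqlPlan(BaseModel):
    sql: str
    tables: list[str]

    @field_validator("sql")
    @classmethod
    def read_only(cls, v: str) -> str:
        # Guardrail: permit only SELECT statements from the model.
        if not v.lstrip().lower().startswith("select"):
            raise ValueError("only SELECT statements are allowed")
        return v

def generate_sql(question: str) -> SqlPlan:
    client = boto3.client("bedrock-runtime")
    resp = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative
        messages=[{"role": "user", "content": [{"text": (
            "Return only JSON with keys 'sql' and 'tables' that answers: " + question
        )}]}],
    )
    raw = resp["output"]["message"]["content"][0]["text"]
    try:
        return SqlPlan.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        # Graceful degradation: fail with a typed error instead of running bad SQL.
        raise RuntimeError(f"model output failed validation: {exc}") from exc
```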
Qualifications
• Bachelor's degree in Computer Science, Data Science, Information Systems, or a related discipline with at least two years of experience, or equivalent work experience
• Strong experience building data pipelines using Apache Spark and SQL
• Experience with SQL query engines such as Hive and Trino and cloud data platforms including AWS S3, EMR, and Lambda
• Strong understanding of large-scale data challenges such as data skew, high-volume processing, and troubleshooting job failures
• Hands-on experience building LLM-powered agent systems that use tools and produce structured outputs
• Experience with at least one agent framework such as LangChain, LangGraph, or AWS Strands
• Knowledge of prompt engineering, RAG architectures, and context and memory management
• Experience working with foundation model APIs such as Anthropic Claude, Amazon Nova, or OpenAI
• Understanding of agent memory models including working memory, episodic memory, and semantic memory
• Familiarity with agent harness design including guardrails, routing, verification loops, and graceful degradation
• Hands-on experience using AI development tools such as GitHub Copilot, Q Developer, ChatGPT, or Claude
• Experience integrating AI into development workflows including code generation, debugging, and testing
• Strong experience with AWS services including S3, EMR, Lambda, Bedrock, and Step Functions
• Experience using S3 with Spark including file formats and consistency considerations
• Familiarity with AWS monitoring and logging tools such as CloudWatch and CloudTrail
• Proficiency in Python for data engineering and automation with strong understanding of clean code, modular design, and performance
• Strong SQL skills including window functions, joins, aggregations, and handling edge cases such as null values and duplicates (a short sketch follows this list)
• Strong understanding of collections, concurrency, and memory management
• Exposure to containerization and orchestration technologies such as Docker and Kubernetes
• Experience with infrastructure as code and CI and CD pipelines
• Strong communication skills and the ability to work in a fast-paced environment
• Ability to quickly learn new technologies and adapt to evolving requirements
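As flagged in the SQL skills bullet above, here is a small PySpark sketch of the null-and-duplicate edge cases: keep the latest record per key with a window function, ordering nulls last so real timestamps win. Table and column names are invented for illustration.

```python
# Hypothetical dedup example: latest record per trade_id, nulls sorted last.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("dedup-example").getOrCreate()
trades = spark.createDataFrame(
    [(1, "AAPL", "2026-05-06", 99.0),
     (1, "AAPL", "2026-05-07", 100.0),
     (2, "MSFT", None, 250.0)],
    ["trade_id", "symbol", "updated_at", "price"],
)

# Rank duplicates within each trade_id; desc_nulls_last keeps null timestamps
# from outranking real ones.
w = Window.partitionBy("trade_id").orderBy(F.col("updated_at").desc_nulls_last())
latest = (
    trades.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)
latest.show()  # one row per trade_id: the 2026-05-07 AAPL row and the MSFT row
```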