Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer on a 6-month contract, offering a pay rate of "$X/hour." Required skills include Python, SQL, and experience with data pipelines for RAG type LLM workflows. A degree in Computer Science and 3+ years of experience are essential.
🌎 - Country
United States
πŸ’± - Currency
$ USD
πŸ’° - Day rate
-
πŸ—“οΈ - Date discovered
May 31, 2025
πŸ•’ - Project duration
Unknown
🏝️ - Location type
Unknown
πŸ“„ - Contract type
Unknown
πŸ”’ - Security clearance
Unknown
πŸ“ - Location detailed
Cincinnati, OH
🧠 - Skills detailed
#Kafka (Apache Kafka) #Cloud #NoSQL #Apache NiFi #Talend #Azure #Hadoop #Computer Science #Compliance #Databases #NLP (Natural Language Processing) #Data Governance #Scala #Documentation #Programming #Big Data #ML (Machine Learning) #Python #Microsoft Power BI #Data Access #MySQL #ETL (Extract, Transform, Load) #Data Engineering #PostgreSQL #Data Science #Visualization #AWS (Amazon Web Services) #Java #BI (Business Intelligence) #Airflow #Data Layers #Spark (Apache Spark) #Data Pipeline #Data Integration #SQL (Structured Query Language) #NiFi (Apache NiFi) #Data Accuracy #Tableau #MongoDB #Data Quality
Role description
Job Summary:
The ideal candidate will be responsible for designing, building, and maintaining scalable data pipelines and infrastructure to support data analytics, machine learning, and Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows. This role requires a strong technical background, excellent problem-solving skills, and the ability to work collaboratively with data scientists, analysts, and other stakeholders.

Key Responsibilities:
1. Data Pipeline Development:
• Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes (see the ETL sketch at the end of this description).
• Ensure data is collected, processed, and stored efficiently and accurately.
2. Data Integration:
• Integrate data from various sources, including databases, APIs, and third-party data providers.
• Ensure data consistency and integrity across different systems.
3. RAG Type LLM Workflows:
• Develop and maintain data pipelines specifically tailored for Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows (an illustrative sketch follows this list).
• Ensure efficient data retrieval and augmentation processes to support LLM training and inference.
• Collaborate with data scientists to optimize data pipelines for LLM performance and accuracy.
4. Semantic/Ontology Data Layers:
• Develop and maintain semantic and ontology data layers to enhance data integration and retrieval.
• Ensure data is semantically enriched to support advanced analytics and machine learning models.
5. Collaboration:
• Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
• Provide technical support and guidance on data-related issues.
6. Data Quality and Governance:
• Implement data quality checks and validation processes to ensure data accuracy and reliability.
• Adhere to data governance policies and best practices.
7. Performance Optimization:
• Monitor and optimize the performance of data pipelines and infrastructure.
• Troubleshoot and resolve data-related issues in a timely manner.
8. Support for Analysis:
• Support short-term ad-hoc analysis by providing quick and reliable data access.
• Contribute to longer-term goals by developing scalable and maintainable data solutions.
9. Documentation:
• Maintain comprehensive documentation of data pipelines, processes, and infrastructure.
• Ensure knowledge transfer and continuity within the team.
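For illustration only, here is a minimal sketch of the retrieval-and-augmentation step referenced in responsibility 3. The names used (Document, retrieve, build_prompt) are hypothetical, and the keyword-overlap scoring is a stand-in for the embedding-based similarity search a production RAG pipeline would normally use against a vector store.

```python
"""Illustrative sketch only: a toy retrieval-augmentation step for an LLM prompt."""
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query (a stand-in for
    embedding similarity search) and return the top-k matches."""
    query_terms = set(query.lower().split())

    def score(doc: Document) -> int:
        return len(query_terms & set(doc.text.lower().split()))

    return sorted(corpus, key=score, reverse=True)[:k]


def build_prompt(query: str, retrieved: list[Document]) -> str:
    """Augment the user query with retrieved context before sending it to an LLM."""
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Invented example documents; a real corpus would come from the semantic data layer.
    corpus = [
        Document("kb-1", "Airflow schedules and monitors batch ETL pipelines."),
        Document("kb-2", "PostgreSQL is a relational database used for transactional workloads."),
        Document("kb-3", "Kafka provides durable, replayable event streams for integration."),
    ]
    question = "How do we schedule ETL pipelines?"
    print(build_prompt(question, retrieve(question, corpus, k=2)))
```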
Technical Requirements:
1. Education and Experience:
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in data engineering or a related role.
2. Technical Skills:
• Proficiency in Python (mandatory).
• Experience with other programming languages such as Java or Scala is a plus.
• Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
• Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
• Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
3. RAG Type LLM Skills:
• Experience with data pipelines for LLM workflows, including data retrieval and augmentation.
• Familiarity with natural language processing (NLP) techniques and tools.
• Understanding of LLM architectures and their data requirements.
4. Semantic/Ontology Data Layers:
• Familiarity with semantic and ontology data layers and their application in data integration and retrieval.
5. Tools and Frameworks:
• Experience with ETL tools and frameworks (e.g., Apache NiFi, Airflow, Talend).
• Familiarity with data visualization tools (e.g., Tableau, Power BI) is a plus.
6. Soft Skills:
• Strong analytical and problem-solving skills.
• Excellent communication and collaboration abilities.
• Ability to work in a fast-paced, dynamic environment.

Preferred Qualifications:
• Experience with machine learning and data science workflows.
• Knowledge of data governance and compliance standards.
• Certification in cloud platforms or data engineering.
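As a second purely illustrative sketch, covering the pipeline-development (responsibility 1) and data-quality (responsibility 6) duties: a self-contained extract-transform-validate-load pass in Python. The table, column names, sample rows, and checks are invented for the example; in practice, steps like these would run as orchestrated tasks in a tool such as Airflow or NiFi against real sources and targets.

```python
"""Illustrative sketch only: extract -> transform -> validate -> load with basic quality checks."""
import csv
import io
import sqlite3

# Invented sample input; a real extract step would pull from an API, file drop, or source database.
RAW_CSV = """order_id,amount,currency
1001,250.00,USD
1002,99.50,USD
1003,,USD
"""


def extract(raw: str) -> list[dict]:
    """Read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows missing an amount and normalise column types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # a real pipeline would route bad rows to a quarantine table instead
        clean.append(
            {"order_id": int(row["order_id"]), "amount": float(row["amount"]), "currency": row["currency"]}
        )
    return clean


def validate(rows: list[dict]) -> None:
    """Basic data-quality checks: non-empty load, positive amounts, unique keys."""
    assert rows, "no rows survived the transform step"
    assert all(r["amount"] > 0 for r in rows), "non-positive amount found"
    assert len({r["order_id"] for r in rows}) == len(rows), "duplicate order_id found"


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Write the validated rows into the target table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (:order_id, :amount, :currency)", rows)
    conn.commit()


if __name__ == "__main__":
    rows = transform(extract(RAW_CSV))
    validate(rows)
    with sqlite3.connect(":memory:") as conn:
        load(rows, conn)
        print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```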