Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer on a 6-month contract, offering a pay rate of "$X/hour." Required skills include Python, SQL, and experience with data pipelines for RAG type LLM workflows. A degree in Computer Science and 3+ years of experience are essential.
🌎 - Country
United States
πŸ’± - Currency
$ USD
πŸ’° - Day rate
-
πŸ—“οΈ - Date discovered
May 31, 2025
πŸ•’ - Project duration
Unknown
🏝️ - Location type
Unknown
πŸ“„ - Contract type
Unknown
πŸ”’ - Security clearance
Unknown
πŸ“ - Location detailed
Cincinnati, OH
🧠 - Skills detailed
#Kafka (Apache Kafka) #Cloud #NoSQL #Apache NiFi #Talend #Azure #Hadoop #Computer Science #Compliance #Databases #NLP (Natural Language Processing) #Data Governance #Scala #Documentation #Programming #Big Data #ML (Machine Learning) #Python #Microsoft Power BI #Data Access #MySQL #ETL (Extract, Transform, Load) #Data Engineering #PostgreSQL #Data Science #Visualization #AWS (Amazon Web Services) #Java #BI (Business Intelligence) #Airflow #Data Layers #Spark (Apache Spark) #Data Pipeline #Data Integration #SQL (Structured Query Language) #NiFi (Apache NiFi) #Data Accuracy #Tableau #MongoDB #Data Quality
Role description
Job Summary:
The ideal candidate will be responsible for designing, building, and maintaining scalable data pipelines and infrastructure to support data analytics, machine learning, and Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows. This role requires a strong technical background, excellent problem-solving skills, and the ability to work collaboratively with data scientists, analysts, and other stakeholders.

Key Responsibilities:
1. Data Pipeline Development:
• Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes (see the ETL sketch at the end of this description).
• Ensure data is collected, processed, and stored efficiently and accurately.
2. Data Integration:
• Integrate data from various sources, including databases, APIs, and third-party data providers.
• Ensure data consistency and integrity across different systems.
3. RAG Type LLM Workflows:
• Develop and maintain data pipelines specifically tailored for Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows (an illustrative sketch follows this list).
• Ensure efficient data retrieval and augmentation processes to support LLM training and inference.
• Collaborate with data scientists to optimize data pipelines for LLM performance and accuracy.
4. Semantic/Ontology Data Layers:
• Develop and maintain semantic and ontology data layers to enhance data integration and retrieval.
• Ensure data is semantically enriched to support advanced analytics and machine learning models.
5. Collaboration:
• Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
• Provide technical support and guidance on data-related issues.
6. Data Quality and Governance:
• Implement data quality checks and validation processes to ensure data accuracy and reliability.
• Adhere to data governance policies and best practices.
7. Performance Optimization:
• Monitor and optimize the performance of data pipelines and infrastructure.
• Troubleshoot and resolve data-related issues in a timely manner.
8. Support for Analysis:
• Support short-term ad-hoc analysis by providing quick and reliable data access.
• Contribute to longer-term goals by developing scalable and maintainable data solutions.
9. Documentation:
• Maintain comprehensive documentation of data pipelines, processes, and infrastructure.
• Ensure knowledge transfer and continuity within the team.
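For illustration only, here is a minimal sketch of the retrieval-and-augmentation step referenced in responsibility 3. The names used (Document, retrieve, build_prompt) are hypothetical, and the keyword-overlap scoring is a stand-in for the embedding-based similarity search a production RAG pipeline would normally use against a vector store.

```python
"""Illustrative sketch only: a toy retrieval-augmentation step for an LLM prompt."""
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query (a stand-in for
    embedding similarity search) and return the top-k matches."""
    query_terms = set(query.lower().split())

    def score(doc: Document) -> int:
        return len(query_terms & set(doc.text.lower().split()))

    return sorted(corpus, key=score, reverse=True)[:k]


def build_prompt(query: str, retrieved: list[Document]) -> str:
    """Augment the user query with retrieved context before sending it to an LLM."""
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Invented example documents; a real corpus would come from the semantic data layer.
    corpus = [
        Document("kb-1", "Airflow schedules and monitors batch ETL pipelines."),
        Document("kb-2", "PostgreSQL is a relational database used for transactional workloads."),
        Document("kb-3", "Kafka provides durable, replayable event streams for integration."),
    ]
    question = "How do we schedule ETL pipelines?"
    print(build_prompt(question, retrieve(question, corpus, k=2)))
```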
Technical Requirements:
1. Education and Experience:
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in data engineering or a related role.
2. Technical Skills:
• Proficiency in Python (mandatory).
• Experience with other programming languages such as Java or Scala is a plus.
• Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
• Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
• Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
3. RAG Type LLM Skills:
• Experience with data pipelines for LLM workflows, including data retrieval and augmentation.
• Familiarity with natural language processing (NLP) techniques and tools.
• Understanding of LLM architectures and their data requirements.
4. Semantic/Ontology Data Layers:
• Familiarity with semantic and ontology data layers and their application in data integration and retrieval.
5. Tools and Frameworks:
• Experience with ETL tools and frameworks (e.g., Apache NiFi, Airflow, Talend).
• Familiarity with data visualization tools (e.g., Tableau, Power BI) is a plus.
6. Soft Skills:
• Strong analytical and problem-solving skills.
• Excellent communication and collaboration abilities.
• Ability to work in a fast-paced, dynamic environment.

Preferred Qualifications:
• Experience with machine learning and data science workflows.
• Knowledge of data governance and compliance standards.
• Certification in cloud platforms or data engineering.
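As a second purely illustrative sketch, covering the pipeline-development (responsibility 1) and data-quality (responsibility 6) duties: a self-contained extract-transform-validate-load pass in Python. The table, column names, sample rows, and checks are invented for the example; in practice, steps like these would run as orchestrated tasks in a tool such as Airflow or NiFi against real sources and targets.

```python
"""Illustrative sketch only: extract -> transform -> validate -> load with basic quality checks."""
import csv
import io
import sqlite3

# Invented sample input; a real extract step would pull from an API, file drop, or source database.
RAW_CSV = """order_id,amount,currency
1001,250.00,USD
1002,99.50,USD
1003,,USD
"""


def extract(raw: str) -> list[dict]:
    """Read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[dict]:
    """Drop rows missing an amount and normalise column types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # a real pipeline would route bad rows to a quarantine table instead
        clean.append(
            {"order_id": int(row["order_id"]), "amount": float(row["amount"]), "currency": row["currency"]}
        )
    return clean


def validate(rows: list[dict]) -> None:
    """Basic data-quality checks: non-empty load, positive amounts, unique keys."""
    assert rows, "no rows survived the transform step"
    assert all(r["amount"] > 0 for r in rows), "non-positive amount found"
    assert len({r["order_id"] for r in rows}) == len(rows), "duplicate order_id found"


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Write the validated rows into the target table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (:order_id, :amount, :currency)", rows)
    conn.commit()


if __name__ == "__main__":
    rows = transform(extract(RAW_CSV))
    validate(rows)
    with sqlite3.connect(":memory:") as conn:
        load(rows, conn)
        print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```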