IPolarity

Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub

This role is for a Data Engineer; the contract length and pay rate are not specified. Key skills include Python, SQL, and NoSQL, along with experience with LLMs, Apache Spark, and cloud platforms. Industry experience in AI/ML is required.

🌎 - Country

United States

💱 - Currency

$ USD

💰 - Day rate

Unknown

🗓️ - Date

March 17, 2026

🕒 - Duration

Unknown

🏝️ - Location

Unknown

📄 - Contract

Unknown

🔒 - Security

Unknown

📍 - Location detailed

Whippany, NJ

🧠 - Skills detailed

#Scala #API (Application Programming Interface) #Data Cleaning #Langchain #SQL (Structured Query Language) #Datasets #AWS (Amazon Web Services) #Transformers #NoSQL #Cloud #ML (Machine Learning) #Apache Spark #Mathematics #Storage #Model Evaluation #Databases #Hugging Face #Azure #GCP (Google Cloud Platform) #Data Management #Kubernetes #Data Pipeline #Kafka (Apache Kafka) #PyTorch #Spark (Apache Spark) #Data Engineering #AI (Artificial Intelligence) #Python #Docker #ETL (Extract, Transform, Load) #NLP (Natural Language Processing) #TensorFlow

Role description

We are looking for a Data Engineer experienced in designing, building, and maintaining high-quality data pipelines, preprocessing workflows, and vector databases required for training, fine-tuning, and deploying Large Language Models (LLMs). You will build and maintain high-throughput data pipelines, infrastructure, and storage solutions to feed, train, and deploy AI/ML models, and implement Retrieval-Augmented Generation (RAG) systems, data cleaning, and model evaluation so that LLM applications are efficient, scalable, and reliable.

Required Skills & Qualifications

• Strong proficiency in Python is essential, along with SQL and NoSQL for data management.
• Experience with LangChain, LlamaIndex, Hugging Face Transformers, and the OpenAI API.
• Experience with Apache Spark, Kafka, or modern data stack tools.
• Knowledge of NLP techniques, word embeddings, tokenization, and vector mathematics.
• Familiarity with TensorFlow, PyTorch, or Hugging Face.
• Familiarity with cloud platforms (AWS, GCP, Azure), CI/CD, Docker, and Kubernetes.

Key Responsibilities

• Design and build robust ETL/ELT pipelines for unstructured text data, including scraping, cleaning, deduplication, and transformation for LLM training.
• Build and maintain vector search solutions (e.g., Pinecone, Milvus, Weaviate, Chroma) to store and retrieve embeddings for RAG systems.
• Prepare high-quality datasets for fine-tuning adapters (e.g., LoRA) and train LLMs using frameworks like PyTorch or TensorFlow.
• Implement Retrieval-Augmented Generation using frameworks like LangChain or LlamaIndex to connect LLMs to company data.
• Develop evaluation frameworks for model performance, testing for accuracy, hallucination, and bias, and monitor deployed models.
• Create APIs and internal web tools for data annotation, curation, and model interaction.


