

HireTalent - Diversity Staffing & Recruiting Firm
Data Scientist
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Data Scientist specializing in LLM/GenAI, offering a 100% remote contract with competitive pay. Requires 4+ years in data science, expertise in NLP, and proficiency in Python, SQL, and Databricks.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: December 19, 2025
Duration: Unknown
Location: Remote
Contract: Unknown
Security: Unknown
Location detailed: United States
Skills detailed:
#Datasets #OpenSearch #HTML (Hypertext Markup Language) #Indexing #NLP (Natural Language Processing) #Elasticsearch #Monitoring #Spark (Apache Spark) #ETL (Extract, Transform, Load) #Databricks #SQL (Structured Query Language) #pydantic #ML (Machine Learning) #AI (Artificial Intelligence) #Python #NumPy #Langchain #Libraries #Pandas #Data Science
Role description
Senior Data Scientist - LLM / GenAI (100% Remote)
We're working with a Client that's building real, production-grade LLM systems to power growth analytics and strategic decision-making. This team isn't experimenting in notebooks. They're deploying retrieval-grounded AI at scale, working with large, messy document datasets, and setting standards for how GenAI is used in a regulated environment.
If you enjoy owning systems end to end and care about accuracy, grounding, and impact, this role is worth a look.
What You'll Be Working On
• Designing and deploying retrieval-grounded LLM systems, from standard to advanced RAG patterns
• Building pipelines to ingest, transform, and normalize large internal and public datasets
• Processing complex documents including PDFs, HTML, and scanned content, using OCR, layout-aware parsing, and table extraction
• Developing LLM-driven information extraction workflows with structured outputs, validation, and accuracy evaluation
• Owning the full retrieval stack: chunking strategies, embeddings, indexing, hybrid retrieval, reranking, and relevance tuning
• Integrating web-based data sources with safeguards like retries, rate limiting, and change detection
• Establishing evaluation and monitoring practices to ensure grounded, reliable outputs in production
• Partnering closely with analytics and business stakeholders to turn ambiguous questions into measurable outcomes
What You Bring
• 4+ years of experience in data science or applied ML, with deep focus on NLP and GenAI
• Strong hands-on experience building LLM-based retrieval or information extraction systems used in real-world production settings
• Proficiency in Python and SQL, with a strong engineering mindset
• Solid experience with Databricks, Spark, and lakehouse architectures
• Deep understanding of vector search concepts including embeddings, hybrid retrieval, and reranking
• Experience working with semi-structured and unstructured data such as PDFs, tables, forms, and web content
• A strong grasp of grounding, attribution, and evaluation techniques that reduce hallucinations
• Ability to clearly communicate tradeoffs and recommendations to both technical and non-technical partners
Tools & Tech You'll Use
• LLM and orchestration frameworks: OpenAI, Google GenAI, LangChain, LangGraph
• Retrieval and vector tooling: FAISS, Elasticsearch/OpenSearch, Pinecone, Weaviate, Milvus, Chroma
• Supporting libraries: Pydantic, Tenacity, BeautifulSoup, Pandas, NumPy
Nice to Have
• Experience with agentic workflows and tool-calling patterns
• Background working in regulated environments with governance, auditability, and access controls
Why This Role
You'll be joining a team that treats GenAI as a core capability, not a side project. The work is technical, impactful, and highly visible, with real ownership from ingestion to production.
If you're excited about building reliable LLM systems that actually get used, this is a strong next step.