

Data Scientist (In-Person Interview)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Scientist with 4+ years of experience, focused on LLM training and graph-based systems. It offers a long-term remote contract with a competitive pay rate. A PhD or master's in a related field is preferred.
Country: United States
Currency: $ USD
Day rate: -
Date discovered: September 24, 2025
Project duration: Unknown
Location type: Remote
Contract type: Unknown
Security clearance: Unknown
Location detailed: United States
Skills detailed: #Computer Science #SQL (Structured Query Language) #Knowledge Graph #Predictive Modeling #Databases #Programming #AI (Artificial Intelligence) #Python #R #ML (Machine Learning) #Compliance #Data Science #HBase #Deployment #Indexing #Cloud #Reinforcement Learning #Classification
Role description
Job Title: Data Scientist (In-Person Interview)
Location: Remote
Duration: Long-Term
Visa: Any visa is fine (4+ years of experience as a Data Scientist required)
Interview process: 1st round virtual, 2nd round face-to-face (mandatory)
End Client: Working on an implementation project
Note: At least 4 years of experience as a Data Scientist is required.
Note: When sharing your resume, please include a write-up of your Data Scientist experience.
Please share resumes at Akhil@tror.ai
Key Responsibilities
• Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, Llama, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
• Architect and implement Graph RAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
• Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
• Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
• Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training (see the sketch after this list).
• Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
• Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
• Design and implement analytical frameworks.
• Develop predictive and prescriptive models.
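For context on the NCCL/InfiniBand responsibility above, here is a minimal sketch of multi-GPU training with PyTorch DistributedDataParallel on the NCCL backend. The linear model, random data, and hyperparameters are placeholders standing in for an LLM fine-tuning workload, and the torchrun launch shown in the comments is one assumed way to start the worker processes.

```python
# Minimal multi-GPU training sketch using PyTorch DDP with the NCCL backend.
# Assumed launch (one process per GPU):
#   torchrun --nproc_per_node=8 train_ddp.py
# Over InfiniBand, NCCL normally detects the fabric itself; behavior is tunable via NCCL_* env vars.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real job would wrap an LLM and a proper dataset/dataloader.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()      # dummy objective for illustration
        optimizer.zero_grad()
        loss.backward()                    # gradients are all-reduced across GPUs via NCCL
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```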
Preferred Qualifications
• PhD or master's degree in Computer Science, Machine Learning, or a related field.
• 4+ years of experience in applied AI/ML, with a strong track record of delivering production-grade models.
Deep Expertise In
• LLM training and fine-tuning (e.g., GPT, Llama, Mistral, Qwen)
• Graph-based retrieval systems (Graph RAG, knowledge graphs)
• Embedding models (e.g., BGE, E5, SimCSE)
• Semantic search and vector databases (e.g., FAISS, Weaviate, Milvus); see the retrieval sketch after this list
• Document segmentation and preprocessing (OCR, layout parsing)
• Distributed training frameworks (NCCL, Horovod, DeepSpeed)
• High-performance networking (InfiniBand, RDMA)
• Model fusion and ensemble techniques (stacking, boosting, gating)
• Optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
• Symbolic AI and rule-based systems
• Meta-learning and Mixture of Experts architectures
• Reinforcement learning (e.g., RLHF, PPO, DPO)
• Experience applying causal inference techniques (e.g., causal impact analysis, uplift modeling, DoWhy) to marketing and engagement analytics.
• Exercise independent judgment in methods, techniques, and evaluation criteria on data science projects, overseeing the end-to-end process from problem definition to model implementation.
• Proficiency with programming languages such as Python, R, and SQL.
• Strong background in predictive modeling, classification, segmentation, and optimization.
• Extensive experience working in a cloud environment.
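As a concrete reference for the embedding and vector-database items above, here is a minimal dense-retrieval sketch assuming the sentence-transformers library with a BGE checkpoint and a FAISS inner-product index; the corpus, query, and model name are illustrative placeholders rather than anything specified by the role.

```python
# Minimal semantic search sketch: embed documents with a BGE model, index and query with FAISS.
# Assumes `pip install sentence-transformers faiss-cpu`; corpus and query are placeholders.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Graph RAG grounds LLM answers in a knowledge graph.",
    "RLHF aligns model behavior with human preferences.",
    "InfiniBand provides low-latency links for multi-node training.",
]

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed embedding checkpoint

# Normalized embeddings make inner-product search equivalent to cosine similarity,
# the usual choice for BGE/E5-style encoders.
doc_emb = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(int(doc_emb.shape[1]))
index.add(doc_emb)

query_emb = model.encode(
    ["How do you align an LLM with human feedback?"],
    normalize_embeddings=True,
)
scores, ids = index.search(query_emb, 2)  # top-2 nearest documents
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i]}")
```

A production system would swap the flat index for an approximate one (e.g., HNSW or IVF) and layer document segmentation and reranking on top, but the embed-index-search loop stays the same.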
Bonus Skills
• Familiarity with regulatory and compliance frameworks in AI deployment.
• Contributions to open-source AI projects or published research, and/or the ability to take research papers from PoC to production.