Data Scientist (In-Person Interview)

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Scientist with 4+ years of experience, focusing on LLM training, graph-based retrieval, and predictive modeling. It is a long-term remote position, requiring expertise in Python, R, SQL, and cloud environments.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
-
🗓️ - Date discovered
September 24, 2025
🕒 - Project duration
Unknown
-
🏝️ - Location type
Remote
-
📄 - Contract type
Unknown
-
🔒 - Security clearance
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Computer Science #SQL (Structured Query Language) #Knowledge Graph #Predictive Modeling #Databases #Programming #AI (Artificial Intelligence) #Python #R #ML (Machine Learning) #Compliance #Data Science #HBase #Deployment #Indexing #Cloud #Reinforcement Learning #Classification
Role description
Job Title: Data Scientist (In-Person Interview)
Location: Remote
Duration: Long-Term
Visa: Any visa is fine (4+ years of experience as a Data Scientist required)
Interview process: 1st round virtual; 2nd round face-to-face (mandatory)
End Client: Working in Implementation Project
Note: At least 4 years of experience, all of it in a Data Scientist role, is required.
Note: When sharing your resume, please include a write-up of your data science experience.
Please share resumes at Akhil@tror.ai

Key Responsibilities
• Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, Llama, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
• Architect and implement Graph RAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
• Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
• Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
• Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
• Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
• Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
• Design and implement analytical frameworks.
• Develop predictive and prescriptive models.

Preferred Qualifications
• PhD or master's degree in Computer Science, Machine Learning, or a related field.
• 4+ years of experience in applied AI/ML, with a strong track record of delivering production-grade models.

Deep expertise in:
• LLM training and fine-tuning (e.g., GPT, Llama, Mistral, Qwen)
• Graph-based retrieval systems (Graph RAG, knowledge graphs)
• Embedding models (e.g., BGE, E5, SimCSE)
• Semantic search and vector databases (e.g., FAISS, Weaviate, Milvus)
• Document segmentation and preprocessing (OCR, layout parsing)
• Distributed training frameworks (NCCL, Horovod, DeepSpeed)
• High-performance networking (InfiniBand, RDMA)
• Model fusion and ensemble techniques (stacking, boosting, gating)
• Optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
• Symbolic AI and rule-based systems
• Meta-learning and Mixture of Experts architectures
• Reinforcement learning (e.g., RLHF, PPO, DPO)

• Experience applying causal inference techniques (e.g., causal impact analysis, uplift modeling, DoWhy) to marketing and engagement analytics.
• Ability to exercise independent judgment in methods, techniques, and evaluation criteria on data science projects, overseeing the end-to-end process from problem definition to model implementation.
• Proficiency with programming languages such as Python, R, and SQL.
• Strong background in predictive modeling, classification, segmentation, and optimization.
• Extensive experience working in any cloud environment.

Bonus Skills
• Familiarity with regulatory and compliance frameworks in AI deployment.
• Contributions to open-source AI projects or published research, and/or the ability to take research papers from PoC to production.