

Galent
Data Scientist - New York City or San Francisco, CA (Full-Time Candidates Only)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Scientist in New York City or San Francisco, requiring 4–6 years of experience in applied machine learning. Key skills include Python, Databricks, and ML frameworks. Contract length exceeds 6 months, with a competitive pay rate.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 17, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Unknown
-
📄 - Contract
Fixed Term
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Francisco Bay Area
-
🧠 - Skills detailed
#Spark (Apache Spark) #ML (Machine Learning) #Databricks #Datasets #Libraries #NumPy #Data Engineering #PyTorch #TensorFlow #Data Ingestion #Security #Hadoop #Pandas #Regression #Deployment #PySpark #Documentation #AWS (Amazon Web Services) #Azure #Classification #Compliance #Data Science #GCP (Google Cloud Platform) #Cloud #SQL (Structured Query Language) #Python
Role description
Data Scientist
Location: New York City or San Francisco, CA
Primary focus: Model reproduction, feature engineering logic, performance validation, and alignment with established modeling frameworks.
• Rebuild and port existing Python-based models into the customer’s Databricks platform.
• Develop, train, and validate predictive models using Python, PySpark, and ML frameworks such as scikit-learn, XGBoost, and Spark MLlib.
• Develop, validate, and reproduce feature engineering logic, ensuring parity with baseline models.
• Train, retrain, validate, and benchmark model performance on customer-provided datasets, maintaining performance parity with reference models (see the PySpark sketch after this list).
• Work with Data Engineers to define feature requirements and ensure datasets support model needs.
• Perform model diagnostics, hyperparameter tuning, bias checks, stability checks, and accuracy assessments.
• Prepare model documentation, validation summaries, and stakeholder-ready insights.
• Work with platform and engineering teams to support scoring pipeline design and deployment, ensuring reproducibility across Dev / QA / Prod.
• Collaborate with compliance and platform teams to ensure adherence to governance requirements.
• Evaluate model performance across population segments and time periods.
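To make the reproduce-and-benchmark responsibilities above concrete, here is a minimal PySpark sketch of retraining a model on Databricks with Spark MLlib and checking parity against a reference model. The table name, column names, segment column, and baseline AUC are hypothetical placeholders, not details from this role.

```python
# Minimal sketch: retrain on Databricks with Spark MLlib, then benchmark
# against a reference model. All table, column, and threshold values are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Customer-provided dataset (hypothetical table name).
df = spark.table("main.models.features_table")
train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)

# Assemble the (assumed numeric) engineered features into one vector column.
feature_cols = [c for c in df.columns if c not in ("customer_id", "segment", "label")]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")

model = GBTClassifier(labelCol="label", featuresCol="features").fit(
    assembler.transform(train_df)
)

# Benchmark against the reference model: overall AUC within a tolerance.
evaluator = BinaryClassificationEvaluator(labelCol="label", metricName="areaUnderROC")
scored = model.transform(assembler.transform(test_df))
auc = evaluator.evaluate(scored)

BASELINE_AUC = 0.84  # hypothetical reference-model score
assert auc >= BASELINE_AUC - 0.01, f"Parity check failed: {auc:.4f} vs {BASELINE_AUC}"

# Stability check: evaluate across population segments (hypothetical column).
for row in scored.select("segment").distinct().collect():
    seg = row["segment"]
    seg_auc = evaluator.evaluate(scored.filter(scored["segment"] == seg))
    print(f"segment={seg}: AUC={seg_auc:.4f}")
```

In practice the parity metric and tolerance would come from the established modeling framework the role references, not a hard-coded constant.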
Qualifications
• 4–6 years of experience in applied machine learning or data science.
• Strong hands-on experience with Python and ML libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, or similar.
• Experience developing ML models in Databricks using Python or PySpark.
• Strong knowledge of feature engineering, model training workflows, and evaluation techniques.
• Experience working with large structured datasets (financial or transactional data preferred).
• Ability to write clear documentation and communicate technical results to non-technical stakeholders.
• 4+ years of hands-on experience developing, deploying, and maintaining ML models.
• Advanced proficiency in Python (NumPy, pandas, scikit-learn, PyTorch or TensorFlow).
• Strong statistical and mathematical foundation, including regression, classification, probability, and optimization.
• Experience building end-to-end ML pipelines: data ingestion, cleaning, feature engineering, modeling, evaluation, and deployment (a minimal sketch follows this list).
• Experience working within client environments, including adapting to unfamiliar infrastructure, constraints, and security requirements.
• Experience with cloud platforms (AWS, Azure, or GCP) and on-prem environments.
• Advanced SQL skills and experience with big-data tools (Spark, Databricks, Hadoop).
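As a hedged illustration of the end-to-end pipeline qualification above, here is a minimal scikit-learn sketch running from ingestion to a persisted scoring artifact. The file path, column names, and label are hypothetical placeholders.

```python
# Minimal end-to-end sketch: ingestion -> cleaning -> feature engineering ->
# modeling -> evaluation -> deployable artifact. All file and column names
# are hypothetical placeholders.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Ingestion (hypothetical path) and basic cleaning.
df = pd.read_csv("transactions.csv").drop_duplicates()

numeric = ["amount", "account_age_days"]        # hypothetical columns
categorical = ["merchant_category", "channel"]  # hypothetical columns

# Feature engineering: impute and scale numerics, one-hot encode categoricals.
features = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipeline = Pipeline([
    ("features", features),
    ("model", GradientBoostingClassifier(random_state=42)),
])

X, y = df[numeric + categorical], df["is_fraud"]  # hypothetical label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline.fit(X_train, y_train)
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
print(f"Holdout AUC: {auc:.4f}")

# "Deployment": persist the fitted pipeline as a single scoring artifact.
joblib.dump(pipeline, "scoring_pipeline.joblib")
```

Keeping preprocessing and model in one fitted pipeline means the same feature logic is applied at training and scoring time, which is the reproducibility property the role emphasizes.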