Galent

Data Scientist - New York City and San Francisco CA (Fulltime Candidates Only)

This role is for a Data Scientist in New York City or San Francisco, requiring 4–6 years of experience in applied machine learning. Key skills include Python, Databricks, and ML frameworks. Contract length exceeds 6 months, with a competitive pay rate.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 17, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Unknown
-
📄 - Contract
Fixed Term
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Francisco Bay Area
-
🧠 - Skills detailed
#Spark (Apache Spark) #ML (Machine Learning) #Databricks #Datasets #Libraries #NumPy #Data Engineering #PyTorch #TensorFlow #Data Ingestion #Security #Hadoop #Pandas #Regression #Deployment #PySpark #Documentation #AWS (Amazon Web Services) #Azure #Classification #Compliance #Data Science #GCP (Google Cloud Platform) #Cloud #SQL (Structured Query Language) #Python
Role description
Data Scientist
Location: New York City and San Francisco CA
Primary focus: Model reproduction, feature engineering logic, performance validation, and ensuring alignment with established modeling frameworks.
• Rebuild and port existing Python-based models into the customer’s Databricks platform.
• Develop, train, and validate predictive models using Python, PySpark, and ML frameworks such as scikit-learn, XGBoost, and Spark MLlib.
• Develop, validate, and reproduce feature engineering logic, ensuring parity with baseline models.
• Train, retrain, validate, and benchmark model performance using customer-provided datasets, while maintaining performance parity with reference models.
• Work with Data Engineers to define feature requirements and ensure datasets support model needs.
• Perform model diagnostics, bias checks, stability checks, and accuracy assessments.
• Prepare model documentation, validation summaries, and stakeholder-ready insights.
• Support scoring pipeline design and ensure reproducibility across Dev / QA / Prod.
• Collaborate with compliance and platform teams to ensure adherence to governance requirements.
• Perform model diagnostics, hyperparameter tuning, and stability analysis.
• Evaluate model performance across population segments and time periods.
• Work with platform and engineering teams to support scoring pipeline deployment across Dev / QA / Prod.
Qualifications
• 4–6 years of experience in applied machine learning or data science.
• Strong hands-on experience with Python and ML libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, or similar.
• Experience developing ML models in Databricks using Python or PySpark.
• Strong knowledge of feature engineering, model training workflows, and evaluation techniques.
• Experience working with large structured datasets (financial or transactional data preferred).
• Ability to write clear documentation and communicate technical results to non-technical stakeholders.
• 4+ years of hands-on experience developing, deploying, and maintaining ML models.
• Advanced proficiency in Python (NumPy, pandas, scikit-learn, PyTorch or TensorFlow).
• Strong statistical and mathematical foundation, including regression, classification, probability, and optimization.
• Experience building end-to-end ML pipelines: data ingestion, cleaning, feature engineering, modeling, evaluation, and deployment.
• Experience working within client environments, including adapting to unfamiliar infrastructure, constraints, and security requirements.
• Experience with cloud platforms (AWS, Azure, or GCP) and on-prem environments.
• Advanced SQL ability and experience with big-data tools (Spark, Databricks, Hadoop).