

Galent
Data Scientist - New York City or San Francisco, CA (Full-Time Candidates Only)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Scientist in New York City or San Francisco, requiring 4–6 years of experience in applied machine learning. Key skills include Python, Databricks, and ML frameworks. Contract length exceeds 6 months, with a competitive pay rate.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 17, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Unknown
-
📄 - Contract
Fixed Term
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Francisco Bay Area
-
🧠 - Skills detailed
#Spark (Apache Spark) #ML (Machine Learning) #Databricks #Datasets #Libraries #NumPy #Data Engineering #PyTorch #TensorFlow #Data Ingestion #Security #Hadoop #Pandas #Regression #Deployment #PySpark #Documentation #AWS (Amazon Web Services) #Azure #Classification #Compliance #Data Science #GCP (Google Cloud Platform) #Cloud #SQL (Structured Query Language) #Python
Role description
Data Scientist
Location: New York City or San Francisco, CA
Primary focus: Model reproduction, feature engineering logic, performance validation, and alignment with established modeling frameworks.
• Rebuild and port existing Python-based models into the customer’s Databricks platform.
• Develop, train, and validate predictive models using Python, PySpark, and ML frameworks such as scikit-learn, XGBoost, and Spark MLlib.
• Develop, validate, and reproduce feature engineering logic, ensuring parity with baseline models.
• Train, retrain, validate, and benchmark model performance on customer-provided datasets, maintaining performance parity with reference models (see the PySpark sketch after this list).
• Work with Data Engineers to define feature requirements and ensure datasets support model needs.
• Perform model diagnostics, hyperparameter tuning, bias checks, stability checks, and accuracy assessments.
• Prepare model documentation, validation summaries, and stakeholder-ready insights.
• Work with platform and engineering teams to support scoring pipeline design and deployment, ensuring reproducibility across Dev / QA / Prod.
• Collaborate with compliance and platform teams to ensure adherence to governance requirements.
• Evaluate model performance across population segments and time periods.
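To make the reproduce-and-benchmark responsibilities above concrete, here is a minimal PySpark sketch of retraining a model on Databricks with Spark MLlib and checking parity against a reference model. The table name, column names, segment column, and baseline AUC are hypothetical placeholders, not details from this role.

```python
# Minimal sketch: retrain on Databricks with Spark MLlib, then benchmark
# against a reference model. All table, column, and threshold values are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Customer-provided dataset (hypothetical table name).
df = spark.table("main.models.features_table")
train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)

# Assemble the (assumed numeric) engineered features into one vector column.
feature_cols = [c for c in df.columns if c not in ("customer_id", "segment", "label")]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")

model = GBTClassifier(labelCol="label", featuresCol="features").fit(
    assembler.transform(train_df)
)

# Benchmark against the reference model: overall AUC within a tolerance.
evaluator = BinaryClassificationEvaluator(labelCol="label", metricName="areaUnderROC")
scored = model.transform(assembler.transform(test_df))
auc = evaluator.evaluate(scored)

BASELINE_AUC = 0.84  # hypothetical reference-model score
assert auc >= BASELINE_AUC - 0.01, f"Parity check failed: {auc:.4f} vs {BASELINE_AUC}"

# Stability check: evaluate across population segments (hypothetical column).
for row in scored.select("segment").distinct().collect():
    seg = row["segment"]
    seg_auc = evaluator.evaluate(scored.filter(scored["segment"] == seg))
    print(f"segment={seg}: AUC={seg_auc:.4f}")
```

In practice the parity metric and tolerance would come from the established modeling framework the role references, not a hard-coded constant.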
Qualifications
• 4–6 years of experience in applied machine learning or data science.
• Strong hands-on experience with Python and ML libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, or similar.
• Experience developing ML models in Databricks using Python or PySpark.
• Strong knowledge of feature engineering, model training workflows, and evaluation techniques.
• Experience working with large structured datasets (financial or transactional data preferred).
• Ability to write clear documentation and communicate technical results to non-technical stakeholders.
• 4+ years of hands-on experience developing, deploying, and maintaining ML models.
• Advanced proficiency in Python (NumPy, pandas, scikit-learn, PyTorch or TensorFlow).
• Strong statistical and mathematical foundation, including regression, classification, probability, and optimization.
• Experience building end-to-end ML pipelines: data ingestion, cleaning, feature engineering, modeling, evaluation, and deployment (a minimal sketch follows this list).
• Experience working within client environments, including adapting to unfamiliar infrastructure, constraints, and security requirements.
• Experience with cloud platforms (AWS, Azure, or GCP) and on-prem environments.
• Advanced SQL skills and experience with big-data tools (Spark, Databricks, Hadoop).
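As a hedged illustration of the end-to-end pipeline qualification above, here is a minimal scikit-learn sketch running from ingestion to a persisted scoring artifact. The file path, column names, and label are hypothetical placeholders.

```python
# Minimal end-to-end sketch: ingestion -> cleaning -> feature engineering ->
# modeling -> evaluation -> deployable artifact. All file and column names
# are hypothetical placeholders.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Ingestion (hypothetical path) and basic cleaning.
df = pd.read_csv("transactions.csv").drop_duplicates()

numeric = ["amount", "account_age_days"]        # hypothetical columns
categorical = ["merchant_category", "channel"]  # hypothetical columns

# Feature engineering: impute and scale numerics, one-hot encode categoricals.
features = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipeline = Pipeline([
    ("features", features),
    ("model", GradientBoostingClassifier(random_state=42)),
])

X, y = df[numeric + categorical], df["is_fraud"]  # hypothetical label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline.fit(X_train, y_train)
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
print(f"Holdout AUC: {auc:.4f}")

# "Deployment": persist the fitted pipeline as a single scoring artifact.
joblib.dump(pipeline, "scoring_pipeline.joblib")
```

Keeping preprocessing and model in one fitted pipeline means the same feature logic is applied at training and scoring time, which is the reproducibility property the role emphasizes.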