

Centraprise
Senior Databricks Developer – PySpark & Delta Lake
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior Databricks Developer – PySpark & Delta Lake in Princeton, NJ (Hybrid, 2 days). Requires 5–8 years of experience in data engineering, strong PySpark and Databricks expertise, and knowledge of AWS. Databricks certification is a plus.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
440
-
🗓️ - Date
May 15, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Princeton, NJ
-
🧠 - Skills detailed
#Monitoring #Databricks #Airflow #SQL (Structured Query Language) #Delta Lake #Deployment #Data Engineering #Spark SQL #Automation #Data Quality #Data Governance #Observability #AWS (Amazon Web Services) #Cloud #ETL (Extract, Transform, Load) #Apache Airflow #PySpark #Scala #Spark (Apache Spark) #Data Framework #Data Pipeline #GIT #Logging #Data Lineage
Role description
Senior Databricks Developer – PySpark & Delta Lake
Location: Princeton, NJ (Hybrid, 2 days)
Experience Required: 5–8 Years
Job Summary:
We are looking for an experienced and technically strong Senior Databricks Developer with 5–8 years of
experience in PySpark and Databricks. This role involves leading the design and development of scalable
data pipelines, reusable accelerator components, and performance-optimized Spark workloads. The
candidate will work closely with architects and platform teams and take ownership of delivering
production-grade data solutions.
Key Responsibilities:
● Lead the design and development of scalable, reusable data pipelines and accelerator
frameworks using PySpark and Databricks.
● Design and execute testing strategies for both structured and unstructured data (PDFs, text files, and RTF documents) to ensure high-fidelity transformation into structured formats within the Data Lakehouse.
● Validate all Source-to-Target (S2T) mappings across bronze, silver, and gold layers to ensure data lineage and integrity (a minimal validation sketch follows this list).
● Collaborate with architects and stakeholders to translate business and technical requirements
into robust data solutions.
● Own end-to-end development, testing, and deployment of data pipelines across multiple
environments.
● Drive Spark performance optimization, cost efficiency, and best practices across Databricks
workloads.
● Design and manage workflow orchestration using Databricks Workflows and Apache Airflow.
● Leverage strong PySpark/Spark SQL knowledge to create mock data for exhaustive edge-case testing.
● Review code, mentor junior developers, and enforce engineering best practices.
● Monitor, troubleshoot, and resolve complex development issues to ensure SLA compliance and data reliability.
● Contribute to CI/CD pipeline design and automation for data engineering workflows.
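To illustrate the S2T validation responsibility above, here is a minimal PySpark sketch of a bronze-to-silver check. The table and column names (bronze.raw_documents, silver.parsed_documents, record_id) are hypothetical placeholders; a real suite would also validate column-level mappings and the silver-to-gold hop.

# Minimal bronze -> silver source-to-target (S2T) check.
# All table and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze.raw_documents")
silver = spark.read.table("silver.parsed_documents")

# 1. Row-count reconciliation: every bronze record should land in silver
#    exactly once (assuming no dedup step between the layers).
bronze_count, silver_count = bronze.count(), silver.count()
assert bronze_count == silver_count, (
    f"Row-count mismatch: bronze={bronze_count}, silver={silver_count}"
)

# 2. Key-level lineage: no silver record without a bronze source.
orphans = silver.join(bronze, on="record_id", how="left_anti")
assert orphans.count() == 0, "Silver rows found with no bronze lineage"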
Required Skills & Experience:
● 5–8 years of experience in data engineering with strong expertise in PySpark and distributed data
processing.
● Extensive hands-on experience with Databricks (Notebooks, Jobs, Workflows, Delta Lake).
● Strong command of Spark SQL, advanced SQL, performance tuning, and large-scale data transformations (a small tuning sketch follows this list).
● Proven experience designing modular, reusable, and testable data frameworks or accelerators.
● Hands-on experience with workflow orchestration using Databricks Workflows.
● Solid understanding of CI/CD practices and Git-based source control.
● Working knowledge of AWS cloud platform.
● Strong communication skills with experience collaborating across teams.
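To make the performance-tuning requirement concrete, here is a small sketch of two common levers: adaptive query execution and a broadcast join for a small dimension table. The configuration keys are standard Spark settings; the table and column names are illustrative only, not part of this role's actual schema.

# Illustrative tuning sketch; table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Adaptive Query Execution coalesces shuffle partitions at runtime and
# mitigates skewed joins.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

fact = spark.read.table("gold.fact_events")   # large fact table
dim = spark.read.table("gold.dim_patients")   # small lookup table

# Broadcasting the small side turns a shuffle join into a map-side join.
joined = fact.join(F.broadcast(dim), on="patient_id", how="left")
joined.write.mode("overwrite").saveAsTable("gold.events_enriched")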
Preferred Qualifications:
● Experience building enterprise-grade internal accelerators or data platforms.
● Working knowledge of Unity Catalog for data governance, access control, and lineage.
● Hands-on experience with the PDM (Patient Data Model) and OMOP (Observational Medical Outcomes Partnership) common data models.
● Experience with K-anonymity testing and data de-identification validation protocols (a basic check is sketched after this list).
● Exposure to Delta Live Tables or declarative pipeline development patterns.
● Exposure to data quality expectations, such as those in Delta Live Tables.
● Experience with monitoring, logging, and observability of Spark and Databricks workloads.
● Databricks certification (Associate or Professional) is a plus.
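As a sketch of the K-anonymity testing mentioned above: a basic check groups records by their quasi-identifiers and fails if any group is smaller than k. The quasi-identifier columns, the table name, and k = 5 below are placeholders, not a prescription.

# Basic k-anonymity check: every quasi-identifier combination must occur
# at least K times, otherwise records may be re-identifiable.
# Column names, table name, and K are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

K = 5
QUASI_IDENTIFIERS = ["zip3", "birth_year", "gender"]

df = spark.read.table("silver.deidentified_patients")

violations = (
    df.groupBy(*QUASI_IDENTIFIERS)
      .agg(F.count("*").alias("group_size"))
      .filter(F.col("group_size") < K)
)

if violations.count() > 0:
    violations.show(truncate=False)
    raise ValueError(f"Dataset is not {K}-anonymous")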