

Soho Square Solutions
Senior Data Engineer
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Data Engineer on a contract of unspecified length, with a listed day rate of $680 (USD). The position requires 7+ years of data engineering experience, expertise in Databricks and Medallion Architecture, and familiarity with enterprise systems such as SAP and Salesforce.
Country
United States
Currency
$ USD
Day rate
680
Date
June 3, 2026
Duration
Unknown
Location
Unknown
Contract
Unknown
Security
Unknown
Location detailed
United States
Skills detailed
#AWS (Amazon Web Services) #Data Lineage #Security #Databricks #SQL (Structured Query Language) #Spark (Apache Spark) #PySpark #Compliance #Metadata #ETL (Extract, Transform, Load) #Monitoring #SAP #Python #Data Pipeline #Scala #Data Quality #Azure #Cloud #Datasets #Data Engineering #Libraries
Role description
We are seeking a Senior Data Engineer with deep expertise in the Databricks ecosystem and Medallion Architecture to lead a critical regulatory inspection readiness data initiative. In this role, you will own the end-to-end design and implementation of scalable data pipelines built to ingest, parse, and transform vast volumes of unstructured quality and operational documents (PDFs, Word files, images, Excel sheets) into business-ready, structured Gold-layer datasets. The ideal candidate brings a proven track record of handling unstructured data pipelines natively within Databricks and has experience operating within enterprise environments utilizing SAP, Salesforce, or TrackWise.
Core Responsibilities:
• Pipeline Architecture: Architect, build, and maintain production-grade data pipelines utilizing Databricks and Medallion Architecture (Bronze -> Silver -> Gold).
• Unstructured Data Engineering: Design robust frameworks to ingest and transform unstructured data formats (PDFs, images, Word docs, text logs) from enterprise source systems into structured, query-ready Gold-layer assets.
• Regulatory Data Curation: Partner with Quality and Compliance teams to model data specifically optimized for rapid audit retrieval and regulatory inspection readiness.
• Framework Development: Build reusable data quality validation frameworks, monitoring rules, and error-handling mechanisms across all pipeline stages.
• Platform Governance: Leverage Databricks features (Unity Catalog, Workflows) to ensure data lineage, security compliance, and access control across dev and prod workspaces.
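As a rough illustration of the Bronze -> Silver -> Gold flow and the validation and error-handling pattern described in the responsibilities above, here is a minimal plain-Python sketch. It assumes documents have already been parsed into records; the field names (`doc_id`, `source_system`, `doc_type`), the function names, and the sample values are hypothetical, and in-memory lists stand in for the tables a real Databricks pipeline would use.

```python
from collections import Counter

# Hypothetical required fields for a parsed document record (assumption).
REQUIRED_FIELDS = ("doc_id", "source_system", "doc_type")


def to_bronze(raw_rows):
    """Bronze: keep every raw row as-is, adding only an ingestion sequence number."""
    return [{**row, "_ingest_seq": i} for i, row in enumerate(raw_rows)]


def quality_errors(row):
    """Data quality rule: list every required field that is missing or blank."""
    return [f"missing {name}" for name in REQUIRED_FIELDS
            if not str(row.get(name) or "").strip()]


def to_silver(bronze_rows):
    """Silver: validate, normalize, and de-duplicate; route failures to quarantine."""
    silver, quarantine, seen = [], [], set()
    for row in bronze_rows:
        errors = quality_errors(row)
        if errors:
            # Failed rows are kept with their errors instead of being dropped silently.
            quarantine.append({**row, "_errors": errors})
            continue
        clean = {
            "doc_id": str(row["doc_id"]).strip(),
            "source_system": str(row["source_system"]).strip().upper(),
            "doc_type": str(row["doc_type"]).strip().lower(),
        }
        if clean["doc_id"] in seen:  # keep only the first occurrence of each document
            continue
        seen.add(clean["doc_id"])
        silver.append(clean)
    return silver, quarantine


def to_gold(silver_rows):
    """Gold: a business-ready summary, here document counts per source and type."""
    return dict(Counter((r["source_system"], r["doc_type"]) for r in silver_rows))


# Hypothetical sample input: one duplicate and one row that fails validation.
raw = [
    {"doc_id": "D1", "source_system": "sap", "doc_type": "Deviation"},
    {"doc_id": "D2", "source_system": "trackwise", "doc_type": "CAPA "},
    {"doc_id": "D1", "source_system": "sap", "doc_type": "deviation"},
    {"doc_id": "D3", "source_system": "", "doc_type": "CAPA"},
]
silver, quarantine = to_silver(to_bronze(raw))
gold = to_gold(silver)
print(gold)
```

Routing invalid rows to a quarantine list rather than discarding them keeps failures auditable, which matters when the Gold layer is meant to support inspection readiness.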
Qualifications:
Required:
• 7+ years of hands-on data engineering experience, with a heavy focus on Python, SQL, and PySpark.
• 3+ years of production experience designing and deploying Medallion Architecture frameworks natively inside Databricks.
• Demonstrated, real-world experience building extraction and parsing pipelines for unstructured data (extracting text/metadata from PDFs, images, docs).
• Proven ability to build highly reliable data transformation frameworks from the ground up.
Preferred:
• Technical familiarity or integration experience with enterprise systems: TrackWise, SAP, and Salesforce.
• Experience working within highly regulated industries (Life Sciences, Pharma, Biotech, or Medical Devices) under GxP or strict compliance standards.
• Experience with document parsing libraries or cloud OCR tools (e.g., Azure Document Intelligence, AWS Textract, Unstructured.io).






