

Jobs via Dice
Principal/Lead Data Engineer Contract W2
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Lead/Principal Data Engineer on a long-term W2 contract in Dallas, TX. Requires 15+ years of data engineering experience, expertise in Databricks, Scala, and Apache Spark, along with knowledge of Medallion architecture and AWS integrations.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
June 3, 2026
🕒 - Duration
Unknown
🏝️ - Location
On-site
📄 - Contract
W2 Contractor
🔒 - Security
Unknown
📍 - Location detailed
Dallas, TX
🧠 - Skills detailed
#Leadership #AWS (Amazon Web Services) #Data Architecture #Data Lineage #Security #Databricks #Spark (Apache Spark) #S3 (Amazon Simple Storage Service) #PySpark #IAM (Identity and Access Management) #Data Processing #Classification #Metadata #Airflow #Monitoring #REST (Representational State Transfer) #Python #Apache Spark #Data Quality #Automated Testing #Scala #Cloud #Kafka (Apache Kafka) #Data Science #DevOps #VPC (Virtual Private Cloud) #Data Engineering #Data Governance
Role description
Dice is the leading career destination for tech experts at every stage of their careers. Our client, ConnectedX, Inc., is seeking the following. Apply via Dice today!
Role: Lead / Principal Data Engineer
Duration: Long-Term W2 Contract
Location: Dallas, TX (Onsite, Local Candidates Only)
Position summary
We are seeking an experienced Lead or Principal Data Engineer to join a long-term W2 contract engagement based in Dallas, TX.
This is an onsite role for local candidates who can provide hands-on technical leadership and own the design, implementation, and operational excellence of large-scale data platforms.
The ideal candidate has deep experience with Databricks and Scala, strong mastery of Spark performance tuning, and a proven track record of building metadata-driven, governable data architectures (Medallion architecture preferred) that balance scalability and cost.
Key responsibilities
Architect and lead implementation of a Medallion data architecture that optimizes for scalability, performance, maintainability, and cost-efficiency on Databricks.
Design and implement efficient ingestion pipelines, including pipelines that handle sparse-column ingestion patterns, change data capture (CDC) scenarios, and CDC edge cases.
Lead Spark and Databricks performance optimization: analyze job profiles, optimize joins, shuffles, partitioning, caching, and resource configurations to reduce latency and cost.
Build metadata-driven frameworks for pipeline orchestration, schema evolution, data quality checks, and automated recovery from failures.
Implement and enforce data governance using Unity Catalog and other governance tools: access controls, lineage, classification, and auditability.
Design resilient distributed systems with automated failure detection and recovery strategies; investigate and remediate distributed system failures and stability issues.
Implement cross-account AWS integrations securely and reliably (S3, IAM roles, KMS, VPC endpoints, and Glue/Glue Catalog interoperability where applicable).
Collaborate with data scientists, analytics, DevOps, and security teams to translate business requirements into performant data solutions and SLAs.
Mentor engineers, conduct code and architecture reviews, and set best practices for Scala, Spark, and Databricks development.
Create runbooks, monitoring dashboards, and operational playbooks to support 24x7 production reliability and incident response.
Required qualifications
15+ years of hands-on data engineering experience; 5+ years in a lead or principal role designing and operating production data platforms.
Extensive experience with Databricks and Apache Spark, including production job tuning, cluster sizing, and cost optimization.
Strong proficiency in Scala for data processing; experience with Python/PySpark is a plus.
Deep understanding of Medallion architecture patterns (bronze/silver/gold layers) and how to implement them in cloud data platforms.
Proven experience handling sparse-column ingestion issues, schema drift, and CDC edge cases (experience with Debezium/Kafka or vendor CDC solutions is a plus).
Experience building metadata-driven frameworks for schema management, pipeline orchestration (Airflow, Databricks Jobs, or similar), and automated testing.
Solid knowledge of data governance and security: Unity Catalog, IAM, RBAC, encryption at rest/in transit, and data lineage.
Strong AWS experience: S3 lifecycle policies, cross-account access, IAM role assumption, KMS, VPC endpoints, and Glue/Glue Catalog integration.
Demonstrated ability to design for distributed system resiliency and troubleshoot complex failures across clusters and networks.
Excellent communication skills; experience working directly with stakeholders and leading technical discussions.