Jobs via Dice

Principal/Lead Data Engineer Contract W2

⭐ - Featured Role | Apply direct with Data Freelance Hub

This role is for a Lead/Principal Data Engineer on a long-term W2 contract in Dallas, TX. It requires 15+ years of data engineering experience and expertise in Databricks, Scala, and Apache Spark, along with knowledge of Medallion architecture and AWS integrations.

🌎 - Country

United States

💱 - Currency

$ USD

💰 - Day rate

Unknown

🗓️ - Date

June 3, 2026

🕒 - Duration

Unknown

🏝️ - Location

On-site

📄 - Contract

W2 Contractor

🔒 - Security

Unknown

📍 - Location detailed

Dallas, TX

🧠 - Skills detailed

#Leadership #AWS (Amazon Web Services) #Data Architecture #Data Lineage #Security #Databricks #Spark (Apache Spark) #S3 (Amazon Simple Storage Service) #PySpark #IAM (Identity and Access Management) #Data Processing #Classification #Metadata #Airflow #Monitoring #REST (Representational State Transfer) #Python #Apache Spark #Data Quality #Automated Testing #Scala #Cloud #Kafka (Apache Kafka) #Data Science #DevOps #VPC (Virtual Private Cloud) #Data Engineering #Data Governance

Role description

Dice is the leading career destination for tech experts at every stage of their careers. Our client, ConnectedX, Inc., is seeking the following. Apply via Dice today!

Role: Lead / Principal Data Engineer
Duration: Long-Term W2 Contract
Location: Dallas, TX - Onsite, Local Candidates Only

Position summary

We are seeking an experienced Lead or Principal Data Engineer to join a long-term W2 contract engagement based in Dallas, TX. This is an onsite role for local candidates who can provide hands-on technical leadership and own the design, implementation, and operational excellence of large-scale data platforms. The ideal candidate has deep experience with Databricks and Scala, strong mastery of Spark performance tuning, and a proven track record building metadata-driven, governable data architectures (Medallion architecture preferred) that balance scalability and cost.

Key responsibilities

Architect and lead implementation of a Medallion data architecture that optimizes for scalability, performance, maintainability, and cost-efficiency on Databricks.
Design and implement efficient ingestion pipelines, including handling sparse column ingestion patterns and change-data-capture (CDC) scenarios and edge cases.
Lead Spark and Databricks performance optimization: analyze job profiles, optimize joins, shuffles, partitioning, caching, and resource configurations to reduce latency and cost.
Build metadata-driven frameworks for pipeline orchestration, schema evolution, data quality checks, and automated recovery from failures.
Implement and enforce data governance using Unity Catalog and other governance tools: access controls, lineage, classification, and auditability.
Design resilient distributed systems with automated failure detection and recovery strategies; investigate and remediate distributed system failures and stability issues.
Implement cross-account AWS integrations securely and reliably (S3, IAM roles, KMS, VPC endpoints, Glue/Glue Catalog interoperability where applicable).
Collaborate with data scientists, analytics, DevOps, and security teams to translate business requirements into performant data solutions and SLAs.
Mentor engineers, conduct code and architecture reviews, and set best practices for Scala, Spark, and Databricks development.
Create runbooks, monitoring dashboards, and operational playbooks to support 24x7 production reliability and incident response.

Required Qualifications

15+ years of hands-on data engineering experience; 5+ years in a lead or principal role designing and operating production data platforms.
Extensive experience with Databricks and Apache Spark, including production job tuning, cluster sizing, and cost optimization.
Strong proficiency in Scala for data processing; experience with Python/PySpark is a plus.
Deep understanding of Medallion architecture patterns (bronze/silver/gold layers) and how to implement them in cloud data platforms.
Proven experience handling sparse column ingestion issues, schema drift, and CDC edge cases (Debezium/Kafka or vendor CDC solutions experience is a plus).
Experience building metadata-driven frameworks for schema management, pipeline orchestration (Airflow, Databricks Jobs, or similar), and automated testing.
Solid knowledge of data governance and security: Unity Catalog, IAM, RBAC, encryption at rest/in transit, and data lineage.
Strong AWS experience: S3 lifecycle policies, cross-account access, IAM role assumptions, KMS, VPC endpoints, and Glue/Glue Catalog integration.
Demonstrated ability to design for distributed system resiliency and troubleshoot complex failures across clusters and networks.
Excellent communication skills; experience working directly with stakeholders and leading technical discussions.

