Lead Data Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Lead Data Engineer with a contract length of "unknown" and a pay rate of "$X per hour." Required skills include Databricks, Apache Kafka, AWS services, and Python. A Bachelor's degree and 7+ years of data engineering experience are essential.
🌎 - Country
United States
πŸ’± - Currency
$ USD
πŸ’° - Day rate
Unknown
πŸ—“οΈ - Date discovered
August 21, 2025
πŸ•’ - Project duration
Unknown
🏝️ - Location type
Unknown
πŸ“„ - Contract type
Unknown
πŸ”’ - Security clearance
Unknown
πŸ“ - Location detailed
Dallas, TX
🧠 - Skills detailed
#IAM (Identity and Access Management) #Code Reviews #AutoScaling #Storage #Data Engineering #Apache Kafka #ETL (Extract, Transform, Load) #GCP (Google Cloud Platform) #Leadership #Data Quality #Batch #Data Ingestion #Lambda (AWS Lambda) #Data Science #SQL (Structured Query Language) #AWS (Amazon Web Services) #Python #Data Processing #PySpark #VPC (Virtual Private Cloud) #Data Lake #Databricks #Cloud #S3 (Amazon Simple Storage Service) #Deployment #Security #Spark (Apache Spark) #Kafka (Apache Kafka) #Azure #Compliance #Data Catalog #EC2 #Scala #Data Architecture #Spark SQL #Data Pipeline #Data Governance #Computer Science #Delta Lake
Role description
Job Summary:
As a Databricks Lead, you will be a critical member of our data engineering team, responsible for designing, developing, and optimizing our data pipelines and platforms on Databricks, primarily leveraging AWS services. You will play a key role in implementing robust data governance with Unity Catalog and ensuring cost-effective data solutions. This role requires a strong technical leader who can mentor junior engineers, drive best practices, and contribute hands-on to complex data challenges.

Responsibilities:

Databricks Platform Leadership:
• Lead the design, development, and deployment of large-scale data solutions on the Databricks platform.
• Establish and enforce best practices for Databricks usage, including notebook development, job orchestration, and cluster management.
• Stay abreast of the latest Databricks features and capabilities, recommending and implementing improvements.

Data Ingestion and Streaming (Kafka):
• Architect and implement real-time and batch data ingestion pipelines using Apache Kafka for high-volume data streams.
• Integrate Kafka with Databricks for seamless data processing and analysis (see the sketch after this list).
• Optimize Kafka consumers and producers for performance and reliability.

Data Governance and Management (Unity Catalog):
• Implement and manage data governance policies and access controls using Databricks Unity Catalog (a related sketch appears at the end of this description).
• Define and enforce data cataloging, lineage, and security standards within the Databricks Lakehouse.
• Collaborate with data governance teams to ensure compliance and data quality.

AWS Cloud Integration:
• Leverage various AWS services (S3, EC2, Lambda, Glue, etc.) to build a robust and scalable data infrastructure.
• Manage and optimize AWS resources for Databricks workloads.
• Ensure secure and compliant integration between Databricks and AWS.

Cost Optimization:
• Proactively identify and implement strategies for cost optimization across Databricks and AWS resources.
• Monitor DBU consumption, cluster utilization, and storage costs, providing recommendations for efficiency gains.
• Implement autoscaling, auto-termination, and right-sizing strategies to minimize operational expenses.

Technical Leadership & Mentoring:
• Provide technical guidance and mentorship to a team of data engineers.
• Conduct code reviews, promote coding standards, and foster a culture of continuous improvement.
• Lead technical discussions and decision-making for complex data engineering problems.

Data Pipeline Development & Optimization:
• Develop, test, and maintain robust and efficient ETL/ELT pipelines using PySpark/Spark SQL.
• Optimize Spark jobs for performance, scalability, and resource utilization.
• Troubleshoot and resolve complex data pipeline issues.

Collaboration:
• Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.
• Communicate technical concepts effectively to both technical and non-technical stakeholders.
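To make the Kafka-to-Databricks integration above concrete, here is a minimal PySpark Structured Streaming sketch of the kind of ingestion pipeline this role owns. The broker address, topic name, and S3 paths are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch: streaming Kafka ingestion into a Delta table on Databricks.
# The Kafka source ships with the Databricks runtime; broker, topic, and
# storage paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1.example.com:9092")  # hypothetical broker
    .option("subscribe", "orders-events")                            # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to strings before downstream parsing.
events = raw.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"),
)

# Land the stream in Delta; the checkpoint makes the pipeline restartable.
query = (
    events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders-events")  # hypothetical
    .start("s3://example-bucket/delta/orders_events")                               # hypothetical
)
```

In practice this would run as a scheduled Databricks job writing to Unity Catalog-managed tables rather than raw S3 paths; the sketch only shows the core read/cast/write shape.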
β€’ Solid understanding of AWS cloud services and their application in data architectures (S3, EC2, Lambda, VPC, IAM, etc.). β€’ Demonstrated ability to optimize cloud resource usage and implement cost-saving strategies. β€’ Proficiency in Python and Spark (PySpark/Spark SQL) for data processing and analysis. β€’ Experience with Delta Lake and other modern data lake formats. β€’ Excellent problem-solving, analytical, and communication skills. Added Advantage (Bonus Skills): β€’ Experience with Apache Flink for stream processing. β€’ Databricks certifications. β€’ Experience with CI/CD pipelines for Databricks deployments. β€’ Knowledge of other cloud platforms (Azure, GCP) is a plus.