PL/SQL PySpark Developer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior Data Engineer (PL/SQL PySpark Developer) on a contract of unspecified length and pay rate. Located in Houston, TX (hybrid), it requires 10+ years of back-end development experience, expert-level PL/SQL, and 7+ years of PySpark experience.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
August 28, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
Hybrid
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Houston, TX
🧠 - Skills detailed
#Schema Design #Scala #Data Warehouse #Delta Lake #Debugging #Data Quality #Airflow #Azure #Data Lake #SQL (Structured Query Language) #Migration #Data Management #ETL (Extract, Transform, Load) #Spark (Apache Spark) #Azure cloud #Quality Assurance #Leadership #Oracle #PySpark #Databricks #Data Integration #Documentation #Data Modeling #MS SQL (Microsoft SQL Server) #Cloud #Data Engineering #EDW (Enterprise Data Warehouse) #GCP (Google Cloud Platform) #Base #AWS (Amazon Web Services) #SQL Server #BI (Business Intelligence) #Apache Airflow #Data Pipeline #Data Processing
Role description
We are seeking a Senior Data Engineer to lead a critical pilot project focused on modernizing our enterprise customer data consolidation from SQL Server to our Databricks-based data lake. This role combines traditional Oracle/PL/SQL expertise with modern PySpark development to support our East region initiative across three billing platforms.

## Key Responsibilities

**Primary Project (Enterprise Customer Table Pilot)**
• Design and implement data consolidation solutions moving from SQL Server to the Databricks data lake
• Work with business stakeholders and cross-functional teams to define enterprise customer table specifications
• Determine the optimal approach for data processing, either within existing Oracle systems or in the data lake environment
• Collaborate with the enterprise data lake team to leverage existing PySpark resources and infrastructure
• Produce modified data inputs for the new enterprise customer table consolidation process
• Ensure data quality and consistency across three different billing platform feeds

**Secondary Pilot (Code Conversion)**
• Convert existing Oracle/PL/SQL code to PySpark for data lake processing (a conversion sketch appears after this section)
• Evaluate the feasibility of migrating current data warehouse operations to PySpark
• Provide a proof of concept for future large-scale migration initiatives
• Test and validate converted code performance in the data lake environment

**Team Development & Knowledge Transfer**
• Train and mentor existing PL/SQL team members on PySpark technologies
• Work independently with minimal supervision while collaborating effectively with stakeholders
• Provide technical leadership and architectural guidance for data processing solutions
• Document best practices and create a knowledge base for future PySpark implementations

## Required Technical Skills

**Core Requirements**
• **10+ years** of back-end development experience
• **Expert-level PL/SQL and Oracle** database development
• **7+ years of PySpark** experience with data lake implementations
• Strong experience with the **Databricks** platform
• Proficiency in **data modeling and schema design**
• Experience with **data pipeline development**
• Custom data warehouse development experience

**Preferred Technologies**
• **Delta Lake** experience
• **Apache Airflow** for job scheduling and pipeline orchestration (an orchestration sketch appears at the end of this description)
• **GCP (Google Cloud Platform)** - our primary cloud environment
• **AWS or Azure** cloud experience (transferable)
• Data warehouse and ETL/ELT processes
• Experience with enterprise-scale data integration projects

## Technical Environment
• **Current Stack**: Oracle-based custom data warehouse, PL/SQL processing
• **Target Stack**: Databricks data lake, PySpark, Delta Lake, GCP
• **Integration Points**: Three separate billing systems, SQL Server consolidation layer
• **Data Volume**: Enterprise-scale customer data across multiple regions

## Business Context & Domain Knowledge
• Support for multiple regions: East, Texas (largest), and Panera
• Integration challenges across three separate billing systems with different data formats
• Enterprise-level customer data consolidation and reporting requirements
• Migration from the legacy SQL Server data warehouse to a modern data lake architecture
• Focus on sales reporting and sales count reporting
• Experience with acquired-company data integration challenges preferred
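To make the code-conversion pilot concrete, here is a minimal sketch of the kind of Oracle/PL/SQL-to-PySpark translation involved, assuming a Databricks environment with Delta Lake. It is illustrative only: the table and column names (`billing_east`, `customer_id`, `account_id`, `billed_amount`, `enterprise_customer`) are hypothetical placeholders, not the actual schema.

```python
# Minimal sketch: a PL/SQL-style per-customer aggregation rewritten in
# PySpark. All table and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("plsql-conversion-poc").getOrCreate()

# In PL/SQL this might be a cursor loop with a MERGE into a target table;
# in PySpark it collapses into one declarative aggregation.
billing = spark.read.table("billing_east")  # hypothetical source table

consolidated = (
    billing
    .groupBy("customer_id")
    .agg(
        F.sum("billed_amount").alias("total_billed"),
        F.countDistinct("account_id").alias("account_count"),
    )
)

# Write out as a Delta table. Overwrite keeps the sketch simple; an
# incremental pipeline would more likely use a Delta MERGE.
(consolidated.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("enterprise_customer"))
```

A job of this shape is also a natural unit for the performance validation the pilot calls for, since the same aggregation can be timed in the data lake and compared against the original PL/SQL run.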
## Required Competencies

**Technical Leadership**
• Ability to analyze existing systems and recommend architectural improvements
• Experience designing scalable data processing solutions
• Strong debugging and troubleshooting skills across multiple platforms
• Code review and quality assurance capabilities

**Business Acumen**
• Understanding of enterprise data warehouse concepts
• Experience with customer data management and consolidation
• Knowledge of sales reporting and business intelligence requirements
• Familiarity with multi-system integration challenges

**Communication & Collaboration**
• Excellent stakeholder management skills
• Ability to translate technical concepts for business users
• Experience working with cross-functional teams
• Strong documentation and knowledge-sharing abilities

## Work Arrangement
• **Hybrid role**: 3 days on-site (Monday, Tuesday, Thursday) in Houston, TX
• 2 days remote work
• Candidates willing to relocate to Houston will be considered
• **Office Location**: Houston, Texas (specific location to be provided)
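Because Apache Airflow is listed among the preferred technologies for scheduling and orchestration, the sketch below shows how the three billing-platform feeds and the consolidation step might be wired into a DAG. The DAG id, task names, and callables are hypothetical; on Databricks, the DatabricksSubmitRunOperator from Airflow's Databricks provider would usually replace the PythonOperator placeholders.

```python
# Hypothetical Airflow DAG: three billing-feed extracts fan in to one
# consolidation task. All ids and callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_billing_feed(platform: str) -> None:
    # Placeholder: land one billing platform's feed in the data lake.
    print(f"extracting feed for {platform}")


def build_enterprise_customer_table() -> None:
    # Placeholder: run the PySpark consolidation job.
    print("building enterprise customer table")


with DAG(
    dag_id="enterprise_customer_consolidation",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extracts = [
        PythonOperator(
            task_id=f"extract_{platform}",
            python_callable=extract_billing_feed,
            op_kwargs={"platform": platform},
        )
        for platform in ("platform_a", "platform_b", "platform_c")
    ]
    consolidate = PythonOperator(
        task_id="build_enterprise_customer_table",
        python_callable=build_enterprise_customer_table,
    )
    # All three extracts must finish before consolidation runs.
    extracts >> consolidate
```

Fanning three extract tasks into a single consolidation step mirrors the integration point described above: three separate billing systems feeding one enterprise customer table.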