

Rezolve Ai
Performance Test Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Performance Test Data Engineer on a contract of unknown length, paying $480 per day. Key skills include strong data QA/testing, SQL, Python, and experience with AWS or Azure data lakes.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
480
🗓️ - Date
March 22, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Roseville, CA
🧠 - Skills detailed
#Apache Spark #PySpark #Athena #Data Reconciliation #Data Warehouse #Spark (Apache Spark) #Azure ADLS (Azure Data Lake Storage) #Airflow #Kafka (Apache Kafka) #SQL (Structured Query Language) #Storage #Delta Lake #ETL (Extract, Transform, Load) #HDFS (Hadoop Distributed File System) #Data Lake #Pytest #GIT #Batch #Data Modeling #AWS (Amazon Web Services) #Automation #Data Processing #Data Engineering #ADLS (Azure Data Lake Storage) #Data Quality #Data Pipeline #Data Ingestion #Databricks #AWS S3 (Amazon Simple Storage Service) #S3 (Amazon Simple Storage Service) #ADF (Azure Data Factory) #Python #Cloud #Data Accuracy #Data Governance #Programming #Datasets #Azure
Role description
Job Description: Data Platform Engineer (QA + Storage Focus)
Role Overview
We are looking for a Data Platform Engineer with strong QA and Data Validation experience to support large-scale data platforms. The ideal candidate will have hands-on experience in testing data pipelines, validating data lakes/storage systems, and ensuring data quality, accuracy, and performance across distributed environments.
Key Responsibilities
• Design, develop, and execute data validation and QA test strategies for ETL/ELT pipelines
• Perform end-to-end data validation between source systems and target data platforms (Data Lake / Data Warehouse)
• Validate large-scale datasets (millions/billions of records) using SQL, Python, and PySpark
• Perform file-level and storage validation across data lakes (S3 / ADLS / HDFS), including:
  • File count validation
  • Schema validation
  • Partition validation
  • Data completeness checks
• Test and validate data ingestion pipelines (batch & streaming)
• Validate data across Bronze / Silver / Gold layers (Medallion architecture)
• Perform data reconciliation and consistency checks across multiple systems
• Develop and maintain automated data validation frameworks using Python (PyTest or similar)
• Implement and monitor data quality checks (nulls, duplicates, referential integrity); a PyTest sketch of these checks follows this list
• Validate data formats such as Parquet, ORC, Delta Lake
• Conduct performance testing of data pipelines and queries (Spark / SQL)
• Analyze and validate data processing performance, latency, and throughput
• Collaborate with Data Engineers to identify and fix data issues and optimize pipelines
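The reconciliation and data quality bullets above translate directly into a small automated suite. A minimal PyTest + PySpark sketch, assuming source and target Parquet layers and a single-column business key; SRC_PATH, TGT_PATH, and KEY_COLS are hypothetical placeholders, not values from this posting:

```python
# Minimal sketch of the reconciliation and quality checks described above.
# All paths and column names are hypothetical placeholders.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

SRC_PATH = "/data/bronze/orders"  # hypothetical source layer
TGT_PATH = "/data/silver/orders"  # hypothetical target layer
KEY_COLS = ["order_id"]           # assumed business key

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.appName("dq-checks").getOrCreate()

def test_row_count_reconciliation(spark):
    # Source-to-target completeness: row counts must match after the load.
    src = spark.read.parquet(SRC_PATH)
    tgt = spark.read.parquet(TGT_PATH)
    assert src.count() == tgt.count()

def test_no_null_keys(spark):
    # Null check on the business key.
    tgt = spark.read.parquet(TGT_PATH)
    assert tgt.filter(F.col(KEY_COLS[0]).isNull()).count() == 0

def test_no_duplicate_keys(spark):
    # Duplicate check: the business key must be unique in the target.
    tgt = spark.read.parquet(TGT_PATH)
    dupes = tgt.groupBy(*KEY_COLS).count().filter(F.col("count") > 1)
    assert dupes.count() == 0
```

Run with `pytest -v`; in a real framework the paths and keys would come from configuration rather than module-level constants.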
Required Skills
Data QA / Testing
• Strong experience in ETL/ELT testing and data validation
• Expertise in SQL for data validation and reconciliation
• Experience with test case design, execution, and defect tracking
• Knowledge of data quality frameworks and validation techniques
Data Engineering Knowledge
• Understanding of data pipelines (ADF / Airflow / Glue / Databricks)
• Experience with PySpark / Apache Spark (basic to intermediate)
• Familiarity with data modeling and transformations
Storage / Data Lake Validation (MANDATORY)
• Hands-on experience with Data Lakes (AWS S3 / Azure ADLS / HDFS)
• Strong knowledge of:
  • File-based validation
  • Partitioning strategies
  • Schema evolution
• Experience validating Parquet / ORC / Delta Lake datasets (a short sketch follows)
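A minimal sketch of those file-level checks against a hive-partitioned Parquet dataset on S3, using PyArrow; the bucket, prefix, region, partition column, and expected schema below are all hypothetical, and AWS credentials are assumed to be configured:

```python
# Sketch of file count, schema, and partition validation on a data lake path.
# Bucket/prefix, region, and the expected schema are hypothetical.
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import fs

s3 = fs.S3FileSystem(region="us-west-2")   # assumed region
root = "my-bucket/silver/orders"           # hypothetical lake path

dataset = ds.dataset(root, filesystem=s3, format="parquet", partitioning="hive")

# File count validation: compare against the count the pipeline reports.
files = dataset.files
assert len(files) > 0, "no data files found under the target prefix"

# Schema validation: every expected column must exist with the expected type.
expected = pa.schema([("order_id", pa.int64()), ("order_date", pa.date32())])
for field in expected:
    assert dataset.schema.field(field.name).type == field.type

# Partition validation: list the distinct partition values seen on disk.
partitions = {p.split("order_date=")[1].split("/")[0]
              for p in files if "order_date=" in p}
print(f"{len(files)} files across {len(partitions)} order_date partitions")
```

The same pattern extends to HDFS (pyarrow.fs.HadoopFileSystem) or ADLS (via an fsspec-compatible filesystem such as adlfs) by swapping the filesystem object.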
Programming & Tools
• Python (for automation/testing)
• SQL (strong)
• Experience with PyTest / automation frameworks
• Git / CI/CD basics
Cloud Platforms (Any One)
• AWS (S3, Glue, Athena) OR
• Azure (ADLS, ADF, Databricks)
Nice to Have
• Experience with Great Expectations / Deequ (data quality tools)
• Knowledge of Kafka / streaming validation
• Experience with Delta Lake features such as time travel and versioning (a short sketch follows this list)
• Exposure to data governance tools (Glue Catalog, Unity Catalog)
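As an illustration of the time-travel feature above, a minimal sketch comparing the two most recent versions of a Delta table; it assumes the delta-spark package is installed, and the table path is a hypothetical placeholder:

```python
# Sketch of version-over-version validation using Delta Lake time travel.
# Requires the delta-spark package; the table path is hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-time-travel")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    .getOrCreate()
)

path = "/data/silver/orders"  # hypothetical Delta table

# The last two commits to the table, newest first (assumes >= 2 versions).
history = DeltaTable.forPath(spark, path).history(2).collect()
current_v, previous_v = history[0]["version"], history[1]["version"]

current = spark.read.format("delta").option("versionAsOf", current_v).load(path)
previous = spark.read.format("delta").option("versionAsOf", previous_v).load(path)

# On an append-only table, a newer version should never have fewer rows.
assert current.count() >= previous.count()
```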
Ideal Candidate Profile
• Strong Data Engineer with QA/testing experience
• Hands-on with data validation + storage systems
• Comfortable working with large-scale distributed data platforms
• Detail-oriented with a focus on data accuracy, quality, and performance