

Smart IT Frame LLC
Data Engineering - Apache Spark, PySpark, Scala, ETL and SQL Expertise
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is a Data Engineering position focused on Apache Spark, PySpark, Scala, ETL, and SQL expertise, offered as a 6-month contract-to-hire in Pittsburgh (Hybrid). Requires 8 years of experience, financial domain knowledge, and skills in AWS services.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
March 5, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Pittsburgh, PA
-
🧠 - Skills detailed
#PySpark #Data Quality #AWS EMR (Amazon Elastic MapReduce) #Hadoop #Cloud #Data Engineering #Apache Spark #AWS (Amazon Web Services) #Amazon ECS (Amazon Elastic Container Service) #Data Pipeline #EC2 #Python #Spark (Apache Spark) #S3 (Amazon Simple Storage Service) #Microservices #Data Storage #Scala #Data Processing #ETL (Extract, Transform, Load) #HDFS (Hadoop Distributed File System) #Data Access #SQL (Structured Query Language) #Java #Storage
Role description
Job Title: Data Engineering - Apache Spark, PySpark, Scala, ETL and SQL expertise
Location: Pittsburgh, Pennsylvania (Hybrid – 4 days per week on-site)
Employment Type: 6-Month Contract-to-Hire
About Smart IT Frame:
At Smart IT Frame, we connect top talent with leading organizations across the USA. With over a decade of staffing excellence, we specialize in IT, healthcare, and professional roles, empowering both clients and candidates to grow together.
• Senior Developer with 8+ years of experience and advanced hands-on skill in distributed data processing using PySpark and Scala.
• Experience with data processing and ETL transformation logic; strong SQL.
• Prior experience in the financial domain and entity/identity resolution is preferred.
• Design and develop scalable data pipelines using Apache Spark and PySpark, ensuring high performance and efficiency.
• Optimize Spark jobs and clusters for performance tuning, resource utilization, and cost efficiency.
• Implement and manage data storage solutions on the Hadoop Distributed File System (HDFS) and cloud storage services such as Amazon S3.
• Use Scala or Java to write and optimize core data processing logic, complex transformations, and highly performant backend services.
• Ensure strict data quality, governance, and validation checks are integrated into all data pipelines to maintain the accuracy and reliability of the data ecosystem.
• Deploy, manage, and scale data applications and services using various AWS compute resources, including Amazon EC2 instances, AWS EMR clusters, and containerized environments via Amazon ECS.
• Design and develop robust, high-performance RESTful APIs and microservices using Scala/Java to facilitate real-time data access and transactional services.
Mandatory Skills: ETL Concepts, Python, Scala
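As a rough illustration of the ETL transformation and data-quality responsibilities listed above, here is a minimal standard-library Python sketch. The record fields (`account_id`, `amount`) and validation rules are hypothetical examples, not from the posting; in the role itself this kind of logic would run as PySpark or Scala jobs on a Spark cluster rather than in plain Python.

```python
# Illustrative sketch only: validate-then-transform ETL logic of the kind
# the role describes, written with the standard library so it runs without
# a Spark dependency. Field names and rules are hypothetical.
from dataclasses import dataclass


@dataclass
class Txn:
    account_id: str
    amount: float


def validate(records):
    """Data-quality check: drop records with a missing key or negative amount."""
    return [r for r in records if r.account_id and r.amount >= 0]


def transform(records):
    """ETL transform: total amount per account (a groupBy/sum in Spark)."""
    totals = {}
    for r in records:
        totals[r.account_id] = totals.get(r.account_id, 0.0) + r.amount
    return totals


raw = [Txn("a1", 10.0), Txn("a1", 5.0), Txn("", 3.0), Txn("a2", -1.0), Txn("a2", 7.0)]
print(transform(validate(raw)))  # → {'a1': 15.0, 'a2': 7.0}
```

In a real pipeline the same validate/transform split maps naturally onto Spark `filter` and `groupBy().agg()` stages, which keeps data-quality checks explicit and testable in isolation.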






