Senior PySpark Engineer – AWS/EMR

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior PySpark Engineer – AWS/EMR, offering a 6-month contract that is remote (EST time zone preferred) with 5 days a month onsite. Key requirements include 8–10 years of experience, proficiency in PySpark and Databricks, and familiarity with GxP-compliant environments.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
April 23, 2025
🕒 - Project duration
6 months
🏝️ - Location type
Remote
📄 - Contract type
Fixed Term
🔒 - Security clearance
Unknown
📍 - Location detailed
Boston, MA
🧠 - Skills detailed
#Data Engineering #Data Processing #PySpark #ETL (Extract, Transform, Load) #AI (Artificial Intelligence) #ML (Machine Learning) #Databricks #Python #Consulting #Distributed Computing #Scala #Deployment #Programming #Data Integrity #AWS EMR (Amazon Elastic MapReduce) #Spark (Apache Spark) #DevOps #AWS (Amazon Web Services) #Apache Spark #Data Pipeline #Storage #Compliance #Code Reviews #Cloud
Role description

Job Title: Senior PySpark Engineer – AWS/EMR

Location: Remote (EST time zone preferred); 5 days a month in the office

Duration: 6-Month Contract

About BigRio:

BigRio is a remote-based technology consulting firm headquartered in Boston, MA. We deliver software solutions ranging from custom development and software implementation to data analytics and machine learning/AI integrations. As a one-stop shop, we attract clients from a variety of industries thanks to our proven ability to deliver cutting-edge, cost-effective software solutions.

Job Overview:

We are seeking a Senior PySpark Engineer with strong hands-on experience in building distributed data pipelines using Apache Spark on AWS EMR. The ideal candidate is proficient in Python, has worked with Databricks, and has a solid understanding of GxP-compliant environments. This is a coding-heavy role — not DevOps or AWS administration — where you’ll contribute directly to the architecture and development of robust data solutions in a highly regulated, cloud-native environment.

Key Responsibilities:

   • Design, develop, and maintain distributed ETL data pipelines using PySpark on AWS EMR (see the illustrative sketch after this list)

   • Work within a GxP-compliant environment, ensuring data integrity and regulatory alignment

   • Write clean, scalable, and efficient PySpark code for large-scale data processing

   • Utilize AWS cloud services for pipeline orchestration, compute, and storage

   • Collaborate closely with cross-functional teams to deliver end-to-end data solutions

   • Participate in code reviews, testing, and deployment of pipeline components

   • Ensure performance optimization, fault tolerance, and scalability of data workflows
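
For context, below is a minimal sketch of the kind of PySpark ETL pipeline this role centers on: extract raw data from S3, transform it, and load curated output back to S3, as in a job submitted to an EMR cluster. It is illustrative only; the bucket, paths, and column names (event_type, event_ts) are hypothetical placeholders, not project details.

    # Minimal PySpark ETL sketch. All S3 paths and column names are
    # hypothetical; on EMR this would typically run via spark-submit.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: read raw JSON events from S3 (hypothetical bucket/prefix)
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Transform: drop malformed rows, derive a date column, aggregate
    daily = (
        raw.filter(F.col("event_type").isNotNull())
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Load: write partitioned Parquet back to S3 for downstream use
    (
        daily.write.mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/daily_event_counts/")
    )

    spark.stop()

In a GxP-compliant setting, a pipeline like this would additionally carry validation checks, audit logging, and controlled deployment, per the responsibilities above.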

Required Qualifications:

   • 8–10 years of experience in software or data engineering with a focus on distributed systems

   • Deep hands-on experience with Apache Spark, PySpark, and AWS (especially EMR)

   • Experience building pipelines using Databricks

   • Strong programming skills in Python

   • Solid understanding of cloud-native architectures

   • Familiarity with GxP compliance and working in regulated data environments

   • Proven ability to independently design and develop data pipelines (not a DevOps/AWS admin role)

   • Experience with distributed computing and high-volume ETL pipelines

Equal Opportunity Statement:

BigRio is an equal-opportunity employer. We prohibit discrimination and harassment of any kind based on race, religion, national origin, sex, sexual orientation, gender identity, age, pregnancy, status as a qualified individual with a disability, protected veteran status, or any other characteristic protected by federal, state, or local law. BigRio makes hiring decisions based solely on qualifications, merit, and business needs at the time. All qualified applicants will receive equal consideration for employment.