Rivago Infotech Inc

Data Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer in San Diego, CA, with a long-term contract. Key skills include Apache Spark, Spark SQL, Python, and AWS (EMR Serverless, S3, Redshift). Experience in fraud, risk, or compliance domains is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
May 1, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Diego, CA
-
🧠 - Skills detailed
#Documentation #AWS (Amazon Web Services) #GCP (Google Cloud Platform) #Security #"ETL (Extract, Transform, Load)" #Spark SQL #Apache Spark #Compliance #Data Science #JSON (JavaScript Object Notation) #Data Governance #Data Quality #AWS EMR (Amazon Elastic MapReduce) #Spark (Apache Spark) #Monitoring #S3 (Amazon Simple Storage Service) #PySpark #Migration #Redshift #Data Lake #Data Engineering #AWS IAM (AWS Identity and Access Management) #Storage #Athena #ML (Machine Learning) #SQL (Structured Query Language) #Metadata #Python #Datasets #IAM (Identity and Access Management) #Data Documentation
Role description
Role: Data Engineer

Location: San Diego, CA 92129 (on-site)

Duration: Long-term project

Responsibilities

1. Design, build, and performance-tune Apache Spark workloads using Spark SQL and PySpark for complex transformations (JSON/semi-structured data, nested structures, window functions, joins, aggregations).
2. Profile and optimize Spark jobs: partitioning, shuffles, join strategies, skew, memory/spill, and right-sized resource usage, especially on EMR Serverless, for large-scale and petabyte-scale data.
3. Support customers and monitor pipelines around the clock, with strict SLAs for fixing and reinstating failed pipelines.
4. Implement reusable patterns for incremental loads, deduplication, and CDC-style processing.
5. Build and maintain ETL/ELT on AWS EMR Serverless (Spark), with S3 as the data lake layer: partitioning, compression, external tables, and layouts that support fast Spark and downstream SQL workloads.
6. Design and tune Redshift workloads: sort keys, distribution styles, and SQL patterns that fit S3 → Spark → Redshift flows.
7. Optimize cost and performance across Spark jobs, S3 storage, and Redshift (including retention and lifecycle policies where relevant).
8. Produce end-to-end designs: pipeline topology, data models, staging vs. curated layers, incremental strategies, and clear tradeoffs (freshness, cost, complexity, reliability).
9. Apply access controls for sensitive financial and user data (least privilege, row/column-level patterns where required).
10. Support data governance: metadata, documentation, and alignment with compliance expectations.
11. Implement data quality (validation rules, regex, null-safety) and monitoring/alerting with error handling for production pipelines.
12. Manage schema evolution and migrations with backward compatibility and risk reduction.
13. Partner with IRL teams and ML/Data Science on feature-rich datasets; work with risk/compliance and platform teams.

What we're looking for

1. Strong Spark and Spark SQL skills with hands-on performance tuning (not only SQL writing).
2. Python for Spark/data engineering.
3. AWS: EMR Serverless, S3 (Delta and data lake patterns), Redshift (SQL + tuning).
4. Ability to design pipelines and data models and communicate tradeoffs.
5. Familiarity with access control concepts for data platforms (AWS IAM, lake/warehouse permissions, RLS/column-level security as applicable).
6. Ownership of production systems: support, 24/7 monitoring, and collaboration.

Good to have

1. Experience in fraud, risk, or compliance domains.
2. Athena, GCP, and other S3 query engines alongside Spark.
3. Support and monitoring of highly interactive or SLA-tight workloads on large data pipelines.
4. Deeper Redshift operations (WLM, queues, workload patterns) alongside Spark.
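For context on the "deduplication and CDC-style processing" responsibility: in Spark this is commonly expressed as `ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC) = 1`. The pure-Python sketch below shows the same keep-latest-per-key logic; it is illustrative only, and the field names (`id`, `updated_at`) are assumptions, not from the posting.

```python
# Illustrative CDC-style dedup: keep the most recent change record per
# business key. Field names ("id", "updated_at") are assumed examples.
from operator import itemgetter

def dedupe_latest(records, key="id", ts="updated_at"):
    """Return one record per key, keeping the latest timestamp."""
    ordered = sorted(records, key=itemgetter(key, ts))
    latest = {}
    for rec in ordered:
        latest[rec[key]] = rec  # later timestamps overwrite earlier ones
    return list(latest.values())

changes = [
    {"id": 1, "updated_at": "2026-01-01", "status": "open"},
    {"id": 1, "updated_at": "2026-01-03", "status": "closed"},
    {"id": 2, "updated_at": "2026-01-02", "status": "open"},
]
current = dedupe_latest(changes)
```

In a real pipeline the same pattern would run as a Spark window function so it scales past a single machine; the control flow is identical.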
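The data-quality responsibility (validation rules, regex, null-safety) can be pictured as a small set of per-record checks. In Spark these would run as column expressions; the minimal Python sketch below only shows the shape, and the specific rules and field names are assumptions for illustration.

```python
import re

# Assumed example rule: a loose email shape check, not a full RFC validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_row(row):
    """Return a list of rule violations for one record (empty = clean)."""
    errors = []
    # Null-safety: required fields must be present and non-empty
    for field in ("id", "email"):
        if row.get(field) in (None, ""):
            errors.append(f"{field}: missing")
    # Regex rule: email must look like an address (when present)
    email = row.get("email")
    if email and not EMAIL_RE.match(email):
        errors.append("email: malformed")
    return errors
```

Rows with a non-empty error list would typically be routed to a quarantine table and surfaced through the monitoring/alerting the posting mentions.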