Rivago Infotech Inc

Data Architect (with OpenShift)

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Architect (Google Cloud, with OpenShift) with a contract length of "unknown" and a pay rate of "unknown." Located in Dallas, TX or Charlotte, NC (Hybrid), it requires 10–14 years of experience, expertise in GCP, OCP, and PySpark, and relevant certifications.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
May 12, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Dallas, TX
-
🧠 - Skills detailed
#Dataflow #Migration #Storage #Logging #Monitoring #Data Lake #Apache Beam #Strategy #Hadoop #Datasets #Data Management #Data Governance #Batch #Data Processing #Data Strategy #GIT #Metadata #Data Pipeline #Schema Design #BigQuery #Security #Clustering #Airflow #Cloud #GCP (Google Cloud Platform) #Data Lineage #Big Data #Spark (Apache Spark) #Observability #Data Engineering #Programming #AI (Artificial Intelligence) #Deployment #DevOps #Data Architecture #BI (Business Intelligence) #Containers #Pig #Data Quality #ETL (Extract, Transform, Load) #Scala #Computer Science #HDFS (Hadoop Distributed File System) #IAM (Identity and Access Management) #Trend Analysis #Sqoop (Apache Sqoop) #PySpark #Data Catalog #Compliance #Data Ingestion #Data Warehouse #SQL (Structured Query Language) #Python #Automation
Role description
Role: Google Cloud Data Architect – IAM Data Modernization
Location: Dallas, TX / Charlotte, NC (Hybrid – 4 days in office)
OCP experience highly preferred

Project/Program
Identity & Access Management (IAM) Data Modernization – migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics & reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis), leveraging PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, and high-performance data solutions.

About Program/Project
The IAM Data Modernization project involves migrating an on-premises SQL data warehouse to a target-state Data Lake in a GCP cloud environment. Key highlights include:
• Integration Scope: 30+ source system data ingestions and multiple downstream integrations
• Capabilities: Metrics, reporting, and GenAI use cases with natural language querying, advanced pattern/trend analysis, faster summarization, and cross-domain metric monitoring
• Benefits:
• Scalability and access to advanced cloud functionality
• Highly available and performant semantic layer with historical data support
• Unified data strategy for executive reporting, analytics, and GenAI across cyber domains
This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.

Required Skills

DevOps / CI-CD
• Experience implementing CI/CD pipelines for data and analytics workloads
• Familiarity with Git-based source control, build automation, and deployment strategies

Containers & Platform
• Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
• Understanding of containerized architecture, scaling, and environment management
• Proven ability to build CI/CD pipelines for data and infrastructure workloads
• Experience managing secrets securely using GCP Secret Manager
• Ownership of observability, SLOs, dashboards, alerts, and runbooks
• Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability

Big Data & Processing
• Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization (see the illustrative PySpark sketch after this skills group)
• Solid understanding of distributed data processing concepts

Data & Cloud Architecture
• Strong experience designing data platforms on Google Cloud Platform (GCP)
• Experience with Data Lakes, data warehousing, and large-scale migration programs
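The PySpark requirement above concerns layered (Bronze/Silver/Gold) lake processing. The sketch below is a minimal illustration of that kind of job, not part of the job description: the bucket paths, column names, and dataset are hypothetical, and it assumes a Spark environment with the GCS connector already configured.

```python
# Illustrative PySpark batch job: promote raw IAM events from a "bronze"
# landing zone to a deduplicated, partitioned "silver" layer.
# All bucket paths, column names, and the dataset itself are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("iam-bronze-to-silver")  # hypothetical job name
    .getOrCreate()
)

# Hypothetical GCS landing-zone path; assumes the GCS connector is configured.
raw = spark.read.json("gs://example-iam-landing/bronze/access_events/")

cleaned = (
    raw
    .withColumn("event_date", F.to_date("event_timestamp"))
    .dropDuplicates(["event_id"])           # keep re-runs idempotent
    .filter(F.col("event_id").isNotNull())  # basic data-quality gate
)

# Columnar, partitioned output suited to analytics and BigQuery load jobs.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("gs://example-iam-lake/silver/access_events/")
)

spark.stop()
```

Writing partitioned Parquet in this way lines up with the columnar-format and partitioning expectations listed under Data Lake Architecture & Storage below.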
Data Lake Architecture & Storage
• Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
• Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
• Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
• Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
• Expertise in partitioning strategies, backfills, and large-scale data organization
• Ability to design data models optimized for analytics and BI consumption

Data Ingestion & Orchestration
• Experience building batch and streaming ingestion pipelines using GCP-native services
• Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
• Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
• Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow); an illustrative DAG sketch appears at the end of this description
• Ability to design robust error handling, replay, and backfill mechanisms

Data Processing & Transformation
• Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
• Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
• Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
• Advanced Python programming skills for data engineering, including testing and maintainable code design
• Experience managing schema evolution while minimizing downstream impact

Analytics & Data Serving
• Expertise in BigQuery performance optimization and data serving patterns
• Experience building semantic layers and governed metrics for consistent analytics
• Familiarity with BI integration, access controls, and dashboard standards
• Understanding of data exposure patterns via views, APIs, or curated datasets

Data Governance, Quality & Metadata
• Experience implementing data catalogs, metadata management, and ownership models
• Understanding of data lineage for auditability and troubleshooting
• Strong focus on data quality frameworks, including validation, freshness checks, and alerting
• Experience defining and enforcing data contracts, schemas, and SLAs

Good to Have
Security, Privacy & Compliance
• Hands-on experience implementing fine-grained access controls for BigQuery and GCS
• Experience with sprint planning and providing technical guidance to the team
• Strong stakeholder communication and solution-architecture skills

Qualifications
• Experience: 10–14+ years in DevOps and Data Architecture, with 5+ years designing on PySpark/GCP/OCP at scale; prior on-prem → cloud migration is a must.
• Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent experience.
• Certifications: Google Cloud Professional Cloud Architect/DevOps/OCP (required, or to be obtained within 3 months). Plus: Professional Data Engineer, Security Engineer.
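For the Cloud Composer / Airflow orchestration and BigQuery partitioning/clustering items above, the following DAG sketch shows one way such a flow could be wired together. It is illustrative only, not the program's actual pipeline: project, cluster, bucket, and dataset names are hypothetical, and it assumes Airflow 2.4+ with the apache-airflow-providers-google package installed (as on Cloud Composer 2).

```python
# Illustrative Cloud Composer / Airflow DAG: daily PySpark transform on
# Dataproc followed by a partitioned, clustered BigQuery table refresh.
# All project, cluster, bucket, and dataset names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

PROJECT_ID = "example-iam-project"  # hypothetical project
REGION = "us-central1"

with DAG(
    dag_id="iam_daily_silver_refresh",  # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    # Run the bronze -> silver PySpark job (see the sketch earlier in this posting).
    bronze_to_silver = DataprocSubmitJobOperator(
        task_id="bronze_to_silver",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "reference": {"project_id": PROJECT_ID},
            "placement": {"cluster_name": "example-iam-dataproc"},  # hypothetical cluster
            "pyspark_job": {
                "main_python_file_uri": "gs://example-iam-code/jobs/bronze_to_silver.py"
            },
        },
    )

    # Rebuild a partitioned, clustered metrics table from the silver layer.
    refresh_gold_metrics = BigQueryInsertJobOperator(
        task_id="refresh_gold_metrics",
        configuration={
            "query": {
                "query": """
                    CREATE OR REPLACE TABLE `example-iam-project.iam_gold.daily_access_metrics`
                    PARTITION BY event_date
                    CLUSTER BY source_system AS
                    SELECT event_date, source_system, COUNT(*) AS event_count
                    FROM `example-iam-project.iam_silver.access_events`
                    GROUP BY event_date, source_system
                """,
                "useLegacySql": False,
            }
        },
    )

    bronze_to_silver >> refresh_gold_metrics
```

Partitioning by date and clustering by a frequently filtered column is one common way to control BigQuery scan costs for the serving layer; retry policies, alerting, and backfill handling would sit on top of a skeleton like this.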