

Prime Health Technologies
Data Engineer (Healthcare)
This role is for a Data Engineer (Healthcare) with a contract length of "unknown" and a pay rate of "unknown." It requires 7+ years of data engineering experience, strong SQL and Python skills, and familiarity with healthcare data regulations like HIPAA and GDPR.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
June 11, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Data Engineering #Data Access #Observability #Deployment #ADLS (Azure Data Lake Storage) #Data Catalog #Datasets #ML (Machine Learning) #Security #React #Storage #BigQuery #ETL (Extract, Transform, Load) #Data Pipeline #Data Lake #AI (Artificial Intelligence) #REST (Representational State Transfer) #Python #Airflow #S3 (Amazon Simple Storage Service) #Snowflake #Metadata #Compliance #Logging #dbt (data build tool) #GDPR (General Data Protection Regulation) #Data Quality #Classification #SQL (Structured Query Language) #Kubernetes #Kafka (Apache Kafka) #Scala #Data Science #BI (Business Intelligence) #Documentation #Batch #FHIR (Fast Healthcare Interoperability Resources) #Data Documentation #REST API #Databases #Databricks #Monitoring #Cloud
Role description
About the job
Prime Health Technologies is redefining healthcare from reactive to proactive by building the world's first AI-driven Precision Health Operating System for governments and insurers. Our platform integrates biometric data, behavioral patterns, and intelligent reasoning to deliver personalized, preventative care at national scale. We partner with governments and insurers to improve population health outcomes, reduce healthcare costs, and empower every citizen with a personal AI health companion that adapts daily by guiding movement, nutrition, sleep, and lifestyle.
Your screening answers — especially years of experience — will be verified during the interview. Generic or AI-generated responses tend to be apparent under follow-up; we prefer honest, specific answers over a polished generic response. Read the entire JD.
Role Summary
As a Data Engineer, you will design, build, and operate the platform's data substrate — the pipelines, storage, governance, and analytics layers everything else stands on. You will own the data plane end-to-end, from ingestion through observability, with the architectural discipline required for regulated, sovereign-deployment work.
Your primary mandate is the substrate: governed, reliable, reproducible data flows and curated datasets that downstream engineering, product, clinical, and analytics workloads rely on. As the platform matures toward pilots, the role extends naturally into MLOps (model registry, training orchestration, serving infrastructure, inference logging), applying the same architectural discipline to the model layer.
This role is hands-on and production-oriented. Security and compliance are non-negotiable. Your work must align with our information security and AI governance posture (ISO 27001, ISO 42001) and support privacy obligations such as HIPAA and GDPR, per deployment jurisdiction.
Key Responsibilities
• Build & maintain reliable batch and (where appropriate) streaming pipelines for clinical, operational, product, and third-party data sources, including healthcare & consumer-health integrations: HL7/FHIR, REST APIs, Apple HealthKit, Android Health Connect, and governed adapters for external clinical or wellness sources.
• Design data models, transformations, and storage patterns supporting analytics, reporting, AI workloads, and product features — with reproducibility as a first-class requirement (any curated dataset must rebuild deterministically from raw inputs & transformation code).
• Design & operate core stores in the in-country PHI data plane (operational database, time-series store, object storage, audit logs) with encryption, access control, and lifecycle management.
• Build curated, de-identified-by-default analytics datasets powering operational, regulatory, and client dashboards.
• Implement & maintain PHI/PII de-identification & tokenization pipelines; support tightly controlled re-identification workflows when explicitly authorized.
• Establish data quality, integrity, and observability controls (validation, reconciliation, idempotency, late-arriving data handling, lineage, monitoring, alerting) and publish quality metrics.
• Deliver a discoverable metadata layer so teams can self-serve and trust datasets.
• Support sovereign / regional data-residency models, keeping PHI within an approved deployment boundary while enabling derived & aggregate views in out-of-country planes.
• Own pipeline observability — logging, metrics, tracing, alerting, cost & performance tuning — across the stack.
• Contribute to CI/CD for data components & participate in incident response and postmortems.
• Partner with engineering, product, clinical, and business stakeholders to translate data needs into scalable technical solutions.
As the platform approaches pilot deployments, this role extends into MLOps: building the model & serving infrastructure that a future Data Scientist will rely on. This is a natural trajectory for senior data engineers (the same architectural muscle applied to model artifacts) and is shown here so candidates understand the path. The actual data science work (feature design, model development, evaluation) belongs to a separate, later hire who will plug into the substrate & scaffolding you build. MLOps scope includes:
• Training-job orchestration & reproducible dataset versioning
• Model registry & artifact storage
• Containerized model serving, routing, and shadow-deployment infrastructure
• Inference logging back into the warehouse for downstream evaluation
• CI/CD for model artifacts (schema validation, contract tests, automated rollouts)
Required Qualifications
• 7+ years in data engineering or backend engineering with significant data-pipeline ownership; substantial seniority is expected given the regulated, national-scale, and sovereign-deployment context. Prior work in healthcare, wellness, insurance, or other regulated domains.
• Strong SQL & Python; proven track record building reliable ETL/ELT pipelines in production.
• Experience with modern storage patterns: operational databases, data lakes / object storage, and analytics warehouses or lakehouses.
• Hands-on experience with orchestration tools (Airflow, Dagster, Prefect, or equivalent) and transformation frameworks (dbt or equivalent).
• Demonstrated discipline around data contracts, schema evolution, and reproducible pipelines (deterministic rebuilds from raw + code).
• Experience working with sensitive data (PII/PHI), implementing least-privilege access patterns, audit logging, and consent-aware data access.
• Familiarity with data classification, retention, deletion, and auditability requirements for sensitive data.
• Experience with data quality & observability practices: validation/testing, lineage/metadata, monitoring/alerting, incident response.
• Clear written & verbal communication; able to produce data documentation, runbooks, and pragmatic design proposals.
Preferred Qualifications
• Experience supporting audits & control evidence in ISO 27001-aligned environments; familiarity with ISO 42001 AI governance expectations & privacy regimes such as HIPAA & GDPR.
• Exposure to HL7/FHIR or common clinical code sets (ICD, SNOMED) and the realities of integrating heterogeneous health datasets.
• Prior work in data-residency / sovereign-cloud environments with split-plane architectures (in-country PHI plane plus out-of-country derived/aggregate views).
• Experience with time-series databases & high-volume sensor / wearable data pipelines.
• Growth direction — MLOps: experience extending data platforms with ML infrastructure (training orchestration, model registries, feature-pipeline runtimes with batch-to-online parity, containerized serving, inference logging). Candidates who have collaborated closely with data science teams and have intuition for what makes good scaffolding for that workflow are particularly valuable, since this role will grow into MLOps as the platform matures toward pilots.
Technology Environment
The platform's architecture is modular and may be deployed into different cloud & sovereign environments. You should be comfortable across the following categories:
• Languages: Python, SQL (and comfort reading service code & APIs).
• Orchestration: Airflow, Dagster, Prefect, or equivalent; backfills & idempotent reprocessing patterns.
• Streaming: Kafka, Kinesis, Pub/Sub, or equivalent, where appropriate.
• Storage: Postgres or equivalent operational DB; time-series store; object storage (S3 / ADLS / GCS); analytics warehouse or lakehouse (Snowflake, BigQuery, Databricks, or similar).
• Transformations: dbt or similar; dimensional modeling & curated datasets for downstream BI.
• Governance: data catalog, lineage, schema registry, policy enforcement, strong audit logging.
• Platform: containerized workloads (Kubernetes or equivalent), CI/CD, infrastructure-as-code, and observability tooling.
• ML infrastructure (later-stage growth into MLOps): model registry, training orchestration, feature-pipeline runtime, containerized serving — at the platform level only; modeling decisions remain with data science.
Security, Privacy & Compliance Expectations
The platform handles sensitive personal & health information. You will build privacy-preserving data flows & support compliance across multiple jurisdictions. Examples include:
• Standards & regulatory alignment: implement & evidence data controls that support our ISO 27001 ISMS & ISO 42001 AIMS, and enable privacy obligations such as HIPAA & GDPR through least-privilege access, auditability, and governed data handling.
• Data classification & segregation: enforce separation of PHI/PII stores from derived & de-identified datasets.
• Encryption & key management: integrate with KMS/HSM-backed key management & rotate keys per policy.
• Access control: implement RBAC/ABAC & break-glass processes; ensure all sensitive access is logged & reviewable.
Soft Skills & Working Style
• Systems-level thinker with strong architectural intuition.
• Bias toward clarity, structure, and proactive problem-solving; documents pipelines, schemas, and assumptions; produces runbooks others follow.
• Comfortable in early-stage environments with evolving requirements while maintaining architectural discipline.
• Strong communicator translating business, scientific, and AI questions into durable data models.
• High ownership mindset preferring accountability over ambiguity; owns data quality, reliability, and observability.
To demonstrate that you read and understand this role, please email jcooper@primehealthtechnologies.com with the subject line: Data Engineer – Your Name. In the body of the email, answer the following questions:
1. Describe one decision you would delay until later when building ELIA, and one decision you believe must be made correctly on day one. Explain why.
2. Briefly (2–3 sentences): What's your view on AI-assisted 'vibe coding' for production data pipelines that handle PHI?
Historically, over 95 percent of applicants skip this step. Completing it meaningfully is part of the evaluation.






