

E-IT
AWS Data Architect (Healthcare/Life Science Exp)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an AWS Data Architect with 12+ years of data engineering experience, focused on healthcare/life sciences. Contract length and pay rate are unspecified. Key skills include PySpark, AWS Redshift, and Python.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
October 17, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Tarrytown, NY
-
🧠 - Skills detailed
#Kubernetes #Elasticsearch #Databricks #PySpark #Security #Data Ingestion #Data Engineering #Airflow #Virtualization #Docker #Compliance #SQL (Structured Query Language) #DevOps #Looker #Data Security #Data Architecture #Redshift #Apache Spark #Apache Airflow #Dremio #Spark (Apache Spark) #Jenkins #Data Migration #Strategy #Python #GIT #AWS (Amazon Web Services) #Data Governance #Migration #ML (Machine Learning)
Role description
Role: AWS Data Architect
Location: Tarrytown NY 10591 (100% Onsite)
Contract
Skills: PySpark, AWS Redshift, EMR, Databricks, Python, Healthcare/Pharma/Life Sciences domain
Job Description:
• Candidates should have 12+ years of strong experience in data engineering and architecture for large-scale platforms.
• Contribute to the platform roadmap, architecture, and solution design; build POCs and prototypes; run technical evaluations to finalize the tech stack; and set guiding principles for best practices.
• Own the data platform development, data migration, and data validation strategies; review code and maintain checklists and coding standards.
• Create data models that reduce system complexity, increasing efficiency and reducing cost.
• Expertise with data ingestion/orchestration tools and hands-on experience with real-time processing frameworks (Apache Spark/PySpark), AWS Redshift, Apache Airflow, and EMR; see the sketches after this list.
• Strong coding background in Python, SQL, and PySpark; proficiency in data virtualization (Dremio or similar).
• Experience with data governance and access control frameworks (Privacera, Apache Ranger, etc.).
• Knowledge of search & discovery platforms (Solr, Elasticsearch, Looker).
• Solid understanding of data security, authentication (Okta), and compliance frameworks.
• Familiarity with CI/CD pipelines and DevOps practices (Jenkins, Git, Docker, Kubernetes).
• Prior experience designing enterprise data platforms in healthcare, pharma, or regulated industries.
• Knowledge of machine learning pipelines and integration into data platforms.
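To make the stack above concrete, here is a minimal, hypothetical PySpark sketch of the ingestion pattern the role describes: read raw events from S3, strip PHI-style columns, and load into AWS Redshift through the community spark-redshift connector. The bucket, table, columns, and connection details are illustrative assumptions, not details from the posting.

```python
# Hypothetical PySpark ingestion sketch; all names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clinical-events-ingest").getOrCreate()

# Assumed raw landing zone on S3.
raw = spark.read.json("s3://example-bucket/landing/clinical_events/")

# Normalize and de-identify before loading the warehouse, a common
# requirement in healthcare/pharma pipelines.
clean = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .drop("patient_name", "patient_ssn")  # hypothetical PHI columns
)

# Load into Redshift via the community connector (assumed on the classpath).
(clean.write
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://example-cluster:5439/dev?user=u&password=p")
    .option("dbtable", "analytics.clinical_events")
    .option("tempdir", "s3://example-bucket/tmp/")
    .option("forward_spark_s3_credentials", "true")
    .mode("append")
    .save())
```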
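On the orchestration side, here is a minimal Apache Airflow sketch that schedules such a job daily, assuming the apache-airflow-providers-apache-spark package and a configured spark_default connection; the DAG id, schedule, and script path are assumptions for illustration.

```python
# Hypothetical Airflow DAG; ids, paths, and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="clinical_events_ingest",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest = SparkSubmitOperator(
        task_id="spark_ingest",
        application="s3://example-bucket/jobs/clinical_events_ingest.py",  # assumed path
        conn_id="spark_default",
    )
```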






