
Data Architect with Life Science

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Architect with Life Science, offering a 6-month contract at $90.00 - $100.00 per hour, based in San Francisco, CA. Key skills required include medical imaging data expertise, DICOM standards, Python, ETL processes, and AWS services.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
800
-
πŸ—“οΈ - Date
December 11, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
πŸ“ - Location detailed
San Francisco, CA 94114
-
🧠 - Skills detailed
#Metadata #Docker #PostgreSQL #GIT #Terraform #SQL (Structured Query Language) #NLP (Natural Language Processing) #Data Science #AI (Artificial Intelligence) #Python #EC2 #NoSQL #Athena #Data Architecture #Datasets #Libraries #Tableau #Data Management #Agile #Lambda (AWS Lambda) #ADaM (Analysis Data Model) #Apache Airflow #Classification #ML (Machine Learning) #Data Analysis #Compliance #Pandas #DevOps #AWS (Amazon Web Services) #RDS (Amazon Relational Database Service) #Spark (Apache Spark) #S3 (Amazon Simple Storage Service) #Jenkins #Talend #Data Pipeline #ETL (Extract, Transform, Load) #Airflow #Data Quality #NumPy #Data Integration #GitLab
Role description
Data Architect with Life Science

Location (onsite from day 1): San Francisco, CA (4 days from office). Candidates must reside in San Francisco, CA, or be open to relocation. Candidates should be authorized to work in the United States.

What's in it for you: The Image Curation and Data Products team transforms biomedical imaging data from clinical trials and real-world data (RWD) by applying tools and workflows that deliver high-quality, FAIR imaging datasets. These enable imaging data scientists to discover and use data for everything from exploratory analysis to algorithm development.

Job Description

Key Responsibilities
● Imaging Data Pipeline Delivery: Design, implement, and maintain automated pipelines for onboarding, verifying, transforming, and curating biomedical imaging data from clinical trials and real-world data sources across therapeutic areas (Oncology, Neurology, Ophthalmology), covering all image file formats.
● Data Quality and Integrity: Develop and implement solutions to detect and correct anomalies and inconsistencies, achieving the highest imaging data quality per industry standards (DICOM) and internal specifications such as FFS, RTS, and GDSR. Ensure de-identification, PHI/PII controls, and image-specific QC checks are implemented at scale.
● Data Analysis and Integration: Integrate ML and AI-assisted tools into pipelines for inline image analysis, classification, and segmentation to extract and enrich metadata for downstream analyses and to optimize performance.
● Image Data Management: Build and maintain large-scale catalogs of curated imaging datasets that embody FAIR principles, enabling easy discovery of and access to imaging data assets.
● Compliance and Controls: Ensure applicable compliance and privacy controls are followed as required by GxP and CSV validations.
● Collaboration: Work closely with image scientists, data scientists, ClinOps, and biomarker research teams, supporting data needs for primary and secondary endpoint analyses.
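To give a flavor of the PHI/PII controls mentioned above, here is a minimal, stdlib-only sketch of rule-based de-identification over DICOM-style metadata. It assumes tags are held as (group, element) pairs; the tag numbers are real DICOM attributes, but the sample values and the `deidentify` helper are invented for illustration — production pipelines would apply a full confidentiality profile via a library such as pydicom.

```python
# Hypothetical sketch: blank a small set of PHI-bearing DICOM tags.
# Tag numbers follow the DICOM data dictionary; sample values are fake.
PHI_TAGS = {
    (0x0010, 0x0010),  # PatientName
    (0x0010, 0x0020),  # PatientID
    (0x0010, 0x0030),  # PatientBirthDate
    (0x0008, 0x0080),  # InstitutionName
}

def deidentify(metadata):
    """Return a copy with PHI tags blanked and all other tags intact."""
    return {tag: ("" if tag in PHI_TAGS else value)
            for tag, value in metadata.items()}

sample = {
    (0x0010, 0x0010): "DOE^JANE",  # PatientName (PHI, blanked)
    (0x0010, 0x0020): "12345",     # PatientID (PHI, blanked)
    (0x0008, 0x0060): "CT",        # Modality (kept)
    (0x0028, 0x0010): 512,         # Rows (kept)
}

clean = deidentify(sample)
print(clean[(0x0010, 0x0010)])  # → "" (blanked)
print(clean[(0x0008, 0x0060)])  # → CT (preserved)
```

The real standard (DICOM PS3.15 Basic Confidentiality Profile) enumerates far more attributes, including dates, UIDs, and burned-in pixel annotations; this only shows the shape of a tag-based rule.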
● External Collaboration: Work with external partners (e.g., CROs) to ensure imaging data received conforms to established agreements, quality standards, and completeness requirements.
● Lead the Delivery Team: Ensure timely delivery of the product backlog and features.
● Agile Participation: Participate with the team in, and lead, agile ceremonies throughout planning and execution.

Ideal Candidate Would Have (multiple competencies from the list below):
● Experience with medical imaging data and platforms (PACS, VNAs, etc.).
● Experience with radiology imaging data such as CT, PET, MRI, and NIfTI, and ophthalmic imaging such as OCT, FA, and CFP.
● Good understanding of DICOM standards: structure, metadata parsing, tags, and multi-frame images.
● Experience with clinical data standards such as SDTM and ADaM.
● Data integration across diverse sources, e.g., imaging data with tabular clinical data.
● De-identification methodologies, PHI/PII detection, and privacy controls.
● Good understanding of GxP and CSV validation frameworks.
● Proficiency in Python and libraries such as pandas, pydicom, SimpleITK, dicom-numpy, and dcm2niix.
● Hands-on experience with ETL/ELT involving large medical imaging datasets.
● Experience with Apache Airflow, Spark, Talend, or similar workflow orchestration tools.
● Proficiency with SQL and NoSQL image metadata stores (PostgreSQL, MongoDB, etc.).
● Practical experience with AWS infrastructure and data services such as RDS, Athena, Glue, EC2, Lambda, and S3. Familiarity with EKS, Docker, and HPC.
● Experience in data analysis and report generation using Tibco, Tableau, AWS QuickSight, etc.
● Good knowledge of Git, GitLab, and DevOps tools such as Jenkins and Terraform.
● Familiarity with ML workflows for computer-vision tasks such as segmentation and classification.
● Nice to have: implemented NLP and GenAI solutions.
● Experience working with cross-functional global teams in a dynamic Agile environment; able to lead and mentor agile team members.
● 10+ years of experience with data platforms, analysis, and insights.
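The DICOM metadata-parsing competency above starts with the file layout itself: a DICOM Part 10 file opens with a 128-byte preamble followed by the ASCII magic "DICM". A minimal, stdlib-only check of that header can be sketched as below (real pipelines would hand the file to pydicom after this; `looks_like_dicom` is an illustrative helper, not a library API).

```python
# Hypothetical sketch: detect the DICOM Part 10 file header, which is a
# 128-byte preamble followed by the four magic bytes b"DICM".
import io

def looks_like_dicom(stream):
    """Return True if the stream begins with a DICOM Part 10 header."""
    header = stream.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"

# Synthetic examples: a zeroed preamble plus the magic, and a non-DICOM file.
fake_dicom = io.BytesIO(b"\x00" * 128 + b"DICM")
plain_file = io.BytesIO(b"not an image")
print(looks_like_dicom(fake_dicom))  # → True
print(looks_like_dicom(plain_file))  # → False
```

A check like this is a cheap first gate in an onboarding pipeline, rejecting obviously non-DICOM transfers from a CRO before any heavier metadata extraction runs.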
Educational Qualifications: Engineering degree (BE/ME/BTech/MTech/BSc/MSc). Technical certification in multiple technologies is desirable.

Job Type: Contract
Pay: $90.00 - $100.00 per hour
Work Location: In person