

Ender-IT
Lead MS Fabric Data Engineer (Strong Python)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Lead MS Fabric Data Engineer (Strong Python) on a 12-month remote contract, offering a competitive pay rate. Key skills include Python, PySpark, Azure, REST APIs, and data modeling. A Bachelor's/Master’s in a related field and 4+ years of Data Engineering experience are required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
May 14, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Monitoring #Storage #Data Lake #Data Science #Azure ADLS (Azure Data Lake Storage) #Airflow #Spark SQL #Cloud #Data Pipeline #Spark (Apache Spark) #Azure #Data Governance #GIT #API (Application Programming Interface) #PySpark #REST API #Version Control #Data Quality #REST (Representational State Transfer) #Schema Design #Deployment #Python #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #Computer Science #Data Analysis #Data Modeling #Data Orchestration #Scala #Security #Metadata #Databricks #Datasets #ADLS (Azure Data Lake Storage) #Documentation #Delta Lake #Graph API #SharePoint #Normalization #Synapse #Data Engineering #Batch
Role description
Lead Fabric Data Engineer (Strong Python/PySpark Notebooks)
Remote work
Duration: 12 Months+
Responsibilities
• Design, develop, test, and maintain PySpark notebooks to ingest data from:
SharePoint (REST/Graph API)
Outlook (Graph/REST API)
3rd party REST APIs
• Bronze layer (raw): persist ingestion in Delta Lake/Parquet with minimal transformations, handling schema drift and preserving audit metadata (a minimal sketch follows this list)
• Silver layer (curated): apply data quality checks, deduplication, normalization, standardization, and business-key alignment
• Implement and maintain a star-schema data model (facts and dimensions) for analytics
• Build and maintain CI/CD for data pipelines (Git, unit tests, deployment to Databricks/Azure Synapse, etc.)
• Implement monitoring, alerting, retries, and idempotent ingest strategies
• Data governance: lineage, masking of PII and other sensitive fields, role-based access
• Documentation: pipeline design docs, data dictionaries, and runbooks
• Collaborate with Data Analysts, Data Scientists, and Business Stakeholders to translate requirements into scalable pipelines
• Optimize performance (partitioning, caching, Delta tables, Spark configurations) and control costs
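For orientation, the ingestion and curation responsibilities above could be approached roughly as in the sketch below. It is illustrative only: the endpoint URL, Delta table paths, and column names (record_id, event_ts) are hypothetical placeholders, and real SharePoint/Outlook ingestion would authenticate via Microsoft Graph rather than issue an anonymous request.

```python
# Illustrative sketch only: the endpoint, table paths, and column names
# (record_id, event_ts) are hypothetical, not taken from the role text.
import json

import requests
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw API payload with audit metadata and minimal transformation.
resp = requests.get("https://api.example.com/v1/records", timeout=30)  # hypothetical 3rd-party API
resp.raise_for_status()
records = resp.json().get("value", [])  # assumes an OData-style "value" array

bronze_df = (
    spark.createDataFrame([(json.dumps(r),) for r in records], ["raw_json"])
    .withColumn("_ingested_at", F.current_timestamp())   # audit metadata
    .withColumn("_source_system", F.lit("example_api"))
)
bronze_df.write.format("delta").mode("append").save("Tables/bronze_example_records")

# Silver: parse the JSON, enforce a basic quality gate, and deduplicate on the business key.
bronze = spark.read.format("delta").load("Tables/bronze_example_records")
parsed = (
    bronze.select(
        F.get_json_object("raw_json", "$.record_id").alias("record_id"),
        F.get_json_object("raw_json", "$.event_ts").cast("timestamp").alias("event_ts"),
        F.col("_ingested_at"),
    )
    .filter(F.col("record_id").isNotNull())               # simple data-quality check
)

latest = Window.partitionBy("record_id").orderBy(F.col("_ingested_at").desc())
silver_df = (
    parsed.withColumn("_rn", F.row_number().over(latest)) # keep the newest record per business key
    .filter("_rn = 1")
    .drop("_rn")
)
silver_df.write.format("delta").mode("overwrite").save("Tables/silver_example_records")
```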
Deliverables
• PySpark notebooks for ingestion from all sources
• Bronze layer datasets (RAW) in Delta/Parquet
• Silver layer datasets (CURATED) in Delta/Parquet
• Star schema data model: fact and dimension tables (illustrated after this list)
• Data quality checks, metrics, and dashboards
• Schema definitions, data dictionaries, and runbooks
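For the star-schema deliverable, a minimal illustration of fact and dimension tables is shown below; the table and column names (fact_orders, dim_customer, dim_date) are invented for the example and are not specified by the role.

```python
# Minimal star-schema illustration; all table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_key  BIGINT,          -- surrogate key
    customer_id   STRING,          -- business key from the source system
    customer_name STRING,
    region        STRING
) USING DELTA
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS dim_date (
    date_key      INT,             -- e.g. 20260514
    calendar_date DATE,
    year          INT,
    month         INT
) USING DELTA
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS fact_orders (
    order_id     STRING,
    customer_key BIGINT,           -- FK to dim_customer
    date_key     INT,              -- FK to dim_date
    order_amount DECIMAL(18, 2)
) USING DELTA
""")
```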
Technologies & Tools
• PySpark / Spark SQL
• Delta Lake or Parquet-based storage
• Databricks or Azure Synapse Spark environments
• Azure Data Lake Storage Gen2 (or equivalent) for storage
• Data orchestration: Airflow, Prefect, Dagster, or equivalent
• REST APIs, especially Microsoft Graph (SharePoint, Outlook)
• SQL for data modeling and queries
• Version control with Git
• Basic data governance and security practices
Required Qualifications
• Bachelor's or Master’s in Computer Science, Data Engineering, or related field
• 4+ years of experience as a Data Engineer
• Proficient in Python and PySpark; strong SQL skills
• Experience ingesting data from REST APIs (SharePoint/Graph API, Outlook/Exchange, 3rd party APIs)
• Demonstrated experience with Bronze/Silver/Gold data modeling and star schema design
• Experience with Delta Lake / data lake architectures
• Familiarity with data quality frameworks and profiling
• Experience with cloud data platforms (Azure preferred: Databricks, ADLS Gen2, Synapse)
• Strong problem-solving, collaboration, and communication skills
Nice-to-Have
• Databricks certification or Azure Data Engineer Associate
• Experience with Graph API authentication (OAuth2) and service principals (see the sketch after this list)
• Data governance tools and data masking concepts
• Experience with streaming/batch hybrid ETL patterns
• Knowledge of healthcare/finance data domain or other regulated data
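A rough sketch of the app-only (client credentials) pattern referenced in the OAuth2 bullet above: a service principal acquires a token from the Microsoft identity platform and calls Microsoft Graph. The tenant ID, client ID, secret, and SharePoint site path are placeholders.

```python
# Illustrative only: tenant_id, client_id, client_secret, and the site path are
# placeholders; keep real secrets in Key Vault or another secret store, not in code.
import requests

tenant_id = "<tenant-guid>"
client_id = "<app-registration-client-id>"
client_secret = "<client-secret>"

# OAuth2 client-credentials flow against the Microsoft identity platform.
token_resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    },
    timeout=30,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Example app-only Graph call: resolve a SharePoint site by hostname and path.
site_resp = requests.get(
    "https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/ExampleSite",
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=30,
)
site_resp.raise_for_status()
print(site_resp.json().get("displayName"))
```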
Soft Skills
• Clear communication and ability to explain technical concepts to non-technical stakeholders
• Proactive, ownership-driven, and able to work in cross-functional teams
• Strong documentation and testing mindset
