Ender-IT

Lead MS Fabric Data Engineer (Strong Python)

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Lead MS Fabric Data Engineer (Strong Python) on a 12-month remote contract, offering a competitive pay rate. Key skills include Python, PySpark, Azure, REST APIs, and data modeling. A Bachelor's or Master's degree in a related field and 4+ years of data engineering experience are required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
May 14, 2026
🕒 - Duration
More than 6 months
🏝️ - Location
Remote
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Monitoring #Storage #Data Lake #Data Science #Azure ADLS (Azure Data Lake Storage) #Airflow #Spark SQL #Cloud #Data Pipeline #Spark (Apache Spark) #Azure #Data Governance #GIT #API (Application Programming Interface) #PySpark #REST API #Version Control #Data Quality #REST (Representational State Transfer) #Schema Design #Deployment #Python #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #Computer Science #Data Analysis #Data Modeling #Data Orchestration #Scala #Security #Metadata #Databricks #Datasets #ADLS (Azure Data Lake Storage) #Documentation #Delta Lake #Graph API #SharePoint #Normalization #Synapse #Data Engineering #Batch
Role description
Lead Fabric Data Engineer (Strong Python/PySpark Notebooks)
Remote work
Duration: 12 Months+

Responsibilities
• Design, develop, test, and maintain PySpark notebooks to ingest data from:
  - SharePoint (REST/Graph API)
  - Outlook (Graph/REST API)
  - 3rd-party REST APIs
• Bronze layer (raw): persist ingested data in Delta Lake/Parquet with minimal transformation, preserving schema drift and audit metadata
• Silver layer (curated): apply data quality checks, deduplication, normalization, standardization, and business-key alignment (a minimal sketch of this Bronze-to-Silver flow appears after this description)
• Implement and maintain a star-schema data model (facts and dimensions) for analytics
• Build and maintain CI/CD for data pipelines (Git, unit tests, deployment to Databricks/Azure Synapse, etc.)
• Implement monitoring, alerting, retries, and idempotent ingestion strategies (see the merge sketch below)
• Data governance: lineage, masking of PII and other sensitive fields, role-based access
• Documentation: pipeline design docs, data dictionaries, and runbooks
• Collaborate with data analysts, data scientists, and business stakeholders to translate requirements into scalable pipelines
• Optimize performance (partitioning, caching, Delta tables, Spark configurations) and control costs

Deliverables
• PySpark notebooks for ingestion from all sources
• Bronze layer (raw) datasets in Delta/Parquet
• Silver layer (curated) datasets in Delta/Parquet
• Star-schema data model: fact and dimension tables
• Data quality checks, metrics, and dashboards
• Schema definitions, data dictionaries, and runbooks

Technologies & Tools
• PySpark / Spark SQL
• Delta Lake or Parquet-based storage
• Databricks or Azure Synapse Spark environments
• Azure Data Lake Storage Gen2 (or equivalent) for storage
• Data orchestration: Airflow, Prefect, Dagster, or equivalent
• REST APIs, especially Microsoft Graph (SharePoint, Outlook)
• SQL for data modeling and queries
• Version control with Git
• Basic data governance and security practices

Required Qualifications
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
• 4+ years of experience as a Data Engineer
• Proficiency in Python and PySpark; strong SQL skills
• Experience ingesting data from REST APIs (SharePoint/Graph API, Outlook/Exchange, 3rd-party APIs)
• Demonstrated experience with Bronze/Silver/Gold data modeling and star-schema design
• Experience with Delta Lake / data lake architectures
• Familiarity with data quality frameworks and profiling
• Experience with cloud data platforms (Azure preferred: Databricks, ADLS Gen2, Synapse)
• Strong problem-solving, collaboration, and communication skills

Nice-to-Have
• Databricks certification or Azure Data Engineer Associate
• Experience with Graph API authentication (OAuth2) and service principals
• Familiarity with data governance tools and data masking concepts
• Experience with hybrid streaming/batch ETL patterns
• Knowledge of healthcare, finance, or other regulated data domains

Soft Skills
• Clear communication and the ability to explain technical concepts to non-technical stakeholders
• Proactive, ownership-driven, and comfortable working in cross-functional teams
• Strong documentation and testing mindset
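
For illustration only, here is a minimal PySpark sketch of the Bronze-to-Silver flow described above, assuming a Fabric/Databricks/Synapse notebook runtime with Delta Lake available. The Graph endpoint, bearer token, lakehouse paths, and JSON field names are hypothetical placeholders; in practice a service-principal OAuth2 flow would replace the hard-coded token.

import json

import requests
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

API_URL = "https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items"  # hypothetical endpoint
TOKEN = "<oauth2-bearer-token>"          # placeholder; use a service principal in practice
BRONZE_PATH = "Tables/bronze_sp_items"   # hypothetical lakehouse paths
SILVER_PATH = "Tables/silver_sp_items"

# Bronze: land raw payloads with audit metadata and minimal transformation.
resp = requests.get(API_URL, headers={"Authorization": f"Bearer {TOKEN}"}, timeout=30)
resp.raise_for_status()
records = resp.json().get("value", [])

if records:
    bronze_df = (
        spark.createDataFrame([(json.dumps(r),) for r in records], ["raw_json"])
        .withColumn("_ingested_at", F.current_timestamp())
        .withColumn("_source", F.lit("sharepoint_graph_api"))
    )
    # Appending keeps full history; schema drift stays inside raw_json instead of breaking writes.
    bronze_df.write.format("delta").mode("append").save(BRONZE_PATH)

# Silver: parse, enforce a basic quality check, and deduplicate on the business key.
parsed = spark.read.format("delta").load(BRONZE_PATH).select(
    F.get_json_object("raw_json", "$.id").alias("item_id"),
    F.get_json_object("raw_json", "$.lastModifiedDateTime").alias("modified_at"),
    "_ingested_at",
)
latest = Window.partitionBy("item_id").orderBy(F.col("modified_at").desc_nulls_last())
silver_df = (
    parsed.filter(F.col("item_id").isNotNull())   # quality check: business key must be present
    .withColumn("_rn", F.row_number().over(latest))
    .filter(F.col("_rn") == 1)                    # keep only the latest version per key
    .drop("_rn")
)
silver_df.write.format("delta").mode("overwrite").save(SILVER_PATH)

In a real pipeline each layer would be a separate, orchestrated, and tested step rather than one linear script.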
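
Likewise, one common way to satisfy the idempotent-ingest requirement is a Delta MERGE keyed on the business key, so a retried batch upserts rather than duplicating rows. This is a sketch under the same assumptions (Delta runtime available; table paths and the item_id key column are hypothetical):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(spark, "Tables/silver_sp_items")    # hypothetical target table
batch = spark.read.format("delta").load("Tables/staged_batch")  # hypothetical staged batch

(
    target.alias("t")
    .merge(batch.alias("u"), "t.item_id = u.item_id")  # match on the business key
    .whenMatchedUpdateAll()       # a retried batch rewrites the same rows, not duplicates
    .whenNotMatchedInsertAll()    # new keys are inserted exactly once
    .execute()
)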