

GP Strategies Corporation
Data Foundations & Lineage Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Foundations & Lineage Engineer, a 6-month remote contract position. Requires 4+ years in data engineering, expert SQL, and experience with Azure Data Lake and Databricks. Familiarity with Learning or HR data domains preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
January 28, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Remote
-
📄 - Contract
1099 Contractor
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Data Quality #Semantic Models #Data Lineage #Data Design #BI (Business Intelligence) #Data Modeling #Data Architecture #Data Analysis #Documentation #Databricks #ETL (Extract, Transform, Load) #Synapse #Data Engineering #Data Governance #Data Catalog #AI (Artificial Intelligence) #Microsoft Power BI #Datasets #Scala #Azure #Data Lake #Metadata #SQL (Structured Query Language)
Role description
Position: Data Foundations & Lineage Engineer
Work Location: USA (Remote)
Duration: Contract for 6 months (possible extension)
We need a Data Foundations & Lineage Engineer to build, document, and maintain the core data ecosystem that drives Learning Data Intelligence. This role involves defining the structure, lineage, quality, and meaning of datasets within the Learning Lake (including HCM, Finance, HRDP, and FDL). The engineer will ensure every dataset is discoverable, well-documented, and trustworthy. This position requires hands-on work across the Lakehouse, including mapping schemas, tracing lineage, profiling quality, eliminating manual dependencies, and constructing a durable documentation layer that serves engineering, analytics, AI agents, and business stakeholders.
Data Discovery & Documentation
• Perform deep, brute‑force exploration of all Learning Lake schemas and tables to understand their meaning, business purpose, and dependencies.
• Build a comprehensive documentation repository describing dataset definitions, column‑level semantics, business logic, refresh cadences, source systems, and downstream consumption patterns.
• Translate implicit, tribal‑knowledge data flows into explicit, searchable documentation consistent with guidance (a brief profiling sketch follows this list).
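As an illustration of the exploration and profiling pass described above, the sketch below assumes a Databricks/PySpark session, walks a set of placeholder Learning Lake schemas, and records per‑column null rates into a hypothetical documentation table. Every database, table, and column name here is an assumption for illustration, not something specified in this posting.

```python
# Hypothetical sketch: enumerate Learning Lake tables and profile null rates
# per column, then persist the results as a seed for a documentation repository.
# Assumes a Databricks/PySpark environment; all object names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

LAKE_DATABASES = ["learning_lake_hcm", "learning_lake_finance"]  # placeholder schemas

profiles = []
for db in LAKE_DATABASES:
    for tbl in spark.catalog.listTables(db):
        df = spark.table(f"{db}.{tbl.name}")
        total = df.count()  # full count is illustrative; sample large tables in practice
        null_counts = df.select(
            [F.count(F.when(F.col(c).isNull(), 1)).alias(c) for c in df.columns]
        ).first()
        for col in df.columns:
            null_rate = (null_counts[col] / total) if total else None
            profiles.append((db, tbl.name, col, total, null_rate))

profile_df = spark.createDataFrame(
    profiles, ["database", "table", "column", "row_count", "null_rate"]
)
# Hypothetical target table for the documentation/metadata layer.
profile_df.write.mode("overwrite").saveAsTable("data_docs.table_profiles")
```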
Data Lineage & Architecture Clarity
• Develop end‑to‑end lineage for Learning datasets, mapping sources, transformations, pipelines, and consumption (Power BI, semantic models, AI agents, etc.).
• Identify and eliminate manual or undocumented data feeds, aligning with the Manual Dependency Elimination initiative (see the lineage sketch below).
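The lineage work above can be captured as explicit source → target edges. The sketch below assumes pipeline definitions are available as structured metadata (harvested from orchestration configs or a catalog); the pipeline and dataset names are invented for the example. Datasets that appear only as consumers and never as pipeline outputs would surface as candidate manual or undocumented feeds.

```python
# Hypothetical sketch: derive a lineage edge list and a downstream-consumer index
# from pipeline metadata, so undocumented or manual feeds show up as gaps.
from collections import defaultdict

# Placeholder pipeline metadata; names are invented for illustration.
pipelines = [
    {"name": "hcm_ingest", "inputs": ["hcm.raw_workers"],
     "outputs": ["learning_lake.workers"]},
    {"name": "course_completions", "inputs": ["learning_lake.workers", "lms.completions"],
     "outputs": ["learning_lake.completions"]},
    {"name": "powerbi_refresh", "inputs": ["learning_lake.completions"],
     "outputs": ["semantic_model.learning_kpis"]},
]

edges = []                     # (source, pipeline, target) triples
downstream = defaultdict(set)  # dataset -> datasets that consume it

for p in pipelines:
    for src in p["inputs"]:
        for tgt in p["outputs"]:
            edges.append((src, p["name"], tgt))
            downstream[src].add(tgt)

def trace(dataset: str, depth: int = 0) -> None:
    """Print everything downstream of a dataset, one level of indentation per hop."""
    for child in sorted(downstream.get(dataset, [])):
        print("  " * depth + f"{dataset} -> {child}")
        trace(child, depth + 1)

trace("hcm.raw_workers")
```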
Collaboration & Stakeholder Alignment
• Work closely with the DRI team as subject‑matter partners; escalate questions and validate assumptions.
• Partner with analytics, engineering, content, and program teams to ensure data design supports downstream reporting, modeling, and AI use cases.
Enablement & Self‑Service
• Build the foundational metadata that powers data discovery, semantic models, and self‑service analytics.
• Produce guides, readme files, and onboarding materials for all teams relying on the Learning Lake (a readme-generation sketch follows).
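One way to seed that foundational metadata is a per‑dataset record that also renders the readme stubs mentioned above, so discovery, semantic models, and onboarding docs share one source of truth. The field names and the example dataset below are assumptions for illustration only.

```python
# Hypothetical sketch: render a per-dataset README stub from a metadata record.
dataset_metadata = {
    "name": "learning_lake.completions",          # placeholder dataset
    "owner": "Learning Data Intelligence team",   # placeholder owner
    "refresh_cadence": "daily",
    "source_systems": ["HCM", "LMS"],
    "columns": {
        "worker_id": "Surrogate key joining to learning_lake.workers",
        "course_id": "LMS course identifier",
        "completed_at": "UTC timestamp of the completion event",
    },
}

def render_readme(meta: dict) -> str:
    """Build a markdown readme stub from a single dataset metadata record."""
    lines = [
        f"# {meta['name']}",
        f"Owner: {meta['owner']}",
        f"Refresh cadence: {meta['refresh_cadence']}",
        f"Source systems: {', '.join(meta['source_systems'])}",
        "",
        "## Columns",
    ]
    lines += [f"- `{col}`: {desc}" for col, desc in meta["columns"].items()]
    return "\n".join(lines) + "\n"

print(render_readme(dataset_metadata))
```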
Required Qualifications
• 4+ years of experience in data engineering, data analysis, data governance, or related fields.
• Expert SQL and data‑profiling skills, with the ability to reverse‑engineer undocumented or ambiguous datasets.
• Hands‑on experience with Azure Data Lake, Microsoft Fabric, Databricks, or Synapse in production environments.
• Familiarity with metadata systems, data cataloging, lineage tooling, and orchestration best practices.
• Demonstrated ability to operate effectively in ambiguous, poorly documented, and fast‑changing data environments.
Preferred Qualifications
• Experience working across large‑scale data ecosystems with shifting taxonomies and inconsistent data quality, combined with strong foundations in data modeling, documentation systems, data product ownership, or semantic model design.
• Proven ability to partner with engineering teams on data governance, lineage, metadata standards, and quality frameworks to improve reliability and trust.
• Exposure to Learning or HR data domains (e.g., HCM, HRDP, Finance, Skills/Learning datasets), including familiarity with soft‑skilling, competency models, or employee capability frameworks.
• Experience or working knowledge of data architecture concepts (lakehouse, domain‑driven design, data contracts, schema governance).
• A strategic thinker who can link data foundations to business impact and AI‑driven outcomes, with strong prioritization and cross‑functional influence.