

Intelance
Data Engineer (OCR & Data Pipelines, Contract)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer (OCR & Data Pipelines) on a contract basis (2-3 days/week) with a competitive pay rate. Key requirements include 3-5+ years of Data Engineering experience, strong Python skills, and practical OCR experience, preferably in healthcare or regulated environments. Remote work is available.
Country: United Kingdom
Currency: £ GBP
Day rate: 750
Date: November 22, 2025
Duration: Unknown
Location: Remote
Contract: Unknown
Security: Unknown
Location detailed: United Kingdom
Skills detailed: #Data Processing #Databricks #Monitoring #Metadata #Azure #AI (Artificial Intelligence) #Lean #ADF (Azure Data Factory) #REST (Representational State Transfer) #Data Engineering #Security #Data Pipeline #AWS (Amazon Web Services) #API (Application Programming Interface) #JSON (JavaScript Object Notation) #Data Quality #Azure Data Factory #ML (Machine Learning) #Logging #Python #Storage #Cloud #Batch #GCP (Google Cloud Platform) #ETL (Extract, Transform, Load)
Role description
Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing.
We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract / freelance role (2-3 days/week) working closely with our AI Solution Architect, Lead ML Engineer, and Integration Engineer.
Tasks
• Design and implement the end-to-end data pipeline for the project:
– Ingest PDF/Word reports from secure storage
– Run OCR / text extraction and layout parsing
– Normalise, structure, and validate the data
– Store outputs in a form ready for ML and integration.
• Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar), and wrap them in robust, retry-safe, cost-aware scripts/services (a minimal sketch follows this list).
• Define and implement data contracts and schemas between ingestion, ML, and integration components (JSON/Parquet/relational as appropriate).
• Build quality checks and validation rules (field presence, format, range checks, duplicate detection, basic anomaly checks); see the second sketch after this list.
• Implement logging, monitoring, and lineage so every processed document can be traced from source > OCR > structured output > model input.
• Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation, and explainability.
• Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client's assessment system (API, CSV exports, or SFTP drop-zone).
• Follow good security and privacy practices in all pipelines: encryption, access control, least privilege, and redaction where needed.
• Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
• Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.
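A minimal sketch of the retry-safe OCR wrapping described above. The call_ocr_service function is a hypothetical placeholder for whichever document AI client is chosen (Azure Form Recognizer, AWS Textract, or similar); only the retry/backoff, audit-logging, and lineage-friendly output structure is the point.

import json
import logging
import time
from pathlib import Path

logger = logging.getLogger("ocr_pipeline")

def call_ocr_service(pdf_bytes: bytes) -> dict:
    # Hypothetical placeholder: wire up the real OCR / document AI client here.
    raise NotImplementedError

def ocr_with_retries(pdf_path: Path, max_attempts: int = 3, base_delay: float = 2.0) -> dict:
    # Run OCR on one document with exponential backoff, so a paid API is not hammered on transient failures.
    pdf_bytes = pdf_path.read_bytes()
    for attempt in range(1, max_attempts + 1):
        try:
            result = call_ocr_service(pdf_bytes)
            logger.info("OCR ok: %s (attempt %d)", pdf_path.name, attempt)
            return result
        except Exception as exc:  # in practice, narrow this to the service's transient/throttling errors
            logger.warning("OCR failed: %s (attempt %d): %s", pdf_path.name, attempt, exc)
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

def persist_output(result: dict, out_dir: Path, source: Path) -> Path:
    # Store structured output alongside a lineage record pointing back at the source file.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{source.stem}.json"
    out_path.write_text(json.dumps({"source": str(source), "ocr": result}, indent=2))
    return out_path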
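And a second minimal sketch of a data contract plus field-level quality checks (presence, range, duplicate detection). The record fields sample_id, report_date, and score are illustrative assumptions, not the client's actual schema.

from dataclasses import dataclass
from datetime import date

@dataclass
class ReportRecord:
    # Illustrative data contract for one extracted report; field names are assumptions.
    sample_id: str
    report_date: date
    score: float

def validate(record: ReportRecord) -> list[str]:
    # Return a list of data-quality problems; an empty list means the record passes.
    problems: list[str] = []
    if not record.sample_id.strip():
        problems.append("sample_id is missing")                    # presence check
    if record.report_date > date.today():
        problems.append("report_date is in the future")            # range check
    if not 0.0 <= record.score <= 100.0:
        problems.append(f"score {record.score} is outside 0-100")  # range check
    return problems

def find_duplicates(records: list[ReportRecord]) -> set[str]:
    # Flag sample_ids that appear more than once in a batch.
    seen: set[str] = set()
    dupes: set[str] = set()
    for r in records:
        (dupes if r.sample_id in seen else seen).add(r.sample_id)
    return dupes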
Requirements
Must-have
• 3-5+ years of hands-on Data Engineering experience.
• Strong Python skills, including building and packaging data processing scripts or services.
• Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI, or equivalent).
• Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you're comfortable switching).
• Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
• Experience implementing data quality checks, logging, and monitoring for pipelines.
• Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
• Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
• Available for 2-3 days per week on a contract basis, working largely remotely in the UK or close European time zones.
Nice-to-have
• Experience in healthcare, life sciences, diagnostics, or other regulated environments.
• Familiarity with Azure Data Factory, Azure Functions, Databricks, or similar orchestration/compute tools.
• Knowledge of basic MLOps concepts (feature stores, model input/output formats).
• Experience with SFTP-based exchanges and batch integrations with legacy systems.
Benefits
• Core impact role: you own the pipeline that makes the entire AI solution possible; without you, nothing moves.
• Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
• Lean, senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
• Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
• Contract / freelance: competitive day rate, with potential extension into further phases and additional schemes if the pilot is successful.
• Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.
We review every application personally. If there's a good match, we'll invite you to a short call to walk through the project, expectations, and next steps.




