Intelance

Data Engineer (OCR & Data Pipelines, Contract)

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer (OCR & Data Pipelines) on a contract basis (2-3 days/week) with a competitive pay rate. Key requirements include 3-5+ years of Data Engineering experience, strong Python skills, and practical OCR experience, preferably in healthcare or regulated environments. Remote work is available.
🌎 - Country
United Kingdom
💱 - Currency
Β£ GBP
-
💰 - Day rate
750
-
🗓️ - Date
November 22, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United Kingdom
-
🧠 - Skills detailed
#Data Processing #Databricks #Monitoring #Metadata #Azure #AI (Artificial Intelligence) #Lean #ADF (Azure Data Factory) #REST (Representational State Transfer) #Data Engineering #Security #Data Pipeline #AWS (Amazon Web Services) #API (Application Programming Interface) #JSON (JavaScript Object Notation) #Data Quality #Azure Data Factory #ML (Machine Learning) #Logging #Python #Storage #Cloud #Batch #GCP (Google Cloud Platform) #ETL (Extract, Transform, Load)
Role description
Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing. We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract/freelance role (2-3 days/week) working closely with our AI Solution Architect, Lead ML Engineer, and Integration Engineer.
Tasks
• Design and implement the end-to-end data pipeline for the project:
○ Ingest PDF/Word reports from secure storage
○ Run OCR / text extraction and layout parsing
○ Normalise, structure, and validate the data
○ Store outputs in a form ready for ML and integration.
• Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar), and wrap them in robust, retry-safe, cost-aware scripts/services (an illustrative sketch follows at the end of this description).
• Define and implement data contracts and schemas between ingestion, ML, and integration components (JSON/Parquet/relational as appropriate).
• Build quality checks and validation rules (field presence, format, range checks, duplicate detection, basic anomaly checks).
• Implement logging, monitoring, and lineage so every processed document can be traced from source > OCR > structured output > model input.
• Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation, and explainability.
• Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client's assessment system (API, CSV exports, or SFTP drop-zone).
• Follow good security and privacy practices in all pipelines: encryption, access control, least privilege, and redaction where needed.
• Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
• Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.
Requirements
Must-have
• 3-5+ years of hands-on Data Engineering experience.
• Strong Python skills, including building and packaging data processing scripts or services.
• Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI, or equivalent).
• Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you're comfortable switching).
• Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
• Experience implementing data quality checks, logging, and monitoring for pipelines.
• Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
• Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
• Available for 2-3 days per week on a contract basis, working largely remotely in the UK or nearby European time zones.
Nice-to-have
• Experience in healthcare, life sciences, diagnostics, or other regulated environments.
• Familiarity with Azure Data Factory, Azure Functions, Databricks, or similar orchestration/compute tools.
• Knowledge of basic MLOps concepts (feature stores, model input/output formats).
• Experience with SFTP-based exchanges and batch integrations with legacy systems.
Benefits
• Core impact role: you own the pipeline that makes the entire AI solution possible – without you, nothing moves.
• Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
• Lean, senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
• Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
• Contract / freelance: competitive day rate, with potential extension into further phases and additional schemes if the pilot is successful.
• Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.
We review every application personally. If there's a good match, we'll invite you to a short call to walk through the project, expectations, and next steps.
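For illustration only (not part of the client's codebase): a minimal Python sketch of the kind of retry-safe OCR wrapping and basic data-quality checks mentioned in the Tasks section. All names here (run_ocr_with_retries, validate_record, fake_ocr) are hypothetical; a real implementation would call the chosen OCR service's own SDK and log lineage metadata alongside each document.

```python
# Illustrative sketch only: hypothetical names, not an actual Intelance implementation.
# The OCR call is abstracted behind `ocr_fn` so any service (Tesseract, Azure
# Form Recognizer, AWS Textract, ...) can be plugged in via its own client.
import time
from typing import Any, Callable, Dict, List


def run_ocr_with_retries(ocr_fn: Callable[[bytes], Dict[str, Any]],
                         document: bytes,
                         max_attempts: int = 3,
                         backoff_seconds: float = 2.0) -> Dict[str, Any]:
    """Call an OCR function with exponential backoff so transient failures
    (timeouts, throttling) do not silently drop a document."""
    for attempt in range(1, max_attempts + 1):
        try:
            return ocr_fn(document)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))


def validate_record(record: Dict[str, Any],
                    required_fields: List[str]) -> List[str]:
    """Return a list of data-quality issues; an empty list means the record
    passed the basic field-presence checks."""
    issues = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            issues.append(f"missing field: {field}")
    return issues


if __name__ == "__main__":
    # Stub OCR function standing in for a real service client.
    def fake_ocr(doc: bytes) -> Dict[str, Any]:
        return {"patient_id": "ABC123", "report_date": "2025-11-22", "result": ""}

    parsed = run_ocr_with_retries(fake_ocr, b"%PDF-...")
    problems = validate_record(parsed, ["patient_id", "report_date", "result"])
    print(problems)  # -> ['missing field: result']
```

The wrapper is deliberately service-agnostic, so the same retry and validation logic can sit in front of whichever OCR or document AI client the project settles on.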