

Brooksource
LLM Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an LLM Engineer specializing in geotechnical data digitization, offered as a remote contract for an initial 8-10 weeks (potentially extending up to 1 year). Key skills include fine-tuning multi-modal LLMs, strong Python and ML framework proficiency, and experience with geospatial data.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
November 7, 2025
🕒 - Duration
8 to 10 weeks initially (extension up to 1 year possible)
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#MongoDB #AI (Artificial Intelligence) #ML (Machine Learning) #OpenCV (Open Source Computer Vision Library) #PyTorch #Datasets #Cloud #ETL (Extract, Transform, Load) #Python #Schema Design #JSON (JavaScript Object Notation) #Classification #Normalization #Spatial Data #TensorFlow #Documentation
Role description
LLM Engineer – Geotechnical Data Digitization
Location: Remote
Engagement Type: Contract (Full-Time, 40 hours/week)
Duration: Initial 8-10 weeks, with a strong likelihood of extension into future phases (up to 1 year total)
Role Overview:
We are seeking a highly skilled LLM Engineer to assist in the development of a multi-modal Large Language Model (LLM) pipeline for digitizing geotechnical bore log data. This role is critical to transforming unstructured PDF documents into structured, machine-readable JSON outputs that support downstream analytics, GIS integration, and AI-powered search.
You will work closely with a Project Manager and technical stakeholders at our customer to build, fine-tune, and evaluate a custom LLM solution capable of interpreting complex geotechnical documents across multiple vendors.
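To make the target concrete, here is a minimal sketch of the pipeline shape described above. It assumes pages are rendered with pdf2image (which requires a local poppler install); run_vision_llm, the prompt wording, and the field names are hypothetical placeholders for however the fine-tuned multi-modal model is actually served, not a prescribed implementation.

```python
# Illustrative pipeline shape only. pdf2image needs poppler installed locally,
# and run_vision_llm stands in for the fine-tuned multi-modal model call
# (e.g., a local Ollama/llama.cpp endpoint or a hosted API).
import json
from pdf2image import convert_from_path


def run_vision_llm(page_image, prompt: str) -> str:
    """Placeholder for the fine-tuned multi-modal LLM call; returns raw JSON text."""
    raise NotImplementedError("Wire this to the deployed model endpoint.")


def digitize_bore_log(pdf_path: str) -> list[dict]:
    """Render each PDF page and ask the model for structured bore-log fields."""
    records = []
    for page_image in convert_from_path(pdf_path, dpi=300):
        raw = run_vision_llm(
            page_image,
            prompt="Extract bore log intervals as JSON with depth, soil_class, and notes.",
        )
        records.append(json.loads(raw))  # downstream steps validate against a schema
    return records
```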
Key Responsibilities:
Phase 1 – Pilot Development
• Fine-tune a multi-modal LLM (e.g., Pixtral-12B, PaliGemma, Gemma 3) using annotated bore log PDFs and JSON samples.
• Build preprocessing pipelines for page segmentation, figure isolation, and normalization of units and soil classifications.
• Develop and implement an evaluation framework covering Precision/Recall/F1, domain-specific metrics, and JSON schema conformance (see the sketch after this list).
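A minimal sketch of what the evaluation framework could look like, assuming field-level scoring against annotated ground truth and JSON Schema conformance via the jsonschema package; BORE_LOG_SCHEMA and its field names are illustrative, not the project's actual schema.

```python
# Field-level precision/recall/F1 against annotated ground truth, plus
# JSON Schema conformance checking. Schema and field names are hypothetical.
from jsonschema import Draft7Validator

BORE_LOG_SCHEMA = {
    "type": "object",
    "required": ["depth_top_m", "depth_bottom_m", "soil_class"],
    "properties": {
        "depth_top_m": {"type": "number"},
        "depth_bottom_m": {"type": "number"},
        "soil_class": {"type": "string"},
    },
}


def field_prf1(predicted: dict, truth: dict) -> tuple[float, float, float]:
    """Treat each (field, value) pair as one prediction and score it."""
    pred = set(predicted.items())
    gold = set(truth.items())
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def schema_conformance(predicted: dict) -> bool:
    """True if the extracted record validates against the target schema."""
    return not list(Draft7Validator(BORE_LOG_SCHEMA).iter_errors(predicted))
```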
Cross-Vendor Generalization
• Test model generalization on bore logs from 3 additional vendors.
• Identify and categorize failure cases.
• Compare performance across vendors and recommend strategies for scaling (a comparison sketch follows this list).
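A small sketch of the cross-vendor comparison step, assuming per-document F1 scores have already been computed (for example, with the evaluation sketch above); the acceptance threshold and result format are hypothetical.

```python
# Aggregate per-document F1 by vendor and flag vendors that fall below an
# illustrative acceptance threshold.
from collections import defaultdict
from statistics import mean


def compare_vendors(results: list[dict], threshold: float = 0.85) -> dict[str, dict]:
    """`results` items look like {"vendor": "A", "f1": 0.91}; threshold is illustrative."""
    by_vendor: dict[str, list[float]] = defaultdict(list)
    for row in results:
        by_vendor[row["vendor"]].append(row["f1"])
    return {
        vendor: {"mean_f1": mean(scores), "needs_work": mean(scores) < threshold}
        for vendor, scores in by_vendor.items()
    }
```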
Pipeline Packaging & Handoff
• Package preprocessing scripts, model artifacts, and evaluation dashboards into a reproducible workflow.
• Deliver structured JSON outputs and final benchmark reports.
• Provide all source code and documentation for handoff.
Required Qualifications:
• Proven experience fine-tuning and deploying multi-modal LLMs (e.g., Pixtral, LLaMA, Gemma)
• Experience with Ollama/llama.cpp, MongoDB and other non-relational databases, and AI coding tools (Cursor, Windsurf, GitHub Copilot)
• Experience using open-source (OSS) models
• Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow)
• Experience with OCR, image preprocessing (OpenCV), and document parsing
• Familiarity with geospatial data and JSON schema design
• Ability to work with GPU environments (e.g., A100s) and cloud-based training setups
• Strong understanding of evaluation metrics and model benchmarking
• Excellent communication and documentation skills
Preferred Qualifications (nice to have):
• Experience with geotechnical or engineering datasets
• Familiarity with MongoDB, vector search, and embedding-based retrieval
• Exposure to MLOps practices and CI/CD for ML pipelines
• Prior work in AI document ingestion or enterprise-scale data transformation