

Brooksource
LLM Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an LLM Engineer specializing in geotechnical data digitization, offered as a remote contract for an initial 8-10 weeks (potentially extending up to 1 year). Key skills include fine-tuning multi-modal LLMs, strong Python and ML framework proficiency, and experience with geospatial data.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
November 7, 2025
🕒 - Duration
8 to 10 weeks initially (extension up to 1 year possible)
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#MongoDB #AI (Artificial Intelligence) #ML (Machine Learning) #OpenCV (Open Source Computer Vision Library) #PyTorch #Datasets #Cloud #ETL (Extract, Transform, Load) #Python #Schema Design #JSON (JavaScript Object Notation) #Classification #Normalization #Spatial Data #TensorFlow #Documentation
Role description
LLM Engineer – Geotechnical Data Digitization
Location: Remote
Engagement Type: Contract (Full-Time, 40 hours/week)
Duration: Initial 8-10 weeks, with a strong likelihood of extension into future phases (up to 1 year total)
Role Overview:
We are seeking a highly skilled LLM Engineer to assist in the development of a multi-modal Large Language Model (LLM) pipeline for digitizing geotechnical bore log data. This role is critical to transforming unstructured PDF documents into structured, machine-readable JSON outputs that support downstream analytics, GIS integration, and AI-powered search.
You will work closely with a Project Manager and technical stakeholders at our customer to build, fine-tune, and evaluate a custom LLM solution capable of interpreting complex geotechnical documents across multiple vendors.
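To make the target concrete, here is a minimal sketch of the pipeline shape described above. It assumes pages are rendered with pdf2image (which requires a local poppler install); run_vision_llm, the prompt wording, and the field names are hypothetical placeholders for however the fine-tuned multi-modal model is actually served, not a prescribed implementation.

```python
# Illustrative pipeline shape only. pdf2image needs poppler installed locally,
# and run_vision_llm stands in for the fine-tuned multi-modal model call
# (e.g., a local Ollama/llama.cpp endpoint or a hosted API).
import json
from pdf2image import convert_from_path


def run_vision_llm(page_image, prompt: str) -> str:
    """Placeholder for the fine-tuned multi-modal LLM call; returns raw JSON text."""
    raise NotImplementedError("Wire this to the deployed model endpoint.")


def digitize_bore_log(pdf_path: str) -> list[dict]:
    """Render each PDF page and ask the model for structured bore-log fields."""
    records = []
    for page_image in convert_from_path(pdf_path, dpi=300):
        raw = run_vision_llm(
            page_image,
            prompt="Extract bore log intervals as JSON with depth, soil_class, and notes.",
        )
        records.append(json.loads(raw))  # downstream steps validate against a schema
    return records
```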
Key Responsibilities:
Phase 1 – Pilot Development
• Fine-tune a multi-modal LLM (e.g., Pixtral-12B, PaliGemma, Gemma 3) using annotated bore log PDFs and JSON samples.
• Build preprocessing pipelines for page segmentation, figure isolation, and normalization of units and soil classifications.
• Develop and implement an evaluation framework covering Precision/Recall/F1, domain-specific metrics, and JSON schema conformance (see the sketch after this list).
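A minimal sketch of what the evaluation framework could look like, assuming field-level scoring against annotated ground truth and JSON Schema conformance via the jsonschema package; BORE_LOG_SCHEMA and its field names are illustrative, not the project's actual schema.

```python
# Field-level precision/recall/F1 against annotated ground truth, plus
# JSON Schema conformance checking. Schema and field names are hypothetical.
from jsonschema import Draft7Validator

BORE_LOG_SCHEMA = {
    "type": "object",
    "required": ["depth_top_m", "depth_bottom_m", "soil_class"],
    "properties": {
        "depth_top_m": {"type": "number"},
        "depth_bottom_m": {"type": "number"},
        "soil_class": {"type": "string"},
    },
}


def field_prf1(predicted: dict, truth: dict) -> tuple[float, float, float]:
    """Treat each (field, value) pair as one prediction and score it."""
    pred = set(predicted.items())
    gold = set(truth.items())
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def schema_conformance(predicted: dict) -> bool:
    """True if the extracted record validates against the target schema."""
    return not list(Draft7Validator(BORE_LOG_SCHEMA).iter_errors(predicted))
```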
Cross-Vendor Generalization
• Test model generalization on bore logs from 3 additional vendors.
• Identify and categorize failure cases.
• Compare performance across vendors and recommend strategies for scaling (a comparison sketch follows this list).
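A small sketch of the cross-vendor comparison step, assuming per-document F1 scores have already been computed (for example, with the evaluation sketch above); the acceptance threshold and result format are hypothetical.

```python
# Aggregate per-document F1 by vendor and flag vendors that fall below an
# illustrative acceptance threshold.
from collections import defaultdict
from statistics import mean


def compare_vendors(results: list[dict], threshold: float = 0.85) -> dict[str, dict]:
    """`results` items look like {"vendor": "A", "f1": 0.91}; threshold is illustrative."""
    by_vendor: dict[str, list[float]] = defaultdict(list)
    for row in results:
        by_vendor[row["vendor"]].append(row["f1"])
    return {
        vendor: {"mean_f1": mean(scores), "needs_work": mean(scores) < threshold}
        for vendor, scores in by_vendor.items()
    }
```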
Pipeline Packaging & Handoff
• Package preprocessing scripts, model artifacts, and evaluation dashboards into a reproducible workflow.
• Deliver structured JSON outputs and final benchmark reports.
• Provide all source code and documentation for handoff.
Required Qualifications:
• Proven experience fine-tuning and deploying multi-modal LLMs (e.g., Pixtral, LLaMA, Gemma)
• Experience with Ollama/llama.cpp, MongoDB and other non-relational databases, and AI coding tools (Cursor, Windsurf, GitHub Copilot)
• Experience using open-source (OSS) models
• Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow)
• Experience with OCR, image preprocessing (OpenCV), and document parsing
• Familiarity with geospatial data and JSON schema design
• Ability to work with GPU environments (e.g., A100s) and cloud-based training setups
• Strong understanding of evaluation metrics and model benchmarking
• Excellent communication and documentation skills
Preferred Qualifications (nice to have):
• Experience with geotechnical or engineering datasets
• Familiarity with MongoDB, vector search, and embedding-based retrieval
• Exposure to MLOps practices and CI/CD for ML pipelines
• Prior work in AI document ingestion or enterprise-scale data transformation