

Senior Machine Learning Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Machine Learning Engineer on a 6–12 month contract, offering $225 - $275/hour. Key skills include Python, ML systems, and data pipelines. Experience with tokenization algorithms and multilingual tokenization challenges is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Hourly rate
225 - 275
🗓️ - Date discovered
September 5, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
Unknown
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Data Pipeline #Python #Documentation #ML (Machine Learning) #Data Processing #Monitoring #ETL (Extract, Transform, Load) #Debugging #AI (Artificial Intelligence)
Role description
We are looking for an experienced Machine Learning Systems Engineer to join our Encodings and Tokenization efforts. This cross-functional role focuses on developing and optimizing encoding and tokenization systems that support large-scale model training workflows. Acting as a bridge between pretraining and finetuning pipelines, you will build and refine critical infrastructure that directly influences how models process, learn from, and interpret data. Your contributions will be foundational to advancing research, improving efficiency, and ensuring that AI systems remain reliable, interpretable, and adaptable.
Key Responsibilities
• Design, develop, and maintain tokenization systems for large-scale model training workflows.
• Optimize encoding techniques to enhance training efficiency and model performance.
• Collaborate with research teams to understand evolving requirements for data representation.
• Build infrastructure that supports experimentation with novel tokenization approaches.
• Implement monitoring and debugging tools for tokenization-related issues in training pipelines.
• Develop robust testing frameworks for tokenization across diverse data types and languages.
• Identify and resolve bottlenecks in data processing pipelines.
• Produce clear documentation and communicate technical decisions across teams.
Qualifications
• Strong software engineering background with applied machine learning experience.
• Proficiency in Python and familiarity with modern ML development practices.
• Experience with ML systems, data pipelines, or ML infrastructure.
• Strong analytical skills with the ability to evaluate the impact of engineering changes.
• Comfortable working in dynamic, research-driven environments.
• Ability to work independently while contributing effectively in cross-functional teams.
• Results-driven, with flexibility to adapt to evolving priorities.
Preferred Experience
• Experience with machine learning data processing pipelines.
• Knowledge of tokenization algorithms (BPE, WordPiece, etc.).
• Performance optimization of ML data systems.
• Handling multilingual tokenization challenges.
• Familiarity with transformer-based architectures and LLM workflows (not required).
• Exposure to distributed systems and parallel computing for ML training.
• Experience in research settings where engineering enables scientific progress.
Contract Details
• Type: Temporary/Contract (6–12 months, possibility of extension)
• Engagement: Full-time
• Compensation: $225 - $275 / hour