

NineTech
Artificial Intelligence Engineer
This is a 6-month remote contract for an Artificial Intelligence Engineer (AI Platform Engineer) at a day rate of £640. It requires 10 years of experience in technical support, strong knowledge of L1/L2 support processes, and familiarity with AI/ML workflows and cloud platforms.
🌎 - Country
United Kingdom
💱 - Currency
£ GBP
💰 - Day rate
640
🗓️ - Date
November 5, 2025
🕒 - Duration
More than 6 months
🏝️ - Location
Remote
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United Kingdom
🧠 - Skills detailed
#ML (Machine Learning) #Automation #AI (Artificial Intelligence) #Ansible #Scala #Data Science #Linux #Splunk #Data Pipeline #Logging #Security #Documentation #Grafana #Monitoring #Alation #Observability #Kubernetes #Base #Cloud
Role description
Role: AI Platform Engineer (6-Month Contract)
Location: Remote
Contract Type: 6-Month Contract
Overview
We are seeking an experienced AI Platform Engineer to provide Level 1 and Level 2 operational support for our enterprise AI platform. This customer-facing role involves technical troubleshooting, proactive platform management, and close collaboration with vendor engineering teams to ensure seamless and reliable AI platform operations.
Key Responsibilities
Operational Support
• Provide L1 support for customer-reported issues and service requests.
• Deliver L2 troubleshooting by diagnosing, replicating, and resolving issues across platform components and underlying infrastructure.
• Coordinate L3 escalations with vendor product and engineering teams, tracking responses and resolutions.
• Monitor system health, alerts, and customer usage patterns to proactively identify potential issues.
• Maintain detailed documentation, knowledge base articles, and support procedures.
• Automate recurring operational tasks and fixes to improve efficiency.
• Support tooling integration and configuration to enhance monitoring, reporting, and performance.
• Assist customers with onboarding, configuration, and platform best practices.
• Collaborate with infrastructure, platform, and application teams to resolve integration and interoperability issues.
• Ensure adherence to SLAs, uptime targets, and customer satisfaction goals.
• Provide reporting on platform usage, workflows, and billing insights for stakeholders.
Technical Responsibilities
• Cluster Infrastructure Management: Administer and support GPU cluster infrastructure.
• High Availability & Resilience: Implement failover and redundancy strategies to ensure minimal downtime.
• Resource Optimization: Manage GPU resource partitioning, workload scheduling, and capacity planning.
• Performance Monitoring: Utilize HPE tools for real-time monitoring, diagnostics, and tuning.
• Incident Response: Address node failures, driver issues, and networking incidents; escalate to vendors when needed.
• Security & Access Control: Manage RBAC, user permissions, platform hardening, and data protection measures.
Required Skills & Experience
• 10 years of experience in technical support, systems engineering, or platform operations.
• Strong knowledge of L1/L2 support processes, including ticketing, escalation, and troubleshooting workflows.
• Familiarity with cloud-based platforms, APIs, and distributed systems.
• Understanding of AI/ML workflows (model training, inference, data pipelines).
• Experience with monitoring and logging tools (e.g., Grafana, Kibana, Splunk).
• Excellent communication and customer engagement skills.
• Working knowledge of ML engineering and data science toolchains to optimize user experience.
Core Technical Skills
• System Administration: RHEL/CentOS, Ubuntu, Linux kernel tuning.
• Automation & Orchestration: Ansible, Kubernetes, container management.
• GPU & AI Tooling
• Automation, Monitoring & Security: Experience delivering GPU-as-a-Service with appropriate observability and controls.