

AIOps Engineer
Featured Role | Apply direct with Data Freelance Hub
This role is for an AIOps Engineer with 10 years of experience, offering a hybrid contract in Frisco, TX. Key skills include AI/ML frameworks, monitoring tools, and automation scripting. A background in DevOps/SRE is preferred. Pay rate is unspecified.
Country: United States
Currency: $ USD
Day rate: Unspecified
Date discovered: June 18, 2025
Project duration: Unknown
Location type: Hybrid
Contract type: Unknown
Security clearance: Unknown
Location detailed: Frisco, TX
Skills detailed: #Scala #Scripting #Cloud #Automation #Anomaly Detection #Prometheus #Docker #TensorFlow #Visualization #Logging #AI (Artificial Intelligence) #Ansible #ML (Machine Learning) #Kubernetes #PyTorch #Grafana #Bash #DevOps #Monitoring #Data Processing #Observability #Python #Splunk
Role description
Role: AIOps Engineer
Type: Contract
Location: Frisco, TX (Hybrid)
Experience: 10 years
Job Overview: The AIOps Engineer is responsible for integrating machine learning and advanced analytics into our existing monitoring and logging systems. This role will leverage artificial intelligence to automate routine operational tasks, detect anomalies proactively, and implement self-healing frameworks to enhance the stability and performance of our infrastructure. The ideal candidate will be proactive in identifying gaps, creating strategic roadmaps, and implementing phased improvements to achieve operational excellence.
Key Responsibilities:
Apply machine learning algorithms to existing operational data (logs, metrics, events) to predict system failures and proactively address potential incidents.
Implement automation for routine DevOps tasks, including automated scaling, resource optimization, and controlled restarts.
Develop and maintain self-healing systems to reduce manual intervention and enhance system reliability.
Build anomaly detection models to quickly identify and address unusual operational patterns.
Collaborate closely with SREs, developers, and infrastructure teams to continuously enhance the operational stability and performance of the system.
Deliver insights and recommend improvements through AI-driven analytics, visualizations, and reports.
Create a phased roadmap to incrementally enhance operational capabilities and align with strategic business goals.
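As a rough illustration of the anomaly-detection and controlled-restart responsibilities above, here is a minimal Python sketch. It assumes a single numeric metric stream (for example, request latency in milliseconds) and uses a rolling z-score, a simple statistical baseline rather than a trained ML model. The class names, thresholds, and the `restart_fn` hook are hypothetical and are not part of Splunk, Prometheus, or any other tool named in this posting.

```python
from collections import deque
from statistics import mean, stdev
from typing import Callable


class RollingZScoreDetector:
    """Flags a sample as anomalous when it sits more than `threshold`
    sample standard deviations away from the mean of the recent window.
    Hypothetical helper for illustration only."""

    def __init__(self, window: int = 30, threshold: float = 3.0) -> None:
        self.window: deque = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        is_anomaly = False
        # A sample standard deviation needs at least two baseline points.
        # Note: a perfectly flat baseline (sigma == 0) never flags here.
        if len(self.window) >= 2:
            mu = mean(self.window)
            sigma = stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Keep anomalous points out of the baseline so a sustained spike
        # is still judged against "normal" behaviour.
        if not is_anomaly:
            self.window.append(value)
        return is_anomaly


class ControlledRestarter:
    """Hypothetical self-healing policy: restart after `consecutive`
    anomalies in a row, but at most `max_restarts` times before
    escalating to a human."""

    def __init__(self, restart_fn: Callable[[], None],
                 consecutive: int = 3, max_restarts: int = 2) -> None:
        self.restart_fn = restart_fn
        self.consecutive = consecutive
        self.max_restarts = max_restarts
        self.streak = 0
        self.restarts = 0

    def handle(self, anomalous: bool) -> str:
        if not anomalous:
            self.streak = 0
            return "ok"
        self.streak += 1
        if self.streak < self.consecutive:
            return "watch"  # not yet persistent enough to act on
        self.streak = 0
        if self.restarts >= self.max_restarts:
            return "escalate"  # restart budget exhausted
        self.restarts += 1
        self.restart_fn()
        return "restarted"


if __name__ == "__main__":
    detector = RollingZScoreDetector(window=30, threshold=3.0)
    restarter = ControlledRestarter(lambda: print("restart hook called (hypothetical)"))
    latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 500, 510, 495]
    for sample in latencies_ms:
        print(sample, restarter.handle(detector.observe(sample)))
```

The restart budget is a deliberate design choice: it lets transient faults self-heal while handing persistent ones to an on-call engineer instead of restarting indefinitely.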
Required Skills and Qualifications:
Strong experience with AI/ML frameworks and tools (e.g., TensorFlow, PyTorch, scikit-learn).
Proficiency in data processing and analytics tools (e.g., Splunk, Prometheus, Grafana, ELK stack).
Solid background in scripting and automation (Python, Bash, Ansible, etc.).
Experience with cloud environments and infrastructure automation.
Proven track record in implementing proactive monitoring, anomaly detection, and self-healing techniques.
Excellent analytical, problem-solving, and strategic planning skills.
Strong communication skills and the ability to effectively collaborate across teams.
Preferred Experience:
Background in DevOps/Site Reliability Engineering.
Familiarity with containerization and orchestration platforms (Kubernetes, Docker).
Experience in building scalable, distributed systems.
This role is pivotal in enabling our organization to achieve and sustain operational excellence through intelligent automation and proactive monitoring practices.
Short summary: An experienced SRE who knows how to apply AI/ML to operations.
Mandatory Skills (only 3-4):
Machine Learning & AI Frameworks (e.g., TensorFlow, PyTorch, scikit-learn)
Monitoring & Observability Tools (e.g., Splunk, Prometheus, Grafana, ELK Stack)
Automation & Scripting (e.g., Python, Bash, Ansible)
DevOps / SRE Background