

Synechron
Lead Observability & Analytics Engineer (Datadog)
This role is for a Lead Observability & Analytics Engineer (Datadog). The contract length and pay rate are not specified. Key skills required include Datadog, Splunk, and production support analytics, with 10+ years of relevant experience in enterprise payment or financial platforms.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
May 9, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Charlotte, NC
🧠 - Skills detailed
#DevOps #GitLab #Leadership #Monitoring #ETL (Extract, Transform, Load) #Visualization #Data Analysis #Observability #Splunk #Jira #Strategy #Alation #Consulting #Deployment #Trend Analysis #Data Science #Scala #JQL (Jira Query Language) #Cloud #Datadog #AI (Artificial Intelligence)
Role description
We are
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, serving an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications, and more. Over the last 20+ years, our company has been honored with multiple employer awards, recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 14,500+ and 58 offices in 21 countries within key global markets.
Our challenge
We are seeking a highly experienced Observability & Resiliency Engineer with deep expertise in leveraging Datadog as an operational intelligence and engineering observability platform within high-availability production environments.
This role is not focused on infrastructure administration, Datadog deployment, or cloud provisioning. Instead, we are looking for an engineer who can transform production monitoring data into actionable operational insights, resiliency improvements, executive visibility, and measurable reductions in incident impact.
The ideal candidate will have strong experience building operational dashboards, incident analytics, MTTR reporting, alerting strategies, production support analytics, and service reliability metrics across enterprise payment or financial platforms.
This individual will work closely with Engineering, SRE, DevOps, Production Support, Operations, and Leadership teams to improve operational visibility, accelerate recovery processes, reduce recurring incidents, and strengthen overall production resiliency.
Job responsibilities:
Observability & Monitoring Engineering
• Design and develop meaningful operational dashboards using Datadog and Splunk for engineering teams, operations, and executive leadership.
• Build actionable monitoring and alerting frameworks focused on reducing incident response time and improving operational awareness.
• Develop service health dashboards and resiliency scorecards across payment rails and critical services.
• Define monitoring standards, alert thresholds, escalation logic, and operational visibility best practices.
• Improve signal-to-noise ratio by reducing alert fatigue and improving alert quality.
Production Incident Analytics
• Analyze production incidents, recurring outages, and operational patterns to identify resiliency gaps and reliability risks.
• Create MTTR tracking dashboards and operational KPI reporting.
• Build incident trend analysis dashboards to identify recurring failures by service, payment rail, or platform component.
• Convert monitoring insights into prioritized resiliency and operational improvement initiatives.
• Partner with engineering teams to improve operational recovery workflows using observability data.
Release & CI/CD Observability
• Build release monitoring dashboards and deployment health analytics.
• Generate CI/CD operational metrics and reporting from GitLab and related DevOps tooling.
• Analyze deployment trends, release stability, rollback frequency, and change failure rates.
• Improve release visibility and operational readiness reporting.
Operational Visibility & Executive Reporting
• Create executive-level operational dashboards focused on:
  • Service reliability
  • Incident trends
  • Recovery performance
  • MTTR improvements
  • Operational risk visibility
• Provide operational insights that support leadership decision-making and platform reliability initiatives.
Resiliency & Reliability Engineering
• Build service reliability metrics and operational health indicators.
• Track platform resiliency improvements using monitoring and production analytics.
• Support RCA initiatives through monitoring insights and operational data analysis.
• Improve recovery readiness through actionable observability strategies and operational tooling.
Cross-Functional Collaboration
• Collaborate closely with:
  • Production Support teams
  • SRE teams
  • Engineering teams
  • DevOps teams
  • Operations leadership
• Drive adoption of monitoring best practices and observability standards across teams.
• Participate in operational reviews, resiliency initiatives, and continuous improvement programs.
Technical Experience
• 10+ years of experience in production support, observability engineering, SRE, or resiliency engineering roles.
• Strong hands-on expertise with:
  • Datadog
  • Splunk
  • Operational dashboards
  • Monitoring strategy
  • Alert engineering
• Proven experience creating:
  • MTTR dashboards
  • Incident analytics dashboards
  • Executive operational reporting
  • Service health visualizations
  • Release monitoring dashboards
  • CI/CD metrics reporting
• Strong experience analyzing production incidents and converting findings into operational improvements.
• Experience building service reliability metrics and operational KPIs.
• Strong understanding of:
  • Incident management
  • RCA methodologies
  • Operational resiliency
  • Production stability practices
• Experience with:
  • GitLab
  • JQL (Jira Query Language)
  • CI/CD operational reporting
  • Monitoring data analysis
• Strong analytical and troubleshooting capabilities in enterprise production environments.
We offer:
• A highly competitive compensation and benefits package.
• A multinational organization with 58 offices in 21 countries and the possibility to work abroad.
• 10 days of paid annual leave (plus sick leave and national holidays).
• Maternity & paternity leave plans.
• A comprehensive insurance plan including medical, dental, vision, life insurance, and long-/short-term disability (plans vary by region).
• Retirement savings plans.
• A higher education certification policy.
• Commuter benefits (varies by region).
• Extensive training opportunities, focused on skills, substantive knowledge, and personal development.
• On-demand Udemy for Business for all Synechron employees, with free access to more than 5,000 curated courses.
• Coaching opportunities with experienced colleagues from our Financial Innovation Labs (FinLabs) and Centers of Excellence (CoE) groups.
• Cutting-edge projects at the world’s leading tier-one banks, financial institutions, and insurance firms.
• A flat and approachable organization.
• A truly diverse, fun-loving, and global work culture.
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants from diverse backgrounds to apply, regardless of race, ethnicity, religion, age, marital status, gender, sexual orientation, or disability. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.