

Senior SRE Engineer
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior SRE Engineer on a contract-to-perm basis in Hartford, CT (Hybrid). Requires 5+ years of SRE/DevOps experience, expertise in Grafana, Prometheus, GCP, and strong skills in Python, Kubernetes, and API management.
Country
United States
Currency
$ USD
Day rate
-
Date discovered
September 24, 2025
Project duration
Unknown
Location type
Hybrid
Contract type
Unknown
Security clearance
Unknown
Location detailed
Hartford, CT
Skills detailed
#Prometheus #AI (Artificial Intelligence) #Monitoring #DevOps #Cloud #Logging #YAML (YAML Ain't Markup Language) #Docker #Splunk #Python #GIT #JSON (JavaScript Object Notation) #Automation #Security #Observability #Java #Grafana #Kubernetes #API (Application Programming Interface) #Linux #Deployment #Bash #GCP (Google Cloud Platform)
Role description
Hartford, CT (Hybrid)
Contract to perm role
MUST HAVE: Grafana, Prometheus, cloud experience (GCP desired), logging, tracing, web portal metrics
Looking for a senior-level engineer who can grow into a lead role as the team expands
Job Description
• Design and implement comprehensive SRE monitoring for the web portal on GCP
• Set up JVM metrics collection and performance monitoring for Java applications using GCP Monitoring
• Implement logging and tracing standards across all portal components using Cloud Logging and Cloud Trace
• Configure Apigee monitoring and API performance tracking for portal services
• Implement distributed tracing with W3C Trace Context headers and OpenTelemetry
• Create drill-down dashboards that correlate metrics, logs, and traces using GCP tools
• Integrate GCP Monitoring, Logging, and Trace with the existing Prometheus/Grafana stack
• Configure GMP (Google Managed Prometheus) for enhanced metrics collection
• Implement zero-code UI instrumentation for frontend monitoring and traceability
• Create RED (Rate, Errors, Duration) dashboards for the Performance and Production environments
• Build service health dashboards with drill-down capabilities and error message analysis
• Develop and maintain SRE automation/scripts within GKE namespaces (SRE and others) for monitoring, deployment, and troubleshooting
Experience: 5+ years in SRE/DevOps, with proven implementation experience in JVM monitoring, Apigee, GCP observability, the Grafana stack, GKE, OpenTelemetry, and UI instrumentation
Clear Skills Needed
• Technical: Python, Linux, Prometheus, Grafana, Kubernetes, Docker, Loki, Tempo
• JVM Metrics: Java application monitoring, JVM performance tuning, heap analysis, garbage collection optimization for portal applications
• Logging & Tracing: Splunk, distributed tracing, log aggregation standards, correlation IDs across portal systems
• API Management: Apigee experience, API monitoring, rate limiting, security, performance tracking for portal APIs
• Infrastructure: CI/CD pipelines; AI tools such as GitHub Copilot and Cursor
• Observability Tools & Query Languages: PromQL and InfluxQL for querying metrics (Grafana)
• Strong experience with Kubernetes (GKE), including namespace management, RBAC, and deploying/maintaining SRE tools via code (Python, Bash, YAML, Helm)
Additional Critical Skills
• Distributed Tracing Standards: W3C Trace Context headers implementation
• Structured Logging: JSON format with specific fields (trace_id, service.name, log.level, customer.id, request.id)
• Performance Baseline Establishment: Ability to collect and analyze 2-4 weeks of historical data for performance baselines
• Dashboard Implementation: Drill-down capabilities, service mapping from trace data, correlation between metrics/logs/traces
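To make the tracing and structured-logging items above concrete, here is a minimal sketch of what they might look like in practice. It parses a W3C Trace Context `traceparent` header and emits a JSON log line carrying the fields named in this posting. The function names (`parse_traceparent`, `build_log_line`) and the extra `message` field are hypothetical and chosen only for illustration; the posting does not specify an implementation, and a real service would more likely rely on an OpenTelemetry SDK than hand-rolled parsing.

```python
import json
import re

# W3C Trace Context "traceparent" (version 00): version-traceid-parentid-flags,
# all lowercase hex. This sketch only accepts the four-field layout.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return (trace_id, parent_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # lowest flag bit means "sampled"
    return trace_id, parent_id, sampled


def build_log_line(message, level, service_name, traceparent, customer_id, request_id):
    """Build one JSON log line with the fields listed in the posting.

    The "message" key is an assumption added for illustration.
    """
    parsed = parse_traceparent(traceparent)
    record = {
        "message": message,
        "log.level": level,
        "service.name": service_name,
        "trace_id": parsed[0] if parsed else None,
        "customer.id": customer_id,
        "request.id": request_id,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(build_log_line("login ok", "INFO", "portal-ui", header, "c-123", "r-456"))
```

Carrying the same `trace_id` in every log line is what lets a dashboard jump from a log entry to the matching trace, which is the metrics/logs/traces correlation the role asks for.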
GCP-Specific Observability Skills (CRITICAL)
• Google Cloud Monitoring: GMP (Google Managed Prometheus), Cloud Monitoring dashboards, alerting policies
• Google Cloud Logging: Centralized logging, log-based metrics, log exports
• OpenTelemetry (OTEL): Instrumentation, collectors, data collection from GCP services
UI Instrumentation & Frontend Monitoring (CRITICAL)
• UI Span Management: Naming conventions for UI-initiated spans, W3C Trace Context headers for frontend
• Frontend Observability: User session tracking, component-level monitoring, UI performance metrics
• Cross-Platform Tracing: End-to-end traceability from UI to backend services