Site Reliability Engineer – SRE

Remote, USA Full-time
Job Description:
• Serve as first responder for production incidents during U.S. operating hours (±2h EST).
• Lead triage during outages, analyzing logs, metrics, and traces to identify root causes.
• Drive incident postmortems and follow-ups to prevent recurrence.
• Communicate clearly and quickly during incidents to internal stakeholders.
• Own reliability outcomes across all OpenFX systems, with a focus on uptime, latency, and error budgets.
• Enhance observability through logging, metrics, alerting, and dashboards.
• Optimize on-call processes and ensure smooth handoffs across IST, EST, and PST coverage.
• Partner with DevOps and engineering pods to implement fixes or approve production changes.
• Proactively identify systemic reliability risks and propose improvements.
• Contribute automation and tooling to reduce manual incident handling.
• Champion best practices in reliability engineering and operational excellence.

Requirements:
• 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
• Proven experience leading incident response, running postmortems, and communicating during outages.
• Strong background with cloud infrastructure (AWS preferred), container orchestration (Kubernetes, ECS), and Infrastructure-as-Code (Terraform, CloudFormation).
• Familiarity with observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
• Ability to triage errors at both the infrastructure and application level, and escalate effectively when deeper intervention is required.
• Ownership mindset with strong communication skills in high-pressure situations.

Benefits:
• Competitive salary and benefits package.
• Equity in a rapidly growing company.
• Opportunity to work on mission-critical infrastructure in fintech.
• A collaborative team culture with a bias toward ownership and outcomes.
• The chance to make a direct impact on the resilience of global financial infrastructure.