Site Reliability Engineer - Telecommute
Doctor On Demand
 Chicago, IL

About Doctor On Demand

Doctor On Demand's mission is to improve the world's health through compassionate care and innovation. We believe that health is personal, and means so much more than treating illness. We're proud of the care we've provided over the years and the relationships we've developed with our patients, as evidenced by the 5-star reviews we continually receive. People use our service to gain access to some of the best physicians and licensed therapists in the country, all whenever and wherever is most convenient. It's as simple as opening the Doctor On Demand app on a smartphone or computer.

Through live video visits, our hand-picked, US-trained doctors take patient history, perform an exam, and recommend a treatment plan. Prescriptions, if needed, go directly to the pharmacy of choice. While insurance isn't required, tens of millions of Americans enjoy covered medical and mental health visits through employer and health plan partnerships. To learn more about the hundreds of medical issues we treat, visit us at

About the Role

Site Reliability Engineers are hybrid systems and software engineers who take ownership of managing our distributed systems while improving reliability and automation. At Doctor On Demand, the SRE team works closely with the broader engineering organization to implement tools and practices that make our systems more reliable and our deployments faster.


  • You will build tooling that lets us deploy faster and roll back reliably
  • You will improve monitoring and alerting to shorten mean time to response and resolution
  • You will containerize legacy applications and migrate those services to Kubernetes
  • You will write code and design services to improve the reliability of our applications
  • You will troubleshoot, mitigate, investigate, and resolve production issues and their underlying causes


  • You have a proven track record of supporting production systems and can coordinate incident response effectively
  • You have an expert understanding of Linux systems and services
  • You have advanced practical experience with AWS or GCP
  • You have written production code in a high-level language
  • You have a strong sense of ownership and drive, and can prioritize and work on tasks independently
  • You have excellent written communication, interpersonal communication, and documentation skills

Bonus Points

  • You have production Kubernetes experience
  • You have advanced knowledge of Python or Go
  • You have production experience with GCP, Prometheus, Splunk, RabbitMQ, Redis, and/or PostgreSQL
  • You have a history of working in a HIPAA-compliant environment
  • You are familiar with Terraform and/or Helm
  • You are familiar with Infrastructure as Code and GitOps


  • Be a core leading member of a small, elite engineering team
  • Be part of a startup that is gaining national recognition and actually making a difference
  • Fluid work hours, fun, fast-paced environment
  • Full benefits
    • competitive compensation
  • Unlimited PTO; wellness and cell phone allowance