
Site Reliability Engineer-IT Engineer


Requisition Number: 85827 

As a Site Reliability Engineer (SRE), you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You will join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you will take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you will focus on running better production applications and systems.



  • Design, code, test, and deliver software to automate manual operational work
  • Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents
  • Engage with Development and DevOps teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
  • Identify application patterns and analytics in support of better service level objectives
  • Design self-healing and resiliency patterns
  • Design automated software and product upgrades, change management, and release management solutions
  • Coach or manage teams as applicable
  • Participate in the 24x7 support coverage as needed



  • Bachelor's degree or equivalent experience in a software engineering discipline
  • Expertise in designing, coding, testing, and delivering software in at least one technology stack (Java stack, Kubernetes, microservices, Kafka, Azure)
  • Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
  • Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Excellent debugging and troubleshooting skills
  • Software engineering background with a focus on systems engineering
  • Experience with SRE tools such as ELK, Akamai CDN/mPulse, Prometheus, and Grafana
  • Scripting and configuration management - Ansible, Puppet, Chef
  • Strong Linux (RHEL) administration skills



What You Bring to the Team

  • Extremely high attention to detail and organization, with the passion and ability to create order out of disorder efficiently and to a high standard.
  • A desire to automate everything. Whether that be infrastructure as code or tooling to eliminate toil, automation should be a core focus of your mindset and the elimination of repetitive tasks should be a constant desire in the role.
  • Natural curiosity. You are not satisfied with something simply working; you want to know why it works and how it works.
  • A mindset of total ownership - you are not afraid to dig into things you have never worked on before, from the browser all the way to the persistence layer. You have a solid foundation in debugging and can jump into any problem you are asked to help with.
  • An architectural mind. You understand the fundamentals of distributed computing and look for ways to make systems more resilient and self-healing, eliminating the need for human intervention as much as possible.
  • Extraordinarily effective communication and interpersonal skills that allow you to work well in a team environment and deliver excellent customer service.
  • The ability to convey the importance of site reliability in both business and technical terms to a wide variety of audiences, from non-technical stakeholders to the most technical of engineers. Drive stakeholder buy-in on key metrics such as SLAs/SLOs for all supported systems.
  • Ability to maintain SLOs through the implementation of proactive issue detection and reporting.
  • Experience developing scripts or tools for automating administrative tasks.
  • Prior successful experience as a systems performance or site/systems reliability engineer.
  • Demonstrated experience working in large, complex systems environments.


The estimated salary for this position is $90k to $120k.

The position described above provides a summary of some of the job duties required and what it would be like to work at Insight. For a comprehensive list of physical demands and work environment for this position, click here.


Today, every business is a technology business. Insight Enterprises, Inc. empowers organizations of all sizes with Insight Intelligent Technology Solutions™ and services to maximize the business value of IT. As a Fortune 500-ranked global provider of digital innovation, cloud/data center transformation, connected workforce, and supply chain optimization solutions and services, we help clients successfully manage their IT today while transforming for tomorrow. From IT strategy and design to implementation and management, our 11,000 teammates help clients innovate and optimize their operations to run smarter. Discover more at 

  • Founded in 1988 in Tempe, Arizona
  • 11,000+ teammates in 19 countries providing Insight Intelligent Technology Solutions™ for organizations across the globe
  • $8.3 billion in revenue in 2020
  • Ranked #409 on the Fortune 500, #15 on the CRN Solution Provider 500, 2020 CRN Innovator of the Year Award
  • 2020 Intel Innovation Partner of the Year, 2020 Microsoft U.S. Partner of the Year and Worldwide Customer Experience Partner of the Year
  • Ranked #7 on the 2021 Fortune World's Most Admired Companies (Information Technology Services industry), #70 on the Fortune 100 Best Workplaces for Diversity, #296 on Forbes World's Best Employers (#27 within IT), and #5 on the Phoenix Business Journal 2020 list of Best Places to Work
  • Signatory of the United Nations (UN) Global Compact and Affiliate Member of the Responsible Business Alliance


Today's talent leads tomorrow's success. Learn about careers at Insight:


Insight is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation or any other characteristic protected by law.



Posting Notes: Remote || Texas (US-TX) || United States (US)

Job Segment: Cloud, Test Engineer, Testing, Software Engineer, Engineer, Technology, Engineering
