This is an exciting opportunity for a Senior Site Reliability Engineer on the Consumer SRE Team in the IMT division to provide secure, resilient, scalable and maintainable services for mortgage borrowers and lenders. IMT is a division of our client based in Atlanta, which operates numerous financial and commodity marketplaces and exchanges, including the New York Stock Exchange (NYSE). Automation is a big part of what we do: we use infrastructure-as-code within our hybrid cloud to bring stability and scalability to Windows, Linux, Docker and serverless applications in AWS, on-prem and Azure environments. We reduce toil through scripting and automation of repetitive tasks. You will collaborate with developers to deliver robust services, build actionable alerts that detect or prevent incidents and surface performance bottlenecks, and write automation to remediate issues.
Responsibilities
Employ deep troubleshooting skills to improve the availability, performance, and security of Ellie Mae Services.
Ensure services are designed for 24/7 availability, operational readiness, and rigor
Implement proactive monitoring, alerting, trend analysis and self-healing systems
Define and measure KPIs and SLOs
Build automated deployments, automated tests, and operational tools
Participate in on-call rotation for Production support
Collaborate with Product and Support teams to plan and deploy product releases
Partner with other SREs and lead by example
Knowledge and Experience
10+ years of Application/Systems engineering in 24x7 Production Services environments
BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
Excellent troubleshooter, utilizing a systematic problem-solving approach
Demonstrated ability to lead Incident Response and root cause analysis (RCA)
Fluency with one or more current-generation scripting languages used by SRE/DevOps professionals (PowerShell, Python, Perl, PHP, Ruby), plus Java/.NET development experience
Experience running a SaaS application in a public cloud, on-prem or hybrid cloud environment
Additional credit for:
Proficiency in Windows and on-prem environments
Experience with Continuous Integration and Continuous Delivery concepts
Automation in Rundeck or Jenkins
Infrastructure-as-code or Configuration Management, utilizing tools like Terraform, CloudFormation or Chef/SaltStack/Puppet/DSC