Senior Director, Cloud Operations (SRE, SDM)

Location IN-Remote
ID 2025-8828

Job Summary

The Senior Director of Cloud Operations is responsible for the operational integrity, performance, and reliability of enterprise cloud environments. Reporting directly to the Vice President of Cloud, this leader heads a global, data-driven operations team with a strong emphasis on incident management, service continuity, and continuous improvement.

This position leads a global team of cloud engineers and oversees the SRE practice, service management tools, and day-to-day operations using a metrics-first approach.

What Your Impact Will Look Like

Cloud Infrastructure Operations
  • Oversee the daily operations of cloud platforms (AWS, Azure, GCP), ensuring high availability and performance across global regions. 
  • Lead the development and execution of operational runbooks, SOPs, and escalation paths. 
Incident Management & Response
  • Own the end-to-end incident management lifecycle: detection, triage, escalation, resolution, and post-incident review. 
  • Lead a global incident response team with 24/7 coverage, ensuring seamless handoffs across time zones. 
  • Implement real-time monitoring, alerting, and automated remediation to reduce mean time to detect (MTTD) and mean time to resolve (MTTR).
  • Use data analytics to identify incident trends, recurring issues, and systemic risks. 
  • Conduct blameless postmortems and ensure corrective actions are prioritized and tracked to closure. 
Data-Driven Operational Leadership
  • Build and lead a global team of cloud engineers, SREs, and operations analysts using a metrics-first approach. 
  • Define and track operational KPIs (e.g., uptime, incident frequency, resolution time, change success rate) to drive accountability and performance. 
  • Leverage dashboards and analytics platforms (e.g., Datadog, Grafana, Splunk, ServiceNow) to provide real-time visibility into system health and team performance. 
  • Use data to inform staffing models, on-call rotations, and workload balancing across regions. 
  • Foster a culture of continuous improvement through data-backed retrospectives and operational reviews. 
AI-Enabled Operations
  • Drive AI and ML adoption in operational workflows (e.g., predictive monitoring and incident pattern analysis) to improve uptime and automate repetitive tasks.
  • Define and execute an AI-driven observability strategy, using AIOps platforms for intelligent alerting and root cause analysis.
  • Collaborate with Engineering, Security, and Product teams to embed AI-enabled automation in deployment pipelines, change management, and related processes.
  • Establish and maintain SLOs/SLAs, leveraging AI-generated insights to prioritize engineering work that improves reliability and customer experience.
  • Oversee incident management, post-mortems, and continuous improvement, incorporating AI tools for impact analysis and knowledge retention.
Operational Governance
  • Define and enforce SLAs, SLOs, and operational KPIs. 
  • Ensure compliance with security, regulatory, and audit requirements. 
  • Manage change control, configuration management, and release processes to minimize operational risk. 
Cost & Vendor Management
  • Monitor and optimize cloud spend through cost governance and usage analysis. 
  • Manage vendor relationships, contracts, and service-level agreements. 
Collaboration & Communication
  • Partner with engineering, security, and business teams to align operations with product and service goals. 
  • Provide regular reporting and updates to executive leadership on operational health, risks, and incident trends. 

You Will Love This Job If You Have

Education
  • Bachelor’s or master’s degree in Computer Science, Information Systems, or a related field.
Experience
  • 14+ years in IT operations, with 7+ years in cloud infrastructure and operations leadership. 
  • Proven experience leading global teams and managing high-severity incidents in large-scale environments. 
Skills
  • Deep expertise in cloud operations, incident response, and service reliability. 
  • Strong knowledge of ITIL, SRE, and DevOps practices. 
  • Proficiency in operational analytics and observability tools. 
  • Excellent leadership, communication, and cross-functional collaboration skills. 
  • Strong presentation skills, including experience presenting to large global audiences. 
Certifications (Preferred)
  • AWS Certified DevOps Engineer – Professional 
  • Azure Administrator Associate 
  • ITIL Foundation or Practitioner 

The Benefits

At Granicus, we offer a comprehensive and flexible benefits package designed to support your well-being, growth, and work-life balance.

Here’s what you can expect as an India-based team member:

Flexibility & Balance

  • Paid Time Off – Take the time you need to rest, recharge, and live your life.
  • Company-Wide Wellbeing Days – Paid days off to unplug and focus on your mental health.
  • Work From Home Reimbursement – Support a productive home office environment.

Health & Wellness
  • Private Healthcare Benefits – Comprehensive coverage for you and your family.
  • On-Demand Mental Health Support – Access to Headspace and other wellness tools.
  • Fitness Reimbursement & Cycle Program – Stay active, your way.
  • Critical Illness and Life Insurance Benefits

Family & Future
  • Paid Parental Leave – For both birthing and non-birthing parents.
  • Pension Plan – With employer contributions.

Growth & Recognition
  • Online Learning Platforms – Fuel your professional development.
  • Competitive Salary & Bonuses – Your contributions are valued and rewarded.