Too easy is boring! Together, we are on a mission to drive forward the energy transition. We love what we do, and we are unafraid to dive in. We believe in taking ownership of our work and in continuously growing and evolving.
In short: own it, love it, grow with it.
We are a humble team of coffee and maté lovers with over 20 nationalities. With our geek humor, our love for emojis and random facts is only natural. Over 150 envelians are already on board. Dive in and thrive!
Team Lead Platform Operations (all genders)
Cologne / Remote from Germany
Full-time
Permanent employee
About Working at envelio
Your Role
As Team Lead Platform Operations (all genders), you will build a deeply technical team of around 6 people focused on the stable, secure, and predictable operation of our product: the Intelligent Grid Platform (IGP).
Your team is responsible for product operations: keeping customer IGP environments healthy, managing operational processes such as incident handling and releases, and driving systematic reliability improvements based on real production signals.
You work closely with Product, Customer Success, and Engineering teams. You also partner with the SRE/Infrastructure team that owns the platform foundation (cluster provisioning, deployment pipelines, observability tooling, etc.), while your team focuses on running IGP for customers day to day.
You will help evolve our operating model towards 24/7 reliability for customer environments (processes, ownership, and escalation), together with Engineering, SRE/Infrastructure, and Customer Success.
How You Make an Impact
- You coach, mentor, and help your direct reports grow through 1:1s, performance reviews, and regular feedback
- You own and evolve the operational execution of the IGP across customer environments
- You ensure fast, structured handling of customer-impacting issues and incidents, and drive effective follow-ups so the same issues do not recur
- You create clarity around ownership and escalation paths for production topics and coordinate efficiently across squads and Customer Success
- You drive operational excellence: pragmatic incident response, calm communication, and a continuous improvement culture with blameless learnings
- You balance short-term operational work (restore service) with long-term investments (reduce toil, improve reliability, improve tooling and runbooks)
- You shape team priorities, capacity, and roadmap: decide what gets attention now vs. what becomes a planned reliability investment
- You support hiring and team development by identifying and attracting talent, and by shaping growth paths within your team
Your Profile
Perfection is a myth! We’re more interested in the human behind the screen, so think of these criteria as helpful directions — we're excited to see how your unique skills might fit in.
- You have strong experience operating complex cloud applications and understand how to run services reliably under real-world constraints
- You operate production services on cloud infrastructure (AWS/Azure/GCP) and understand typical failure modes
- You have hands-on experience with Linux and networking fundamentals for troubleshooting (logs, system health, connectivity)
- You are familiar with modern operating models such as containers and Kubernetes (or similar) and can assess deployments in production (rollouts, rollbacks)
- You are comfortable with incident management, root cause analysis, and prioritizing operational work under time pressure
- You have proven experience leading and developing a team in an operations-heavy environment
- You are strong at stakeholder management and coordination across teams (Engineering squads, Product, Customer Success)
- You have a continuous improvement mindset: you reduce operational toil via better processes, automation, and documentation
- You can communicate clearly in high-pressure situations and create alignment on next steps
- You are fluent in German and English
How We Develop Software
- Clear ownership for production topics, and efficient coordination across squads and Customer Success
- Structured incident handling (restore service, communicate clearly, then follow up on root causes)
- Release operations with a pragmatic risk mindset (safe changes, fast rollback when needed)
- Monitoring and alerting hygiene (signal over noise)
- Strong runbooks and automation to reduce operational toil over time
Our Tech Stack
- Multi-cloud, hybrid on-prem setup with Kubernetes and Helm as the common denominator
- Application primarily written in Python and TypeScript
- Standard backing services like PostgreSQL, RabbitMQ, Redis
- GitLab & GitLab CI for managing the Software Delivery Lifecycle
- Terraform for Infrastructure as Code
Your Benefits
- Join us fully remote (#LI-Remote) or at our lovely office in Cologne in a hybrid working mode
- Option for remote work from abroad (up to three months per year from anywhere in the EU or the USA)
- State-of-the-art technology and modern tech stack
- Excellent hardware equipment (16-inch MacBooks, 2 screens at your workplace)
- 30 holidays + 3 corporate holidays
- Support for your health through sports membership partnerships
- Flexible use of a monthly mobility budget (e.g. Jobrad, ÖPNV)
- Time and resources for individual growth
- envelio pension plan
- Regular company and team events
About us
Intelligent grids for a sustainable future worldwide: this is our vision! That is why we are building the digital hub for the future of power grid management: the Intelligent Grid Platform (IGP). The IGP is the core of our software-as-a-service solution. It’s our ambition to make the IGP and envelio 1% better every day. Grow with us!
As a remote-first company, we let you decide how much you want to work remotely – from 0% to 100%. Whether you live in Cologne or Aachen, Bonn or Berlin, Munich or Castrop-Rauxel; all you need is internet! You can also work from abroad for up to three months per year. Since 2017, we have built a diverse international team, with English as our company language.
