The CloudFoundry Platform runs on Virtual Machines (VMs). At SAP it runs on four different hyperscalers at significant scale. The operating system (OS) used for those VMs, Ubuntu Xenial, reached its end of life in April 2021, after which it no longer receives security updates. Significant scale combined with no security updates does not just sound like a big issue; it is one. Fortunately, all machines now run on the successor version of that OS, Ubuntu Bionic, which is still maintained. To learn why getting there was a challenge, read on.
What does significant scale mean?
In total, the VMs of all SAP-operated CloudFoundry-based installations add up to over 42,000 VMs. This count includes only the relevant integration, hotfix and live systems, not development and staging systems.
How does CloudFoundry get on those hyperscaler VMs?
The CloudFoundry Platform is managed by a tool called BOSH. It is a release engineering tool that handles the deployment as well as the so-called day-2 operations, such as rolling out updates and keeping the deployments healthy. It also defines how the software is packaged, as so-called BOSH releases, and provides a common operating system image, called a Stemcell. The Stemcell provides an abstraction across the different hyperscalers and a common interface for the workload packaged as BOSH releases. This makes it possible to update the underlying OS easily while keeping the workload stable. So easily, in fact, that the VMs of CloudFoundry@SAP are repaved every two weeks. Yes, that really means that over 42,000 VMs are destroyed and re-created every two weeks. Security patches can be rolled out within hours if necessary, and … if available. Until recently, it was exactly this Stemcell that needed an update from Xenial to Bionic; without that update, no further security patches would have been available.
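To put the repaving cadence into perspective, here is a back-of-the-envelope calculation. It is a sketch under a simplifying assumption: the re-creation of the roughly 42,000 VMs is treated as if it were spread evenly over the two-week cycle, around the clock, which the real rollout pattern is unlikely to match exactly. The function name `repave_rate` is hypothetical and only used for illustration.

```python
# Back-of-the-envelope repaving rate.
# Assumption (hypothetical): VM re-creation is spread evenly over the
# whole two-week cycle, 24 hours a day.
VMS = 42_000      # lower bound from the text ("over 42,000 VMs")
CYCLE_DAYS = 14   # VMs are repaved every two weeks


def repave_rate(vms: int, cycle_days: int) -> tuple[float, float]:
    """Return the average number of VMs re-created per day and per hour."""
    per_day = vms / cycle_days
    per_hour = per_day / 24
    return per_day, per_hour


per_day, per_hour = repave_rate(VMS, CYCLE_DAYS)
print(f"{per_day:.0f} VMs/day, {per_hour:.0f} VMs/hour")  # 3000 VMs/day, 125 VMs/hour
```

Because 42,000 is a lower bound, the actual average is at least about 3,000 VMs per day, or roughly 125 VMs per hour.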
Who maintains the Stemcell?
In the past, the Stemcell provided to the community was created, maintained and released by VMware. At the beginning of 2021, VMware decided to stop its work on the Bionic Stemcell. With this decision it became clear that the community would not have a maintained Stemcell from May 2021 onwards. What made this more problematic was that the setup to create and release Stemcells was running on VMware-owned infrastructure. SAP clearly needed a maintained Stemcell, but at the same time it had no intention of becoming the new VMware by taking over the maintenance on SAP-owned infrastructure accounts.
First things first: making the Bionic Stemcell GA (General Availability)
Facing the May 2021 deadline, the top priority was to productize the Stemcell as quickly as possible. SAP volunteered to drive this, with the infrastructure still on VMware-owned accounts. Because its landscape runs on four different hyperscalers and consists of a substantial number of custom and upstream BOSH releases, SAP was in a great position to make the necessary changes and to validate them in fast iterations. With the help of many teams in the CF Core Area@SAP, all open issues were solved and the new Stemcell was adopted in May, just as the deadline arrived. At the same time, SAP announced the general availability of the new Stemcell to the CloudFoundry community.
The future of Stemcell creation
As mentioned above, SAP has no plans to become the new sole maintainer of the Stemcell. Instead, discussions with the CloudFoundry Foundation (CFF) were started in parallel to move the entire setup for creating, maintaining and releasing Stemcells to CFF-owned hyperscaler accounts. The members of the CFF considered this a good way forward. The setup has recently reached a first milestone: the first Stemcell has been created with release infrastructure that is completely owned by the CFF. While SAP currently drives most of the maintenance effort, this lays the foundation for easily onboarding other contributors from the CloudFoundry community.