Equinix Metal is venturing into new territory as we look to bring future-forward software practices to the arcane world of BMCs and firmware.
In the past, you couldn’t buy a server without a commercial operating system, likely Microsoft Windows or UNIX. With the Linux revolution and open source in general, we’ve been able to rethink our relationship with servers and — more recently — with networking as well. Just like with servers, network operating systems like SONiC are thriving in the open.
The BMC (baseboard management controller) is a small but critical hardware component embedded on a server’s motherboard that provides an interface between system management software and platform hardware. It is used to remotely gather machine metrics, power status, and other relevant information, which has historically been done via the IPMI protocol.
The problem is that for operators at scale, BMCs tend to be a bit unreliable and there hasn’t been much we could do about it. With Linux on a COTS (commercial off-the-shelf) server, you have the ability to add packages, remove packages, alter the server's behavior, and write software to manage the system in whatever way you choose. BMCs, on the other hand, have remained far less adaptable, which has presented a real barrier to innovation.
Working with BMCs in ways that matter to us
As a cloud provider, we need to control the pace at which we deploy the software that runs our platform, and we need the ability to deploy fixes rapidly when needed, including at the hardware layer. Unfortunately, server manufacturers still largely hold onto the old concept that end users aren’t allowed (or don’t need) to touch the software that runs on the BMC. When we request fixes or changes to the BMC software on a given machine, they might not show up for months or even years. To say the least, firmware releases that resolve issues at 6-12 month intervals do not align with our operational model.
We began working with the OpenBMC project about 18 months ago in order to open up one of the last bastions of closed-source proprietary software that runs inside our servers.
OpenBMC and Equinix Metal are kindred
As a Linux Foundation project, OpenBMC aligns with the decades-long lineage of ecosystems and neutrality at Equinix. At Equinix Metal, our investments in Tinkerbell, Open19, and the CNCF have allowed us to further that commitment by proactively contributing to the open source software and hardware communities.
Having good vibes about open source is one thing, but contributing something useful is quite another! In this case, we’ve had to assemble a team that can wrangle the past while building for the future.
For example, most software developers looking to build at our kind of scale are well-versed in RESTful APIs, message buses, and software release cycles measured in days. All of that goes out the window in the world of firmware, however, where entrenched protocols like IPMI, SNMP, and HTTP rule the roost. We’re doing our part to replace them with more modern alternatives like Redfish, Prometheus, and HTTPS. Our goal is to see BMC software reflect the state of the art without the legacy baggage and constraints of proprietary software.
This isn’t just about making developers happy, although that’s a pretty good reason! As hardware becomes more important, gets put into more places, and is consumable by automation anywhere, the software that runs the lowest levels of hardware is critical to security, performance, and reliability.
To advance the cause, we’ve been upstreaming OpenBMC support for the ASRock Rack E3C246D4I-2T system board, which powers some of our most popular instance types. It feels good to say that we’ve recently started deploying OpenBMC in production on this hardware.
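To give a feel for why Redfish is friendlier to automation than IPMI, here is a minimal Python sketch that pulls temperature readings out of a Redfish-style Thermal document. The payload is made up for illustration: the chassis name, sensor names, and values are hypothetical, and the resource paths and sensors a given OpenBMC build exposes will differ. The field names (`Temperatures`, `Name`, `ReadingCelsius`) follow the Redfish Thermal schema.

```python
import json

# Illustrative payload only: shaped like a Redfish Thermal resource,
# with a hypothetical chassis id and made-up sensor names and values.
SAMPLE_THERMAL = """
{
  "@odata.id": "/redfish/v1/Chassis/example/Thermal",
  "Temperatures": [
    {"Name": "CPU Temp", "ReadingCelsius": 48.0},
    {"Name": "Inlet Temp", "ReadingCelsius": 24.5},
    {"Name": "Absent Sensor", "ReadingCelsius": null}
  ]
}
"""


def temperature_readings(thermal_json):
    """Return {sensor name: degrees Celsius} for sensors that report a value."""
    doc = json.loads(thermal_json)
    readings = {}
    for sensor in doc.get("Temperatures", []):
        value = sensor.get("ReadingCelsius")
        if value is not None:  # skip sensors with no current reading
            readings[sensor["Name"]] = float(value)
    return readings


if __name__ == "__main__":
    for name, celsius in temperature_readings(SAMPLE_THERMAL).items():
        print(f"{name}: {celsius:.1f} C")
```

In practice the document would be fetched from the BMC over HTTPS with authentication. The point here is only that the data arrives as plain JSON that any language can parse.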
What’s under the hood?
A successful OpenBMC build for the E3C246D4I-2T board will get you all the features that come “for free” from OpenBMC: iKVM, virtual media, Redfish (and yes, even IPMI!). It’s safe to say that not everything on the OpenBMC path is easy to implement. There are many specialized areas that require a detailed understanding of arcane hardware minutiae.
Thermal control is one of the areas that server vendors spend a lot of time and money to optimize – there is no one-size-fits-all solution. Sensors, especially those on the CPU package, are extremely important to a thermal solution. Intel's Xeon class CPUs provide this data via PECI, an Intel-specific interface. Intel's team provided support in getting this implemented.
Most of the implementation work for gathering thermal data from the CPU and other components on the motherboard took place in the Linux kernel. Because the chassis we use with our E3C246D4I systems handle thermal management outside of the system board itself, our port does not provide fan control. Configuring OpenBMC's existing PID controller to add this functionality could be a good first project for an enterprising new OpenBMC developer, however!
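For anyone curious what PID-based fan control means in practice, here is a minimal Python sketch of the general technique. It is a generic textbook PID loop, not OpenBMC's actual controller code or its configuration format. The class name, gains, setpoint, and duty-cycle limits are all hypothetical values chosen for illustration.

```python
class FanPID:
    """Generic PID loop mapping a temperature reading to a fan duty cycle (%).

    Hypothetical sketch: names, gains, and limits are illustrative only.
    """

    def __init__(self, kp, ki, kd, setpoint_c, duty_min=20.0, duty_max=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint_c = setpoint_c
        self.duty_min, self.duty_max = duty_min, duty_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, temp_c, dt):
        """Return the fan duty for one reading taken dt seconds after the last."""
        error = temp_c - self.setpoint_c  # positive when hotter than target
        candidate_integral = self.integral + error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        raw = (self.kp * error
               + self.ki * candidate_integral
               + self.kd * derivative)
        duty = min(self.duty_max, max(self.duty_min, raw))

        # Simple anti-windup: only accumulate the integral term while the
        # output is not clamped at either limit.
        if duty == raw:
            self.integral = candidate_integral
        return duty


if __name__ == "__main__":
    pid = FanPID(kp=4.0, ki=0.5, kd=0.0, setpoint_c=60.0)
    for reading in (50.0, 90.0, 70.0, 72.0):
        print(reading, pid.update(reading, dt=1.0))
```

The output is clamped to a minimum duty so the fans never stop entirely, and the integral term is frozen while the output is saturated so it does not wind up during long hot or cold spells. A real deployment would tune the gains per chassis, which is part of why there is no one-size-fits-all thermal solution.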
We want to support the community with our development work and make OpenBMC more accessible, so you don't have to buy a $10,000 server just to get access to this one inexpensive part of it (BMC hardware itself doesn't cost much). If you’re a visual learner, come see our Twitch stream as we convert a standard retail board to OpenBMC on April 28th at noon PDT.
Shout out to ASRock Rack
We’re super excited about what lies ahead for Equinix Metal and OpenBMC, but credit should go where it is due. Our work on OpenBMC couldn’t be done without the support and cooperation of ASRock Rack, an ODM partner that helps us make our Open19 infrastructure. Their collaboration on enabling our OpenBMC porting efforts has been a huge contributor to the success of those efforts.