It’s a funny thing — this idea of “creating” or turning up a cloud. Consumers of bare metal are typically much more aware of the physical considerations involved in running hardware than those using hypervisor-based, virtualized clouds. One thing I still find people curious about is how we at Equinix Metal make our bare metal “cloudy”. That is, how do we get physical infrastructure into the digital space? As a special holiday treat, I’m taking you behind the scenes of how we build out our data centers and how we bridge the divide between the physical and the digital.
Let’s start with the foundational aspects: every cloud, bare metal or not, starts with space and power. In our annual planning, we identify which regions we plan to enter. As we get closer to our planned build, we work within our Equinix family to identify the best facility for our new build. While the size of each footprint depends on demand and other factors, we usually look to start with contiguous space of 10-40 racks.
Once that’s squared away, our data center operations team (affectionately known as DCOps) racks and cables our network infrastructure. They also patch all the connectivity we’ve ordered in advance of standing up the facility, including internet transit, transport between our facilities, out-of-band connectivity, and (unique to Equinix Metal) integration with Equinix Fabric. As soon as out-of-band connectivity is available, the network operations (NetOps) team configures our new infrastructure. While core-level configurations often require fairly significant customization, we use zero-touch provisioning (ZTP) to configure all switching infrastructure from templates in NetBox.
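To make the template idea concrete, here is a minimal sketch of rendering a switch configuration from a template plus one device record. It does not use NetBox or reflect our actual templates: the template text, the field names (`hostname`, `mgmt_ip`, `vlan_id`), and the `render_switch_config` helper are all assumptions for illustration, and it relies only on Python's standard library.

```python
from string import Template

# Hypothetical template. The real templates live in NetBox and their
# format is not shown in this post.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface mgmt0\n"
    "  ip address $mgmt_ip\n"
    "vlan $vlan_id\n"
)


def render_switch_config(device: dict) -> str:
    """Fill the template from one device record.

    Raises KeyError if the record is missing a field the template needs,
    so an incomplete record fails loudly instead of producing a partial config.
    """
    return SWITCH_TEMPLATE.substitute(device)


# Illustrative device record (made-up values, documentation IP range).
example_device = {"hostname": "tor-01", "mgmt_ip": "192.0.2.10/24", "vlan_id": 100}
config = render_switch_config(example_device)
print(config)
```

The appeal of this pattern is that the per-switch data stays small and structured while the template carries everything that should be identical across switches.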
Once we have the network infrastructure deployed and configured, we can get started with the fun stuff: standing up our platform and enrolling customer infrastructure.
It’s a cluster!
With the foundation set, it's time to build our site control cluster.
To support our site controller stack, we deploy a series of m3.large machines in a dedicated cabinet. These machines are used to create a Kubernetes cluster that hosts the local bits of our control plane, including all of the microservices required to assign IPs, boot, partition, install an OS, perform hardware reloads, manage services like SOS, and all the other goodies our customers know and love. Many of these microservices are available as part of our Tinkerbell open source project.
To avoid any circular dependencies, our tiny-but-mighty Delivery Engineering team builds and manages their own PXE stack, as well as all the cluster bootstrapping automation to bring this environment up. This cluster is then registered with Delivery’s multi-cluster orchestration and CD tooling to get all of our facility-specific microservices deployed.
During this bring-up (which Delivery Engineering can do in as little as an hour!) the DCOps crew is busy out on the raised floors, installing the initial inventory of our standard Gen3 server infrastructure: c3.small, c3.medium, m3.large, and s3.xlarge. We also build variants of these platforms when customers need additional drives, RAM, or even GPUs.
Serving up our cloud on a silver platter
Once the site cluster is up and the servers are physically in place, it’s time to create the final bridge between the hardware and our users: enrollment.
In order to turn physical hardware into a bare metal cloud, our API needs to know everything about each and every piece of hardware in the environment. To do this, we enroll the new devices into our API (and therefore our various portals) using our Staff API endpoints. At this point, our API has everything it needs to successfully provision and deprovision servers, configure network settings, meter traffic, and all the other bits.
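As a rough illustration of what "knowing everything" about a server might look like, here is a small sketch of an enrollment record with a validation step before it gets serialized. The field names, the `ServerRecord` class, and the `validate` and `to_payload` helpers are hypothetical; the Staff API itself is internal, so this sketch deliberately stops short of making any request.

```python
import json
from dataclasses import asdict, dataclass

# Plan names taken from the standard Gen3 lineup mentioned above.
KNOWN_PLANS = {"c3.small", "c3.medium", "m3.large", "s3.xlarge"}


@dataclass(frozen=True)
class ServerRecord:
    """Hypothetical shape of one enrollment record (all fields are assumptions)."""
    serial: str
    plan: str
    rack: str
    switch_port: str
    bmc_mac: str


def validate(record: ServerRecord) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing {name}"
        for name, value in asdict(record).items()
        if not value.strip()
    ]
    if record.plan not in KNOWN_PLANS:
        problems.append(f"unknown plan: {record.plan!r}")
    return problems


def to_payload(record: ServerRecord) -> str:
    """Serialize a record to JSON, ready to hand to an enrollment client."""
    return json.dumps(asdict(record), sort_keys=True)


server = ServerRecord("SN123", "c3.small", "r01", "eth1/1", "aa:bb:cc:dd:ee:ff")
if not validate(server):
    print(to_payload(server))
```

Checking records before submission means a typo in a rack label or plan name gets caught at the keyboard rather than at provisioning time.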
Before handing anything over to customers, we have some final housekeeping to do, including firmware updates (to bring everything up to our latest pinned version) and tests to validate that all of our standard OSes provision correctly. We call this "burning in," and it's a great time for our team to stretch their legs and grab a cup of coffee.
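The two burn-in checks described here (firmware at the pinned version, every standard OS provisioning cleanly) can be sketched as a simple go/no-go gate. Everything in this example is assumed for illustration: the component names, the version strings, and the `firmware_drift` and `site_is_green` helpers are not our actual tooling.

```python
# Hypothetical pinned versions; real pins and component names will differ.
PINNED_FIRMWARE = {"bios": "2.4.1", "bmc": "1.9.0"}


def firmware_drift(installed: dict[str, str], pinned: dict[str, str]) -> list[str]:
    """Return the components whose installed version differs from the pin.

    A component missing from `installed` counts as drift.
    """
    return sorted(c for c, v in pinned.items() if installed.get(c) != v)


def site_is_green(
    fleet: dict[str, dict[str, str]],
    os_results: dict[str, bool],
    pinned: dict[str, str] = PINNED_FIRMWARE,
) -> bool:
    """True only if every server matches the pins and every OS test passed.

    An empty fleet or an empty test run is treated as not green, so a
    site cannot pass simply because nothing was checked.
    """
    if not fleet or not os_results:
        return False
    if any(firmware_drift(fw, pinned) for fw in fleet.values()):
        return False
    return all(os_results.values())


fleet = {
    "srv-1": {"bios": "2.4.1", "bmc": "1.9.0"},
    "srv-2": {"bios": "2.3.0", "bmc": "1.9.0"},  # BIOS is behind the pin
}
print(firmware_drift(fleet["srv-2"], PINNED_FIRMWARE))
print(site_is_green(fleet, {"ubuntu": True, "debian": True}))
```

Treating "nothing was tested" as a failure is a deliberate choice in this sketch: the gate should only open on positive evidence.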
Finally, when everything turns green, we flip the switch and make the site available to you!