The last decade has been huge for developers. With the growing adoption of cloud native principles, it's become easier than ever to get applications from idea to production. Seriously, look at the growth we've seen with the development of VMs, containers, Kubernetes, serverless...the list goes on and on. We have folks in the workforce who have never worked without the simplicity of API calls.
It's been an exciting time across the board, but one area feels left behind: bare metal. If you (or anyone you know) worked with servers in data centers back in the '80s, you could, for the most part, walk in today and pick up right where you left off. The management and provisioning of bare metal servers has moved very little. With all of these new technologies, why hasn't bare metal kept up?
Why are people using bare metal?
The intro may not have been entirely correct. When I say people haven't worked with bare metal, I'm using a very narrow idea of what bare metal servers are, when in reality the term covers everything from servers that cost thousands of dollars in a data center, to home labs, to private clouds, and even Raspberry Pis. Chances are, if you're reading this, you've tinkered with bare metal and may not have even known it.
When it comes to deploying workloads in a work environment, why would someone opt for bare metal servers instead of the seemingly easier cloud options available? It depends on your use case and what you like. By removing layers and sitting as close to the physical hardware as possible, you can get more predictable performance and behavior for your applications, with lower latency. Maybe you need to keep systems like hypervisors or VMs that share a server from using excess resources, becoming noisy neighbors, and causing apps to fail. Real-time operating systems may need full access to the hardware, with nothing standing between it and the software layer. Plus, bare metal gives you control: you build directly on the hardware, choosing each aspect of your infrastructure yourself, without a cloud provider making assumptions that may not be the correct ones for your workloads; not all hypervisors let you run anything you want. And while cloud computing appears to be the cheap option, depending on your workloads and server usage it can become more expensive to maintain; chances are you'll also need someone to manage that virtualization, plus potential license costs.
How deploying physical infrastructure has worked
Deploying on virtual machines has become pretty simple and many folks make use of that. Developers would like to be able to do something similar on bare metal, something that hasn't been possible until recently.
Back in the day, installing software on a device was much like doing it at home: you needed a stack of floppy disks or CDs, and you had to wait for each one to install and run. That wasn't fun when we had to do it on one computer; imagine how difficult it would be on hundreds of servers in a data center. You had to be physically present to manage servers.
Today, it hasn't changed much. The first bit of technology you need to install software on these machines dates back to Bill Clinton's first year in office (that was 1993). That's the year DHCP was introduced. Next, we need the Preboot Execution Environment (PXE) specification, which was also introduced during the Clinton administration. Strange reference, but it shows just how long we've been using the same technologies for managing servers without much advancement.
What do these technologies do? DHCP is how a device asks for its network configuration; you use it every time your mobile device connects to a wireless network. You'll also use Trivial File Transfer Protocol (TFTP), a lightweight (and insecure) method for a machine on a network to download files. Then we have the final piece of the puzzle, the PXE standard, which is a way for a machine to start a program loaded over the network.
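To see how the three pieces fit together, here's a minimal dnsmasq configuration sketch that provides all of them on one provisioning network; the address range, boot filename, and paths are hypothetical:

```
# DHCP: hand out addresses on the provisioning network.
dhcp-range=192.168.1.100,192.168.1.200,12h

# PXE: tell netbooting clients which boot program to fetch.
dhcp-boot=undionly.kpxe

# TFTP: serve that boot program from a local directory.
enable-tftp
tftp-root=/var/lib/tftpboot
```

A few lines like these are all it takes to netboot a machine, which is exactly why this stack has survived unchanged for so long.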
Typically, a machine boots from a disk with an operating system on it. So what happens when there is nothing on the disk or the BIOS boots from the network? This is what happens in a bare metal environment.
When the server starts for the first time, depending on the BIOS, it may look on a disk, where it won't find anything, or it'll look on the network. Running DHCP over the network enables the deployment server to give the new server an IP address. With this address, the new server can download the PXE code; the PXE code is executed and repeats the DHCP request. This time, however, the DHCP lease presents not only an IP address but also an operating system install script. The PXE code downloads and parses the script, which instructs it to download a kernel and a `ramdisk`, which it needs to boot, and then installs the OS.
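That last hop — fetch a kernel and a ramdisk, then boot them — is typically expressed as an iPXE script. This is an illustrative sketch with hypothetical URLs, not a script from any particular installer:

```
#!ipxe
# Repeat the DHCP request to configure networking inside iPXE.
dhcp
# Download the kernel and the ramdisk it needs to boot.
kernel http://boot.example.com/vmlinuz console=ttyS0
initrd http://boot.example.com/initrd.img
# Hand control over to the downloaded kernel.
boot
```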
This isn't an easy process, and it's a lot of work for ops teams to provide any kind of automation on top of it; custom scripts and the use of old, legacy software present several challenges. Even after you've done all of this with a variety of scripts, you still aren't really automating: scaling isn't possible, upgrading is difficult, handing over to other areas of infrastructure is hard without docs and training, and integrating into existing ecosystems is painful, so it easily becomes a source of technical debt.
Making Servers Fly Like Clouds
While running dozens of data centers with bare metal servers, the Equinix Metal (formerly Packet) team needed a better way to automate servers to increase productivity and reduce costs. That's where Tinkerbell came from. Looking at the needs of developers in the wild, so to speak, it was obvious that Tinkerbell needed to be made open source, giving everyone the tooling to simplify bare metal automation and deployment.
Tinkerbell addresses most of the problems discussed earlier because it was built by developers who dealt with them firsthand, with the goals of automating deployment steps, simplifying scaling, modernizing deployment, and creating an easy way to build an ecosystem on top of the platform: modern tooling, modern platforms, and modern infrastructure that can easily scale up and be managed.
How Does Tinkerbell Work?
Tinkerbell consists of four microservices that are automatically deployed for you as part of deploying Tinkerbell:
BOOTS service - This is a reimagining of all the components needed to remotely start a server: a reimplementation of DHCP in Go, a reimplementation of TFTP in Go, an HTTP service from which kernels and ramdisks can be downloaded, and iPXE. Since it's written in Go, it's memory safe, so it's more secure and less likely to be exploited through memory issues, and it's lightweight and simple to run within a container.
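BOOTS itself is written in Go, but the HTTP side of it — serving kernels and ramdisks to netbooting machines — is easy to sketch. Here's a minimal, hypothetical Python version of that idea (the file name and contents are stand-ins):

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request

# A directory standing in for the boot-artifact store.
boot_dir = tempfile.mkdtemp()
with open(os.path.join(boot_dir, "vmlinuz"), "wb") as f:
    f.write(b"fake-kernel-bytes")

# Serve it over HTTP, as a boot service does for kernels and ramdisks.
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=boot_dir)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A netbooting client would now fetch the kernel like this:
port = server.server_address[1]
kernel = urllib.request.urlopen(f"http://127.0.0.1:{port}/vmlinuz").read()
server.shutdown()
```

The real service does far more (DHCP, TFTP, iPXE chaining), but the download step is no more exotic than this.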
OSIE - OSIE is what BOOTS points a server to: after the DHCP request, the offer includes a boot filename and an address from which to download OSIE. OSIE is built using Alpine Linux, a lightweight distribution, and Docker. All of this is downloaded via the TFTP operation, and it runs entirely from memory.
Tinkerbell - Once OSIE is running, it reaches out to the Tinkerbell server to see what it needs to do. The Tinkerbell server then sends a declarative YAML file that identifies the machine, or worker, and tells it what to run. Once the OS has been written to the physical, persistent disk, `kexec` is called, which starts the kernel on that disk, like a fast reboot. Everything that was running is dropped from memory, and the machine pivots to the newly deployed disk.
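That declarative YAML file is a workflow template. The sketch below is illustrative rather than authoritative — the action image, disk path, and image URL are hypothetical — but it shows the shape: tasks tied to a worker, each made up of actions that run in order:

```yaml
version: "0.1"
name: ubuntu_install
global_timeout: 1800
tasks:
  - name: "os-installation"
    worker: "{{.device_1}}"
    actions:
      - name: "stream-image-to-disk"
        image: image2disk
        timeout: 600
        environment:
          DEST_DISK: /dev/sda
          IMG_URL: http://images.example.com/ubuntu.raw.gz
```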
HEGEL - This is basically a metadata service that provides the information you set for your server. You can deploy the same OS to every machine, and each individual server can read the metadata and apply whatever is unique to it when cloud-init interrogates the service. This is what you'll find in most cloud environments, exposed through an API when a server boots up.
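To illustrate the idea (this is a simplified sketch, not Hegel's actual API or schema), the snippet below shows one code path producing different per-server configuration from metadata, which is roughly what happens when cloud-init queries a metadata service at boot:

```python
import json

# Stand-in for what a metadata service might return for two machines
# deployed from the identical OS image. The fields are hypothetical.
responses = {
    "machine-a": json.dumps({"hostname": "web-01", "ip": "10.0.0.11"}),
    "machine-b": json.dumps({"hostname": "web-02", "ip": "10.0.0.12"}),
}

def configure(machine_id: str) -> str:
    """Roughly what cloud-init does: fetch metadata, apply unique values."""
    meta = json.loads(responses[machine_id])
    return f"hostname={meta['hostname']} ip={meta['ip']}"

# The same code runs on every machine but yields per-server results.
print(configure("machine-a"))
print(configure("machine-b"))
```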
This is a more modern, API-driven, cloud native-like environment for bare metal, and it will feel familiar to those who have spent most of their time with cloud providers.
Want to try it out?
This is all open source, so you can automate and manage any servers you have, including something as simple as the Raspberry Pi sitting in your desk drawer.
Tinkerbell also has a sandbox for you to play with. It's built using Docker components and starts everything up for you as a full service.