Turn Your Hardware into a Chameleon Associate Site with CHI-in-a-Box
April 19, 2021 by Michael Sherman
Do you have a research cluster whose users complain that the interface is inflexible, or that it doesn’t provide the level of access (root) and isolation needed to run repeatable experiments? Or maybe you would like a well-defined, easy-to-implement way to contribute resources to the community at large when your users don’t need them. Or perhaps you are simply worried about the cost of setting up a research testbed from scratch, and then operating it for others.
Setting up a Chameleon Associate Site (potentially on a part-time basis) allows you to join a federation of testbed resources specifically designed to meet the Computer Science research use case. Chameleon Infrastructure (CHI) provides industry-standard APIs for interacting with resources, along with best practices and automation for running a research testbed, which is a different job from running a commodity cloud. Your users will have the same experience that they do at our other sites: they can run the same experiments and use the same interfaces, while the Chameleon team provides support both to you and to your users.
What is an Associate Site?
A Chameleon Associate Site can be thought of as a cloud “region” of Chameleon, running on your own hardware. It is federated with the users and groups of the larger Chameleon testbed and has access to a number of centralized resources, such as the Chameleon shared Jupyter environment and Experiment Precis -- but it operates independently. The specific configuration and policy are largely determined by the site’s own needs and interests. An Associate Site can easily be configured using CHI-in-a-Box, the packaging of CHameleon Infrastructure.
The smallest possible site might consist of a single management node and a few compute nodes. However, the same management node can also easily support several racks of nodes. Once your management node is up and running, bare-metal nodes can come and go as needed, depending on what you’re able to support, so you can flexibly adjust the amount of resources you contribute at any one time.
What is CHI-in-a-Box, anyway?
Associate Sites are configured using a tool called CHI-in-a-Box. CHI (pronounced like the “chee” in cheese) stands for “Chameleon Infrastructure” -- and it comes as a package (“box”), to make it easy for you to deploy.
Since CHI-in-a-Box is used to manage the core sites as well as Associate Sites, all sites share the same codebase, which is kept up to date as we add features. It comes with sane defaults, a tested set of features, and tools to automate common operations tasks.
You can follow our step-by-step QuickStart Guide to get a minimal site up and running, then add nodes and features incrementally as you go. During normal operations, a set of site configurations is checked into source control and deployed via Ansible playbooks. This ensures that the configuration is repeatable and observable. Enrollment and modification of bare-metal nodes can be done via the OpenStack CLI, using the plugin for our inventory service.
You can learn more about the experience of setting up an Associate Site in this interview with the first CHI-in-a-Box site, set up at Northwestern University!
What are the minimum Associate Site requirements?
As an operator, you’ll need a server to use as the management node, and your site must meet our installation assumptions.
The management node should have 8 GB of RAM, 40 GB of disk space, and two physical network interfaces, one of which needs to be on a publicly routable network. Compute nodes can either run virtualization using KVM or be bare-metal nodes managed via IPMI.
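As a quick sanity check during planning, the minimums above can be written down as a small script. This is only a sketch under stated assumptions: the `ManagementNode` class and `check_management_node` function are hypothetical names invented for illustration and are not part of CHI-in-a-Box, and "publicly routable" is approximated by Python's `ipaddress` notion of a globally reachable address. The Site Requirements Guide remains the authoritative reference.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List

# Minimums quoted in the text above.
MIN_RAM_GB = 8
MIN_DISK_GB = 40
MIN_INTERFACES = 2


@dataclass
class ManagementNode:
    """Hypothetical description of a candidate management node."""
    ram_gb: float
    disk_gb: float
    # One IP address string per physical network interface.
    interface_addresses: List[str] = field(default_factory=list)


def check_management_node(node: ManagementNode) -> List[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if node.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {node.ram_gb} GB is below {MIN_RAM_GB} GB")
    if node.disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {node.disk_gb} GB is below {MIN_DISK_GB} GB")
    if len(node.interface_addresses) < MIN_INTERFACES:
        problems.append(
            f"interfaces: found {len(node.interface_addresses)}, "
            f"need at least {MIN_INTERFACES}"
        )
    # Assumption: an address that is_global stands in for "publicly routable".
    has_public = any(
        ipaddress.ip_address(addr).is_global
        for addr in node.interface_addresses
    )
    if not has_public:
        problems.append("no interface has a publicly routable address")
    return problems
```

Calling `check_management_node` on a description of your server returns one message per unmet minimum, which makes it easy to see what is missing before you start the installation.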
For a larger site, we recommend 10 Gb/s or faster network interfaces on the management node, since it acts as a router for all tenant networks. We also recommend a dedicated interface for local administration of the management node. Additional storage, either within the management node or external to it, is needed as the number of disk images grows.
Specific requirements are documented in our Site Requirements Guide.
What does a completed site look like?
After setup is complete, your site will be visible to the rest of the Chameleon testbed, and its resources will appear on the Chameleon host calendar, which shows active reservations with time on the horizontal axis and node IDs on the vertical axis.
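To picture that calendar layout, here is a minimal text rendering with one row per node and one column per hour. It is purely illustrative and says nothing about how the Chameleon calendar is implemented; the function name `render_host_calendar`, the node IDs, and the hour-based reservation tuples are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def render_host_calendar(
    reservations: Dict[str, List[Tuple[int, int]]], hours: int
) -> str:
    """Draw rows of node IDs against columns of hours.

    `reservations` maps a node ID to (start_hour, end_hour) pairs, with the
    end hour exclusive. '#' marks a reserved hour and '.' a free one.
    """
    lines = []
    for node_id in sorted(reservations):
        row = ["."] * hours
        for start, end in reservations[node_id]:
            # Clamp each reservation to the displayed window.
            for hour in range(max(start, 0), min(end, hours)):
                row[hour] = "#"
        lines.append(f"{node_id:<8}" + "".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_host_calendar({"node-1": [(0, 2)], "node-2": [(1, 4)]}, hours=4))
```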
Users will be able to access your site using their existing accounts through our federated login. The Chameleon team will triage user support requests and delegate to you any questions relating to your site configuration.
What does it take to build an Associate Site?
Building a site consists of the following major steps:
- Plan site requirements, and gather information.
- Provision a management node, and use the gathered information to generate a site-config file.
- Customize features and services on the management node according to your site’s needs.
- Add compute nodes, or enroll bare-metal nodes for use.
- In the bare-metal use case, configure VLAN-aware switches for management by the controller.
A development setup can be built within a day, and we provide a QuickStart Notebook to test it on top of Chameleon.
The rough time estimate to set up a production site depends on the scale of the planned site, and the completeness of information gathered. While the initial setup of the control node is mostly automated, there will be trial and error in customizing your site configuration, especially around networking, as so much is site-specific. If staff are familiar with OpenStack, the time spent in trial and error can be reduced as the requirements will be well understood.
For a medium-sized production site, expect roughly two weeks of total effort: about a week for the management node and network, and the rest for node configuration. Adding bare-metal nodes is mostly a matter of physical racking and cabling, so that time scales with the size of the site. Once nodes are enrolled, significant automation is available for testing and maintenance.
How hard is it to run an Associate Site?
CHI-in-a-Box takes advantage of our experience running Chameleon at several sites, with a diverse set of hardware and workloads, so that you don’t need to re-learn the same operational lessons.
Once operational, maintenance of the deployment primarily revolves around system upgrades and fixing occasional glitches. CHI-in-a-Box packaging includes operational tools called “Hammers” -- bots that fix the testbed automatically -- and they should cover most known issues.
You will also be able to take advantage of the built-in alerting infrastructure (using Prometheus, Alertmanager, Grafana, and Kibana), which makes it easier to deal with many other issues, such as hardware or deployment failures. Every alert comes with an actionable runbook to help the operator triage and identify the root cause of the failure.
For sites where uptime is a primary concern, the tooling provides an HA-ready setup using HAProxy/keepalived for redundancy (this requires a multi-node deployment). It also provides automated backups of important data (Glance images, MySQL databases), for a better night's rest.
Since CHI-in-a-Box integrates with Chameleon's existing user and allocation management system, you don't need to operate your own user workflow, authentication, and authorization systems.
System upgrades are pushed out by the Chameleon team in the form of new releases of CHI-in-a-Box, which may come with new OpenStack service container images. The rollout of upgrades is automated, though it requires a brief window of downtime (a conservative 2-4 hours is typical).
User support for resources contributed to Chameleon, as well as the operation of shared services such as Jupyter and the creation and maintenance of images, is handled by the Chameleon team, so your operational effort is limited to operating the site and its resources.
I’m interested, now what?
For starters, read through the repository for more information. We’ve tried to address common use cases, but we’re always interested to learn about more. If you’re interested in a feature that’s not described, it may be in the works. Letting us know helps us prioritize which features to focus on next. This is especially true if you’re able to contribute development time.
Whether you’re interested in setting up a full associate site, or using the infrastructure for your own, more independent use-case, please let us know! Send an email to contact@chameleoncloud.org.