Happy Independence Day from Chameleon!

Posted by Kate Keahey on July 01, 2016

When in the Course of human events, one people take upon themselves to nourish and support the research needs of another, a decent respect for the opinions of mankind requires that, in well-appointed intervals, a declaration of the works performed should be made to those whose research can profit by them.

We hold these truths to be self-evident:

More types of hardware will support a broader range of experiments. We have made available new hardware consisting of two storage hierarchy nodes, two K80 GPU nodes, and two M40 GPU nodes. Each of the six additional nodes is a Dell PowerEdge R730 server with the same CPUs as our compute nodes. The two storage hierarchy nodes are designed to enable experiments with multiple layers of caching: they are configured with 512 GiB of memory, four Intel S3610 SSDs of 1.6 TB each, and four 15K SAS HDDs of 600 GB each. The GPU nodes target experiments that use accelerators to speed up algorithms, experiments with new visualization systems, and deep learning. Each K80 GPU node is equipped with an NVIDIA Tesla K80 accelerator, consisting of two GK210 chips with 2496 cores each (4992 cores in total) and 24 GiB of GDDR5 memory. Each M40 node is equipped with an NVIDIA Tesla M40 accelerator, consisting of a single GM200 chip with 3072 cores and 12 GiB of GDDR5 memory. To make it easy to get started with the GPU nodes, we have developed a CUDA appliance that includes the NVIDIA drivers as well as the CUDA framework. For more information on how to reserve these nodes, see the heterogeneous hardware section of the bare metal user’s guide. For a limited time, we have an “opening sale” on these nodes: right now they cost the same number of SUs as regular nodes while offering significantly more capability. Last but not least, we would like to thank Early User Luc Renambot from the University of Illinois at Chicago for early testing and useful feedback on the new hardware.
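
To give a sense of what a first experiment on the GPU nodes might look like, here is a minimal sketch that queries the accelerator from Python. It assumes you have launched the CUDA appliance on a K80 or M40 node; the appliance provides the NVIDIA drivers and CUDA framework, while PyCUDA is assumed to be installed separately (e.g. with pip).

    # Minimal sketch: inspect the GPU(s) visible from a CUDA appliance instance.
    # The appliance provides the NVIDIA drivers and CUDA framework; PyCUDA is
    # assumed to have been installed on top of it (e.g. with pip).
    import pycuda.driver as cuda

    cuda.init()
    print("CUDA devices found: %d" % cuda.Device.count())

    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        # A K80 node should report two GK210 devices; an M40 node a single GM200.
        print("Device %d: %s" % (i, dev.name()))
        print("  memory: %.1f GiB" % (dev.total_memory() / 2.0**30))
        print("  multiprocessors: %d" %
              dev.get_attribute(cuda.device_attribute.MULTIPROCESSOR_COUNT))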

Big Data experiments require a large data store close to the testbed. To facilitate this, we have developed a new Chameleon object store service. The object store can be accessed via the OpenStack Swift interface and is currently backed by a 1.6 PB Ceph cluster with two-way replication, effectively providing 0.8 PB for general use. We intend to grow this capacity as needed, but for now, here too we have an “opening sale”: for a limited time there are no limits on usage. The object store is located at TACC but is available to users of both CHI and KVM resources at TACC, at UC, and anywhere else on the Internet. To make it easier to use, we have added a Swift client to all appliances supported by Chameleon. More details can be found in the object store section of the bare metal documentation guide.
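
As a sketch of what talking to the object store from Python could look like, the snippet below stores and retrieves a small object with python-swiftclient. The authentication URL, credentials, and container and object names are placeholders to be replaced with the values from your own OpenStack RC file.

    # Sketch: store and retrieve an object via the Swift interface using
    # python-swiftclient. The auth URL, credentials, and names below are
    # placeholders -- use the values from your own OpenStack RC file.
    import swiftclient

    conn = swiftclient.Connection(
        authurl="https://example.chameleoncloud.org:5000/v2.0",  # placeholder
        user="your_username",
        key="your_password",
        tenant_name="your_project",
        auth_version="2",
    )

    conn.put_container("my-experiment-data")
    with open("results.csv", "rb") as f:
        conn.put_object("my-experiment-data", "results.csv", contents=f)

    # Later -- possibly from another site -- fetch the object back.
    headers, body = conn.get_object("my-experiment-data", "results.csv")
    with open("results-copy.csv", "wb") as f:
        f.write(body)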

Better appliance management tools make experimentation easier and more fun. We have put in place a process that allows us to easily create, maintain, and customize images -- and we are now making it available to you. Our process uses an adaptation of the OpenStack diskimage-builder, modified to be compatible with the Chameleon testbed. We use this process to create, support, and customize the appliances available via the appliance catalog. We have made the scripts that generate the base appliances (CC-CentOS7, CC-Ubuntu14.04, CC-Ubuntu16.04) available, so you can easily fork them to create your own customizations of the base images. For more information, see our documentation on the Chameleon Image Builder. Further, having this process in place has allowed us to develop an image that works on both the TACC and UC clouds (whole disk image format only). This means that you can snapshot your instance at TACC, transfer it to UC, and launch it there without making any changes -- which can be very handy if you need to move your work from site to site quickly. And speaking of snapshotting, we improved that too: we now have a tool called cc-snapshot that saves the instance you are working on, with all its modifications, and uploads a copy to Glance; what used to take multiple complicated commands can now be done with just one.
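
For illustration, the cross-site workflow above boils down to three steps: snapshot the instance with cc-snapshot, copy the resulting image from one site’s Glance to the other’s, and boot it there. The sketch below shows just the copy step using python-glanceclient; the endpoints, tokens, and image ID are placeholders, and the same copy can equally be done with the glance command-line client.

    # Rough sketch: copy a snapshot image from the Glance service at one site
    # to the Glance service at the other. Endpoints, tokens, and the image ID
    # are placeholders -- substitute your own values and credentials.
    import tempfile
    from glanceclient import Client

    src = Client('2', endpoint="https://tacc.example/image", token="SRC_TOKEN")
    dst = Client('2', endpoint="https://uc.example/image", token="DST_TOKEN")

    image_id = "YOUR-SNAPSHOT-IMAGE-ID"
    src_image = src.images.get(image_id)

    with tempfile.TemporaryFile() as tmp:
        # Download the image data, then re-upload it under the same name.
        for chunk in src.images.data(image_id):
            tmp.write(chunk)
        tmp.seek(0)
        new_image = dst.images.create(name=src_image.name,
                                      disk_format=src_image.disk_format,
                                      container_format=src_image.container_format)
        dst.images.upload(new_image.id, tmp)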

Having more ready-to-use appliances allows scientists to spend more time on research. We immediately put the Chameleon Image Builder to good use and created (and in some cases already updated) multiple new appliances that can be found in the appliance catalog. In addition to the CUDA appliance described above, we added an Ubuntu appliance (and already upgraded it to 16.04), upgraded the CentOS 7 appliance, and added the SR-IOV MVAPICH2-Virt appliance, which packages efficient MPI for KVM running over InfiniBand (IB), the SR-IOV RDMA-Hadoop appliance, which packages efficient Hadoop over IB, and, last but not least, the Ubuntu 14.04 DevStack Mitaka appliance, which packages OpenStack.
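
As a small example of the kind of workload the MPI appliance is meant for, here is a minimal MPI “hello world” sketch in Python; it assumes mpi4py has been installed against the appliance’s MPI library (mpi4py itself is not necessarily included) and is launched across the reserved instances with mpirun.

    # Minimal MPI sketch for instances of the SR-IOV MVAPICH2-Virt appliance.
    # Assumes mpi4py has been installed against the appliance's MPI library;
    # launch it across your instances, e.g.:
    #   mpirun -np 4 -hostfile hosts python mpi_hello.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    print("Hello from rank %d of %d on %s"
          % (comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))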

Making it easy to learn about Chameleon will make experimental capabilities more accessible. To help users get started, we have developed and scheduled a series of webinars that provide a hands-on, in-depth introduction to running experiments on bare metal in just an hour. Each webinar guides users through completing a small computer science research project on Chameleon and provides support every step of the way. It is designed to introduce a workflow that can be easily extended to support the user’s own research experimentation. The current schedule will be extended on demand.

Have a fantastic Independence Day and may your pursuit of happiness be fruitful!