Articles by Michael Sherman

Building MPI Clusters on Chameleon: A Practical Guide

Simplifying Distributed Computing Setup with Jupyter, Ansible, and Python

Running distributed applications across multiple nodes is a common need in scientific computing, but setting up MPI clusters can be challenging, especially in cloud environments. In this post, we'll explore a template for creating MPI clusters on Chameleon that handles the key configuration steps automatically, letting you focus on your research rather than infrastructure setup.
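
To give a flavor of what such a cluster runs, here is a minimal sketch of an MPI hello-world in Python. It assumes mpi4py and an MPI runtime (e.g., OpenMPI) are installed on every node, which is the kind of setup the template automates:

# hello_mpi.py: each rank reports itself; a quick end-to-end cluster check.
from mpi4py import MPI

comm = MPI.COMM_WORLD             # communicator spanning all launched ranks
rank = comm.Get_rank()            # this process's id within the communicator
size = comm.Get_size()            # total number of ranks across all nodes
node = MPI.Get_processor_name()   # hostname of the node running this rank

print(f"Hello from rank {rank} of {size} on {node}")

Launched with something like mpirun -n 4 --hostfile hosts python3 hello_mpi.py (the hostfile name is an assumption), every rank should print its node name, confirming that the inter-node wiring works.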

Power Measurement and Management on Chameleon

Exploring Power Monitoring Techniques with RAPL, DCMI, and Scaphandre

Monitoring power consumption is crucial for understanding the energy efficiency of your applications and systems. In this post, we explore several techniques for measuring power usage on Chameleon nodes: Intel's RAPL interface for fine-grained CPU and memory power data, IPMI's DCMI commands for system-level power readings, and the Scaphandre tool for per-process power monitoring and visualization. Practical examples and step-by-step instructions will help you get started with power measurement on Chameleon and gain insight into the energy footprint of your workloads.
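
As a small taste of the RAPL approach, here is a rough sketch that estimates average package power from the powercap sysfs counters. The zone path below is typical for Intel nodes but is an assumption about your hardware, and reading it usually requires root:

# rapl_watts.py: sample the cumulative package-0 energy counter twice and
# convert the delta to average watts (ignores counter wraparound for brevity).
import time

ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # cumulative microjoules

def read_uj():
    with open(ENERGY) as f:
        return int(f.read())

e0, t0 = read_uj(), time.time()
time.sleep(1.0)
e1, t1 = read_uj(), time.time()

print(f"~{(e1 - e0) / 1e6 / (t1 - t0):.2f} W (package-0)")

For the system-level view, DCMI reports a comparable number over IPMI, e.g. with ipmitool dcmi power reading.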

Understanding the New FABRIC Layer 3 Connection

Multi-site experiments with FABRIC: Sometimes Layer 3 is all you need

Our new blog post covers the FABRIC Layer 3 connection, a recent enhancement to Chameleon's testbed capabilities. Learn about the flexible networking options it offers, how it simplifies routing to FABRIC resources, and how it enables low-latency, high-bandwidth traffic between CHI@UC and CHI@TACC.

Using Terraform with Chameleon

Declarative Orchestration Examples

Terraform is both a command-line tool and a configuration language for building, changing, and versioning resources from various Infrastructure as a Service (IaaS) providers. Pre-existing providers integrate with the major cloud platforms, both proprietary and open source.

In particular, since Terraform natively supports OpenStack, it will also work with Chameleon :)

If you have a complex configuration, involving multiple nodes and networks, across one or more Chameleon Sites, defining them in a declarative format can be easier than creating them imperatively.

The examples from this post show how to provision instances, networks, and floating IPs across multiple Chameleon …
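
For contrast, here is a rough sketch of the imperative route using the openstacksdk Python client; the cloud profile name and the image, flavor, and network names below are hypothetical placeholders:

# create_node.py: imperatively boot one instance via openstacksdk.
import openstack

conn = openstack.connect(cloud="chameleon")  # credentials come from clouds.yaml

server = conn.create_server(
    name="demo-node",
    image="CC-Ubuntu22.04",  # hypothetical image name
    flavor="baremetal",      # hypothetical flavor name
    network="sharednet1",    # hypothetical network name
    wait=True,               # block until the instance is ACTIVE
)
print(server.status)

Multiply this by several nodes, networks, and floating IPs, and the appeal of Terraform becomes clear: the same resources are declared once in a .tf file, and terraform apply reconciles the cloud to match.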

How to Port Your Experiments Between Chameleon Sites

Best practices for using resources across multiple sites

Chameleon's resources are distributed across multiple sites. If you'd like to move your work between sites, say to take advantage of different hardware or to find available nodes, good news: it's pretty easy, and this post spells out the details.

Fugaku Nodes now at CHI@TACC

Interested in the Fugaku Supercomputer? We now have 8 Fugaku nodes (Fujitsu FX700) available at CHI@TACC! Each of these nodes has a 48-core ARM A64FX CPU, 32 GiB of HBM2 memory, 512 GB of NVMe storage, and HDR100 InfiniBand. Notably, the high-bandwidth memory and non-x86 architecture are hard to find in other systems. The TACC Frontera team originally procured these nodes for evaluation, but they're now available for general use in Chameleon.

CHI-in-a-Box Update

Did you ever wonder what makes your favorite testbed “go”? The answer is CHameleon Infrastructure, or CHI for short, packaged as CHI-in-a-Box so that anybody can run their own testbed. We blogged about it a year or so ago, and a lot can change in a year, so this blog brings you some important updates. Not least, there is now a paper on CHI-in-a-Box so you can join the testbed as an Associate Site!

IndySCC 2021 on Chameleon

This year Chameleon hosted the IndySCC competition, analogous to the in-person Student Cluster Competition (SCC) at Supercomputing. Teams use cloud/shared resources to optimize a variety of HPC workloads in order to complete the most computations during a 48-hour final, all while staying below a strict power cap. Learn more about the competition, Chameleon's support, and where you can see the results.

New Experiments with New CHI@UC Hardware

The recently released Chameleon Phase 3 hardware will support new experiments in networking, disaggregated hardware, and more. Learn more about the different types of hardware and what kind of experiments they're best suited for!
