Dear Chameleon Users,
We are kicking off the first month of 2024 by focusing on improving Chameleon services, and we would like your help identifying what else to improve.
Give us your feedback! There are only a few days left to respond to the Chameleon User Survey: the deadline is February 6. This survey is your opportunity to voice your needs, experiences, and suggestions, which directly shape the development of new hardware features and capacity improvements on Chameleon. So far, the main feedback we’ve heard is that users are very interested in more GPUs. If there are other special hardware requests that would aid in your research, please tell us. We are also looking to redesign the interface to python-chi, and if you have any suggestions, please let us know via the help desk.
Improved CC Images. As you know, we provide a core set of supported images for use in your experiments on the testbed, which come with testbed utilities like cc-snapshot or special libraries like CUDA. This month, we did a major overhaul of our Ubuntu images and their variants. Notably, we removed many outdated, infrequently used dependencies, which reduces the image size significantly. The removal mainly affects the uncommon usage of TripleO. Additionally, the following changes have been made:
- Fixed issues with nvidia-smi not showing GPUs on A100 nodes
- Resolved network connection failures on nodes with QLogic or Broadcom 25G interfaces
- Increased reliability of NTP time synchronization at CHI@TACC, authentication for cc-snapshot and cc-cloudfuse, and loading additional SSH public keys
- Corrected cases where the SSH daemon would fail to start, and where the serial console did not automatically log in
- Fixed DNS and other network timeouts when running “docker build”
- Baremetal nodes now have a “cloud-init ConfigDrive”, and their network config is now set up via cloud-init, the same way as for instances on KVM. This should give a more reliable experience when configuring nodes with multiple network interfaces. Notably, on Ubuntu images, this network configuration is now under `/etc/netplan`, rather than under `/etc/network/interfaces`.
- firewalld has replaced ufw. There were issues with ufw and docker iptables rules that required this change.
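If you have provisioning scripts that edit network settings, the move to `/etc/netplan` on Ubuntu images may require them to check which layout is present before making changes. The following is a minimal sketch of such a check, not part of any Chameleon tooling: the function name `detect_network_config` and its return values are hypothetical, and it assumes that netplan configuration appears as `.yaml` files in `/etc/netplan`.

```python
from pathlib import Path


def detect_network_config(root: str = "/") -> str:
    """Guess which network configuration layout exists under `root`.

    Returns "netplan", "ifupdown", or "unknown". The function name and
    return labels are illustrative assumptions, not a Chameleon API.
    """
    base = Path(root)

    # Assumption: netplan configuration is stored as YAML files
    # directly inside /etc/netplan.
    netplan_dir = base / "etc" / "netplan"
    if netplan_dir.is_dir() and any(netplan_dir.glob("*.yaml")):
        return "netplan"

    # Older layout: a single ifupdown-style interfaces file.
    if (base / "etc" / "network" / "interfaces").is_file():
        return "ifupdown"

    return "unknown"


if __name__ == "__main__":
    # On a node, this inspects the real filesystem root.
    print(detect_network_config())
```

The `root` parameter exists only so the check can be tried against a scratch directory before running it on a node.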
If you are interested in technical details of what to expect on one of these images, you can read this document with all of our customizations. These updates are deployed at CHI@UC, CHI@TACC, and KVM@TACC, and we are working with associate sites to roll out these updates. As always, if you have any issues with these images, or anything unexpected is happening on your nodes, please contact the help desk.
CHI@Edge device enrollment improvements. This month we rolled out some improvements to the Bring Your Own Device (BYOD) capability of CHI@Edge, so enrolling devices should now be more reliable. We are working hard on rounding out other much-needed CHI@Edge features and will share them next month, so stay tuned for more!
CHI@UC Outage. Lastly, we’d like to remind you that on Tuesday, February 6, CHI@UC is undergoing total site maintenance.