Outage start | Monday, June 06, 2022 8 a.m.
Expected end | Tuesday, June 07, 2022 6 p.m.
Update 5:30 pm: Issues are resolved, and all nodes are usable again.
Update 4 pm June 7th: Provisioning of bare metal nodes is restored. We're seeing failures to create leases for P2 nodes (types compute_skylake, gpu_rtx_6000), but reservation of P3 nodes is succeeding.
Outage start | Thursday, March 24, 2022 10:25 a.m.
Expected end | Thursday, March 24, 2022 12 p.m.
Update: This has been resolved as of 11:42 AM, and the site is back up. Running nodes should not have been affected, aside from the temporary loss of network connectivity.
CHI@UC is currently down due to a failure of the controller node's load balancer. We will update here with more information.
Outage start | Tuesday, March 01, 2022 4:04 p.m.
Expected end | Sunday, May 01, 2022 4:04 p.m.
Update: Connectivity has been restored. The root cause was a software bug that prevented the creation of a PVST instance on the switch when a large number of VLANs were configured. Using a single instance for all VLANs restored functionality.
The 1G switch serving out-of-band access for nodes in rack BG-41 has encountered a (so far) unrecoverable software error, preventing traffic to the out-of-band interface on nodes P3-CPU-020 to P3-CPU-038.