| Outage start | Monday, June 06, 2022 8 a.m. |
| Expected end | Tuesday, June 07, 2022 6 p.m. |
Update 5:30 pm: Issues are resolved, and all nodes are usable again.
Update 4 pm June 7th: Provisioning of bare-metal nodes is restored. We're seeing failures to create leases for P2 nodes (types compute_skylake, gpu_rtx_6000), but reservation of P3 nodes is succeeding.
| Outage start | Thursday, March 24, 2022 10:25 a.m. |
| Expected end | Thursday, March 24, 2022 12 p.m. |
Update: This has been resolved as of 11:42 AM, and the site is back up. Running nodes should not have been affected, aside from the temporary loss of network connectivity.
CHI@UC is currently down due to a failure of the controller node's load balancer. We will update here with more information.
| Outage start | Tuesday, March 01, 2022 4:04 p.m. |
| Expected end | Sunday, May 01, 2022 4:04 p.m. |
Update: Connectivity has been restored. The root cause was a software bug, triggered by the large number of configured VLANs, that prevented the creation of a PVST instance on the switch. Using a single instance for all VLANs restored functionality.
The 1G switch serving out-of-band access for nodes in rack BG-41 has encountered a (so far) unrecoverable software error, preventing traffic to the out-of-band interface on nodes P3-CPU-020 to P3-CPU-038.
| Outage start | Monday, February 21, 2022 3 p.m. |
| Expected end | Tuesday, February 22, 2022 11:17 a.m. |
Update 11:16 CST: This should now be resolved. A forwarding loop in the underlying network topology caused some ports to be shut down. Instance provisioning and floating IPs should now be working again. Please reach out if you're still seeing issues on the UC site.
We're currently observing networking issues at UC. New instances are failing to provision, and existing ones are unreachable. We're still investigating the root cause, but will update here when resolved. Other sites are unaffected.