Outage start | Friday, January 10, 2020 4 p.m. |
Expected end | Monday, January 13, 2020 9 a.m. |
We experienced a networking outage across our TACC cluster starting Friday at approximately 4pm. Our internal DHCP service at CHI@TACC stopped responding, so when the DHCP leases for nodes within an experiment expired, those nodes were effectively disconnected from the network. This made experiment nodes unreachable via SSH, though the Chameleon portal and user interfaces remained operational. CHI@UC was unaffected.
Connectivity was largely restored on Sunday evening at 4pm and was fully restored by 9am on Monday morning.