Reported Outages

Upcoming Maintenance window for Chameleon Auth server

Resolved Posted by Mark Powers on August 22, 2024
Outage start Tuesday, August 27, 2024 9:30 a.m.
Expected end Tuesday, August 27, 2024 9:35 a.m.

On the morning of Tuesday, August 27th, there will be a brief outage affecting login to all Chameleon sites and services at 9:30 AM central time.

We expect a 5-10 minute outage while we apply updates to the service that handles federated login.
This won't affect any running workloads or nodes, but you may need to refresh your browser once the outage ends.

DNS outage for chi.uc.chameleoncloud.org

Resolved Posted by Michael Sherman on August 22, 2024
Outage start Thursday, August 22, 2024 2:12 p.m.
Expected end Thursday, August 22, 2024 2:50 p.m.

At 2:12 PM, a DNS issue took down access to chi.uc.chameleoncloud.org. We have rolled back the responsible change, and the site should come back online within the next 30 minutes.

CHI@TACC Maintenance window for OpenStack version upgrade

Resolved Posted by Cody Hammock on August 07, 2024
Outage start Wednesday, August 21, 2024 8 a.m.
Expected end Tuesday, August 27, 2024 9:44 a.m.

RESOLVED: CHI@TACC maintenance is complete.

UPDATE: The OpenStack upgrade is complete. Staff are aware of some potential issues launching new instances and are working to resolve them. Existing instances are unaffected and are once again available.

Starting at 8 AM central time on August 21st, CHI@TACC will be unavailable while we upgrade the core OpenStack services on the controller hosts. During this time, the chi.tacc.chameleoncloud.org webpage and APIs will be inaccessible, as will network connectivity to any running instances.

CHI@UC: some rtx_6000 nodes not provisioning

Resolved Posted by Michael Sherman on August 05, 2024
Outage start Thursday, August 01, 2024 6 p.m.
Expected end Tuesday, August 06, 2024 6 p.m.

We're observing intermittent provisioning failures for a small set of nodes at CHI@UC, all of which are "phase 2" nodes, mostly rtx_6000s.

The nodes currently known to be affected are listed below, and have been placed into a non-reservable maintenance mode until we can resolve the issue.

CHI@UC Lease enforcement

Resolved Posted by Michael Sherman on July 25, 2024
Outage start Thursday, July 25, 2024 12 p.m.
Expected end Thursday, July 25, 2024 6 p.m.

At CHI@UC, users submitting a request for a new lease have been intermittently receiving an "enforcement failed" error message. After we restarted a relevant service, lease requests are succeeding again.

We suspect this issue is related to a service token expiring and not being correctly renewed, and are investigating a proper fix.

Upcoming Maintenance window for Chameleon Auth server

Resolved Posted by Mark Powers on July 16, 2024
Outage start Tuesday, July 23, 2024 9:30 a.m.
Expected end Tuesday, July 23, 2024 9:35 a.m.

On the morning of Tuesday, July 23rd, there will be a brief outage affecting login to all Chameleon sites and services at 9:30 AM central time.

We expect a 5-10 minute outage while we apply updates to the service that handles federated login.
This won't affect any running workloads or nodes, but you may need to refresh your browser once the outage ends.

CHI@UC Maintenance window for OpenStack version upgrade

Resolved Posted by Michael Sherman on July 02, 2024
Outage start Monday, July 22, 2024 9 a.m.
Expected end Tuesday, July 23, 2024 1 p.m.

As of 1 PM on July 23rd, reservation enforcement is fixed, and we're declaring the CHI@UC outage over.
You may notice some minor changes to the Horizon dashboard as we fix some remaining UI issues in the instance launch dialog, but these should not affect any functionality.

Upcoming Maintenance window for Chameleon Auth server

Resolved Posted by Mark Powers on June 11, 2024
Outage start Tuesday, June 18, 2024 10 a.m.
Expected end Tuesday, June 18, 2024 10:06 a.m.

UPDATE: All services are updated and should be working as expected.

On the morning of Tuesday, June 18th, there will be a brief outage affecting login to all Chameleon sites and services at 9:30 AM central time.

We expect a 5-10 minute outage while we apply updates to the service that handles federated login.
This won't affect any running workloads or nodes, but you may need to refresh your browser once the outage ends.

CHI@UC Uplink networking

Resolved Posted by Michael Sherman on May 29, 2024
Outage start Wednesday, May 29, 2024 10:28 a.m.
Expected end Wednesday, May 29, 2024 11:28 a.m.

CHI@UC had a brief interruption in connectivity, preventing access to chi.uc.chameleoncloud.org. The issue manifested as a failure of the UC control-plane servers to contact the Chameleon authentication server, and was triggered by work elsewhere in the network that was expected to be unrelated.

Upon discovering the issue, the other network changes were backed out, and service has been restored.