Reported Outages | Chameleon

CHI@EVL maintenance window April 3rd

Resolved Posted by Michael Sherman on March 22, 2023

Outage start	Monday, April 03, 2023 11 a.m.
Expected end	Wednesday, April 05, 2023 4 p.m.

On April 3rd, CHI@EVL will be down while we replace the controller node. By a conservative estimate, services should be restored in about 4 hours. Running instances and leases won't be modified, but they will not be accessible during the outage window.

Temporary outage for object store at CHI@UC

Resolved Posted by Michael Sherman on March 15, 2023

Outage start	Thursday, March 16, 2023 1 p.m.
Expected end	Thursday, March 16, 2023 1:30 p.m.

Update 5PM CT: Object store is back online via a workaround.
There will be a blip tomorrow at 1PM so we can test a permanent fix for the issue that triggered this outage.

User portal and lease/allocation system scheduled maintenance 02/28

Resolved Posted by Adam Cooper on February 14, 2023

Outage start	Tuesday, February 28, 2023 10 a.m.
Expected end	Tuesday, February 28, 2023 11 a.m.

The user portal will be down for about an hour for scheduled maintenance on 02/28.

CHI@IIT currently down

Resolved Posted by Michael Sherman on January 27, 2023

Outage start	Friday, January 27, 2023 5:01 p.m.
Expected end	Monday, January 30, 2023 5:14 p.m.

CHI@IIT is back up, but we're still waiting for replacement hardware to arrive. Currently, bringing the site back online requires in-person action, so you may observe instability until that hardware is installed. We plan to complete this work by the end of this week, subject to parts availability.

Network maintenance at TACC January 25, 2023

Resolved Posted by Cody Hammock on January 25, 2023

Outage start	Wednesday, January 25, 2023 11 a.m.
Expected end	Wednesday, January 25, 2023 4:10 p.m.

COMPLETE: The work is complete. Please let us know via the helpdesk if you encounter any ongoing issues.

In order to perform some necessary network maintenance, there will be brief interruptions to compute instances for CHI@TACC and KVM@TACC.

CHI@Edge public IPs unavailable

Resolved Posted by Michael Sherman on January 18, 2023

Outage start	Wednesday, January 18, 2023 5:35 p.m.
Expected end	Tuesday, January 31, 2023 5:11 p.m.

Current status:
Stability issues are resolved. We observed and fixed deadlocks in both container launches and lease creation, caused by an upstream eventlet bug.
Public Floating IPs are now functional again. The network operated by our infrastructure provider was filtering some MAC addresses, which prevented bridged networks from working.

Scheduled JupyterHub maintenance

Resolved Posted by Mark Powers on December 22, 2022

Outage start	Thursday, January 05, 2023 9 a.m.
Expected end	Thursday, January 05, 2023 7:17 p.m.

We will be bringing down JupyterHub at 9:00 AM on January 5 for scheduled maintenance.

UPDATE: The upgrade is now complete. If you encounter any issues with Jupyter or Trovi, please let us know via the helpdesk.

Scheduled outage affecting stitching from CHI@UC to FABRIC

Resolved Posted by Michael Sherman on December 09, 2022

Outage start	Wednesday, December 21, 2022 10:30 a.m.
Expected end	Wednesday, December 21, 2022 2:30 p.m.

On Wednesday, December 21st, from 10:30am to 2:30pm, VLANs 3300-3309 connecting CHI@UC to FABRIC will be down due to emergency maintenance on an upstream router. This maintenance was delayed from Dec 13th due to parts availability.

Authentication Outage for CHI@TACC

Resolved Posted by Cody Hammock on December 07, 2022

Outage start	Wednesday, December 07, 2022 6:30 a.m.
Expected end	Wednesday, December 07, 2022 10:20 a.m.

Resolved: The authentication issue for CHI@TACC has been resolved.

Starting at approximately 6:30 AM Central Time, CHI@TACC has been experiencing an outage related to authentication of the web interface. CLI access is not affected. Staff are working to resolve the issue.

KVM@TACC instance launch issues

Resolved Posted by Michael Sherman on November 23, 2022

Outage start	Wednesday, November 16, 2022 10:40 a.m.
Expected end	Wednesday, November 30, 2022 10:30 a.m.

We are observing intermittent timeouts accessing the dashboard and launching instances on KVM@TACC.

This appears to be linked to higher-than-expected load on a database node. We are investigating solutions.

Running instances should not be affected.