Tickets of the Year: Solutions to Your 2020 (Ticket) Problems

Is your instance not launching? Are your Floating IPs drifting aimlessly through the ether? Do you have a PI eligibility request? Chameleon tickets are the fastest way to reach the Chameleon support team and receive assistance for all your testbed needs. It’s 2020. Everyone could use a little extra help. 

As 2021 and Oscars season approach, the Chameleon team has compiled the “Tickets of the Year” to help you avoid (at least some of) the stumbling blocks of 2020. Read on to learn about some of the most common tickets, their solutions, and some special ticket award categories. You can always reach out to the Help Desk team for white-glove troubleshooting help. 

 

Without further ado, the Oscar for the Most Often Encountered Error goes to… “No Host Found” Error

Problem: Are you frequently encountering the “No Host Found” error when trying to create a lease? While COVID might prevent you from hosting a typical Christmas, with this fix you’ll always be able to find a host on Chameleon.

Solution: This error occurs when all the nodes of the type you’re trying to reserve are already reserved for the time frame you want. While we can’t free them up for you, you can visit the Host Calendar, select the type of resource you want to experiment on, and see which time frames are reserved and which are available. Also pay attention to the site you’re reserving resources on (for example, UChicago versus TACC), because each site has different resources and different availability. 
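Conceptually, the check behind “No Host Found” is an interval-overlap test: a node can only be reserved if none of its existing reservations overlap the window you ask for. The function below is a minimal sketch of that idea, not Blazar’s actual scheduling logic. `window_is_free` is a hypothetical name, times are assumed to be Unix epoch seconds, and windows are assumed to be half-open (a reservation ending at 12:00 does not block one starting at 12:00).

```shell
#!/bin/sh
# Hypothetical helper, sketched for illustration only.
# Usage: window_is_free REQ_START REQ_END < reservations
#   reservations: one "START END" pair per line (Unix epoch seconds)
# Returns 0 if the requested window overlaps none of the reservations.
window_is_free() {
  req_start=$1
  req_end=$2
  while read -r res_start res_end; do
    # Skip blank lines
    [ -z "$res_start" ] && continue
    # Half-open intervals [a,b) and [c,d) overlap when a < d and c < b (assumed semantics)
    if [ "$req_start" -lt "$res_end" ] && [ "$res_start" -lt "$req_end" ]; then
      return 1
    fi
  done
  return 0
}
```

For example, `printf '%s\n' '1000 2000' '3000 4000' | window_is_free 2000 3000` succeeds because the requested window fits exactly in the gap between the two reservations.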

 

Category: Most Wanted: The Problem Report

Winner: The Help Desk Email: What Should I Include?

Problem: You probably haven’t been in an office since the beginning of 2020. Whether you’ve mastered email writing or still rethink every sentence, we’ve got you covered. Here’s everything you can include to help us resolve your problem faster and get you back to experimenting. 

Solution: More information is always better! Attachments, error messages, outputs - include it all. That being said, here are some key points to make sure to include: 

  • Project Name 

  • Username

  • UUID of the Instance or Lease (if applicable): To find the UUID in the GUI, click the name of your instance or lease to open its details page (for leases, ‘Lease Detail’). The UUID is listed below ‘Name’, under the heading ‘Id’.
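If you work from the command line, the same IDs can be looked up with `openstack server show <instance_name> -f value -c id`, or read from the `id` field printed by `blazar lease-show <lease_name>`. The sketch below is a hypothetical helper (not a Chameleon tool) that prints the three key details in a copy-paste-ready block and refuses a value that does not look like a UUID in the usual 8-4-4-4-12 hexadecimal form.

```shell
#!/bin/sh
# Hypothetical helper: format the key details for a Help Desk ticket.
# Usage: ticket_details PROJECT USERNAME [UUID]
ticket_details() {
  project=$1
  user=$2
  uuid=${3:-}
  # Reject values that are not in the 8-4-4-4-12 hexadecimal UUID form
  if [ -n "$uuid" ] &&
     ! printf '%s\n' "$uuid" | grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
    echo "warning: '$uuid' does not look like a UUID" >&2
    return 1
  fi
  printf 'Project Name: %s\n' "$project"
  printf 'Username: %s\n' "$user"
  # "n/a" when the ticket is not about a specific instance or lease
  printf 'UUID: %s\n' "${uuid:-n/a}"
}
```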

Just as a reminder, the Help Desk button is available on the Chameleon webpage, right next to your username. When you’re in the experimental portal (CHI@UC, CHI@TACC, CHI@NU), the Help Desk button is in the drop-down menu that appears when you click your username.

 

Category: The Most Avoidable Problem

Winner: Launching with Specialized Nodes

Problem: Chameleon offers a variety of hardware (explore the options); one common difficulty our users encounter is launch failures on specialized nodes: GPU, FPGA, or ARM64. Usually, users reserve the specialized node but then try to launch a standard image rather than a specialized one. This typically causes the launch to fail, and users then report being unable to ssh into the instance.

Solution: Luckily, there’s a quick fix for this difficulty. Simply use the specialized image that corresponds to the node you’re reserving. 

For GPU: use CUDA-specific images like CC-CentOS7-CUDA10

For FPGA: use FPGA-specific images like CC-CentOS7-FPGA

For ARM64: use ARM64-specific images like CC-Ubuntu16.04-ARM64

Note: You may encounter a similar issue if you launch a specialized image on a non-specialized node, for example, an FPGA image on a Skylake or Haswell node. 
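To make the node/image pairing explicit in a launch script, you could map node types to images before launching. This is a hypothetical helper: the node-type labels (`gpu`, `fpga`, `arm64`) are assumptions, and the image names are just the examples above, so check your site’s image catalog for current versions.

```shell
#!/bin/sh
# Hypothetical helper: pick a specialized image for a specialized node type.
# Image names are the examples from this post; newer versions may exist.
image_for_node_type() {
  case "$1" in
    gpu)   echo "CC-CentOS7-CUDA10" ;;
    fpga)  echo "CC-CentOS7-FPGA" ;;
    arm64) echo "CC-Ubuntu16.04-ARM64" ;;
    *)
      # Standard nodes (e.g. Haswell, Skylake) should use a standard image instead
      echo "no specialized image for node type '$1'" >&2
      return 1
      ;;
  esac
}
```

The result could then feed a launch along the lines of `openstack server create --image "$(image_for_node_type gpu)" --flavor baremetal --key-name <key_name> --nic net-id=<network_id> --hint reservation=<reservation_id> <instance_name>`, with the placeholders taken from your own key pair and lease.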

 

Category: The Largest Problem 

Winner: Large Scale Network Deployment

Problem: Large-scale network experiments are often performance sensitive. However, the location of the nodes and racks you use in your experiment can significantly affect performance and lead to asymmetric network behavior. If your experiment is performance sensitive, mitigating this kind of behavior is crucial.

Solution: Use nodes that are in the same section of a rack. If you’re seeing asymmetric network behavior, it is likely because your nodes are at different locations on a rack. The bottom and the top of a rack each have their own switch, with an interconnect between them, which explains the difference in networking behavior.

  1. To choose your nodes, use the host calendar under the Leases tab on your Chameleon portal page to identify free nodes in the same section of the rack. 

  2. After identifying enough free nodes, you can create a lease with specific nodes using the CLI and this guide. A Chameleon blog post also highlights how to allocate nodes on the same rack. 

For example, to create a lease with multiple specific nodes, you can adapt the following command (currently set for 2 nodes): 

blazar lease-create --physical-reservation min=2,max=2,resource_properties='["or",["=","$name","<node_name_1>"],["=","$name","<node_name_2>"]]' <your_lease_name>
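Typing the filter by hand gets error-prone beyond a couple of nodes. The sketch below is a hypothetical helper that builds the same `resource_properties` expression for any number of node names; `$name` is kept literal because it is the property the filter matches on in the command above. Emitting a bare equality term for a single node is an assumption (it is the same form as each inner term of the `or`).

```shell
#!/bin/sh
# Hypothetical helper: build ["or",["=","$name","n1"],["=","$name","n2"],...]
# The single quotes keep "$name" literal; only the node names are substituted.
node_name_filter() {
  if [ "$#" -eq 1 ]; then
    # One node: a plain equality term, without the "or" wrapper
    printf '["=","$name","%s"]\n' "$1"
    return
  fi
  filter='["or"'
  for node in "$@"; do
    filter="$filter"',["=","$name","'"$node"'"]'
  done
  printf '%s]\n' "$filter"
}
```

With the node names in a space-separated variable, for example `nodes="<node_name_1> <node_name_2> <node_name_3>"`, you might run `blazar lease-create --physical-reservation min=3,max=3,resource_properties="$(node_name_filter $nodes)" <your_lease_name>`, leaving `$nodes` unquoted so it splits into separate names and keeping min/max equal to the number of nodes.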

 

Category: The Most Annoying

Winner: CloudFuse Breaking Up

Problem: Experiencing trouble with CloudFuse for object storage and migration? CloudFuse can be unstable and may have trouble uploading large files (especially files above 4 GB). 

Solution: One simple, quick solution is to try remounting: 

  1. Download your v2 or v3 RC file from the Chameleon web interface (click your username at the top right).

  2. Create a file in your home directory (/home/cc/openrc) on the instance and copy the contents of the RC file into it.

  3. Source the file: $ source openrc

  4. Now you need to mount the object store. It's usually better to unmount it first:

    1. $ cc-cloudfuse unmount my_mounting_point

    2. $ cc-cloudfuse mount my_mounting_point
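Steps 3 and 4 can be combined into one function so that a remount is a single command. This is a minimal sketch, not a Chameleon-provided script: `remount_object_store` and the `CLOUDFUSE_CMD` override are hypothetical names, and it assumes the RC file was saved at /home/cc/openrc as in step 2. If your RC file prompts for a password when sourced, the function will prompt too.

```shell
#!/bin/sh
# Hypothetical helper: reload credentials, then unmount and remount CloudFuse.
# Usage: remount_object_store MOUNT_POINT [RC_FILE]
# CLOUDFUSE_CMD is a hypothetical override, e.g. "echo" for a dry run.
remount_object_store() {
  mount_point=$1
  rc_file=${2:-/home/cc/openrc}
  cloudfuse=${CLOUDFUSE_CMD:-cc-cloudfuse}

  # Check first: in sh, sourcing a missing file can abort the whole shell
  if [ ! -r "$rc_file" ]; then
    echo "cannot read RC file: $rc_file" >&2
    return 1
  fi

  # Step 3: load your OpenStack credentials into this shell
  . "$rc_file"

  # Step 4: unmount first; ignore the error if nothing is mounted yet
  $cloudfuse unmount "$mount_point" || true
  $cloudfuse mount "$mount_point"
}
```

A dry run such as `CLOUDFUSE_CMD=echo remount_object_store my_mounting_point` prints the two cc-cloudfuse calls instead of running them.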

While remounting is a good fix, if stability is important to you, try working with Swift, the OpenStack Object Store, directly. Though Swift requires more scripting knowledge, it’s more stable, and you can automate your transfers with it! Check out the Swift Large Object documentation to learn more. 
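Swift’s Large Object support works by splitting a big file into segments and uploading a manifest that ties them together. As a rough planning aid, the sketch below (a hypothetical helper) computes how many segments a file of a given size produces; the 1 GiB segment size in the example is an arbitrary choice, not a Chameleon recommendation.

```shell
#!/bin/sh
# Hypothetical helper: number of segments for a Swift Large Object upload.
# Usage: segment_count FILE_SIZE_BYTES SEGMENT_SIZE_BYTES
segment_count() {
  size=$1
  seg=$2
  # Ceiling division: a partial final segment still counts as one segment
  echo $(( (size + seg - 1) / seg ))
}
```

For a file of 4 GiB plus one byte, `segment_count 4294967297 1073741824` prints 5. The python-swiftclient command line can do the segmenting for you, for example `swift upload --segment-size 1073741824 <container> <file>`, where the segment size is given in bytes.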

 

Category: Easiest to Fix on Your Own

Winner: It’s Raining Errors: Leases in Error State

Problem: Are your leases falling into an error state? This can happen when a requested change to a lease cannot be applied, usually because it does not meet Chameleon’s requirements, for example, an extension that is too long. Luckily, there’s an easy fix to get the lease back up and running.

Solution: To reset your lease, it just needs to be ‘touched’. You can do this in the GUI, or in the CLI with the blazar lease-update command: 

In the GUI: 

  1. Click update lease

  2. Set the name to what it already is

  3. Click save!

In the CLI: 

Simply use this command: $ blazar lease-update <lease_uuid> --name <current_lease_name>
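If you would rather not type the current name by hand, it can be read back first and passed straight to lease-update. This sketch assumes your `blazar` client accepts the standard `-f value -c <column>` output options on `lease-show` (it is built on the cliff framework, which provides them); `touch_lease` and the `BLAZAR_CMD` override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: "touch" a lease by setting its name to its current name.
# Usage: touch_lease LEASE_ID_OR_NAME
# BLAZAR_CMD is a hypothetical override (e.g. for testing without credentials).
touch_lease() {
  lease=$1
  blazar_cmd=${BLAZAR_CMD:-blazar}
  # Read the lease's current name (assumes cliff's -f/-c output options)
  name=$($blazar_cmd lease-show -f value -c name "$lease") || return 1
  # Re-apply the same name: this update is the "touch" described above
  $blazar_cmd lease-update "$lease" --name "$name"
}
```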

 

Category: Most Overlooked Fix

Winner: Connectivity: Pinging Your Instance

Problem: While 2020 is the poster child for connectivity issues, don’t let that be the case with your Chameleon leases! Look no further if you’re having trouble pinging your instance or ssh-ing in.

Solution: Chameleon’s GUI gives you a remote console to your instance. From there, you can check whether the instance finished deploying and, if it did not, see what failed. To open the console from your Chameleon portal page, go to ‘Compute’ > ‘Instances’, click the instance name, and select the ‘Console’ tab. It may take a minute to load, but then you’ll be able to troubleshoot your connectivity issues yourself and see why you can’t connect to your instance!
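The console shows what is happening inside the instance; from the CLI, `openstack console log show <instance_name>` prints its text boot log. A complementary check from the outside is whether anything answers on the SSH port (22) at the instance’s Floating IP. The sketch below is a hypothetical helper that relies on bash’s /dev/tcp redirection and the coreutils `timeout` command; a failure means the port is closed or filtered, or the instance is unreachable, which is your cue to open the console.

```shell
#!/bin/sh
# Hypothetical helper: is anything accepting TCP connections on HOST:PORT?
# Usage: ssh_port_open HOST [PORT]   (PORT defaults to 22)
# Requires bash (for /dev/tcp) and the coreutils `timeout` command.
ssh_port_open() {
  host=$1
  port=${2:-22}
  # Try to open a TCP connection; give up after 5 seconds
  timeout 5 bash -c ': < "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `ssh_port_open <floating_ip> && ssh cc@<floating_ip>` only attempts the SSH login once the port is answering.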

 

Category: Commonly Misused Feature

Winner: PI Requests 

Problem: While it's tempting to want to be a PI - and have all the power and responsibility that accompany it - there are a few logistical requirements to check off first. 

Solution: To be a PI, you must belong to one of the following groups: 

  1. Faculty members of an academic institution 

  2. Research staff of a federally funded lab, center, or agency

  3. Researchers at a museum, observatory, library, etc. 

  4. NSF Graduate Student Fellows

  5. State educational offices or organizations

Each group is explained more fully in the Chameleon FAQ. If you believe your situation warrants an exception, Chameleon occasionally grants exceptions on a case-by-case basis. 

 

One last reminder before Chameleon signs off for the holidays: the Help Desk button is available on the Chameleon webpage, right next to your username. When you’re in the experimental portal (CHI@UC, CHI@TACC, CHI@NU), the Help Desk button is in the drop-down menu that appears when you click your username. No ticket is too small, no problem too complicated, and now you have everything you need to write the perfect Help Desk ticket.

 

That’s all, folks. We’ll see you next year!

