Extending Your Research Artifacts' Lifespan

How to Preserve Your Valuable Data on Chameleon Cloud

Understanding how to preserve your valuable research on Chameleon Cloud is crucial for research continuity and community contribution. Here's how to extend the lifespan of your resources through smart public sharing.

Understanding Chameleon's Resource Retention Policy

Chameleon's retention policy varies significantly based on resource type and visibility. The table below shows the retention period (defined as the period occurring after allocation expiration) for various Chameleon resource types:

 

Resource Type                                       | Retention Period
KVM instances                                       | Auto-shutdown and shelving after 48 hours, deleted after 1 year
Bare metal instances, leases, networks              | Deleted immediately
Private resources (images, volumes, object stores)  | Retained for 1 year
Public resources (images, volumes, object stores)   | Retained for 5 years
SSH keys                                            | Retained for the project lifetime
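To make the table concrete, here is a minimal Python sketch that estimates when a resource would be deleted, given the date its allocation expires. The resource-type keys and the deletion_date helper are hypothetical names chosen for illustration, and treating "1 year" and "5 years" as calendar years is an assumption; Chameleon's actual cleanup schedule may differ.

```python
from datetime import date

# Retention in years after allocation expiration, taken from the table above.
# 0 means deleted immediately; None means kept for the project lifetime.
# The dictionary keys are hypothetical labels, not Chameleon identifiers.
RETENTION_YEARS = {
    "kvm_instance": 1,
    "bare_metal_instance": 0,
    "lease": 0,
    "network": 0,
    "private_resource": 1,
    "public_resource": 5,
    "ssh_key": None,
}


def deletion_date(resource_type, allocation_expiry):
    """Return the estimated deletion date, or None if tied to the project lifetime."""
    years = RETENTION_YEARS[resource_type]
    if years is None:
        return None
    target_year = allocation_expiry.year + years
    try:
        return allocation_expiry.replace(year=target_year)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return allocation_expiry.replace(year=target_year, day=28)


expiry = date(2025, 6, 30)
print(deletion_date("private_resource", expiry))  # 2026-06-30
print(deletion_date("public_resource", expiry))   # 2030-06-30
```

The same image or volume gains four extra years simply by changing its visibility before the allocation expires.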

 

Public resources are stored for free for 5 years to ensure persistent access to valuable research artifacts. As a testbed dedicated to reproducible science, we encourage public sharing to bolster transparency in CS research.

Make Your Work Public: The #1 Strategy for Long-Term Retention

The most important takeaway from Chameleon's retention policy is this: public resources are kept for 5 years, while private resources are retained for at most 1 year. The longer period keeps valuable research artifacts available for further validation, reproducibility, and new research avenues.

This difference makes sharing your work publicly the single most effective strategy for preserving your research artifacts and contributing to scientific reproducibility.

Benefits of Making Your Resources Public

  • Extended retention (5 years vs. 1 year) - Your work remains accessible five times longer
  • Research reproducibility - Other researchers can verify and build upon your results
  • Community impact - Your contributions can help advance work in your field
  • Visibility for your research - Public resources can lead to citations and collaborations. (Hint: You can even track your artifact usage through our Trovi service.)
  • Scientific legacy - Your work remains available even after your project concludes

What Types of Resources Should You Make Public?

Not all resources are suitable for public sharing. Here's what to consider making public:

Highly Recommended for Public Sharing

  • Custom images with research software stacks; experiment environments; Heat templates
  • Datasets that could benefit other researchers (non-sensitive) or that are needed for reproducibility
  • Experiment configurations that demonstrate novel approaches
  • Benchmarking results for particular hardware configurations
  • Jupyter Notebooks with your experiment orchestration and/or results analyses

Less Suitable for Public Sharing

  • Resources containing sensitive or proprietary information
  • Incomplete or untested configurations
  • Temporary or intermediary resources
  • Resources with hardcoded credentials
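Before flipping a resource to public, it can help to scan its files for anything that looks like a credential. The following is a minimal sketch of such a check. The patterns are illustrative assumptions, not an official or exhaustive list, and the function names are hypothetical. A clean result does not prove a resource is safe to share.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions): private key headers, common
# "name = value" secrets, and the OpenStack password environment variable.
SUSPICIOUS_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
    re.compile(r"\bOS_PASSWORD\s*=\s*\S+"),
]


def find_suspect_lines(text):
    """Return (line_number, line) pairs that match any suspicious pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_directory(root):
    """Scan every readable text file under root and return {path: hits}."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits = find_suspect_lines(text)
        if hits:
            report[str(path)] = hits
    return report
```

Running scan_directory on the directory you plan to snapshot or upload gives a short list of lines to review by hand; expect some false positives, such as variables that merely mention a token.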

Making Chameleon Resources Public: Step-by-Step

Most resources on the Chameleon Testbed include a visibility attribute that you can modify to make them public. Here's how to do it for different resource types:

For Images (Bare Metal and Virtual Machine):

  1. Navigate to the appropriate site (CHI@TACC, CHI@UC, or KVM@TACC)
  2. Select "Compute" > "Images" from the sidebar menu
  3. Find your image and click the dropdown in the "Actions" column
  4. Select "Edit Image" and change "Visibility" to "Public"
  5. Add documentation in the description and save changes
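If you prefer scripting over the web interface, the same change can be sketched with the standard OpenStack command-line client. The example below only builds the command; it assumes python-openstackclient is installed, that credentials for the right site are already loaded, and that your project is permitted to set images public through the API. The image name is hypothetical, and storing the documentation as a property named "description" is an assumption about how the dashboard field maps to the image.

```python
import shlex


def image_publish_command(image, description=None):
    """Build the argument list that marks an image as public."""
    args = ["openstack", "image", "set", "--public"]
    if description:
        # Assumption: the description is kept as an image property.
        args += ["--property", f"description={description}"]
    args.append(image)
    return args


# Hypothetical image name and description, for illustration only.
cmd = image_publish_command(
    "my-experiment-image",
    "Ubuntu 22.04 with my research software stack",
)
print(shlex.join(cmd))  # shell-quoted form, ready to review or paste
```

To execute the command from Python rather than printing it, pass the list to subprocess.run(cmd, check=True).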

Tip: When creating images with cc-snapshot, you can share them to your project automatically. These images can then be made public following the steps above.

For Object Storage Containers:

  1. Navigate to "Object Store" > "Containers"
  2. Click on your container, then click the "Public Access" button
  3. Consider adding a README.md file explaining the contents
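The container steps can also be sketched with the Swift command-line client. This example assumes python-swiftclient is installed and authenticated, and that Chameleon's object store honors the standard Swift read ACL ".r:*,.rlistings" (world-readable with listings), which is what a public-access toggle typically sets. The container name and helper function are hypothetical.

```python
def container_publish_commands(container, readme_path=None):
    """Build Swift CLI commands that document a container and open it for reading."""
    commands = []
    if readme_path:
        # Upload the README first so the container is documented when it goes public.
        commands.append(["swift", "upload", container, readme_path])
    # Grant anonymous read access and allow object listings.
    commands.append(["swift", "post", "--read-acl", ".r:*,.rlistings", container])
    return commands


# Hypothetical container name, for illustration only.
for command in container_publish_commands("my-dataset", "README.md"):
    print(" ".join(command))
```

Each returned list can be passed to subprocess.run(command, check=True) once you have confirmed the container holds nothing sensitive.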

For Trovi Artifacts:

  1. Select your artifact in Trovi and click "Share"
  2. Check "Enable all users to find and launch"
  3. For maximum longevity, select "Publish with DOI" to publish to Zenodo

Using Trovi for Maximum Resource Longevity

Chameleon's Trovi sharing portal is ideal for preserving research artifacts, with powerful integrations for long-term storage and version control.

Trovi-Zenodo Integration

When you publish a Trovi artifact with a DOI:

  • Your artifact is automatically archived in Zenodo with a permanent DOI
  • The artifact becomes formally citable in academic publications
  • All included resources follow Zenodo's long-term archival policies
  • Your work becomes discoverable through both Trovi and Zenodo search interfaces

Trovi-Git Integration

Trovi also supports Git integration for version-controlled artifacts:

  • Import existing Git repositories directly into Trovi artifacts
  • Create new Trovi artifact versions from specific Git commits or branches
  • Export Trovi content to Git repositories for collaborative development
  • Link your artifact to external repositories on GitHub, GitLab, or other platforms

To create a Trovi artifact:

  1. Package your experiment in Jupyter or prepare a Git repository
  2. For Jupyter: Click "Share" and select "Package as a new artifact"
  3. For Git: Use the "Import Artifact" function and select "Import from Git"
  4. Fill out metadata carefully and consider publishing with a DOI

For detailed instructions on these integrations, see the Trovi Git documentation and Zenodo publishing guide.

Other Data Retention Strategies

Community Impact Through Sharing

By making your artifacts public, you're contributing to a growing ecosystem of reusable research components. The most successful Chameleon projects have created resources used across the research community, significantly amplifying their impact.

The key takeaway: If you want your Chameleon resources to have longevity and impact, make them public! The 5-year retention period benefits both your research continuity and the broader scientific community.

Where Do I Put My Data in Chameleon?

Have you ever lost your data after your instance failed, or are your instances failing to launch with a custom image? You may be handling your data incorrectly in the cloud. Read on to learn how to keep your data persistent and your custom images small.

Transferring Large Data Flows on Chameleon

A ready-to-use Data Transfer Node (DTN) is provided for efficient data transfer over a long fat network. In addition, a Chameleon Complex Appliance is published to make it easy to spawn a set of DTNs in Chameleon Cloud.

