The Practical Reproducibility Opportunity

 

Imagine a world where you don’t just read about new research in papers, blogs, and reports – but can immediately verify or challenge new results, extend them by interactively inserting new ideas, or integrate them into your teaching. Now, open your eyes and help us build it – and lest you think this is not achievable, let us explain. 

We first note that open platforms, like Chameleon, are a tremendous opportunity for creating, shaping, and enjoying reproducible research: when everybody has access to the same cutting-edge hardware, it is no longer the case that I can do ML research because I have a GPU cluster – but you can’t, because you don’t. You can access the same hardware I can – which means that you can reproduce or extend my result. Second, using a cloud like Chameleon means that at minimum you have to create an image that you will deploy on remote testbed resources and that fully encapsulates your experimental environment – you just grab one of our stock images, add whatever configuration you need, and then snapshot. And once you snapshot it, anybody can boot from this image just as easily. This is not the case when you do experiments on your local resource, workstation, or laptop, where you may not fully know – or may have forgotten – how they were configured. So using a testbed means that anybody can easily redeploy your experimental environment, which is halfway to reproducibility – and we have not done anything special yet, we are just following the instructions and using Chameleon the way it is supposed to be used!

But here’s where we need your help. Over the years, our users have created literally hundreds of thousands of images, orchestration templates, and countless notebooks. They are public, too. That’s a wealth of content, but as of right now it is not really shareable, for various reasons: it can be very hard to find, it represents only half of a solution and is not much use without the other half, or potential “consumers” of this content lack access to hardware, content, or both. Over time, we have sought to overcome these obstacles by creating better interfaces, services, and tools that should help our users make their research more shareable – a quick review is below, with a big request to help us understand how we can make them better.

Packaging experiments. Packaging experiments requires a programmatic interface to the testbed – in other words, turning your experiment configuration into something that you can run and re-run. This could mean experiment orchestration – but many of us prefer a non-transactional, imperative interface such as a command line interface (CLI) – or, even better, a Python program – because they make it easier to re-run experiments bit by bit, fixing or extending things as we go. We have provided Python and CLI interfaces to the testbed for a long time, but the really exciting thing is that we can now provide them via Jupyter, which allows you to integrate code with text and graphs – very handy for that immediate gratification of visualizing results. In other words, you are using not only a programmatic interface to the testbed but an interface that is a fusion of results, explanation, and data analysis; realistically, something like that has to happen if we want to marry results and process. Our tips on using Chameleon via the Jupyter interface have been used by quite a few of our users with interesting results.
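To make the "re-run bit by bit" idea concrete, here is a minimal sketch in plain Python. It is not the Chameleon API: the Experiment class, the three step functions, the image name "my-snapshot", and the node type string are all hypothetical stand-ins, and a real notebook would call the testbed's Python interface where the stubs are. The only point is the pattern: each step runs once, its result is saved, and re-running the program picks up where you left off.

```python
import json
import tempfile
from pathlib import Path


class Experiment:
    """Runs named steps and remembers which ones already finished."""

    def __init__(self, state_file):
        self.state_file = Path(state_file)
        # Load results saved by earlier runs so finished steps can be skipped.
        if self.state_file.exists():
            self.done = json.loads(self.state_file.read_text())
        else:
            self.done = {}

    def step(self, name, func, *args):
        """Run func(*args) once; on later runs return the saved result."""
        if name in self.done:
            return self.done[name]
        result = func(*args)
        self.done[name] = result
        self.state_file.write_text(json.dumps(self.done))
        return result


# Hypothetical stand-ins for testbed calls (assumptions, not a real API).
def reserve_node(node_type):
    return {"lease": "lease-for-" + node_type}


def launch_image(lease, image):
    return {"server": image + "@" + lease["lease"]}


def run_benchmark(server):
    return {"host": server["server"], "status": "finished"}


def run_experiment(state_file):
    """Imperative experiment: each line can be re-run, fixed, or extended."""
    exp = Experiment(state_file)
    lease = exp.step("reserve", reserve_node, "compute_skylake")
    server = exp.step("launch", launch_image, lease, "my-snapshot")
    return exp.step("benchmark", run_benchmark, server)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "experiment_state.json"
        print(run_experiment(state))  # runs all three steps
        print(run_experiment(state))  # reuses the saved results
```

In a notebook, each call to exp.step would typically live in its own cell, next to the text that explains it and the plot that shows its result.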

Sharing experiments. A Jupyter notebook allows you to create a particularly elegant connection between the images you use, the resource selection for your experiment, and the body and data of the experiment itself. The next challenge is how to publish, advertise, and ultimately find those notebooks so that they can actually be read (remember those hundreds of thousands of images, where the one for the experiment you want to reproduce is not easy to find?). Here, the big difference between traditional methods of sharing research – such as papers – and sharing digital artifacts is that a paper does not need any special equipment to interpret, while digital artifacts need to be processed, visualized, or executed. The Chameleon Trovi experiment repository addresses both challenges: it creates an indexing system for experiments that is also integrated with the testbed, which means that experiments are packaged as “compute capsules”, i.e., a combination of digital content and the resources on which it can be executed.
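One way to picture a “compute capsule” is as a small record that keeps the content and the resources it needs side by side, plus an index that makes the record findable. The sketch below is only an illustration of that idea: the Capsule and CapsuleIndex names, their fields, and the example values are our assumptions for this post and say nothing about how Trovi is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Hypothetical record pairing experiment content with its resources."""
    title: str
    notebook: str   # the digital content, e.g. a path to a notebook
    image: str      # the disk image the experiment boots from
    node_type: str  # the hardware the experiment expects
    tags: set = field(default_factory=set)


class CapsuleIndex:
    """Tiny in-memory index so capsules can be found by tag."""

    def __init__(self):
        self._by_tag = {}

    def publish(self, capsule):
        # Register the capsule under each of its tags, ignoring case.
        for tag in capsule.tags:
            self._by_tag.setdefault(tag.lower(), []).append(capsule)

    def find(self, tag):
        # Return a copy so callers cannot modify the index by accident.
        return list(self._by_tag.get(tag.lower(), []))


if __name__ == "__main__":
    index = CapsuleIndex()
    index.publish(Capsule(
        title="Example storage experiment",   # illustrative values only
        notebook="experiment.ipynb",
        image="my-snapshot",
        node_type="compute_skylake",
        tags={"storage", "reproducibility"},
    ))
    for capsule in index.find("Storage"):
        print(capsule.title, "runs", capsule.notebook, "on", capsule.node_type)
```

Because the image and node type travel with the notebook, whoever finds the capsule also learns what it should be executed on.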

Access for reproducibility. Last but not least, what good is a packaged experiment if not everybody can gain access to reproduce it? This is why we created Chameleon daypass, which allows experiment authors to give colleagues access to the testbed for the purpose of reproducing their experiments. This means that reproducing an experiment could be as easy as clicking through a graph in the electronic copy of your paper or scanning a QR code on a poster!

These three capabilities combined should allow you to create experiment patterns, i.e., experimental containers that could support multiple experiments. Taking a few extra steps in your experiment development process – packaging your experiments in notebooks and publishing them via Trovi – will multiply their usefulness by making them available to others. We implemented metrics in Trovi so you can see how many people view and/or execute your experiments. Give it a whirl and let us know if it works for you – or if not, what we should change!

Last, but not least – come to the reproducibility hackathon at the Chameleon User Meeting – we will help you package your experiments! And if you are interested in reproducibility, take a look at the REPETO project – it is organizing activities around the practical side of reproducibility.

 

SC: The largest Reproducibility Laboratory

Today we share a unique user experience -- a conversation with Rafael Tolosana Calasanz, an Associate Professor in the Department of Informatics of the University of Zaragoza, Spain, who has participated in the reproducibility initiative at SC. Rafael shares with us his experiences reproducing artifacts on Chameleon and his insights on reproducibility and its importance to the modern scientific process.

Sharing Experiments with Trovi

Learn more about Trovi, Chameleon's experiment repository, and how you can use it to collaborate on experiments and share your work. The blog also covers Trovi's integrations with Zenodo and GitHub, creating a more seamless process for running your experiment - from production to publication. 

Chameleon Hackathon 2021 -- Experiments Reproducibility and Packagability

Call for Participation

The Chameleon team is excited to hold our first Chameleon Hackathon event sometime in the 4th week of August or 1st week of September. This year’s hackathon will focus on reproducing and packaging experiments on the Chameleon platform. In this Call for Participation, we would like to survey Chameleon users who are interested in joining this hackathon. Please continue reading and fill out the Google form at the very end.

Reproducibility on Chameleon: Trovi meets YouTube

Explore experiments packaged and runnable on Chameleon with ~5 minute videos by the authors explaining how to launch the notebook, provision resources, and run the experiment. Whether you’re new to Chameleon, Jupyter, or Trovi, these videos can help you get started quickly and easily!

Reproducing Solid State Drive Simulator Research Results on Chameleon

November’s Chameleon User Experiments blog features Nanqinqin Li, a first-year PhD student at Princeton University. Learn more about Li and his summer research on reproducibility and Solid-State Drive Simulators, and find out where to replicate his experiment on Trovi!

Trovi: the Google Drive for Chameleon Experiments

Trovi is the next iteration of the Chameleon experiment management and sharing platform. With Trovi, you can set up and configure your experimental environment from within a Jupyter notebook, document and save your experiment similarly in notebook form, and privately share it with collaborators or publish it for any Chameleon user to build on. Learn more inside!

Trovi: Your Tool for Reproducible Research

Trovi is the next iteration of the Chameleon experiment management and sharing platform. With Trovi you can set up and configure your experimental environment from within a Jupyter notebook, document and save your experiment similarly in notebook form, and privately share it with collaborators or publish it for any Chameleon user to build on. Learn more inside!

Chameleon and Reproducibility: LinnOS Case Study

This summer, a team of students worked on an experiment that ultimately became part of the paper on LinnOS, an operating system that infers SSD performance with the help of its built-in light neural network. The LinnOS paper, which uses the Chameleon testbed to provide a public executable workflow, will be presented at OSDI ’20 and is available here.

Two of the students, Levent Toksoz and Mingzhe Hao, write about their experience in this Chameleon User Stories series. Toksoz is a recent graduate of the University of Chicago computer science master’s program. He studied physics and math as an undergrad at the University of Michigan and is planning to apply to PhD programs in computer science. Hao is a PhD candidate in the UCARE group in the Department of Computer Science at the University of Chicago. His research interests include operating systems, storage systems, and distributed systems.

Packaging Experiments for Reproducibility

Chameleon integrates directly with Jupyter Notebook to provide an experimental environment that has everything you could need for research - a cloud testbed, a way to combine actionable code with written documentation, and sharing capabilities through Zenodo. Learn more about how to take advantage of all these capabilities and package your notebooks for publishing. 

A reproducible workflow with Jupyter on Chameleon

Jupyter notebooks are a great tool for structuring your computer science experiments on Chameleon because they allow you to iterate on your idea interactively, intuitively, and quickly. But it may not be obvious how you can leverage this tool for running an experiment...

