Using cloud servers for GPU-based inference

Machine learning models are most often trained in the "cloud", on powerful centralized servers with specialized resources such as GPU acceleration. These servers are also well-resourced for inference, i.e. making predictions on new data.

In this experiment, we will use a cloud server equipped with GPU acceleration for fast inference in an image classification context.

This notebook assumes you already have a "lease" available for an RTX6000 GPU server on the CHI@UC testbed. It then shows you how to:

  • launch a server using that lease
  • attach an IP address to the server, so that you can access it over SSH
  • install some fundamental machine learning libraries on the server
  • use a pre-trained image classification model to do inference on the server
  • optimize the model for fast inference on NVIDIA GPUs, and measure reduced inference times
  • delete the server
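
The "measure reduced inference times" step in the list above can be sketched with a small timing harness. This is a minimal illustration in plain Python, not the notebook's own code: `benchmark` and `dummy_predict` are hypothetical names, and `dummy_predict` is only a stand-in for a real model call. The sketch assumes `predict` returns only after the result is ready; GPU frameworks often run work asynchronously, so a real measurement would need a call that blocks until the output is available.

```python
import statistics
import time


def benchmark(predict, sample, warmup=5, runs=50):
    """Return per-call latency statistics (in milliseconds) for predict(sample)."""
    # Warm-up calls are excluded so that one-time costs (lazy initialization,
    # caches, kernel loading) do not skew the measurement.
    for _ in range(warmup):
        predict(sample)

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "median_ms": statistics.median(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def dummy_predict(pixels):
    # Hypothetical stand-in for a real model call; it only burns a little CPU time.
    return sum(p * p for p in pixels) % 1000


if __name__ == "__main__":
    fake_image = list(range(224 * 224))  # placeholder input, not a real image
    stats = benchmark(dummy_predict, fake_image)
    print(f"median {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

Running the same harness on the model before and after optimization, with the same input and run count, gives two medians that can be compared directly. The median is reported alongside the mean because occasional slow calls can inflate the mean.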

Consider running this together with "Using edge devices for CPU-based inference"!

Materials are also available at: https://github.com/teaching-on-testbeds/cloud-gpu-inference

Nov. 14, 2023, 8:01 PM

Download with git

Clone the git repository for this artifact, then check out the commit for this version:

git clone https://github.com/teaching-on-testbeds/cloud-gpu-inference
# change into the directory created by the clone
cd cloud-gpu-inference
git checkout 011c89df414e617aa3f6e04ebbe8e95007c912c0