FSA-benchmark

This project aims to explore and benchmark various machine learning models for detecting disks at high risk of experiencing fail-slow anomalies.

Implemented Algorithms

  1. Cost-Sensitive Ranking Model
    Inspired by the paper "Improving Service Availability of Cloud Systems by Predicting Disk Error" presented at the USENIX ATC '18 conference, this model ranks disks based on fail-slow risk.

  2. Multi-Prediction Models
    Drawing from "Improving Storage System Reliability with Proactive Error Prediction" presented at the USENIX ATC '17 conference, this approach uses multiple traditional machine learning models to evaluate disk health using diverse features. Various models were tested, with the Random Forest classifier proving most effective.

  3. LSTM Model
    This model employs Long Short-Term Memory (LSTM) networks, trained on the first day's data for each cluster and evaluated on data spanning all days. The recurrent architecture is intended to capture temporal dependencies in the disk metrics so that fail-slow anomalies can be predicted over time.

  4. PatchTST Model
    A Transformer-based sequence model that splits each time series into patches before processing it, applied here to time series prediction and fail-slow detection.

  5. GPT-4o-mini
    A large language model used to analyze disk metrics and detect fail-slow conditions.
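
The training split described for the LSTM model (item 3 above) can be sketched as follows. This is a minimal illustration, not the project's actual code: the record layout and the field names `cluster`, `day`, `disk_id`, and `latency` are assumptions made for the example, and "all days" is read here as including the first day.

```python
def split_first_day(records):
    """Split records into (train, test) per the first-day scheme.

    records: iterable of dicts with hypothetical keys
             'cluster', 'day', 'disk_id', 'latency'.
    train:   rows from the earliest day observed in each cluster.
    test:    every row, since evaluation spans all days.
    """
    records = list(records)

    # Find the earliest day seen in each cluster.
    first_day = {}
    for row in records:
        cluster = row["cluster"]
        if cluster not in first_day or row["day"] < first_day[cluster]:
            first_day[cluster] = row["day"]

    train = [row for row in records if row["day"] == first_day[row["cluster"]]]
    test = records
    return train, test


# Hypothetical sample data: two clusters whose traces start on different days.
records = [
    {"cluster": "A", "day": 1, "disk_id": "d0", "latency": 1.2},
    {"cluster": "A", "day": 2, "disk_id": "d0", "latency": 9.8},
    {"cluster": "B", "day": 3, "disk_id": "d5", "latency": 1.1},
    {"cluster": "B", "day": 4, "disk_id": "d5", "latency": 1.3},
]
train, test = split_first_day(records)
```

Computing the first day per cluster, rather than globally, matters when clusters start recording on different dates, as in the sample data.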

Performance Analysis

To evaluate model performance, we generate heatmaps depicting precision and recall across various clusters. These visualizations show how effective each algorithm is and how its prediction accuracy varies from cluster to cluster.
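
As a rough sketch of how the numbers behind such heatmaps could be produced, the following computes precision and recall for each algorithm and cluster and arranges them as one grid per metric. The input layout, the function names, and the sample disk IDs are all hypothetical; the project's own evaluation code may differ.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of disk IDs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Report 0.0 when a ratio is undefined (nothing predicted / nothing to find).
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def score_grid(predictions, ground_truth):
    """Build per-metric grids: one row per algorithm, one column per cluster.

    predictions:  {algorithm: {cluster: disk IDs flagged as fail-slow}}
    ground_truth: {cluster: disk IDs that were actually fail-slow}
    """
    algorithms = sorted(predictions)
    clusters = sorted(ground_truth)
    precision_rows, recall_rows = [], []
    for algorithm in algorithms:
        p_row, r_row = [], []
        for cluster in clusters:
            flagged = predictions[algorithm].get(cluster, ())
            p, r = precision_recall(flagged, ground_truth[cluster])
            p_row.append(p)
            r_row.append(r)
        precision_rows.append(p_row)
        recall_rows.append(r_row)
    return algorithms, clusters, precision_rows, recall_rows


# Hypothetical example with two algorithms and two clusters.
ground_truth = {"cluster_A": {"d1", "d2"}, "cluster_B": {"d7"}}
predictions = {
    "lstm": {"cluster_A": {"d1", "d3"}, "cluster_B": {"d7"}},
    "random_forest": {"cluster_A": {"d1", "d2"}, "cluster_B": set()},
}
algorithms, clusters, precision_rows, recall_rows = score_grid(predictions, ground_truth)
```

Each grid can then be handed to a plotting routine (for example, matplotlib's `imshow`) with `algorithms` as row labels and `clusters` as column labels.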

Requirements

  • A Chameleon account with an active project allocation.
Aug. 16, 2024, 5:13 PM
