FSA-benchmark
This project aims to explore and benchmark various machine learning models for detecting disks at high risk of experiencing fail-slow anomalies.
Implemented Algorithms
Cost-Sensitive Ranking Model
Inspired by the paper "Improving Service Availability of Cloud Systems by Predicting Disk Error" presented at the USENIX ATC '18 conference, this model ranks disks based on fail-slow risk.
Multi-Prediction Models
Drawing from "Improving Storage System Reliability with Proactive Error Prediction" presented at the USENIX ATC '17 conference, this approach uses multiple traditional machine learning models to evaluate disk health using diverse features. Various models were tested, with the Random Forest classifier proving most effective.
LSTM Model
This model employs Long Short-Term Memory (LSTM) networks, trained on the first day's data for each cluster and evaluated on data spanning all days. It captures temporal dependencies to accurately predict fail-slow anomalies over time.
PatchTST Model
An advanced sequence model that leverages transformers to handle time series prediction and fail-slow detection.
GPT-4o-mini
A large language model used to analyze disk metrics and detect fail-slow conditions.
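The LSTM description above mentions training on the first day's data for each cluster and evaluating on data spanning all days. A minimal sketch of that split follows. The record layout (dicts with `cluster` and `day` keys) and the function name `split_first_day` are assumptions for illustration and are not taken from this project's code. The sketch also assumes that "all days" includes the first day.

```python
def split_first_day(records):
    """Split records into (train, evaluate) sets.

    records: iterable of dicts with at least 'cluster' and 'day' keys
             (hypothetical schema, assumed for this sketch).
    train:    records from the earliest day seen for each cluster.
    evaluate: every record, i.e. data spanning all days.
    """
    records = list(records)

    # Find the earliest day observed for each cluster.
    first_day = {}
    for rec in records:
        cluster = rec["cluster"]
        if cluster not in first_day or rec["day"] < first_day[cluster]:
            first_day[cluster] = rec["day"]

    # Training data: only the first day of each cluster.
    train = [rec for rec in records if rec["day"] == first_day[rec["cluster"]]]

    # Evaluation data: all days (assumed to include the first day).
    evaluate = records
    return train, evaluate


if __name__ == "__main__":
    sample = [
        {"cluster": "A", "day": 2, "latency_ms": 4.1},
        {"cluster": "A", "day": 1, "latency_ms": 3.9},
        {"cluster": "B", "day": 3, "latency_ms": 7.2},
        {"cluster": "B", "day": 5, "latency_ms": 9.8},
    ]
    train, evaluate = split_first_day(sample)
    print(len(train), len(evaluate))
```

Splitting per cluster rather than globally means each cluster contributes training data even when clusters start recording on different days.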
Performance Analysis
To evaluate model performance, we generate heatmaps depicting precision and recall across various clusters. These visualizations offer a clear representation of each algorithm's effectiveness, enabling us to assess prediction accuracy and inter-cluster performance.
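The numbers behind such heatmaps can be sketched as a grid of per-algorithm, per-cluster precision and recall values. The function names and the input layout below (`{algorithm: {cluster: (y_true, y_pred)}}` with binary labels, 1 meaning fail-slow) are assumptions for illustration, not this project's actual evaluation code.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = fail-slow, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    # Report 0.0 when a denominator is zero (a convention chosen for this sketch).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def score_grid(results):
    """Build precision and recall grids.

    results: {algorithm: {cluster: (y_true, y_pred)}} (hypothetical layout).
    Returns two nested dicts indexed as grid[algorithm][cluster].
    """
    precision_grid, recall_grid = {}, {}
    for algorithm, per_cluster in results.items():
        precision_grid[algorithm] = {}
        recall_grid[algorithm] = {}
        for cluster, (y_true, y_pred) in per_cluster.items():
            p, r = precision_recall(y_true, y_pred)
            precision_grid[algorithm][cluster] = p
            recall_grid[algorithm][cluster] = r
    return precision_grid, recall_grid


if __name__ == "__main__":
    results = {
        "random_forest": {"cluster_A": ([1, 0, 1, 0], [1, 1, 0, 0])},
        "lstm": {"cluster_A": ([1, 0, 1, 0], [1, 0, 1, 0])},
    }
    precision_grid, recall_grid = score_grid(results)
    print(precision_grid)
    print(recall_grid)
```

Each grid has one row per algorithm and one column per cluster, so it can be handed to a plotting library's heatmap routine to produce one precision heatmap and one recall heatmap.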
Requirements
- A Chameleon account with an active project allocation.
Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.
If you do not have an active Chameleon allocation, or would prefer not to use your own, you can request a temporary one (a daypass) from the PI of the project this artifact belongs to.