HDFS-17216
When distcp handles small files (file size slightly smaller than bandwidth), the throttler only starts to throttle after 1 second, and the throttled is specific to a single file, causing distcp to fill the cluster bandwidth and crush production traffic. This is a workload scale bug that manifests when processing many small files.
Root Cause: The ThrottledInputStream class has a bug in getBytesPerSec() method where division by zero or very small elapsed time results in incorrect throttling calculations for small files.
Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.
Download ArchiveDownload an archive containing the files of this artifact.