YARN-10649

Entries in RMNodeImpl.updatedExistContainers were not removed once their containers had finished execution. As more containers finished, these objects stayed in memory indefinitely, so memory use grew steadily until the ResourceManager ran out of memory (OOM).
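The leak pattern described above can be illustrated with a minimal sketch. This is not the actual Hadoop code: the class NodeContainerTracker, its method names, and the use of plain strings for container IDs are all hypothetical, chosen only to show how a map that is never cleaned up on container completion grows without bound, and how removing the entry on completion keeps it bounded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-node tracker; NOT the real RMNodeImpl.
class NodeContainerTracker {
    // Container ID -> last reported state, for containers that received an update.
    // Plays the role of updatedExistContainers in this sketch.
    private final Map<String, String> updatedExistContainers = new ConcurrentHashMap<>();

    // Record that a running container was updated.
    void onContainerUpdated(String containerId) {
        updatedExistContainers.put(containerId, "UPDATED");
    }

    // Leaky variant: the finished container's entry is never removed,
    // so the map gains one entry per finished container.
    void onContainerFinishedLeaky(String containerId) {
        // intentionally no cleanup
    }

    // Fixed variant (assumed shape of a fix): drop the entry on completion.
    void onContainerFinishedFixed(String containerId) {
        updatedExistContainers.remove(containerId);
    }

    // Number of entries currently retained.
    int trackedCount() {
        return updatedExistContainers.size();
    }
}
```

With the leaky variant, the number of retained entries equals the number of containers that were ever updated and then finished. With the fixed variant, only containers that are still running remain in the map.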

Nov. 13, 2024, 8:48 AM


