This bug concerns a connection leak in Flink Table Store, a data lake storage system developed under the umbrella of Apache Flink. Specifically, the extract() method in the OrcFileStatsExtractor class creates a new reader configured with a HadoopReadOnlyFileSystem, and both leak connections: OrcFileStatsExtractor never closes the reader, and HadoopReadOnlyFileSystem fails to close the input streams it opens. As a result, any code path that uses an OrcFileStatsExtractor leaks connections.
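The leak pattern described above can be illustrated with a minimal Java sketch. It does not use the real ORC or Flink Table Store classes: CountingReader and StatsExtractorSketch are hypothetical stand-ins, and the row count returned is a placeholder. The sketch only contrasts a method that forgets to close its reader with one that uses try-with-resources.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a file reader; it counts instances that are still open.
class CountingReader implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();

    CountingReader() {
        OPEN.incrementAndGet(); // opening a reader acquires one "connection"
    }

    long numberOfRows() {
        return 42L; // placeholder statistic, not real file metadata
    }

    @Override
    public void close() {
        OPEN.decrementAndGet(); // closing releases the "connection"
    }
}

// Hypothetical extractor showing the leaky and the fixed pattern side by side.
class StatsExtractorSketch {
    // Leaky pattern: the reader is created but never closed.
    static long extractLeaky() {
        CountingReader reader = new CountingReader();
        return reader.numberOfRows();
    }

    // Fixed pattern: try-with-resources closes the reader, even if an exception is thrown.
    static long extractClosed() {
        try (CountingReader reader = new CountingReader()) {
            return reader.numberOfRows();
        }
    }
}
```

In this sketch, each call to extractLeaky() leaves the open counter one higher, while extractClosed() leaves it unchanged.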

This notebook reproduces the connection leak by adapting the unit tests contributed by the fix patch. Those tests introduce a TraceableFileSystem that keeps a List of the connections a file system currently has open and, after each file system test, assert that the List is empty to ensure all connections were closed. This notebook instead adds a new test called testCloseConnections, which creates a new file store table, writes and commits to it 500 times, and records the number of open connections after each commit. Since each commit creates an ORC reader to get the table's metadata, the number of open connections should grow with every commit.
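The tracking idea behind this test can be sketched in plain Java as follows. This is not the actual TraceableFileSystem or testCloseConnections code: TrackingFileSystemSketch and LeakLoopSketch are hypothetical names, the streams are in-memory, and the assumption that each simulated commit opens exactly one stream is made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical file system wrapper that remembers every stream it has opened.
class TrackingFileSystemSketch {
    private final List<InputStream> openStreams = new ArrayList<>();

    // Opens an empty in-memory stream; the path is ignored in this sketch.
    InputStream open(String path) {
        InputStream tracked = new FilterInputStream(new ByteArrayInputStream(new byte[0])) {
            @Override
            public void close() throws IOException {
                openStreams.remove(this); // stop tracking once the stream is closed
                super.close();
            }
        };
        openStreams.add(tracked);
        return tracked;
    }

    int openCount() {
        return openStreams.size();
    }
}

// Hypothetical loop that records the open-stream count after each simulated commit.
class LeakLoopSketch {
    static List<Integer> run(int commits, boolean closeStreams) {
        TrackingFileSystemSketch fs = new TrackingFileSystemSketch();
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < commits; i++) {
            try {
                InputStream in = fs.open("file-" + i); // assumed: one stream per commit
                in.read();
                if (closeStreams) {
                    in.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            counts.add(fs.openCount());
        }
        return counts;
    }
}
```

Under these assumptions, LeakLoopSketch.run(500, false) yields a count that rises by one per iteration, while run(500, true) stays at zero throughout, which is the shape of result the real test is meant to expose.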

May 13, 2024, 11:39 PM

