
Investigation into the reduction of runtime and memory usage for StreamGobbler #2095

Open
georgetayqy opened this issue Jan 23, 2024 · 0 comments


@georgetayqy
Contributor

georgetayqy commented Jan 23, 2024

What feature(s) would you like to see in RepoSense

As detailed in issue #2091, the StreamGobbler class consumes a large amount of memory when in use, upwards of 500 MB per run.

After digging through the codebase and the source code of the String and StringBuilder classes, it appears that the way the code is currently written may introduce some performance bottlenecks.

Currently, the code is implemented as follows:

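```java
ReadableByteChannel ch = Channels.newChannel(is);
int len;
// buffer is an 8 KB ByteBuffer; each read fills at most 8 KB of it
while ((len = ch.read(buffer)) > 0) {
    // decode this chunk into a new String, which is then copied into sb
    sb.append(new String(buffer.array(), 0, len));
    buffer.rewind(); // reset the position so the next read overwrites the buffer
}
value = sb.toString();
```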

We can observe that a new String is created for every 8 KB of data read into the buffer. That String is then appended to the StringBuilder, after which the buffer is rewound and overwritten by the next read operation.

After reading through the String API, I noticed that creating a new String from the buffer array possibly copies the array via Arrays::copyOf or Arrays::copyOfRange inside the StringCoding class, which handles decoding bytes into String objects.
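As a quick check of this first copy, the sketch below (a hypothetical standalone class, not RepoSense code) builds a String from a byte array and then overwrites the array, as the next `ch.read(buffer)` would. The String is unaffected, which shows that it holds its own copy of the decoded data rather than a view of the buffer. It passes an explicit US-ASCII charset for determinism, whereas the original code relies on the platform default.

```java
import java.nio.charset.StandardCharsets;

class StringCopyDemo {
    // Returns true if a String built from a byte[] is unaffected by later writes
    // to that array, i.e. the constructor stored its own copy of the data.
    static boolean stringKeepsOwnCopy() {
        byte[] buffer = "hello".getBytes(StandardCharsets.US_ASCII);
        String s = new String(buffer, 0, buffer.length, StandardCharsets.US_ASCII);
        buffer[0] = (byte) 'j'; // overwrite the buffer, as the next read would
        return s.equals("hello");
    }

    public static void main(String[] args) {
        System.out.println(stringKeepsOwnCopy()); // prints "true"
    }
}
```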

Moreover, appending that String to the StringBuilder may call the AbstractStringBuilder::getBytes method, which in turn calls System::arraycopy.

Together, these calls mean that the same data may be copied twice: first from the byte buffer into the byte array held by the String, and then from that String into the internal byte array of the StringBuilder.

This repeated copying, together with the creation of many String objects (each holds at most 8 KB of the stream's data, so large outputs produce a great many of them), could significantly degrade runtime performance (possibly also through extra garbage collection) and increase heap memory usage.

We could look for other ways to read all data from an input stream that avoid this repeated work, improving both runtime and memory performance.
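One candidate, sketched here under the assumption that RepoSense can rely on Java 9+ (for `InputStream::readAllBytes`), is to collect the raw bytes first and decode them once, so that no per-chunk String is created. `ReadAllSketch` and `readFully` are hypothetical names. Note that `readAllBytes` still buffers and combines chunks internally, and the full byte array and the decoded String are alive at the same time, so whether peak memory actually drops would need to be confirmed by profiling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReadAllSketch {
    // Hypothetical helper: reads the whole stream as raw bytes, then decodes once.
    // InputStream.readAllBytes() is available from Java 9 onwards.
    static String readFully(InputStream is, Charset charset) throws IOException {
        byte[] bytes = is.readAllBytes();
        // a single decode pass over all the data, instead of one new String per 8 KB chunk
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readFully(is, StandardCharsets.UTF_8));
    }
}
```

The charset is a parameter here; passing `Charset.defaultCharset()` would match the current behavior of `new String(buffer.array(), 0, len)`, which decodes with the platform default.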

Is the feature request related to a problem?

This issue is not related to a problem, but it is related to the overall goal of making RepoSense more performant.

If possible, describe the solution

So far, I have not found a solution that works sufficiently well: in my attempts, improving memory performance tended to degrade runtime performance, and vice versa.

One resource we might look at is this guide. I have tried using BufferedReader as described in the guide, and it appears to reduce overall runtime and memory usage, but it occasionally fails some test cases and system tests.

Here is the result of one of the profiling runs:

[Two screenshots of the profiling results, taken 2024-01-23]

Both overall runtime and memory usage were lower than with the improvements made in #2091.

The code tested is as follows:

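```java
StringBuilder sb = new StringBuilder();

// InputStreamReader decodes bytes with the platform default charset;
// BufferedReader serves read() calls from its internal 8192-char buffer.
// Closing the reader at the end of the try block also closes `is`.
try (BufferedReader streamReader = new BufferedReader(new InputStreamReader(is))) {
    int c;

    // read() returns one char (as an int) per call, or -1 at end of stream
    while ((c = streamReader.read()) != -1) {
        sb.append((char) c);
    }

    value = sb.toString();
}
```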
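Reading one character per `read()` call costs a method call per character. A possible middle ground, sketched below with hypothetical names, reads into a reused `char[]` and appends each block directly with `StringBuilder::append(char[], int, int)`, so no intermediate String is created per chunk. Because the reads are already block-sized, the `BufferedReader` wrapper is dropped; that, the 8192-char block size, and the explicit charset parameter are assumptions of this sketch rather than anything tested in RepoSense. As a side note, `InputStreamReader` keeps decoder state between reads, so a multi-byte character that straddles two blocks is still decoded correctly, which is not guaranteed when each 8 KB byte chunk is passed to `new String` on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ChunkedReaderSketch {
    // Hypothetical helper: like the tested BufferedReader code, but reads chars
    // in blocks instead of making one read() call per character.
    static String readAll(InputStream is, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[8192]; // reused for every read; size is an arbitrary choice
        // as in the tested code, closing the reader also closes `is`
        try (Reader reader = new InputStreamReader(is, charset)) {
            int n;
            while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n); // copy chars straight into sb, no intermediate String
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```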

If applicable, describe alternatives you've considered

Currently, no other alternatives have been considered.

Additional context

N/A

@georgetayqy georgetayqy changed the title Suggestions for reducing runtime and memory usage for StreamGobbler Investigation into the reduction of runtime and memory usage for StreamGobbler Jan 26, 2024