Fix memory leak in CommonMark implementation #308
Comments
Here is the average memory leaked at tag 2.23.0 before #226 was merged:
Here is the average memory leaked at commit bf6f0c5 from the current branch main, drilled down to the affected call sites:
This shows that there was no major memory leak before #226. Furthermore, this also shows that the memory leak is related to the patterns for links, images, and emphasis, and that the brunt of the memory leak happens during the copying of syntax tables.

Experimental details

First, I updated the file build-docker-image.sh:

```diff
*** build-docker-image.sh.old	2023-08-12 16:29:14.796017962 +0200
--- build-docker-image.sh.new	2023-08-12 15:25:45.246758729 +0200
***************
*** 32,34 ****
--- 32,36 ----
  # Clean up
+ git add markdown.dtx
  git checkout .
+ git restore --staged markdown.dtx
```

Staging markdown.dtx before `git checkout .` and unstaging it afterwards ensures that local modifications to markdown.dtx are not reverted during the clean-up.
Then, I added the following sentinel code into different parts of the Lua code:

```lua
-- Log RAM at the beginning of the test.
os.execute("free -m | head -2 | awk '{ print $1,$2,$3,$4 }' | sed 's/ shared//'")

-- Log RAM at the end of the test.
os.execute("free -m | head -2 | tail -1 | awk '{ print $1,$2,$3,$4 }'")
```

Next, I created a file with the following Python script:

```python
# Reads the log produced by the sentinel code above on standard input,
# pairs consecutive "Mem:" lines (one from the beginning and one from the
# end of each test run), and reports the average memory leaked per run.
import statistics
import sys

deltas = []
previous_used = None
for line in sys.stdin:
    if line.startswith('Mem: '):
        _, total, used, free = line.split()
        used = int(used)
        if previous_used is None:
            previous_used = used
            continue
        delta = used - previous_used
        deltas.append(delta)
        previous_used = None
print(f'Average memory leak based on {len(deltas)} runs: {statistics.mean(deltas):.2f}MiB')
```

Finally, I measured the average memory leaked in both versions, as reported above.
@lostenderman Above, I have narrowed down the memory leaks to tens of lines of code. However, I have mainly focused on discovery so far and I don't have a precise idea about the nature of the memory leak yet. If you can spare the time, I would appreciate it if you could glance over the affected code and help me brainstorm about the possible causes.
I had a short brainstorming session about the issue with AI-enabled Bing, which suggested the following: "One possible cause is that you are defining recursive patterns that create reference cycles between them." I can see how this could be a problem and how this would prevent garbage collection. I haven't reviewed the code to see if this could actually be the issue yet, but the solution would be to save the cyclic patterns in the grammar and make the references between them indirect (see the sketch below).
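For illustration only, here is a minimal sketch of that kind of indirection, assuming LPeg (which the markdown package builds on); the rule names and patterns are invented for the example and are not taken from the actual grammar:

```lua
local lpeg = require("lpeg")
local P, V, C = lpeg.P, lpeg.V, lpeg.C

-- Instead of two pattern objects that hold direct references to each other,
-- the mutually recursive rules live inside a single grammar and refer to
-- each other only by name through lpeg.V.  The rule names and patterns
-- below are invented for the example.
local grammar = P{
  "Inline",
  Inline   = V"Emphasis" + C((1 - P"*")^1),
  Emphasis = P"*" * V"Inline" * P"*",
}

print(grammar:match("*hello*"))  --> hello
```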
To improve the measurement accuracy, I prefixed both parts of the sentinel code from #308 (comment) with an additional statement; one possibility is sketched below.
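Purely as an illustration of what such a prefix might look like (an assumption, not something confirmed by the thread), one could force a full garbage-collection cycle immediately before logging the memory usage, so that memory that is merely not yet collected is not counted as leaked:

```lua
-- Assumed prefix, for illustration only: force a full garbage-collection
-- cycle so that only memory the collector cannot reclaim is measured.
collectgarbage("collect")
os.execute("free -m | head -2 | tail -1 | awk '{ print $1,$2,$3,$4 }'")
```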
As reported by @TeXhackse, testing can take up to 8 GB of RAM with 4 CPUs and batch size 100, which seems excessive. This is related to #226, #308, and #318.
In #226 (comment), we have identified a memory leak in our implementation of CommonMark. #226 (comment) has shown that large data structures are created at every call of `markdown.new()` and are not freed by garbage collection. The issue has been hotfixed in 1c4b844 by adding the option `singletonCache`, which caches the result of calling `markdown.new(options)` in an LRU cache of size 1 keyed by `options`, but documents with a large number of Markdown fragments typeset with different options are still affected. A minimal sketch of such a size-1 cache is shown below.
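For context, here is what a least-recently-used cache of size 1 keyed by the conversion options can look like. This is an illustration in plain Lua, not the actual code added in 1c4b844; `options_key` and `new_converter` are invented stand-ins for whatever key construction and converter constructor the real code uses.

```lua
-- Illustration only; not the actual code added in 1c4b844.
-- options_key builds a stable string key from an options table, and
-- new_converter stands in for the expensive markdown.new(options) call.
local function options_key(options)
  local parts = {}
  for name, value in pairs(options) do
    parts[#parts + 1] = tostring(name) .. "=" .. tostring(value)
  end
  table.sort(parts)
  return table.concat(parts, ",")
end

local cached_key, cached_converter

local function cached_new(options, new_converter)
  local key = options_key(options)
  if key ~= cached_key then
    -- Cache miss: rebuild the converter and overwrite the single cache slot.
    cached_key = key
    cached_converter = new_converter(options)
  end
  return cached_converter
end
```

Because there is only a single cache slot, a document that alternates between two or more different option sets rebuilds the converter on every call, which is why such documents remain affected.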
Acceptance criteria

First, navigate to the root folder of the git repository and save the following script to a file named `memory-leak.lua`:

Next, create executable files named `bisect.sh`, `build-docker-image.sh`, `run-test.sh`, and `remove-docker-image.sh`, as described in #226 (comment), section Experimental setup.

Finally, run the command `./bisect.sh`. Here is the expected result:

Afterwards, you may want to clear the build cache of Docker: