
dumps of large JSONs run into MemoryError, causing migrations of large databases to fail #3716

Closed · broeder-j opened this issue Jan 17, 2020 · 3 comments · Fixed by #5145

@broeder-j (Member)
Hi. I just tried to migrate a 27 GB database from schema version 1.0.23 to 1.0.43 using the AiiDA 1.0.1 release.

The migration fails due to a JSON dump that runs out of memory (the machine has 16 GB of RAM plus 16 GB of swap):

```python
# aiida/common/json.py, line 44, in dumps
return simplejson.dumps(data, ensure_ascii=False, encoding='utf8', **kwargs)
```

This problem is known for large JSON dumps, and since this code lives in AiiDA's common module it might surface in other cases (import/export) too. I suggest a solution along the lines of https://stackoverflow.com/questions/24239613/memoryerror-using-json-dumps, i.e. streaming the JSON to disk instead of building the full dump in memory, at least for large dumps.
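For illustration, a minimal sketch of such a streaming dump using simplejson's iterencode (the dump_streaming helper here is hypothetical, not AiiDA's actual API):

```python
import simplejson

def dump_streaming(data, path):
    """Serialize `data` to JSON chunk by chunk instead of one huge string.

    iterencode yields the output in small pieces, so peak memory stays
    bounded by the chunk size rather than by the full document.
    """
    encoder = simplejson.JSONEncoder(ensure_ascii=False)
    with open(path, 'w', encoding='utf8') as handle:
        for chunk in encoder.iterencode(data):
            handle.write(chunk)
```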

@greschd (Member) commented Jan 17, 2020

Related: #3712

@ltalirz (Member) commented Feb 5, 2020

For the import, I've discussed JSON stream parsing (including a practical example) in #493 (comment).

Unfortunately, the current design of the export file makes using this in `verdi import` not very straightforward.
We've just created a "task force" for the repository/file format issue, and this will be one of the issues on the agenda.
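For reference, a minimal sketch of stream parsing with the third-party ijson library, assuming a file whose nodes sit in a top-level "nodes" array (the key name and the process handler are illustrative, not AiiDA's actual export layout):

```python
import ijson

# Iterate over the elements of a large top-level "nodes" array without
# loading the whole file into memory; ijson parses incrementally.
with open('export.json', 'rb') as handle:
    for node in ijson.items(handle, 'nodes.item'):
        process(node)  # hypothetical per-node handler
```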

On a related note: a DB of 27 GB is... impressive.
Are these millions of calculations or are you perhaps storing stuff in the DB that doesn't necessarily need to be there? (see also #3714)

@broeder-j (Member, Author)

@ltalirz After the migration it is actually a lot smaller: 4-5 GB. The DB contains only around 500k nodes (calcs, workchains, dicts and structures). The JSON file with logs that was dumped during the migration was around 7+ GB. I have not checked where the rest of the size difference comes from, or what the original schema version was (it became more and more efficient). Maybe stored Python code from inline calcs? Since 1.0 it is only hashed, not stored, if I am correct.

@chrisjsewell linked a pull request Sep 24, 2021 that will close this issue.