Make system dictionary singleton #107

Merged
mocobeta merged 7 commits into master from feat/make-mmapdictionary-singleton on Feb 23, 2022

Conversation

mocobeta
Owner

Closes #104

In Janome 4.0.1, each Tokenizer object opens the binary dictionary files independently when mmap=True (the default). Creating many Tokenizer objects can therefore cause 'Too many open files' errors.
This PR resolves the problem by making MMapSystemDictionary a singleton that is reused across Tokenizers.
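
For readers unfamiliar with the pattern, here is a minimal sketch of a shared, memory-mapped resource. It is not the code in this PR: `SharedMMapDictionary`, `DemoTokenizer`, and the temporary file are hypothetical stand-ins, and the real MMapSystemDictionary may be structured quite differently.

```python
import mmap
import tempfile
import threading


class SharedMMapDictionary:
    """Hypothetical stand-in for a memory-mapped dictionary (not Janome's class)."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, path):
        # Open and map the file once; all users of the instance share this handle.
        self._file = open(path, 'rb')
        self._data = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    @classmethod
    def instance(cls, path):
        # Double-checked locking: concurrent callers create at most one instance.
        # Simplification: `path` is only used by the first call.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls(path)
        return cls._instance

    def read(self, offset, length):
        # Slicing an mmap object returns bytes.
        return self._data[offset:offset + length]


class DemoTokenizer:
    """Hypothetical tokenizer that reuses the shared dictionary."""

    def __init__(self, dic_path):
        self.sys_dic = SharedMMapDictionary.instance(dic_path)


# Create a small file to stand in for a binary dictionary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'dictionary-bytes')
    dic_path = f.name

tokenizers = [DemoTokenizer(dic_path) for _ in range(10)]
shared = {id(t.sys_dic) for t in tokenizers}
print('Distinct dictionary objects = %d' % len(shared))
```

In this sketch the file is opened once regardless of how many `DemoTokenizer` objects exist, which is the effect described above.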

Reproducible test code

# openfiles_mmap.py
import psutil
from janome.tokenizer import Tokenizer

tokenizers = []
for i in range(10):
    tokenizers.append(Tokenizer(mmap=True))
print('Created Tokenizers = %d' % len(tokenizers))

# Count the files this process currently holds open under janome/sysdic.
p = psutil.Process()
open_dic_files = [f for f in p.open_files() if 'janome/sysdic' in f.path]

print('Opened dictionary files = %d' % len(open_dic_files))

With janome 4.0.1:

$ python openfiles_mmap.py 
Created Tokenizers = 10
Opened dictionary files = 400

With this PR:

$ python openfiles_mmap.py 
Created Tokenizers = 10
Opened dictionary files = 40
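
The two runs imply 400 / 10 = 40 open dictionary files per Tokenizer in 4.0.1, versus 40 in total with this PR. As a rough illustration of why that matters, the sketch below estimates how many Tokenizers would fit under the process's soft file-descriptor limit at the 4.0.1 rate. It assumes a Unix-like system (the standard `resource` module is not available on Windows), the helper `max_tokenizers` is hypothetical, and the estimate ignores descriptors the process already has open for other purposes.

```python
import resource

# Derived from the 4.0.1 run above: 400 open files for 10 Tokenizers.
FILES_PER_TOKENIZER = 400 // 10


def max_tokenizers(soft_limit, files_per_tokenizer=FILES_PER_TOKENIZER):
    """Upper bound on Tokenizers before the open-file limit is exhausted.

    Returns None when the limit is unlimited.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit // files_per_tokenizer


soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print('Soft open-file limit = %s' % soft_limit)
print('Estimated max Tokenizers at 40 files each = %s' % max_tokenizers(soft_limit))
```

With a soft limit of 1024, for example, the bound is 25 Tokenizers; with the dictionary shared, the count of open dictionary files no longer grows with the number of Tokenizers.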

@coveralls

coveralls commented Feb 23, 2022

Coverage decreased (-0.3%) to 86.878% when pulling 0dd31dc on feat/make-mmapdictionary-singleton into 1b25f94 on master.

@mocobeta mocobeta merged commit 38c2678 into master Feb 23, 2022
@mocobeta mocobeta deleted the feat/make-mmapdictionary-singleton branch February 23, 2022 11:31
Successfully merging this pull request may close these issues.

'Too many open files' with mmap=True