Segmentation fault when tokenizer.tokenize() is used repetitively #16
Comments
I can reproduce the problem with a simple text file: feeding it twice, as you said, crashes ucto with a segfault (which should never happen). It seems there are some loose ends we need to solve if we want to call tokenize() repeatedly on the same tokenizer. Until then, a workaround is to create a new tokenizer for each file:

    import ucto

    configurationfile_ucto = "tokconfig-nld-historical"
    files = ["test.txt", "test.txt"]
    for f in files:
        # Workaround: create a fresh tokenizer for every file
        tokenizer = ucto.Tokenizer(configurationfile_ucto, foliaoutput=True)
        tokenizer.tokenize(f, "/tmp/")

This is a bit less performant due to the added initialization time every iteration, but hopefully still manageable. As for the crash, I produced a traceback so we (me and @kosloot?) can debug and fix it.
I changed the title a bit. I know you meant "kernel" to refer to the Jupyter kernel, but people might misunderstand and think the entire Linux kernel crashed because of ucto; that'd be quite a feat ;)
Ok, this is definitely a bug in ucto itself. I can reproduce it without Python.
Some data was not reset on the next invocation of tokenize(). Should be fixed now in ucto.
Nice work! Are we ready for new releases? I guess such a crash warrants a new release quickly.
Thanks a lot for the quick replies! Great work! :)
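To illustrate the class of bug described here (per-run state that survives into the next call), here is a minimal Python sketch. `ToyTokenizer` and its methods are hypothetical stand-ins and do not reflect ucto's actual C++ implementation; in C++ such stale state can lead to a segfault, whereas in this toy version it only produces wrong output.

```python
class ToyTokenizer:
    """Hypothetical stand-in used only to illustrate stale state; not ucto's code."""

    def __init__(self):
        # Per-run state that should be empty at the start of each tokenize call.
        self._buffer = []

    def tokenize_buggy(self, text):
        # Bug: _buffer still holds the tokens of the previous invocation.
        self._buffer.extend(text.split())
        return list(self._buffer)

    def tokenize_fixed(self, text):
        # Fix: reset the per-run state before processing new input.
        self._buffer = []
        self._buffer.extend(text.split())
        return list(self._buffer)


buggy = ToyTokenizer()
buggy.tokenize_buggy("a b")
second_buggy = buggy.tokenize_buggy("a b")   # leftover tokens leak in

fixed = ToyTokenizer()
fixed.tokenize_fixed("a b")
second_fixed = fixed.tokenize_fixed("a b")   # only the new tokens
```

Re-creating the object on every iteration, as in the workaround earlier in this thread, sidesteps the problem for the same reason: a new instance starts with empty state.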
ucto v0.29 and python-ucto v0.6.5 are now released, solving this issue |
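For anyone wanting to check whether their environment already has the fixed binding, the following is a rough sketch using only the Python standard library. It assumes the binding is installed under the distribution name `python-ucto`; the helper names are made up for this example. It only checks the Python binding, not the version of the underlying ucto library, which would have to be verified separately.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First python-ucto release reported in this thread to contain the fix.
FIXED_IN = (0, 6, 5)


def parse_version(text):
    """Turn a string such as '0.6.5' into a tuple of ints, e.g. (0, 6, 5)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_fix(version_text):
    """Return True if the given python-ucto version is at least 0.6.5."""
    return parse_version(version_text) >= FIXED_IN


def installed_python_ucto_has_fix():
    """Check the installed binding; assumes the distribution is named 'python-ucto'."""
    try:
        return has_fix(version("python-ucto"))
    except PackageNotFoundError:
        return False
```

If `installed_python_ucto_has_fix()` returns False, either upgrade or keep using the per-file tokenizer workaround.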
I am trying to tokenize a bunch of txt-files and store them as folia.xml-files.
The first file works fine, but after that the kernel crashes.
A little bit more info:
The Kernel crashed while executing code in the current cell or a previous cell;
Am I doing something wrong, or is there a bug here?