Inconsistent text inference output with plain text #225
If the model predictions are a little off, perhaps because your PDFs (format, content, and so on) deviate to some degree from the training material, this is nothing to worry about and not uncommon. It could be that the fonts or spacing differ and are therefore harder for the model to parse correctly. I'd suggest post-processing the predictions yourself in this case, using an NLP package to detect word boundaries (an idea from here) and removing the faulty spacing within those boundaries. Or you could fine-tune the model on your data; if it's just a spacing issue, I'd guess it would be resolved quickly.
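The post-processing idea above can be sketched as follows. This is a minimal, self-contained stand-in: the toy `VOCAB` and the helper names are assumptions for illustration, and a real pipeline would use an NLP package's lexicon (e.g. wordninja or the NLTK word list) instead.

```python
# Sketch of the suggested post-processing: strip the spurious spaces the
# model inserts, then re-segment the string against a word list with
# dynamic programming. VOCAB is a toy dictionary for the demo.
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
MAX_WORD_LEN = max(len(w) for w in VOCAB)

def resegment(fragment: str):
    """Split a space-free string into known words, or return None."""
    s = fragment.lower()
    n = len(s)
    best = [None] * (n + 1)  # best[i]: a word list covering s[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            if best[j] is not None and s[j:i] in VOCAB:
                best[i] = best[j] + [s[j:i]]
                break
    return best[n]

def fix_spacing(line: str) -> str:
    """Collapse 't h e qu ick'-style output back into real words.

    Falls back to the original line when the collapsed string cannot
    be segmented, so unknown words are left untouched.
    """
    collapsed = line.replace(" ", "")
    words = resegment(collapsed)
    return " ".join(words) if words is not None else line

print(fix_spacing("t h e qu ick bro wn fox"))  # -> the quick brown fox
```

The fallback matters: only lines that fully re-segment are rewritten, so correctly spaced pages pass through unchanged.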
Hi @paulgekeler, indeed, when using the fine-tuned version this issue no longer exists. Do you have any idea whether GOT can handle images that might be skewed? PS: all my documents are French legal documents with, sometimes, complicated layouts.
@ep0p yes, I've experienced the same thing. When I try to run multi-page inference, I barely get any output, maybe the first couple of lines of text. My suspicion is that the compression of the visual information is too aggressive for dense text over multiple pages. I think their multi-page training consisted of multiple pages of sparse text.
Hi, it would help to use a for-loop for multi-page inference. Multi-page input is only used for training; more details can be found in the paper.
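The for-loop approach can be sketched like this. The per-page call is shown as a stub (`run_page_ocr`) so the loop is self-contained; in the released code the actual call is roughly `model.chat(tokenizer, image_file, ocr_type='ocr')`, but check the repo's README for the exact signature.

```python
from typing import Callable

def ocr_document(page_images: list, run_page_ocr: Callable) -> str:
    """Run single-page OCR on each rendered page and join the results.

    `run_page_ocr` stands in for GOT's per-image inference call; wiring
    in the real model is left to the caller.
    """
    pages = []
    for image_file in page_images:
        pages.append(run_page_ocr(image_file))
    return "\n\n".join(pages)

# Stub "model" so the sketch runs without the weights.
fake_model = {"p1.png": "page one text", "p2.png": "page two text"}
print(ocr_document(["p1.png", "p2.png"], fake_model.__getitem__))
```

Rendering each PDF page to an image first (e.g. with pdf2image) and feeding pages one at a time avoids the dense-text truncation described above.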
@Ucas-HaoranWei thanks, I read the paper. I will try to fine-tune some more on multi-page data.
@paulgekeler and @Ucas-HaoranWei would fine-tuning with skewed images help in this case?
@paulgekeler thanks a lot. I will add a skewed subset to my dataset as well and attempt fine-tuning.
@ep0p did you manage to fine-tune on your dataset? If you did so successfully, would you mind sharing the format of your data and your training settings?
@thhung I did finish the fine-tuning with no errors; however, I don't know if I can call it successful yet. My dataset at the moment contains around 6k images, full-page images from documents, and in the JSONL file the records are of this type:
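The actual record shown in the thread is not preserved, but a plausible shape is the LLaVA-style conversation layout that GOT-like fine-tuning scripts commonly consume. The field names and prompt below are assumptions, not the poster's format, so verify against the repo's data documentation.

```python
import json

# Hypothetical single-page JSONL record (field names are an assumption,
# not taken from this thread): one image path plus a human/gpt exchange.
record = {
    "image": "docs/page_0001.png",
    "conversations": [
        {"from": "human", "value": "<image>\nOCR: "},
        {"from": "gpt", "value": "Texte intégral de la page..."},
    ],
}

# One JSON object per line is what makes the file valid JSONL.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

`ensure_ascii=False` keeps accented French characters readable in the file rather than escaping them.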
As for the training params, I changed only the batch size, the number of epochs, and fp16, in order to speed up training a bit and use less memory, since I can only work with 2 GPUs at a time.
PS: also, since I can have full pages of text, I have changed
@ep0p so the performance is not as you expected?
@thhung it is more accurate than the original model, but there are still words that are not recognized properly.
@ep0p did you try to fine-tune with the prompt "OCR with format across multiple pages: "? Because you are fine-tuning on multiple pages, right?
@paulgekeler since I had to introduce multiple types of noise, diversify fonts, and skew the images, I kept it simple with one page per entry. So I opted for straightforward "OCR" training rather than fine-tuning for multiple pages.
I'm encountering an issue when using GOT for inference on plain text. The output is not consistent: sometimes it detects the text correctly, but other times it introduces spaces between letters, creating nonsense words:
For example:
This inconsistency becomes particularly problematic when processing PDFs with multiple pages. Even if most pages are processed correctly, a couple of pages might have this spacing issue, which disrupts the results.
I can't figure out why this happens or how to enforce a consistent format, ensuring only the "good" text format is used.