0.4.6
Loosen the default cap threshold to 0.5.
Add an UNSTRUCTURED_NARRATIVE_TEXT_CAP_THRESHOLD environment variable for controlling
the cap ratio threshold.
Unknown text elements are identified as Text for HTML and plain text documents.
Body Text styles no longer default to NarrativeText for Word documents. The style information
is insufficient to determine that the text is narrative.
Uppercase text is lowercased before checking for verbs. This helps avoid some missed verbs.
Adds an Address element for capturing elements that only contain an address.
Suppress the UserWarning when detectron is called.
Checks that titles and narrative text have at least one English word.
Checks that titles and narrative text are at least 50% alpha characters.
Restricts titles to a maximum number of words. Adds an UNSTRUCTURED_TITLE_MAX_WORD_LENGTH
environment variable for controlling the max number of words in a title.
Updated partition_pptx to order the elements on the page.
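
The text checks listed above can be pictured with a small standalone sketch. This is not the library's code: the helper names (`cap_ratio`, `exceeds_cap_threshold`, `alpha_ratio`, `within_title_word_limit`), the exact ratio definitions, and the fallback of 12 words for titles are all assumptions made for illustration. Only the two environment variable names, the 0.5 cap threshold, and the 50% alpha requirement come from the notes.

```python
import os


def cap_ratio(text: str) -> float:
    """Share of alphabetic words that start with a capital letter.

    Assumed definition for illustration; the library may compute it differently.
    """
    words = [w for w in text.split() if w.isalpha()]
    if not words:
        return 0.0
    return sum(w[0].isupper() for w in words) / len(words)


def exceeds_cap_threshold(text: str, default: float = 0.5) -> bool:
    """True when the cap ratio is above the configured threshold."""
    # Variable name is from the release notes; 0.5 is the stated default.
    raw = os.environ.get("UNSTRUCTURED_NARRATIVE_TEXT_CAP_THRESHOLD", default)
    return cap_ratio(text) > float(raw)


def alpha_ratio(text: str) -> float:
    """Share of non-whitespace characters that are alphabetic."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def within_title_word_limit(text: str, default: int = 12) -> bool:
    """True when the text has no more words than the configured maximum."""
    # Variable name is from the release notes; the fallback of 12 is hypothetical.
    raw = os.environ.get("UNSTRUCTURED_TITLE_MAX_WORD_LENGTH", default)
    return len(text.split()) <= int(raw)


if __name__ == "__main__":
    sample = "QUARTERLY RESULTS 2023"
    print(cap_ratio(sample), alpha_ratio(sample), within_title_word_limit(sample))
```

Reading the limits from the environment at call time, rather than at import, means a changed variable takes effect without reloading the module.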