Requested feature
The Docling library defines a DeclarativeDocumentBackend abstract class to transform different document formats to DoclingDocument without a recognition pipeline. Implementations include HTMLDocumentBackend for HTML pages and MsWordDocumentBackend for MS Word documents.
The United States Patent and Trademark Office (USPTO) is the federal agency for granting U.S. patents and registering trademarks. The USPTO disseminates public patent and trademark data as pre-packaged or user-customized bulk data products through the Bulk Data Storage System.
Patent applications and grants are available in several formats. In particular, full-text data (no images) is available in XML format, packaged in zip files. Older grants (January 1976 through December 2001), however, are in a tabular format.
This feature consists of providing a document backend implementation that parses the text content of USPTO patent grants and applications into a DoclingDocument.
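To illustrate the kind of parsing such a backend would need, here is a minimal sketch using only the Python standard library. It does not use the Docling API: `ParsedPatent` and `parse_patent` are hypothetical names, and the element names in the hand-written sample (`invention-title`, `abstract`, `claims`, `claim`) are assumptions modeled on the USPTO full-text XML, which varies across schema versions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Simplified, hand-written sample. The element names are assumptions modeled
# on USPTO full-text grant XML; real files are larger and differ by version.
SAMPLE_XML = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <invention-title>Widget with improved handle</invention-title>
  </us-bibliographic-data-grant>
  <abstract><p>A widget having a handle.</p></abstract>
  <claims>
    <claim><claim-text>1. A widget comprising a handle.</claim-text></claim>
    <claim><claim-text>2. The widget of claim 1, wherein the handle is curved.</claim-text></claim>
  </claims>
</us-patent-grant>"""


@dataclass
class ParsedPatent:
    """Hypothetical stand-in for the target document model (not DoclingDocument)."""
    title: str = ""
    abstract: str = ""
    claims: List[str] = field(default_factory=list)


def _text(element):
    """Join all nested text of an element and normalize whitespace."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def parse_patent(xml_text: str) -> ParsedPatent:
    """Extract title, abstract, and claims from one patent XML document."""
    root = ET.fromstring(xml_text)
    return ParsedPatent(
        title=_text(root.find(".//invention-title")),
        abstract=_text(root.find("abstract")),
        claims=[_text(claim) for claim in root.findall("./claims/claim")],
    )


patent = parse_patent(SAMPLE_XML)
print(patent.title)
print(len(patent.claims))
```

An actual backend would map these pieces onto DoclingDocument items rather than a plain dataclass, and would also need a separate code path for the older tabular grants.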
Alternatives
There are no alternatives at this point, since this is a new feature.