Create a backend to transform USPTO patents (XML and TXT) to DoclingDocument #605

Closed
ceberam opened this issue Dec 16, 2024 · 0 comments · Fixed by #606
Labels
enhancement New feature or request

Comments

Contributor

ceberam commented Dec 16, 2024

Requested feature

  • The Docling library defines a DeclarativeDocumentBackend abstract class to transform different document formats to DoclingDocument without a recognition pipeline. Implementations include HTMLDocumentBackend for HTML pages and MsWordDocumentBackend for MS Word documents.
  • The United States Patent and Trademark Office (USPTO) is the federal agency that grants U.S. patents and registers trademarks. The USPTO distributes public patent and trademark data as pre-packaged or user-customized bulk data products through the Bulk Data Storage System.
  • Patent applications and grants are available in several formats. In particular, full-text data (no images) are available in XML format, packaged in zip files. Some older grants (January 1976 through December 2001), however, are in a tabular text format.
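
To illustrate the packaging described above, here is a minimal sketch of reading the XML members of a bulk zip archive with the Python standard library. The function name `iter_xml_members` is hypothetical, and the sketch assumes UTF-8 encoded members identified by an `.xml` suffix; real bulk files may need different handling.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_xml_members(zip_bytes: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (member name, decoded text) for each .xml file in a zip archive.

    Assumption: members are UTF-8 encoded and carry an .xml suffix.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                yield name, archive.read(name).decode("utf-8")
```

Non-XML members (for example, the older text-format grants) are skipped here and would need their own reader.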

This feature consists of providing a document backend implementation that parses the text content of USPTO patent grants and applications into a DoclingDocument.
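
As a rough sketch of the parsing step, the snippet below extracts the title, abstract, and claims from a patent XML string using only the standard library. It is not the Docling API: `PatentText` and `parse_patent_xml` are hypothetical names, and the element names (`invention-title`, `abstract`, `claim`) are assumptions about the XML schema that should be checked against the actual USPTO DTDs. A real backend would emit DoclingDocument items instead of this plain dataclass.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatentText:
    """Hypothetical stand-in for the document the backend would produce."""
    title: str = ""
    abstract: str = ""
    claims: List[str] = field(default_factory=list)


def _text(elem: Optional[ET.Element]) -> str:
    """Return the element's full text with whitespace collapsed."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def parse_patent_xml(xml_text: str) -> PatentText:
    """Parse one patent document; tag names are assumed, not confirmed."""
    root = ET.fromstring(xml_text)
    return PatentText(
        title=_text(root.find(".//invention-title")),
        abstract=_text(root.find(".//abstract")),
        claims=[_text(claim) for claim in root.iter("claim")],
    )


# Made-up sample for illustration only, not real USPTO data.
SAMPLE_XML = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <invention-title>Widget assembly</invention-title>
  </us-bibliographic-data-grant>
  <abstract><p>A widget with <b>two</b> parts.</p></abstract>
  <claims>
    <claim id="CLM-00001"><claim-text>1. A widget comprising a base.</claim-text></claim>
  </claims>
</us-patent-grant>"""

doc = parse_patent_xml(SAMPLE_XML)
print(doc.title, len(doc.claims))
```

Because no recognition pipeline is involved, the structure comes directly from the markup; the older text-format grants would need a separate line-oriented parser.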

Alternatives

There are no alternatives at this point, since this is a new feature.
