fix: fix legacy doc ref #48

Merged 1 commit on Oct 18, 2024
22 changes: 11 additions & 11 deletions README.md
@@ -36,37 +36,37 @@ poetry run pytest test
 - You can validate your JSON objects using the pydantic class definition.

 ```py
-from docling_core.types import Document
+from docling_core.types import DoclingDocument

 data_dict = {...} # here the object you want to validate, as a dictionary
-Document.model_validate(data_dict)
+DoclingDocument.model_validate(data_dict)

 data_str = {...} # here the object as a JSON string
-Document.model_validate_json(data_str)
+DoclingDocument.model_validate_json(data_str)
 ```

 - You can generate the JSON schema of a model with the script `generate_jsonschema`.

 ```py
-# for the `Document` type
-generate_jsonschema Document
+# for the `DoclingDocument` type
+generate_jsonschema DoclingDocument

 # for the use `Record` type
 generate_jsonschema Record
 ```

 ## Documentation

-Docling supports 3 main data types:
+Docling Core contains 3 top-level data types:

-- **Document** for publications like books, articles, reports, or patents. When Docling converts an unstructured PDF document, the generated JSON follows this schema.
-The Document type also models the metadata that may be attached to the converted document.
-Check [Document](docs/Document.json) for the full JSON schema.
+- **DoclingDocument** for publications like books, articles, reports, or patents. When Docling converts an unstructured PDF document, the generated JSON follows this schema.
+The DoclingDocument type also models the metadata that may be attached to the converted document.
+Check [DoclingDocument](docs/DoclingDocument.json) for the full JSON schema.
 - **Record** for structured database records, centered on an entity or _subject_ that is provided with a list of attributes.
 Related to records, the statements can represent annotations on text by Natural Language Processing (NLP) tools.
-Check [Record](docs/Record.json) for the full JSON schema.
+Check [Record](docs/Record.json) for the full JSON schema.
 - **Generic** for any data representation, ensuring minimal configuration and maximum flexibility.
-Check [Generic](docs/Generic.json) for the full JSON schema.
+Check [Generic](docs/Generic.json) for the full JSON schema.

 The data schemas are defined using [pydantic](https://pydantic-docs.helpmanual.io/) models, which provide built-in processes to support the creation of data that adhere to those models.

25 changes: 3 additions & 22 deletions docling_core/types/__init__.py
@@ -5,25 +5,6 @@

 """Define the main types."""

-from docling_core.types.gen.generic import Generic  # noqa
-from docling_core.types.legacy_doc.base import BoundingBox  # noqa
-from docling_core.types.legacy_doc.base import Table  # noqa
-from docling_core.types.legacy_doc.base import TableCell  # noqa
-from docling_core.types.legacy_doc.base import (  # noqa
-    BaseCell,
-    BaseText,
-    PageDimensions,
-    PageReference,
-    Prov,
-    Ref,
-)
-from docling_core.types.legacy_doc.document import (  # noqa
-    CCSDocumentDescription as DocumentDescription,
-)
-from docling_core.types.legacy_doc.document import (  # noqa
-    CCSFileInfoObject as FileInfoObject,
-)
-from docling_core.types.legacy_doc.document import (  # noqa
-    ExportedCCSDocument as Document,
-)
-from docling_core.types.rec.record import Record  # noqa
+from docling_core.types.doc.document import DoclingDocument
+from docling_core.types.gen.generic import Generic
+from docling_core.types.rec.record import Record
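
The change to `docling_core/types/__init__.py` drops the legacy re-exports, so code that imported names such as `Document` or `BoundingBox` from `docling_core.types` would need a different import path. Below is a minimal sketch of that mapping. It assumes the `legacy_doc` modules named in the removed lines are still present after this change (this diff does not delete them, but that is not confirmed here). `LEGACY_EXPORTS` and `migrated_import` are hypothetical names used for illustration only and are not part of docling-core.

```python
"""Sketch: where the names removed from docling_core.types were defined.

Module paths and class names are copied from the removed import lines.
LEGACY_EXPORTS and migrated_import are hypothetical helpers, not library API.
"""

_BASE = "docling_core.types.legacy_doc.base"
_DOC = "docling_core.types.legacy_doc.document"

# Old public name -> (defining module, name inside that module).
LEGACY_EXPORTS = {
    "BoundingBox": (_BASE, "BoundingBox"),
    "Table": (_BASE, "Table"),
    "TableCell": (_BASE, "TableCell"),
    "BaseCell": (_BASE, "BaseCell"),
    "BaseText": (_BASE, "BaseText"),
    "PageDimensions": (_BASE, "PageDimensions"),
    "PageReference": (_BASE, "PageReference"),
    "Prov": (_BASE, "Prov"),
    "Ref": (_BASE, "Ref"),
    "DocumentDescription": (_DOC, "CCSDocumentDescription"),
    "FileInfoObject": (_DOC, "CCSFileInfoObject"),
    "Document": (_DOC, "ExportedCCSDocument"),
}


def migrated_import(name: str) -> str:
    """Return an import statement that binds the old public name."""
    module, target = LEGACY_EXPORTS[name]
    if target == name:
        return f"from {module} import {name}"
    # The old package exposed some classes under an alias; keep that alias.
    return f"from {module} import {target} as {name}"


if __name__ == "__main__":
    for old_name in LEGACY_EXPORTS:
        print(migrated_import(old_name))
```

`Generic`, `Record`, and the new `DoclingDocument` remain importable from `docling_core.types` directly, so they need no entry in the table.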
2 changes: 1 addition & 1 deletion docling_core/utils/generate_docs.py
@@ -18,7 +18,7 @@

 from docling_core.utils.generate_jsonschema import generate_json_schema

-MODELS: Final = ["Document", "Record", "Generic"]
+MODELS: Final = ["DoclingDocument", "Record", "Generic"]


 def _prepare_directory(folder: str, clean: bool = False) -> None: