refactor: proper class for field info #1730

Draft
wants to merge 6 commits into
base: fix/parallel-datapackage
from

pierrecamilleri
Collaborator

@pierrecamilleri pierrecamilleri commented Jan 24, 2025

This is a refactoring PR.

Currently, a complex private object, field_info, is created in Table.__open_row_stream (resources/table.py) and consumed by the non-public Row.__init__ method.

This PR introduces the FieldsInfo class, which holds the same information but with type annotations and a proper interface.
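To illustrate the kind of change described, here is a minimal sketch of replacing an untyped, nested structure with a small typed class. It is not the code from this PR: `SketchField`, `SketchFieldInfo`, `SketchFieldsInfo`, and every attribute and method name below are hypothetical, and the assumption that the existing object stores a field, its position, and a cell reader per field name is made for the example only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical alias: a cell reader returns (value, notes-or-None).
CellReader = Callable[[Any], Tuple[Any, Optional[Dict[str, str]]]]


@dataclass
class SketchField:
    """Hypothetical stand-in for a schema field (not the library's class)."""

    name: str
    type: str = "any"


@dataclass
class SketchFieldInfo:
    """Everything a row needs to know about one field, with explicit types."""

    field: SketchField
    field_number: int
    cell_reader: CellReader


class SketchFieldsInfo:
    """Typed wrapper with a small interface instead of a raw nested dict."""

    def __init__(
        self,
        fields: List[SketchField],
        make_reader: Callable[[SketchField], CellReader],
    ) -> None:
        self._infos: Dict[str, SketchFieldInfo] = {}
        # Field numbers are 1-based in this sketch (an assumption).
        for number, field in enumerate(fields, start=1):
            self._infos[field.name] = SketchFieldInfo(
                field=field,
                field_number=number,
                cell_reader=make_reader(field),
            )

    def names(self) -> List[str]:
        """Field names in schema order."""
        return list(self._infos)

    def get(self, name: str) -> SketchFieldInfo:
        """Look up the info for one field by name."""
        return self._infos[name]
```

With a class like this, the consumer calls `names()` and `get(name)` rather than indexing into nested keys, so a type checker can catch misuse.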

Next steps / investigation: 

  • Explore why there is a create_cell_reader function rather than a more direct read_cell, which at first glance would simplify the logic. Possible reasons:
    1. Some constraint parsing happens in create_cell_reader (maybe to reuse the value_reader). This does not seem to be the right place for it.
    2. It creates the value_reader once and for all. The same question applies, though: why create a value_reader instead of a read_row method?
  • Do not mess with the schema fields when schema_sync=True; instead, create a separate list or mapping of the actual data fields.
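For context on the first question, the sketch below contrasts the two shapes under discussion: a factory that does per-field setup once and returns a closure, and a direct function that repeats the setup on every call. It is a generic illustration, not the library's implementation; `create_cell_reader_sketch`, `read_cell_sketch`, the parser table, and the notes format are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Notes = Optional[Dict[str, str]]

# Hypothetical parser table; the real library resolves readers differently.
_PARSERS: Dict[str, Callable[[Any], Any]] = {
    "integer": int,
    "number": float,
    "string": str,
}


def create_cell_reader_sketch(field_type: str) -> Callable[[Any], Tuple[Any, Notes]]:
    """Factory shape: resolve the parser once per field, reuse it for every row."""
    parse = _PARSERS[field_type]  # per-field setup, done a single time

    def cell_reader(cell: Any) -> Tuple[Any, Notes]:
        # Treat None and empty strings as missing values (an assumption).
        if cell is None or cell == "":
            return None, None
        try:
            return parse(cell), None
        except (TypeError, ValueError):
            return None, {"type": f'type is "{field_type}"'}

    return cell_reader


def read_cell_sketch(field_type: str, cell: Any) -> Tuple[Any, Notes]:
    """Direct shape: simpler to call, but redoes the setup for each cell."""
    return create_cell_reader_sketch(field_type)(cell)
```

If the per-field setup is cheap, the direct form is easier to follow. The factory form only pays off when the setup is expensive or when, as noted in item 1, other per-field work is folded into it.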

@pierrecamilleri pierrecamilleri marked this pull request as draft January 24, 2025 14:47
Test passes, surprisingly.
No special effort has been made to support the `header_case` option, or
"required" columns with `schema_sync`.
@pierrecamilleri pierrecamilleri changed the base branch from main to fix/parallel-datapackage January 29, 2025 14:18