Jpuerto/nihdev 454 ivt adjust dir schema validation for shared uploads #1290

jpuerto-psc
Collaborator

@jpuerto-psc jpuerto-psc commented Jan 25, 2024

This PR adds support for the global and non_global shared uploads defined here: https://docs.google.com/document/d/1n2McSs9geA9Eli4QWQaB3c9R3wo5d5U1Xd57DWQfN5Q

This PR also adds an example shared PhenoCycler dataset for automated testing.

@jpuerto-psc jpuerto-psc marked this pull request as ready for review February 1, 2024 18:50
@jpuerto-psc
Collaborator Author

Two example uploads in DEV passed metadata and directory schema validation (they fail plugin validation because the files are empty).

@jpuerto-psc
Collaborator Author

Leaving this open for now, as it requires some input from Stephen Fisher.

@gesinaphillips
Collaborator

gesinaphillips commented Feb 6, 2024

Unless uploads with shared directories will be passed "global" and "non_global" as part of the Upload's upload_ignore_globs argument, I think the __get_no_ref_errors method needs to be updated to ignore the presence of global/non_global directories because they are not technically referenced. If you copy the new good-cedar-phenocycler-shared example from dataset-iec-examples to dataset-examples (because the two dirs are called with different args during testing) you should see the following error when you run the tests:

Reference Errors:
  No References:
    Files:
    - global.
    - non_global.

One way of solving this would be to add to self.upload_ignore_globs if self.shared_directories is True, e.g.:

        if self.shared_directories:
            self.upload_ignore_globs = [
                *self.upload_ignore_globs,
                "global",
                "non_global",
            ]

I added this inside _get_no_ref_errors just before non_metadata_paths is created and it avoided the no-ref error in the phenocycler example. HOWEVER I am not sure if this has unwanted downstream effects!
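
To make the concern above concrete, here is a minimal standalone sketch of the idea, not the actual ingest-validation-tools code. The helper names `effective_ignore_globs` and `unreferenced_paths` are hypothetical, and matching with `fnmatch` is an assumption about how ignore globs are applied. Returning a new list (rather than reassigning `self.upload_ignore_globs`) would also avoid piling up duplicate entries if the check were ever run more than once.

```python
from fnmatch import fnmatch

# Directory names used by shared uploads, as discussed in this PR.
SHARED_DIRS = ["global", "non_global"]


def effective_ignore_globs(upload_ignore_globs, shared_directories):
    """Hypothetical helper: return a new list of ignore globs, adding the
    shared directory names only when the upload uses shared directories."""
    globs = list(upload_ignore_globs or [])
    if shared_directories:
        globs += [d for d in SHARED_DIRS if d not in globs]
    return globs


def unreferenced_paths(top_level_names, referenced, ignore_globs):
    """Hypothetical stand-in for a no-reference check: names that are neither
    referenced by metadata nor matched by any ignore glob."""
    return sorted(
        name
        for name in top_level_names
        if name not in referenced
        and not any(fnmatch(name, glob) for glob in ignore_globs)
    )


names = ["dataset-1", "global", "non_global", "extras"]
referenced = {"dataset-1"}

# Without the extension, the shared directories are reported.
print(unreferenced_paths(names, referenced, effective_ignore_globs(["extras"], False)))
# With shared_directories set, they are skipped.
print(unreferenced_paths(names, referenced, effective_ignore_globs(["extras"], True)))
```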

Review thread on src/ingest_validation_tools/upload.py (outdated, resolved)
Review thread on src/ingest_validation_tools/upload.py (outdated, resolved)
Review thread on src/ingest_validation_tools/directory_validator.py (outdated, resolved)
@jpuerto-psc
Collaborator Author

> Unless uploads with shared directories will be passed "global" and "non_global" as part of the Upload's upload_ignore_globs argument, I think the __get_no_ref_errors method needs to be updated to ignore the presence of global/non_global directories because they are not technically referenced. If you copy the new good-cedar-phenocycler-shared example from dataset-iec-examples to dataset-examples (because the two dirs are called with different args during testing) you should see the following error when you run the tests:

@gesinaphillips - we should be fine here: the ingest-pipeline invocation basically ignores all directories at the top level (which is why extras doesn't get caught either).
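
As an illustration of why a caller that ignores everything at the top level would never hit the no-reference error, here is a rough sketch. It rests on assumptions: the exact arguments ingest-pipeline passes are not shown in this thread, so the `"*"` glob and the helper `top_level_unreferenced` are hypothetical, and `fnmatch` stands in for whatever matching the real code uses.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def top_level_unreferenced(upload_root, referenced, ignore_globs):
    """Hypothetical check: top-level entries that are neither referenced by
    metadata nor matched by any ignore glob."""
    return sorted(
        entry.name
        for entry in Path(upload_root).iterdir()
        if entry.name not in referenced
        and not any(fnmatch(entry.name, glob) for glob in ignore_globs)
    )


def demo():
    """Build a throwaway upload layout and compare two invocations."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("dataset-1", "global", "non_global", "extras"):
            (root / name).mkdir()
        # Test-style call with no ignore globs: unreferenced dirs are reported.
        strict = top_level_unreferenced(root, {"dataset-1"}, [])
        # Assumed pipeline-style call: "*" matches every top-level name.
        permissive = top_level_unreferenced(root, {"dataset-1"}, ["*"])
    return strict, permissive


strict, permissive = demo()
print(strict)
print(permissive)
```

Under these assumptions, the error from the earlier comment would only appear in test configurations that call the validator with a narrower set of ignore globs.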

Juan Puerto added 12 commits February 6, 2024 13:32
…idation-tools into jpuerto/NIHDEV-454-IVT-Adjust-dir-schema-validation-for-shared-uploads

# Conflicts:
#	src/ingest_validation_tools/directory_validator.py
#	src/ingest_validation_tools/plugin_validator.py
#	src/ingest_validation_tools/validation_utils.py
…idation-tools into jpuerto/NIHDEV-454-IVT-Adjust-dir-schema-validation-for-shared-uploads

# Conflicts:
#	CHANGELOG.md
#	examples/dataset-examples/bad-cedar-multi-assay-visium-with-standalone-histology-bad-dir-schema/MOCK_RESPONSE.json
#	examples/dataset-examples/good-cedar-multi-assay-visium-with-standalone-histology/MOCK_RESPONSE.json
#	src/ingest_validation_tools/upload.py
…idation-tools into jpuerto/NIHDEV-454-IVT-Adjust-dir-schema-validation-for-shared-uploads

# Conflicts:
#	CHANGELOG.md
#	src/ingest_validation_tools/plugin_validator.py
#	src/ingest_validation_tools/upload.py
@jpuerto-psc jpuerto-psc merged commit 0ed60d6 into main Mar 25, 2024
8 checks passed
@jpuerto-psc jpuerto-psc deleted the jpuerto/NIHDEV-454-IVT-Adjust-dir-schema-validation-for-shared-uploads branch March 25, 2024 14:02