Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common prefix applies to DUOS bundles #6609

Open
nadove-ucsc opened this issue Oct 3, 2024 · 2 comments
Open

Common prefix applies to DUOS bundles #6609

nadove-ucsc opened this issue Oct 3, 2024 · 2 comments
Labels
-- [priority] Low bug [type] A defect preventing use of the system as specified debt [type] A defect incurring continued engineering cost indexer [subject] The indexer part of Azul orange [process] Done by the Azul team spike:1 [process] Spike estimate of one point

Comments

@nadove-ucsc
Copy link
Contributor

nadove-ucsc commented Oct 3, 2024

The DUOS bundle should always be indexed, regardless of the choice of common prefix. This used to be the case, but a bug was introduced here: 0959f94#diff-574e06963918c328fe8ef7ee75b0281cf8a7f10e8f4a5bc9d1fa8b91df5a0e92R229

There is currently no observable symptom of this logic failure, because the only AnVIL snapshot that currently uses a common prefix (ANVIL_1000G_2019_Dev_20230609_ANV5_202306121732 on anvilbox) is not registered on Duos, and thus no Duos bundle will be emitted regardless.

@nadove-ucsc nadove-ucsc added orange [process] Done by the Azul team bug [type] A defect preventing use of the system as specified indexer [subject] The indexer part of Azul -- [priority] Low spike:1 [process] Spike estimate of one point labels Oct 3, 2024
@nadove-ucsc
Copy link
Contributor Author

nadove-ucsc commented Oct 3, 2024

The first step is to confirm the existence of the bug by forcing an observable symptom to manifest. To do this, choose one of the other sources in catalog anvil (either CMG or CCDG) and reindex that same source twice, using a hardcoded prefix in environment.py. Note that both sources are faux-managed access and currently have non-null DUOS descriptions on anvilbox and anvildev.

The first reindex should use a 1-character common prefix that includes both the source's Duos bundle and at least one other bundle. It may be tricky or even impossible to find such a prefix since both sources are very small. The Duos bundle's UUID should match the document_id field of the source's dataset, aside from the version field. The second reindex should use a different 1-character common prefix.

If we observe that the Duos description is present for the chosen source after the first reindex, but not after the second, then we'll know the bug is real.

@nadove-ucsc
Copy link
Contributor Author

If the bug is real, then I think the fix is:

Subject: [PATCH] FIX
---
Index: src/azul/plugins/repository/tdr_anvil/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/plugins/repository/tdr_anvil/__init__.py b/src/azul/plugins/repository/tdr_anvil/__init__.py
--- a/src/azul/plugins/repository/tdr_anvil/__init__.py	(revision af05032981d433a2738d1c847b8123cb56083a17)
+++ b/src/azul/plugins/repository/tdr_anvil/__init__.py	(date 1727918440073)
@@ -237,7 +237,10 @@
                 duos_count += 1
                 # Ensure that one partition will always contain the DUOS bundle
                 # regardless of the choice of common prefix
-                if not bundle_uuid.startswith(prefix):
+                common_prefix = spec.prefix.common
+                assert prefix.startswith(common_prefix), (prefix, spec)
+                partition_prefix = prefix.removeprefix(common_prefix)
+                if not bundle_uuid[len(common_prefix):].startswith(partition_prefix):
                     continue
             bundles.append(TDRAnvilBundleFQID(
                 source=source,

@hannes-ucsc hannes-ucsc changed the title Common prefix applies to Duos bundles Common prefix applies to DUOS bundles Jan 7, 2025
@hannes-ucsc hannes-ucsc added the debt [type] A defect incurring continued engineering cost label Jan 7, 2025
@hannes-ucsc hannes-ucsc removed their assignment Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-- [priority] Low bug [type] A defect preventing use of the system as specified debt [type] A defect incurring continued engineering cost indexer [subject] The indexer part of Azul orange [process] Done by the Azul team spike:1 [process] Spike estimate of one point
Projects
None yet
Development

No branches or pull requests

2 participants