Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problems with missing groups/metadata and sparse plates #119

Merged
merged 14 commits into from
Mar 1, 2022

Conversation

chris-allan
Copy link
Member

@chris-allan chris-allan commented Nov 22, 2021

Fix Plate Zarr group and metadata population issues with sparse or otherwise partially populated plates. Specifically:

  • Adhere very closely to the suggestions made upstream (Clarify plate and well specifications for sparse plates ome/ngff#24)
  • Do not create Zarr groups for rows or columns that have no data
  • Ensure Zarr groups are always created for rows
  • Do not populate Plate acquisition metadata with empty arrays
  • Do not populate Well/Image acquisition metadata with junk (-1)
  • Handle missing Rows or Columns Plate attributes better and consistently with the upstream suggestions
  • Expand test cases to assert as much of the above as possible

@chris-allan
Copy link
Member Author

Test cases currently expecting to fail. There's a lot of sparse data handling with Plates that we are still failing to do and to keep up with our specification suggestions already made upstream (ome/ngff#24).

int fieldCount = 1;

// Only two rows are filled out with one column each (two Wells total)
checkPlateGroupLayout(output, 2, 1, fieldCount, 2, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check won't be consistent with lines 907-924. The idea is to now have basically everything except arrays written for any wells that don't have data, right? Or have I misunderstood the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed earlier today, and I did indeed misunderstand. This PR is meant to follow the example in https://github.com/ome/ngff/blob/0c286908a4fbaceb2c622b21e5a52cc041eea7c0/latest/index.bs#L468, with the added constraint that wells without pixel data cannot have a .zgroup.

@melissalinkert
Copy link
Member

@chris-allan : I think 293d03a covers the issues we discussed earlier today. In the end that's just updating the tests to make sure that they are checking the correct plate layouts including both the presence of existing rows/columns and absence of missing rows/columns.

@chris-allan
Copy link
Member Author

Updated description with the changes made so far. Need to discuss naming conventions and potential changes to the row/column group names with @melissalinkert.

@sbesson
Copy link
Member

sbesson commented Dec 2, 2021

While reviewing the concepts described in ome/ngff#24, I have been reviewing this PR and testing it against a public IDR example of sparse plate - https://idr.openmicroscopy.org/webclient/?show=plate-9473.

Comparing the output of bioformat2raw with this PR with the output of bioformats2raw 0.3.0 for the top-level plate

(base) [sbesson@pilot-zarr2-dev sparse_hcs]$ diff -u bioformats2raw_0.3.0/9473.zarr/.zattrs bioformats2raw_hcs/9473.zarr/.zattrs
--- bioformats2raw_0.3.0/9473.zarr/.zattrs	2021-12-01 15:46:32.040089253 +0000
+++ bioformats2raw_hcs/9473.zarr/.zattrs	2021-12-01 16:17:21.231373373 +0000
@@ -3,55 +3,133 @@
   "plate" : {
     "columns" : [
       {
+        "name" : "0"
+      },
+      {
+        "name" : "1"
+      },
+      {
+        "name" : "2"
+      },
+      {
+        "name" : "3"
+      },
+      {
+        "name" : "4"
+      },
+      {
+        "name" : "5"
+      },
+      {
+        "name" : "6"
+      },
+      {
+        "name" : "7"
+      },
+      {
+        "name" : "8"
+      },
+      {
+        "name" : "9"
+      },
+      {
+        "name" : "10"
+      },
+      {
         "name" : "11"
+      },
+      {
+        "name" : "12"
+      },
+      {
+        "name" : "13"
+      },
+      {
+        "name" : "14"
+      },
+      {
+        "name" : "15"
+      },
+      {
+        "name" : "16"
+      },
+      {
+        "name" : "17"
+      },
+      {
+        "name" : "18"
+      },
+      {
+        "name" : "19"
+      },
+      {
+        "name" : "20"
+      },
+      {
+        "name" : "21"
+      },
+      {
+        "name" : "22"
       }
     ],
     "name" : "VAMP1_DEG_OLDPLATE_20120511 384_Evotec_CellCarrier",
     "wells" : [
       {
         "path" : "4/11",
-        "row_index" : 0,
-        "column_index" : 0
+        "row_index" : 4,
+        "column_index" : 11
       },
       {
         "path" : "5/11",
-        "row_index" : 1,
-        "column_index" : 0
+        "row_index" : 5,
+        "column_index" : 11
       },
       {
         "path" : "6/11",
-        "row_index" : 2,
-        "column_index" : 0
+        "row_index" : 6,
+        "column_index" : 11
       },
       {
         "path" : "7/11",
-        "row_index" : 3,
-        "column_index" : 0
+        "row_index" : 7,
+        "column_index" : 11
       },
       {
         "path" : "12/11",
-        "row_index" : 4,
-        "column_index" : 0
+        "row_index" : 12,
+        "column_index" : 11
       },
       {
         "path" : "13/11",
-        "row_index" : 5,
-        "column_index" : 0
+        "row_index" : 13,
+        "column_index" : 11
       },
       {
         "path" : "14/11",
-        "row_index" : 6,
-        "column_index" : 0
+        "row_index" : 14,
+        "column_index" : 11
       },
       {
         "path" : "15/11",
-        "row_index" : 7,
-        "column_index" : 0
+        "row_index" : 15,
+        "column_index" : 11
       }
     ],
     "field_count" : 30,
     "rows" : [
       {
+        "name" : "0"
+      },
+      {
+        "name" : "1"
+      },
+      {
+        "name" : "2"
+      },
+      {
+        "name" : "3"
+      },
+      {
         "name" : "4"
       },
       {
@@ -64,6 +142,18 @@
         "name" : "7"
       },
       {
+        "name" : "8"
+      },
+      {
+        "name" : "9"
+      },
+      {
+        "name" : "10"
+      },
+      {
+        "name" : "11"
+      },
+      {
         "name" : "12"
       },
       {

And for the individual wells and images,

(base) [sbesson@pilot-zarr2-dev sparse_hcs]$ diff -u bioformats2raw_0.3.0/9473.zarr/4/11/.zattrs  bioformats2raw_hcs/9473.zarr/4/11/.zattrs 
(base) [sbesson@pilot-zarr2-dev sparse_hcs]$ diff -u bioformats2raw_0.3.0/9473.zarr/4/11/0/.zattrs  bioformats2raw_hcs/9473.zarr/4/11/0/.zattrs 
(base) [sbesson@pilot-zarr2-dev sparse_hcs]$ 

From the indexing perspective, the outcome of this PR is as expected with all the rows and columns of the plate populated and the row/column index of the first well at (4,11) rather than (0,0).

As a preamble of the upcoming discussion around the HCS NGFF spec, I also computed the diff with the metadata written by omero-cli-zarr 0.2.1 for the same plate:

(base) [sbesson@pilot-zarr2-dev sparse_hcs]$ diff -uw omero-cli-zarr/9473.zarr/.zattrs bioformats2raw_hcs/9473.zarr/.zattrs
--- omero-cli-zarr/9473.zarr/.zattrs	2021-12-01 15:43:31.304371194 +0000
+++ bioformats2raw_hcs/9473.zarr/.zattrs	2021-12-01 16:17:21.231373373 +0000
@@ -1,18 +1,11 @@
 {
-    "_creator": {
-        "name": "omero-zarr",
-        "version": "0.2.1"
-    },
+  "bioformats2raw.layout" : 3,
     "plate": {
-        "acquisitions": [
-            {
-                "id": 7323,
-                "maximumfieldcount": 30,
-                "name": "00-37-47.423"
-            }
-        ],
         "columns": [
             {
+        "name" : "0"
+      },
+      {
                 "name": "1"
             },
             {
@@ -47,85 +40,135 @@
             },
             {
                 "name": "12"
-            }
-        ],
-        "field_count": 30,
-        "name": "VAMP1_DEG_OLDPLATE_20120511/Meas_01",
-        "rows": [
+      },
             {
-                "name": "A"
+        "name" : "13"
             },
             {
-                "name": "B"
+        "name" : "14"
             },
             {
-                "name": "C"
+        "name" : "15"
             },
             {
-                "name": "D"
+        "name" : "16"
             },
             {
-                "name": "E"
+        "name" : "17"
             },
             {
-                "name": "F"
+        "name" : "18"
             },
             {
-                "name": "G"
+        "name" : "19"
             },
             {
-                "name": "H"
+        "name" : "20"
             },
             {
-                "name": "I"
+        "name" : "21"
             },
             {
-                "name": "J"
+        "name" : "22"
+      }
+    ],
+    "name" : "VAMP1_DEG_OLDPLATE_20120511 384_Evotec_CellCarrier",
+    "wells" : [
+      {
+        "path" : "4/11",
+        "row_index" : 4,
+        "column_index" : 11
+      },
+      {
+        "path" : "5/11",
+        "row_index" : 5,
+        "column_index" : 11
             },
             {
-                "name": "K"
+        "path" : "6/11",
+        "row_index" : 6,
+        "column_index" : 11
             },
             {
-                "name": "L"
+        "path" : "7/11",
+        "row_index" : 7,
+        "column_index" : 11
             },
             {
-                "name": "M"
+        "path" : "12/11",
+        "row_index" : 12,
+        "column_index" : 11
             },
             {
-                "name": "N"
+        "path" : "13/11",
+        "row_index" : 13,
+        "column_index" : 11
             },
             {
-                "name": "O"
+        "path" : "14/11",
+        "row_index" : 14,
+        "column_index" : 11
             },
             {
-                "name": "P"
+        "path" : "15/11",
+        "row_index" : 15,
+        "column_index" : 11
             }
         ],
-        "version": "0.3",
-        "wells": [
+    "field_count" : 30,
+    "rows" : [
             {
-                "path": "P/12"
+        "name" : "0"
             },
             {
-                "path": "H/12"
+        "name" : "1"
             },
             {
-                "path": "G/12"
+        "name" : "2"
             },
             {
-                "path": "N/12"
+        "name" : "3"
             },
             {
-                "path": "O/12"
+        "name" : "4"
             },
             {
-                "path": "E/12"
+        "name" : "5"
             },
             {
-                "path": "F/12"
+        "name" : "6"
             },
             {
-                "path": "M/12"
+        "name" : "7"
+      },
+      {
+        "name" : "8"
+      },
+      {
+        "name" : "9"
+      },
+      {
+        "name" : "10"
+      },
+      {
+        "name" : "11"
+      },
+      {
+        "name" : "12"
+      },
+      {
+        "name" : "13"
+      },
+      {
+        "name" : "14"
+      },
+      {
+        "name" : "15"
+      }
+    ],
+    "acquisitions" : [
+      {
+        "id" : "0"
             }
         ]
     }
     (base) [sbesson@pilot-zarr2-dev sparse_hcs]$ diff -uw omero-cli-zarr/9473.zarr/E/12/.zattrs bioformats2raw_hcs/9473.zarr/4/11/.zattrs 
--- omero-cli-zarr/9473.zarr/E/12/.zattrs	2021-12-01 15:42:52.103215449 +0000
+++ bioformats2raw_hcs/9473.zarr/4/11/.zattrs	2021-12-01 16:17:21.224373345 +0000
@@ -2,126 +2,125 @@
     "well": {
         "images": [
             {
-                "acquisition": 7323,
-                "path": "0"
+        "path" : "0",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "1"
+        "path" : "1",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "2"
+        "path" : "2",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "3"
+        "path" : "3",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "4"
+        "path" : "4",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "5"
+        "path" : "5",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "6"
+        "path" : "6",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "7"
+        "path" : "7",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "8"
+        "path" : "8",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "9"
+        "path" : "9",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "10"
+        "path" : "10",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "11"
+        "path" : "11",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "12"
+        "path" : "12",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "13"
+        "path" : "13",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "14"
+        "path" : "14",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "15"
+        "path" : "15",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "16"
+        "path" : "16",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "17"
+        "path" : "17",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "18"
+        "path" : "18",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "19"
+        "path" : "19",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "20"
+        "path" : "20",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "21"
+        "path" : "21",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "22"
+        "path" : "22",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "23"
+        "path" : "23",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "24"
+        "path" : "24",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "25"
+        "path" : "25",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "26"
+        "path" : "26",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "27"
+        "path" : "27",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "28"
+        "path" : "28",
+        "acquisition" : 0
             },
             {
-                "acquisition": 7323,
-                "path": "29"
+        "path" : "29",
+        "acquisition" : 0
             }
-        ],
-        "version": "0.3"
+    ]
     }
 }
\ No newline at end of file

Part of the mismatch like the acquisition.id value is fully expected and reflects a context difference between converting a standalone file on disk vs a fileset in a DB with existing identifiers. We might want to review the other variations and discuss either a roadmap towards unification or at least a clarification of the different styles in the specification.

@melissalinkert
Copy link
Member

Latest 2 commits here should bring the well attributes in line with the latest changes on ome/ngff#24.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-using the sample IDR plate mentioned above, I generated three NGFF representations and uploaded them to a temporary bucket. This allows a client like vizarr with some awareness of the HCS spec to highlight differences:

Comparing the difference in the .zattrs of the plate and well groups between the two last implementations, the divergence is now fairly minimal and restricted to:

  • plate name, acquisition ID and metadata. This heterogeneity is expected as explained above
  • tool-specific metadata e.g. the omero-zarr version
  • extra empty plate columns (13-23) populated by bioformats2raw
  • the newly introduced rowIndex/columnIndex fields populated by bioformats2raw

Assuming we agree with the formal ome/ngff#24 proposal (which I will comment on separately), the last two items can be captured as issues on omero-cli-zarr and/or ome-zarr-py that would be required to implement the proposed specification change.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had someway missed the fact this PR was not merged yet. Given #119 (review) and the fact the corresponding specification PR has now been merged and released as part of the official OME-NGFF 0.4 specification, I think it makes sense to get this in.

Once #121 and the coordinateTransformations metadata is implemented, it should be possible to bump the metadata version to 0.4 across the board.

@melissalinkert
Copy link
Member

Conflicts in the tests are now fixed, so 👍 to merging assuming @chris-allan agrees.

@chris-allan chris-allan merged commit d6eb09e into glencoesoftware:master Mar 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants