Ignore chunks #5

Merged
merged 2 commits into from
Aug 28, 2023


will-moore
Member

@will-moore will-moore commented Aug 21, 2023

As discussed this morning, we want to try ignoring chunks, so that the generated SQL no longer creates an OriginalFile row for every chunk file (e.g. see #4 (comment)).

Reverted 9a001a3
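
A minimal sketch of the idea, not the actual omero-mkngff code: walk a Zarr v2 hierarchy and emit one [path, name, mimetype] row per directory and metadata file, while skipping everything else found inside an array directory (one that contains .zarray). The function name list_rows and the ZARR_METADATA set are hypothetical, and the Zarr v2 layout is an assumption.

```python
import os

# Zarr v2 metadata file names; anything else inside an array directory is chunk data.
ZARR_METADATA = {".zarray", ".zattrs", ".zgroup", ".zmetadata"}


def list_rows(zarr_root, prefix):
    """Return [path, name, mimetype] rows for a Zarr v2 tree, skipping chunks."""
    rows = []

    def walk(directory, rel):
        names = sorted(os.listdir(directory))
        is_array = ".zarray" in names  # array directories hold the chunk files
        for name in names:
            full = os.path.join(directory, name)
            if os.path.isdir(full):
                if is_array:
                    continue  # nested chunk directory: do not list or descend
                rows.append([prefix + rel, name, "Directory"])
                walk(full, rel + name + "/")
            elif not is_array or name in ZARR_METADATA:
                rows.append([prefix + rel, name, "application/octet-stream"])

    walk(zarr_root, "")
    return rows
```

With this approach the number of rows depends on the number of groups and arrays rather than on the number of chunks.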

@will-moore
Member Author

Testing on idr0138-pilot...

sudo -u omero-server -s
conda activate mkngff
pip uninstall omero-mkngff
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@ignore_chunks'

$ omero mkngff sql --secret=$SECRET 5811533 --symlink_repo /data/OMERO/ManagedRepository "/idr0054/zarr/Tonsil 2.ome.zarr/" > idr0054_2.sql
Found prefix demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15 // 15-28-44.081_converted for fileset 5811533
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr -> /idr0054/zarr/Tonsil 2.ome.zarr/


$ psql -U omero -d idr -h 192.168.10.102 -f idr0054_2.sql 
BEGIN
psql:idr0054_2.sql:29: ERROR:  null value in column "permissions" violates not-null constraint
DETAIL:  Failing row contains (5287368, null, demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_..., null, 326588607, null, null, null, 326588607).
CONTEXT:  SQL statement "insert into fileset
        (id, templateprefix, permissions, creation_id, group_id, owner_id, update_id)
        values
        (nextval('seq_fileset'), prefix, old_perms, new_event, old_group, old_owner, new_event)
        returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 21 at SQL statement
ROLLBACK
$ cat idr0054_2.sql 

begin;
    select mkngff_fileset(
      5811533,
      '4b358149-af39-49f0-882d-10884fab7133',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/',
      array[
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/0/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '1', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/1/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '2', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/2/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '3', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/3/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/', '4', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/0/4/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/', 'OME', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-10/2019-03/15/15-28-44.081_converted_converted/idr0054/zarr/Tonsil 2.ome.zarr/OME/', 'METADATA.ome.xml', 'application/octet-stream']
      ]::text[][]
    );
commit;
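
The "Creating dir" and "Creating symlink" lines in the log above suggest that the new prefix is the old one with _converted appended, and that the Zarr path is mirrored underneath it as a symlink. A rough sketch of that step, assuming this reading of the log is right; link_into_repo and its parameters are hypothetical names, not the tool's API.

```python
import os


def link_into_repo(repo_root, prefix, zarr_path, suffix="_converted"):
    """Create <repo_root>/<prefix><suffix>/<zarr_path> as a symlink to zarr_path."""
    new_prefix = prefix.rstrip("/") + suffix
    # Mirror the absolute Zarr path below the new prefix directory.
    rel = zarr_path.strip("/")
    link = os.path.join(repo_root, new_prefix, rel)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if not os.path.islink(link):
        os.symlink(zarr_path, link)
    return link
```

Calling it a second time with the same arguments leaves the existing symlink in place.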

@will-moore
Member Author

@joshmoore I don't think the error above is caused by removing the chunk rows, but I'm not sure why I'm seeing it or what to do about it.
Maybe it has something to do with running the SQL on an image that we've previously run this script on?

I can try on a plate where I've not previously run this script....

@will-moore
Member Author

Trying on a fresh plate from idr0012... (see #4 (comment))

TLDR: got the same error:

idr0012 plate HT02...

psql -U omero -d idr -h 192.168.10.231 -c "select fileset from Image where id= 14058769"
 fileset 
---------
 5808583
(1 row)

$ omero mkngff sql --secret=$SECRET --symlink_repo=/data/OMERO/ManagedRepository 5808583 "/idr0012/ngff/HT02.ome.zarr/" > idr0012_HT02.sql

$ cat idr0012_HT02.sql | wc
   8450   25325 1287987
$ cat idr0012_HT01.sql | wc
  45410  136205 6891315

$ psql -U omero -d idr -h 192.168.10.102 -f idr0012_HT02.sql 
BEGIN
psql:idr0012_HT02.sql:8448: ERROR:  null value in column "permissions" violates not-null constraint
DETAIL:  Failing row contains (5287378, null, demo_2/Blitz-0-Ice.ThreadPool.Server-9/2023-05/03/12-52-39.994_c..., null, 326589020, null, null, null, 326589020).
CONTEXT:  SQL statement "insert into fileset
        (id, templateprefix, permissions, creation_id, group_id, owner_id, update_id)
        values
        (nextval('seq_fileset'), prefix, old_perms, new_event, old_group, old_owner, new_event)
        returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 21 at SQL statement
ROLLBACK

@joshmoore
Member

That's really odd. Can you share the SQL file with me? I don't know why one row would suddenly not have permissions set.

@will-moore
Member Author

@joshmoore The SQL shown above (see the output of cat idr0054_2.sql) is what gave this error.
But it isn't caused by the change in this PR, since I see the same error with the previous symlink branch.
I just can't work out what I'm doing differently...

@will-moore
Member Author

Importing fresh idr0054 images into idr0138-pilot to use for testing mkngff.
We need to import the NGFF images, since we can't import the original pattern-file versions of these...

omero import -d 17351 --transfer=ln_s --skip=all --depth=100 /idr0054/zarr/Tonsil\ 1.ome.zarr/ --file /tmp/idr0054_1.log  --errs /tmp/idr0054_1.err
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/omero/server/OMERO.server-5.6.6-ice36/lib/client/OMEZarrReader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/omero/server/OMERO.server-5.6.6-ice36/lib/client/logback-classic.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2023-08-22 10:18:22,351 276        [      main] INFO          ome.formats.importer.ImportConfig - OMERO.blitz Version: 5.6.0
2023-08-22 10:18:22,368 293        [      main] INFO          ome.formats.importer.ImportConfig - Bioformats version: 0.3.2-SNAPSHOT revision: ef64d3da4cc1dd41adad20a8f0ee936383e578ec date: 20230816-0010
2023-08-22 10:18:22,434 359        [      main] INFO   formats.importer.cli.CommandLineImporter - Setting checksum algorithm to File-Size-64
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Skipping thumbnails creation
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Skipping minimum/maximum computation
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Disabling upgrade check
2023-08-22 10:18:22,435 360        [      main] INFO   formats.importer.cli.CommandLineImporter - Setting transfer to ln_s
2023-08-22 10:18:22,438 363        [      main] INFO   formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR OMERO.importer: INFO
2023-08-22 10:18:22,817 742        [      main] INFO      ome.formats.importer.ImportCandidates - Depth: 100 Metadata Level: MINIMUM
2023-08-22 10:23:26,023 303948     [      main] INFO      ome.formats.importer.ImportCandidates - 444 file(s) parsed into 0 group(s) with 433 call(s) to setId in 301561ms. (303206ms total) [0 unknowns]
2023-08-22 10:23:26,069 303994     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2023-08-22 10:23:26,525 304450     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2023-08-22 10:23:26,875 304800     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Pinging session every 300s.
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.6.6
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.6.0
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 11.0.15
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2023-08-22 10:23:26,885 304810     [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.10.0-1160.66.1.el7.x86_64
No imports found

@will-moore
Member Author

Repeated the attempt to run mkngff for idr0012 HT02 without this PR...
This time it fails on the secret check, even though I verified that the key in the SQL is correct...

$ omero mkngff sql --secret=$SECRET 5808583 "/idr0012/ngff/HT02.ome.zarr/" > idr0012_HT02.sql
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/Blitz-0-Ice.ThreadPool.Server-9/2023-05/03 // 12-52-39.994 for fileset 5808583
$ psql -U omero -d idr -h 192.168.10.231 -f setup.sql
CREATE FUNCTION
$ psql -U omero -d idr -h 192.168.10.231 -f idr0012_HT02.sql 
BEGIN
psql:idr0012_HT02.sql:45408: ERROR:  cannot set original repo property without secret key
CONTEXT:  PL/pgSQL function _protect_originalfile_repo_insert() line 28 at RAISE
SQL statement "insert into originalfile
          (id, permissions, creation_id, group_id, owner_id, update_id, mimetype, repo, path, name)
          values (nextval('seq_originalfile'), old_perms, new_event, old_group, old_owner, new_event,
            info[i][3], repo, info[i][1], uuid || info[i][2])
          returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 42 at SQL statement
ROLLBACK
(mkngff) bash-4.2$ cat idr0012_HT02.sql | grep 4b358149-af39-49f0-882d-10884fab7133
      '4b358149-af39-49f0-882d-10884fab7133',
(mkngff) bash-4.2$ psql -U omero -d idr -h 192.168.10.102 -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
                 uuid                 
--------------------------------------
 4b358149-af39-49f0-882d-10884fab7133
(1 row)
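
The grep check above can be sketched as a small helper that pulls the fileset ID and secret out of a generated script, so they can be compared against whichever database the script will be run on. This assumes the call layout shown in idr0054_2.sql (fileset ID first, secret second); script_summary is a hypothetical name.

```python
import re

# Matches: select mkngff_fileset( <fileset id>, '<secret uuid>', ...
CALL = re.compile(
    r"select\s+mkngff_fileset\(\s*(\d+)\s*,\s*'([0-9a-fA-F-]{36})'",
    re.IGNORECASE,
)


def script_summary(sql_text):
    """Return the fileset ID, secret and row count of a generated mkngff script."""
    match = CALL.search(sql_text)
    if match is None:
        raise ValueError("no mkngff_fileset call found")
    # Each [path, name, mimetype] entry sits on its own line starting with ['
    rows = sum(1 for line in sql_text.splitlines() if line.strip().startswith("['"))
    return {"fileset": int(match.group(1)), "secret": match.group(2), "rows": rows}
```

The returned secret can then be compared with the uuid returned by the session query, and the fileset ID looked up in the target database before running the script.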

@will-moore
Member Author

@joshmoore I've hit several blockers this week that stop me running mkngff sql scripts at all: either a wrong $SECRET or the null value in column "permissions" error.

I've updated my current workflow at #2 (that also includes other steps to prep the Fileset IDs etc), so maybe you could review that and/or try it and see if you can work out what's not working for me?

@will-moore
Member Author

will-moore commented Aug 23, 2023

Testing on idr0125-pilot as the omero_server user.
Started from scratch, installed conda etc.

Testing with idr0051 image http://localhost:1040/webclient/?show=image-4007821
https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD815/b2633930-86b0-489e-a845-d2a7afe6ff15.html

Installed main branch of mkngff - everything works (but sql command took a long time!).

$ omero mkngff sql --secret=$SECRET 604309 /bia-integrator-data/S-BIAD815/b2633930-86b0-489e-a845-d2a7afe6ff15/b2633930-86b0-489e-a845-d2a7afe6ff15.zarr > 604309.sql

$ psql -U omero -d idr -h 192.168.10.102 -f 604309.sql 
BEGIN
 mkngff_fileset 
----------------
        5287380
(1 row)
COMMIT

Try with another image from idr0051... with THIS branch... creating symlinks...

http://localhost:1040/webclient/?show=image-4007817 (Fileset 604305)
https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD815/c49efcfd-e767-4ae5-adbf-299cafd92120.html

# commit 36242c8a86b
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@ignore_chunks'

omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET 604305 /bia-integrator-data/S-BIAD815/c49efcfd-e767-4ae5-adbf-299cafd92120/c49efcfd-e767-4ae5-adbf-299cafd92120.zarr > 604305.sql

$ psql -U omero -d idr -h 192.168.10.102 -f 604305.sql
BEGIN
 mkngff_fileset 
----------------
        5287383
(1 row)
COMMIT

This worked!
MUCH faster without all the chunks. There are only 13 files in the Fileset and the image is viewable (on idr0125-pilot):

Screenshot 2023-08-23 at 13 50 57

Screenshot 2023-08-23 at 13 50 42

@will-moore
Member Author

Tested this PR at IDR/idr-metadata#639 (comment) with Plates from idr0035.
Without chunks, the SQL commands were much faster (1 or 2 seconds each) and the data looks good.

Member

@joshmoore joshmoore left a comment


Thanks, @will-moore!

@joshmoore
Member

Assuming the failure is still related to test-infra.

@joshmoore joshmoore merged commit 4c1e32b into IDR:main Aug 28, 2023