Catalog 1.58 docs #4336

Open: wants to merge 5 commits into master

drernie (Member) commented Feb 22, 2025

Description

Tabulator and Packager changes

TODO

  • Documentation
    • Markdown somewhere in docs/**/*.md that explains the feature to end users (these .md files should be linked from SUMMARY.md so they appear on https://docs.quiltdata.com)

drernie and others added 2 commits February 21, 2025 16:42
Co-authored-by: Dr. Ernie Prabhakar
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Sergey Fedoseev

codecov bot commented Feb 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.07%. Comparing base (07c85ce) to head (a36aa59).

@@           Coverage Diff            @@
##           master    #4336    +/-   ##
========================================
  Coverage   39.07%   39.07%            
========================================
  Files         787      787            
  Lines       34813    34813            
  Branches     5525     5525            
========================================
  Hits        13604    13604            
- Misses      20026    20666   +640     
+ Partials     1183      543   -640     
Flag Coverage Δ
api-python 91.39% <ø> (ø)
catalog 18.09% <ø> (ø)
lambda 91.53% <ø> (ø)

Flags with carried forward coverage won't be shown.


greptile-apps bot (Contributor) left a comment

PR Summary

This PR introduces comprehensive documentation for the new Packaging Engine feature and updates Tabulator documentation with version 1.58 enhancements, focusing on error handling and configuration options.

  • Added new /docs/Catalog/Packaging.md detailing automated package creation from S3 data via Admin GUI, SQS queue, and EventBridge rules
  • Updated /docs/advanced-features/tabulator.md with new continue_on_error configuration and $issue column functionality
  • Added Packaging Engine link to /docs/SUMMARY.md under Catalog User section
  • Fixed undefined rule_name variable in EventBridge example code
  • Corrected typo in ORCID website link ("ORDiD" to "ORCID")

3 file(s) reviewed, 4 comment(s)

drernie and others added 3 commits February 21, 2025 21:29
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
drernie requested a review from QuiltSimon February 22, 2025 05:31

QuiltSimon (Contributor) left a comment

I left a few questions and changes.

The Quilt Packaging Engine in the Quilt Platform allows administrators and
developers to automate the process of creating Quilt packages from data stored
in Amazon S3. It serves as a key component in Quilt's SDMS (Scientific Data
Management System) strategy, enabling automated data ingestion and

Contributor:

Is this relevant to end users? This is important to us. Maybe we could rephrase it as "It serves as a key component of Quilt's functionality as a Scientific Data Management System"...

developers to automate the process of creating Quilt packages from data stored
in Amazon S3. It serves as a key component in Quilt's SDMS (Scientific Data
Management System) strategy, enabling automated data ingestion and
standardization. It consists of:

Contributor:

Suggest "currently consists of", implying that more features are coming down the line.


1. Admin Settings GUI to enable package creation based on notifications from:
   1. AWS HealthOmics
   2. Nextflow workflows using `nf-prov`'s WRROC ([Workflow Run

Contributor:

Link to nf-prov.


When enabled, this will create a package when indexing any folder containing an
`ro-crate-manifest.json`. Indexing happens when the bucket is added to the
stack, or when a folder is written to a bucket already in the stack.

Contributor:

This is confusing on its own; I think we can add clarity by linking to Quilt's indexing documentation.


## SQS Message Processing

The primary interface to the Packaging Engine is through an SQS queue in the

Contributor:

For this to make sense, we need an overview of the components of the Packaging Engine. Right now the text assumes the reader understands what the Packaging Engine is and why the SQS queue is used. An overview of how the Packaging Engine is composed, in a section above this one, might help.

### Example: Event-Driven Packaging (EDP)

[Event-Driven Packaging](../advanced-features/event-driven-packaging.md) is a
high-end add-on to Quilt that coalesces multiple S3 uploads into a single

Contributor:

What is a "high-end add-on"?

Comment on lines +34 to +36
When enabled, this will create a package when indexing any folder containing an
`ro-crate-manifest.json`. Indexing happens when the bucket is added to the
stack, or when a folder is written to a bucket already in the stack.

Member:

We don't create packages for existing ro-crate-manifest.json files when a bucket is added to the stack.
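
To make the corrected trigger rule concrete, here is a minimal sketch of which folders would qualify for packaging. It assumes (1) only keys from new writes are considered, so objects that already existed when the bucket was added to the stack are ignored, as noted above, and (2) a "folder" is the immediate parent prefix of the manifest key. The function `folders_to_package` is hypothetical and illustrative only; it is not Quilt's indexer or any Quilt API.

```python
from pathlib import PurePosixPath

# File name as written in the draft docs under review.
MANIFEST_NAME = "ro-crate-manifest.json"


def folders_to_package(new_keys):
    """Hypothetical helper: return the folder prefixes whose *newly written*
    S3 keys include an RO-Crate manifest.

    Pre-existing objects (e.g. those present when a bucket is first added to
    the stack) are assumed not to be passed in, matching the review feedback.
    """
    folders = set()
    for key in new_keys:
        path = PurePosixPath(key)
        if path.name == MANIFEST_NAME:
            parent = str(path.parent)
            # A manifest at the bucket root has parent "."; represent it as "".
            folders.add("" if parent == "." else parent + "/")
    return sorted(folders)


if __name__ == "__main__":
    keys = [
        "runs/2025-02-21/ro-crate-manifest.json",
        "runs/2025-02-21/results.csv",
        "scratch/notes.txt",
    ]
    print(folders_to_package(keys))  # ['runs/2025-02-21/']
```

Only the folder containing the manifest is selected; sibling files such as `results.csv` do not trigger packaging on their own under these assumptions.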

@@ -0,0 +1,226 @@
# Packaging Engine

Member:

Let's add a note that this requires 1.58.
