Operationalize MDC: Create Cron Jobs, Acquire, Configure Prod Web Token, Handle Logs #3

Open
kcondon opened this issue Mar 11, 2019 · 18 comments
Assignees
Labels
FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) FY25 Sprint 16 FY25 Sprint 16 (2025-01-29 - 2025-02-12) FY25 Sprint 17 FY25 Sprint 17 (2025-02-12 - 2025-02-26) FY25 Sprint 18 FY25 Sprint 18 (2025-02-26 - 2025-03-12) GREI 4 Analytics and Reporting NIH OTA: 1.5.1 collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository... pm.GREI-d-1.5.1 NIH, yr1, aim5, task1: Standardize download metrics pm.GREI-d-1.5.2 NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations Size: 10 A percentage of a sprint.

Comments

@kcondon
Contributor

kcondon commented Mar 11, 2019

The MDC feature is well documented, but a few items need to be addressed before it can operate in a production environment:
- Create cron job(s) to call the API endpoints needed to process the various files and import them into the db, including error detection and notification of failure.
- Acquire and configure a production web token that allows publishing stats to DataCite.
- Consider, plan for, and monitor the growth of log files.
- Consider how to troubleshoot or rerun failed jobs.
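
As a rough illustration of the cron, error-detection, and rerun items above, here is a minimal sketch of a wrapper that a cron entry could invoke. Everything in it is an assumption made for illustration: the step names, the `run_steps` helper, and the `notify` callback are hypothetical and are not part of Dataverse or counter-processor. Each step would wrap whatever command or API call the installation actually uses.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("mdc_cron")

# A step is a (name, action) pair. The action is any callable that raises
# on failure, e.g. a function that runs a script or POSTs to an API endpoint.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], notify: Callable[[str], None]) -> List[str]:
    """Run the steps in order, stopping at the first failure.

    On failure the error is logged and passed to `notify` (hypothetical hook:
    email, chat message, etc.). The names of the steps that completed are
    returned so a rerun can skip them.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any failure stops the pipeline
            log.exception("step %s failed", name)
            notify(f"MDC job failed at step '{name}': {exc}")
            break
        completed.append(name)
    return completed
```

A crontab entry would then call a small script that builds the step list (for example "process logs", then "import report") and passes a `notify` function that sends mail. Returning the completed step names is one way to make a failed run easier to troubleshoot or resume.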

@djbrooke
Contributor

I'll add one more...

  • we'll need to figure out how to handle pre-MDC download counts. I'd like to reflect them so that researchers don't need to start at zero. :)

@dlowenberg @mfenner it would be good to get some thoughts from you and the team on how other groups have handled this. Thanks in advance for any guidance or for pointing us to any docs!

@dlowenberg

Hi there, if you would like to look at or copy the code that we wrote for Dryad in processing the last ten years of downloads, here is some info that may be useful:

The main reporting code is here: https://github.com/datadryad/dryad-repo/blob/dryad-master/dspace/modules/api/src/main/java/org/dspace/curate/DashStats.java
Though it’s pretty specific to the existing Dryad setup. It writes out a text file that is formatted for the counter-processor, but it’s sorted by dataset. Then there is a script that re-sorts everything based on time: https://github.com/datadryad/dryad-utils/blob/master/dash-migration/sort_dash_stats.sh
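
To make the re-sorting step concrete, here is a minimal sketch in Python. It is not the Dryad script linked above. It assumes, purely for illustration, a tab-separated file whose first column is an ISO 8601 timestamp with a consistent format on every line; the function name `sort_by_time` and the `time_column` parameter are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def sort_by_time(lines: Iterable[str], time_column: int = 0) -> List[str]:
    """Return the non-blank lines ordered by their timestamp column.

    Assumes tab-separated fields and ISO 8601 timestamps that
    datetime.fromisoformat can parse (an assumption, not a documented format).
    """
    def key(line: str) -> datetime:
        stamp = line.rstrip("\n").split("\t")[time_column]
        return datetime.fromisoformat(stamp)

    rows = [ln for ln in lines if ln.strip()]  # drop blank lines
    return sorted(rows, key=key)  # sorted() is stable for equal timestamps
```

For very large files a shell `sort` on the timestamp column is likely the more practical choice, since this sketch holds every line in memory.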

Happy to set up time for you to talk with Ryan Scherle (Dryad) if that would be helpful. Otherwise, the DataCite and DataONE folks may also have some tips.

@djbrooke
Contributor

Thanks @dlowenberg! I'll check in with the team here and we'll get back to you if we feel a discussion with Dryad is needed. Thanks again!

P.S. I just pinged you on another issue in the main Dataverse repo: IQSS/dataverse#5957

@djbrooke
Contributor

djbrooke commented Jul 24, 2019

  • We should do the things outlined in the original comment, plus any other items not yet identified.
  • The current suggestion is to seed the count with the downloads that already exist, but we can discuss during the sprint.
  • We should add a note for users about how the numbers are derived (some from before the standard was implemented and others from after).

@djbrooke djbrooke self-assigned this Jul 25, 2019
@djbrooke
Contributor

I picked this up out of the sprint column today to begin stubbing out documentation regarding migrating counts and other things that installations will need to know to use Make Data Count in production, but I don't have the bandwidth this week. I will re-visit early next week.

@djbrooke djbrooke removed their assignment Jul 25, 2019
@pdurbin
Member

pdurbin commented Aug 9, 2019

@djbrooke if you're stubbing out documentation, you might want to create a branch for IQSS/dataverse#6082 which was just opened. The issue title is "Documentation: Some tweaks to Make Data Count doc based on recent experience".

@jggautier
Collaborator

See #75 (comment)

@mreekie mreekie added the NIH OTA: 1.5.1 collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository... label Oct 6, 2022
@mreekie mreekie added pm.GREI-d-1.5.1 NIH, yr1, aim5, task1: Standardize download metrics pm.GREI-d-1.5.2 NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations labels Mar 20, 2023
@cmbz cmbz moved this to NIH bklog items (Stefano) in IQSS Dataverse Project Jul 21, 2023
@cmbz
Collaborator

cmbz commented Jul 21, 2023

  • I moved this issue into the Global Backlog in the NIH Backlog column, as per conversations with @siacus and current AIM 5 Year 2 plans.

@cmbz cmbz moved this from NIH bklog items (Stefano) to SPRINT- NEEDS SIZING in IQSS Dataverse Project Jul 24, 2023
@cmbz cmbz added Size: 33 A percentage of a sprint. and removed sz.Medium labels Jul 25, 2023
@cmbz cmbz added the Size: 10 A percentage of a sprint. label Mar 27, 2024
@cmbz cmbz moved this from In Progress 💻 to Done 🧹 in IQSS Dataverse Project Mar 28, 2024
@stevenwinship stevenwinship moved this from Done 🧹 to In Progress 💻 in IQSS Dataverse Project Apr 2, 2024
@cmbz cmbz moved this from In Progress 💻 to Done 🧹 in IQSS Dataverse Project Apr 4, 2024
@sbarbosadataverse

sbarbosadataverse commented Apr 4, 2024

@pdurbin
Member

pdurbin commented Aug 6, 2024

This was accidentally and automatically closed when IQSS/dataverse#10424 was merged. Re-opening.

@pdurbin pdurbin reopened this Aug 6, 2024
@cmbz cmbz added the GREI 4 Analytics and Reporting label Nov 20, 2024
@cmbz
Collaborator

cmbz commented Jan 21, 2025

2025/01/21: @pdurbin @landreev @scolapasta and @stevenwinship what additional tasks are needed to finalize this work? Is it the work described here: IQSS/dataverse#10406?

@cmbz cmbz moved this to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jan 21, 2025
@cmbz cmbz added the FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) label Jan 21, 2025
@stevenwinship stevenwinship moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Jan 22, 2025
@cmbz cmbz added the FY25 Sprint 16 FY25 Sprint 16 (2025-01-29 - 2025-02-12) label Jan 30, 2025
@pdurbin
Member

pdurbin commented Jan 30, 2025

@stevenwinship stevenwinship added Size: 80 A percentage of a sprint. and removed Size: 10 A percentage of a sprint. labels Jan 30, 2025
@stevenwinship
Contributor

Bumped the size due to a bug in the counter-processor code.

@cmbz cmbz added the FY25 Sprint 17 FY25 Sprint 17 (2025-02-12 - 2025-02-26) label Feb 12, 2025
@stevenwinship stevenwinship added Size: 10 A percentage of a sprint. and removed Size: 80 A percentage of a sprint. labels Feb 25, 2025
@cmbz cmbz added the FY25 Sprint 18 FY25 Sprint 18 (2025-02-26 - 2025-03-12) label Feb 27, 2025
Projects
Status: In Progress 💻
Development

No branches or pull requests