Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate release_date property #42

Closed
wants to merge 1 commit into from

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Sep 24, 2019

Motivation

Users ask fairly frequently for the ability to sort modules by release date. Up to now we can't provide that info because the bot is stateless, and keeping track of the first time that we found a version requires state.

Changes

Now the Indexer will add a release_date property to each module that it processes:

  • If the file exists, it will pull the release_date property forward unchanged
  • If the file exists but has no release_date property, it will set it to the file creation date in git
  • If the file is brand new, it will set the release_date property to now

Note that the schema/spec currently do not have a release_date property. It will have to be added before this can be merged. In the meantime I want to see how many tests fail and why.

Fixes #6.
Fixes KSP-CKAN/CKAN#2916.

@HebaruSan HebaruSan added the Enhancement New feature or request label Sep 24, 2019
@HebaruSan HebaruSan requested a review from techman83 September 24, 2019 13:53
@techman83
Copy link
Member

I have 2 comments without looking too closely at the code

  • This is a break away from our long established single source of generation and letting the indexer make decisions
  • The indexer should not be responsible for historical data

Now I'm not opposed to adding this to the meta data, but we really must flesh out the delineation of responsibility.

  • We probably should have the discussion on the feature as it feels a little beyond the scope of an implementation detail.
  • Historical data should be out of scope for the indexer. We have done this before with the hashes, we crawled all the metadata, added the hashes and mirrored them. In this case we can probably do it in a single pass as it is purely a metadata change, rather than downloading thousands of mods and calculating checksums.

@DasSkelett DasSkelett added the Indexer Receives inflated modules and adds them to CKAN-meta label Oct 23, 2019
@HebaruSan HebaruSan force-pushed the feature/release-date branch 2 times, most recently from eee2254 to 661989a Compare November 3, 2019 15:43
@HebaruSan HebaruSan force-pushed the feature/release-date branch 5 times, most recently from 48222bc to 61b6b46 Compare November 6, 2019 15:45
@HebaruSan
Copy link
Member Author

This is a break away from our long established single source of generation and letting the indexer make decisions

I think that's unavoidable. See the comments in KSP-CKAN/CKAN#2916; the Inflator doesn't know whether a file that it has generated already exists, so some other entity has to be in charge of copying forward already established release dates.

The indexer should not be responsible for historical data

What's the reason behind that? The Indexer is built around a Git repo, so the historical data is all right there, and it is responsible for integrating new data into the existing database. Seems like a pretty natural fit.

@HebaruSan HebaruSan force-pushed the feature/release-date branch from 61b6b46 to 78b1550 Compare November 14, 2019 04:46
@techman83
Copy link
Member

I think that's unavoidable. See the comments in KSP-CKAN/CKAN#2916; the Inflator doesn't know whether a file that it has generated already exists, so some other entity has to be in charge of copying forward already established release dates.

Yes, I think we're going to have to step away from this slightly (as. But I feel very strongly the source of this truth needs to be the inflator. It gives us scope to implement the release dates from the APIs we consume where available and leaves the indexer with the sole job of deciding of handling/publishing changes. As noted in KSP-CKAN/CKAN#2916

What's the reason behind that? The Indexer is built around a Git repo, so the historical data is all right there, and it is responsible for integrating new data into the existing database. Seems like a pretty natural fit.

To clarify, I'm not opposed to creating a stub like I did for the restore_status. But it doesn't belong in the main indexing logic as it will be a once off task. The method itself looks ok to me, which we could use in a 'populate_historical_release_dates' if we wanted.

It also relies on a deep clone (which we could make optional), which we just don't need in regular operations. Part of my reasoning for using shallow cloning was to reduce our memory usage as during startup, as time goes on and as we add more services that'll become more important.

@HebaruSan
Copy link
Member Author

Ahh right, shallow clones. FWIW, I think I checked this before, but just now I got 137.7 MiB for a deep clone and 100.1 MiB for shallow, a difference of 25% of the size of the deep clone.

@techman83
Copy link
Member

Ahh right, shallow clones. FWIW, I think I checked this before, but just now I got 137.7 MiB for a deep clone and 100.1 MiB for shallow, a difference of 25% of the size of the deep clone

That's space on disk, which in reality is neither here nor there. Memory usage during clone is where the bigger difference is.

@HebaruSan
Copy link
Member Author

Ahh, OK. Looks like that's on the order of a 45% difference:

$ /usr/bin/time --format='%M KiB' git clone https://github.com/KSP-CKAN/CKAN-meta.git
48804 KiB

$ /usr/bin/time --format='%M KiB' git clone --depth 1 https://github.com/KSP-CKAN/CKAN-meta.git
26616 KiB

@HebaruSan
Copy link
Member Author

Replaced by KSP-CKAN/CKAN#3059.

@HebaruSan HebaruSan closed this May 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Indexer Receives inflated modules and adds them to CKAN-meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Release Date [Feature] Generate release_date property in Indexer
3 participants