Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Release Date #2916

Closed
techman83 opened this issue Nov 8, 2019 · 5 comments · Fixed by #3096
Closed

[Proposal] Release Date #2916

techman83 opened this issue Nov 8, 2019 · 5 comments · Fixed by #3096
Labels
Discussion needed Enhancement New features or functionality Spec Issues affecting the spec

Comments

@techman83
Copy link
Member

techman83 commented Nov 8, 2019

Problem

Knowing when there are new mod releases has been a long standing request, #1155 was opened in Mid 2015! Historically this has been challenging for two reasons:

  1. Not all of sources for metadata contain this information.
  2. One of the early design philosophies behind Netkan was that anyone could produce a ckan from a netkan in exactly the same manner as our bots. This had the side effect of it being stateless, which has meant (especially with the old infrastructure) that populating a field with a current time stamp would mean it would always change.

Since #2789, we have a number of ways we can tackle this and a solution should be straight forward!

Suggestions

Spec

Add 'release_date' to the schema, that complies with the following:

  • Field assumed to be UTC
  • Full date / time
  • ISO 8601 compliant (2019-11-07T01:29:59+00:00)

Example:

{
    "spec_version": 1,
    "comment": "flagmod",
    "identifier": "DogeCoinFlag",
    "name": "Dogecoin Flag",
    "release_date": "2015-01-05T01:56:00+00:00",
    "abstract": "Such flag. Very currency. Wow.",
    "description": "Adorn your craft with your favourite cryptocurrency. To the mün!",
    "author": "daviddwk",
    "license": "CC-BY",
    "resources": {
        "homepage": "https://www.reddit.com/r/dogecoin/comments/1tdlgg/i_made_a_more_accurate_dogecoin_and_a_ksp_flag/",
        "repository": "https://github.com/pjf/DogeCoinFlag"
    },
    "version": "v1.02",
    "ksp_version": "any",
    "download": "https://github.com/pjf/DogeCoinFlag/releases/download/v1.02/DogeCoinFlag-1.02.zip",
    "download_size": 53359,
    "download_hash": {
        "sha1": "BFB78381B5565E1AC7E9B2EB9C65B76EF7F6DF71",
        "sha256": "CF70CE1F988F908FB9CF641C1E47CDA2A30D5AD951A4811A5C0C5D30F6FD3BA9"
    },
    "download_content_type": "application/zip",
    "x_generated_by": "netkan"
}

NetKAN

It is desirable to keep Netkan as the source of the truth. It persists the ability for anyone to produce metadata that matches exactly what the bot produces.

As a consequence

  • It would be responsible for populating the release_date field
    • This would enable us to initially use the system timestamp, but also enhance our transformers to retrieve this information from the relevant APIs where available
  • An optional --release-date flag on Netkan means we can switch this on when we're ready

Indexer

Currently the indexer is using an MD5 generated by SQS to compare with the file on disk. @HebaruSan has already put a lot of work into creating a Ckan class we can lean on to do an equivalence check. For example, if an existing ckan is found, load it in as a Ckan and compare it to the received inflation. The Ckan class to have a custom __eq__ method that compares the objects, but ignores the release_date.

  • This keeps Netkan as the source of the truth
  • Allows us to write to disk exactly as it's received from Netkan
  • Keeps our existing logic, we just compare inflated == existing instead of the MD5s
  • We can do this in advance to enabling the release_date flag
  • This is trivial to write a test case for and doesn't require any complicated logic changes to the indexer

Manual Metadata

We still need to update manual data occasionally, so we should add some tests to our CI to ensure that we don't accidentally overwrite the release data field if it exists or submit metadata without the field if it doesn't.

Historical Metadata

Historical metadata should be treated as a separate exercise. We can dig out reasonable approximations from the git history and populate them in all in one pass (disabling webhooks/canceling the tests as needed). We did something similar when we populated the historical download hashes (#1682 + #1703).

Summary

It would be wonderful to get this long standing request added, so I welcome any critique and suggestions.

@techman83 techman83 added the Enhancement New features or functionality label Nov 8, 2019
@techman83 techman83 added Discussion needed Spec Issues affecting the spec labels Nov 8, 2019
@HebaruSan
Copy link
Member

Early thoughts...

Spec

Concur 100%, ISO 8601 UTC is the way to go. In fact, I think that's what KSP-CKAN/NetKAN-Infra#42 did.

NetKAN

I need to think about this idea a bit more. One point is that "the ability for anyone to produce metadata that matches exactly what the bot produces" is already sacrificed if the bot runs with --release-date set but users (mod authors) don't. Probably better to keep one consistent output format rather than toggling properties with flags.

Indexer

I'm lukewarm on building this around a special override of __eq__. Equality comparison is a generic operation, and excluding release_date from the comparison seems pretty arbitrary and specific to this one task. What are the chances that all future uses of the Ckan class will find it desirable to define equality in this way?

Luckily the overall architecture does not depend on that detail and would work just as well with a function like inflated.mostlyEqual(existing) (or whatever we want to call it).

Manual Metadata

What if we need to change release_date manually to fix a problem? I'm not a fan of locking down a particular field because we think somehow it will always be generated perfectly.

Historical Metadata

Sounds fine, probably easy enough to adapt KSP-CKAN/NetKAN-Infra#42 to create the script.

@HebaruSan
Copy link
Member

So since NetKAN sets release_date to the current time, and the Indexer just accepts/rejects that value by comparing all other properties, changes like KSP-CKAN/NetKAN#7482 (fixing spec_version) would bump the release date to the current time, right?

I'm thinking that's not desirable. There is nothing in those changes that would justify to a user why we were now saying that a previously available mod now has a newer release date. I think this was the reason why KSP-CKAN/NetKAN-Infra#42 copies forward the date property when the file already exists.

@techman83
Copy link
Member Author

I need to think about this idea a bit more. One point is that "the ability for anyone to produce metadata that matches exactly what the bot produces" is already sacrificed if the bot runs with --release-date set but users (mod authors) don't. Probably better to keep one consistent output format rather than toggling properties with flags.

We can make this an implicit default, but it would make a smoother transition if it's optional to begin with. Or just write the Indexer side of things first (making the equality check handle the lack of release_date accordingly) and not worry about it at all (I'm not tied to it).

I'm lukewarm on building this around a special override of eq. Equality comparison is a generic operation, and excluding release_date from the comparison seems pretty arbitrary and specific to this one task. What are the chances that all future uses of the Ckan class will find it desirable to define equality in this way?

That's a fair point. However we don't currently use equality in this manner and overriding these methods is common practice (they aren't special, just defaults inherited from object). We could even override it just for this class:

class CkanInflated(Ckan):

    def __eq__:
        fuzzy equality check here

I'm not tied to either way, but rather keeping our metadata_changed function as tightly scoped and simple as possible.

So since NetKAN sets release_date to the current time, and the Indexer just accepts/rejects that value by comparing all other properties, changes like KSP-CKAN/NetKAN#7482 (fixing spec_version) would bump the release date to the current time, right?

That's a good point and entirely undesirable. I was hoping for us to be able to get away with not altering the output of the NetKAN (a lot of effort has gone into making the output consistent, putting it through python's parser introduces variability), but maybe we'll have to.

NetKAN should still be the source of truth for this information, so that in future it can pull it from the relevant APIs when available.

What if we need to change release_date manually to fix a problem? I'm not a fan of locking down a particular field because we think somehow it will always be generated perfectly.

We wouldn't be locking it down, but it would be intentional at that point to accept it, rather than accidental. The CI is currently pass/fail, so we don't have much option beyond that.

@techman83
Copy link
Member Author

That's a good point and entirely undesirable. I was hoping for us to be able to get away with not altering the output of the NetKAN (a lot of effort has gone into making the output consistent, putting it through python's parser introduces variability), but maybe we'll have to.

Thinking on this a little more, we could add 2 methods to Ckan

  • write: Checks for file existence first, logs error and skips, else writes output to disk. Intention being that this would only be called when Ckan doesn't exist.
  • update: Takes a inflated_ckan arg (which expects a Ckan) and does the necessary business of updating the relevant fields. We can use a pop on release_date, but we might want to use a deepcopy to ensure we're only modifying our local copy of the dict in our received netkan.

This shifts most of the burden out of the indexer, into Ckan (or CkanInflated if we want to keep the logic local) and will make it reasonably straight forward to write some test cases for. Along with being something that can be done in advance of the Spec and Inflator components being completed.

@HebaruSan
Copy link
Member

HebaruSan commented May 22, 2020

Working on #3059 revealed that all but 15 currently indexed modules can get release_date from host APIs. I think that means we can just treat this as a normal property populated by the Inflator. For most modules, this gives us "the ability for anyone to produce metadata that matches exactly what the bot produces" for free, with no need for a special parameter to toggle it on or off. The Indexer can continue to compare hashes because inflating the same release will always produce the same output.

For the 15 mods that won't have release_date populated, maybe that's just too bad for them? Why torture ourselves with hugely complex logic for such a tiny group?

We can still do a historical import if we feel like it. I think this should be done after the Inflator is updated and we have the current releases' properties set, so we don't clobber them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion needed Enhancement New features or functionality Spec Issues affecting the spec
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants