Accept feature metrics from Experimenter #86

Closed
6 of 7 tasks
shell1 opened this issue Jun 2, 2020 · 10 comments · Fixed by #141

Comments

@shell1
Collaborator

shell1 commented Jun 2, 2020

Some acceptance criteria for this issue:

@tdsmith tdsmith changed the title from "work out feature metrics" to "Accept feature metrics from Experimenter" Jun 2, 2020
@tdsmith tdsmith self-assigned this Jun 16, 2020
@tdsmith
Contributor

tdsmith commented Jun 16, 2020

This depends on mozilla/nimbus-shared#38.

@tdsmith tdsmith added the mvp label Jun 17, 2020
@tdsmith tdsmith removed the blocked label Jul 9, 2020
@tdsmith
Contributor

tdsmith commented Jul 9, 2020

[moved to top comment]

@mythmon

mythmon commented Jul 9, 2020

> Decide on a strategy for consuming feature definitions from nimbus-shared (depend on the package? read the repo directly from Github?)

nimbus-shared has a bunch of build system stuff that translates the raw files from the repo into more consumable formats. Pulling straight from Github would likely make things quite difficult, depending on what you are doing.

@tdsmith
Contributor

tdsmith commented Jul 9, 2020

All of the ways that Pensieve could keep its definitions up-to-date at runtime are a little annoying -- we can run pip in the docker container at runtime (though IIRC it's non-obvious to upgrade exactly one package with pip), we can try to build new docker images every time nimbus-shared is updated, or we could just slurp Github. I think door number 1 is probably our best choice but we'll see how painful it is.
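For illustration, door number 1 might look roughly like the sketch below. It assumes pip's `--upgrade-strategy only-if-needed` option is enough to avoid touching unrelated packages (dependencies are upgraded only when the new release requires it). The package name `mozilla-nimbus-shared` and both function names are assumptions made for this example, not something settled in this thread.

```python
import subprocess
import sys
from typing import List


def build_single_package_upgrade(package: str) -> List[str]:
    """Build a pip command line that upgrades only `package`.

    With --upgrade-strategy only-if-needed, pip upgrades a dependency
    only if the installed version no longer satisfies the requirements
    of the newly installed release.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade",
        "--upgrade-strategy", "only-if-needed",
        package,
    ]


def upgrade_definitions(package: str = "mozilla-nimbus-shared") -> None:
    """Run the upgrade; raises CalledProcessError if pip fails.

    The default package name is an assumption for this sketch.
    """
    subprocess.run(build_single_package_upgrade(package), check=True)
```

Passing `--no-deps` instead would leave every dependency untouched, at the cost of possibly missing one the new release needs.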

@scholtzan
Collaborator

Just wondering, would it make sense for the nimbus-shared CI to automatically push generated JSON schemas onto a separate branch in the repo? mozilla-pipeline-schemas is doing something similar (though not using the CI) with generated schemas being available on the generated-schemas branch.

@mythmon

mythmon commented Jul 9, 2020

What if nimbus-shared automatically made PRs to Pensieve (and any other relevant projects) when a new version was released? That could be merged easily, and then I assume a new Docker image would be built then.

@scholtzan
Collaborator

That would require the PR to get merged as soon as possible, right? I could see it happening that nimbus-shared releases a new version and a PR gets created but not reviewed for a few days (PTO, holidays, weekends, ...), so Pensieve keeps running with the outdated version (or potentially fails if there are some breaking changes, not sure if that case can happen).

@mythmon

mythmon commented Jul 9, 2020

I'm hesitant to go with the idea of something pulling down JSON from a Github branch (or anywhere else) only because I've been planning the primary use case of this data to be via a code library that provides an API to the data so that it is consistent between projects. I've been thinking through this workflow for a lot longer than I've considered treating the contents of nimbus-shared as just a blob of JSON.

Package managers give us strong guarantees around versioning and let us use tools like package hashes and lock files to keep things consistent. If we just pull in JSON we lose out on all of that, and any guarantees about consistency that we want would have to be rebuilt. Having a consistent Python library means that we are all using the same schema validators and other methods of using the data.
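As a rough illustration of what "rebuilt" could mean, a consumer of raw JSON would need something like the hash check below to get back even the integrity guarantee that package hashes already provide. The function name and the idea of pinning a SHA-256 digest alongside the consumer are assumptions made for this sketch.

```python
import hashlib
import json


def load_pinned_json(raw: bytes, expected_sha256: str) -> dict:
    """Parse a JSON blob only if it matches a pinned SHA-256 digest.

    This hand-rolls the check that pip performs when a requirements
    file pins packages with --hash.
    """
    digest = hashlib.sha256(raw).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"definition blob hash mismatch: got {digest}")
    return json.loads(raw)
```

Even with this in place, version resolution and a shared set of validators would still be missing.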

I also think that I don't really understand the model here. What's the harm if Pensieve doesn't have the most up to date schemas? Is it really important that changes to schemas roll out across the Nimbus ecosystem in seconds? What's the "threat model" (to borrow some language from another realm) that having rapid updates protects us against that justifies building our own deployment mechanism?

@tdsmith
Contributor

tdsmith commented Jul 9, 2020

If an experiment launches with a feature that Pensieve doesn't understand, the analysis job should fail once that experiment is first ready for analysis. So we need to catch up eventually; in practice we'll usually have a few days; requiring a human to click a button in that window carries some small risk. I think trying to hustle an experiment launch through after adding a Feature definition will be a common scenario.

If we make reasonable promises about Features in nimbus-shared (Feature data is append-only, Feature schemas must change in backwards-compatible ways; data landing to main on nimbus-shared means it's passed a schema validation) then the assurances that versioning and bilateral testing provide are less urgent.
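A minimal sketch of how the append-only promise could be checked mechanically, for example in nimbus-shared CI. It assumes feature definitions can be loaded as a mapping from feature slug to a definition dict; that shape, the function name, and the crude "no field disappears" stand-in for backwards compatibility are all assumptions for illustration.

```python
from typing import Dict, List


def append_only_violations(
    old: Dict[str, dict], new: Dict[str, dict]
) -> List[str]:
    """List the ways `new` breaks an append-only promise relative to `old`.

    A violation is a feature that disappeared, or a field that was
    dropped from a feature that still exists. Added features and added
    fields are allowed.
    """
    problems: List[str] = []
    for slug, old_def in sorted(old.items()):
        if slug not in new:
            problems.append(f"feature removed: {slug}")
            continue
        for key in sorted(set(old_def) - set(new[slug])):
            problems.append(f"field removed from {slug}: {key}")
    return problems
```

An empty list means the new definitions only add to the old ones, which is the condition under which a slightly stale consumer stays safe.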

@tdsmith
Contributor

tdsmith commented Jul 24, 2020

Completing this is blocked on mozilla/experimenter#3085.


4 participants