feat: introduce richer input #1615

Merged: 7 commits, merged on Jun 6, 2024

@bassosimone (Contributor) commented on Jun 5, 2024:

Most of the existing code is designed to move around lists of `model.OOAPIURLInfo` and to measure such URLs.

The `model.OOAPIURLInfo` type looks like this:

```Go
// internal/model/ooapi.go

type OOAPIURLInfo struct {
	CategoryCode string
	CountryCode  string
	URL          string
}
```

This design originally suited Web Connectivity, but it is not good enough for richer input because it cannot carry options.

With this diff, we move in the direction of richer input by replacing `model.OOAPIURLInfo` lists with lists of:

```Go
// internal/model/experiment.go

type ExperimentTarget interface {
	Category() string
	Country() string
	Input() string
}
```

where `*model.OOAPIURLInfo` implements `model.ExperimentTarget` in a trivial way and where, additionally:

  1. the `InputLoader` is modified to load `ExperimentTarget`;

  2. the `Experiment` is modified to measure an `ExperimentTarget`.
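
A minimal sketch of what implementing `model.ExperimentTarget` "in a trivial way" can look like. It re-declares simplified copies of both types in a standalone `main` package (JSON tags and the rest of the `model` package are omitted), so treat it as an illustration under those assumptions rather than the actual code in `internal/model`:

```Go
package main

import "fmt"

// ExperimentTarget mirrors the interface shown above.
type ExperimentTarget interface {
	Category() string
	Country() string
	Input() string
}

// OOAPIURLInfo is a simplified copy of model.OOAPIURLInfo (no JSON tags).
type OOAPIURLInfo struct {
	CategoryCode string
	CountryCode  string
	URL          string
}

// Category returns the category code.
func (o *OOAPIURLInfo) Category() string { return o.CategoryCode }

// Country returns the country code.
func (o *OOAPIURLInfo) Country() string { return o.CountryCode }

// Input returns the URL to measure.
func (o *OOAPIURLInfo) Input() string { return o.URL }

// Compile-time check that *OOAPIURLInfo satisfies ExperimentTarget.
var _ ExperimentTarget = &OOAPIURLInfo{}

func main() {
	targets := []ExperimentTarget{
		&OOAPIURLInfo{CategoryCode: "NEWS", CountryCode: "IT", URL: "https://www.example.com/"},
	}
	for _, t := range targets {
		fmt.Println(t.Category(), t.Country(), t.Input())
	}
}
```

The `var _ ExperimentTarget = ...` line is a common Go idiom that makes the build fail if the pointer type ever stops satisfying the interface.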

In addition to applying these changes, this diff also adapts the whole tree to use `ExperimentTarget` everywhere and adds a trivial constructor to obtain an `OOAPIURLInfo` when the category code and the country code are unknown.
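
The trivial constructor could look roughly like the following sketch. The function name `newURLInfoWithUnknownCategoryAndCountry` is hypothetical, and the default values are an assumption based on the values discussed during review: `"MISC"` as the category and `"ZZ"`, which check-in uses for an unknown country, as the country code:

```Go
package main

import "fmt"

// OOAPIURLInfo is a simplified copy of model.OOAPIURLInfo (no JSON tags).
type OOAPIURLInfo struct {
	CategoryCode string
	CountryCode  string
	URL          string
}

// newURLInfoWithUnknownCategoryAndCountry is a hypothetical constructor that
// wraps a bare URL using "MISC" (used when it is hard to find a category)
// and "ZZ" (the country code check-in returns for an unknown country).
func newURLInfoWithUnknownCategoryAndCountry(URL string) *OOAPIURLInfo {
	return &OOAPIURLInfo{
		CategoryCode: "MISC",
		CountryCode:  "ZZ",
		URL:          URL,
	}
}

func main() {
	info := newURLInfoWithUnknownCategoryAndCountry("https://www.example.com/")
	fmt.Printf("%+v\n", *info)
}
```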

With this diff merged, implementing richer input for real is a matter of implementing the following changes:

  1. the `*registry.Factory` has a new func field, defined by each experiment, that loads a list of `ExperimentTarget`;

  2. we have a library for input loading containing the same code that we currently use for the input loader;

  3. the `InputLoader` is gone and instead we use the factory (or its `*engine.experimentBuilder` wrapper) for input loading;

  4. we modify the `ExperimentArgs` passed to the `ExperimentMeasurer` to contain an additional field that is the `ExperimentTarget` we want to measure;

  5. each experiment that needs richer input uses a type assertion to convert the `ExperimentTarget` interface to the concrete richer-input type it expects, and then accesses any options.

Part of #1612.

This implementation strategy emerged while discussing this matter with @ainghazal; thank you so much for that!

@bassosimone force-pushed the issue/2607d branch 3 times, most recently from f59d7fc to 5846dfe (June 5, 2024 19:55)

```diff
@@ -204,7 +207,7 @@ func (e *experiment) MeasureWithContext(ctx context.Context, input string) (*mod
 // by adding the test keys etc. Please, note that, as of 2024-06-05, we're using
 // the measurement Input to provide input to an experiment. We'll probably
 // change this, when we'll have finished implementing richer input.
-measurement := e.newMeasurement(input)
+measurement := e.newMeasurement(target.Input())
```

@bassosimone (Contributor Author) commented on Jun 5, 2024:

A follow-up diff will modify `newMeasurement` to also read the options from the target, so that we can correctly fill the top-level `options` key of the measurement.
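
A rough sketch of what that follow-up might do. Since `ExperimentTarget` does not expose options yet, the `targetWithOptions` interface, the `Options() []string` method, the `[]string` representation of options, and the two-field `Measurement` type are all hypothetical simplifications:

```Go
package main

import (
	"encoding/json"
	"fmt"
)

// ExperimentTarget is the interface introduced by this diff.
type ExperimentTarget interface {
	Category() string
	Country() string
	Input() string
}

// targetWithOptions is a hypothetical extension for targets carrying options.
type targetWithOptions interface {
	ExperimentTarget
	Options() []string
}

// Measurement is a hypothetical, minimal subset of a measurement.
type Measurement struct {
	Input   string   `json:"input"`
	Options []string `json:"options,omitempty"`
}

// newMeasurement copies the input and, when the target provides them,
// also the options into the top-level "options" key.
func newMeasurement(target ExperimentTarget) *Measurement {
	m := &Measurement{Input: target.Input()}
	if t, ok := target.(targetWithOptions); ok {
		m.Options = t.Options()
	}
	return m
}

// sampleTarget is a hypothetical target that carries options.
type sampleTarget struct {
	URL  string
	Opts []string
}

func (t *sampleTarget) Category() string  { return "MISC" }
func (t *sampleTarget) Country() string   { return "ZZ" }
func (t *sampleTarget) Input() string     { return t.URL }
func (t *sampleTarget) Options() []string { return t.Opts }

func main() {
	m := newMeasurement(&sampleTarget{URL: "https://www.example.com/", Opts: []string{"Foo=bar"}})
	data, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Checking for an optional interface keeps targets without options, such as plain URLs, working unchanged.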

```Go
for _, URL := range input {
	if _, err := url.Parse(URL); err != nil {
		return nil, err
	}
	output = append(output, model.OOAPIURLInfo{
		CategoryCode: "MISC", // hard to find a category
		CountryCode:  "XX",   // representing no country
		// ...
```

@bassosimone (Contributor Author) commented on Jun 5, 2024:

Actually, ZZ represents no country. I don't know where I took XX from.

Is this something that check-in may possibly return?! (I am not sure here!)

@bassosimone (Contributor Author) commented on Jun 6, 2024:

Yeah, it's what check-in would do: https://github.com/ooni/backend/blob/f7a93f477111c7278424996815b91e6300d66b83/api/ooniapi/prio.py#L182.

So, I guess we should revert to using "ZZ", since it's what we'd get if check-in didn't know the country.

@bassosimone (Contributor Author) commented:

Fixed by 5830436
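
For reference, a self-contained sketch of the conversion loop quoted above after switching to `"ZZ"`. The function name `stringsToTargets` is hypothetical, the sketch returns `ExperimentTarget` values rather than `model.OOAPIURLInfo` values, and the actual contents of 5830436 are not shown here. Note that `url.Parse` is lenient, so this check only rejects badly malformed strings such as `"://bad"`:

```Go
package main

import (
	"fmt"
	"net/url"
)

// ExperimentTarget is the interface introduced by this diff.
type ExperimentTarget interface {
	Category() string
	Country() string
	Input() string
}

// OOAPIURLInfo is a simplified copy of model.OOAPIURLInfo (no JSON tags).
type OOAPIURLInfo struct {
	CategoryCode string
	CountryCode  string
	URL          string
}

func (o *OOAPIURLInfo) Category() string { return o.CategoryCode }
func (o *OOAPIURLInfo) Country() string  { return o.CountryCode }
func (o *OOAPIURLInfo) Input() string    { return o.URL }

// stringsToTargets validates each URL and wraps it into a target using
// "MISC" as the category and "ZZ" (unknown country) as the country code.
func stringsToTargets(inputs []string) ([]ExperimentTarget, error) {
	var output []ExperimentTarget
	for _, URL := range inputs {
		if _, err := url.Parse(URL); err != nil {
			return nil, err
		}
		output = append(output, &OOAPIURLInfo{
			CategoryCode: "MISC", // hard to find a category
			CountryCode:  "ZZ",   // what check-in uses for an unknown country
			URL:          URL,
		})
	}
	return output, nil
}

func main() {
	targets, err := stringsToTargets([]string{"https://www.example.com/", "https://www.example.org/"})
	if err != nil {
		panic(err)
	}
	for _, t := range targets {
		fmt.Println(t.Category(), t.Country(), t.Input())
	}
}
```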

@bassosimone marked this pull request as ready for review on June 5, 2024 21:20

@DecFox (Contributor) left a comment:

The diff looks good to me. I left a bikeshedding remark.

Review thread on internal/engine/inputloader.go (outdated, resolved)
bassosimone and others added 2 commits June 6, 2024 11:23
Conflicts:
	internal/engine/inputloader.go

@DecFox (Contributor) left a comment:

LGTM!

@bassosimone merged commit 7f53b45 into master on Jun 6, 2024 (17 checks passed)
@bassosimone deleted the issue/2607d branch on June 6, 2024 09:43
bassosimone added a commit that referenced this pull request Jun 6, 2024
This commit moves the engine.InputLoader type to a new package called
inputloading and adapts the naming to avoid stuttering.

The reason for moving InputLoader is that the engine package depends on
registry, and, per the plan described by the first richer input PR,
#1615, we want to move input loading
directly inside the registry. To this end, we need to move the input
loading feature outside of engine to avoid creating import loops.

We keep an integration test inside the engine package because it seems
such an integration test was checking both engine and the InputLoader
together. We may further refactor this test in the future.

Part of #1612
bassosimone added a commit that referenced this pull request Jun 6, 2024
This commit moves the engine.InputLoader type to a new package called
targetloading and adapts the naming to avoid stuttering. We therefore
have engine.InputLoaderSession => targetloading.Session and other
similar renames.

The reason for moving InputLoader is that the engine package depends on
registry, and, per the plan described by the first richer input PR,
#1615, we want to move target
loading directly inside the registry. To this end, we need to move the
target loading feature outside of engine to avoid creating import loops,
which prevent the code from compiling because Go does not support them.

While there, name the package targetloading rather than inputloading
since richer input is all about targets, where a target is defined by
the (input, options) tuple. Also, try to consistently rename types to
mention targets.

We keep an integration test inside the engine package because it seems
such an integration test was checking both engine and the Loader
together. We may further refactor this test in the future.

Part of #1612

---------

Co-authored-by: DecFox <[email protected]>
@bassosimone added the 2024-06-richer-input label (Tracking 2024-06 richer input work) on Jul 2, 2024