Output refactoring #1869

na-- · 2021-02-23T10:13:24Z

This completely fixes #917, fixes #1423, and fixes #1606. It also mostly fixes #1075. I think we can close it now and do the transition of the remaining Collectors to Outputs in separate PRs, since they should be perfectly usable with the simple Output adapter that wraps them for now.

This PR already includes the new json output and an adapter for every other Collector-type one. The current plan is to also move the cloud and csv outputs to the new Output interface for k6 v0.31.0, while influxdb, statsd, datadog, and kafka are left with the old Collector interface and moved in k6 v0.32.0. The major bugs (#1075 and #1606) should be fixed for all outputs in v0.31.0 regardless, we're just leaving the semi-mechanical translation for later. We can use the additional time to also revisit some other issues (e.g. #1801, #1052, #737 and many more).

This also should allow k6 to support xk6 extensions for outputs.

codecov-io · 2021-02-23T12:58:22Z

Codecov Report

Merging #1869 (5e41f5e) into master (51790fc) will decrease coverage by 0.73%.
The diff coverage is 26.79%.

@@            Coverage Diff             @@
##           master    #1869      +/-   ##
==========================================
- Coverage   71.50%   70.76%   -0.74%     
==========================================
  Files         180      181       +1     
  Lines       13969    14163     +194     
==========================================
+ Hits         9988    10022      +34     
- Misses       3344     3518     +174     
+ Partials      637      623      -14

Flag	Coverage Δ
ubuntu	`70.76% <26.79%> (-0.71%)`	⬇️
windows	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
cloudapi/config.go	`41.55% <0.00%> (-8.45%)`	⬇️
cmd/cloud.go	`4.83% <0.00%> (-0.03%)`	⬇️
cmd/login_cloud.go	`0.00% <0.00%> (ø)`
cmd/login_influxdb.go	`0.00% <0.00%> (ø)`
cmd/outputs.go	`0.00% <0.00%> (ø)`
cmd/run.go	`11.96% <0.00%> (-0.06%)`	⬇️
cmd/ui.go	`22.59% <0.00%> (+0.37%)`	⬆️
output/extensions.go	`0.00% <0.00%> (ø)`
stats/csv/config.go	`53.48% <0.00%> (-31.70%)`	⬇️
stats/datadog/collector.go	`33.33% <0.00%> (-40.87%)`	⬇️
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51790fc...5e41f5e. Read the comment docs.

output/helpers.go

cmd/outputs.go

output/json/json.go

output/helpers_test.go

mstoykov

LGTM. I will need to try it out some as trying to figure out if we've broken a config without trying to use it ... is too complicated :P

na-- · 2021-02-24T09:25:50Z

Hmm actually with the changes from e5afaa3 that @mstoykov suggested, I think even #1423 and #1606 are fixed... 🎉 I'll add some tests for the JSON output and remove the oldjson one and run some manual tests and this PR should be ready for merging. I'd still like to transition at least the csv and cloud Collectors to the new Output interface before k6 v0.31.0, but we can leave the other ones for after.

output/helpers.go

mstoykov · 2021-02-24T16:26:11Z

Did some runs with -o json and -o cloud - no problems and it did pickup some cloud settings.
So it LGTM to be merged.

imiric

Mostly LGTM. I left a few comments we can discuss, and the only blocker for me would be the way built-in outputs are "registered" and that whole built-in/external distinction.

Some general notes:

I like the output.Params approach and every output plucking what it needs from it.
The config consolidation in each output feels like a future wart, but maybe we can't improve that before Configuration issues #883.
👍 for abandoning the "collector" naming and standardizing everything on "output".
The PeriodicFlusher is a neat abstraction to avoid every output re-implementing it. 👍

core/engine.go

imiric · 2021-02-24T11:43:23Z

core/engine.go

+	e.logger.Debugf("Stopping %d outputs...", upToID)
+	for i := 0; i < upToID; i++ {
+		if err := e.outputs[i].Stop(); err != nil {
+			e.logger.WithError(err).Errorf("Stopping output %d failed", i)


Maybe add Name() to the Output interface so that this message can be friendlier?

The problem here is that I don't have a Name(), I have a Description(), which wouldn't be suitable here 😞 ... To improve the error message, I either have to change the output.Output interface and force every output to expose a Name(), or I'd have to add a wrapper struct for outputs in the Engine that contains its type (or whole Params). And I can't just use a map, since users can have multiple outputs with the same type.

I either have to change the output.Output interface and force every output to expose a Name()

👍 That sounds OK to me. Description() is too "loose" of an interface and outputs can return whatever they want there. The description should rather be assembled from different output information so that we can control how it's rendered. Maybe even Version() should be added, and make it no-op for built-in outputs.

Anyway, it's a minor thing, not a big deal.

hmm 👎 for Version()... It would be something that only extensions would use, and it would add unnecessary complexity for writing them. Only xk6 cares about the versions for now, and if we ever start caring, I'd rather figure out some way to extract their versions from runtime/debug.BuildInfo than forcing extension authors to deal with built-time linker flags or bumping an in-code Version constant on every release of their extension.

about Name(), cc @mstoykov what do you think? I'd personally rather have a wrapper struct around output.Output than forcing every output to have one extra helper method for information we already have elsewhere.

yeah I also think it's better if we can just use the name we already have ... otherwise I would argue that we should get the name from Name() everywhere.

I just realized is that this makes it ... harder to reason about if you decide to have multiple outputs of the same type .... is this even supported now?

I am not certain that was supported even before that, and I am not certain is all that needed, but seems like something that can be useful especially if outputs can filter metrics on their own. So you basically can (for example) shard based on something between multiple influxdb instances ?

otherwise I would argue that we should get the name from Name() everywhere.

We can't, since Name() would be a method of an output instance, and we can't instantiate the output without giving it some valid parameters first. That's why we have the output type as a string in https://github.com/loadimpact/k6/blob/7e1bc5dcab910f0d197a63e843e5591de4c60423/output/extensions/registry.go#L56

The only way around that is to have 2 interfaces, the current Output and a new OutputConstructor which has a Name() string property and a New(Params) Output method. I don't mind that refactoring, but it would require every output to have these two interfaces, and likely an empty struct with the second one that is passed to the Register() function.

I just realized is that this makes it ... harder to reason about if you decide to have multiple outputs of the same type .... is this even supported now?

Yes, it was supported even before this PR... The main problem with it is that JSON and environment variable configuration would be shared between all outputs, because they are based on the type names. Only the CLI flag config could be distinct. I've explored this issue and potential solution a bit before (#587 (comment) and in #883), but I don't think it's a very urgent problem to solve.

imiric · 2021-02-25T09:17:21Z

cloudapi/config.go

+
+// GetConsolidatedConfig combines the default config values with the JSON config
+// values and environment variables and returns the final result.
+func GetConsolidatedConfig(jsonRawConf json.RawMessage, env map[string]string) (Config, error) {


This feels like some tech debt we'll have to address when working on #883. :-/

Ideally outputs shouldn't have to deal with raw configuration sources and this consolidation should happen before they're initialized. It is a bit cleaner and more consistent than before, but there's still a lot of duplication and slight differences between each implementation which is probably what made it difficult to abstract away.

this consolidation should happen before they're initialized

It does 😕 This is precisely what this bit of code does: https://github.com/loadimpact/k6/blob/7e1bc5dcab910f0d197a63e843e5591de4c60423/cmd/outputs.go#L68-L72

And this isn't new technical debt, it was already part of k6 and #883, it was just squirreled away in cmd/ in a much, much worse way, see these snippets from master:
https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/cmd/config.go#L68
https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/cmd/config.go#L96
https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/cmd/config.go#L174
https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/cmd/config.go#L196
https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/cmd/collectors.go#L90-L97

Though I just saw that I missed config.Name = null.StringFrom(arg) from that last code snippet, so I'll fix that... 😅

But yeah, we have to fix this configuration mess - the current state is definitely not the end goal, it is just slightly more sane than before, #883 is still very much required.

It does 😕 This is precisely what this bit of code does

I know. My point was that each output package shouldn't handle raw configuration at all, and should just receive the consolidated values, which should be done much earlier and only once / in a single location.

But yeah, maybe something for #883 then. At least it's consistently bad now 😄

each output package shouldn't handle raw configuration at all,

We can debate this when we start dealing with #883, but if I correctly understand what you want, I am not sure I agree. I don't think having a central configuration clearing house is wise - why should we have one huge module that deals with every type of configuration? Why not have each component expose a unified interface for dealing with raw configuration sources and returning errors, but leave the particular configuration logic internal for every module?

Besides, this will never work with extensions, there is no way for k6 to know how they are configured, so the only way is to to pass the "raw" values to the extension and return any resulting validation values, since it's the only thing that knows its own config.

Btw I think that the new outputs would be ideal testing grounds for potential #883 solutions. They are now self-contained, and each one receives a raw JSON chunk, a CLI argument, and a map of environment variables in their constructor. So we can start converting their configuration parsing and validation to something saner (that avoids Apply() and envconfig and all of the other issues) one by one.

Fixed the cloud regression in 626694d

cmd/login_cloud.go

imiric · 2021-02-25T09:51:05Z

cmd/outputs.go

+func getAllOutputConstructors() (map[string]func(output.Params) (output.Output, error), error) {
+	// Start with the built-in outputs
+	result := map[string]func(output.Params) (output.Output, error){
+		"json": json.New,


Hhmm why aren't all built-in outputs registered the same way as "extension" outputs are? I think this should be consistent with the way JS modules are registered, and is much more elegant than a static map IMO. Plus there would be no implementation differences between built-in and external outputs, making potential external->core merges easier.

I started that way, but then I refactored it the way it currently is, for several different reasons, and I'd advocate maybe refactoring the JS extensions as well in the future. Here are the reasons:

This allows us to print much better error messages instead of panicking when there are name conflicts, see below.

It allows us to very easily distinguish between built-in outputs and extension, which might be very useful if we want to send the latter in the usage reports.

Package init() functions are a bit of an antipattern (see this, for example), and in this case it's completely unnecessary to have them for built-in modules.

This doesn't look more elegant than a static map, to me it looks ugly as hell, I'd normally consider such imports code smells in a Go program 😅 : https://github.com/loadimpact/k6/blob/51790fc8df37113ac20b586b28051e21672dab16/js/modules.go#L23-L34

You have a valid argument for easier extension->core merges, but I think this is a minor concern. Cleaning up the extension would take 1 minute of just removing the init() from it and adding the same constructor that was referenced in it to this map instead

I agree, but those are arguments for where Register() is being called, which could just as well happen in a way that avoids these anti-patterns. Where do you envision Register() being called from for external outputs now?

My main issue is there being a distinction between internal and external outputs, but if you think there are benefits to that, I'm OK with it.

Where do you envision Register() being called from for external outputs now?

For extensions it would have to be in init() again, that unfortunately can't be easily changed due to the xk6 architecture. It's the simplest way to go about it, and is good enough for extensions. We just don't need to do the same in k6, we have perfect knowledge of the built-in outputs, we can even write a simple //go:generate helper to generate the list for us, if the number of built-in outputs ever grows substantially.

I am mostly for everything to be done the same way (in this case in init()) so we don't by accident break something and then for it to work for our "special" internal ones but not for the extensions.

Introducing externally developed extensions in the base k6 by removing init is not as interesting IMO but it also ... kind of prevents us from introducing module/outputs by still letting them be outside the k6 repo(and be able to be imported as external) and by just importing them and registering as "internal". Or more accurately while we can import them and register them as internal, the fact that they have init basically means that they will be registered as non 'internal' as well, which I don't like ;).

I think I brought this up back when xk6 was designed but there is nothing stopping us from requiring that parh.to/extension/module/name to have a function NewModule() interface{} and for xk6 to generate additional_extensions.go file in the root of the k6 project that just calls the correct Register function based on something ... for example the name or the returned type. This even means that a single "extension" can both register a js module and an output if we go the "name" variant.

The above suggestion can still be made and combined with #1802 and/or #1858 both of which will likely require some changes to the already written extensions.

With the above we can completely remove the init from anywhere but additional_extensions.go

or just have it instead of adding additional files to add to change main.go for example

or for that matter just generate static maps in the corresponding files as in nedomi

we don't by accident break something and then for it to work for our "special" internal ones but not for the extensions.

I doubt we'll ever touch the extension registration after we have it set up once, and it already uses the same interfaces as internal modules, so this is mostly a theoretical concern. In any case, the proper way to ensure extension compatibility would be to to have a CI task that actually uses xk6. I created #1873

This even means that a single "extension" can both register a js module and an output if we go the "name" variant.

I am confused, this can easily be done now 😕 A single extension repo can register however many extension types and modules it wants to. Right now xk6 essentially does a import _ whatever and that's all it does, I don't think it cares one bit for what every extension does. So I don't see a benefit for the extra complexity.

cmd/outputs.go

output/extensions/registry.go

output/json/wrapper.go

cmd/ui.go

imiric

I still have some minor reservations about some of the design decisions here, but you've explained the reasons behind them, so no need to keep discussing them.

This also should allow k6 to support xk6 extensions for outputs. So far the JSON output is the only one that has been fully transitioned to the new output.Output interface, the rest will be done in future PRs.

na-- requested review from mstoykov and imiric February 23, 2021 10:13

Use a new interface for Outputs and adapt the old Collectors to it

b7abb73

This also should allow k6 to support xk6 extensions for outputs.

na-- force-pushed the output-refactoring-2 branch from b9e01e5 to b7abb73 Compare February 23, 2021 12:49

na-- mentioned this pull request Feb 23, 2021

Require cloud scripts to always have a name #1870

Merged

na-- added this to the v0.31.0 milestone Feb 23, 2021

Improve output.SampleBuffer and add some tests

8bdb2b2