-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: libp2p implementation interoperability dashboard on various dimensions. Uses testplans-dsl. #85
Conversation
ping-with-dsl/go/v0.23.0/Makefile
Outdated
@@ -0,0 +1,5 @@ | |||
image.json: Dockerfile main.go go.mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long term, I see CI building these images, uploading them to some container registry, then committing this json file directly into the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the same vision apart from the committing to the repo part. Can you elaborate why do you think that is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I don't quite understand the purpose of the JSON files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate why do you think that is necessary?
Depends on our use case, but maybe for this it's not worth it. It would be nice if you could pull the images built in CI if your code doesn't affect these things. It's like using it as a cache.
In general, I don't quite understand the purpose of the JSON files.
It's the simplest way to record the container image id. Since it's json we can import it in our typescript code directly as opposed to reading and parsing a file manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this. I am in favor of the simpler API.
Do I understand correctly that the (automated) steps now are:
-
User builds images resulting in images and
image.json
files. -
Composer generating test plans based on
versions.ts
which itself references aboveimage.json
files. It does so by loading the information into a sqlite database, querying it and returning the test matrix. -
User runs
npm run tests
which triggers thedsl
package. -
The
dsl
package takes the generated test matrix, transforms it into a file format that Testground understands, sets up the symlinks, and then calls the Testground binary. -
From here Testground takes over, itself walking through its various abstraction layers.
I might be missing something. From the above, I am not sure introducing yet another layer of abstraction (dsl) is the way forward.
I understand the strategy of creating a simple API (dsl) on top of the much more complex Testground API first, and only then iteratively simplifying the Testground API. I am afraid that we do the former but never get to the latter, resulting in yet another layer.
In my eyes this project is important but not urgent. At this point I think we know of multiple changes (no symlinking, prebuild images, ...) that we want inside of Testground for sure.
Instead of adding another layer (dsl) what do you think of pushing these changes upstream (Testground) right away?
I am sure I don't have the full picture at this point. I suggest we discuss this in our 1on1 on Monday @MarcoPolo and then follow up here.
94f4390
to
0c5cd29
Compare
I think this project is urgent in that we've committed to doing this this year, and adding more time may lead to a rabbit hole rather than solving the actual problem of creating this multidimensional test cases.
I want to try this API first before committing to it. I also don't want to block on first changing testground before being able to advance with this project. |
This is ready for review. I'll keep iterating on making the CI a bit better, but this does run in CI. Recommendation for reviewer:
|
@mxinden / @laurentsenta Could I get a review here when you get a chance please :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MarcoPolo for pushing this forward.
Do I understand correctly that:
- This would entirely replace the
ping/
folder? I would be in favor of that. ping-with-dsl/
is only here to showcase the dsl and would be removed before merging as well?
ping-with-dsl/testplans.ts
Outdated
] | ||
} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're creating a DSL / configuration abstraction above Compositions:
This PR does two things: move the build step out of Testground and introduce a new configuration format.
The new configuration looks MUCH simpler because it is: we removed the essential complexity of the building configuration.
With a regular composition (no change to Testground required), this is what the same workflow would look like:
https://gist.github.com/laurentsenta/bde72a8ad2e8cfdcecfb3add75338a44
Turned into yaml which is more familiar than toml:
https://gist.github.com/laurentsenta/f85b2249b1598586a01071b61862e68a
I would recommend not going for a complete "DSL" abstraction yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a regular composition (no change to Testground required), this is what the same workflow would look like:
I know this is an example, but this doesn't work today. You still need a plan and a test case. These are vestiges that I think unnecessarily complicated.
If we want to iterate to something closer to this DSL (minimal config) and change this DSL later, that's fine by me. But I don't want to block this work on first changing testground.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple other things when comparing this to a regular composition file:
- The DSL leverages typescript to typecheck the runtime params
- The composition file has the idea of "groups" and "runs" as separate things. This is vestige of having testground build things. But really we only need "runs" here.
Here are both configs in JSON: https://gist.github.com/MarcoPolo/e4488ce57c58469298918af896ded192. This is subjective, but I find the DSL version simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is an example, but this doesn't work today. You still need a plan and a test case. These are vestiges that I think unnecessarily complicated.
The example would work, except for a typo with artifacts maybe.
I'm confused because we could do:
params -- (generator) --> composition
instead of:
params -- (generator) --> "dsl" -- (transformer) --> composition
- typing is supported just as well, type the boundaries of your generator, you could even have fun with generics there,
- the main change in the DSL is that we're going to duplicate every group, so code generation will be the default, no one is going to manually edit this file.
- there's nothing to block for in Testground.
It seems more efficient. Let's have a chat async on why we can't do this and why we call this a DSL instead of a new configuration format, Feel free to resolve this whenever you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example would work, except for a typo with artifacts maybe.
At the very least you currently need a manifest.toml
that defines a the plan name, defaults, builders, runners, and testcases.
the main change in the DSL is that we're going to duplicate every group, so code generation will be the default, no one is going to manually edit this file.
I don't follow, sorry.
there's nothing to block for in Testground.
What's blocking now?
why we call this a DSL instead of a new configuration format,
A rose by any other name. Happy to call this a new configuration format.
Thanks for sharing and pinging for review @MarcoPolo, The rest looks great, especially that move from templates to ts generation! (created features requests from that PR in testground/testground#1546 feel free to add what I missed). |
Not yet. I think that would be a great next step, but that's not a goal right now.
I wasn't planning on removing it since it's useful as a stepping stone to the DSL, but I can remove it if requested. |
I think some blockers right now for this are:
There's a question of whether we should merge the DSL changes into testground first. I think our goal here is to have a straightforward interop testing system. To accomplish this we are only blocked on the things above. There's no reason to block on making the upstream changes. This doesn't mean that we will never upstream these changes. Likely other users of testground want a simpler configuration. But we should prioritize the things we all agreed are important in the roadmap first. |
@Jorropo caught a bug in go-libp2p v0.24.0 where a node configured to use Noise could dial a node configured to use TLS (libp2p/go-libp2p#1946) In a subsequent pull request, we should extend the test cases in here to test the non happy-path (either as a part of the same github action as these or as a separate one that runs as a regression test in a nightly job.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction this is going, left a couple of comments :)
// Lots of duplication because I, Marco, couldn't figure out how to make the type system happy. | ||
let boxed_transport = match transport_param.as_str() { | ||
"tcp" => { | ||
let builder = TcpTransport::new(GenTcpConfig::new()) | ||
.upgrade(libp2p::core::upgrade::Version::V1Lazy); | ||
|
||
let authenticated = match secure_channel_param.as_str() { | ||
"noise" => builder | ||
.authenticate(libp2pv0490::noise::NoiseAuthenticated::xx(&local_key).unwrap()), | ||
_ => panic!("Unsupported secure channel"), | ||
}; | ||
|
||
match muxer_param.as_str() { | ||
"yamux" => authenticated | ||
.multiplex(yamux::YamuxConfig::default()) | ||
.timeout(Duration::from_secs(5)) | ||
.boxed(), | ||
"mplex" => authenticated | ||
.multiplex(mplex::MplexConfig::default()) | ||
.timeout(Duration::from_secs(5)) | ||
.boxed(), | ||
_ => panic!("Unsupported muxer"), | ||
} | ||
} | ||
"ws" => { | ||
let builder = WsConfig::new(TcpTransport::new(GenTcpConfig::new())) | ||
.upgrade(libp2p::core::upgrade::Version::V1Lazy); | ||
|
||
let authenticated = match secure_channel_param.as_str() { | ||
"noise" => builder | ||
.authenticate(libp2pv0490::noise::NoiseAuthenticated::xx(&local_key).unwrap()), | ||
_ => panic!("Unsupported secure channel"), | ||
}; | ||
|
||
match muxer_param.as_str() { | ||
"yamux" => authenticated | ||
.multiplex(yamux::YamuxConfig::default()) | ||
.timeout(Duration::from_secs(5)) | ||
.boxed(), | ||
"mplex" => authenticated | ||
.multiplex(mplex::MplexConfig::default()) | ||
.timeout(Duration::from_secs(5)) | ||
.boxed(), | ||
_ => panic!("Unsupported muxer"), | ||
} | ||
} | ||
_ => panic!("Unsupported"), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is, do we want to have a node that only speaks the requested configuration or should it just be compatible with it.
Even in the first case, the easier option here is probably to use OptionalTransport
and OptionalUpgrade
. That allows you move the conditional code into a value that you can compose as if it were unconditional!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want something that only speaks the requested configuration. That way we are sure we are testing the requested configuration and not accidentally making something else work.
This code is definitely hacked together. I'll take a look at the OptionalTransport, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jxs Do you think you can help me clean this up? You'll certainly do this faster than me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to using an EitherUpgrade
import { Version } from "../versions"; | ||
|
||
|
||
export async function buildTestplans(versions: Array<Version>): Promise<TestPlans> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. This whole thing is the test, isn't it?
ping-with-dsl/go/v0.23.0/Makefile
Outdated
@@ -0,0 +1,5 @@ | |||
image.json: Dockerfile main.go go.mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the same vision apart from the committing to the repo part. Can you elaborate why do you think that is necessary?
await fs.mkdir(dir) | ||
await fs.mkdir(path.join(dir, "testplans")) | ||
|
||
const filesToCreate = transform(testplans) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within transform
we assume that all containers have been built already (which they are currently via the Makefile). What do you think of moving the building of the images also into here?
I am asking because a next step for this could be to pull images instead of building them but that would probably be a bit clunky to express in the Makefile so it might be better done in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testground/testground#1522 is what I am aiming at!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best if we don't try to build here. I think that's a sizable bit of complexity in testground. The Makefile is just one strategy to build the container image. Using Nix can be another. Maybe we want to use warpforge at some point.
It's nice that this just has to reference some existing image id, and then it'll work.
ping-with-dsl/go/v0.23.0/Makefile
Outdated
@@ -0,0 +1,5 @@ | |||
image.json: Dockerfile main.go go.mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I don't quite understand the purpose of the JSON files.
9d9ca46
to
d0e194b
Compare
@thomaseizinger I haven’t started investigating this issue yet, but you might have a lot more context here and can probably help me out a ton. I’m seeing flakiness in the Rust tests. This shows up mostly in the rust x go, but also a rust x rust test (https://github.com/libp2p/test-plans/actions/runs/3798668092 scroll to summary below). And has shown up before this test here.
edit: this is due to a race condition in the Go test that wouldn't actually wait until it got all connections before emitting "connected". Fixed now. |
dsl/src/testground-runner.ts
Outdated
) | ||
await fs.writeFile(path.join(dir, ".env.toml"), ` | ||
[daemon.scheduler] | ||
task_timeout_min = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better for this to be configured in the DSL rather than hardcoded here.
6644184
to
102f01f
Compare
AND ha.hash < hb.hash | ||
-- quic only uses its own muxer/securechannel | ||
AND a.transport != "quic" | ||
AND a.transport != "quic-v1";`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also applies to webtransport, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Although the quic-v1/webtransport vs quic/webtransport change is why I haven't added webtransport yet here.
I think I share @mxinden's question here. The multidimensional configuration looks, whereas ping-with-dsl contains the @MarcoPolo Can you clarify the relationship between ping-with-dsl and multidim? |
Lets get rid of ping-with-dsl. It was mostly used as an example for now and I think it complicates things. |
@laurentsenta / @mxinden CI is passing and I've done a pass at some cleanup and addressed comments. Could I get another review here please? In a follow up PR I'm going to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the missing go-libp2p and rust-libp2p versions, this looks good to me.
@jxs in case you have some capacity, can you help with the outstanding refactorings in the Rust transport construction? If not, I am happy to jump in.
@@ -50,6 +53,7 @@ runs: | |||
shell: bash | |||
|
|||
- name: Run the daemon or configure the client | |||
if: ${{ inputs.start_testground == 'true' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, with this merged, nothing would require this to be true
anymore, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing ping tests use the run-composition
GH action which depends on this being true.
The multidim tests starts its own testground daemon so that we can control the home location and the .env.toml
file. In this case we don't want the setup-testground
action to start the testground daemon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing ping tests use the
run-composition
GH action which depends on this being true.
Do I understand correctly that long term that GH action would go away? Do I also understand correctly that we can not yet remove it here given that that would break CIs in rust-libp2p and go-libp2p?
For the sake of cleaning it up right away, would that breakage of our CI be that bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that long term that GH action would go away?
I wonder if there's still value in these existing ping tests. We get most of the value from them in the new multidim interop tests. The other value is having a fast test that checks if current master works against the other implementations. This GH action would go away when we stop using the ping tests. Right now there's still some value in the ping tests that isn't captured by this PR.
Do I also understand correctly that we can not yet remove it here given that that would break CIs in rust-libp2p and go-libp2p?
Yeah, and we wouldn't have anything to replace them.
For the sake of cleaning it up right away, would that breakage of our CI be that bad?
We don't have a plan right now to replace the existing ping tests. I'd rather focus on replacing it and then update the other repo's CI rather than breaking CI and then forcing us to fix it in a reactive way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other value is having a fast test that checks if current master works against the other implementations.
That would as well happen via the multidimensional tests, no?
Right now there's still some value in the ping tests that isn't captured by this PR.
What value would they provide long term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would as well happen via the multidimensional tests, no?
Long term yes. Right now, no.
What value would they provide long term?
Long term, we should get rid of them. Right now though, I think the priority is to have the multidim interop tests. Replacing the ping tests imo is lower priority.
What the ping test gives us that multidim doesn't right now:
- CI testing in the impl repo.
- Interop testing with Nim (I haven't spent any time getting Nim into the multidim)
Long term, multidim will do this and we can get rid of the ping tests. I would save that for a future PR rather than add to this one. And I would avoid removing the existing ping tests until we're ready to replace them.
I'm a bit worried on the scalability of testground for this. Looking at the last run: https://github.com/libp2p/test-plans/actions/runs/3833933397/jobs/6525831766 each test took and average of 5.5 seconds. I can reduce the time a bit here by only doing a single ping instead of the two currently. and there are probably some other optimizations I can make. What would be really nice is to keep the simplicity of two instances where A pings B (not a test where we have many instances all pinging each other), have these run faster, and maybe have these run in parallel. What can we do to reduce the 5.5 seconds? I'm not sure why this ping test would take more than a second to complete. So my guess here is that there is some overhead with testground for each test. |
exit 1 | ||
else | ||
exit 0 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That check might break silently if we change the dashboard format, we could use a different exit code in renderResults
to mark the case "the render completed, but it contains an error".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer this to a future change.
No the dial didn't worked, the bug I caught was the error was shadowed (so it was returning a |
closing in favor of #97 |
This PR adds interop testing across various dimensions (transports, muxer, secure channel) against the Go and Rust libp2p implementations. This is the foundation work on this project, and enables adding more libp2p implementations to the
versions.ts
file. This also renders the result into a markdown file to give an overview of the test results. Example result here: https://gist.github.com/MarcoPolo/965e1f6fd16df004173131199444c62dRecommendation for reviewer:
ping-with-dsl/testplans.ts
first to get an idea of what the DSL looks like.dsl/
.a. The
version.ts
file defines the capabilities of each version.b. This file is consumed by the composer and loaded into an in-memory sqlite table.
c. The table is queried to produce the various tests we run in the form of the test-plans dsl.
d. The DSL is then transformed to the testground files and then run.
Feedback appreciated!