flaky test: iroh-net discovery::test_dns_pkarr::pkarr_publish_dns_discover #2221

Closed
dignifiedquire opened this issue Apr 22, 2024 · 3 comments · Fixed by #2450

@dignifiedquire
Contributor

https://github.com/n0-computer/iroh/actions/runs/8783456856/job/24099637605

failures:
    discovery::test_dns_pkarr::pkarr_publish_dns_discover

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 88 filtered out; finished in 2.59s


--- STDERR:              iroh-net discovery::test_dns_pkarr::pkarr_publish_dns_discover ---
Error: timeout

Stack backtrace:
   0: std::backtrace_rs::backtrace::dbghelp::trace
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:131
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2: std::backtrace::Backtrace::create
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs:331
   3: std::backtrace::Backtrace::capture
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs:296
   4: anyhow::Error::msg<ref$<str$> >
             at C:\Users\Administrator\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.82\src\error.rs:83
   5: anyhow::__private::format_err
             at C:\Users\Administrator\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.82\src\lib.rs:688
   6: iroh_net::discovery::test_dns_pkarr::state::impl$0::on_node::async_fn$0
             at .\src\discovery.rs:745
   7: iroh_net::discovery::test_dns_pkarr::pkarr_publish_dns_discover::async_block$0
             at .\src\discovery.rs:655
   8: core::future::future::impl$1::poll<ref_mut$<dyn$<core::future::future::Future<assoc$<Output,enum2$<core::result::Result<tuple$<>,anyhow::Error> > > > > > >
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\core\src\future\future.rs:124
   9: core::future::future::impl$1::poll<ref_mut$<core::pin::Pin<ref_mut$<dyn$<core::future::future::Future<assoc$<Output,enum2$<core::result::Result<tuple$<>,anyhow::Error> > > > > > > > >
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panic\unwind_safe.rs:272
  35: std::panicking::try::do_call
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:554
  36: std::panicking::try
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:518
  37: std::panic::catch_unwind
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panic.rs:142
  38: test::run_test_in_process
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\test\src\lib.rs:644
  39: test::run_test::closure$0
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\test\src\lib.rs:567
  40: test::run_test::closure$1
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\test\src\lib.rs:595
  41: std::sys_common::backtrace::__rust_begin_short_backtrace<test::run_test::closure_env$1,tuple$<> >
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\sys_common\backtrace.rs:155
  42: std::thread::impl$0::spawn_unchecked_::closure$1::closure$0
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\thread\mod.rs:529
  43: core::panic::unwind_safe::impl$23::call_once
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\panic\unwind_safe.rs:272
  44: std::panicking::try::do_call
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:554
  45: std::panicking::try
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panicking.rs:518
  46: std::panic::catch_unwind
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\panic.rs:142
  47: std::thread::impl$0::spawn_unchecked_::closure$1
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\thread\mod.rs:528
  48: core::ops::function::FnOnce::call_once<std::thread::impl$0::spawn_unchecked_::closure_env$1<test::run_test::closure_env$1,tuple$<> >,tuple$<> >
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\core\src\ops\function.rs:250
  49: std::sys::pal::windows::thread::impl$0::new::thread_start
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\sys\pal\windows\thread.rs:58
  50: BaseThreadInitThunk
  51: RtlUserThreadStart
@flub
Contributor

flub commented Jul 2, 2024

I believe all the test_dns_pkarr tests are likely flaky:

https://github.com/n0-computer/iroh/actions/runs/9760697872/job/26940130454?pr=2445

Let's use this issue for all of them.

@Frando
Member

Frando commented Jul 3, 2024

The tests are fully local and I cannot spot a point where they might be flaky.

Maybe we just have to increase the timeout. If CI gets really slow the 2s might not be enough for the node to be discovered.

@Frando
Member

Frando commented Jul 3, 2024

Fix in #2450

github-merge-queue bot pushed a commit that referenced this issue Jul 3, 2024
## Description

The idea of the probe plan is that the STUN probes happen first; if
those don't work, we add the other probes. However, when there are
multiple relay servers, we accidentally started the subsequent probes
for all but the first relay server too late.

This makes sure to globally record when the last STUN probe was sent
and re-uses this value for all the relay servers which are probed.
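
A minimal sketch of the idea described above, not the actual iroh-net code. The names `ProbeKind`, `Probe`, `build_plan`, `retry_gap` and `fallback_gap` are hypothetical and chosen for illustration only. It assumes the plan is a list of probes with start delays: the delay of the last STUN probe is recorded once, and that single value is reused to schedule the follow-up probes for every relay server.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProbeKind {
    Stun,
    // Stands in for "the other probes" that run only if STUN fails.
    Fallback,
}

#[derive(Debug, Clone, PartialEq)]
struct Probe {
    relay: String,
    kind: ProbeKind,
    // Delay from the start of the plan until this probe is sent.
    delay: Duration,
}

/// Hypothetical plan builder: STUN probes for every relay first, then one
/// fallback probe per relay, all scheduled relative to the same global
/// "last STUN probe" delay.
fn build_plan(
    relays: &[&str],
    stun_attempts: u32,
    retry_gap: Duration,
    fallback_gap: Duration,
) -> Vec<Probe> {
    let mut plan = Vec::new();
    // Recorded globally, not per relay.
    let mut last_stun = Duration::ZERO;

    for relay in relays {
        for attempt in 0..stun_attempts {
            let delay = retry_gap * attempt;
            last_stun = last_stun.max(delay);
            plan.push(Probe { relay: relay.to_string(), kind: ProbeKind::Stun, delay });
        }
    }

    // Every relay reuses the same value, so no relay starts later than another.
    for relay in relays {
        plan.push(Probe {
            relay: relay.to_string(),
            kind: ProbeKind::Fallback,
            delay: last_stun + fallback_gap,
        });
    }
    plan
}

fn main() {
    let plan = build_plan(
        &["relay-a", "relay-b"],
        3,
        Duration::from_millis(100),
        Duration::from_millis(200),
    );
    for probe in &plan {
        println!("{:?}", probe);
    }
}
```

In this sketch the fallback probes for both relays share one start delay, which is the property the description asks for.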

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

Fixes #2444

See #2221 for flaky tests.

## Change checklist

- [x] Self-review.
- ~~[ ] Documentation updates if relevant.~~
- [x] Tests if relevant.
- ~~[ ] All breaking changes documented.~~
github-merge-queue bot pushed a commit that referenced this issue Jul 3, 2024
## Description

We have some tests that use timeouts so they do not wait indefinitely for
an event that might never arrive. These tests are flaky if the timeout is
too low, especially on Windows and likely when the machines are overloaded.
This PR increases these timeouts:

* Increase timeout of `test_node_add_tagged_blob_event` from 1s to 10s
(Fixes #2331)
* Increase timeouts of the `pkarr_publish_dns_resolve_*` tests from 2s
to 10s (Fixes #2221)
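
To illustrate why a tight timeout makes such a test flaky, here is a small standard-library sketch. It is not the iroh test code: `WaitError`, `wait_for_event` and `discover_with_limit` are hypothetical names, and a channel plus a sleeping thread stand in for the discovery event arriving late on a slow CI machine.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum WaitError {
    Timeout,
    SenderDropped,
}

/// Waits for one event, but gives up after `limit` instead of blocking forever.
fn wait_for_event<T>(rx: &mpsc::Receiver<T>, limit: Duration) -> Result<T, WaitError> {
    rx.recv_timeout(limit).map_err(|err| match err {
        mpsc::RecvTimeoutError::Timeout => WaitError::Timeout,
        mpsc::RecvTimeoutError::Disconnected => WaitError::SenderDropped,
    })
}

/// Hypothetical stand-in for the test body: the "discovered" event arrives
/// after `event_delay`, and the test waits at most `limit` for it.
fn discover_with_limit(event_delay: Duration, limit: Duration) -> Result<&'static str, WaitError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(event_delay);
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send("discovered");
    });
    wait_for_event(&rx, limit)
}

fn main() {
    // Simulated slow machine: the event takes 400 ms to arrive.
    let slow = Duration::from_millis(400);
    // A limit shorter than the delay reports a timeout.
    println!("{:?}", discover_with_limit(slow, Duration::from_millis(20)));
    // A generous limit passes and still returns as soon as the event arrives.
    println!("{:?}", discover_with_limit(slow, Duration::from_secs(10)));
}
```

A larger limit only affects the failure path: when the event arrives, the wait returns immediately, so raising the timeout should not slow down passing runs.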

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [x] Self-review.
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in iroh Jul 3, 2024