Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(networking): Rework internal auto concurrency tests #3802

Merged
merged 10 commits into from
Sep 11, 2020

Conversation

bruceg
Copy link
Member

@bruceg bruceg commented Sep 10, 2020

This rebuilds the internal tests to run all at once, and to use external data files for the parameters to avoid needing a full recompile to tweak the settings.

@bruceg bruceg added domain: tests Anything related to Vector's internal tests type: tech debt A code change that does not add user value. domain: networking Anything related to Vector's networking domain: performance Anything related to Vector's performance labels Sep 10, 2020
@bruceg bruceg requested a review from jszwedko September 10, 2020 14:18
@bruceg bruceg self-assigned this Sep 10, 2020
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I like the test configuration being separate though it is a bit unfortunate that compilation time is pushing us into that pattern.

I did get this test failure locally though:

thread 'sinks::util::auto_concurrency::tests::all_tests' panicked at 'assertion failed: `(expr > high)`
expr: value = `0.126087944`,
high: `0.125`
File: "tests/data/auto-concurrency/medium-send.toml" Value: controller observed_rtt max
TestResults {
    file_path: "tests/data/auto-concurrency/medium-send.toml",
    stats: Statistics {
        in_flight: LevelTimeHistogram {
            level: 0,
            histogram: TimeHistogram {
                histogram: Histogram {
                    totals: [
                        0.002160466,
                        0.33612984500000004,
                        0.640026141,
                        0.16129062699999996,
                        9.782315846000012,
                        0.9297670430000003,
                        0.22614229900000005,
                        0.13017132,
                        0.324182766,
                        0.018357403,
                    ],
                },
                last_time: Some(
                    Instant {
                        tv_sec: 314176,
                        tv_nsec: 503624389,
                    },
                ),
            },
        },
    },
    cstats: ControllerStatistics {
        in_flight: TimeHistogram {
            histogram: Histogram {
                totals: [
                    0.002038978,
                    0.33604503299999994,
                    0.640103479,
                    0.15982594800000002,
                    9.748312863999983,
                    0.9651788440000001,
                    0.22615031700000002,
                    0.130118067,
                    0.324396845,
                    0.018414793,
                ],
            },
            last_time: Some(
                Instant {
                    tv_sec: 314176,
                    tv_nsec: 503641418,
                },
            ),
        },
        concurrency_limit: TimeHistogram {
            histogram: Histogram {
                totals: [
                    0.0,
                    0.0,
                    0.620126786,
                    0.102965506,
                    0.101770415,
                    0.102244049,
                    0.199927082,
                    0.101782596,
                    0.304869607,
                    4.321242149000001,
                    6.2903737930000005,
                ],
            },
            last_time: Some(
                Instant {
                    tv_sec: 314176,
                    tv_nsec: 405079296,
                },
            ),
        },
        observed_rtt: TimeWeightedSum {
            sum: WeightedSum {
                total: 1.2682793218495096,
                weights: 12.447222536000009,
                min: Some(
                    0.097010317,
                ),
                max: Some(
                    0.126087944,
                ),
            },
            last_observation: Some(
                Instant {
                    tv_sec: 314176,
                    tv_nsec: 503641418,
                },
            ),
        },
        averaged_rtt: TimeWeightedSum {
            sum: WeightedSum {
                total: 1.2366884388567434,
                weights: 12.145301983000003,
                min: Some(
                    0.10048516739999999,
                ),
                max: Some(
                    0.11242788340000001,
                ),
            },
            last_observation: Some(
                Instant {
                    tv_sec: 314176,
                    tv_nsec: 405079296,
                },
            ),
        },
    },
}', src/sinks/util/auto_concurrency/tests.rs:344:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    sinks::util::auto_concurrency::tests::all_tests

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 747 filtered out

error: test failed, to rerun pass '--lib'

Maybe flakey?

tests/data/auto-concurrency-template.toml Show resolved Hide resolved
@bruceg
Copy link
Member Author

bruceg commented Sep 10, 2020

Apparently your system can have worse than 25ms of extra delay on timings. I need to investigate using a faked clock to cause exact timing and near zero actual run time.

@jszwedko
Copy link
Member

Apparently your system can have worse than 25ms of extra delay on timings. I need to investigate using a faked clock to cause exact timing and near zero actual run time.

I'm running them in an AWS VM so I guess I'm not terribly surprised 😄 A fake clock seems like it makes sense.

By pausing and then manually advancing the clock, this provides for both
very fast running tests and nearly perfect time measurements for highly
repeatable test results.

Signed-off-by: Bruce Guenter <[email protected]>
@bruceg bruceg linked an issue Sep 11, 2020 that may be closed by this pull request
@bruceg
Copy link
Member Author

bruceg commented Sep 11, 2020

I had a light bulb moment, and this will now close #3823.

Bruce Guenter added 2 commits September 11, 2020 11:19
futures::future::join_all(tests).await;
entries.sort_unstable_by_key(|entry| entry.0.to_string_lossy().to_string());

time::pause();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see time::resume anywhere, advance work in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when the tokio timer is paused, the reference time stops. When advance is called, it steps forward by the duration specified and runs any timed events that were queued. resume ties that reference clock back to the system clock, resuming normal time flow. In this case, I want the timer to stay paused for the stability of the results, so there is no need to resume the clock. I don't think resume would be needed for most tests, but it would depend on exactly what is happening there.

This has the collateral benefit of being able to drop the large
`assert_within` macro that was only used here.

Signed-off-by: Bruce Guenter <[email protected]>
@bruceg
Copy link
Member Author

bruceg commented Sep 11, 2020

I think I've polished this doodoo far beyond what I should have, though it probably makes it a little better foundation for #2531.

@bruceg bruceg requested a review from jszwedko September 11, 2020 19:58
Bruce Guenter added 2 commits September 11, 2020 13:59
Signed-off-by: Bruce Guenter <[email protected]>
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I reran the tests a couple times.

@bruceg
Copy link
Member Author

bruceg commented Sep 11, 2020

Yeah, with the tests being able to complete so fast, I ran them about 100 times to make sure the tighter parameters were stable.

@bruceg bruceg merged commit b28eb18 into master Sep 11, 2020
@bruceg bruceg deleted the auto-concurrency-tests2 branch September 11, 2020 22:23
JeanMertz added a commit that referenced this pull request Sep 14, 2020
commit 2b08b6d
Author: Nell Shamrell-Harrington <[email protected]>
Date:   Sun Sep 13 23:10:18 2020 -0700

    docs: Updates reference to gitter to point to the Discord server (#3858)

    Signed-off-by: Nell Shamrell <[email protected]>

commit 27ba8b9
Author: James Turnbull <[email protected]>
Date:   Sat Sep 12 17:26:13 2020 -0400

    chore: Removing CI references to Nix (#3844)

    Signed-off-by: James Turnbull <[email protected]>

commit 7c95aa9
Author: James Turnbull <[email protected]>
Date:   Sat Sep 12 17:25:54 2020 -0400

    chore: Fix make prepare-environment (#3841)

commit cc00cf0
Author: James Turnbull <[email protected]>
Date:   Sat Sep 12 13:38:39 2020 -0400

    chore: Added workflow_dispatch for the env workflow (#3842)

commit b7f656e
Author: James Turnbull <[email protected]>
Date:   Sat Sep 12 12:14:21 2020 -0400

    chore: Fix Debian post-install process to reload systemd daemons (#3832)

commit c2078b7
Author: binarylogic <[email protected]>
Date:   Sat Sep 12 10:28:41 2020 -0400

    chore: Remove check-code.sh since it is no longer used

    Signed-off-by: binarylogic <[email protected]>

commit 7887d4e
Author: James Turnbull <[email protected]>
Date:   Fri Sep 11 23:50:56 2020 -0400

    chore: Added cross installation to fix nightly builds (#3831)

commit 648e82c
Author: Ana Hobden <[email protected]>
Date:   Fri Sep 11 17:33:29 2020 -0700

    chore(platforms): Migrate  x86 gnu target to cross, add new aarch64 gnu target (no sasl) (#3657)

    * Add initial cross adoption

    Signed-off-by: Ana Hobden <[email protected]>

    * Add archiving

    Signed-off-by: Ana Hobden <[email protected]>

    * Clean up the makefile a bit and get tarballs working

    Signed-off-by: Ana Hobden <[email protected]>

    * checkpoint

    Signed-off-by: Ana Hobden <[email protected]>

    * wip

    Signed-off-by: Ana Hobden <[email protected]>

    * Add to CI

    Signed-off-by: Ana Hobden <[email protected]>

    * Add autoinstall to CI

    Signed-off-by: Ana Hobden <[email protected]>

    * Remove cruft

    Signed-off-by: Ana Hobden <[email protected]>

    * Can't test yet

    Signed-off-by: Ana Hobden <[email protected]>

    * Autoinstall in e2e

    Signed-off-by: Ana Hobden <[email protected]>

    * Readd docker image

    Signed-off-by: Ana Hobden <[email protected]>

    * More comments

    Signed-off-by: Ana Hobden <[email protected]>

    * Set right CI task

    Signed-off-by: Ana Hobden <[email protected]>

    * Add note about TZ issue

    Signed-off-by: Ana Hobden <[email protected]>

commit b28eb18
Author: Bruce Guenter <[email protected]>
Date:   Fri Sep 11 16:22:58 2020 -0600

    chore(networking): Rework internal auto concurrency tests (#3802)

    * Move auto concurrency test case data into TOML files

    * Use tokio manual time tracking to speed up tests

    By pausing and then manually advancing the clock, this provides for both
    very fast running tests and nearly perfect time measurements for highly
    repeatable test results.

    * Collect all the failures for each test

    This has the collateral benefit of being able to drop the large
    `assert_within` macro that was only used here.

    * The tests should now work predictably on MacOS

    Signed-off-by: Bruce Guenter <[email protected]>

commit b37dea7
Author: Ana Hobden <[email protected]>
Date:   Fri Sep 11 14:53:14 2020 -0700

    chore: Add some additional Event impls (#3621)

    * Move log_event to new file

    Signed-off-by: Ana Hobden <[email protected]>

    * Move value to new file

    Signed-off-by: Ana Hobden <[email protected]>

    * Move log_schema to config

    Signed-off-by: Ana Hobden <[email protected]>

    * chore: Add some event impls

    Signed-off-by: Ana Hobden <[email protected]>

    * Do some namespace squashing

    Signed-off-by: Ana Hobden <[email protected]>

    * Fixup lints

    Signed-off-by: Ana Hobden <[email protected]>

    * Remove From<Event> for Vec<u8>

    Signed-off-by: Ana Hobden <[email protected]>

    * Fixup leveldb

    Signed-off-by: Ana Hobden <[email protected]>

    * fmt

    Signed-off-by: Ana Hobden <[email protected]>

    * Jean's feedback

    Signed-off-by: Ana Hobden <[email protected]>

commit f03898e
Author: Luke Steensen <[email protected]>
Date:   Fri Sep 11 15:53:51 2020 -0500

    enhancement(tokenizer transform): add internal events (#3807)

    Signed-off-by: Luke Steensen <[email protected]>

commit ff555aa
Author: Luke Steensen <[email protected]>
Date:   Fri Sep 11 15:53:27 2020 -0500

    enhancement(dedupe transform): add internal events (#3809)

    Closes #3393

    Signed-off-by: Luke Steensen <[email protected]>
    Co-authored-by: Jean Mertz <[email protected]>

commit 54a4d8e
Author: MOZGIII <[email protected]>
Date:   Fri Sep 11 17:09:57 2020 +0300

    chore: Generate Kubernetes distribution YAMLs from Helm Chart (#3614)

    * Add a script to manage the Kubernetes YAML configs

    Signed-off-by: MOZGIII <[email protected]>

    * Update Helm Chart to generate config more like YAML files

    Signed-off-by: MOZGIII <[email protected]>

    * Switch to generated YAML

    Signed-off-by: MOZGIII <[email protected]>

    * Define check-all target in multiple lines

    Signed-off-by: MOZGIII <[email protected]>

    * Add Kubernetes YAML tasks to Makefile

    Signed-off-by: MOZGIII <[email protected]>

    * Add check-kubernetes-yaml to CI

    Signed-off-by: MOZGIII <[email protected]>

    * Add kubernetes-yaml.sh to .github/CODEOWNERS

    Signed-off-by: MOZGIII <[email protected]>

    * Add /scripts/kubernetes-yaml/ to .github/CODEOWNERS

    Signed-off-by: MOZGIII <[email protected]>

    * Add a TODO for passing the app-version

    Signed-off-by: MOZGIII <[email protected]>

commit 68a5c43
Author: Do Duy <[email protected]>
Date:   Fri Sep 11 10:23:03 2020 +0700

    docs: Add docs for retry_backoff_secs option (#3819)

    Signed-off-by: Duy Do <[email protected]>

commit 1a1352b
Author: James Turnbull <[email protected]>
Date:   Thu Sep 10 17:34:16 2020 -0400

    chore: Remove docker-compose from integration tests (#3622)

commit 94def0f
Author: Luke Steensen <[email protected]>
Date:   Thu Sep 10 16:17:03 2020 -0500

    enhancement(reduce transform): add internal events (#3812)

    Closes #3401

    Signed-off-by: Luke Steensen <[email protected]>

commit 74b43b7
Author: Bruce Guenter <[email protected]>
Date:   Thu Sep 10 14:05:52 2020 -0600

    fix(networking): Rework auto concurrency backpressure logic (#3804)

    Previously, all responses were marked as not being backpressure. This
    changes that logic to:

    1. Treat RetryAction::Retry responses as backpressure

    2. Only use the RTT values from RetryAction::Successful responses

    Signed-off-by: Bruce Guenter <[email protected]>

commit eb09b92
Author: Ana Hobden <[email protected]>
Date:   Thu Sep 10 13:01:23 2020 -0700

    chore: Split up event (#3619)

    * Move log_event to new file

    Signed-off-by: Ana Hobden <[email protected]>

    * Move value to new file

    Signed-off-by: Ana Hobden <[email protected]>

    * Move log_schema to config

    Signed-off-by: Ana Hobden <[email protected]>

    * Do some namespace squashing

    Signed-off-by: Ana Hobden <[email protected]>

    * Fixup lints

    Signed-off-by: Ana Hobden <[email protected]>

    * Fixup leveldb

    Signed-off-by: Ana Hobden <[email protected]>

    * fmt

    Signed-off-by: Ana Hobden <[email protected]>

commit 2fd9ccf
Author: Jesse Szwedko <[email protected]>
Date:   Thu Sep 10 15:34:52 2020 -0400

    chore: Add expected review counts to CONTRIBUTING.md (#3666)

    * chore: Add expected review counts to CONTRIBUTING.md

    Signed-off-by: Jesse Szwedko <[email protected]>
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
…ev#3802)

* Move auto concurrency test case data into TOML files

* Use tokio manual time tracking to speed up tests

By pausing and then manually advancing the clock, this provides for both
very fast running tests and nearly perfect time measurements for highly
repeatable test results.

* Collect all the failures for each test

This has the collateral benefit of being able to drop the large
`assert_within` macro that was only used here.

* The tests should now work predictably on MacOS

Signed-off-by: Bruce Guenter <[email protected]>
Signed-off-by: Brian Menges <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: networking Anything related to Vector's networking domain: performance Anything related to Vector's performance domain: tests Anything related to Vector's internal tests type: tech debt A code change that does not add user value.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the auto concurrency tests more consistent
3 participants