Some product tests are executed in multiple test suites #15096

Closed
nineinchnick opened this issue Nov 18, 2022 · 4 comments

@nineinchnick
Member

nineinchnick commented Nov 18, 2022

While working on #14818, we had the idea to check whether any product tests are executed in multiple suites. If those suites use similar environments, such tests should run in only one of them. This could reduce the total CI workflow run duration and shorten the CI queue.

To identify such tests, we could use the data scraped from the GitHub API in https://github.com/nineinchnick/trino-cicd and the following query:

-- find same test methods executed in different suites and their duration
select
    run_id
  , regexp_extract(class, 'name="(.*?)"', 1) as class
  , regexp_extract(test_line, 'name="(.*?)" .* duration-ms="(.*?)"', 1) as method
  , count(distinct name) as num_jobs
  , array_agg(distinct name order by name) as jobs
  , array_agg(regexp_extract(test_line, 'name="(.*?)" .* duration-ms="(.*?)"', 2)) as durations
from trinocicd.v2.artifacts
cross join unnest(slice(split(from_utf8(contents), '<class '), 2, 10000)) c(class)
cross join unnest(regexp_extract_all(c.class, 'test-method .* name="(.*?)" .* duration-ms="(.*?)"')) as m(test_line)
where run_id in (3470415629) and name like 'test report pt %'
group by 1, 2, 3
having count(distinct name) > 1
-- some known combinations that duplicate tests
and array_agg(distinct name order by name) not in (
      ARRAY['test report pt (default, suite-1, )', 'test report pt (hdp3, suite-1, )']
    , ARRAY['test report pt (default, suite-2, )', 'test report pt (hdp3, suite-2, )']
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )']
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks91, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    )
order by 1, 2, 3;

Note: processing the XML test reports would be easier if Trino had XPath functions, like the ones suggested in #10057 or #9219.
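
For comparison, the same extraction can be sketched outside Trino with a real XML parser. This is only a rough illustration, not part of the trino-cicd tooling: it assumes the reports have `<class name="...">` elements containing `<test-method name="..." duration-ms="...">` children, which are the attributes the regular expressions in the query rely on. The function names, the sample XML, and the `com.example.ExampleTest` class are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def methods_in_report(xml_text):
    """Yield (class_name, method_name, duration_ms) for every test-method element."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset; ".//class" matches <class> at any depth.
    for cls in root.iterfind(".//class"):
        for method in cls.iterfind("test-method"):
            yield cls.get("name"), method.get("name"), int(method.get("duration-ms", "0"))


def find_shared_methods(reports):
    """Map (class, method) to the sorted job names that ran it, keeping only
    methods seen in more than one job. `reports` maps job name -> XML text."""
    jobs_by_method = defaultdict(set)
    for job, xml_text in reports.items():
        for cls, method, _duration in methods_in_report(xml_text):
            jobs_by_method[(cls, method)].add(job)
    return {key: sorted(jobs) for key, jobs in jobs_by_method.items() if len(jobs) > 1}


# Hypothetical sample reports; real artifacts would be downloaded from the CI run.
SAMPLE_A = """<testng-results><suite name="s"><test name="t">
<class name="com.example.ExampleTest">
<test-method status="PASS" name="testSelect" duration-ms="120"/>
<test-method status="PASS" name="testInsert" duration-ms="80"/>
</class></test></suite></testng-results>"""

SAMPLE_B = """<testng-results><suite name="s"><test name="t">
<class name="com.example.ExampleTest">
<test-method status="PASS" name="testSelect" duration-ms="95"/>
<test-method status="PASS" name="testDelete" duration-ms="40"/>
</class></test></suite></testng-results>"""

JOB_A = "test report pt (default, suite-1, )"
JOB_B = "test report pt (default, suite-2, )"
REPORTS = {JOB_A: SAMPLE_A, JOB_B: SAMPLE_B}

shared = find_shared_methods(REPORTS)
```

With the sample data, only `testSelect` appears in both jobs, so it is the only entry in `shared`. Known-intentional job combinations could be filtered out afterwards, as the `not in (...)` clause does in the query.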

Example result: 20221118_080731_20056_2wrxi.csv

@nineinchnick nineinchnick self-assigned this Nov 18, 2022
@findepi
Member

findepi commented Nov 18, 2022

Some product tests are executed in multiple test suites

In general, this is very intentional.

@nineinchnick
Member Author

Do you think it's still worth verifying that there are no accidental cases?

@hashhar
Member

hashhar commented Nov 18, 2022

I looked at the results and here's my summary.

It seems we don't have true duplicates. suite-1 and suite-3 "appear" to have duplicates, but they run against different environments (kerberos, tls, non-secured).

Other apparent duplicates, like Iceberg running as part of suite-7-non-generic, are also not true duplicates, since those tests use a catalog configured with redirections.
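
The distinction between apparent and true duplicates can be sketched as a grouping problem: a test only counts as a true duplicate when the same test runs in the same environment in more than one suite. The snippet below is a minimal illustration under that assumption. The function names are hypothetical, and the test names and the suite-to-environment assignments in `RUNS` are made up rather than taken from the actual suite definitions.

```python
from collections import defaultdict


def apparent_duplicates(runs):
    """Tests that show up in more than one suite, ignoring the environment."""
    suites_by_test = defaultdict(set)
    for test, suite, _env in runs:
        suites_by_test[test].add(suite)
    return {test: sorted(s) for test, s in suites_by_test.items() if len(s) > 1}


def true_duplicates(runs):
    """Tests that run in the same environment in more than one suite."""
    suites_by_key = defaultdict(set)
    for test, suite, env in runs:
        suites_by_key[(test, env)].add(suite)
    return {key: sorted(s) for key, s in suites_by_key.items() if len(s) > 1}


# Hypothetical (test, suite, environment) records for illustration only.
RUNS = [
    ("ExampleTest.testSelect", "suite-1", "non-secured"),
    ("ExampleTest.testSelect", "suite-3", "kerberos"),
    ("ExampleTest.testInsert", "suite-1", "non-secured"),
    ("ExampleTest.testInsert", "suite-2", "non-secured"),
]

apparent = apparent_duplicates(RUNS)
true_dups = true_duplicates(RUNS)
```

In this made-up data both tests look duplicated when grouped by name alone, but only `testInsert` is repeated within a single environment. A catalog configured with redirections could be modelled the same way by treating it as a separate environment value.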

So there's nothing to be done here, other than learning that numbered suites were probably a bad idea to begin with, since they make it easy to accidentally run a test in multiple suites even when that isn't needed.

Thanks for verifying this though @nineinchnick.

@hashhar hashhar closed this as completed Nov 18, 2022
@nineinchnick
Member Author

So there's nothing to be done here, other than learning that numbered suites were probably a bad idea to begin with, since they make it easy to accidentally run a test in multiple suites even when that isn't needed.

If we continue splitting up suites as we did in #14818, it'll be a good opportunity to come up with meaningful names.
