Some product tests are executed in multiple test suites #15096

nineinchnick · 2022-11-18T08:09:22Z

While working on #14818, we got the idea to check whether any product tests are executed in multiple suites. If those suites use similar environments, such tests should run in only one of them. This could shorten the total CI workflow run and reduce the CI queue.

To identify such tests, we could use the data scraped from the GitHub API in https://github.com/nineinchnick/trino-cicd and the following query:

-- find same test methods executed in different suites and their duration
select
    run_id
  , regexp_extract(class, 'name="(.*?)"', 1) as class
  , regexp_extract(test_line, 'name="(.*?)" .* duration-ms="(.*?)"', 1) as method
  , count(distinct name) as num_jobs
  , array_agg(distinct name order by name) as jobs
  , array_agg(regexp_extract(test_line, 'name="(.*?)" .* duration-ms="(.*?)"', 2)) as durations
from trinocicd.v2.artifacts
cross join unnest(slice(split(from_utf8(contents), '<class '), 2, 10000)) c(class)
cross join unnest(regexp_extract_all(c.class, 'test-method .* name="(.*?)" .* duration-ms="(.*?)"')) as m(test_line)
where run_id in (3470415629) and name like 'test report pt %'
group by 1, 2, 3
having count(distinct name) > 1
-- some known combinations that duplicate tests
and array_agg(distinct name order by name) not in (
      ARRAY['test report pt (default, suite-1, )', 'test report pt (hdp3, suite-1, )']
    , ARRAY['test report pt (default, suite-2, )', 'test report pt (hdp3, suite-2, )']
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )']
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks91, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )' ]
    , ARRAY['test report pt (default, suite-delta-lake-databricks104, )', 'test report pt (default, suite-delta-lake-databricks113, )', 'test report pt (default, suite-delta-lake-databricks73, )', 'test report pt (default, suite-delta-lake-databricks91, )', 'test report pt (default, suite-delta-lake-oss, )' ]
    )
order by 1, 2, 3;

Note: it would be easier to process the XML test reports if Trino had XPath functions, like the ones suggested in #10057 or #9219.
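For comparison, the same check could be done outside of Trino with a real XML parser. The following is only a minimal sketch, not the tooling used in trino-cicd. It assumes the reports follow the TestNG layout that the regular expressions in the query imply (`<class name="...">` elements containing `<test-method name="..." duration-ms="...">` elements). The function name `find_duplicated_tests`, the job names, and the sample reports are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicated_tests(reports):
    """Return tests that ran in more than one job.

    `reports` maps a job name to the text of its XML test report.
    The result maps (class name, method name) to a dict of
    {job name: total duration in ms}, keeping only tests seen in 2+ jobs.
    """
    seen = defaultdict(dict)
    for job, xml_text in reports.items():
        root = ET.fromstring(xml_text.strip())
        # ElementTree supports a limited XPath subset;
        # ".//class" matches every <class> element at any depth.
        for cls in root.iterfind(".//class"):
            for method in cls.iterfind("test-method"):
                key = (cls.get("name"), method.get("name"))
                duration = int(method.get("duration-ms", "0"))
                # Sum durations if a method appears more than once in a job.
                seen[key][job] = seen[key].get(job, 0) + duration
    return {key: jobs for key, jobs in seen.items() if len(jobs) > 1}


# Hypothetical sample data, not taken from a real CI run.
SAMPLE_REPORTS = {
    "test report pt (default, suite-a, )": """
        <testng-results><suite><test>
          <class name="io.example.TestFoo">
            <test-method name="testSelect" duration-ms="120"/>
            <test-method name="testInsert" duration-ms="80"/>
          </class>
        </test></suite></testng-results>
    """,
    "test report pt (default, suite-b, )": """
        <testng-results><suite><test>
          <class name="io.example.TestFoo">
            <test-method name="testSelect" duration-ms="95"/>
          </class>
        </test></suite></testng-results>
    """,
}

duplicates = find_duplicated_tests(SAMPLE_REPORTS)
for (cls, method), jobs in sorted(duplicates.items()):
    print(cls, method, jobs)
```

As with the query, this only flags candidates: whether a repeated test is a true duplicate still depends on the environments the suites run against.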

Example result: 20221118_080731_20056_2wrxi.csv

findepi · 2022-11-18T08:41:20Z

> Some product tests are executed in multiple test suites

In general, this is very intentional.

nineinchnick · 2022-11-18T08:45:45Z

Do you think it's still worth verifying that there are no accidental cases?

hashhar · 2022-11-18T12:23:59Z

I looked at the results, and here's my summary.

It seems we don't have true duplicates. suite-1 and suite-3 "appear" to have duplicates, but they run against different environments (Kerberos, TLS, non-secured).

Other apparent duplicates, such as Iceberg running as part of suite-7-non-generic, are also not true duplicates, since it tests using a catalog configured with redirections.

So there is nothing to be done here, other than learning that numbered suites were probably a bad idea to begin with, since they make it easy to accidentally run a test in multiple suites even when that isn't needed.

Thanks for verifying this though @nineinchnick.

nineinchnick · 2022-11-18T12:26:36Z

> So there is nothing to be done here, other than learning that numbered suites were probably a bad idea to begin with, since they make it easy to accidentally run a test in multiple suites even when that isn't needed.

If we continue splitting up suites as we did in #14818, it'll be a good opportunity to come up with meaningful names.

nineinchnick self-assigned this Nov 18, 2022

hashhar closed this as completed Nov 18, 2022

This was referenced Jan 10, 2024

Reduce Hive ACID/Transactional tables test repetitions on CI #20320

Merged

Prevent unwanted product test duplicated runs #20321

Merged
