Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new interop tests for black-box json log schema testing #4327

Merged
merged 43 commits into from
Dec 3, 2021

Conversation

nathaniel-may
Copy link
Contributor

@nathaniel-may nathaniel-may commented Nov 23, 2021

Description

This PR is a proposal for a new type of test: interop tests. These are different than integration and unit tests because we have to write new CI to run them. Namely, for this first one, to compile Rust as a down stream consumer of our log files.

I chose to create the black-box logging test in Rust because Rust has support for statically derived json parsers. Although these derived parsers can parse all valid json values, some schemas are just ugly and I want our initial proposed design to fit into a clean, well-designed schema to make it as easy as possible for all downstream consumers to read our logs regardless of their language of choice.

ugly example:

{ "a": 1 }
{ "b": 1 }
{ "c": "string" }

Rust type modeling:

struct LogLine {
    a: Option<f64>
    b: Option<f64>
    c: Option<String>
}

This type is particularly ugly because it allows for completely empty log lines which we will never actually output. There's nothing technically wrong with this, but it doesn't exactly fill me with confidence either. This test forces us to confront our schema in a centralized way outside of the Python implementation the way many of our consumers will.

Todo

  • Use serde_json to check serialization loop
  • Add new Action which makes a dbt run that outputs json files and passes the location to the rust tests
  • use integration tests to get a better sample of logs
  • model all possible sub fields as structs
  • add tests for as many expected values as possible

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change

Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea.

@nathaniel-may
Copy link
Contributor Author

Just leaving a note for myself that the latest commits require this to be rebased onto #4381 and #4384 once they get merged into main.

@nathaniel-may nathaniel-may force-pushed the nate/interop-tests branch 4 times, most recently from 2cf165e to 74608ad Compare December 2, 2021 23:54
@nathaniel-may nathaniel-may marked this pull request as ready for review December 3, 2021 01:46
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think highest priority is fixing the issues uncovered by this testing before the final release. Then, I'd love to get this merged in

@@ -34,15 +34,3 @@ def test_postgres_statements(self):
self.assertEqual(len(results), 1)

self.assertTablesEqual("statement_actual", "statement_expected")

@use_profile("presto")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

- name: Run integration tests
# run: tox -e py38-postgres -- -nauto <- TODO we can use this once json formatted integration tests no longer hang
# small subset of integration tests that don't hang:
run: python3 -m pytest test/integration/001_simple_copy_test test/integration/002_varchar_widening_test test/integration/003_simple_reference_test test/integration/006_simple_dependency_test test/integration/007_graph_selection_tests test/integration/008_schema_tests_test test/integration/009_data_tests_test test/integration/010_permission_tests test/integration/011_invalid_model_tests test/integration/012_deprecation_tests test/integration/013_context_var_tests test/integration/014_hook_tests test/integration/015_cli_invocation_tests test/integration/016_macro_tests test/integration/017_runtime_materialization_tests test/integration/018_adapter_ddl_tests test/integration/019_analysis_tests test/integration/020_ephemeral_test test/integration/021_concurrency_test test/integration/023_exit_codes_test test/integration/024_custom_schema_test test/integration/025_duplicate_model_test test/integration/025_timezones_test test/integration/026_aliases_test test/integration/027_cycle_test test/integration/028_cli_vars test/integration/029_docs_generate_tests test/integration/030_statement_test test/integration/031_thread_count_test test/integration/033_event_tracking_test test/integration/035_changing_relation_type_test test/integration/035_docs_blocks test/integration/037_external_reference_test test/integration/038_caching_test test/integration/039_config_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I managed to successfully run most of the excluded tests locally:

  • I ran into hanging in 042_sources_test, which I fixed with the change in colorized dbt output #441
  • test/integration/040_init_test/test_init.py fails - I think it's fair to say init will work weirdly with JSON-formatted logs

I'd be interested in attempting to rerun with that fix (and excluding 040_init_test)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one more hanging test, test/integration/062_defer_state_test/test_run_results_state.py, that I'm looking into right now

use std::path::Path;
use walkdir::WalkDir;

//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: assuming there is supposed to be a comment or something here

- "releases/*"
pull_request:
workflow_dispatch:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add permissions: read-all like here. This scope the GitHub token down so you can't write back to the repo

Copy link
Contributor

@leahwicz leahwicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I would be interested in walking through the Rust with you as a learning exercise since not all of it makes sense to me

@nathaniel-may
Copy link
Contributor Author

rebased on the main, running all integration tests with json log format now.

@nathaniel-may nathaniel-may merged commit 5310498 into main Dec 3, 2021
@nathaniel-may nathaniel-may deleted the nate/interop-tests branch December 3, 2021 17:51
jtcohen6 added a commit that referenced this pull request Dec 3, 2021
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
automatic commit by git-black, original commits:
  5310498
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants