Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: retain history feature #24344

Conversation

@nrainer-materialize nrainer-materialize added the T-testing Theme: tests or test infrastructure label Jan 10, 2024
@nrainer-materialize nrainer-materialize self-assigned this Jan 10, 2024
@nrainer-materialize nrainer-materialize requested a review from a team as a code owner January 10, 2024 18:50
test/retain-history/mzcompose.py Outdated Show resolved Hide resolved
@def-
Copy link
Contributor

def- commented Jan 11, 2024

Are you still planning the one-off test with RETAIN HISTORY added to all MVs in testdrive/python? Otherwise I can also run it.

@nrainer-materialize
Copy link
Contributor Author

nrainer-materialize commented Jan 11, 2024

Are you still planning the one-off test with RETAIN HISTORY added to all MVs in testdrive/python? Otherwise I can also run it.

Still planned. Done outside of this PR with #24368. I might have found a problem, which will be further investigated with #24459.

@nrainer-materialize nrainer-materialize force-pushed the tests/retain-history branch 2 times, most recently from 0cb4bc0 to 30e6a86 Compare January 15, 2024 11:52
@nrainer-materialize
Copy link
Contributor Author

I significantly reworked this PR, please re-review it, thanks!

@nrainer-materialize
Copy link
Contributor Author

nrainer-materialize commented Jan 16, 2024

@mjibson: I have unfortunately some trouble getting this test stable and valid. Would you be able to help me on this matter?

Can you tell me what the expected behavior is when I query

  1. a materialized view on a table with retain history with a timestamp
    a) before the table (and MV) existed
    b) at which the table existed without rows but the MV did not
    c) at which the table existed with rows but the MV did not
  2. a source with retain history with a timestamp
    a) before the source (and MV) existed
    b) at which the source existed but the MV did not

Furthermore, are there any tick intervals that I can influence (to avoid sleep statements)?

Furthermore, I am surprised by the influence of timing and sleep statements. In particular, I am confused by the behavior of the following code:

                > CREATE TABLE time (time_index INT, t TIMESTAMP);

                > CREATE TABLE retain_history_table (key INT, value INT);
                > INSERT INTO time VALUES (1, now());

                # $ sleep-is-probably-flaky-i-have-justified-my-need-with-a-comment duration="2s"

                > CREATE MATERIALIZED VIEW retain_history_mv2 WITH (RETAIN HISTORY FOR '30s') AS
                    SELECT * FROM retain_history_table;

                $ set-from-sql var=time1
                SELECT t::STRING FROM time WHERE time_index = 1

                > SELECT count(*) FROM retain_history_mv2 AS OF '${time1}'::TIMESTAMP; -- time1
                0

It will succeed like this. However, it will fail with

db error: ERROR: Timestamp (1705400225748) is not valid for all inputs: [Antichain { elements: [1705400226000] }]: ERROR: Timestamp (1705400225748) is not valid for all inputs: [Antichain { elements: [1705400226000] }]

if I uncomment the sleep-is-probably-flaky-.... I could somewhat understand if it was other way round but not like it is here... Do you have an explanation for this?

I pushed this code as platform check with name RetainHistoryOnMvSurprise to tests/temp/retain-history-with-sleep.
It can be run with

bin/mzcompose --find platform-checks down -v && bin/mzcompose --find platform-checks run default --scenario=NoRestartNoUpgrade --check=RetainHistoryOnMvSurprise

@maddyblue
Copy link
Contributor

I would expect most queries with an AS OF that is before the MV existed would fail. There may be some race condition where the underlying table or source has a since less than the timestamp when the MV was created, and so the MV gets some data before it was made. I don't know offhand. I suspect we should solidify this and choose exactly one behavior like: data is never available before the MV was created, even if the since would have allowed that to be the case? I'm really not sure. Other people have thought about this a lot. I think this means that we should make the results for all 5 of your questions "timestamp not available", but the code certainly doesn't care about this right now.

There's no tick you can watch for this that I'm aware of. We call tokio::time::sleep in the rust test for this.

That test snippet does seem to indicate some bug! No idea why it's doing that besides it's just wrong and wasn't tested very well. I'll have to look into it at some point.

@nrainer-materialize
Copy link
Contributor Author

I would expect most queries with an AS OF that is before the MV existed would fail. There may be some race condition where the underlying table or source has a since less than the timestamp when the MV was created, and so the MV gets some data before it was made. I don't know offhand. I suspect we should solidify this and choose exactly one behavior like: data is never available before the MV was created, even if the since would have allowed that to be the case? I'm really not sure. Other people have thought about this a lot. I think this means that we should make the results for all 5 of your questions "timestamp not available", but the code certainly doesn't care about this right now.

Thanks for your insights!

That test snippet does seem to indicate some bug! No idea why it's doing that besides it's just wrong and wasn't tested very well. I'll have to look into it at some point.

Ok, I will file a bug.

Another question: Is using now() appropriate or would I need to operate with mz_now()?

@nrainer-materialize
Copy link
Contributor Author

I had to disable the platform checks for now but can keep the mzcompose test active.

@nrainer-materialize nrainer-materialize merged commit 22a04c3 into MaterializeInc:main Jan 17, 2024
72 checks passed
@nrainer-materialize nrainer-materialize deleted the tests/retain-history branch January 17, 2024 10:12
@nrainer-materialize
Copy link
Contributor Author

Another question: Is using now() appropriate or would I need to operate with mz_now()?

=> https://github.com/MaterializeInc/database-issues/issues/7310

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-testing Theme: tests or test infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants