fix(manager): update expected error message for enospc restore test #9590
Conversation
@fruch same here
LGTM
mgmt_cli_test.py (outdated diff)
f"but with an ill fitting error message: {full_progress_string}" | ||
full_progress_string = restore_task.progress_string(parse_table_res=False, | ||
is_verify_errorless_result=True).stdout | ||
assert "failed to restore sstables from location" in full_progress_string.lower(), \ |
I'm not sure what the point of asserting the error message is at all.
Right now it is really vague, and the real problem can be identified by looking at the Scylla logs or metrics.
Also, if we change the error message slightly, it will keep causing pain in future test runs.
I'm just pondering the nature of this test and the SM implementation.
What SM does is: before downloading each batch, it checks whether the node that is downloading the batch has at least 10% free disk space - but it does not check the other nodes, because we don't know where the data will end up. That's the reason why we don't get the "not enough disk space" error anymore - some node with enough disk space downloaded the batch, but it was unable to stream it to the node without the disk space.
Perhaps that's the real issue with SM: we should not only validate that the node doing the download has enough disk space, but also that the same holds for all other nodes in the cluster. Maybe 10% is a little too strict for such a check (with the current effort of reaching 90% disk utilization), but something like 5% might be just fine.
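For illustration only, a minimal sketch of what such a cluster-wide check could look like, written as Python pseudocode (the real check lives in Scylla Manager's Go code; the node attributes and the 5% threshold here are assumptions, not actual SM APIs):

# Hypothetical helper: succeed only if every node in the cluster keeps at least
# min_free_ratio of its disk free, not just the node that downloads the batch.
def cluster_has_enough_free_space(nodes, min_free_ratio=0.05):
    return all(
        node.free_disk_bytes / node.total_disk_bytes >= min_free_ratio
        for node in nodes
    )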
So for now I would either drop the error message assertion or the whole test in general.
I created an issue to evaluate SM behavior in such cases.
For now, I'd probably drop this assertion.
With the current, very generic error message, it doesn't add much value.
In the future, once we implement the "smart" free-space checker you mentioned, we can think about reworking this test.
Fixed.
The error message Manager returns for the enospc scenario has been changed to a more generic one (#1), so it doesn't make much sense to verify it. Moreover, there is a plan to fix the free-disk-space check behaviour, and the whole test will probably require rework anyway (#2).
refs:
#1 - scylladb/scylla-manager#4087
#2 - scylladb/scylla-manager#4184
Force-pushed from 0d3c8f1 to 11a4ab4
Besides one comment,
LGTM
@@ -830,13 +829,6 @@ def test_enospc_before_restore(self):
        assert final_status == TaskStatus.ERROR, \
            f"The restore task is supposed to fail, since node {target_node} lacks the disk space to download" \
            f"the snapshot files"
It could be useful to at least print the error message at log.info level, for clarity, in case someone ever needs to verify the flow.
What kind of message would you like to see in log.info for such a case?
I suppose the assertion message is quite enough to understand the testing scenario if someone wants to verify it.
I meant to log full_progress_string = restore_task.progress_string(parse_table_res=False, is_verify_errorless_result=True).stdout
instead of asserting on it (as it's prone to change).
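A minimal sketch of that suggestion, assuming the test's existing restore_task object and a module-level logger (the logger name is an assumption, not necessarily the test's actual logger):

import logging

LOGGER = logging.getLogger(__name__)

# Log the restore task's progress output instead of asserting on its wording,
# so the flow can still be verified from the SCT log without a brittle assertion.
full_progress_string = restore_task.progress_string(parse_table_res=False,
                                                    is_verify_errorless_result=True).stdout
LOGGER.info("Restore task progress output:\n%s", full_progress_string)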
I got your point, but it seems to me it makes little sense. When we analyze Manager failures, we usually look into the SCT python log, which contains all logged sctool outputs, so full_progress_string
can easily be found there.
OK, looks like it's good to go then.
It was mistakenly closed without merging, reopening it
@fruch Could you please merge it?
LGTM
The error message that Manager returns for the enospc scenario has been changed to a more generic one. This change therefore updates the expected error message and removes the skip-per-issue, since that issue was closed without resolution.
refs:
scylladb/scylla-manager#4087
Testing
PR pre-checks (self review)
backport labels