Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#352: BallistaContext::collect() logging is too noisy #394

Merged
merged 7 commits into from
May 27, 2021

Conversation

jgoday
Copy link
Contributor

@jgoday jgoday commented May 23, 2021

Which issue does this PR close?

Fix #352.

Rationale for this change

See #352 (logging too noisy inside BallistaContext::collect() inner loop).

What changes are included in this PR?

Change info messages for trace inside collect loop.

Are there any user-facing changes?

N/A

…nfo messages for trace inside collect loop

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Sun May 23 12:07:25 2021 +0200
#
# On branch ballista_context_collect_info
# Changes to be committed:
#	modified:   ballista/rust/client/src/context.rs
#
@alamb alamb added the ballista label May 23, 2021
@alamb alamb requested a review from andygrove May 23, 2021 10:17
@codecov-commenter
Copy link

codecov-commenter commented May 23, 2021

Codecov Report

Merging #394 (8dbc2c7) into master (db4f098) will increase coverage by 0.39%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
+ Coverage   74.94%   75.33%   +0.39%     
==========================================
  Files         146      147       +1     
  Lines       24314    24788     +474     
==========================================
+ Hits        18221    18673     +452     
- Misses       6093     6115      +22     
Impacted Files Coverage Δ
ballista/rust/client/src/context.rs 0.00% <0.00%> (ø)
datafusion-cli/src/print_format.rs 84.44% <0.00%> (-5.97%) ⬇️
datafusion/src/physical_plan/mod.rs 78.70% <0.00%> (-4.06%) ⬇️
datafusion/src/physical_plan/window_functions.rs 85.71% <0.00%> (-3.01%) ⬇️
...tafusion/src/physical_plan/datetime_expressions.rs 67.29% <0.00%> (-2.52%) ⬇️
datafusion/src/physical_plan/repartition.rs 82.45% <0.00%> (-1.89%) ⬇️
datafusion/src/physical_optimizer/pruning.rs 89.73% <0.00%> (-0.88%) ⬇️
datafusion/src/physical_plan/sort.rs 91.26% <0.00%> (-0.82%) ⬇️
datafusion/src/physical_plan/functions.rs 92.70% <0.00%> (-0.08%) ⬇️
datafusion-cli/src/main.rs 0.00% <0.00%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db4f098...8dbc2c7. Read the comment docs.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jgoday. With this change we will never see output saying that the query is running unless we have trace logging enabled. Would it be possible to show the message with info logging on the first time we see that the job is running?

@jgoday
Copy link
Contributor Author

jgoday commented May 23, 2021

@andygrove Is ok if we use a flag (first time a job_id is logged) and define a macro info_or_trace_if like ?

macro_rules! info_or_trace_if {
    ($first:ident, $($arg:tt)+) => (
        if $first {
            info!($($arg)+);
        } else {
            trace!($($arg)+);
        }
        $first = false;
    )
}

@andygrove
Copy link
Member

Perhaps we should just log the status if the status has changed since the last time round the loop?

@jgoday
Copy link
Contributor Author

jgoday commented May 26, 2021

@andygrove I have changed the condition to call info/trace to check the job's prev status.
Should we trace all loop iterations or just info! when the status has changed (so we don't need the info_or_trace_if macro) ?

@andygrove
Copy link
Member

@andygrove I have changed the condition to call info/trace to check the job's prev status.
Should we trace all loop iterations or just info! when the status has changed (so we don't need the info_or_trace_if macro) ?

Nice. I think we can just log with info! and remove the macro in this case.

@jgoday
Copy link
Contributor Author

jgoday commented May 26, 2021

@andygrove I have changed the condition to call info/trace to check the job's prev status.
Should we trace all loop iterations or just info! when the status has changed (so we don't need the info_or_trace_if macro) ?

Nice. I think we can just log with info! and remove the macro in this case.

Ok, I have changed it that way.

@@ -198,14 +200,21 @@ impl BallistaContext {
DataFusionError::Internal("Received empty status message".to_owned())
})?;
let wait_future = tokio::time::sleep(Duration::from_millis(100));
let has_status_change = prev_status.map(|x| x != status).unwrap_or(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let has_status_change = prev_status.map(|x| x != status).unwrap_or(false);
let has_status_change = prev_status.map(|x| x != status).unwrap_or(true);

I think the default should be true here. If there was no previous status then the status must have changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, sorry for the mistake :(

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I haven't tested this, but plan on doing so tonight. I will merge once I have confirmed that this works as expected.

Thanks @jgoday!

@andygrove
Copy link
Member

[2021-05-27T00:55:04Z INFO  ballista::context] Connecting to Ballista scheduler at http://ballista-scheduler:50050
[2021-05-27T00:55:04Z INFO  ballista::context] Job kO3v67p still queued...
[2021-05-27T00:55:04Z INFO  ballista::context] Job kO3v67p is running...
Query 12 iteration 0 took 5751.5 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL       | 6202            | 9324           |
| SHIP       | 6200            | 9262           |
+------------+-----------------+----------------+
Query 12 avg time: 5751.51 ms

❤️

@andygrove andygrove merged commit 3f7736c into apache:master May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BallistaContext::collect() logging is too noisy
4 participants