-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored forGivenIngestors to use DoUntilQuorum instead of Do #9984
Refactored forGivenIngestors to use DoUntilQuorum instead of Do #9984
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[docs team] I have questions.
docs/sources/configure/_index.md
Outdated
@@ -3624,7 +3624,19 @@ backoff_config: | |||
# CLI flag: -<prefix>.backoff-retries | |||
[max_retries: <int> | default = 10] | |||
|
|||
# Enable TLS in the GRPC client. This flag needs to be enabled when any other | |||
# Initial stream window size. Values less than the default are not supported and | |||
# are ignored. Setting this to a value other than the default disables the BDP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we define BDP? There are 50 definitions on acronym finder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hiya! I'm really not sure what the changes in the _index.md file mean sorry. I was just failing the CI check-doc
step and the console log told me to run make doc
and commit the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. Now I have to go try to track down which PR added these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akhilanarayanan did you make any changes to this file? if you just gitc checkout
this file, and also make sure you've merged the latest from main
onto your branch, then the docs step on your PR shouldn't behave any differently than it does on main
, and it's currently passing there.
docs/sources/configure/_index.md
Outdated
# are ignored. Setting this to a value other than the default disables the BDP | ||
# estimator. | ||
# CLI flag: -<prefix>.initial-stream-window-size | ||
[initial_stream_window_size: <int> | default = 63KiB1023B] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this default value correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, apparently it's correct? https://github.com/grafana/loki/pull/9920/files#r1261263633
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me, just a few small things.
docs/sources/configure/_index.md
Outdated
@@ -3624,7 +3624,19 @@ backoff_config: | |||
# CLI flag: -<prefix>.backoff-retries | |||
[max_retries: <int> | default = 10] | |||
|
|||
# Enable TLS in the GRPC client. This flag needs to be enabled when any other | |||
# Initial stream window size. Values less than the default are not supported and | |||
# are ignored. Setting this to a value other than the default disables the BDP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akhilanarayanan did you make any changes to this file? if you just gitc checkout
this file, and also make sure you've merged the latest from main
onto your branch, then the docs step on your PR shouldn't behave any differently than it does on main
, and it's currently passing there.
@@ -279,7 +279,8 @@ func (c *indexReaderWriter) chunksToSeries(ctx context.Context, in []logproto.Ch | |||
})) | |||
} | |||
|
|||
results := make([]labels.Labels, 0, len(chunksBySeries)) | |||
//results := make([]labels.Labels, 0, len(chunksBySeries)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this comment anymore
} | ||
return err | ||
resultsChan <- res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to recreate this bug locally (despite running with a count to 10k), but I'm curious if we just assign the results using results[idx]
instead of introducing an additional concurrency data structure here, would that also fix the problem? I can see the race condition with reassigning results
, but only assigning to specific indexes shouldn't have the same problem?
99d9419
to
a33f35f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks @akhilanarayanan
Do you think this could relate to #10320 ? |
I think I fixed one of the same bugs at #10310 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has a better fix to the race condition than #10310.
I also think it should go in its own PR, with CHANGELOG entry, because it is fixing a serious bug and not part of the refactor.
} | ||
return err | ||
}, | ||
); err != nil { | ||
return nil, err | ||
} | ||
|
||
sort.Slice(results, func(i, j int) bool { | ||
return labels.Compare(results[i], results[j]) < 0 | ||
flattened := make([]labels.Labels, 0) // Create a new slice to hold the flattened results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this allocated to len(chunksBySeries)
like before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congrats, this looks awesome!
A few suggestions:
- Do you mind adding to the PR description the "why" of the move to "DoUntilQuorum"? Depending on it, this might be applicable in other places of the code base.
- Changing the function used by ingesters is a delicate change, while most of the other code changes are not (ex: changing
newTokens
to be evaluated on a single line, or: to returnresponseFromIngesters
instead ofnil
for when a error is found). For future occasions: If you're interested in getting your delicate changes merged/approved faster, I'd recommend isolating them, and not mixing with other changes. This way reviewers will need less time to feel confident approving/merging your change.
Other than that, gj! 🎉
Backport cf353bb from #10310 --- **What this PR does / why we need it**: ~Lock around data that is modified from multiple goroutines concurrently.~ Replaced with the implementation from #9984 since @akhilanarayanan did it more efficiently. **Which issue(s) this PR fixes**: Relates to #8586 **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - NA Documentation added - NA Tests updated - [x] `CHANGELOG.md` updated - NA If the change is worth mentioning in the release notes, add `add-to-release-notes` label - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - NA For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) Co-authored-by: Bryan Boreham <[email protected]>
What this PR does / why we need it:
This changes the forGivenIngestors function in pkg/querier/ingester_querier.go to use
DoUntilQuorum
(from PR #293) instead ofDo
.DoUntilQuorum
is much more efficient thanDo
. The exact differences between them can be seen here.Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updatedadd-to-release-notes
labeldocs/sources/upgrading/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PR