Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored forGivenIngestors to use DoUntilQuorum instead of Do #9984

Merged
merged 12 commits into from
Aug 23, 2023

Conversation

akhilanarayanan
Copy link
Contributor

@akhilanarayanan akhilanarayanan commented Jul 19, 2023

What this PR does / why we need it:
This changes the forGivenIngestors function in pkg/querier/ingester_querier.go to use DoUntilQuorum (from PR #293) instead of Do.

DoUntilQuorum is much more efficient than Do. The exact differences between them can be seen here.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

@CLAassistant
Copy link

CLAassistant commented Jul 19, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jul 21, 2023
@akhilanarayanan akhilanarayanan marked this pull request as ready for review July 24, 2023 17:34
@akhilanarayanan akhilanarayanan requested review from JStickler and a team as code owners July 24, 2023 17:34
Copy link
Contributor

@JStickler JStickler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[docs team] I have questions.

@@ -3624,7 +3624,19 @@ backoff_config:
# CLI flag: -<prefix>.backoff-retries
[max_retries: <int> | default = 10]

# Enable TLS in the GRPC client. This flag needs to be enabled when any other
# Initial stream window size. Values less than the default are not supported and
# are ignored. Setting this to a value other than the default disables the BDP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we define BDP? There are 50 definitions on acronym finder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hiya! I'm really not sure what the changes in the _index.md file mean sorry. I was just failing the CI check-doc step and the console log told me to run make doc and commit the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha. Now I have to go try to track down which PR added these changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhilanarayanan did you make any changes to this file? if you just gitc checkout this file, and also make sure you've merged the latest from main onto your branch, then the docs step on your PR shouldn't behave any differently than it does on main, and it's currently passing there.

# are ignored. Setting this to a value other than the default disables the BDP
# estimator.
# CLI flag: -<prefix>.initial-stream-window-size
[initial_stream_window_size: <int> | default = 63KiB1023B]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this default value correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, just a few small things.

@@ -3624,7 +3624,19 @@ backoff_config:
# CLI flag: -<prefix>.backoff-retries
[max_retries: <int> | default = 10]

# Enable TLS in the GRPC client. This flag needs to be enabled when any other
# Initial stream window size. Values less than the default are not supported and
# are ignored. Setting this to a value other than the default disables the BDP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhilanarayanan did you make any changes to this file? if you just gitc checkout this file, and also make sure you've merged the latest from main onto your branch, then the docs step on your PR shouldn't behave any differently than it does on main, and it's currently passing there.

@@ -279,7 +279,8 @@ func (c *indexReaderWriter) chunksToSeries(ctx context.Context, in []logproto.Ch
}))
}

results := make([]labels.Labels, 0, len(chunksBySeries))
//results := make([]labels.Labels, 0, len(chunksBySeries))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this comment anymore

}
return err
resultsChan <- res
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to recreate this bug locally (despite running with a count to 10k), but I'm curious if we just assign the results using results[idx] instead of introducing an additional concurrency data structure here, would that also fix the problem? I can see the race condition with reassigning results, but only assigning to specific indexes shouldn't have the same problem?

@github-actions github-actions bot removed the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jul 31, 2023
@trevorwhitney trevorwhitney force-pushed the akhilanarayanan/replace-do-with-dountilquorum branch from 99d9419 to a33f35f Compare August 21, 2023 18:24
Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @akhilanarayanan

@bboreham
Copy link
Contributor

Do you think this could relate to #10320 ?

@bboreham
Copy link
Contributor

I think I fixed one of the same bugs at #10310

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has a better fix to the race condition than #10310.
I also think it should go in its own PR, with CHANGELOG entry, because it is fixing a serious bug and not part of the refactor.

}
return err
},
); err != nil {
return nil, err
}

sort.Slice(results, func(i, j int) bool {
return labels.Compare(results[i], results[j]) < 0
flattened := make([]labels.Labels, 0) // Create a new slice to hold the flattened results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this allocated to len(chunksBySeries) like before?

Copy link
Contributor

@DylanGuedes DylanGuedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats, this looks awesome!

A few suggestions:

  • Do you mind adding to the PR description the "why" of the move to "DoUntilQuorum"? Depending on it, this might be applicable in other places of the code base.
  • Changing the function used by ingesters is a delicate change, while most of the other code changes are not (ex: changing newTokens to be evaluated on a single line, or: to return responseFromIngesters instead of nil for when a error is found). For future occasions: If you're interested in getting your delicate changes merged/approved faster, I'd recommend isolating them, and not mixing with other changes. This way reviewers will need less time to feel confident approving/merging your change.
    Other than that, gj! 🎉

@trevorwhitney trevorwhitney merged commit 3303844 into main Aug 23, 2023
@trevorwhitney trevorwhitney deleted the akhilanarayanan/replace-do-with-dountilquorum branch August 23, 2023 21:41
kavirajk pushed a commit that referenced this pull request Aug 25, 2023
Backport cf353bb from #10310

---

**What this PR does / why we need it**:
~Lock around data that is modified from multiple goroutines
concurrently.~
Replaced with the implementation from #9984 since @akhilanarayanan did
it more efficiently.

**Which issue(s) this PR fixes**:
Relates to #8586

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- NA Documentation added
- NA Tests updated
- [x] `CHANGELOG.md` updated
- NA If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- NA Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- NA For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)

Co-authored-by: Bryan Boreham <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants