
numeric overflow in InvalidatingNodeVisitor.DeletingNodeVisitor #19445

Closed
neumannt opened this issue Sep 7, 2023 · 2 comments
Labels
P3 (We're not considering working on this, but happy to review a PR; no assignee) · team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc) · type: bug

Comments


neumannt commented Sep 7, 2023

Description of the bug:

The function runInternal in InvalidatingNodeVisitor.DeletingNodeVisitor contains the following fragment:

        int numThreads = min(DEFAULT_THREAD_COUNT, listSize);
        for (int i = 0; i < numThreads; i++) {
          int index = i;
          executor.execute(
              () ->
                  visit(
                      Collections2.transform(
                          pendingList.subList(
                              (index * listSize) / numThreads,
                              ((index + 1) * listSize) / numThreads),
                          Pair::getFirst),
                      InvalidationType.DELETED));
        }

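In a very large project that is built on a machine with many cores, the expression ((index + 1) * listSize) / numThreads sometimes becomes negative, presumably because the intermediate product (index + 1) * listSize exceeds Integer.MAX_VALUE (2,147,483,647) and wraps around in 32-bit int arithmetic. The negative value is then passed as the end index to subList, which crashes.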
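To see the wraparound concretely, the end-index expression can be evaluated on its own, without building any list. This is a hedged sketch: ChunkBoundsDemo is a hypothetical class, and listSize = 100,000,000 and numThreads = 200 are illustrative values, not numbers taken from the failing build.

```java
// Hypothetical demonstration: evaluates the chunk-boundary arithmetic from
// DeletingNodeVisitor.runInternal with plain ints, without allocating a list.
final class ChunkBoundsDemo {

  // Same expression as in the issue: the product is computed in 32-bit int
  // arithmetic and silently wraps around once it exceeds Integer.MAX_VALUE.
  static int endIndexInt(int index, int listSize, int numThreads) {
    return ((index + 1) * listSize) / numThreads;
  }

  // Same expression, but the multiplication is widened to 64 bits first.
  static long endIndexLong(int index, int listSize, int numThreads) {
    return ((long) (index + 1) * listSize) / numThreads;
  }

  public static void main(String[] args) {
    int listSize = 100_000_000; // hypothetical number of pending keys
    int numThreads = 200;       // hypothetical thread count
    for (int index = 0; index < numThreads; index++) {
      int end = endIndexInt(index, listSize, numThreads);
      if (end < 0) {
        // Report the first chunk whose int end index has gone negative.
        System.out.printf(
            "index %d: int end = %d, long end = %d%n",
            index, end, endIndexLong(index, listSize, numThreads));
        break;
      }
    }
  }
}
```

With these inputs the program prints index 21: int end = -10474836, long end = 11000000, because 22 * 100,000,000 is larger than Integer.MAX_VALUE. The start index of that chunk, 21 * 100,000,000 / 200 = 10,500,000, is still valid, which is consistent with the stack trace complaining only about the end index.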

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Unfortunately, I do not have a reproducer. The crash happens non-deterministically, and only in builds that have already run for about an hour and a half. However, I have included a stack trace below, which shows that the second argument to subList becomes negative.

Which operating system are you running Bazel on?

Linux, version 3.10.0-1160.95.1.el7.x86_64

What is the output of bazel info release?

release 6.3.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

The stack trace in the moment of the crash:

FATAL: bazel crashed due to an internal error. Printing stack trace:
    java.lang.IndexOutOfBoundsException: end index (-29676970) must not be negative
    at com.google.common.base.Preconditions.checkPositionIndexes(Preconditions.java:1430)
    at com.google.common.collect.ImmutableList.subList(ImmutableList.java:450)
    at com.google.devtools.build.skyframe.InvalidatingNodeVisitor$DeletingNodeVisitor.lambda$runInternal$0(InvalidatingNodeVisitor.java:280)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:365)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
iancha1992 added the team-Core label (Skyframe, bazel query, BEP, options parsing, bazelrc) Sep 7, 2023
haxorz added the P3 label (We're not considering working on this, but happy to review a PR; no assignee) and removed the untriaged label Sep 18, 2023
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Jan 8, 2025
This protects against an integer overflow that could occur with a large key list size and a large thread count.

Regrettably, it's difficult to write a regression test for this scenario, as exercising this overflow requires lots of time and heap, so such a test would be a performance regression for our test suites.

Fixes bazelbuild#19445

PiperOrigin-RevId: 595420516
Change-Id: Ic0a475a6a273c50fe9895dd0852fa5b062859cb2
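A minimal sketch of one overflow-safe way to compute the chunk bounds, assuming the multiplication is widened to 64 bits (the actual change in commit 3e373d0 may take a different approach). ChunkBounds, start, end, and partition are hypothetical names, not Bazel APIs. Because the bounds are pure arithmetic, they can be checked with very large sizes without allocating a large list, although this does not exercise the real visitor that the commit message refers to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Bazel's API: computes the [start, end) bounds of
// chunk `index` when a list of `listSize` elements is split across
// `numThreads` workers, widening to long so the product cannot overflow.
final class ChunkBounds {

  static int start(int index, int listSize, int numThreads) {
    // The quotient never exceeds listSize, so narrowing back to int is lossless.
    return (int) (((long) index * listSize) / numThreads);
  }

  static int end(int index, int listSize, int numThreads) {
    return (int) (((long) (index + 1) * listSize) / numThreads);
  }

  // Returns every chunk as {start, end}. Pure arithmetic, so it can be
  // exercised with huge sizes without building a huge list.
  static List<int[]> partition(int listSize, int numThreads) {
    List<int[]> chunks = new ArrayList<>();
    for (int i = 0; i < numThreads; i++) {
      chunks.add(new int[] {start(i, listSize, numThreads), end(i, listSize, numThreads)});
    }
    return chunks;
  }

  public static void main(String[] args) {
    // Sizes large enough to overflow the original int expression.
    int listSize = 100_000_000;
    List<int[]> chunks = partition(listSize, 200);
    int expectedStart = 0;
    for (int[] c : chunks) {
      // Each chunk must begin where the previous one ended and be non-empty or empty, never inverted.
      if (c[0] != expectedStart || c[1] < c[0]) {
        throw new AssertionError("bad chunk " + c[0] + ".." + c[1]);
      }
      expectedStart = c[1];
    }
    if (expectedStart != listSize) {
      throw new AssertionError("chunks do not cover the list");
    }
    System.out.println("chunks are contiguous and cover the list");
  }
}
```

Widening keeps the original contiguous-chunk layout: start(i) and end(i - 1) evaluate the same expression, so adjacent chunks never overlap or leave gaps. An alternative is Math.multiplyExact, which throws ArithmeticException on overflow instead of computing the correct bound, so it would only turn the silent wraparound into a clearer failure.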
@iancha1992
Member

@bazel-io fork 7.5.0

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 8, 2025
meteorcloudy pushed a commit that referenced this issue Jan 9, 2025
Commit 3e373d0

Co-authored-by: Googler <[email protected]>
tom-neara pushed a commit to tom-neara/bazel that referenced this issue Jan 15, 2025
@iancha1992
Member

A fix for this issue has been included in Bazel 7.5.0 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.5.0rc2. Thanks!
