
IllegalArgumentException: Values less than -1 bytes are not supported on DiskThresholdDecider #48380

Closed
nachogiljaldo opened this issue Oct 23, 2019 · 1 comment · Fixed by #48392
Labels
>bug · :Distributed Coordination/Allocation · v7.4.0

Comments

@nachogiljaldo

Elasticsearch version (bin/elasticsearch --version): 7.4.0

Plugins installed: [repository-s3]

JVM version (java -version):

[root@c1be5a9fd961 /]# /elasticsearch/jdk/bin/java --version
openjdk 13 2019-09-17
OpenJDK Runtime Environment AdoptOpenJDK (build 13+33)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13+33, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Linux c1be5a9fd961 4.15.0-1027-aws #27-Ubuntu SMP Fri Nov 2 15:14:20 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

running on ESS

Description of the problem including expected versus actual behavior:
During a plan migration, plans suddenly started to fail with the following exception:

[instance-0000000104] unexpected failure during [cluster_reroute(async_shard_fetch)], current state version [1212336]
java.lang.IllegalArgumentException: Values less than -1 bytes are not supported: -192978829196b
    at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:72) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:67) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.canRemain(DiskThresholdDecider.java:312) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canRemain(AllocationDeciders.java:108) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.decideMove(BalancedShardsAllocator.java:668) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.moveShards(BalancedShardsAllocator.java:628) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:405) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:370) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.routing.BatchedRerouteService$1.execute(BatchedRerouteService.java:112) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.4.0.jar:7.4.0]
    at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.4.0.jar:7.4.0]

instance-0000000104 is the master node
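
For context, a hedged reconstruction of the check that throws here: ByteSizeValue rejects any size below -1 bytes, so a large negative free-space figure reaching it produces exactly this message. The standalone snippet below only mimics that validation and is not the actual Elasticsearch source.

// Minimal sketch mimicking the validation in org.elasticsearch.common.unit.ByteSizeValue
// (an illustration, not the real source): any size below -1 bytes is rejected.
public class ByteSizeValueCheckSketch {

    static void checkSize(long sizeInBytes) {
        if (sizeInBytes < -1) {
            throw new IllegalArgumentException(
                "Values less than -1 bytes are not supported: " + sizeInBytes + "b");
        }
    }

    public static void main(String[] args) {
        checkSize(-192978829196L); // the negative free-space value from the stack trace above
    }
}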

Steps to reproduce:

Sorry, it's the first and only time I have seen this, so I have no steps.

Provide logs (if relevant):
The exception is shown above.

Additionally, following @ywelsch's suggestion, I enabled DEBUG logging on org.elasticsearch.cluster.routing.allocation.decider, which revealed:

[instance-0000000106] less than the required 0b free bytes threshold (-194290383982 bytes free) on node aIG71toLQTeY-FIvHziBag, shard cannot remain

I identified that node using _cluster/state and restarted it, and the problem seems to be gone.
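
For reference, one way to enable that DEBUG level is a transient logger setting via the cluster settings API. The sketch below uses the 7.x Java high-level REST client against localhost:9200; the client, host, and port are assumptions, since the issue does not say how the level was actually changed.

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.cluster.settings.ClusterUpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

// Sketch only: turn on DEBUG logging for the allocation deciders via a
// transient cluster setting (host/port are assumptions, not from the issue).
public class EnableAllocationDeciderDebug {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            ClusterUpdateSettingsRequest request = new ClusterUpdateSettingsRequest()
                .transientSettings(Settings.builder()
                    .put("logger.org.elasticsearch.cluster.routing.allocation.decider", "DEBUG")
                    .build());
            client.cluster().putSettings(request, RequestOptions.DEFAULT);
        }
    }
}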

@ywelsch added the :Distributed Coordination/Allocation label on Oct 23, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Allocation)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 23, 2019
Today it is possible that the total size of all relocating shards exceeds the
total amount of free disk space. For instance, this may be caused by another
user of the same disk increasing their disk usage, or may be due to how
Elasticsearch double-counts relocations that are nearly complete particularly
if there are many concurrent relocations in progress.

The `DiskThresholdDecider` treats negative free space similarly to zero free
space, but it then fails when rendering the messages that explain its decision.
This commit fixes its handling of negative free space.

Fixes elastic#48380
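
For illustration, a minimal sketch of the kind of handling the commit message describes: clamping a possibly-negative free-space figure before wrapping it in a ByteSizeValue. This is an assumption about the shape of the fix, not the actual patch in #48392.

import org.elasticsearch.common.unit.ByteSizeValue;

// Sketch only (assumes the elasticsearch core jar on the classpath): clamp a
// possibly-negative free-space value to zero before rendering it, since
// ByteSizeValue rejects anything below -1 bytes.
public class NegativeFreeSpaceSketch {
    public static void main(String[] args) {
        long freeBytes = -192978829196L;                  // relocations can exceed the free space
        ByteSizeValue rendered = new ByteSizeValue(Math.max(0L, freeBytes));
        System.out.println(rendered);                     // prints "0b" instead of throwing
    }
}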
DaveCTurner added four more commits referencing this issue on Oct 23, 2019, each with the same commit message as above.