`attempt to multiply with overflow` when using `i32` #138

hamirmahal · 2023-11-11T19:56:33Z

No description provided.

hamirmahal · 2023-11-11T19:56:51Z

https://github.com/hamirmahal/rstar-i32/ has some code that reproduces the issue.

hamirmahal · 2023-11-11T19:59:46Z

https://github.com/hamirmahal/rstar-i32/actions/runs/6836250994/job/18590953607#step:3:43 also shows the output on a machine that isn't mine, although you have to be signed in to see it via this link.

hamirmahal · 2023-11-11T20:01:08Z

I'm not sure if the error is in my code, or in rstar. Am I using rstar incorrectly? src/main.rs outputs

$  cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/rstar-boilerplate`
thread 'main' panicked at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/arith.rs:346:1:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Rust code that resulted in the above output

use rstar::Point;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Data {
    coordinates: [i32; 18],
}

impl Point for Data {
    type Scalar = i32;
    const DIMENSIONS: usize = 18;

    fn nth(&self, index: usize) -> Self::Scalar {
        self.coordinates[index]
    }

    fn nth_mut(&mut self, index: usize) -> &mut Self::Scalar {
        &mut self.coordinates[index]
    }

    fn generate(mut generator: impl FnMut(usize) -> Self::Scalar) -> Self {
        Data {
            coordinates: [
                generator(0),
                generator(1),
                generator(2),
                generator(3),
                generator(4),
                generator(5),
                generator(6),
                generator(7),
                generator(8),
                generator(9),
                generator(10),
                generator(11),
                generator(12),
                generator(13),
                generator(14),
                generator(15),
                generator(16),
                generator(17),
            ],
        }
    }
}

fn main() {
    let mut tree = rstar::RTree::new();
    tree.insert(Data {
        coordinates: [
            30, 30, 62, 30, 31, 30, 20, 12, 30, 80, 40, 132, 28, 78, 140, 139, 195, 67,
        ],
    });
    tree.insert(Data {
        coordinates: [
            40, 9, 9, 144, 136, 149, 148, 140, 198, 36, 50, 167, 179, 234, 38, 2, 103, 38,
        ],
    });
    tree.insert(Data {
        coordinates: [
            38, 38, 38, 6, 2, 3, 155, 61, 61, 195, 155, 15, 70, 134, 158, 126, 94, 63,
        ],
    });
    tree.insert(Data {
        coordinates: [
            154, 154, 138, 179, 121, 75, 143, 31, 7, 67, 11, 3, 113, 113, 65, 65, 73, 65,
        ],
    });
    tree.insert(Data {
        coordinates: [
            221, 215, 209, 115, 210, 198, 224, 236, 111, 6, 7, 7, 85, 92, 203, 197, 36, 44,
        ],
    });
    tree.insert(Data {
        coordinates: [
            176, 240, 176, 176, 112, 241, 240, 240, 192, 44, 44, 12, 206, 204, 142, 4, 44, 2,
        ],
    });
    tree.insert(Data {
        coordinates: [
            212, 218, 202, 70, 70, 23, 23, 23, 1, 230, 250, 55, 30, 23, 60, 60, 92, 16,
        ],
    });
}

urschrei · 2023-11-12T15:56:43Z

rstar uses a fold to calculate the area of the envelope (an AABB) of the geometry (In your case, the 18-dimensional point you're trying to insert on line 78). Fold uses an accumulator and a multiplication function:

diag.fold(one, |acc, cur| {
    max_inline(cur, zero) * acc
})

During the insertion operation on line 78 of your code, you can see the problem:

Acc: 359131136 Cur: 68.

Multiplying acc with cur overflows i32. That's a panic in debug builds.

hamirmahal · 2023-11-12T19:11:41Z

Ah, because the product, 24,420,917,248, is larger than the max i32 value, 2,147,483,647.

If each coordinate is a u8, the max possible value when multiplying them all together is 255^numCoordinates, which is currently 18, but the number of coordinates could easily double, or triple, for my use case.

So the maximum possible product currently is 255^18 = 2.078371831996e43, which I don’t even think the maximum unsigned integer size, u128, can support, since I think that’s approximately 3.402823669209e38.

Is calculating the area of the bounding box required to use an R-tree? I’m wondering if it’s possible to write an implementation that avoids doing this calculation altogether to avoid running the risk of overflow.

Alternatively, would checking for overflow before the multiplication, and setting the product to u128::MAX if there would be an overflow, result in any weird or unexpected behavior? I’m also considering that approach.

During the insertion operation on line 78 of your code, you can see the problem:

Acc: 359131136 Cur: 68.

Multiplying acc with cur overflows i32. That's a panic in debug builds.

Hmm... where does the current value of 68 come from? I don't see any elements in https://github.com/hamirmahal/rstar-i32/commit/aef70723a73ca76c71387ef93d825b7afcf5a0ea#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc with the value 68.

Also, how were you able to see the current and total values in fold?

I tried adding

        diag.fold(one, |acc, cur| {
            std::println!("acc: {}, cur: {}", acc, cur);
            return max_inline(cur, zero) * acc;
        })

to ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/aabb.rs, but that gave me this error.

ailed to resolve: use of undeclared crate or module `std`
use of undeclared crate or module `std`rustc

Thanks for the detailed response.

michaelkirk · 2023-11-13T22:06:51Z

Just to clarify: You're using 18-dimensional coordinates, is that right?

michaelkirk · 2023-11-13T22:11:42Z

Ok, I see your example:
https://github.com/hamirmahal/rstar-i32/commit/aef70723a73ca76c71387ef93d825b7afcf5a0ea#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc

So you are using 18-D coordinates.

Just to ask a silly question: Did you really intend to have a single coordinate with 18 dimensions, or were you trying to have something like 9 2-D (XY) coordinates?

hamirmahal · 2023-11-13T22:38:40Z

Did you really intend to have a single coordinate with 18 dimensions, or were you trying to have something like 9 2-D (XY) coordinates?

This is correct.

hamirmahal · 2023-11-13T22:40:14Z

I'm actually trying to find the element in my dataset with the closest Hamming distance to some input.

The things I'm comparing currently have 18 bytes, hence the 18 coordinates that can each be as large as 255, and it looks like we may need to increase the number of bytes we store to 32, or possibly even more.

Do you think R-trees are a suitable data structure for this purpose?

michaelkirk · 2023-11-14T11:21:39Z

Do you think R-trees are a suitable data structure for this purpose?

Well, I can say with certainty that this particular R-Tree implementation is apparently not currently suitable for this purpose. 🤣

But more seriously... I've never worked with hamming distance. The definition I searched up was:

While comparing two binary strings of equal length, Hamming distance is the number of bit positions in which the two bits are different.

So, hamming distance is about bitwise distance - not euclidean distance.

In other words:

hamming_distance(0b_1000, 0b_0000) == 1  // Near!
hamming_distance(0b_1000, 0b_0111) == 4  // Far!

But with typical numeric distance we get a very different answer:

numeric_distance(0b_1000, 0b_0000) == 8  // Far!
numeric_distance(0b_1000, 0b_0111) == 1  // Near!

So intuitively, I don't understand how you'd use an rtree, which deals with euclidean numeric distance, to efficiently compute hamming distance.

urschrei · 2023-11-14T11:56:59Z

Hmm. Hamming distance satisfies the triangle inequality, so I think it's possible to use R[*]-trees, but I have no idea how we're going to make it work for rstar…

hamirmahal · 2023-11-14T22:33:57Z

Do you think R-trees are a suitable data structure for this purpose?

Well, I can say with certainty that this particular R-Tree implementation is apparently not currently suitable for this purpose. 🤣

Ha!

But more seriously... I've never worked with hamming distance. The definition I searched up was:

While comparing two binary strings of equal length, Hamming distance is the number of bit positions in which the two bits are different.

So, hamming distance is about bitwise distance - not euclidean distance.

In other words:
hamming_distance(0b_1000, 0b_0000) == 1  // Near!
hamming_distance(0b_1000, 0b_0111) == 4  // Far!
But with typical numeric distance we get a very different answer:
numeric_distance(0b_1000, 0b_0000) == 8  // Far!
numeric_distance(0b_1000, 0b_0111) == 1  // Near!

That all sounds correct.

So intuitively, I don't understand how you'd use an rtree, which deals with euclidean numeric distance, to efficiently compute hamming distance.

I ended up doing

impl PointDistance for Data {
    fn distance_2(&self, point: &[f32; 18]) -> f32 {
        let hamming_distance: f32 = self
            .hash
            .as_slice()
            .iter()
            .zip(point)
            .map(|(l, r)| {
                // cast l and r to u8, because we know that the values are in the range [0, 255].
                let l = *l as u8;
                let r = *r as u8;
                (l ^ r).count_ones() as f32
            })
            .sum();

        // We must return the squared distance!
        hamming_distance * hamming_distance
    }
}

but it seems to be quite computationally expensive, and not very efficient.

I used f32s instead of i32s, even though I'm only dealing with u8s, because rstar doesn't support u8s out of the box, and because I didn't get attempt to multiply with overflow when using f32s.

urschrei · 2023-11-14T22:37:33Z

What's the hash for?

hamirmahal · 2023-11-14T22:40:56Z

Oh, so I'm ideally trying to find the closest Hamming distance between &self's hash, which is just 18 bytes, and every other hash in my dataset.

urschrei · 2023-11-14T22:42:57Z

Hm. If you're using the default Hash impl, you're using a very high-quality (and expensive) hashing algorithm, which I would imagine is complete overkill in this case.

hamirmahal · 2023-11-14T22:44:05Z

Ah, I see… I’ll look into some different approaches.

urschrei · 2023-11-14T22:46:32Z

https://nnethercote.github.io/2021/12/08/a-brutally-effective-hash-function-in-rust.html FxHash will certainly be faster (https://crates.io/crates/fxhash)

rmanoka · 2023-11-16T04:53:52Z

Do we support bnum ? It supports a u256, which should suffice for above calc?

adamreichold · 2023-11-16T07:02:28Z

I think before we go ahead and make this crate more difficult to maintain, it should be established whether R* trees are actually suitable to querying Hamming distances.

While it could be innovative cross-pollination between different areas of mathematics, I am somewhat doubtful that the geometric intuitions underpinning the construction of R* trees transfer over to code spaces. Generally, there exist a lot of metric spaces which are decidedly weird compared to the two- or three-dimensional Euclidean space our intuition is still based on even when alternative metrics like Manhattan are used.

So, I think the first step would be to just modify the source of rstar in a local fork to try to hack around the limits of the current design, e.g. use saturating arithmetic if that is helpful or use an high (e.g. bnum) or arbitrary (e.g. num-bigint) precision number library. Then this hacky local fork can be used to figure out if R* trees can be useful for the given problem and based on these experiences we can decide whether and how to extend this library to support this use case.

(But if the changes are too extensive, the answer might still be no as as maintainers, saying no is often what we need to do to keep complexity in check. We have also have represent the interests of our current user base which I guess is mostly GIS, games and then some simulation stuff in order of popularity.)

Personally, my first guess when reading the original problem would be to look into Levenshtein automata which are used for typo resistant search, i.e. to find a matching term with the fewest possible edits away from the query term.

urschrei · 2023-11-16T08:33:19Z

Oh, I don't think anyone was actually proposing a change. Certainly when I said "make it work" I didn't mean it in the sense of modification.

rmanoka · 2023-11-17T16:22:28Z

@hamirmahal I think you can get it working by just converting the [u8; 18] slice into a [f64; 18] slice before storing it in the hash variable. f64 or f32, with float should support the range you need. There'll be some loss in precision in that calculation, but I think it's only used for re-balancing of the r-tree that is constructed (and so must still work, with possibly poorer performance).

hamirmahal · 2023-11-21T01:59:07Z

@rmanoka I'll give that a look. Thanks for the suggestion.

stepancheg mentioned this issue Dec 17, 2023

attempt to add with overflow #139

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`attempt to multiply with overflow` when using `i32` #138

`attempt to multiply with overflow` when using `i32` #138

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

urschrei commented Nov 12, 2023 •

edited

Loading

hamirmahal commented Nov 12, 2023

michaelkirk commented Nov 13, 2023

michaelkirk commented Nov 13, 2023 •

edited

Loading

hamirmahal commented Nov 13, 2023

hamirmahal commented Nov 13, 2023

michaelkirk commented Nov 14, 2023 •

edited

Loading

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

rmanoka commented Nov 16, 2023

adamreichold commented Nov 16, 2023 •

edited

Loading

urschrei commented Nov 16, 2023

rmanoka commented Nov 17, 2023

hamirmahal commented Nov 21, 2023

attempt to multiply with overflow when using i32 #138

attempt to multiply with overflow when using i32 #138

Comments

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

hamirmahal commented Nov 11, 2023

urschrei commented Nov 12, 2023 • edited Loading

hamirmahal commented Nov 12, 2023

michaelkirk commented Nov 13, 2023

michaelkirk commented Nov 13, 2023 • edited Loading

hamirmahal commented Nov 13, 2023

hamirmahal commented Nov 13, 2023

michaelkirk commented Nov 14, 2023 • edited Loading

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

hamirmahal commented Nov 14, 2023

urschrei commented Nov 14, 2023

rmanoka commented Nov 16, 2023

adamreichold commented Nov 16, 2023 • edited Loading

urschrei commented Nov 16, 2023

rmanoka commented Nov 17, 2023

hamirmahal commented Nov 21, 2023

`attempt to multiply with overflow` when using `i32` #138

`attempt to multiply with overflow` when using `i32` #138

urschrei commented Nov 12, 2023 •

edited

Loading

michaelkirk commented Nov 13, 2023 •

edited

Loading

michaelkirk commented Nov 14, 2023 •

edited

Loading

adamreichold commented Nov 16, 2023 •

edited

Loading