[VL] Results mismatch with vanilla spark when using window exec #6845

NEUpanning · 2024-08-14T12:38:54Z

Backend

VL (Velox)

Bug description

SQL:

SELECT
            t.partner_id,
            t.qualifi_name as name,
            t.qualifi_type as type_id,
            row_number() over(partition by partner_id,qualifi_type order by modify_time,create_time,end_date desc) row_id
        FROM
            tbl
        WHERE partition_date = '2024-08-02'
            and status = 1
            AND qualifi_type in (201,202,1);

Gluten results that do not match vanilla Spark:

714328  临海市嘴爱烘焙坊        1       1
714328  徐新亮  1       2

vanilla results:

714328  徐新亮  1       1
714328  临海市嘴爱烘焙坊        1       2

The original rows behind the mismatched results:

 partner_id | qualifi_type |   qualifi_name   |     modify_time     |     create_time     |      end_date
------------+--------------+------------------+---------------------+---------------------+---------------------
     714328 |            1 | 徐新亮           | 2019-06-08 05:53:49 | 2019-06-08 05:53:49 | 2099-01-01 00:00:00
     714328 |            1 | 临海市嘴爱烘焙坊   | 2019-06-08 05:53:49 | 2019-06-08 05:53:49 | 2099-01-01 00:00:00

It seems that row_number() over(...) assigns different row numbers to rows that tie on every sort key (modify_time, create_time, and end_date are identical for the two rows above).
Here is gluten physical plan:

Gluten version: 1.2-rc

Spark version

3.0

NEUpanning · 2024-08-14T12:39:59Z

cc @kecookier

kecookier · 2024-08-15T01:57:10Z

When rows are duplicates on the sort keys, either of them may end up with a value of 1 after row_number() is applied. This may not be an issue.

PHILO-HE · 2024-08-15T06:55:58Z

It looks like the Velox sort doesn't preserve the input order when rows have the same values for the sort keys. I agree with @kecookier; maybe we can ignore this issue.
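
To illustrate the point about ties, here is a minimal Python sketch, not Velox or Spark code. It numbers the two rows from the report using a stable sort on (modify_time, create_time, end_date desc) and shows that the numbering simply follows whichever order the rows arrive in. The helper name `row_numbers` and the tuple layout are assumptions made for this example.

```python
from datetime import datetime

# The two tied rows from the report:
# (qualifi_name, modify_time, create_time, end_date)
ts = datetime(2019, 6, 8, 5, 53, 49)
end = datetime(2099, 1, 1)
rows = [
    ("徐新亮", ts, ts, end),
    ("临海市嘴爱烘焙坊", ts, ts, end),
]

def row_numbers(input_rows):
    """Number rows ordered by modify_time, create_time, end_date desc."""
    # Two stable passes: least significant key first (end_date descending),
    # then the leading keys (modify_time, create_time) ascending.
    by_end = sorted(input_rows, key=lambda r: r[3], reverse=True)
    ordered = sorted(by_end, key=lambda r: (r[1], r[2]))
    return [(r[0], i) for i, r in enumerate(ordered, start=1)]

# Same rows, two arrival orders: the tie is resolved by arrival order alone.
first_order = row_numbers(rows)
second_order = row_numbers(list(reversed(rows)))
print(first_order)
print(second_order)
```

Both outputs are valid results for the query, because the ORDER BY clause says nothing about how to rank rows whose sort keys are all equal.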

NEUpanning · 2024-08-15T09:28:26Z

This inconsistent behavior seems acceptable, and vanilla Spark sort is not deterministic either. Therefore, I will close this issue. Thanks for your help. @kecookier @PHILO-HE
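
For anyone who needs identical output from both engines, one possible workaround is to add a tie-breaker to the window ordering, for example `order by modify_time, create_time, end_date desc, qualifi_name`. This was not proposed in the thread, and it assumes qualifi_name is unique within a partition. A minimal Python sketch of the idea, with the hypothetical helper `row_numbers_with_tiebreak`:

```python
from datetime import datetime

# The two tied rows from the report:
# (qualifi_name, modify_time, create_time, end_date)
ts = datetime(2019, 6, 8, 5, 53, 49)
end = datetime(2099, 1, 1)
tied_rows = [
    ("徐新亮", ts, ts, end),
    ("临海市嘴爱烘焙坊", ts, ts, end),
]

def row_numbers_with_tiebreak(input_rows):
    """Order by modify_time, create_time, end_date desc, qualifi_name."""
    # Stable passes from the least significant key to the most significant.
    by_name = sorted(input_rows, key=lambda r: r[0])           # tie-breaker (assumed unique)
    by_end = sorted(by_name, key=lambda r: r[3], reverse=True)  # end_date descending
    ordered = sorted(by_end, key=lambda r: (r[1], r[2]))        # modify_time, create_time
    return [(r[0], i) for i, r in enumerate(ordered, start=1)]

# With a unique tie-breaker the numbering no longer depends on arrival order.
forward = row_numbers_with_tiebreak(tied_rows)
backward = row_numbers_with_tiebreak(list(reversed(tied_rows)))
print(forward == backward)
```

The trade-off is an extra sort key; it only makes sense when the row numbers are consumed downstream in a way that requires reproducibility.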

NEUpanning added the bug and triage labels on Aug 14, 2024

NEUpanning mentioned this issue on Aug 14, 2024 in [VL] Result mismatch issues tracker #4652 (open, 32 tasks)

NEUpanning closed this as completed Aug 15, 2024
