Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

order by rand() don't return random value #9033

Closed
WangXiangUSTC opened this issue Jan 14, 2019 · 6 comments · Fixed by #10064
Closed

order by rand() don't return random value #9033

WangXiangUSTC opened this issue Jan 14, 2019 · 6 comments · Fixed by #10064
Labels
type/bug The issue is confirmed as a bug. type/question The issue belongs to a question.

Comments

@WangXiangUSTC
Copy link
Contributor

WangXiangUSTC commented Jan 14, 2019

  1. What did you do?

I want get random value by order by rand()

  1. What did you expect to see?
    I use the sql below to get some random values.
select a from test1 order by rand() limit 10;

I want to see random result in column a.

  1. What did you see instead?
mysql> select a from test1 order by rand() limit 10;
+---+
| a |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
10 rows in set (0.01 sec)

I run many times, and the result is same.

  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
    tidb version: v2.1.2
@XuHuaiyu
Copy link
Contributor

@WangXiangUSTC
As the MySQL manual mentioned, 0<= rand() < 1.0.
Thus the results of the following 3 SQLs are always the same.

select a from test1 order by rand() limit 10;
select a from test1 order by 0 limit 10;
select a from test1 order by a limit 10;

@XuHuaiyu XuHuaiyu added the type/question The issue belongs to a question. label Jan 14, 2019
@WangXiangUSTC
Copy link
Contributor Author

but mysql can get random value use this sql @XuHuaiyu

@dbjoa
Copy link
Contributor

dbjoa commented Jan 15, 2019

@WangXiangUSTC
Could you show your results and the reproducible scenario?

@swdee
Copy link

swdee commented Apr 8, 2019

I ran into this same problem today also where TiDB doesn't behave as MySQL does.

As per the document link @XuHuaiyu quoted it states;

Use of a column with RAND() values in an ORDER BY or GROUP BY clause may yield unexpected results because for either clause a RAND() expression can be evaluated multiple times for the same row, each time returning a different result. If the goal is to retrieve rows in random order, you can use a statement like this:

SELECT * FROM tbl_name ORDER BY RAND();

This is commonly used to select a random number of rows from a table.

@dbjoa a reproducible scenario is;

Create schema and insert values

create table tmprand(val varchar(8) not null);
insert into tmprand (val) values ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j');

Then randomly select 2 rows from table tmprand

select * from tmprand order by rand() limit 2;
+-----+
| val |
+-----+
| a   |
| b   |
+-----+
2 rows in set (0.005 sec)

When running the above select query multiple times TiDB will always output the same result of "a" and "b".

When running the same query on MySQL it will output 2 random rows, eg:

MariaDB [test]> select * from tmprand order by rand() limit 2;
+-----+
| val |
+-----+
| c   |
| f   |
+-----+
2 rows in set (0.007 sec)

MariaDB [test]> select * from tmprand order by rand() limit 2;
+-----+
| val |
+-----+
| h   |
| f   |
+-----+
2 rows in set (0.001 sec)

MariaDB [test]> select * from tmprand order by rand() limit 2;
+-----+
| val |
+-----+
| i   |
| d   |
+-----+
2 rows in set (0.001 sec)

MariaDB [test]> select * from tmprand order by rand() limit 2;
+-----+
| val |
+-----+
| j   |
| i   |
+-----+
2 rows in set (0.001 sec)

@lysu lysu added the type/bug The issue is confirmed as a bug. label Apr 8, 2019
@lysu
Copy link
Contributor

lysu commented Apr 8, 2019

@WangXiangUSTC @swdee

maybe we can workaround by use sub query like this select a from (select a, rand() r from t order by r) tt if we need it in the current version.

it seems to be caused by miss a project in plan result:

mysql> explain select a from t order by rand();
+-------------------+-------+------+------------------------------------------------------------+
| id                | count | task | operator info                                              |
+-------------------+-------+------+------------------------------------------------------------+
| TableReader_8     | 3.00  | root | data:TableScan_7                                           |
| └─TableScan_7     | 3.00  | cop  | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------+-------+------+------------------------------------------------------------+

it should be like this:

mysql> explain select a, b from t2 order by a + b;
+-------------------------+----------+------+-------------------------------------------------------------+
| id                      | count    | task | operator info                                               |
+-------------------------+----------+------+-------------------------------------------------------------+
| Projection_8            | 10000.00 | root | test.t2.a, test.t2.b                                        |
| └─Sort_4                | 10000.00 | root | col_2:asc                                                   |
|   └─Projection_9        | 10000.00 | root | test.t2.a, test.t2.b, plus(test.t2.a, test.t2.b)            |
|     └─TableReader_7     | 10000.00 | root | data:TableScan_6                                            |
|       └─TableScan_6     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------------+----------+------+-------------------------------------------------------------+

@swdee
Copy link

swdee commented Apr 8, 2019

@lysu yes, that sub query workaround does the job.

In my case the dataset was small and doesn't change much so I worked around it by caching the table in
memory in the application, from which random entries are selected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The issue is confirmed as a bug. type/question The issue belongs to a question.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants