Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper truncation missing in casts to varchar(x) #552

Open
findepi opened this issue Mar 27, 2019 · 16 comments
Open

Proper truncation missing in casts to varchar(x) #552

findepi opened this issue Mar 27, 2019 · 16 comments
Assignees
Labels
bug Something isn't working correctness

Comments

@findepi
Copy link
Member

findepi commented Mar 27, 2019

presto:default> select cast(bigint '12345678' as varchar(2));
 _col0
-------
 12

but ...

presto:default> select cast(bigint '12345678' as varchar(2)) = '12';
 _col0
-------
 false

This is because https://github.com/prestosql/presto/blob/c74b775e103a7a239aafef1b7ca15fe6370495fe/presto-main/src/main/java/io/prestosql/type/BigintOperators.java#L270-L276
should consider x (@LiteralParameter("x") long x) and truncate the result to make sure no more than x characters are returned.

Same probably applies to casts from other types to varchar.

@findepi findepi added bug Something isn't working good first issue Good for newcomers correctness labels Mar 27, 2019
@sopel39
Copy link
Member

sopel39 commented Mar 28, 2019

Some more context: prestodb/presto#7815

@Hammond95
Copy link

Is someone already working on this? I would like to give it a try...

@ankitdixit
Copy link
Member

@findepi Does the work involve here involve just truncating the values in castToVarchar
return utf8Slice(String.valueOf(value));
or is this on hold awaiting clarity as discussion on prestodb/presto#7815 seem to inconclusive

@findepi
Copy link
Member Author

findepi commented Jul 19, 2019

@ankitdixit each case needs to be looked at separately.

  • in some cases truncation should be applied
  • in some cases exception should be raised
  • I imagine, in some cases the outcome depends on actual value being converted (truncation or exception)

If you want to work on this, I would propose to raise a separate PR for each case being fixed.
(all integer types {{tinyint,smallint,integer,bigint}} should be considered as one case; same for floating point: {{real,double}})

@ankitdixit ankitdixit self-assigned this Jul 25, 2019
@ankitdixit
Copy link
Member

ankitdixit commented Jul 25, 2019

@findepi @dain @sopel39 trying to summarize from comments on prestodb/presto#7815, please let me know if this makes sense.

Approach could be to add a 2 new session propertes

  • cast_legacy_behavior with default value true.
  • cast_notation_type with values simple and scientific
    We cast on the type based on the value of cast notation

Behaviour:

  • Integer -> varchar

    • select cast(cast(bigint '12' as varchar(2))
      • both flag values: '12'
    • select cast(bigint '123' as varchar(2))
      • cast_legacy_behavior = true :'12'
      • cast_legacy_behavior = false : throw CastException
  • Decimal->varchar:

    • select cast(cast(1.0 as decimal(2,1)) as varchar(3)))
      • both flag values: '1.0'
    • select cast(cast(1.0 as decimal(2,1)) as varchar(2))
      • both flag values: '1.0'
    • select cast(cast(1.1 as decimal(2,1)) as varchar(2))
      • cast_legacy_behavior = true :'1.0'
      • cast_legacy_behavior = false : throw CastException
  • real/double: Find whether conventional or scientific notation is smaller and use that

    • select cast(cast(1.0 as double) as varchar(2))
      • cast_legacy_behavior = true : '1.'
      • cast_legacy_behavior = false : '1'
  • real/double: Find whether conventional or scientific notation is smaller and use that

    • select cast(cast(1.1 as double) as varchar(1))
      • cast_legacy_behavior = true : '1.'
      • cast_legacy_behavior = false : throw CastException
    • select cast(cast(1.0 as double) as varchar(x)) (for all x >= 3)
      • both flag values : 1.0
    • select cast(cast(10.0 as double) as varchar(x)) (x>4)
      • cast_legacy_behavior = true : '10.0'
      • cast_legacy_behavior = false : ''1.0e1' [scientific notation, if x >4], if x=4, throw exception
      • cast_legacy_behavior = false : ''10.0' [simple]

We can also introduce another session property which can be used to chose notation type (conventional/scientific)

@dain
Copy link
Member

dain commented Jul 25, 2019

@ankitdixit I don't have an opinion on this, so I'll let @martint and @findepi reply.

@sopel39
Copy link
Member

sopel39 commented Jul 26, 2019

select cast(cast(1.0 as double) as varchar(2))
cast_legacy_behavior = false : throw CastException

why would this fail?

@ankitdixit
Copy link
Member

ankitdixit commented Jul 26, 2019

@sopel39 from prestodb/presto#7815 (comment)
Additionally, since we don't use scientific notation for double/float textual representation we should probably fail when doing casting from double to varchar when any digit is truncated. Assumption is that if user wants to truncate he can simply cast to decimal and round to n-th decimal places before casting to varchar.
I thought this meant we should not not truncate to '1.' and rather throw CastException

@sopel39
Copy link
Member

sopel39 commented Jul 26, 2019

@ankitdixit yes, but
select cast(cast(1.0 as double) as varchar(2))
could simply return
'1'

@ankitdixit
Copy link
Member

ankitdixit commented Jul 26, 2019

@sopel39 , I thought changing 1.0 to 1 would be losing precision in the generic sense as then 1.0 will cast as varchar(2) but 1.1 will fail.
Which also makes sense, then 1.0,2.0... will work but 1.1, 2.1... will not.

@sopel39
Copy link
Member

sopel39 commented Jul 29, 2019

I thought changing 1.0 to 1 would be losing precision

I think you can think of any floating point as having infinite number of trailing zeros. In this context you won't lose any precision.

@martint
Copy link
Member

martint commented Sep 28, 2019

For reference, here's what the SQL spec says about this:

Screen Shot 2019-09-28 at 4 46 56 PM

Screen Shot 2019-09-28 at 4 48 32 PM

@findepi findepi removed the good first issue Good for newcomers label Oct 5, 2019
@martint
Copy link
Member

martint commented Jun 7, 2020

@ankitdixit, are you still planning to work on this?

@hashhar
Copy link
Member

hashhar commented Nov 5, 2021

Is this #5015 a duplicate of this?

@findepi
Copy link
Member Author

findepi commented Jun 9, 2022

@kasiafi are any known problems remaining?

otherwise i'd close as done

@kasiafi
Copy link
Member

kasiafi commented Jun 9, 2022

I'm pretty sure the bug still remains. I fixed the numeric types, and date type. We should go through all types that can be cast to varchar. They likely repeat the same wrong pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctness
Development

No branches or pull requests

8 participants