Implement cast for Table and Column (#6711)
Closes #6112
radeusgd authored May 19, 2023
1 parent 08e6d21 commit 447786a
Showing 35 changed files with 1,013 additions and 256 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -445,6 +445,7 @@
- [Added `at_least_one` flag to `Table.tokenize_to_rows`.][6539]
- [Moved `Redshift` connector into a separate `AWS` library.][6550]
- [Added `Date_Range`.][6621]
- [Implemented the `cast` operation for `Table` and `Column`.][6711]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -655,6 +656,7 @@
[6539]: https://github.com/enso-org/enso/pull/6539
[6550]: https://github.com/enso-org/enso/pull/6550
[6621]: https://github.com/enso-org/enso/pull/6621
[6711]: https://github.com/enso-org/enso/pull/6711

#### Enso Compiler

48 changes: 21 additions & 27 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso
@@ -9,7 +9,8 @@ import Standard.Table.Internal.Java_Problems
import Standard.Table.Internal.Problem_Builder.Problem_Builder
import Standard.Table.Internal.Widget_Helpers
from Standard.Table import Sort_Column, Data_Formatter, Value_Type, Auto
from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type, Lossy_Conversion
from Standard.Table.Errors import Floating_Point_Equality, Inexact_Type_Coercion, Invalid_Value_Type, Conversion_Failure
from Standard.Table.Internal.Cast_Helpers import check_cast_compatibility

import project.Connection.Connection.Connection
import project.Data.SQL_Statement.SQL_Statement
@@ -1002,18 +1003,13 @@ type Column
_ = [format, locale]
Error.throw <| Unsupported_Database_Operation.Error "`Column.format` is not implemented yet for the Database backends."

## PRIVATE
UNSTABLE
Cast the column to a specific type.
## Cast the column to a specific type.

Arguments:
- value_type: The `Value_Type` to cast the column to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
@@ -1024,6 +1020,9 @@ type Column
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Converting decimal numbers into integers will truncate or round them,
depending on the backend. If more control is needed, use the various
rounding functions (such as `round` or `floor`).
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
@@ -1032,32 +1031,27 @@ type Column
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

? Conversion Precision

In the in-memory backend, if the conversion is lossy, a
`Lossy_Conversion` warning will be reported. The only exception is when
truncating a column which is already a text column - as then the
truncation seems like an intended behaviour, so it is not reported. If
truncating needs to occur when converting a non-text column, a warning
will still be reported.

Currently, the warning is not reported for Database backends.
If the target type cannot fit some of the values (for example due to too
small range), a `Conversion_Failure` may be reported according to the
`on_problems` rules. The Database backends may fail with `SQL_Error`
instead.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and a `Inexact_Type_Coercion` problem is
reported.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self value_type=self.value_type on_problems=Problem_Behavior.Report_Warning =
dialect = self.connection.dialect
type_mapping = dialect.get_type_mapping
target_sql_type = type_mapping.value_type_to_sql value_type on_problems
target_sql_type.if_not_error <|
infer_from_database new_expression =
SQL_Type_Reference.new self.connection self.context new_expression
new_column = dialect.make_cast self.as_internal target_sql_type infer_from_database
Column.Value new_column.name self.connection new_column.sql_type_reference new_column.expression self.context
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self value_type on_problems=Problem_Behavior.Report_Warning =
check_cast_compatibility self.value_type value_type <|
dialect = self.connection.dialect
type_mapping = dialect.get_type_mapping
target_sql_type = type_mapping.value_type_to_sql value_type on_problems
target_sql_type.if_not_error <|
infer_from_database new_expression =
SQL_Type_Reference.new self.connection self.context new_expression
new_column = dialect.make_cast self.as_internal target_sql_type infer_from_database
Column.Value new_column.name self.connection new_column.sql_type_reference new_column.expression self.context

## ALIAS Transform Column

32 changes: 12 additions & 20 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
@@ -1526,9 +1526,7 @@ type Table
_ = [column, pattern, case_sensitivity, parse_values, on_problems]
Error.throw (Unsupported_Database_Operation.Error "Table.parse_to_columns is not implemented yet for the Database backends.")

## PRIVATE
UNSTABLE
Cast the selected columns to a specific type.
## Cast the selected columns to a specific type.

Returns a new table in which the selected columns are replaced with
columns having the new types.
@@ -1539,9 +1537,6 @@ type Table
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
@@ -1552,6 +1547,9 @@ type Table
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Converting decimal numbers into integers will truncate or round them,
depending on the backend. If more control is needed, use the various
rounding functions (such as `round` or `floor`).
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
@@ -1560,27 +1558,21 @@ type Table
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

? Conversion Precision

In the in-memory backend, if the conversion is lossy, a
`Lossy_Conversion` warning will be reported. The only exception is when
truncating a column which is already a text column - as then the
truncation seems like an intended behaviour, so it is not reported. If
truncating needs to occur when converting a non-text column, a warning
will still be reported.

Currently, the warning is not reported for Database backends.
If the target type cannot fit some of the values (for example due to too
small range), a `Conversion_Failure` may be reported according to the
`on_problems` rules. The Database backends may fail with `SQL_Error`
instead.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and a `Inexact_Type_Coercion` problem is
reported.
@columns Widget_Helpers.make_column_name_vector_selector
cast : (Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector)) -> Value_Type -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self columns=[0] value_type=Value_Type.Char on_problems=Problem_Behavior.Report_Warning =
selected = self.select_columns columns
selected.columns.fold self table-> column_to_cast->
cast : (Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector)) -> Value_Type -> Boolean -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self columns=[0] value_type error_on_missing_columns=True on_problems=Problem_Behavior.Report_Warning =
selected = self.columns_helper.resolve_columns columns error_on_missing_columns=error_on_missing_columns on_problems=on_problems
selected.fold self table-> column_to_cast->
new_column = column_to_cast.cast value_type on_problems
table.set new_column new_name=column_to_cast.name set_mode=Set_Mode.Update

@@ -145,12 +145,19 @@ type SQLite_Dialect
make_cast : Internal_Column -> SQL_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
make_cast self column target_type _ =
mapping = self.get_type_mapping
sql_type_text = mapping.sql_type_to_text target_type
new_expression = SQL_Expression.Operation "CAST" [column.expression, SQL_Expression.Literal sql_type_text]
# We override the type here, because SQLite gets it wrong if the column starts with NULL values.
target_value_type = mapping.sql_type_to_value_type target_type
custom_cast = make_custom_cast column target_value_type mapping
new_expression = custom_cast.if_nothing <|
self.make_cast_expression column target_type
new_sql_type_reference = SQL_Type_Reference.from_constant target_type
Internal_Column.Value column.name new_sql_type_reference new_expression

## PRIVATE
make_cast_expression self column target_type =
mapping = self.get_type_mapping
sql_type_text = mapping.sql_type_to_text target_type
SQL_Expression.Operation "CAST" [column.expression, SQL_Expression.Literal sql_type_text]

## PRIVATE
needs_execute_query_for_type_inference : Boolean
needs_execute_query_for_type_inference self = True
@@ -164,12 +171,15 @@ type SQLite_Dialect
So after unifying columns with mixed types, we add a cast to ensure that.
adapt_unified_column : Internal_Column -> Value_Type -> (SQL_Expression -> SQL_Type_Reference) -> Internal_Column
adapt_unified_column self column approximate_result_type infer_result_type_from_database_callback =
_ = infer_result_type_from_database_callback
# TODO [RW] This may be revisited with #6281.
case approximate_result_type of
Nothing -> column
_ ->
sql_type = self.get_type_mapping.value_type_to_sql approximate_result_type Problem_Behavior.Ignore
self.make_cast column sql_type infer_result_type_from_database_callback
new_expression = self.make_cast_expression column sql_type
new_sql_type_reference = SQL_Type_Reference.from_constant sql_type
Internal_Column.Value column.name new_sql_type_reference new_expression

## PRIVATE
prepare_fetch_types_query : SQL_Expression -> Context -> SQL_Statement
@@ -353,3 +363,11 @@ decimal_div = Base_Generator.lift_binary_op "/" x-> y->
## PRIVATE
mod_op = Base_Generator.lift_binary_op "mod" x-> y->
x ++ " - FLOOR(CAST(" ++ x ++ " AS REAL) / CAST(" ++ y ++ " AS REAL)) * " ++ y

## PRIVATE
It will return `Nothing` if the type does not require custom logic.
make_custom_cast column target_value_type type_mapping =
if target_value_type.is_text then
column_type = type_mapping.sql_type_to_value_type column.sql_type_reference.get
if column_type == Value_Type.Boolean then
SQL_Expression.Operation "IIF" [column.expression, SQL_Expression.Literal "'true'", SQL_Expression.Literal "'false'"]
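The SQLite hunks above split casting into a custom path and a plain `CAST` fallback. A minimal Python model of that dispatch follows. It is only a sketch: it builds SQL text directly, whereas the Enso code builds `SQL_Expression` nodes, and the type names `"BOOLEAN"` and `"TEXT"` as well as the function signatures are assumptions made for illustration. The likely motivation (not stated in the commit) is that SQLite stores booleans as integers, so a plain `CAST` to text would yield `'1'`/`'0'` rather than `'true'`/`'false'`.

```python
from typing import Optional


def make_custom_cast(column_expr: str, source_type: str, target_type: str) -> Optional[str]:
    """Return a custom SQL expression, or None when no custom logic is needed."""
    # Only Boolean -> text gets special handling in this model.
    if target_type == "TEXT" and source_type == "BOOLEAN":
        return f"IIF({column_expr}, 'true', 'false')"
    return None


def make_cast_expression(column_expr: str, target_type: str) -> str:
    """Plain CAST, used whenever no custom cast applies."""
    return f"CAST({column_expr} AS {target_type})"


def make_cast(column_expr: str, source_type: str, target_type: str) -> str:
    """Prefer the custom cast; otherwise fall back to CAST (mirrors `if_nothing`)."""
    custom = make_custom_cast(column_expr, source_type, target_type)
    return custom if custom is not None else make_cast_expression(column_expr, target_type)
```

Keeping the fallback in its own function also matches the change to `adapt_unified_column`, which now calls `make_cast_expression` directly and so skips the custom logic.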
51 changes: 50 additions & 1 deletion distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso
@@ -13,6 +13,7 @@ import project.Data.Type.Enso_Types
import project.Data.Type.Storage
import project.Data.Type.Value_Type_Helpers
import project.Data.Table.Table
import project.Internal.Cast_Helpers
import project.Internal.Java_Problems
import project.Internal.Naming_Helpers.Naming_Helpers
import project.Internal.Parse_Values_Helper
@@ -21,7 +22,7 @@ import project.Data.Type.Value_Type_Helpers

from project.Data.Table import print_table
from project.Data.Type.Value_Type import Value_Type, Auto
from project.Errors import No_Index_Set_Error, Floating_Point_Equality, Invalid_Value_Type, Inexact_Type_Coercion
from project.Errors import No_Index_Set_Error, Floating_Point_Equality, Invalid_Value_Type, Inexact_Type_Coercion, Conversion_Failure
from project.Internal.Java_Exports import make_string_builder

polyglot java import org.enso.table.data.column.operation.map.MapOperationProblemBuilder
@@ -1279,6 +1280,54 @@ type Column
_ -> Error.throw <| Illegal_Argument.Error <| "Unsupported format type: " + format.to_text
new_column

## Cast the column to a specific type.

Arguments:
- value_type: The `Value_Type` to cast the column to.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
- Anything can be cast into the `Mixed` type.
- Converting to a `Char` type, the elements of the column will be
converted to text. If it is fixed length, the texts will be trimmed or
padded on the right with the space character to match the desired
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Converting decimal numbers into integers will truncate or round them,
depending on the backend. If more control is needed, use the various
rounding functions (such as `round` or `floor`).
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
- A `Date_Time` may be converted into a `Date` or `Time` type - the
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

If the target type cannot fit some of the values (for example due to too
small range), a `Conversion_Failure` may be reported according to the
`on_problems` rules. The Database backends may fail with `SQL_Error`
instead.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and a `Inexact_Type_Coercion` problem is
reported.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self value_type on_problems=Problem_Behavior.Report_Warning =
Cast_Helpers.check_cast_compatibility self.value_type value_type <|
target_storage_type = Storage.from_value_type value_type on_problems
cast_problem_builder = Cast_Helpers.new_java_problem_builder self.name value_type
new_storage = self.java_column.getStorage.cast target_storage_type cast_problem_builder.to_java
problems = cast_problem_builder.get_problems
on_problems.attach_problems_before problems <|
Column.from_storage self.name new_storage

## ALIAS Transform Column

Applies `function` to each item in this column and returns the column
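Two of the in-memory conversion rules documented for `Column.cast` can be modelled in a few lines of Python. This is a hedged sketch of the documented behaviour, not the actual implementation (which lives in the Java storage layer and is not shown in this diff). The function names, the `bits` parameter, and the use of `str` as a stand-in for Enso's text conversion are assumptions, and "trimmed" is read here as truncation on the right.

```python
from typing import List, Optional, Tuple


def cast_to_integer(values: List[Optional[int]], bits: int) -> Tuple[List[Optional[int]], List[int]]:
    """Cast integers/booleans to a signed integer type of the given width.

    Out-of-range values become None and are collected as failures,
    standing in for what a `Conversion_Failure` would report.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    result: List[Optional[int]] = []
    failures: List[int] = []
    for v in values:
        if v is None:
            result.append(None)
        elif isinstance(v, bool):
            # Documented rule: True -> 1, False -> 0.
            result.append(1 if v else 0)
        elif lo <= v <= hi:
            result.append(v)
        else:
            result.append(None)
            failures.append(v)
    return result, failures


def cast_to_fixed_char(values: List[object], length: int) -> List[Optional[str]]:
    """Convert to text, then cut or right-pad with spaces to `length`."""
    return [None if v is None else str(v)[:length].ljust(length) for v in values]
```

In the real API the collected failures would be handled according to `on_problems`; here they are simply returned alongside the new values.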
35 changes: 14 additions & 21 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
@@ -873,9 +873,7 @@ type Table
parse_problem_builder.attach_problems_before on_problems <|
Table.new new_columns

## PRIVATE
UNSTABLE
Cast the selected columns to a specific type.
## Cast the selected columns to a specific type.

Returns a new table in which the selected columns are replaced with
columns having the new types.
@@ -886,9 +884,6 @@ type Table
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.

TODO [RW] this is a prototype needed for debugging, proper implementation
and testing will come with #6112.

In the Database backend, this will boil down to a CAST operation.
In the in-memory backend, a conversion will be performed according to
the following rules:
@@ -899,6 +894,9 @@ type Table
length.
- Conversion between numeric types will replace values exceeding the
range of the target type with `Nothing`.
- Converting decimal numbers into integers will truncate or round them,
depending on the backend. If more control is needed, use the various
rounding functions (such as `round` or `floor`).
- Booleans may also be converted to numbers, with `True` being converted
to `1` and `False` to `0`. The reverse is not supported - use `iif`
instead.
@@ -907,28 +905,23 @@ type Table
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.

? Conversion Precision

In the in-memory backend, if the conversion is lossy, a
`Lossy_Conversion` warning will be reported. The only exception is when
truncating a column which is already a text column - as then the
truncation seems like an intended behaviour, so it is not reported. If
truncating needs to occur when converting a non-text column, a warning
will still be reported.

Currently, the warning is not reported for Database backends.
If the target type cannot fit some of the values (for example due to too
small range), a `Conversion_Failure` may be reported according to the
`on_problems` rules. The Database backends may fail with `SQL_Error`
instead.

? Inexact Target Type

If the backend does not support the requested target type, the closest
supported type is chosen and a `Inexact_Type_Coercion` problem is
reported.
@columns Widget_Helpers.make_column_name_vector_selector
cast : (Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector)) -> Value_Type -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Lossy_Conversion
cast self columns=[0] value_type=Value_Type.Char on_problems=Problem_Behavior.Report_Warning =
_ = [columns, value_type, on_problems]
## TODO [RW] actual implementation in #6112
self
cast : (Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector)) -> Value_Type -> Boolean -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self columns=[0] value_type error_on_missing_columns=True on_problems=Problem_Behavior.Report_Warning =
selected = self.columns_helper.resolve_columns columns error_on_missing_columns=error_on_missing_columns on_problems=on_problems
selected.fold self table-> column_to_cast->
new_column = column_to_cast.cast value_type on_problems
table.set new_column new_name=column_to_cast.name set_mode=Set_Mode.Update

## Splits a column of text into a set of new columns.
The original column will be removed from the table.
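The new `Table.cast` body is a fold: starting from the original table, each selected column is cast and written back under its own name. The Python sketch below models that shape with a table represented as a dict from column name to values. The representation, the `cast_column` callback, and the handling of missing columns when `error_on_missing_columns` is false (silently skipped here) are assumptions; how the real `resolve_columns` helper reports them is not shown in this diff.

```python
from functools import reduce
from typing import Callable, Dict, List


def cast_table(
    table: Dict[str, list],
    columns: List[str],
    cast_column: Callable[[list], list],
    error_on_missing_columns: bool = True,
) -> Dict[str, list]:
    """Return a new table with the selected columns replaced by their cast versions."""
    missing = [name for name in columns if name not in table]
    if missing and error_on_missing_columns:
        raise KeyError(f"Missing columns: {missing}")
    selected = [name for name in columns if name in table]

    def step(acc: Dict[str, list], name: str) -> Dict[str, list]:
        # Copy so the input table is never mutated; assigning to an
        # existing key keeps the column's position (like Set_Mode.Update).
        updated = dict(acc)
        updated[name] = cast_column(acc[name])
        return updated

    return reduce(step, selected, table)
```

For example, `cast_table({"a": [1, 2], "b": [True]}, ["a"], lambda col: [str(v) for v in col])` replaces only column `a` and leaves the column order unchanged.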