Allow casting a Mixed column into a concrete type #6777

Merged (22 commits) on May 26, 2023
@@ -1069,6 +1069,8 @@ type Column
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.
- For a `Mixed` column being converted into a specific type, each row is
converted individually.

If the target type cannot fit some of the values (for example, because its
range is too small), a `Conversion_Failure` may be reported according to the
@@ -1080,6 +1082,12 @@ type Column
If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.

! Casting Text values

The `parse` method should be used to convert text values into other
types; `cast` does not parse text, so a Mixed column containing `[2, "3"]`
is converted into `[2, Nothing]` when cast to the Integer type.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self value_type on_problems=Problem_Behavior.Report_Warning =
check_cast_compatibility self.value_type value_type <|
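The per-row behaviour described in the `cast` documentation can be illustrated with a short Java sketch. It is hypothetical: `MixedCastSketch` and `castToLong` are not part of the Table library. Each row of a Mixed column is handled on its own, integral numbers pass through, missing values stay missing, and any other value (including text such as `"3"`) becomes `null`, standing in for Enso's `Nothing`, and is recorded as a failed row instead of being parsed. To keep it short, the sketch also treats floating-point and boolean values as failures; the library's actual rules for those values are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the Table library's API): casts a Mixed column to a
 * 64-bit integer column row by row, without parsing text.
 */
final class MixedCastSketch {
  /** Converted values (null = missing) and the indices of rows that failed. */
  record CastResult(Long[] values, List<Integer> failedRows) {}

  static CastResult castToLong(Object[] mixed) {
    Long[] out = new Long[mixed.length];
    List<Integer> failedRows = new ArrayList<>();
    for (int i = 0; i < mixed.length; i++) {
      Object value = mixed[i];
      if (value == null) {
        out[i] = null; // A missing value stays missing and is not a failure.
      } else if (value instanceof Long || value instanceof Integer
          || value instanceof Short || value instanceof Byte) {
        out[i] = ((Number) value).longValue(); // Integral numbers convert directly.
      } else {
        // Simplification: text such as "3" is not parsed, and floats/booleans
        // are also treated as failures in this sketch.
        out[i] = null;
        failedRows.add(i);
      }
    }
    return new CastResult(out, failedRows);
  }
}
```

For instance, `castToLong(new Object[] {2L, "3"})` yields the values `[2, null]` with row 1 reported as failed, mirroring the `[2, Nothing]` example.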
@@ -1617,6 +1617,8 @@ type Table
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.
- For a `Mixed` column being converted into a specific type, each row is
converted individually.

If the target type cannot fit some of the values (for example, because its
range is too small), a `Conversion_Failure` may be reported according to the
@@ -1628,6 +1630,12 @@ type Table
If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.

! Casting Text values

The `parse` method should be used to convert text values into other
types; `cast` does not parse text, so a Mixed column containing `[2, "3"]`
is converted into `[2, Nothing]` when cast to the Integer type.
@columns Widget_Helpers.make_column_name_vector_selector
cast : Vector (Text | Integer | Column_Selector) | Text | Integer -> Value_Type -> Boolean -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self columns=[0] value_type error_on_missing_columns=True on_problems=Problem_Behavior.Report_Warning =
@@ -1320,6 +1320,8 @@ type Column
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.
- For a `Mixed` column being converted into a specific type, each row is
converted individually.

If the target type cannot fit some of the values (for example, because its
range is too small), a `Conversion_Failure` may be reported according to the
@@ -1331,6 +1333,12 @@ type Column
If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.

! Casting Text values

The `parse` method should be used to convert text values into other
types; `cast` does not parse text, so a Mixed column containing `[2, "3"]`
is converted into `[2, Nothing]` when cast to the Integer type.
cast : Value_Type -> Problem_Behavior -> Column ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self value_type on_problems=Problem_Behavior.Report_Warning =
Cast_Helpers.check_cast_compatibility self.value_type value_type <|
@@ -914,6 +914,8 @@ type Table
resulting value will be truncated to the desired type.
- If a `Date` is to be converted to `Date_Time`, it will be set at
midnight of the default system timezone.
- For a `Mixed` column being converted into a specific type, each row is
converted individually.

If the target type cannot fit some of the values (for example, because its
range is too small), a `Conversion_Failure` may be reported according to the
@@ -925,6 +927,12 @@ type Table
If the backend does not support the requested target type, the closest
supported type is chosen and an `Inexact_Type_Coercion` problem is
reported.

! Casting Text values

The `parse` method should be used to convert text values into other
types; `cast` does not parse text, so a Mixed column containing `[2, "3"]`
is converted into `[2, Nothing]` when cast to the Integer type.
@columns Widget_Helpers.make_column_name_vector_selector
cast : Vector (Text | Integer | Column_Selector) | Text | Integer -> Value_Type -> Boolean -> Problem_Behavior -> Table ! Illegal_Argument | Inexact_Type_Coercion | Conversion_Failure
cast self columns=[0] value_type error_on_missing_columns=True on_problems=Problem_Behavior.Report_Warning =
30 changes: 24 additions & 6 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Errors.enso
@@ -473,11 +473,11 @@ type No_Common_Type
to_display_text : Text
to_display_text self =
types = self.types.map .to_display_text . join ", "
prefix = "No common type could have been found for the types: "+types
prefix = "No common type was found for types: "+types
infix = case self.related_column_name of
column_name : Text -> " when unifying the column ["+column_name+"]."
column_name : Text -> " when unifying column ["+column_name+"]."
_ -> "."
suffix = "If you want to allow mixed types, please retype the columns to the `Mixed` before the concatenation (note however that most Database backends do not support `Mixed` types, so it may work only for the in-memory backend)."
suffix = " If you want to allow mixed types, please cast one of the columns to `Mixed` beforehand."
prefix + infix + suffix

type Unmatched_Columns
@@ -558,7 +558,11 @@ type Conversion_Failure

This may occur for example when a number does not fit the range of the
target type.
Error (target_type : Value_Type) (related_column : Text) (affected_rows_count : Nothing|Integer)
Error (target_type : Value_Type) (related_column : Text) (affected_rows_count : Nothing|Integer) (example_values : Vector Any)

## Indicates that for some values, their text representation is too long for
the target text type.
Text_Too_Long (target_type : Value_Type) (related_column : Text) (affected_rows_count : Nothing|Integer) (example_values : Vector Text)

## PRIVATE

@@ -567,8 +571,22 @@ type Conversion_Failure
to_display_text self =
rows_info = case self.affected_rows_count of
Nothing -> "Some values"
count -> count.to_text+" rows"
rows_info + " could not be converted into the target type "+self.target_type.to_display_text+" when converting the column ["+self.related_column+"]."
count -> case self.example_values.is_empty of
True -> count.to_text+" rows"
False ->
# We first `pretty` to wrap texts in quotes and avoid special characters, but then also `to_display_text` to limit the result length.
examples = self.example_values.map (t-> t.pretty.to_display_text) . join ", "
remaining_count = count - self.example_values.length
additional = if remaining_count <= 0 then "" else
cases = if remaining_count == 1 then "case" else "cases"
" and "+remaining_count.to_text+" other "+cases
"["+examples+additional+"]"

case self of
Conversion_Failure.Error _ _ _ _ ->
rows_info + " could not be converted into the target type "+self.target_type.to_display_text+" when converting the column ["+self.related_column+"]."
Conversion_Failure.Text_Too_Long _ _ _ _ ->
rows_info + " have a text representation that does not fit the target type "+self.target_type.to_display_text+" when converting the column ["+self.related_column+"]."

type Invalid_Value_For_Type
## PRIVATE
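The message assembly in `Conversion_Failure.to_display_text` can be mirrored in Java to make the example-listing rule concrete. This is a sketch only: `FailureMessageSketch` and its methods are hypothetical names, wrapping strings in single quotes is an assumption standing in for Enso's `pretty`, and the length limiting performed by `to_display_text` is omitted.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical Java mirror of the message logic in Conversion_Failure.to_display_text. */
final class FailureMessageSketch {
  /** Describes the affected rows, listing examples when any were collected. */
  static String rowsInfo(Integer affectedRows, List<?> examples) {
    if (affectedRows == null) {
      return "Some values";
    }
    if (examples.isEmpty()) {
      return affectedRows + " rows";
    }
    String shown = examples.stream()
        .map(FailureMessageSketch::quote)
        .collect(Collectors.joining(", "));
    // Rows that failed but were not kept as examples are summarised by count.
    int remaining = affectedRows - examples.size();
    String additional = remaining <= 0
        ? ""
        : " and " + remaining + " other " + (remaining == 1 ? "case" : "cases");
    return "[" + shown + additional + "]";
  }

  /** Assumption: texts are shown in single quotes, other values via toString. */
  static String quote(Object value) {
    return value instanceof String s ? "'" + s + "'" : String.valueOf(value);
  }

  static String message(Integer affectedRows, List<?> examples, String targetType, String column) {
    return rowsInfo(affectedRows, examples)
        + " could not be converted into the target type " + targetType
        + " when converting the column [" + column + "].";
  }
}
```

With five affected rows and the examples `"a"` and `2`, `rowsInfo` produces `['a', 2 and 3 other cases]`.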
@@ -5,13 +5,13 @@ import project.Data.Type.Value_Type.Value_Type
import project.Internal.Parse_Values_Helper
from project.Errors import Conversion_Failure

polyglot java import org.enso.table.data.column.operation.CastProblemBuilder
polyglot java import org.enso.table.data.column.operation.cast.CastProblemBuilder

## PRIVATE
Checks if one type can be cast into another and returns a dataflow error
explaining the situation if not.
check_cast_compatibility source_type target_type ~action =
are_compatible = if (target_type == Value_Type.Mixed) || target_type.is_text || (source_type == target_type) then True else
are_compatible = if (target_type == Value_Type.Mixed) || (source_type == Value_Type.Mixed) || target_type.is_text || (source_type == target_type) then True else
if source_type.is_text && is_a_valid_parse_target target_type then Error.throw (Illegal_Argument.Error "To parse a text column into "+target_type.to_display_text+" type, `parse` should be used instead of `cast`.") else
if source_type == Value_Type.Boolean then target_type.is_numeric else
if source_type.is_numeric then target_type.is_numeric else
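The visible branches of `check_cast_compatibility` form a small decision table. The Java sketch below restates them over a simplified, hypothetical set of value kinds (`CastCompatibilitySketch`, `Kind` and `Verdict` are illustrative names, not library types). Which kinds count as valid `parse` targets is an assumption, and every case beyond the branches shown in this hunk is treated as incompatible, because the remaining rules are not part of the diff.

```java
/** Hypothetical sketch of the compatibility rules visible in check_cast_compatibility. */
final class CastCompatibilitySketch {
  /** Simplified stand-in for Value_Type; not the library's type hierarchy. */
  enum Kind {
    MIXED, TEXT, BOOLEAN, INTEGER, FLOAT, DATE;

    boolean isNumeric() {
      return this == INTEGER || this == FLOAT;
    }
  }

  enum Verdict { COMPATIBLE, USE_PARSE_INSTEAD, INCOMPATIBLE }

  /** Assumption: every kind other than MIXED and TEXT counts as a valid parse target. */
  static boolean isParseTarget(Kind kind) {
    return kind != Kind.MIXED && kind != Kind.TEXT;
  }

  static Verdict check(Kind source, Kind target) {
    // Mixed on either side, a text target, or identical types are always allowed.
    if (target == Kind.MIXED || source == Kind.MIXED || target == Kind.TEXT || source == target) {
      return Verdict.COMPATIBLE;
    }
    // Text must be converted with `parse`, not `cast`.
    if (source == Kind.TEXT && isParseTarget(target)) {
      return Verdict.USE_PARSE_INSTEAD;
    }
    // Otherwise, booleans and numbers may only be cast to numeric types.
    if (source == Kind.BOOLEAN || source.isNumeric()) {
      return target.isNumeric() ? Verdict.COMPATIBLE : Verdict.INCOMPATIBLE;
    }
    // The remaining rules are not shown in the diff; this sketch rejects everything else.
    return Verdict.INCOMPATIBLE;
  }
}
```

The new `source_type == Value_Type.Mixed` condition corresponds to the `source == Kind.MIXED` test in the first branch: a Mixed column may be cast to any type, with each row then converted individually.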
@@ -46,9 +46,15 @@ type Cast_Problem_Builder
builder = Vector.new_builder
java_instance = self.to_java

lossy_conversion_rows = java_instance.getLossyConversionRowCount
lossy_conversion_rows = java_instance.getFailedConversionsCount
if lossy_conversion_rows > 0 then
builder.append (Conversion_Failure.Error self.target_type self.column_name lossy_conversion_rows)
example_values = Vector.from_polyglot_array java_instance.getFailedConversionExamples
builder.append (Conversion_Failure.Error self.target_type self.column_name lossy_conversion_rows example_values)

text_too_long_rows = java_instance.getTextTooLongCount
if text_too_long_rows > 0 then
example_values = Vector.from_polyglot_array java_instance.getTextTooLongExamples
builder.append (Conversion_Failure.Text_Too_Long self.target_type self.column_name text_too_long_rows example_values)

builder.to_vector
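`Cast_Problem_Builder` reads a failure count and a vector of example values from the Java `CastProblemBuilder`. One plausible way for such a builder to collect them is to count every failure but retain only the first few offending values. The sketch below is hypothetical: `FailureTallySketch` is not the library's class, and the example cap is an arbitrary constructor parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical problem builder: counts every failed conversion but keeps only
 * the first few offending values as examples for the error message.
 */
final class FailureTallySketch {
  private final int maxExamples;
  private final List<Object> examples = new ArrayList<>();
  private int failedCount = 0;

  FailureTallySketch(int maxExamples) {
    if (maxExamples < 0) {
      throw new IllegalArgumentException("maxExamples must be non-negative");
    }
    this.maxExamples = maxExamples;
  }

  /** Records one failed conversion of the given source value. */
  void reportFailure(Object value) {
    failedCount++;
    if (examples.size() < maxExamples) {
      examples.add(value);
    }
  }

  int failedCount() {
    return failedCount;
  }

  /** Read-only view of the collected examples, in the order they were reported. */
  List<Object> examples() {
    return Collections.unmodifiableList(examples);
  }
}
```

Keeping the total count separate from the capped example list lets the error message report how many rows failed (the "and N other cases" part of `Conversion_Failure.to_display_text`) without retaining every failing value.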

@@ -0,0 +1,133 @@
package org.enso.table.data.column.builder.object;

import org.enso.base.polyglot.NumericConverter;
import org.enso.table.data.column.operation.cast.ToFloatStorageConverter;
import org.enso.table.data.column.storage.BoolStorage;
import org.enso.table.data.column.storage.DoubleStorage;
import org.enso.table.data.column.storage.LongStorage;
import org.enso.table.data.column.storage.Storage;
import org.enso.table.data.column.storage.type.BooleanType;
import org.enso.table.data.column.storage.type.FloatType;
import org.enso.table.data.column.storage.type.IntegerType;
import org.enso.table.data.column.storage.type.StorageType;
import org.enso.table.util.BitSets;

import java.util.BitSet;
import java.util.Objects;

/**
 * A builder for floating point columns.
 */
public class DoubleBuilder extends NumericBuilder {
  DoubleBuilder(BitSet isMissing, long[] data, int currentSize) {
    super(isMissing, data, currentSize);
  }

  @Override
  public void writeTo(Object[] items) {
    for (int i = 0; i < currentSize; i++) {
      if (isMissing.get(i)) {
        items[i] = null;
      } else {
        items[i] = Double.longBitsToDouble(data[i]);
      }
    }
  }

  @Override
  public boolean canRetypeTo(StorageType type) {
    return false;
  }

  @Override
  public TypedBuilder retypeTo(StorageType type) {
    throw new UnsupportedOperationException();
  }

  @Override
  public StorageType getType() {
    return FloatType.FLOAT_64;
  }

  @Override
  public void appendNoGrow(Object o) {
    if (o == null) {
      isMissing.set(currentSize++);
    } else {
      double value = NumericConverter.coerceToDouble(o);
      data[currentSize++] = Double.doubleToRawLongBits(value);
    }
  }

  @Override
  public boolean accepts(Object o) {
    return NumericConverter.isCoercibleToDouble(o);
  }

  @Override
  public void appendBulkStorage(Storage<?> storage) {
    if (Objects.equals(storage.getType(), FloatType.FLOAT_64)) {
      if (storage instanceof DoubleStorage doubleStorage) {
        int n = doubleStorage.size();
        ensureFreeSpaceFor(n);
        // DoubleStorage also keeps raw double bits in a long[], so the data is copied verbatim.
        System.arraycopy(doubleStorage.getRawData(), 0, data, currentSize, n);
        BitSets.copy(doubleStorage.getIsMissing(), isMissing, currentSize, n);
        currentSize += n;
      } else {
        throw new IllegalStateException(
            "Unexpected storage implementation for type DOUBLE: "
                + storage
                + ". This is a bug in the Table library.");
      }
    } else if (Objects.equals(storage.getType(), IntegerType.INT_64)) {
      if (storage instanceof LongStorage longStorage) {
        int n = longStorage.size();
        // Reserve space up front, as in the DoubleStorage branch.
        ensureFreeSpaceFor(n);
        BitSets.copy(longStorage.getIsMissing(), isMissing, currentSize, n);
        // Widen each integer to double and store its raw bit pattern.
        for (int i = 0; i < n; i++) {
          data[currentSize++] = Double.doubleToRawLongBits((double) longStorage.getItem(i));
        }
      } else {
        throw new IllegalStateException(
            "Unexpected storage implementation for type LONG: "
                + storage
                + ". This is a bug in the Table library.");
      }
    } else if (Objects.equals(storage.getType(), BooleanType.INSTANCE)) {
      if (storage instanceof BoolStorage boolStorage) {
        int n = boolStorage.size();
        ensureFreeSpaceFor(n);
        for (int i = 0; i < n; i++) {
          if (boolStorage.isNa(i)) {
            isMissing.set(currentSize++);
          } else {
            double x = ToFloatStorageConverter.booleanAsDouble(boolStorage.getItem(i));
            data[currentSize++] = Double.doubleToRawLongBits(x);
          }
        }
      } else {
        throw new IllegalStateException(
            "Unexpected storage implementation for type BOOLEAN: "
                + storage
                + ". This is a bug in the Table library.");
      }
    } else {
      throw new StorageTypeMismatch(getType(), storage.getType());
    }
  }

  /**
   * Append a new double to this builder.
   *
   * @param data the double to append
   */
  public void appendDouble(double data) {
    if (currentSize >= this.data.length) {
      grow();
    }
    appendRawNoGrow(Double.doubleToRawLongBits(data));
  }

  @Override
  public Storage<Double> seal() {
    return new DoubleStorage(data, currentSize, isMissing);
  }
}
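`DoubleBuilder` keeps its values in a `long[]`, storing each `double` as its raw IEEE 754 bit pattern, and tracks missing rows in a `BitSet`. The standalone sketch below uses a hypothetical `MiniDoubleBuilder` class (not part of the Table library) to show that encoding, together with a bulk append of integers that reserves capacity before writing, as the `DoubleStorage` branch of `appendBulkStorage` does.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical miniature of DoubleBuilder's storage scheme. */
final class MiniDoubleBuilder {
  private long[] data; // Raw bits of each double, as produced by Double.doubleToRawLongBits.
  private final BitSet isMissing = new BitSet();
  private int size = 0;

  MiniDoubleBuilder(int initialCapacity) {
    data = new long[Math.max(1, initialCapacity)];
  }

  /** Grows the backing array so that `additional` more values fit. */
  private void ensureFreeSpaceFor(int additional) {
    int required = size + additional;
    if (required > data.length) {
      data = Arrays.copyOf(data, Math.max(required, data.length * 2));
    }
  }

  void appendDouble(double x) {
    ensureFreeSpaceFor(1);
    data[size++] = Double.doubleToRawLongBits(x);
  }

  void appendMissing() {
    ensureFreeSpaceFor(1);
    isMissing.set(size++);
  }

  /** Bulk-appends integers, copying missing flags and widening values to double. */
  void appendLongs(long[] values, BitSet missing) {
    ensureFreeSpaceFor(values.length);
    for (int i = 0; i < values.length; i++) {
      if (missing.get(i)) {
        isMissing.set(size);
      }
      data[size++] = Double.doubleToRawLongBits((double) values[i]);
    }
  }

  /** Decodes row i back into a Double, or null if the row is missing. */
  Double get(int i) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException("Row " + i + " out of range for size " + size);
    }
    return isMissing.get(i) ? null : Double.longBitsToDouble(data[i]);
  }

  int size() {
    return size;
  }
}
```

Widening a `long` to `double` is exact only for magnitudes up to 2^53; larger integers are rounded to the nearest representable `double`, so the integer branch of a bulk append can lose precision for very large values.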