-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Table.replace for the database backend #8986
Changes from 82 commits
756dfa9
58da2fd
f2cd94a
0501d05
fcab4e0
f2a58f6
5ed3fed
8caac00
74bc949
69753fa
4cf1309
aae2bd2
d458ed8
f924690
bf142c0
b97841c
b88633e
2a02eb0
2b7ad13
8b7c361
9a084d0
64d43f3
4acb007
df35418
5ac57ce
d01152d
69cdb47
76368e7
4ea9c22
a269166
b6cb8cc
940bfd3
bb7cb2f
6c1d530
a475905
f9aaf71
7ad9723
fe389f8
645805b
5450561
a6523c0
4dec328
d56bb09
947ebcf
8ad298d
e9c1a2c
324cbef
7a5854c
02a0d54
de5f91e
ea2876d
6fff368
80dbc76
8afbb74
e74b67d
4b34631
b241fbe
21b6b80
fd9212a
fa7c024
fbea5a9
0073e46
a5bf58a
5a1b6d0
ebf5087
cdddcc8
76ff386
22a76bb
0090a20
b99627c
0028687
5eaf502
5a772f0
2c47cfe
32452e8
9bce18d
bd3a300
4174e9b
ae66716
a1558b7
3aeb792
5c0a54a
6af697b
12a4c8e
88e022f
cd3fab6
046488c
2a7d8ae
fe72e3b
ea47095
7df7499
7cd0bb8
6a4ad4b
72f36c7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,7 @@ import Standard.Table.Internal.Aggregate_Column_Helper | |
import Standard.Table.Internal.Column_Naming_Helper.Column_Naming_Helper | ||
import Standard.Table.Internal.Constant_Column.Constant_Column | ||
import Standard.Table.Internal.Problem_Builder.Problem_Builder | ||
import Standard.Table.Internal.Replace_Helpers | ||
import Standard.Table.Internal.Table_Helpers | ||
import Standard.Table.Internal.Table_Helpers.Table_Column_Helper | ||
import Standard.Table.Internal.Unique_Name_Strategy.Unique_Name_Strategy | ||
|
@@ -930,6 +931,74 @@ type Table | |
on_problems.attach_problems_before problems <| | ||
Warning.set result [] | ||
|
||
## PRIVATE | ||
A helper that creates a two-column table from a Map. | ||
|
||
The keys of the `Map` become the first column, with name | ||
`key_column_name`, and the values of the `Map` become the second column, | ||
with name `value_column_name`. | ||
|
||
For the in-memory database, the `Map` can be empty. For the database | ||
backends, it must not be empty. | ||
|
||
Arguments: | ||
- map: The `Map` to create the table from. | ||
- key_column_name: The name to use for the first column. | ||
- value_column_name: The name to use for the second column. | ||
make_table_from_map : Map Any Any -> Text -> Text -> Table | ||
make_table_from_map self map key_column_name value_column_name = | ||
total_size = map.size * 2 | ||
|
||
if map.is_empty then Error.throw (Illegal_Argument.Error "Map argument cannot be empty") else | ||
if total_size > MAX_LITERAL_ELEMENT_COUNT then Error.throw (Illegal_Argument.Error "Map argument is too large ("+map.size.to_text+" entries): materialize a table into the database instead") else | ||
keys_and_values = map.to_vector | ||
self.make_table_from_vectors [keys_and_values.map .first, keys_and_values.map .second] [key_column_name, value_column_name] | ||
|
||
## PRIVATE | ||
A helper that creates a literal table from `Vector`s. | ||
|
||
For the in-memory database, the columns can be empty. For the database | ||
backends, they must not be empty. | ||
|
||
Arguments: | ||
- column_vectors: A `Vector` of `Vector`s; each inner `Vector` becomes a | ||
column of the table. | ||
- column_names: The names of the columns of the new table. | ||
make_table_from_vectors : Vector (Vector Any) -> Vector Text -> Table | ||
make_table_from_vectors self column_vectors column_names = | ||
Runtime.assert (column_vectors.length == column_names.length) "column_vectors and column_names must have the same length" | ||
|
||
# Assume the columns are all the same length; if not, it will be an error anyway. | ||
total_size = if column_vectors.is_empty || column_vectors.at 0 . is_empty then 0 else | ||
column_vectors.length * (column_vectors.at 0 . length) | ||
|
||
if total_size == 0 then Error.throw (Illegal_Argument.Error "Vectors cannot be empty") else | ||
if total_size > MAX_LITERAL_ELEMENT_COUNT then Error.throw (Illegal_Argument.Error "Too many elements for table literal ("+total_size.to_text+"): materialize a table into the database instead") else | ||
type_mapping = self.connection.dialect.get_type_mapping | ||
|
||
values_to_type_ref column_vector = | ||
value_type = Value_Type_Helpers.find_common_type_for_arguments column_vector | ||
sql_type = case value_type of | ||
Nothing -> SQL_Type.null | ||
_ -> type_mapping.value_type_to_sql value_type Problem_Behavior.Ignore | ||
SQL_Type_Reference.from_constant sql_type | ||
|
||
literal_table_name = self.connection.base_connection.table_naming_helper.generate_random_table_name "enso-literal-" | ||
|
||
from_spec = From_Spec.Literal_Values column_vectors column_names literal_table_name | ||
context = Context.for_subquery from_spec | ||
|
||
internal_columns = 0.up_to column_vectors.length . map i-> | ||
column_vector = column_vectors.at i | ||
column_name = column_names.at i | ||
|
||
type_ref = values_to_type_ref column_vector.to_vector | ||
generated_literal_column_name = "column"+(i+1).to_text | ||
sql_expression = SQL_Expression.Column literal_table_name generated_literal_column_name | ||
Internal_Column.Value column_name type_ref sql_expression | ||
|
||
Table.Value literal_table_name self.connection internal_columns context | ||
|
||
## PRIVATE | ||
|
||
Create a constant column from a value. | ||
|
@@ -1398,7 +1467,7 @@ type Table | |
In the Database backend, there are no guarantees related to ordering of | ||
results. | ||
|
||
? Error Conditions | ||
! Error Conditions | ||
|
||
- If this table or the lookup table is lacking any of the columns | ||
specified in `key_columns`, a `Missing_Input_Columns` error is raised. | ||
|
@@ -1454,7 +1523,7 @@ type Table | |
In the Database backend, there are no guarantees related to ordering of | ||
results. | ||
|
||
? Error Conditions | ||
! Error Conditions | ||
|
||
- If this table or the lookup table is lacking any of the columns | ||
specified by `from_column`, `to_column`, or `column`, a | ||
|
@@ -1500,10 +1569,12 @@ type Table | |
# 1 | 20 | b | f | ||
# 2 | 30 | c | g | ||
# 3 | 40 | d | h | ||
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup | ||
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning = | ||
_ = [lookup_table, column, from_column, to_column, allow_unmatched_rows, on_problems] | ||
Error.throw (Unsupported_Database_Operation.Error "Table.replace is not implemented yet for the Database backends.") | ||
@column Widget_Helpers.make_column_name_selector | ||
@from_column Widget_Helpers.make_column_name_selector | ||
@to_column Widget_Helpers.make_column_name_selector | ||
replace : Table | Map -> (Text | Integer) -> (Text | Integer | Nothing) -> (Text | Integer | Nothing) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup | ||
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer | Nothing)=Nothing to_column:(Text | Integer | Nothing)=Nothing allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning = | ||
Replace_Helpers.replace self lookup_table column from_column to_column allow_unmatched_rows on_problems | ||
|
||
## ALIAS join by row position | ||
GROUP Standard.Base.Calculations | ||
|
@@ -2801,3 +2872,7 @@ Table.from (that:Materialized_Table) = | |
|
||
## PRIVATE | ||
Table_Ref.from (that:Table) = Table_Ref.Value that | ||
|
||
## PRIVATE | ||
The largest dataset that can be used to make a literal table, expressed in number of elements. | ||
MAX_LITERAL_ELEMENT_COUNT = 256 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps this should be bigger? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any DB limit? I guess it is limited by the query size... I wonder if such literal table will be able to use hashjoins or will it always fall back to a linear scan. I guess for <256 it doesn't matter much. For larger values the size of the SQL query may start being quite problematic. I think ideally we shouldn't cut off but instead we should be creating temporary tables - but definitely an improvement for a separate PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that was my concern. My first thought was to create a temporary table, but that will be happening repeatedly during each evaluation. I was able to create tables with ~18000 rows this way in both postgres and sqlite, and at that point I stopped testing. I'm more concerned about the size of the query being sent over the wire, but I don't really know at what point that becomes a problem. I figured that giving it a small would be a good first step. I think it covers the majority of use-cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting that it worked for such large sizes. I guess the primary concern indeed is the query size, especially as a temp table is likely sent in a more compact binary compact. Anyway, seems all ok for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 256 a reasonable literal default. Agee with @radeusgd should be a temp table when gets bigger! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will this result in a lot of temporary tables being created and not eagerly cleaned up? That was my concern when doing it the literal way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes indeed that is a concern. I think we should implement such a feature as a separate PR, as it has non-trivial complexity. Also, I think the concern of 'too many' temporary tables may also be increased due to the fact that currently the IDE seems to recompute unrelated nodes 'too often'. I think that if we resolve that, the amount of re-computations could be lower and thus the problem would not be as big. As for implementing these temporary tables efficiently, we can leverage a few tricks:
In fact we can merge the 2 approaches. If between runs the tables are the same we can avoid re-uploading them. Once all references to them are GCed we can clean them up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't these from the
lookup_table
so the selector is wrong?We need to have widgets derived from first argument for this I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the notation would be for this -- is there documentation for the @ clauses? Or, where is it implemented? I don't see an example of a widget attached to a value other than
self
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.