feat: suggest similar columns if column gets accessed that doesn't exist #385

Merged
89 commits
be301e2
basic levenshtein method `get_similar_columns` and test
robmeth Jun 23, 2023
13fea9d
test
robmeth Jun 23, 2023
f4324ea
test
robmeth Jun 23, 2023
924ea46
gives list of similar columns
robmeth Jun 23, 2023
859cdc3
add get_similar_columns to other methods, when unknownColumnNameError…
robmeth Jun 23, 2023
f1f9587
updated warning message to also tell the name of the column, that's n…
robmeth Jun 23, 2023
d6fb58e
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 23, 2023
e46cecf
adjusted indentation of docstring and changed mode to imperative to a…
jxnior01 Jun 23, 2023
624490e
imported levenshtein differently
jxnior01 Jun 23, 2023
7fdf2a1
changed levenshtein method to `jaro_winkler`
robmeth Jun 23, 2023
df8ab48
added type annotation
robmeth Jun 23, 2023
b282536
installed levenshtein in poetry
jxnior01 Jun 23, 2023
093df64
adjusted docstring to make linter smile
jxnior01 Jun 23, 2023
098b355
style: apply automated linter fixes
megalinter-bot Jun 23, 2023
9aa1ffe
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 23, 2023
b705f6e
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 23, 2023
fa1bc47
is the linter overloaded?
jxnior01 Jun 23, 2023
e8aa133
Merge remote-tracking branch 'origin/203-suggest-similar-columns-if-c…
jxnior01 Jun 23, 2023
f33851e
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jun 23, 2023
84d67e1
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 23, 2023
9dc5844
made method private, returns list of column names, removed warning, a…
jxnior01 Jun 30, 2023
ad8a433
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
jxnior01 Jun 30, 2023
0954e6a
levenshtein already present in pyproject.toml
jxnior01 Jun 30, 2023
e7a03c4
Merge remote-tracking branch 'origin/203-suggest-similar-columns-if-c…
jxnior01 Jun 30, 2023
a303b16
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jun 30, 2023
87a1f3c
update usecases of `get_similar_columns` and update UnknownColumnsNam…
robmeth Jun 30, 2023
be5bace
fix UnknownColumnNameError
robmeth Jun 30, 2023
de93962
fix UnknownColumnNameError
robmeth Jun 30, 2023
370a947
fix UnknownColumnNameError
robmeth Jun 30, 2023
c8e4479
fix UnknownColumnNameError
robmeth Jun 30, 2023
300c774
fix UnknownColumnNameError?
robmeth Jun 30, 2023
d85e241
fix UnknownColumnNameError?
robmeth Jun 30, 2023
3969452
fix UnknownColumnNameError?
robmeth Jun 30, 2023
1a21a8a
fix UnknownColumnNameError?
robmeth Jun 30, 2023
ca187ef
fix UnknownColumnNameError?
robmeth Jun 30, 2023
af42744
fix UnknownColumnNameError!
robmeth Jun 30, 2023
ccbc7e6
make linter happy
robmeth Jun 30, 2023
be40d02
make linter happy
robmeth Jun 30, 2023
45266a9
make linter happy
robmeth Jun 30, 2023
0ec7298
make linter happy
robmeth Jun 30, 2023
5e51431
make linter happy
robmeth Jun 30, 2023
ec7a2b2
style: apply automated linter fixes
megalinter-bot Jun 30, 2023
23f1ebd
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 30, 2023
96a11aa
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jun 30, 2023
26d19fb
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jul 7, 2023
24df909
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
aae28da
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
23d6ee6
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
c42a178
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
89aa44a
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
98f7d6b
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jul 7, 2023
16778d9
requested changes
robmeth Jul 7, 2023
e57c7df
fix: Corrected the exception for `UnknownColumnNameError` to work wit…
Marsmaennchen221 Jul 7, 2023
4fb7383
linter
robmeth Jul 7, 2023
065da3b
linter
robmeth Jul 7, 2023
a191d61
linter
robmeth Jul 7, 2023
d726aea
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jul 7, 2023
f34e266
fix: Corrected the exception for UnknownColumnNameError to work with …
robmeth Jul 7, 2023
2b1558a
linter
robmeth Jul 7, 2023
c91b0ae
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 7, 2023
f3374cb
style: apply automated linter fixes
megalinter-bot Jul 7, 2023
8e12921
style: apply automated linter fixes
megalinter-bot Jul 7, 2023
8d3dd2c
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 12, 2023
2afa433
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
robmeth Jul 12, 2023
b794cad
added test with completely different column name
robmeth Jul 12, 2023
1049dcc
Merge branch '203-suggest-similar-columns-if-column-gets-accessed-tha…
robmeth Jul 12, 2023
885859f
added the dynamically increase of the threshold for when it's more th…
robmeth Jul 12, 2023
e7f83ee
style: apply automated linter fixes
megalinter-bot Jul 12, 2023
01a9ec2
Update src/safeds/data/tabular/containers/_table.py
jxnior01 Jul 13, 2023
1c72f7d
Update src/safeds/data/tabular/containers/_table.py
jxnior01 Jul 13, 2023
60ccac2
Update src/safeds/data/tabular/containers/_table.py
jxnior01 Jul 13, 2023
ebc1261
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
jxnior01 Jul 13, 2023
ac8ccb0
more test cases, modified error message
jxnior01 Jul 13, 2023
6419940
Update dependencies
zzril Jul 13, 2023
29f714f
accepted merge from main, but run wasn't successful, trying poetry ad…
jxnior01 Jul 13, 2023
ea5ea5d
Merge remote-tracking branch 'origin/203-suggest-similar-columns-if-c…
jxnior01 Jul 13, 2023
d8dc7f3
Update tests/safeds/data/tabular/containers/_table/test_get_similar_c…
jxnior01 Jul 13, 2023
3eb4f31
style: apply automated linter fixes
megalinter-bot Jul 13, 2023
fef4550
Update src/safeds/exceptions/_data.py
jxnior01 Jul 13, 2023
7e73ba7
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
Marsmaennchen221 Jul 13, 2023
9bd0df0
passed UnknownColumnNameError an empty list in constructor
jxnior01 Jul 13, 2023
649b70f
last commit :)
jxnior01 Jul 13, 2023
5061634
style: apply automated linter fixes
megalinter-bot Jul 13, 2023
53f598b
style: apply automated linter fixes
megalinter-bot Jul 13, 2023
73c9508
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
jxnior01 Jul 13, 2023
dd252a0
Revert poetry.lock
zzril Jul 13, 2023
d24b0b5
Add testcases
zzril Jul 13, 2023
ab37198
style: apply automated linter fixes
megalinter-bot Jul 13, 2023
fdd47ca
Merge branch 'main' into 203-suggest-similar-columns-if-column-gets-a…
Marsmaennchen221 Jul 13, 2023
222 changes: 221 additions & 1 deletion poetry.lock

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -22,6 +22,7 @@ scikit-learn = "^1.2.0"
 seaborn = "^0.12.2"
 openpyxl = "^3.1.2"
 scikit-image = "^0.21.0"
+levenshtein = "^0.21.1"

 [tool.poetry.group.dev.dependencies]
 pytest = "^7.2.1"
57 changes: 51 additions & 6 deletions src/safeds/data/tabular/containers/_table.py
@@ -7,6 +7,7 @@
 from pathlib import Path
 from typing import TYPE_CHECKING, Any, TypeVar

+import Levenshtein
 import matplotlib.pyplot as plt
 import numpy as np
 import openpyxl
@@ -597,7 +598,8 @@ def get_column(self, column_name: str) -> Column:
         Column('b', [2])
         """
         if not self.has_column(column_name):
-            raise UnknownColumnNameError([column_name])
+            similar_columns = self._get_similar_columns(column_name)
+            raise UnknownColumnNameError([column_name], similar_columns)

         return Column._from_pandas_series(
             self._data[column_name],
@@ -695,6 +697,34 @@ def get_row(self, index: int) -> Row:

         return Row._from_pandas_dataframe(self._data.iloc[[index]], self._schema)

+    def _get_similar_columns(self, column_name: str) -> list[str]:
+        """
+        Get all the column names in a Table that are similar to a given name.
+
+        Parameters
+        ----------
+        column_name : str
+            The name to compare the Table's column names to.
+
+        Returns
+        -------
+        similar_columns: list[str]
+            A list of all column names in the Table that are similar or equal to the given column name.
+        """
+        similar_columns = []
+        similarity = 0.6
+        i = 0
+        while i < len(self.column_names):
+            if Levenshtein.jaro_winkler(self.column_names[i], column_name) >= similarity:
+                similar_columns.append(self.column_names[i])
+            i += 1
+            if len(similar_columns) == 4 and similarity < 0.9:
+                similarity += 0.1
+                similar_columns = []
+                i = 0
+
+        return similar_columns
+
     # ------------------------------------------------------------------------------------------------------------------
     # Information
     # ------------------------------------------------------------------------------------------------------------------
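The threshold escalation in `_get_similar_columns` can be illustrated on its own. The following is a minimal sketch, not the code from this PR: it assumes the standard library's `difflib.SequenceMatcher` as a stand-in for `Levenshtein.jaro_winkler` so that it runs without extra dependencies, and `similarity`, `get_similar_names`, and `max_hits` are hypothetical names. The two metrics score strings differently, so the concrete suggestions may differ from what the PR returns.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in metric in [0, 1]; the PR itself uses Levenshtein.jaro_winkler."""
    return SequenceMatcher(None, a, b).ratio()


def get_similar_names(candidates: list[str], name: str, max_hits: int = 4) -> list[str]:
    """Return candidates similar to `name`, tightening the threshold if too many match."""
    hits: list[str] = []
    # Thresholds 0.6, 0.7, 0.8, 0.9, derived from integers to avoid float accumulation error.
    for tenths in range(6, 10):
        threshold = tenths / 10
        hits = [c for c in candidates if similarity(c, name) >= threshold]
        # Accept the result once it is short enough; otherwise retry with a stricter threshold.
        if len(hits) < max_hits:
            break
    # At the strictest threshold, whatever matched is returned even if it is long.
    return hits


print(get_similar_names(["column1", "x", "cilumn2"], "col1"))  # ['column1']
print(get_similar_names(["col1", "col2", "col3", "col4", "col5"], "col1"))  # ['col1']
```

In the second call all five names pass the 0.6 and 0.7 thresholds, so the search is repeated until only the exact match remains at 0.8.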
@@ -1106,11 +1136,13 @@ def keep_only_columns(self, column_names: list[str]) -> Table:
         1 4
         """
         invalid_columns = []
+        similar_columns: list[str] = []
         for name in column_names:
             if not self._schema.has_column(name):
+                similar_columns = similar_columns + self._get_similar_columns(name)
                 invalid_columns.append(name)
         if len(invalid_columns) != 0:
-            raise UnknownColumnNameError(invalid_columns)
+            raise UnknownColumnNameError(invalid_columns, similar_columns)

         clone = self._copy()
         clone = clone.remove_columns(list(set(self.column_names) - set(column_names)))
@@ -1151,11 +1183,13 @@ def remove_columns(self, column_names: list[str]) -> Table:
         1 3
         """
         invalid_columns = []
+        similar_columns: list[str] = []
         for name in column_names:
             if not self._schema.has_column(name):
+                similar_columns = similar_columns + self._get_similar_columns(name)
                 invalid_columns.append(name)
         if len(invalid_columns) != 0:
-            raise UnknownColumnNameError(invalid_columns)
+            raise UnknownColumnNameError(invalid_columns, similar_columns)

         transformed_data = self._data.drop(labels=column_names, axis="columns")
         transformed_data.columns = [name for name in self._schema.column_names if name not in column_names]
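The validation loop shared by `keep_only_columns` and `remove_columns` collects every unknown name, together with its suggestions, before raising a single error. A minimal sketch of that pattern outside the `Table` class follows; `collect_unknown_columns` and the `suggest` callable are hypothetical names introduced for illustration, and the prefix-based `suggest` used in the usage lines is only a placeholder for `_get_similar_columns`.

```python
from collections.abc import Callable, Iterable


def collect_unknown_columns(
    requested: Iterable[str],
    known: list[str],
    suggest: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Return (unknown names, suggestions for them) in the order they were requested."""
    invalid: list[str] = []
    similar: list[str] = []
    for name in requested:
        if name not in known:
            invalid.append(name)
            # Suggestions for all unknown names end up in one flat list, as in the PR.
            similar.extend(suggest(name))
    return invalid, similar


known_columns = ["col1", "column2"]


def prefix_suggest(name: str) -> list[str]:
    # Placeholder metric: suggest known columns sharing the first three characters.
    return [k for k in known_columns if k.startswith(name[:3])]


invalid, similar = collect_unknown_columns(["col1", "colum3", "x"], known_columns, prefix_suggest)
print(invalid)  # ['colum3', 'x']
print(similar)  # ['col1', 'column2']
```

Because the suggestions are concatenated, the flat list does not record which suggestion belongs to which unknown name, and the same suggestion can appear more than once.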
@@ -1349,7 +1383,8 @@ def rename_column(self, old_name: str, new_name: str) -> Table:
         0 1 2
         """
         if old_name not in self._schema.column_names:
-            raise UnknownColumnNameError([old_name])
+            similar_columns = self._get_similar_columns(old_name)
+            raise UnknownColumnNameError([old_name], similar_columns)
         if old_name == new_name:
             return self
         if new_name in self._schema.column_names:
@@ -1401,7 +1436,8 @@ def replace_column(self, old_column_name: str, new_columns: list[Column]) -> Tab
         0 1 3
         """
         if old_column_name not in self._schema.column_names:
-            raise UnknownColumnNameError([old_column_name])
+            similar_columns = self._get_similar_columns(old_column_name)
+            raise UnknownColumnNameError([old_column_name], similar_columns)

         columns = list[Column]()
         for old_column in self.column_names:
@@ -1705,7 +1741,8 @@ def transform_column(self, name: str, transformer: Callable[[Row], Any]) -> Tabl
             items: list = [transformer(item) for item in self.to_rows()]
             result: list[Column] = [Column(name, items)]
             return self.replace_column(name, result)
-        raise UnknownColumnNameError([name])
+        similar_columns = self._get_similar_columns(name)
+        raise UnknownColumnNameError([name], similar_columns)

     def transform_table(self, transformer: TableTransformer) -> Table:
         """
@@ -1881,9 +1918,13 @@ def plot_lineplot(self, x_column_name: str, y_column_name: str) -> Image:
         >>> image = table.plot_lineplot("temperature", "sales")
         """
         if not self.has_column(x_column_name) or not self.has_column(y_column_name):
+            similar_columns_x = self._get_similar_columns(x_column_name)
+            similar_columns_y = self._get_similar_columns(y_column_name)
             raise UnknownColumnNameError(
                 ([x_column_name] if not self.has_column(x_column_name) else [])
                 + ([y_column_name] if not self.has_column(y_column_name) else []),
+                (similar_columns_x if not self.has_column(x_column_name) else [])
+                + (similar_columns_y if not self.has_column(y_column_name) else []),
             )

         fig = plt.figure()
@@ -1935,9 +1976,13 @@ def plot_scatterplot(self, x_column_name: str, y_column_name: str) -> Image:
         >>> image = table.plot_scatterplot("temperature", "sales")
         """
         if not self.has_column(x_column_name) or not self.has_column(y_column_name):
+            similar_columns_x = self._get_similar_columns(x_column_name)
+            similar_columns_y = self._get_similar_columns(y_column_name)
             raise UnknownColumnNameError(
                 ([x_column_name] if not self.has_column(x_column_name) else [])
                 + ([y_column_name] if not self.has_column(y_column_name) else []),
+                (similar_columns_x if not self.has_column(x_column_name) else [])
+                + (similar_columns_y if not self.has_column(y_column_name) else []),
             )

         fig = plt.figure()
15 changes: 13 additions & 2 deletions src/safeds/exceptions/_data.py
@@ -16,8 +16,19 @@ class UnknownColumnNameError(KeyError):
         The name of the column that was tried to be accessed.
     """

-    def __init__(self, column_names: list[str]):
-        super().__init__(f"Could not find column(s) '{', '.join(column_names)}'")
+    def __init__(self, column_names: list[str], similar_columns: list[str] | None = None):
+        class _UnknownColumnNameErrorMessage(
+            str,
+        ):  # This class is necessary for the newline character in a KeyError exception. See https://stackoverflow.com/a/70114007
+            def __repr__(self) -> str:
+                return str(self)
+
+        error_message = f"Could not find column(s) '{', '.join(column_names)}'."
+
+        if similar_columns is not None and len(similar_columns) > 0:
+            error_message += f"\nDid you mean '{similar_columns}'?"
+
+        super().__init__(_UnknownColumnNameErrorMessage(error_message))


 class NonNumericColumnError(Exception):
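The `str` subclass inside `UnknownColumnNameError.__init__` exists because `KeyError` formats a single argument with `repr()`, which would show the newline as an escaped `\n` inside quotes. A small stand-alone sketch of that behaviour, using only built-ins; `_PlainReprMessage` is a hypothetical name for this illustration:

```python
class _PlainReprMessage(str):
    """A str whose repr() is its plain text, so KeyError prints it unescaped."""

    def __repr__(self) -> str:
        return str(self)


text = "Could not find column(s) 'a'.\nDid you mean '['b']'?"

plain = KeyError(text)
wrapped = KeyError(_PlainReprMessage(text))

# str(KeyError) uses repr() of the single argument: quotes are added and the newline is escaped.
print(str(plain))

# With the subclass, repr() returns the raw text, so the message spans two real lines.
print(str(wrapped))
```

Other exception types such as `ValueError` use `str()` of the argument directly, so the workaround is specific to keeping `KeyError` as the base class.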
@@ -0,0 +1,46 @@
+import pytest
+from safeds.data.tabular.containers import Table
+from safeds.exceptions._data import UnknownColumnNameError
+
+
+@pytest.mark.parametrize(
+    ("table", "column_name", "expected"),
+    [
+        (Table({"column1": ["col1_1"], "x": ["y"], "cilumn2": ["cil2_1"]}), "col1", ["column1"]),
+        (
+            Table(
+                {
+                    "column1": ["col1_1"],
+                    "col2": ["col2_1"],
+                    "col3": ["col2_1"],
+                    "col4": ["col2_1"],
+                    "cilumn2": ["cil2_1"],
+                },
+            ),
+            "clumn1",
+            ["column1", "cilumn2"],
+        ),
+        (
+            Table({"column1": ["a"], "column2": ["b"], "column3": ["c"]}),
+            "notexisting",
+            [],
+        ),
+        (
+            Table({"column1": ["col1_1"], "x": ["y"], "cilumn2": ["cil2_1"]}),
+            "x",
+            ["x"],
+        ),
+        (Table({}), "column1", []),
+    ],
+    ids=["one similar", "two similar/ dynamic increase", "no similar", "exact match", "empty table"],
+)
+def test_should_get_similar_column_names(table: Table, column_name: str, expected: list[str]) -> None:
+    assert table._get_similar_columns(column_name) == expected
+
+
+def test_should_raise_error_if_column_name_unknown() -> None:
+    with pytest.raises(
+        UnknownColumnNameError,
+        match=r"Could not find column\(s\) 'col3'\.\nDid you mean '\['col1', 'col2'\]'\?",
+    ):
+        raise UnknownColumnNameError(["col3"], ["col1", "col2"])
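`pytest.raises(match=...)` checks the pattern with `re.search`, so punctuation in the expected message (`(`, `[`, `.`, `?`) has to be escaped to be matched literally. A short sketch of the pitfalls using only the standard `re` module; the message text is copied from the test above and `re.escape` is shown as one possible way to build the pattern, not as what the PR does:

```python
import re

message = "Could not find column(s) 'col3'.\nDid you mean '['col1', 'col2']'?"

# Unescaped, "(s)" is a capture group: the pattern looks for "columns" and misses the literal text.
assert re.search("column(s)", message) is None

# re.escape backslash-escapes the special characters, giving a pattern that matches literally.
pattern = re.escape(message)
assert re.search(pattern, message) is not None

# An unescaped trailing "?" only makes the preceding quote optional,
# so the pattern still matches text that has no question mark at all.
loose = r"Did you mean '\['col1', 'col2'\]'?"
assert re.search(loose, "Did you mean '['col1', 'col2']' or not") is not None
```

Writing the escapes by hand, as the parametrized tests in this PR do, keeps the expected message readable; `re.escape` trades that readability for less room for mistakes.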
42 changes: 42 additions & 0 deletions tests/safeds/exceptions/test_unknown_column_name_error.py
@@ -0,0 +1,42 @@
+import pytest
+from safeds.exceptions import UnknownColumnNameError
+
+
+@pytest.mark.parametrize(
+    ("column_names", "similar_columns", "expected_error_message"),
+    [
+        (["column1"], [], r"Could not find column\(s\) 'column1'\."),
+        (["column1", "column2"], [], r"Could not find column\(s\) 'column1, column2'\."),
+        (["column1"], ["column_a"], r"Could not find column\(s\) 'column1'\.\nDid you mean '\['column_a'\]'\?"),
+        (
+            ["column1", "column2"],
+            ["column_a"],
+            r"Could not find column\(s\) 'column1, column2'\.\nDid you mean '\['column_a'\]'\?",
+        ),
+        (
+            ["column1"],
+            ["column_a", "column_b"],
+            r"Could not find column\(s\) 'column1'\.\nDid you mean '\['column_a', 'column_b'\]'\?",
+        ),
+        (
+            ["column1", "column2"],
+            ["column_a", "column_b"],
+            r"Could not find column\(s\) 'column1, column2'\.\nDid you mean '\['column_a', 'column_b'\]'\?",
+        ),
+    ],
+    ids=[
+        "one_unknown_no_suggestions",
+        "two_unknown_no_suggestions",
+        "one_unknown_one_suggestion",
+        "two_unknown_one_suggestion",
+        "one_unknown_two_suggestions",
+        "two_unknown_two_suggestions",
+    ],
+)
+def test_empty_similar_columns(
+    column_names: list[str],
+    similar_columns: list[str],
+    expected_error_message: str,
+) -> None:
+    with pytest.raises(UnknownColumnNameError, match=expected_error_message):
+        raise UnknownColumnNameError(column_names, similar_columns)