Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Search module and create_index command in python #868

Closed
wants to merge 1 commit into from

Conversation

shohamazon
Copy link
Collaborator

@shohamazon shohamazon commented Jan 28, 2024

Description of changes:

Adding the FT.CREATE command of RediSearch in python.

@shohamazon shohamazon requested a review from a team as a code owner January 28, 2024 11:16
@shohamazon shohamazon force-pushed the search branch 3 times, most recently from 583bbdf to 800300a Compare January 28, 2024 12:30
@shohamazon shohamazon requested a review from barshaul January 28, 2024 12:41
@shohamazon shohamazon added the python Python wrapper label Jan 28, 2024
@shohamazon shohamazon force-pushed the search branch 2 times, most recently from 10e0e5a to 3215278 Compare January 28, 2024 15:47
Copy link
Collaborator

@barshaul barshaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll continue tomorrow

schema (List[Field]): The schema of the index, specifying the fields and their types.
max_text_fields (bool): Forces RediSearch to encode indexes as if there were more than 32 text attributes.
Additional attributes (beyond 32) can be added using FT.ALTER.
temporary_seconds (Optional[int]): Creates a temporary index that automatically expires after a set period of inactivity, measured in seconds.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the description of this filed from redis site

args = super().get_field_args()
args.append("TEXT")
if not self.sortable and self.no_index:
raise ValueError("Cannot be both non-storable and non-indexed.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not doing it in the object init function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also you can make it more informative - "no_index can only be set when sortable is provided" (something like that)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking inside get_field_args is better since user can do something like this:

index = Index(name="name" ,sortable=Sortable.SORTABLE , no_idex=false)
index.no_index = true
...

and we want to let user change some attributes because we want to make the object reusable, so the best thing to do is to check before executing the command.

@shohamazon shohamazon force-pushed the search branch 2 times, most recently from 156f1fc to c846677 Compare January 30, 2024 20:02
temporary_seconds (Optional[int]): Creates a lightweight temporary index that automatically expires after a set period of inactivity, measured in seconds.
The index's internal idle timer resets each time a search or addition operation occurs.
Because such indexes are lightweight, you can create thousands of such indexes without negative performance implications and, therefore,
you should consider using SKIPINITIALSCAN to avoid costly scanning.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what SKIPINITIALSCAN means? how to I set it in the client?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its the last argument in this function -> initial_scan , can be set with InitialScan.SKIP_SCAN.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand but the description should say that

no_stem (bool): Indicate whether to disable stemming when indexing field values.
no_index (bool): Indicate whether the field should be excluded from indexing.
phonetic (Optional[PhoneticType]): Activate phonetic matching for text searches, specifies the phonetic algorithm and language used (e.g., English: dm:en).
weight (Optional[float]): Specify attribute importance of this attribute when calculating result accuracy.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite the description, use the one from redis's

"""


class Field:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it right the Field class shouldn't be used with create_index, only the extended versions of it.
If so, please make it an abstract class, so the get_field_args would be an abstract function and instead have another function implemented only in the base class get_filed_identifier which will return the filed name and alias

Copy link
Collaborator

@barshaul barshaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got until GeoField. will continue tomorrow.

@shohamazon shohamazon force-pushed the search branch 2 times, most recently from 01d6b1a to 53899f2 Compare January 31, 2024 16:33
Copy link
Contributor

@shachlanAmazon shachlanAmazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shohamazon very impressive work! I can see you've put a lot of thought into making this interface accessible to users.
approving, but @barshaul has the final word on this.

assert "search" in res["module"]

@pytest.mark.parametrize("cluster_mode", [True, False])
async def test_json_module_loadded(self, redis_client: TRedisClient):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you testing the JSON module in the search module tests?
also - loadded -> loaded

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to create json indexes, json module needs to be loaded, maybe it would be better to combine both tests into a single test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should combine the modules tests, and I think that this test should belong to json tests only. You can have an assertion in the json indexes tests to verify that json module was loaded

GeoField(name="location"),
]

assert (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these tests only test that the call returns OK. Is this really the most meaningful test available?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, when I add FT.SEARCH the tests will be more meaningful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add tests for edge cases - passing an empty string for prefix/filter, combining mutually exclusive args, passing nulls etc.. make sure we raise appropriate exceptions

@shohamazon
Copy link
Collaborator Author

@shohamazon very impressive work! I can see you've put a lot of thought into making this interface accessible to users. approving, but @barshaul has the final word on this.

Thank you 🙂 :)


class Field(ABC):
"""
Base class for defining fields in a schema.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract base class

args.extend(self.sortable.value.split(" "))
if self.no_index:
args.append("NOINDEX")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove newline


Args:
index (str): The name of the index.
index_type (IndexType): The type of the index, supports HASH (default) and JSON. To index JSON,
Copy link
Collaborator

@barshaul barshaul Feb 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't document what options IndexType has, as it might changed and then we will need to maintain it here too. but you can say it is default to HASH

prefix (Optional[List[str]]): The prefix for the index, tells the index which keys it should index. Multiple prefixes can be added.
If not set, the default is '*' (all keys).
filter (Optional[str]): A filter expression. It is possible to use "@__key" to access the key that was just added/changed.
language (Optional[str]): The default language for documents in the index. Default is English.
Copy link
Collaborator

@barshaul barshaul Feb 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove "Default is English. "-> this is a server configuration that might be changed by redis, and by default we're passing null - not specific value

List[str]: A list of index attributes.
"""
args = [self.name]
if self.index_type:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can it be false? it has a default value

args = [self.name]
if self.index_type:
args.extend(["ON", self.index_type.value])
if self.prefix and len(self.prefix) > 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please raise a value error if len(self.prefix) == 0

Copy link
Collaborator

@barshaul barshaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shohamazon Finally, I finished reviewing all. Please make the required fixes and let me know when it's ready for another round.

@shohamazon shohamazon force-pushed the search branch 3 times, most recently from f72df1b to 76f55be Compare February 6, 2024 10:40
@shohamazon
Copy link
Collaborator Author

@barshaul ready

) -> TOK:
"""
Creates a new search index with the specified configuration.
See https://redis.io/commands/ft.create/ for more details.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
See https://redis.io/commands/ft.create/ for more details.
See https://redis.io/commands/ft.create/ for more details.

Comment on lines +28 to +35
max_text_fields: bool = False,
temporary_seconds: Optional[int] = None,
offset: Offset = Offset.STORE_OFFSET,
highlight: Highlights = Highlights.USE_HIGHLIGHTS,
field_flag: FieldFlag = FieldFlag.USE_FIELDS,
frequencies: Frequencies = Frequencies.SAVE_FREQUENCIES,
stop_words: Optional[List[str]] = None,
initial_scan: InitialScan = InitialScan.SCAN_INDEX,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating CreateIndexOptions with these parameters inside

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can do options.toArgs() below

sortable (Optional[SortableOption]): If not None, sets the field as sortable with the specified sortable option.
no_stem (bool): Indicate whether to disable stemming when indexing field values.
no_index (bool): Indicate whether the field should be excluded from indexing.
phonetic (Optional[PhoneticType]): Activate phonetic matching for text searches, specifies the phonetic algorithm and language used (e.g., English: dm:en).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
phonetic (Optional[PhoneticType]): Activate phonetic matching for text searches, specifies the phonetic algorithm and language used (e.g., English: dm:en).
phonetic (Optional[PhoneticType]): Activate phonetic matching for text searches, specifies the phonetic algorithm and language used (e.g., `DM_ENGLISH`).

@shohamazon
Copy link
Collaborator Author

closing for now, will reopen when modules are supported

@shohamazon shohamazon closed this May 8, 2024
@shohamazon shohamazon deleted the search branch May 8, 2024 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Python wrapper
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants