
Initial service commit #1

Merged · 2 commits merged into main on Aug 4, 2022

Conversation

@simonj2 (Contributor) commented Aug 2, 2022

What

Initial commit of the search service.

Sorry for the large commit, but this allowed me to scope out the proposal better.

Contains:

  • Product document definition
  • Bulk import script
  • API definition
  • API implementation (partially implemented)
  • Docker, Elasticvue configuration
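The bulk import script itself isn't shown in this thread, but as a rough sketch of the idea (function and index names here are hypothetical, not taken from the PR), an Elasticsearch `_bulk` request body pairs one action line with one document line per product:

```python
import json

def bulk_payload(products, index="openfoodfacts"):
    """Build the newline-delimited body for Elasticsearch's _bulk API:
    one action line followed by one document line per product."""
    lines = []
    for product in products:
        # Action line: index this document under its product code.
        lines.append(json.dumps({"index": {"_index": index, "_id": product["code"]}}))
        # Document line: the product itself.
        lines.append(json.dumps(product))
    # The _bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"

payload = bulk_payload([{"code": "123", "product_name": "Coffee"}])
```

In practice the official Python client's bulk helpers would handle this framing, but the wire format is the two-line pattern above.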

README.md Outdated

Make sure your environment is configured:
```commandline
export ELASTIC_PASSWORD=PASSWORD_HERE
```
Contributor:
Could you also suggest sensible defaults for the other variables? Or provide an env file maybe?
e.g. for MEM_LIMIT, STACK_VERSION, CLUSTER_NAME, ES_PORT etc.

Contributor:

```commandline
export STACK_VERSION=8.3.3
export CLUSTER_NAME=elasticcluster
export MEM_LIMIT=1g
export ES_PORT=7777
```

seem to be enough to launch the Docker containers.

Contributor Author:

Absolutely - there was an .env file but it wasn't committed due to the .gitignore. I'll add this now.

Note: I found that 2 GB of memory led to better performance, so I'll put that in. It still works well with 1 GB if we're memory constrained.
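Putting the values from this thread together, the committed .env file would presumably look something like this (the password is a placeholder, and MEM_LIMIT reflects the 2 GB suggestion above):

```commandline
ELASTIC_PASSWORD=PASSWORD_HERE
STACK_VERSION=8.3.3
CLUSTER_NAME=elasticcluster
MEM_LIMIT=2g
ES_PORT=7777
```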

README.md Outdated

To start the server:
```console
uvicorn api:app --reload
```
Contributor:

Sorry, I'm a bit lost, as unfortunately I know little Docker and Python. Where do I need to run uvicorn? In one of the Docker containers (elasticvue?), or in a Python virtual env outside of Docker, where I should install everything listed in requirements.txt?

If you could put a bit more detail in the Helpful commands section for newbies like me, it would be very helpful indeed! :)

Our python and docker expert is @alexgarel , but he's on vacation until August 16th.

Contributor Author:

Absolutely! I've now structured this so that the search service runs in Docker. I've also added commands both for using Docker and for running locally.

Note: as part of this I moved everything into an app directory, so the PR diff looks bigger than it is.

@stephanegigandet (Contributor):

That looks awesome, thank you very much @simonj2 !

The CSV export is very partial, so at some point it will be better to switch to the MongoDB export.

There are a few things that we'll need to think about at some point. In particular, taxonomized fields like the categories are not stored with searchable textual names. The list of categories is in categories_tags, an array with entries like "en:coffee". There is a field "categories" which can be in any language and should not be used for search.

One solution could be to include values for one language (e.g. for a French product, if there's categories_tags: "en:coffee", we also include categories: "café"). Then search would work for one language only (not so good for products sold in countries with multiple languages, like Belgium or Canada).

For language specific fields like product_name, we also have entries like product_name_en, product_name_fr etc. (product_name contains a copy of the value for the main language of the product).
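Under that flat layout, a lookup helper (hypothetical, not part of this PR) would prefer the language-suffixed field and fall back to the main-language copy:

```python
def get_localized(product, field, lang):
    # Prefer the language-suffixed field (e.g. product_name_fr); fall back
    # to the bare field, which holds the product's main-language value.
    return product.get(f"{field}_{lang}") or product.get(field)

product = {"product_name": "Coffee", "product_name_fr": "Café"}
french = get_localized(product, "product_name", "fr")  # "Café"
german = get_localized(product, "product_name", "de")  # falls back to "Coffee"
```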

@simonj2 (Contributor Author) commented Aug 2, 2022

Thanks for the feedback @stephanegigandet ! I've incorporated the comments you had about the code.

> The CSV export is very partial, so at some point it will be better to switch to the MongoDB export.

Makes sense - will look at this in the future.

> There are a few things that we'll need to think about at some point. In particular, taxonomized fields like the categories are not stored with searchable textual names. The list of categories is in categories_tags, an array with entries like "en:coffee". There is a field "categories" which can be in any language and should not be used for search.

> One solution could be to include values for one language (e.g. for a French product, if there's categories_tags: "en:coffee", we also include categories: "café"). Then search would work for one language (not so good for products sold in countries with multiple languages, like Belgium or Canada).

> For language specific fields like product_name, we also have entries like product_name_en, product_name_fr etc. (product_name contains a copy of the value for the main language of the product).

Interesting! I think this comes back to the schema discussion. Fields aren't discoverable, documented, or consistent (i.e., sometimes the language is embedded in the value, as in en:coffee, and sometimes it's in the field name, as in product_name_fr).

A unified schema for data representation could solve this, i.e., every field could be stored as a map of language code to value:

```json
{
  "product_name": {
    "en": "Quarter Pounder",
    "fr": "Royale with Cheese"
  }
}
```

This could then be returned to the clients directly, and a convenience API could be offered to return the correct value if a language parameter is provided. The same logic could apply for tags (filtering if a language parameter is provided).
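As a sketch of that convenience API (names here are hypothetical), resolving a lang-to-value map with a fallback to a default language is a one-liner:

```python
def resolve(field, lang, default_lang="en"):
    # Return the requested language's value from a lang -> value map,
    # falling back to the default language if it's missing.
    return field.get(lang, field.get(default_lang))

product_name = {"en": "Quarter Pounder", "fr": "Royale with Cheese"}
fr_name = resolve(product_name, "fr")  # "Royale with Cheese"
de_name = resolve(product_name, "de")  # falls back to "Quarter Pounder"
```

Because the fallback logic lives in one place, new fields added to the schema would get it for free, which is the main argument over per-field consolidation logic.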

I could write logic to do things like:

  • Pull the product_name_fr field into the product_name field in the response if a lang parameter is passed in with fr
  • Consolidate the tags at index time (ie as you suggest by using one language)

But I think that's probably fragile: new fields will break this, and we don't make our API consumers' lives any easier by consolidating fields.

Perhaps let's see how the schema work lands, then it could be incorporated here quite easily?

@stephanegigandet (Contributor) left a comment:

It looks very good to me, thank you!

@simonj2 merged commit b82776d into main on Aug 4, 2022
@teolemon mentioned this pull request on Aug 15, 2022