Refactor connector #319
@jvillafanez great 👍 pls ping me again when close to merge so I can prepare the docs part.
Kudos, SonarCloud Quality Gate passed! 0 Bugs
these features seem to be too good to not merge them... @hodyroff I was just cleaning up and updating projects, when I found this change, which we apparently never merged. I do think our customers would benefit - maybe this is something for the 10.13?
@jvillafanez
No core changes are needed. Everything is part of the app.
In this case, just to be noted, an app upgrade could be made available for regular download from the marketplace, making the change available asap, plus adding it to the default for the 10.13 release. There is no doc restriction going that path.
@jnweiger could we do some QA here?
[full-ci] Remove special PR-based drone actions from CI
Please review and merge into release-2.4.0 branch, I'll build a 2.4.0-rc.1 from there and then start QA.
I don't see a
It wasn't pushed. Sorry. Pushed now, and retargeted this PR.
For testing: search_elastic-2.3.0+refactor_connector.tar.gz
I've opened some issues for dubious search results #323 (comment)
I'd suggest to rename connectors_search to connector_search
I'd suggest to make all new keywords upper case. SIZE, EXT, MTIME, TYPE, MIME
size.b and size.mb is supported. size.kb is missing.
Migration step 6 is unclear: "Then you can completely remove the index from elasticsearch."
-> search:index:reset ? or some 'drop table' SQL in the elastic server?
Screenshot for @mmattel Admin -> Settings -> Search
Latest commit has changed the behavior of the search. The initial post has been updated to reflect those changes.
This will need some voting... I'd rather keep them lowercase 😄
We'd need to send that info because as far as I know, elasticsearch doesn't make any calculations (or it's complex to setup). As far as elasticsearch is concerned, you can send
It's mainly for elasticsearch maintenance, to remove data that won't be used any longer. It's possible to skip that step without any problem on our side, but you have to live with junk data in elasticsearch.
Understood. Instructions on how exactly to remove this junk data are missing.
Most of the problems are fixed in the current state of #319 but I believe there is one regression now
Kudos, SonarCloud Quality Gate passed!
New things coming in this PR:
About the new "RelevanceV2" connector:
Matching the start of a word in the filename will score half of the points. (No longer applies.)
Note that, while the modification time affects the scoring, it doesn't mean that recent files will always be the first ones to appear. Old files might still have a higher score even after those boosts.
The "RelevanceV2" connector also introduces new ways to search for files based on the indexed fields. Note that the following info only applies to the "RelevanceV2" connector.
By default, the "RelevanceV2" connector will search in the name field, and in the file content if possible. Old limitations for the app to index the file content are still in place. Also note that, due to those limitations, big files might not get their contents indexed.
Additional searches you can do with the "RelevanceV2" connector:
ext:pdf, ext:docx, ext:gif, ext:mp4, ext:tar.gz, ext:gz, etc. (any extension is possible)
size.b:<8092, size.b:>102400, size.b:[8092 TO 16184]
size.mb:<3, size.mb:>9, size.mb:[3 TO 9]
type:file, type:folder
mtime:<1678960862, mtime:>1678960862, mtime:[1608111372 TO 1678960862]
mtime:<2021-08-25, mtime:>2023-01-18, mtime:[2022-01-01 TO 2022-12-31]
mime:image, mime:gif, mime:text
NOTE: To search for the whole mimetype such as "image/gif" use mime.key:image\/gif
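For anyone scripting searches, the filter syntax listed above can be assembled with a few small string helpers. This is only a sketch: the function names are hypothetical and not part of the app, and treating a date as UTC midnight when converting it to a timestamp is an assumption, not something this PR states.

```python
from datetime import datetime, timezone


def term(field, value):
    """Simple filter, e.g. term("ext", "pdf") gives "ext:pdf"."""
    return f"{field}:{value}"


def less_than(field, value):
    """Upper-bound filter, e.g. "size.mb:<3"."""
    return f"{field}:<{value}"


def greater_than(field, value):
    """Lower-bound filter, e.g. "size.b:>102400"."""
    return f"{field}:>{value}"


def between(field, low, high):
    """Range filter, e.g. "mtime:[2022-01-01 TO 2022-12-31]"."""
    return f"{field}:[{low} TO {high}]"


def mime_key(mimetype):
    """Whole-mimetype filter; the slash is escaped as in the NOTE above."""
    return "mime.key:" + mimetype.replace("/", "\\/")


def to_epoch(date_string):
    """YYYY-MM-DD to a Unix timestamp, assuming UTC midnight (an assumption)."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


# Terms are separated by spaces, as in the examples in this description.
query = " ".join([
    "confidential",
    greater_than("mtime", "2023-01-01"),
    less_than("size.mb", 10),
])
```

Here `query` ends up as the same string as the first complex search example in this description.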
Old behavior (it doesn't work like this any longer): By default, each search term will be joined with an "OR" operator. For example, brown ext:pdf will be interpreted as "name or content containing brown OR extension = pdf", so "brown.txt" file and "tito.pdf" will appear in the results. You can use brown AND ext:pdf to match pdf files containing brown in the name or contents.
Each search term will narrow the search. For example, brown ext:pdf will be interpreted as "name or content containing brown AND extension = pdf", so "brown.pdf" and "a brown paper.pdf" will appear, but not "brown.txt" or "tito.pdf".
Some examples of complex searches:
confidential mtime:>2023-01-01 size.mb:<10
type:folder size.mb:>1024
mime:image mtime:[2020-03-01 TO 2020-06-30]
pdf or txt files containing "oxygen" or "helium" (no longer applies): (oxygen OR helium) AND (ext:pdf OR ext:txt)
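The narrowing (AND) behavior of brown ext:pdf can be illustrated with a toy in-memory model. This is not how the connector or elasticsearch evaluates a query: it only looks at file names, ignores file content and scoring, and the `matches` function is a hypothetical helper written for this illustration.

```python
def matches(filename, query):
    """Return True only if every space-separated term matches (AND semantics)."""
    name = filename.lower()
    for token in query.lower().split():
        if token.startswith("ext:"):
            # Extension filter: the name must end with ".<extension>".
            ok = name.endswith("." + token[len("ext:"):])
        else:
            # Plain term: a lax substring match on the name only.
            ok = token in name
        if not ok:
            return False
    return True


files = ["brown.pdf", "a brown paper.pdf", "brown.txt", "tito.pdf"]
hits = [f for f in files if matches(f, "brown ext:pdf")]
print(hits)
```

With these four files, only "brown.pdf" and "a brown paper.pdf" pass both terms, which mirrors the example given for the current behavior.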
Note that matching by name is pretty lax, so expect a bunch of unexpected results. Anyway, good results are expected to be on top.
Migrating to the "RelevanceV2" connector:
If you haven't indexed anything yet, you're encouraged to set up the connectors you want to use as part of the app configuration. The recommended one is "RelevanceV2" for write and search.
If you have indexed data, these are the steps to migrate to the new index.
occ search:index:fillSecondary RelevanceV2 <user>
command. The command needs to be run for all the users (or at least the ones using the search feature), and it's expected to take a lot of time.
With step 2 you'll be writing to both indexes at the same time. This is expected to be slower.
Note that step 2 just takes care of new files. Files indexed previously won't be present in the new index. This is why step 3 is there.
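Since the fillSecondary command has to be run once per user, a small script can prepare the command lines. The sketch below only builds and prints them; the user names are placeholders, the helper name is hypothetical, and how occ is actually invoked (path, PHP binary, system user) depends on the installation and is not covered here.

```python
import shlex


def fill_secondary_commands(users, connector="RelevanceV2", occ="occ"):
    """Build one 'search:index:fillSecondary' command line per user."""
    return [[occ, "search:index:fillSecondary", connector, user] for user in users]


# Placeholder user IDs; in practice this list would come from your own user directory.
commands = fill_secondary_commands(["alice", "bob"])
for cmd in commands:
    # Print a shell-safe version of each command for review before running it.
    print(shlex.join(cmd))
```

Each entry is a plain argument list, so it could be passed to `subprocess.run(cmd, check=True)` once the invocation has been adapted to the local setup.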
Step 4 is important and you should stop at that point for a while. If something goes wrong, you can still revert things, in particular, you can switch back to the "Legacy" connector.
From step 5 the actions are irreversible. If you want to go back, you'll have to start a new migration.
It's important to note that no downtime is expected while the migration happens.
Until step 4, the "Legacy" connector will keep updating the index normally. When the search connector is switched, the new "RelevanceV2" connector will access the new index, which should have been fully updated by then.
NOTE: This might not be the final version. This is mostly the state of the PR and things might change without notice. The official documentation is expected to contain the final information once this PR is completely finished.
@mmattel FYI this will need documentation.