Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add initial search in shared files #4207

Merged
merged 7 commits into from
Sep 27, 2013
Merged

Conversation

butonic
Copy link
Member

@butonic butonic commented Jul 26, 2013

@MTGap @icewind1991 customers start to ask for this. Not the best way to implement it because I use the folder backend get the children ids and then use those in a IN (?,...,?) SQL statement. This limits this implementation to the supported maximum length for a sql query ... which will be database specific. But it is a first implementation. I don't understand the sharing code well enough to implement a JOIN based implementation (actually that currently won't be possible because we have to recursively collect the children in a folder before we can filter their filename).

@karlitschek @DeepDiver1975 @bartv2 please review.

@ghost
Copy link

ghost commented Jul 26, 2013

Merged build triggered.

@karlitschek
Copy link
Contributor

👍

@butonic
Copy link
Member Author

butonic commented Jul 26, 2013

fixes #744
cc @felixboehm

@ghost
Copy link

ghost commented Jul 26, 2013

Merged build finished.

@ghost
Copy link

ghost commented Jul 26, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/213/

@icewind1991
Copy link
Contributor

Code makes sense 👍

@karlitschek
Copy link
Contributor

@butonic How big is the risk for backporting?

@butonic
Copy link
Member Author

butonic commented Jul 26, 2013

@karlitschek from a code perspective not big. The change is not very intrusive and the functions / classes I rely on are already there. I am a little unsure of the approach because a user might well have hundreds of thousands of files shared with him. At some point the SQL query will become too long for the RDBMS to handle because the ... IN (?,?,?,...,?) will grow for every additional file that has been shared with the user.
Have a look at the numbers:
mysql: 16MB http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html
postgresql seems to have a config option (7.2 1000 params in IN, 8.0 2048k): http://grokbase.com/t/postgresql/pgsql-general/061mc7pxg6/what-is-the-maximum-length-of-an-in-a-b-c-d-list-in-postgresql
sqlite: has SQLITE_MAX_SQL_LENGTH defaulting to 1000000
oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14237/limits003.htm - "The limit on how long a SQL statement can be depends on many factors, including database configuration, disk space, and memory" very helpful ...
mssql < 2008 will produce a stackdump at ~11500 entries in the IN clause http://support.microsoft.com/kb/288095/en-us, seems to be fixed in newer versions: see http://stackoverflow.com/questions/1869753/maximum-size-for-a-sql-server-query-in-clause-is-there-a-better-approach

so ... no risk, unless people share too many files ;)

@VicDeo
Copy link
Member

VicDeo commented Jul 26, 2013

@butonic iirc MySQL has 1000 placeholders as a limit. @DeepDiver1975 fixed something related to it some time ago.

Should we show these tons of files in dropdown anyway?
It makes sense to cap them at 20-30 and show 'and ~20K more files' below.

UPD here is his PR #271

@DeepDiver1975
Copy link
Member

@owncloud-bot retest this please

@ghost
Copy link

ghost commented Jul 29, 2013

Build started.

@butonic
Copy link
Member Author

butonic commented Jul 29, 2013

A test with 500000 parameters in the IN operator: http://stackoverflow.com/a/15588172/828717
done on mysql with an increased max_allowed_packet.

Limiting the number of search results should also be done in the filecache, so I vote for doing that in a separate PR.

@VicDeo
Copy link
Member

VicDeo commented Jul 29, 2013

@butonic yes, it's sqlite-related, I built similar query, the result is below:

  An exception occurred while executing 'SELECT * from "oc_appconfig" WHERE "app_title" IN(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)': SQLSTATE[HY000]: General error: 1 too many SQL variables

#0 /home/deo/public_html/oc-test/lib/public/db.php(45): OC_DB::prepare('SELECT * from ...', NULL, NULL) #1 /home/deo/public_html/oc-test/apps/files/templates/index.php(136): OCP\DB::prepare('SELECT * from...')

@VicDeo
Copy link
Member

VicDeo commented Jul 29, 2013

and quote from PR I linked above:

Currently the sql query will fail as soon as a certain number of items in the IN condition has been reached.
(1000 for Oracle and sqlite - no idea about the others)
The proposed solution will cut the array into tiny little pieces and execute the statement multiple times.

@butonic butonic mentioned this pull request Jul 30, 2013
8 tasks
@butonic
Copy link
Member Author

butonic commented Jul 31, 2013

@VicDeo I split the ids into chunks of 1k. I also planned to add test to the shared cache tests ... but non are existing. @MTGap could you copy tests/lib/files/cache/cache.php and change setUp() to get a working shared cache test? I tried, but soon found out that Shared_Cache->getSourceCache() expects a correctly shared file / folder.

@ghost
Copy link

ghost commented Jul 31, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/282/

@butonic
Copy link
Member Author

butonic commented Aug 3, 2013

@VicDeo bump for thumb?

@VicDeo
Copy link
Member

VicDeo commented Aug 3, 2013

@butonic it does search and show the results. Is it ok that there is a 404 page once shared entry is clicked?
URL looks like that: http://oc/index.php/apps/files/download/Shared/files/01.jpg

@butonic
Copy link
Member Author

butonic commented Aug 7, 2013

@VicDeo nope, not ok, investigating.

@ghost
Copy link

ghost commented Aug 7, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/423/

@butonic
Copy link
Member Author

butonic commented Aug 8, 2013

@VicDeo this fixes the problem. @DeepDiver1975 please review if you want to search in shared files. @karlitschek can now be backported without risk as the SQL is chunked into queries with max 1k params in the LIKE clause.

@VicDeo
Copy link
Member

VicDeo commented Aug 16, 2013

@butonic 👍 Tested

@gah242s
Copy link

gah242s commented Aug 16, 2013

I realize that this code is tested for master (eventually OC6?), but because of comment by @butonic about backporting, I stuck it into OC 5.0.10. On up-to-date CentOS 6, Apache, MySQL, I get nothing. By nothing, I mean that it doesn't log anything in owncloud.log, access.log, error.log, etc. There's no search drop-down or any indication that it's attempting to look, though top does report that mysqld is utilizing 100% CPU. It does continue to return results for owned files, but nothing for shared. I've been in a shared directory, entered the title of a .jpg file, and still get nothing. So, nothing broken apparently, but also, nothing gained. :-(


$ids = \OCP\Share::getItemsSharedWith('file', \OC_Share_Backend_File::FORMAT_GET_ALL);
foreach ($ids as $file) {
$folderBackend = \OCP\Share::getBackend('folder');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do this outside the foreach? or does getChildren change the state?

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

@karlitschek @VicDeo @bartv2 @DeepDiver1975 @MTGap the shared cache uses getAll() when searching for files, either by mime or by pattern (added with this PR). IMO, getAll() did not work as expected. I assumed it would return all file ids in a share. Seems that is not the case. So I moved the getChildren code to getAll and used that in the search methods. Looking at searchbymime I got the feeling that the code has not been maintained. The returned paths are wrong and the mimetype is only returning our internal numbers. Looking at the implementation of the filecache that is clearly wrong. Fixed that as well. Please review!

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

searchByMime is used by pictures, documents, impress, impressionist, movies and reader. Tested with documents and pictures ... btw @karlitschek can we move gallery / pictures to it's own repo and complete the renaming? It's awkward to search for pictures / gallery and I never find what I'm looking for. It is already called pictures in the ui? Can I haz a separate repo?
bc0zuq1

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

@owncloud-bot retest this please

@ghost
Copy link

ghost commented Aug 28, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/734/


while ($row = $result->fetchRow()) {
if (substr($row['path'], 0, 6)==='files/') {
$row['path'] = substr($row['path'],6); // remove 'files/' from path as it's relative to '/Shared'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's wrong. I'll just fix these methods in #4382.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the status of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icewind1991 To me it seems correct for #4516

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it seems to only work from search results that originates from the main store of the user, I think any result originating from an external storage will still have an incorrect path

For a proper fix, the mountpoint of for the storage which holds the search result should be compared with the user home folder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict files and files_external use / as the root for their mountpoints. only the files_sharing cache returns results prefixed with 'Shared'. feel free to give a code example or highjack this PR. what would a test for this look like?

@DeepDiver1975
Copy link
Member

That's wrong. I'll just fix these methods in #4382.

What would the fix look like for the 'old' sharing implementation?
Please always keep in mind that we need to maintain the old implementation for OC5 which will continue to live for some years.

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

@MTGap what exactly is wrong? The view prepends search results with the path corresponding to the mountpoint / storage. The shared cache should only return the relative path, which is what the code above does. Conditionally. So this is a viable PR to fix broken behaviour which I need in pictures and documents.

@MTGap
Copy link
Contributor

MTGap commented Aug 28, 2013

The shared cache should only return the relative path, which is what the code above does

Yes, but your implementation doesn't work with other mount points or different file targets.

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

@MTGap What other mount points use sharing as the file cache backend? We only have one 'Shared' folder. To my knowledge files_external backends uses searchCommon which I already pointed to: https://github.com/owncloud/core/blob/master/lib/files/view.php#L988 handles prepending the path with the mount points location in the file tree. What do you mean by file targets? The sharing cache only handles files, doesn't it? gallery shares / items are stored in the share table. This PR fixes all file based search results and does not break anything to my knowledge. Please give examples for any problems you see.

@MTGap
Copy link
Contributor

MTGap commented Aug 28, 2013

I'm talking about shared files that are located in external storage. And file targets are unique file names for the user shared with that are not necessarily the same name as for the owner of the file.

@butonic
Copy link
Member Author

butonic commented Aug 28, 2013

@MTGap AFAICT the sharing cache does not return results from eg. search_lucene, encryption, or the gallery that have a path starting with those apps names respectively. Tested this with demo.owncloud.org as an external mount and reshared to another user. he can now find those files fine. Yes, this leaves the cornercase of files residing in a literal 'files' folder. But this is something.

@ghost
Copy link

ghost commented Aug 28, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/745/

@gah242s
Copy link

gah242s commented Sep 5, 2013

Better check this code in OC5 with shared folders that have spaces. A reboot brought it back up, but I was getting blank pages until the reboot.

@VicDeo
Copy link
Member

VicDeo commented Sep 19, 2013

@owncloud-bot retest this please

@VicDeo
Copy link
Member

VicDeo commented Sep 19, 2013

Rebased, tested, still 👍 from me, waiting for CI server

@gah242s can't reproduce with master but we will be careful with backporting

@ghost
Copy link

ghost commented Sep 19, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/1107/

@VicDeo
Copy link
Member

VicDeo commented Sep 19, 2013

@DeepDiver1975 @icewind1991 can we finally have it?

@DeepDiver1975
Copy link
Member

tested 👍

ensure getChildren() is called on an instance of Share_Backend_Collection
@ghost
Copy link

ghost commented Sep 24, 2013

Test passed.
Refer to this link for build results: https://ci.owncloud.org/job/pull-request-analyser/1176/

DeepDiver1975 added a commit that referenced this pull request Sep 27, 2013
add initial search in shared files
@DeepDiver1975 DeepDiver1975 merged commit 735608f into master Sep 27, 2013
@DeepDiver1975 DeepDiver1975 deleted the search_shared_files branch September 27, 2013 12:18
@lock lock bot locked as resolved and limited conversation to collaborators Aug 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants