Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize join for sharedWith by moving logic to DB #27436

Merged
merged 2 commits into from
Jul 28, 2017
Merged

Conversation

mrow4a
Copy link
Contributor

@mrow4a mrow4a commented Mar 20, 2017

In best case it gets rid of doubled queries $shares, $this->shareManager->getSharedWith for both GROUP and USER shares, as in the PR body. Currently, if there are both user and group shares, we need to do 2 queries. With this PR, if we have both group and user shares, we need to do 1 query. If we have only user shares, old and new implementation will do 1 query per user.

selection_102

Anyways, it is big win, since this query does... 2 Left Joins! These are very expensive operations in DBMS for enterprise scale tables.

@mention-bot
Copy link

@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @PVince81, @DeepDiver1975 and @phisch to be potential reviewers.

@@ -136,6 +136,18 @@ public function getShareById($id, $recipientId = null);
*/
public function getSharesByPath(Node $path);


/**
* Get shared with the given user for shares of all available types for this share provider within all nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean with "within" ? do you mean "for all given nodes including their subnodes" ? (and not strictly for the given nodes)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm and $node is singular, not plural but your PHPDoc talks about "all nodes".

Please clarify and adjust the parameter name if applicable.

@@ -68,8 +68,9 @@ public function __construct(IConfig $config, IManager $shareManager, ILogger $lo
* @return \OCP\Files\Mount\IMountPoint[]
*/
public function getMountsForUser(IUser $user, IStorageFactory $storageFactory) {
$shares = $this->shareManager->getSharedWith($user->getUID(), \OCP\Share::SHARE_TYPE_USER, null, -1);
$shares = array_merge($shares, $this->shareManager->getSharedWith($user->getUID(), \OCP\Share::SHARE_TYPE_GROUP, null, -1));
$requiredShareTypes = [\OCP\Share::SHARE_TYPE_USER, \OCP\Share::SHARE_TYPE_GROUP];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, indeed that makes sense

$qb = $this->dbConn->getQueryBuilder();
// Create array of sharedWith objects (target user -> $userId or group of which user is a member
$user = $this->userManager->get($userId);
$allGroups = $this->groupManager->getUserGroups($user, 'sharing');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if passing "sharing" here is correct.

@IljaN could mess up with the guest users ?
Some context: this function is called when retrieving the list of shares for which a SharedMount needs to be setup for the recipient. From my understanding passing "sharing" here risks excluding guest users and those users wouldn't see their received shares mounted.

Needs testing...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah never mind, it's late... this was copied from the old code that already works https://github.com/owncloud/core/pull/27436/files#diff-83377463cf0d718950be360f7fcabd50L710

);

$qb->andWhere($qb->expr()->in('share_with', $qb->createNamedParameter(
$sharedWithUserOrGroupChunk,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will boom boom 💥 💥 if there are users and groups with the same names, say a group called "meow" and also a user called "meow" (yes, I've seen such setups before (well, setups with both user and group called the same, not setups with "meow"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure to add a test for this, test first FTW 😉

Copy link
Contributor Author

@mrow4a mrow4a Mar 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why, these are two separate shares of different types, isnt it? I will try it out though in unit test, good point!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they might be different types.

But say if there is a crazy scenario where user "user1" is in the group "meow".
With this query you'd also select the incoming shares for the user "meow" which is wrong.

The way I see the query like: and ((share_type = GROUP and share_with in ($userGroups) or (share_type = USER and share_with = $userId))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this will slow down the query, will check if this scenario really does not work with the query I implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, boom booms, will write unit test protecting this case and apply a bit slower query if there are any groups

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if the user is not member of any groups you could just discard the group filter bit completely.

@PVince81
Copy link
Contributor

Apart from the "meow" issue (see comments), the code looks good overall.

$user = $this->userManager->get($userId);
$allGroups = $this->groupManager->getUserGroups($user, 'sharing');

$sharedWithUserOrGroup = array_map(function(IGroup $group) { return $group->getGID(); }, $allGroups);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better rewrite this:

$sharedWithUserOrGroup = array_map(function(IGroup $group) {
    return $group->getGID();
}, $allGroups);

->leftJoin('s', 'filecache', 'f', $qb->expr()->eq('s.file_source', 'f.fileid'))
->leftJoin('f', 'storages', 'st', $qb->expr()->eq('f.storage', 'st.numeric_id'));

$qb->where($qb->expr()->in(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not keep the chained calls? I mean:

$qb->select(....)
  ->selectAlias(....)
  ->from(...)
  ->leftJoin(...)
  ->leftJoin(...)
  ->where(....)
  ->andWhere(...)

unless there is a conditional clause that needs to be inserted in the middle, I think it's better this way. If the condition can be moved to the end without harming the performance, I'd move it. If the conditional clause have to be inserted in the middle, then there is no other choice.

/**
* @inheritdoc
*/
public function getAllSharedWith($userId, $shareTypes, $node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment about why $sharetypes is being ignored here is needed. I pass some share types and not only it ignores them, but also uses another type. My first thought is "this is wrong"

* Filter by $node if provided
*
* @param string $userId
* @param int[] $shareTypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need a better explanation... If some constants are expected, either list them or point the user to where the constants are.

@mrow4a mrow4a force-pushed the optimize_sharedwith branch 7 times, most recently from da11834 to 20bc46b Compare March 23, 2017 12:47
* @inheritdoc
*/
public function getAllSharedWith($userId, $node) {
$this->getSharedWith($userId, self::SHARE_TYPE_REMOTE, $node, -1, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing return? there is also no unit test on this I assume

* @inheritdoc
*/
public function getAllSharedWith($userId, $shareTypes, $node = null) {
$shares = array();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[]

@@ -90,7 +90,7 @@ public function moveShare(IShare $share, $recipientId);
* Get all shares shared by (initiated) by the provided user for specific node IDs.
*
* @param string $userId
* @param int[] $shareTypes
* @param \OC\Share\Constants[] $shareTypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OC namespace in an OCP interface? NO

*
* @param string $userId
* @param \OC\Share\Constants[] $shareTypes
* @param Node|null $node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this to the argument declaration

@@ -94,7 +94,7 @@ public function move(\OCP\Share\IShare $share, $recipient);
* Get all shares by the given user
*
* @param string $userId
* @param int $shareType
* @param \OC\Share\Constants $shareType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎 you cannot change the definition of a public api

}

public function testGetAllSharedWith() {
$user = $this->createMock('\OCP\IUser');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IUser::class

@PVince81
Copy link
Contributor

conflicts and comments to be resolved

@mrow4a mrow4a force-pushed the optimize_sharedwith branch from 20bc46b to c6d3ed8 Compare March 28, 2017 10:02
@mrow4a
Copy link
Contributor Author

mrow4a commented Mar 28, 2017

Ready for review if tests are passing.

IQueryBuilder::PARAM_STR_ARRAY
))
),
$qb->expr()->andX(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could have stored these in variables $usersCondition and $groupCondition to avoid duplicating the code here and in the block below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed queries a little and not applicable now

@mrow4a mrow4a force-pushed the optimize_sharedwith branch from c6d3ed8 to d92c4eb Compare March 28, 2017 12:10
$groupShares = [];

// Check if user is member of some groups and chunk them
$sharedWithGroup = array_map(function(IGroup $group) { return $group->getGID(); }, $allGroups);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust code style here:

$sharedWithGroup = array_map(function(IGroup $group) {
  return $group->getGID();
}, $allGroups);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I adjusted to the one which is in the "original function getSharedWith" :> Question means which is "code style" :D


for ($chunkNo = 0; $chunkNo < $chunkNoRequired; $chunkNo++) {
// Query for user/group shares
$qb = $this->dbConn->getQueryBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move all the DB logic to another function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (!isset($providerIdMap[$providerId])) {
$providerIdMap[$providerId] = array();
}
array_push($providerIdMap[$providerId], $shareType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unless you're pushing several items, $providerIdMap[$providerId][] = $shareType might perform better, and also is consistent with the rest of the code.

@mrow4a mrow4a force-pushed the optimize_sharedwith branch from d92c4eb to 3b93a61 Compare March 29, 2017 21:01
@DeepDiver1975 DeepDiver1975 modified the milestones: 10.0.1, 10.1 May 17, 2017
@mrow4a mrow4a force-pushed the optimize_sharedwith branch from 83e4c2b to 3058fd4 Compare May 23, 2017 20:09
* @param Node|null $node
* @return DB\QueryBuilder\IQueryBuilder $qb
*/
public function getSharedWithUserQuery($userId, $node) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created separate function to create query for user shares

* @param Node|null $node
* @return DB\QueryBuilder\IQueryBuilder $qb
*/
public function getSharedWithGroupQuery($groups, $node) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created separate function to create query for group shares

* @param Node|null $node
* @return DB\QueryBuilder\IQueryBuilder $qb
*/
public function getSharedWithUserGroupQuery($groups, $userId, $node) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created separate function to create query for both group and user shares

$groups = $sharedWithGroupChunks[$chunkNo];
if ($chunkNo === 0) {
// First chunk, so we need to obtain user
$qb = $this->getSharedWithUserGroupQuery($groups, $userId, $node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Why the first chunk always calls this one function but other chunks the other one ? What happens if there are more than 100 in a chunk ?

What magic guarantees that the first chunk is only about users ?

Either change the code to be more readable or adjust PHPDoc to explain why and avoid future confusion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or don't cheat and use two arrays, one for users and one for groups instead of putting both in one and using the first chunk with a special meaning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that if there are both user sharess and group shares, dont do 2 queries, do one as above.

In case of chunks, you only to get user shares only once, you dont need to do that for each chunk, this is why with each next chunk you dont fetch user, but only groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these two scenarios this covered by unit tests / integration tests ? if yes then I'm fine...

@mrow4a mrow4a force-pushed the optimize_sharedwith branch 2 times, most recently from 75fb6d0 to 8000bab Compare July 27, 2017 10:58
@mrow4a mrow4a force-pushed the optimize_sharedwith branch from 8000bab to 9279ed3 Compare July 27, 2017 12:12
$cursor->closeCursor();
} else {
// There are groups, query both for user and for groups
$userSharesRetrieved = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PVince81 is it clearer now?

@mrow4a
Copy link
Contributor Author

mrow4a commented Jul 27, 2017

@jvillafanez @PVince81 @DeepDiver1975 more ready it wont be :>

@jvillafanez
Copy link
Member

I have nothing against 👍

@PVince81
Copy link
Contributor

Thanks! Please backport to stable10.

@PVince81 PVince81 merged commit bd2ae81 into master Jul 28, 2017
@PVince81 PVince81 deleted the optimize_sharedwith branch July 28, 2017 08:57
@mrow4a
Copy link
Contributor Author

mrow4a commented Jul 28, 2017

Never backported in my life, so how to do that properly? I need to patch stable10 with this commit? Cherry-pick ?

@PVince81
Copy link
Contributor

@mrow4a branch off stable10, cherry-pick, retest, then send a PR with target stable10

@mrow4a mrow4a mentioned this pull request Jul 28, 2017
@DeepDiver1975 DeepDiver1975 modified the milestones: 10.1, development Oct 10, 2017
@lock
Copy link

lock bot commented Aug 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants