Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go from (User, Permission) to a list of DvObjects #784

Closed
pdurbin opened this issue Jul 30, 2014 · 14 comments
Closed

Go from (User, Permission) to a list of DvObjects #784

pdurbin opened this issue Jul 30, 2014 · 14 comments

Comments

@pdurbin
Copy link
Member

pdurbin commented Jul 30, 2014

I originally wrote this up as a comment on the pluggable auth design but I'm creating an issue to lay out a concrete use case.

In that comment, I was arguing we need something like this...

List<DvObject> = permissionService.listOfDvObjectsUserHasPermissionOn(user, Permission.EditMetadata);

... and that may be true, but let's be more concrete about an actual need.

In our API docs for the Data Deposit API we shipped in DVN 3.6 we explain how to retrieve SWORD service document and what it is:

The service document enumerates the dataverses ("collections" from a SWORD perspective) the user can deposit data into. -- https://github.com/IQSS/dvn/blob/develop/doc/sphinx/source/dataverse-api-main.rst#retrieve-sword-service-document

The key part of this, of course is "enumerates the dataverses the user can deposit data into". How do we do this in 4.0?

In copying and pasting the Data Deposit API code from DVN 3.6 to 4.0 it became clear to me that we used to have a method for this. I added the following as a todo item:

/**
 * @todo All dataverses? listing all dataverse here is horribly
 * inefficient. We need the equivalent of this from 3.x:
 *
 * List<VDC> vdcList = vdcService.getUserVDCs(vdcUser.getId());
 */
List<Dataverse> allDataverses = dataverseService.findAll();
for (Dataverse dataverse : allDataverses) {
    if (swordAuth.hasAccessToModifyDataverse(user, dataverse)) {

The point is that I'm acutely aware that what I stubbed out is non-performant and I'd like to do it the right way but I'm not sure how.

For now I'm assigning this to @michbarsinai for review and comment. Yesterday we chatted a bit about how some denormalization in the database could help.

@pdurbin pdurbin added this to the In Review - Dataverse 4.0 milestone Jul 30, 2014
@pdurbin pdurbin modified the milestones: Beta 8 - Dataverse 4.0, In Review - Dataverse 4.0 Oct 22, 2014
@pdurbin
Copy link
Member Author

pdurbin commented Oct 22, 2014

@michbarsinai since I'm working on the SWORD-based Data Deposit API for the current milestone I moved this ticket there too. Please check out dd83960 where I added the method below that SWORD needs to PermissionServiceBean:

public List<Dataverse> getDataversesUserHasPermissionOn(User user, Permission permission) {
    List<Dataverse> allDataverses = dataverseService.findAll();
    List<Dataverse> dataversesUserHasPermissionOn = new ArrayList<>();
    for (Dataverse dataverse : allDataverses) {
        if (userOn(user, dataverse).has(permission)) {
            dataversesUserHasPermissionOn.add(dataverse);
        }
    }
    return dataversesUserHasPermissionOn;
}

I'm calling it like this from ServiceDocumentManagerImpl:

List<Dataverse> dataverses = permissionService.getDataversesUserHasPermissionOn(user, Permission.AddDataset);

From this list I construct a "Service Document" that shows where the user can deposit data (which is why I'm using Permission.AddDataset).

Obviously, it's inefficient to iterate through every dataverse like this but I don't know if our permission system provides a better way.

@michbarsinai
Copy link
Member

It doesn't. We could add some auxiliary indices at some point, if we feel the performance needs that.
Note that since we hit the network each time we query the DB for a dataverse, this solution may actually be more performant, as the data only travel once, since you make only a single query.

BTW, why not move this method to the PermissionServiceBean? Seems useful.

On 22 Oct, 2014, at 10:18 PM, Philip Durbin [email protected] wrote:

@michbarsinai https://github.com/michbarsinai since I'm working on the SWORD-based Data Deposit API for the current milestone I moved this ticket there too. Please check out dd83960 dd83960 where I added the method below that SWORD needs to PermissionServiceBean:

public List getDataversesUserHasPermissionOn(User user, Permission permission) {
List allDataverses = dataverseService.findAll();
List dataversesUserHasPermissionOn = new ArrayList<>();
for (Dataverse dataverse : allDataverses) {
if (userOn(user, dataverse).has(permission)) {
dataversesUserHasPermissionOn.add(dataverse);
}
}
return dataversesUserHasPermissionOn;
}
I'm calling it like this from ServiceDocumentManagerImpl:

List dataverses = permissionService.getDataversesUserHasPermissionOn(user, Permission.AddDataset);

From this list I construct a "Service Document" that shows where the user can deposit data (which is why I'm using Permission.AddDataset).

Obviously, it's inefficient to iterate through every dataverse like this but I don't know if our permission system provides a better way.


Reply to this email directly or view it on GitHub #784 (comment).

@pdurbin
Copy link
Member Author

pdurbin commented Oct 22, 2014

why not move this method to the PermissionServiceBean

@michbarsinai it's there already. But! I just realized I forgot to put any logic for isPermissionRoot. I tried to add it but then I realized that the root dataverse itself is not permissionRoot:

{
    "alias":"root",
    "name":"Root",
    "contactEmail":"[email protected]",
    "permissionRoot":false,
    "description":"The root dataverse."
}

That can't be right, can it? Shouldn't the root dataverse be permissionRoot:true?

@michbarsinai
Copy link
Member

You can add this as an issue and assign it to me. It's a problem in the Json printer, though. isRoot() is a JPA-style getter, You should use isEffectivelyPermissionRoot()

(insert rant about JPA contaminating the pure object model with getters and setters here)

On 22 Oct, 2014, at 11:42 PM, Philip Durbin [email protected] wrote:

why not move this method to the PermissionServiceBean

@michbarsinai https://github.com/michbarsinai it's there already. But! I just realized I forgot to put any logic for isPermissionRoot. I tried to add it but then I realized that the root dataverse itself is not permissionRoot:

{
"alias":"root",
"name":"Root",
"contactEmail":"[email protected]",
"permissionRoot":false,
"description":"The root dataverse."
}
That can't be right, can it? Shouldn't the root dataverse be permissionRoot:true?


Reply to this email directly or view it on GitHub #784 (comment).

@pdurbin
Copy link
Member Author

pdurbin commented Oct 23, 2014

You can add this as an issue and assign it to me. It's a problem in the Json printer, though. isRoot() is a JPA-style getter, You should use isEffectivelyPermissionRoot()

@michbarsinai ok, I created #987 and assigned it to you.

@pdurbin
Copy link
Member Author

pdurbin commented Jan 12, 2015

@sekmiller and @scolapasta as we discussed today, it would still be a wonderful thing if we had an efficient method. See also getDataversesUserHasPermissionOn at https://github.com/IQSS/dataverse/blob/master/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java which is horribly inefficient and doesn't check permissionRoot.

Because we lack this method I have documented "inefficiency" at http://guides.dataverse.org/en/latest/api/sword.html#known-issues

@michbarsinai
Copy link
Member

Removed the following from a code comment. Keeping it here in case its useful:

In DVN 3.x we had this: List<VDC> vdcList = vdcService.getUserVDCs(vdcUser.getId());

@pdurbin
Copy link
Member Author

pdurbin commented Apr 14, 2015

Now that we are in production, it's clear that the getDataversesUserHasPermissionOn method in PermissionServiceBean is not performant enough for 2901 dataverses. It's taking over 24 minutes:

murphy:dataverse pdurbin$ date; time curl --insecure -L -u d4685e77-e040-4117-ad49-4ca5925545f1: https://localhost:8181/dvn/api/data-deposit/v1.1/swordv2/service-document ; date
Tue Apr 14 13:05:31 EDT 2015
curl: (56) SSLRead() return error -9806

real    24m24.143s
user    0m0.039s
sys 0m0.045s
Tue Apr 14 13:29:55 EDT 2015
murphy:dataverse pdurbin$ 
murphy:dataverse pdurbin$ git diff src
diff --git a/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
index 99b8b07..ac67d56 100644
--- a/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
+++ b/src/main/java/edu/harvard/iq/dataverse/PermissionServiceBean.java
@@ -262,11 +262,15 @@ public class PermissionServiceBean {
      */
     public List<Dataverse> getDataversesUserHasPermissionOn(User user, Permission permission) {
         List<Dataverse> allDataverses = dataverseService.findAll();
+        int numDataverses = allDataverses.size();
+        int count = 1;
         List<Dataverse> dataversesUserHasPermissionOn = new LinkedList<>();
         for (Dataverse dataverse : allDataverses) {
+            System.out.println(count + " of " + numDataverses);
             if (userOn(user, dataverse).has(permission)) {
                 dataversesUserHasPermissionOn.add(dataverse);
             }
+            count++;
         }
         return dataversesUserHasPermissionOn;
     }
murphy:dataverse pdurbin$ 

@raprasad dropped by and he helped me come up with this, which looks promising:

SELECT * FROM dvobject WHERE dtype = 'Dataverse' and id in (select definitionpoint_id from roleassignment where assigneeidentifier in ('@pdurbin', '&group1'));

/cc @rliebz and @axfelix whom I've been chatting with at http://irclog.iq.harvard.edu/dataverse/2015-04-14#i_18331

pdurbin added a commit that referenced this issue Apr 14, 2015
Note the todo that groups are not handled. See also #784
@pdurbin
Copy link
Member Author

pdurbin commented Apr 14, 2015

@michbarsinai we are not sure how to add the groups to the list so the commit at 70548db which I'm putting through QA in #2012 does not respect groups. If you have any ideas, please let me know!

@pdurbin
Copy link
Member Author

pdurbin commented Apr 15, 2015

Rather than 24 minutes, with that patch applied in production (again, groups memberships are ignored, unfortunately), I can retrieve the service document in well under a second:

murphy:dataverse pdurbin$ time curl -s -u $PDURBINKEY: https://dataverse.harvard.edu/dvn/api/data-deposit/v1.1/swordv2/service-document | xmllint -format -
<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title type="text">Harvard</atom:title>
    <collection href="https://dataverse.harvard.edu/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/pdurbin">
      <atom:title type="text">Philip Durbin</atom:title>
      <collectionPolicy xmlns="http://purl.org/net/sword/terms/">There are no API Terms of Use for this Dataverse installation.</collectionPolicy>
      <mediation xmlns="http://purl.org/net/sword/terms/">false</mediation>
      <acceptPackaging xmlns="http://purl.org/net/sword/terms/">http://purl.org/net/sword/package/SimpleZip</acceptPackaging>
    </collection>
  </workspace>
  <generator xmlns="http://www.w3.org/2005/Atom" uri="http://www.swordapp.org/" version="2.0"/>
  <version xmlns="http://purl.org/net/sword/terms/">2.0</version>
</service>

real    0m0.176s
user    0m0.037s
sys 0m0.011s
murphy:dataverse pdurbin$ 

@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin pdurbin self-assigned this May 24, 2016
pdurbin added a commit that referenced this issue May 24, 2016
We are switching away from the old "admin only" permission checks we
were forced to used in the DVN 3.x days when the permission system was
less flexible. Now the rule is that if the permission system doesn't
prevent you from performing an operation (create dataset, upload file)
SWORD won't prevent you either.

Two SWORD operations had to be rewritten to support Dataverse 4
permissions.

First, retrieval of the Service Document with
permissionService.getDataversesUserHasPermissionOn has been changed to
pay attention to a user's groups. It's very common for the
":authenticated-users" group to be given a role that should allow SWORD
operations, for example.

Second, the "list datasets in a dataverse" operation
(listCollectionContents) was rewritten to allow anyone with `AddDataset`
permission to be able to execute the command to they can see if a
dataverse has been published or not but the only datasets that are
returned are those for which the user can issue `UpdateDatasetCommand`.
In the root dataverse, for example, the user should only see their
datasets.
@pdurbin
Copy link
Member Author

pdurbin commented May 24, 2016

I just opened pull request #3137 as a fix for #1070 and #2495 but I'm passing this issue to QA because my new fix is so closely related to the performance fix I put in for #2012 but now I'm restoring group support (except Shib groups) when retrieving the SWORD Service Document.

@michbarsinai I'd be interested to get your feedback on the latest iteration of a method I keep tweaking to go from (User, Permission) to a list of DvObjects (from PermissionServiceBean in pull request #3137):

public List<Dataverse> getDataversesUserHasPermissionOn(AuthenticatedUser user, Permission permission) {

I'm using a native query which feels a little wrong but we switched to native for performance reasons in 70548db. We lost group support back then and in the pull request above I'm adding it back.

I'm moving this issue to QA. From a QA perspective, the question is if the code is performant enough in two cases:

  • Getting the SWORD Service Document with production data
  • Listing datasets in a dataverse with lots of datasets in a single dataverse, such as the root. I think that's how our production data is but I'm not sure.

@pdurbin pdurbin assigned kcondon and unassigned pdurbin May 24, 2016
@michbarsinai
Copy link
Member

Sorry for the late reply - I was snowed under IWPE'16 and DataTags stuff.

A few comments:

  1. We should use prepared query rather than composing and parsing the SQL each time. JDBC supports parameter binding for IN clauses, so that should not be a problem, and would increase performance.
  2. No need to go over the (much larger) dvobject table, since we're only interested in the id of Dataverses. It is enough to select from "dataverse", and again would be much faster.

Other than that, it looks good. I think there's little chance that the identifiers would break the SQL, but if we move to prepared queries (which we totally should) that will no longer be a concern.

-- M

On 24 May 2016, at 12:32, Philip Durbin [email protected] wrote:

I just opened pull request #3137 #3137 as a fix for #1070 #1070 and #2495 #2495 but I'm passing this issue to QA because my new fix is so closely related to the performance fix I put in for #2012 https://github.com/IQSS/dataverse/%20issues/2012 but now I'm restoring group support (except Shib groups) when retrieving the SWORD Service Document.

@michbarsinai https://github.com/michbarsinai I'd be interested to get your feedback on the latest iteration of a method I keep tweaking to go from (User, Permission) to a list of DvObjects (from PermissionServiceBean in pull request #3137 #3137):

public List getDataversesUserHasPermissionOn(AuthenticatedUser user, Permission permission) {

I'm using a native query which feels a little wrong but we switched to native for performance reasons in 70548db 70548db. We lost group support back then and in the pull request above I'm adding it back.

I'm moving this issue to QA. From a QA perspective, the question is if the code is performant enough in two cases:

Getting the SWORD Service Document with production data
Listing datasets in a dataverse with lots of datasets in a single dataverse, such as the root. I think that's how our production data is but I'm not sure.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub #784 (comment)

@kcondon
Copy link
Contributor

kcondon commented Jul 19, 2016

Leaving in QA but removing my name until I get pull request.

@kcondon
Copy link
Contributor

kcondon commented Jul 21, 2016

Tested service doc in test w/ prod data (21s) and prod (16s) from root as admin.
Tested list datasets from antislaverypetitions dv in test w/ prod data (39s) and prod (25s).

Apples to apples comparison on same test box:
service doc: v4.5 21s, v4.4 10s
dataset list: v4.5 39s, v4.4 11s

Though slower, these test cases are likely the worst ones and fall within the 60s apache timeout.

We should look at improving these times if possible but do not seem to be a blocker.

@kcondon kcondon closed this as completed Jul 21, 2016
kcondon added a commit that referenced this issue Jul 21, 2016
Sword Auth: use Dataverse 4 permission model #1070 #2495 #784
@pdurbin pdurbin added this to the 4.5 milestone Jul 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants