Data restricted to Shibboleth groups cannot be explored in TwoRavens or downloaded via curl #3447
Comments
I haven't really talked about workarounds yet. The first workaround to try is IP Groups, if we're mostly trying to restrict data to Harvard campus IP addresses. There isn't much documentation, but there's a bit at http://guides.dataverse.org/en/4.5.1/api/native-api.html#ipgroups . Another workaround is to create an "explicit" group and assign all the users to it; that explicit group could then be given permission at the dataverse, dataset, or file level. There's a security wrinkle here that I don't quite understand and that @landreev could explain better: something about using an IP Group containing just the Rserve server (?) so that it can download the restricted files?
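A minimal sketch of the IP Group idea, assuming an IP Group is simply a set of inclusive IPv4 address ranges checked against the request's source address at request time. The class IpGroupSketch, its methods, and the example range (the documentation-only 192.0.2.0/24 block) are hypothetical; this is not Dataverse's implementation, and IPv6 is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical IP Group: a named set of inclusive IPv4 ranges.
// Membership is decided per request from the caller's address;
// nothing about the user is stored.
final class IpGroupSketch {
    private final String alias;
    private final List<long[]> ranges = new ArrayList<>(); // each entry is {start, end}

    IpGroupSketch(String alias) {
        this.alias = alias;
    }

    String alias() {
        return alias;
    }

    // Adds an inclusive range, e.g. 192.0.2.0 - 192.0.2.255.
    IpGroupSketch addRange(String start, String end) {
        long lo = toLong(start);
        long hi = toLong(end);
        if (lo > hi) {
            throw new IllegalArgumentException("range start is after range end");
        }
        ranges.add(new long[] {lo, hi});
        return this;
    }

    // True if the request's source address falls inside any configured range.
    boolean contains(String requestIp) {
        long ip = toLong(requestIp);
        for (long[] r : ranges) {
            if (ip >= r[0] && ip <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Parses a dotted-quad IPv4 address into an unsigned 32-bit value held in a long.
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }
}

class IpGroupDemo {
    public static void main(String[] args) {
        IpGroupSketch campus = new IpGroupSketch("campus").addRange("192.0.2.0", "192.0.2.255");
        System.out.println(campus.contains("192.0.2.17"));   // true: "on campus"
        System.out.println(campus.contains("198.51.100.5")); // false: "at home"
    }
}
```

An explicit group, by contrast, is a stored list of user accounts, so its membership does not depend on where a request comes from.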
@pdurbin -
#3151 is about how IP Groups shouldn't allow access to level 2 data or above. I guess we'd make an exception in this case? Could Rserve authenticate to Dataverse as a user rather than an IP Group? I'm asking because one workaround would be to create a dedicated user for Rserve and flip the
I think this is the only way to allow an IP Group to download any file, because permissions are not inherited from a parent dataverse to a child dataverse (#2447). The way I think of it, each dataverse is an island of permission. @scolapasta @kcondon @michbarsinai and @landreev can fact-check me on this. 😄 #alternativefacts
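The "island of permission" idea can be illustrated with a toy model, assuming a permission check consults only the role assignments made directly on a dataverse and never falls back to its parent. DataverseNode, IslandDemo, the assignee name campusIpGroup, and the permission label "download" are all hypothetical; this sketches the behavior as stated in the comment, not Dataverse's actual permission code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical dataverse with role assignments made directly on it.
final class DataverseNode {
    final String alias;
    final DataverseNode parent; // kept for structure only; never consulted for permissions
    private final Map<String, Set<String>> grants = new HashMap<>(); // assignee -> permissions

    DataverseNode(String alias, DataverseNode parent) {
        this.alias = alias;
        this.parent = parent;
    }

    void grant(String assignee, String permission) {
        grants.computeIfAbsent(assignee, k -> new HashSet<>()).add(permission);
    }

    // "Island of permission": only assignments on this dataverse count.
    // There is deliberately no fallback to parent.hasPermission(...).
    boolean hasPermission(String assignee, String permission) {
        return grants.getOrDefault(assignee, Set.of()).contains(permission);
    }
}

class IslandDemo {
    public static void main(String[] args) {
        DataverseNode root = new DataverseNode("root", null);
        DataverseNode child = new DataverseNode("child", root);
        root.grant("campusIpGroup", "download");
        System.out.println(root.hasPermission("campusIpGroup", "download"));  // true
        System.out.println(child.hasPermission("campusIpGroup", "download")); // false
    }
}
```

Under this model the grant has to be repeated on each dataverse that should be reachable.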
This doesn't seem to be a high priority, so I'm closing this issue. If I'm wrong, let's reopen it some day.
As first reported at https://help.hmdc.harvard.edu/Ticket/Display.html?id=243061 , we have a case at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/T2DQQS where the files are restricted and access has been granted to the institution-wide Shibboleth group for Harvard. The files can be downloaded, but they cannot be explored in TwoRavens. Here's a screenshot of the poor user experience (no "data pebbles"):
This morning @landreev @kcondon @scolapasta @djbrooke and I discussed this ticket, and we agreed that I would create this issue and provide background, since I'm the one who implemented institution-wide Shibboleth groups in #1401 and documented them at http://guides.dataverse.org/en/4.5.1/installation/shibboleth.html#institution-wide-shibboleth-groups
We came up with a lot of ideas for how we could fix the bug (comments welcome), but I'd like to explain why these Shib groups work the way they do.
We are trying to be careful to detect when a Harvard affiliate, for example, has left Harvard and should no longer have access to restricted data. If you can't log in with HarvardKey anymore, you shouldn't be able to download data that is restricted to Harvard users. Toward that end, institution-wide Shibboleth groups are implemented as runtime groups, just as IP Groups are. (You might expect to be able to download data from a Harvard IP address when you're on campus, but you understand that at home, where you have a non-Harvard IP address, you won't be able to download the restricted file.) Dataverse knows you are a Harvard affiliate because you have logged in successfully with HarvardKey, and HarvardKey has asserted back to Dataverse information about you, including your name, your email address, and, most importantly for this case, which Identity Provider (IdP) you came from. For Harvard, that is the Entity ID "https://fed.huit.harvard.edu/idp/shibboleth", as seen at https://incommon.org/federation/info/entity.html?entityID=https%3A%2F%2Ffed.huit.harvard.edu%2Fidp%2Fshibboleth&technical=true .
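To illustrate what "runtime group" means here, the sketch below computes membership from a per-request value (the IdP entity ID asserted at login) instead of reading a stored membership list, so someone who can no longer log in with HarvardKey simply never matches. RequestContext, ShibGroupSketch, and RuntimeGroupDemo are hypothetical names, not Dataverse classes.

```java
import java.util.Optional;

// Hypothetical per-request context. The IdP entity ID is present only if the
// user logged in through Shibboleth in the current session; it is never persisted.
final class RequestContext {
    private final String shibIdpEntityId; // null when no Shibboleth login happened

    RequestContext(String shibIdpEntityId) {
        this.shibIdpEntityId = shibIdpEntityId;
    }

    Optional<String> shibIdp() {
        return Optional.ofNullable(shibIdpEntityId);
    }
}

// Hypothetical institution-wide Shibboleth group, modeled as a runtime group:
// membership is computed from the request, not read from a membership table.
final class ShibGroupSketch {
    private final String name;
    private final String entityId;

    ShibGroupSketch(String name, String entityId) {
        this.name = name;
        this.entityId = entityId;
    }

    String name() {
        return name;
    }

    boolean contains(RequestContext ctx) {
        // No IdP in this request means no membership.
        return ctx.shibIdp().map(entityId::equals).orElse(false);
    }
}

class RuntimeGroupDemo {
    public static void main(String[] args) {
        String harvardIdp = "https://fed.huit.harvard.edu/idp/shibboleth";
        ShibGroupSketch harvard = new ShibGroupSketch("Harvard", harvardIdp);
        System.out.println(harvard.contains(new RequestContext(harvardIdp))); // true
        System.out.println(harvard.contains(new RequestContext(null)));       // false
    }
}
```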
So, once we have "https://fed.huit.harvard.edu/idp/shibboleth", where do we put it? We store it in the persistentuserid column of the authenticateduserlookup table, followed by the pipe character (|), followed by the unique identifier for the user. It looks something like this: https://fed.huit.harvard.edu/idp/shibboleth|[email protected] . This is true not only for HarvardKey users, of course, but for users from all of the 200+ institutions that can log in.
For most of those 200 institutions, that's the only place where this Entity ID is stored, and it's basically a no-op, permissions-wise. But for institutions where we've bothered to set up an institution-wide Shibboleth group (currently only Harvard and MIT, I think; we'd like to automate this in #1403), at runtime we match the user's IdP with the group (stored in a table called "shibgroup") and consider the user part of that group from a permissions point of view. It's that Shib group that is given "download permission" or whatever on a particular file or dataset. Again, the important thing here is that this only happens at runtime. In the code, as of 4.5.1, it only happens in the GUI, with the line au.setShibIdentityProvider(shibIdp) at https://github.com/IQSS/dataverse/blob/v4.5.1/src/main/java/edu/harvard/iq/dataverse/Shib.java#L338 . Note that the variable is annotated as @Transient. It's not persisted anywhere. Runtime only.
I hope this background helps. I could go on and on, but I think this is a good start. In short, we rely on the GUI (the browser) to determine at runtime whether you're a Harvard user. I mention curl in the title because it's another way to test this: you can't download files restricted to a Shib group via curl, because the API token alone isn't enough. There's a related issue here: if someone leaves Harvard and can't log in to HarvardKey anymore, their API token will still work. Not forever, but until it expires, which I think is a year after creation? We'll know more after we dig into #3398.
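The storage format and runtime matching described in the issue can be sketched as follows, under stated assumptions: the persistentuserid value is split at the first pipe into the IdP entity ID and the user's identifier; the "shibgroup" table is modeled as an in-memory map; and, mirroring the @Transient field, the IdP used for group matching is set only on the browser login path. ShibLookupSketch, SessionUser, loginViaBrowser, loginViaApiToken, and the identifier jdoe@example.edu are hypothetical, not Dataverse code.

```java
import java.util.HashMap;
import java.util.Map;

final class ShibLookupSketch {

    // Splits "entityId|uniqueId" at the first pipe; returns {entityId, uniqueId}.
    static String[] parsePersistentUserId(String persistentUserId) {
        int bar = persistentUserId.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("no '|' separator: " + persistentUserId);
        }
        return new String[] {persistentUserId.substring(0, bar), persistentUserId.substring(bar + 1)};
    }

    // Stand-in for a logged-in user. shibIdp plays the role of the @Transient
    // field: it exists only in memory for the current session.
    static final class SessionUser {
        final String persistentUserId;
        String shibIdp; // null unless set during a browser Shibboleth login

        SessionUser(String persistentUserId) {
            this.persistentUserId = persistentUserId;
        }
    }

    // Stand-in for the "shibgroup" table: entity ID -> group name.
    private final Map<String, String> shibGroupsByEntityId = new HashMap<>();

    void addShibGroup(String entityId, String groupName) {
        shibGroupsByEntityId.put(entityId, groupName);
    }

    // Browser path: the IdP asserted by Shibboleth is copied onto the session user.
    static SessionUser loginViaBrowser(String persistentUserId, String assertedIdp) {
        SessionUser u = new SessionUser(persistentUserId);
        u.shibIdp = assertedIdp;
        return u;
    }

    // API-token path: the user is found, but nothing sets shibIdp.
    static SessionUser loginViaApiToken(String persistentUserId) {
        return new SessionUser(persistentUserId);
    }

    // Runtime match: only the transient IdP is consulted, not the stored persistentuserid.
    String shibGroupFor(SessionUser u) {
        return u.shibIdp == null ? null : shibGroupsByEntityId.get(u.shibIdp);
    }
}

class ShibLookupDemo {
    public static void main(String[] args) {
        String harvardIdp = "https://fed.huit.harvard.edu/idp/shibboleth";
        String stored = harvardIdp + "|jdoe@example.edu"; // made-up identifier
        ShibLookupSketch s = new ShibLookupSketch();
        s.addShibGroup(harvardIdp, "Harvard");

        System.out.println(ShibLookupSketch.parsePersistentUserId(stored)[0]);            // the entity ID
        System.out.println(s.shibGroupFor(ShibLookupSketch.loginViaBrowser(stored, harvardIdp))); // Harvard
        System.out.println(s.shibGroupFor(ShibLookupSketch.loginViaApiToken(stored)));           // null
    }
}
```

In this sketch the API-token user carries the same stored persistentuserid as the browser user, yet never matches the Shib group, which is the curl behavior the issue describes.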