[Planning] Allow serving user generated content from a separate domain #1932
I'm told we can do this. =D
I'm told that we can do this too. The main point of discussion seems to be security.

Background

This blog post covers most of the important points for us, so I'll transclude a portion of it here: https://security.googleblog.com/2012/08/content-hosting-for-modern-web.html
We have a very similar situation to Google here. We should consider our risk model and probably match it to theirs (they do have some smart people, I'm told). Additionally, they raise a very good point: embedded images. This will be an interesting point for us. (Update: it is a very boring point. Yay!)

Comparisons

I, personally, would put everything any of us work on squarely in the "low-risk" class. Google uses this class for user-generated documents with a public share link. That link is a globally valid, longer-lived URL. I see Galaxy history elements as being very much analogous to that. We could class things in a higher-risk category, but that would likely involve serious tradeoffs in usability.

Another comparison is to GitHub and their private repos. They provide an auth token on all "raw" links in private repositories. As mentioned in Google's comments, we cannot just use cookies to authenticate the user. We could use the URL as authenticating information (the user knows the history ID and the dataset ID, which are two random tokens). However, this suffers from the undesirable property that access cannot be changed: once a history is public, anyone knowing that URL would have access to it.

Galaxy Implementation
Usability

Before we go into possible solutions, we know that we're balancing usability with security here (like always), so what usability do we support?
Honestly, UGC on a separate domain is starting to sound like a pretty darn good deal ;) This means that whatever implementation we choose, we're already gaining a number of wins over the status quo. That might induce us to consider higher-security implementations, given that they will still be advantageous over what we're doing now, and we would not be losing much by choosing them.

Implementation comments
The following comment was made by @natefoo in IRC.
which is a very valid thing to consider. For GitHub, the token gets you permanent access to that version of that file.
That is a less appealing model given that our datasets never change, so we should explore more of the options and, more importantly, our threat model.

Threats

The most plausible threat is that a user will accidentally publish a secret token somewhere and then be unable to revoke it.

Permanent Tokens

Easy. So easy. It is interesting to note that GitHub does not consider this a significant enough threat to defend against. Do we need to re-evaluate our threat model?

Further, we could use "permanent tokens" but allow manually resetting them in the case of security breaches. A "reset token" button could be added to the pencil icon menu (do these things have names?) for individual datasets, and we could have a history-level reset that functions similarly. This would seemingly strike a nice balance between usability (tokens don't randomly stop working, they stop working at defined events) and security (tokens can stop working at user-defined events). Additionally, the history/dataset-level resets would be relatively simple to implement: just NULLify any token associated with a dataset.

But what about collaborators? Anyone the user has shared this history with will need an access token to access the dataset. Either we use the owner's access token, or we have per-user access tokens. If we use the owner's access token, let us assume one of the collaborators is evil and publishes all of the access tokens to their friends. The friends, without proper accounts or authentication, could access the datasets. The owner must then find out about this and revoke the tokens in order to stop the attack. If we use per-user access tokens, our database model becomes more complex, and this attack is still possible.

This is a mess. Let's move on.

Non-permanent Tokens

Here we have to define events during which tokens are reset, or choose to reset on every access. Let's assume that we reset on access.
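For concreteness, here is a minimal sketch of what "reset on access" could look like. It is not Galaxy's existing single-use token code; `SingleUseTokenStore`, `issue`, and `redeem` are hypothetical names, and the in-memory dict stands in for whatever database table would really hold the tokens.

```python
import secrets


class SingleUseTokenStore:
    """Hypothetical in-memory stand-in for a token table (not Galaxy's schema)."""

    def __init__(self):
        self._tokens = {}  # token -> dataset_id

    def issue(self, dataset_id):
        # A fresh, unguessable token is minted for every access.
        token = secrets.token_urlsafe(32)
        self._tokens[token] = dataset_id
        return token

    def redeem(self, token, dataset_id):
        # pop() consumes the token on any attempt, so a leaked or replayed
        # token is useless after its first use, even a mismatched one.
        return self._tokens.pop(token, None) == dataset_id


store = SingleUseTokenStore()
token = store.issue("dataset-1")
print(store.redeem(token, "dataset-1"))  # first access succeeds
print(store.redeem(token, "dataset-1"))  # replay of the same token is rejected
```

With this shape there is nothing for the user to revoke: a token that gets pasted somewhere by accident has most likely already been consumed.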
I am told that Galaxy has code for single-use tokens in the codebase already, so this may not be prohibitive to implement.

Implementation Conclusions

If we re-examine @natefoo's statement, there's an interesting clause:
If we follow GitHub's implementation, this is not the case. Due to the redirection, the user doesn't see the token unless they specifically request to view a file. We can remove that clause, and consider ourselves safe to use a permanent token for URL access. However, we can go further and apply single-use tokens.

Noting this clause, I probably could have started my discussion here, but y'all get my entire thought process instead 😉

Author's Conclusions
Originally I was very much in favour of permanent access tokens, but after writing this, I'm strongly in favour of single-use tokens.
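A rough sketch of the redirect flow with single-use tokens, under stated assumptions: the main domain checks access with its normal cookie session, mints a token, and redirects to the content domain, which only ever sees the token. The origin `usercontent.example.org`, the `/datasets/<id>?token=...` URL layout, and the function names are all made up for illustration.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

UGC_ORIGIN = "https://usercontent.example.org"  # hypothetical separate domain

_pending = {}  # token -> dataset_id; stand-in for a database table


def redirect_location(dataset_id, user_may_read):
    """Runs on the main domain, after the usual cookie-based access check."""
    if not user_may_read:
        return None  # the caller would answer 403 instead of redirecting
    token = secrets.token_urlsafe(32)
    _pending[token] = dataset_id
    query = urlencode({"token": token})
    return f"{UGC_ORIGIN}/datasets/{dataset_id}?{query}"


def ugc_may_serve(url):
    """Runs on the content domain, which never receives the session cookie."""
    parts = urlsplit(url)
    dataset_id = parts.path.rsplit("/", 1)[-1]
    token = parse_qs(parts.query).get("token", [""])[0]
    # The token is consumed here, so the redirect target works exactly once.
    return _pending.pop(token, None) == dataset_id
```

Because the token only appears in the redirect target, a user following ordinary links never handles it directly, and copying the final URL out of the address bar yields a link that has already been used.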
Thank you for this analysis, it is amaaaaaaaazing, and thanks for leaving your thought process. I'm sure people will have questions about whatever we implement, and they can be referred to this. Like you, I started out thinking permanent revocable tokens were the way to go, but I think you've swung me to single-use tokens (the redirect helps a lot).
awwwwwww, thanks <3 yeah, I think the redirect solves nearly all of the potential issues with adding this feature by making it completely transparent to end users and generally a zero impact change if they're doing everything how they're supposed to.
xref Trello