add secure data capability #20

Closed
6 tasks
tomkralidis opened this issue Nov 9, 2021 · 12 comments
Comments

@tomkralidis
Collaborator

tomkralidis commented Nov 9, 2021

User story

As an operator, I want to add access control to some data so that sensitive resources require authentication and authorization.

Acceptance criteria

  • operator can add password protection to resources in a given topic hierarchy
  • users cannot access data which is not public
  • user can discover the existence of access controlled data, but will not be provided access controlled links
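The third criterion could be sketched roughly as follows: an unauthorized user still sees that the record exists, but its links are withheld. The record layout (a dict with an `access_controlled` flag and a `links` list) and the function name `redact_record` are hypothetical and only for illustration; they are not wis2box's actual data model.

```python
import copy


def redact_record(record, authorized):
    """Return a copy of the record, with links removed when the record
    is access controlled and the caller is not authorized."""
    redacted = copy.deepcopy(record)  # never mutate the stored record
    if record.get('access_controlled', False) and not authorized:
        redacted['links'] = []
    return redacted


# Hypothetical discovery record for an access controlled dataset
record = {
    'id': 'data.recommended.foo',
    'access_controlled': True,
    'links': [{'href': 'https://example.org/data/recommended/foo/obs.bufr4'}],
}

public_view = redact_record(record, authorized=False)
private_view = redact_record(record, authorized=True)
```

Here `public_view` keeps the `id` so the data remains discoverable, while only `private_view` carries the link.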

Definition of done

  • unit/functional tests added/updated
  • documentation updated
  • live demonstration provided
@tomkralidis tomkralidis added the user story User story label Nov 9, 2021
@efucile efucile added the access control Access control label Nov 9, 2021
@efucile efucile modified the milestones: 0.1.0, 0.2.0 Nov 10, 2021
@tomkralidis tomkralidis modified the milestones: sprint-002, sprint-003 Feb 28, 2022
@tomkralidis
Collaborator Author

Notes from 2022-03-09 discussion:

Terms

Authentication: verification of identity
Authorization: verification of access

Scope

In:

  • WAF
  • API
  • recommended/restricted data

Out:

  • core/public data
  • broker
    • delegates to WAF
    • inline messages are not in scope for wis2box

Considerations

  • one setup to be shared between WAF and API
  • if data permissions change (recommended to core, or vice-versa)
    • this should be communicated by a message/notification
  • need to assess existing tooling
  • can/should we use external services for authentication?
  • authorization will need to be tied to data structure
  • granularity: delineate by dataset collection
  • for a network of stations with hybrid access control, they can be grouped by core/recommended into distinct dataset collections

Next steps

@petersilva
Contributor

adding #140 as a pre-requisite for this.

@tomkralidis
Collaborator Author

@petersilva
Contributor

I just wanted to point out: in Sarracenia, we implemented bearer_token support to work with NOAA sites ( https://omisips1.omisips.eosdis.nasa.gov ). The above example implements an "access_token". I'm not sure what standards apply to this stuff, or whether everyone uses their own and every client is supposed to use custom JavaScript... it's odd.

https://oauth.net/2/bearer-tokens/#:~:text=Bearer%20Tokens%20are%20the%20predominant,such%20as%20JSON%20Web%20Tokens

https://swagger.io/docs/specification/authentication/bearer-authentication/

I'm not sure if the two things are describing the same mechanism or not ... the implementation of bearer token support was quite simple:

    headers = {'user-agent': 'Sarracenia ' + sarracenia.__version__}
    if self.bearer_token:
        logger.debug('bearer_token: %s' % self.bearer_token)
        headers['Authorization'] = 'Bearer ' + self.bearer_token
    ....
    urllib.request.Request(self.urlstr, headers=headers)

One just includes the Authorization header when opening the request. I get the feeling OAUTH2 is huge and has many options, so one can have completely different implementations of "OAUTH2" that don't work with each other because they implement different options or parts of it. The spec is open to many different use cases.
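For reference, a minimal standalone sketch of the same idea using only the standard library. The function name `build_request`, the default user agent, the token value and the example.org URL are placeholders, not part of Sarracenia or wis2box.

```python
import urllib.request


def build_request(url, bearer_token=None, user_agent='example-client'):
    """Build a request, adding an Authorization header only when a
    bearer token is supplied."""
    headers = {'User-Agent': user_agent}
    if bearer_token:
        # Header form is "Authorization: Bearer <token>"
        headers['Authorization'] = 'Bearer ' + bearer_token
    return urllib.request.Request(url, headers=headers)


# Placeholder URL and token; nothing is sent until urlopen() is called
req = build_request('https://example.org/data/recommended/foo',
                    bearer_token='s3cr3t')
anonymous = build_request('https://example.org/data/core/bar')
```

Passing `req` to `urllib.request.urlopen()` would send the header; this sketch avoids logging the token, since anyone holding a bearer token can use it.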

We might want to clarify what, beyond just OAUTH2, needs to be implemented.

Found a link about the different token varieties:

https://www.c-sharpcorner.com/article/accesstoken-vs-id-token-vs-refresh-token-what-whywhen/#:~:text=Access%20tokens%20are%20credentials%20used,that%20bearer%20tokens%20be%20protected.

Perhaps good to target bearer_token as a first pass; it is likely sufficient for our needs. The access token approach seems to involve continuous replacement of tokens, and looks a lot more complicated for the client to deal with.

@tomkralidis
Collaborator Author

We should use "static" tokens so that users do not have to refresh tokens on expiry, etc.

Needs to be shared across nginx and API.
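A rough sketch of what static tokens could look like, assuming one long-lived token per topic held in a single store that both nginx and the API consult. The in-memory dict and the names `issue_token` and `token_grants` are hypothetical; how the store is actually shared is left open here.

```python
import hmac
import secrets


def issue_token():
    """Generate a random static token (no expiry, no refresh)."""
    return secrets.token_urlsafe(32)


def token_grants(tokens, presented, topic):
    """Check a presented token against the one stored for a topic.

    tokens: dict mapping topic -> static token (assumed store layout).
    """
    expected = tokens.get(topic)
    if expected is None or presented is None:
        return False
    # Constant-time comparison avoids leaking token contents via timing
    return hmac.compare_digest(expected.encode(), presented.encode())


tokens = {'data.recommended.foo': issue_token()}
good = tokens['data.recommended.foo']
```

The trade-off is that a static token stays valid until the operator replaces it, so revocation means issuing a new token for that topic.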

@webb-ben
Member

@tomkralidis and I have been iterating a bit over the last couple of days. Going to collate some of the considerations we have discussed thus far. @petersilva interested to hear your thoughts!

  • Multiple containers need authorization and access control. To satisfy Cloud service / on-premises deployment #22, ideally, these would happen all in one layer & container. That could be the webserver authenticating traffic or an additional auth container.
  • We will need to implement custom access control for the API and WAF: yes/no access on a per-dataset or per-topic-hierarchy basis. In the API, that would be at the collection level, which shares its name with the dotpath of the topic hierarchy. For the WAF, the topic hierarchy is (by necessity) a file path.
authz
=====
username,dataset
tomkralidis,data.recommended.foo
benwebb,data.recommended.foo
  • The logic for authorization will probably involve either modifying the nginx configuration or making a sub-request to an authentication server. I am partial to the latter; a sub-request feels more digestible than modifying configuration files and could be done via proxy_pass.
  • Should authentication & authorization happen for every request, with default authn/authz validation for anything considered “open data”? Or do we separate that from the topic hierarchy and settle it at the topmost level? This needs to be addressed for both the API and the WAF.
Restricted
/data/restricted/data/recommended/1/2/3/4/foo

Open
/data/public/data/core/5/6/7/8/bar
/data/public/data/recommended/11/22/33/44/baz
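To make the authz table and the sub-request idea concrete, here is a minimal sketch of the decision an authentication server might make. It assumes the `username,dataset` layout shown above and, as one possible answer to the open question, treats any dataset absent from the table as open data. The function names and the choice of status codes (200 allow, 401 unauthenticated, 403 forbidden) are illustrative.

```python
import csv
import io

# Same rows as the authz example above
AUTHZ_CSV = """username,dataset
tomkralidis,data.recommended.foo
benwebb,data.recommended.foo
"""


def load_authz(text):
    """Parse the table into {dataset: {usernames allowed}}."""
    grants = {}
    for row in csv.DictReader(io.StringIO(text)):
        grants.setdefault(row['dataset'], set()).add(row['username'])
    return grants


def auth_status(grants, dataset, username=None):
    """HTTP status a sub-request handler could return for a request."""
    if dataset not in grants:
        return 200  # assumption: not listed means open data
    if username is None:
        return 401  # restricted, caller not authenticated
    return 200 if username in grants[dataset] else 403


grants = load_authz(AUTHZ_CSV)
```

Because the decision is a plain function of dataset and user, the same logic could sit behind both the WAF and the API, which is the "one layer" goal described above.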

@petersilva
Copy link
Contributor

I think we should constrain things to be at least entire folders, aka topics, and not allow the same topic to contain both restricted and public files. I don't think we can restrict more finely than that...

@tomkralidis
Collaborator Author

@petersilva agreed: for the WAF, access control is on directories; for the API, this equates to dataset collections.
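A small sketch of that equivalence, assuming the WAF root is `/data/restricted` (taken from the example paths earlier in the thread) and that a restricted topic covers everything beneath it. The function names and the root default are hypothetical.

```python
from pathlib import PurePosixPath


def path_to_topic(path, waf_root='/data/restricted'):
    """Convert a WAF directory path into a dotpath topic / collection id.

    Raises ValueError if the path is not under waf_root.
    """
    relative = PurePosixPath(path).relative_to(waf_root)
    return '.'.join(relative.parts)


def is_restricted(topic, restricted_topics):
    """True if the topic equals, or sits beneath, a restricted topic."""
    return any(topic == r or topic.startswith(r + '.')
               for r in restricted_topics)


topic = path_to_topic('/data/restricted/data/recommended/1/2/3/4/foo')
```

Matching on `r + '.'` rather than a bare prefix keeps `data.recommended.11` from being caught by a rule for `data.recommended.1`.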

@webb-ben
Member

I think we are all in agreement on this point

@tomkralidis
Collaborator Author

Updates:

@tomkralidis
Collaborator Author

We should also cover cases for embedded data. In this case, a possible option would be to have the entire channel authenticated. We should be able to advertise different data on different channels.

@tomkralidis
Collaborator Author

Initial capability now in main branch.


4 participants