Implement metadata support in server (to replace Graphql) #2048
Just a clarifying note: the motivation for using GraphQL in the dashboard was to allow efficient access to, and mutation of, a small subset of a potentially large volume of user metadata stored in the backend DB (currently PostgreSQL). In migrating the implementation from dashboard to server, the greatest benefit specific to a GraphQL "backend" would be that the dashboard code could be supported without change. While that's not without value, it's also not essential; and, in particular, your proposal above doesn't achieve it, because it hypothesizes a simple JSON REST API model.

If we're not exposing a GraphQL API, then using GraphQL internally may become 100% overhead versus simply using the right SQL queries directly. (GraphQL on top of PostgreSQL can't do anything we can't do with SQL, and if we're tying down the operational range by exposing a non-GraphQL API, our client doesn't benefit from the flexibility of the GraphQL engine.)

The original concern is that we might end up with a GET metadata / PUT metadata API model where the entire payload of user metadata is all in one bundle. But there are other alternatives; for example, a GET and PUT where specific parts (e.g., keywords) are specified.
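As a rough illustration of that alternative, here is a minimal sketch of a keyword-scoped GET/PUT interface. The route shape, the `user` path segment, and the in-memory store are assumptions for illustration, not the actual pbench API:

```python
# Sketch only (not the real pbench API): GET and PUT address one
# metadata key at a time instead of bundling the user's entire
# metadata payload into a single request.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the real PostgreSQL-backed store.
_store: dict[str, dict[str, object]] = {}

@app.route("/api/user/<user>/metadata/<key>", methods=["GET"])
def get_metadata(user: str, key: str):
    value = _store.get(user, {}).get(key)
    if value is None:
        return jsonify(error=f"no metadata key {key!r}"), 404
    return jsonify({key: value})

@app.route("/api/user/<user>/metadata/<key>", methods=["PUT"])
def put_metadata(user: str, key: str):
    # Only the named key is replaced; other metadata is untouched.
    _store.setdefault(user, {})[key] = request.get_json()
    return "", 204
```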
Summary of my discussion with @dbutenhof on this issue: we figured there are two possible approaches to move forward:

1. Keep exposing a GraphQL endpoint on the server, so the dashboard continues issuing GraphQL queries against it.
2. Expose plain JSON REST endpoints and satisfy the requests internally with SQL queries.
With the 1st approach we would not have to change the dashboard code much, as it could continue to make GraphQL queries against the respective GraphQL endpoint. However, this could create some inconsistency in our API architecture, where certain endpoints need a JSON input while others need a GraphQL query, and the dashboard has to know which endpoint needs which type of data. With the 2nd approach we would have to change the dashboard code a bit, specifically how it makes the GraphQL query; however, this would create consistency across all the server API definitions without adding any extra overhead. Also, our dashboard code needs some changes anyway once all the APIs and the authentication mechanism are baked out. Therefore we decided to essentially adopt the 2nd approach, while keeping the GraphQL schema definitions in the code but not using them for now, instead making SQL queries to handle the requested data; that way, if we ever have to transition back, it's quicker. Any comments/suggestions?
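A rough sketch of the 2nd approach follows. The table and column names here are hypothetical, not the actual pbench schema; the point is that the endpoint translates a plain JSON request straight into a SQL query rather than routing it through a GraphQL engine:

```python
# Hypothetical sketch: serve a JSON REST request with a direct
# SQLAlchemy query instead of executing a GraphQL schema.
from sqlalchemy import Column, Integer, String, Text, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metadata(Base):
    """Stand-in for the real pbench metadata table."""
    __tablename__ = "metadata"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    key = Column(String(128), nullable=False)
    value = Column(Text)

engine = create_engine("sqlite://")  # PostgreSQL in the real server
Base.metadata.create_all(engine)

def query_metadata(user_id: int, key: str) -> list[str]:
    """Fetch just the requested subset of a user's metadata."""
    with Session(engine) as session:
        rows = session.scalars(
            select(Metadata.value).where(
                Metadata.user_id == user_id, Metadata.key == key
            )
        )
        return list(rows)
```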
One problem with exposing a GraphQL endpoint is that we can't just "check authorization" up front; we need to be sure either that the GraphQL schema doesn't allow any possible mutation that might be restricted, or we need to analyze each GraphQL query payload to be sure that it doesn't try to mutate (or in fact even examine) anything we wouldn't allow for the user.

I conceive of our metadata model as something like a 2-level per-user NoSQL store. That is, our metadata is conceptually structured somewhat like the sketch below (visualized as JSON, though we're really talking about one or more SQL tables, and we probably only manifest the user's individual metadata values as JSON):
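A plausible reconstruction of that visualization, written as a Python literal; the key names here are purely hypothetical:

```python
# Illustrative only: a 2-level per-user metadata namespace, as
# described above. Actual key names and value shapes are assumptions.
user_metadata = {
    "user": "example-user",
    "metadata": {
        "metadata key 1": {
            "name": "dashboard.url",        # hypothetical key name
            "value": "<opaque client payload>",
        },
        "metadata key 2": {
            "name": "dashboard.favorites",  # hypothetical key name
            "value": ["dataset-a", "dataset-b"],
        },
    },
}
```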
E.g., the current dashboard "URL" metadata might be represented as "metadata key 1" above, with a name like ….
Can you enumerate how the pbench-dashboard uses GraphQL? What is the current schema today? I believe it is defined at: https://github.com/distributed-system-analysis/pbench-dashboard/blob/main/prisma/datamodel.prisma
Instead of writing an entire API, why not use something like memcached [1]? I am personally very fond of the simplicity, performance, and flexibility of memcached, as I have used it on past web sites to make them scale well. One of the use cases we used it for was session tokens and associated stored data.

[1] https://memcached.org/
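A sketch of the kind of session-token usage being suggested, using the pymemcache client; the key prefix and TTL are illustrative assumptions:

```python
# Hypothetical sketch: store a dashboard session blob in memcached
# under a random token, with a TTL so entries expire on their own.
import json
import secrets

from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def save_session(state: dict, ttl_seconds: int = 3600) -> str:
    """Store a session blob under a random token that expires."""
    token = secrets.token_urlsafe(16)
    client.set(f"session:{token}", json.dumps(state), expire=ttl_seconds)
    return token

def load_session(token: str) -> dict | None:
    """Return the stored state, or None if it expired or never existed."""
    raw = client.get(f"session:{token}")
    return json.loads(raw) if raw is not None else None
```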
The session module from the dashboard is the same as the metadata object module (the SQLAlchemy table) defined here. This table has a value column that can store a text blob, and this blob can be anything the dashboard/client wants it to be, along with a user-defined key to segregate different types of metadata objects. The reason for using a SQL table is to persist the metadata: these metadata objects can be saved/favorite views or sessions that the user wants to save so that they get a consistent view every time they log in. You seem to be suggesting that we use a caching mechanism instead of persistent storage. I am not sure how and why the metadata objects should be short-lived; a user would want to persist their choice of favorite/saved views to get a consistent dashboard experience, wouldn't they? Earlier the dashboard was using the GraphQL interface to create/query the metadata objects (the session module from the dashboard); the only difference is that we decided to implement it using the REST protocol, keeping everything else the same.
To that end, we need to convert these URL sharing sessions to something temporary in the short term. All this mechanism for 1-to-1 feature compatibility with GraphQL does not seem worth it; we might as well keep using the existing GraphQL and add a timeout for now.

Before we continue with this work, we really need to work out the full use case in the Dashboard for the functionality of the feature, and then implement from there.
It isn't just sessions; it can be any metadata text blob that the dashboard, on behalf of a user, would want to store on the server. However, you raised valid points about how long they need to be persisted and how many a user should be allowed to create. My understanding is that we do not have a complete answer yet, as we never discussed it, and that's why the PR implements these details unbounded. We certainly need to discuss this in more detail, since implementing it via a caching mechanism would completely change the design workflow.
The API Nikhil has implemented certainly provides for "management" in a manual sense, in that we can display all metadata keys and values, and delete any we want. An administrator (when we have such a thing) should be able to do this on behalf of any user; we'd just need a management console view.
So you're proposing that "saved views" (and "favorites") continue to be strictly a front-end concept that couldn't (easily) be used by another client and would be completely invisible to the backend? (Our current "saved view" wouldn't be useful to anything but our dashboard anyway, since my understanding is that it's essentially an abridged Redux dump; but in principle, a user's "favorites" sounds like a core server concept that ought to be more visible.)
A timeout is interesting, but seems to lessen the utility of the saved view concept; and to have "favorites" time out while the datasets still live seems... weird.

I actually think the concepts we discussed in the meeting would be a much better alternative for the "saved view": the dashboard (possibly in conjunction with the server) generates a URI that you can copy and paste anywhere (email, chat, embedded in a document) and click to restore. The state in the current saved view metadata would be embedded in the URI as a query tag, along (possibly) with a role-based server authorization token (which may not be necessary if it only applies to "published" data visible to everyone anyway). That way neither the server nor the dashboard needs to store saved views at all. A sketch of that idea follows this comment.

But while this makes a lot of sense for "saved view", I don't think it makes sense for "favorites", and I like the idea of letting users "tag" their datasets to organize them. In fact, I think "favorites" is overly simplistic, and we should consider allowing users to add arbitrary tags on which they can search and sort in the dashboard. That implies persistent metadata with a long and reliable lifetime; our user metadata would easily allow that without any additional changes. Of course, those are still user-specific tags, and if we want to add arbitrary metadata to the datasets through the UI, we can either use my PostgreSQL dataset_metadata or add fields to the Elasticsearch run documents. Either way, this sort of persistent (and, yes, manageable, given the proper UI work on top of it) metadata seems valuable.
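A minimal sketch of how such a shareable URI might encode the view state; the `view` query parameter, the base64 JSON encoding, and the URLs are illustrative assumptions:

```python
# Hypothetical sketch: embed a saved view's state directly in a
# shareable URI as a base64-encoded JSON query parameter, so neither
# the server nor the dashboard has to store the view at all.
import base64
import json
from urllib.parse import parse_qs, urlparse

def make_share_url(base: str, state: dict) -> str:
    """Encode the view state into a copy-and-paste-able link."""
    blob = base64.urlsafe_b64encode(json.dumps(state).encode()).decode()
    return f"{base}?view={blob}"

def restore_state(url: str) -> dict:
    """Invert make_share_url: pull the view state back out of the URI."""
    blob = parse_qs(urlparse(url).query)["view"][0]
    return json.loads(base64.urlsafe_b64decode(blob))

# Example: a link that restores a filtered dashboard view on click.
url = make_share_url(
    "https://dashboard.example.com/results",
    {"filter": "fio", "sort": "date", "published_only": True},
)
assert restore_state(url) == {
    "filter": "fio", "sort": "date", "published_only": True
}
```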
Agreed. We've always focused on replacing what the dashboard does internally with server APIs, based on the implicit design criteria manifested by the existing implementation, often without really pushing at the parameters to determine how much of this was well considered and deliberate versus what just "happened" and might not necessarily be the best way forward. The whole issue of "anti-users" falls into that bucket (because "anti-user" is cooler and easier to say than "non-logged-in user").
Filed distributed-system-analysis/pbench-dashboard#133, where we should really reconsider the implementation of the dashboard session sharing feature, re-implementing it as a URL for all pages that can be shared, so that existing bookmark / URL-shortening capabilities can be leveraged instead of a server-side database.
From @dbutenhof: "We'll handle server-side Dataset and User metadata completely differently."
Right now we have a GraphQL server running with Prisma and Apollo Server; however, in an effort to create more robust Pbench server APIs, we need to move the implementation to the server side. Therefore we need to re-define our GraphQL types, schema, and mutations in the server code.
So the GraphQL implementation should sit in the server, whereas the front end requests the data required for the dashboard using only the exposed APIs; making GraphQL queries from the front end shouldn't be required. Each API would receive a simple JSON request, which the server parses after it passes all the authentication checks, building a valid GraphQL query against the internal schema for execution. Upon completing the execution, the server returns the results to the front end.
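A minimal sketch of that flow using the graphene library; the field name and the shape of the JSON request are assumptions for illustration:

```python
# Hypothetical sketch: the server keeps the GraphQL schema internal,
# accepts plain JSON over REST, and translates it into a GraphQL
# query that it executes itself.
import graphene

class Query(graphene.ObjectType):
    metadata = graphene.String(key=graphene.String(required=True))

    def resolve_metadata(root, info, key):
        # In the real server this would query PostgreSQL.
        return {"url": "https://example.com/saved-view"}.get(key)

schema = graphene.Schema(query=Query)

def handle_json_request(payload: dict) -> dict:
    """Translate a client's {"key": ...} JSON into a GraphQL query."""
    # Authentication checks would have happened before this point.
    query = f'{{ metadata(key: "{payload["key"]}") }}'
    result = schema.execute(query)
    return result.data

print(handle_json_request({"key": "url"}))  # {'metadata': 'https://...'}
```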