API to enumerate databases #31
FWIW, based on early feedback when IDB was shiny and new, Chrome implemented/shipped a prefixed API for this:
partial interface IDBFactory {
  IDBRequest webkitGetDatabaseNames();
};
Result is a |
I think we should keep this around given that it's useful for IDB libraries and diagnostic purposes. I recommend that we pursue a standardized non-prefixed version of this API e.g. |
I reckon no attempt needs to be made to ensure the results remain valid after the function call returns values. I would suggest leaving it up to the application author to deal with it. That said, a separate function like hasDatabase(name) would be useful, with the proviso that other open/create requests for the same name would be queued until after the current code finishes dealing with IndexedDB (I'm not familiar with the implementation enough to suggest a specific mechanism for this). |
@Gilead The only mechanism I can think of would be to hold onto a lock for the database, which is what you get with |
It could be treated just like another transaction (such as versionchange, readwrite, readonly) in the sense that no other transactions can execute while |
@aliams Sorry if I misunderstood your suggestion, but merely serializing the execution of |
That's a fair point @pwnall. Are you thinking that after |
@aliams Thank you for the clarifying question! I was (unclearly) trying to point out that an implementation of @Gilead's I think it makes sense to specify |
That makes sense to me! I agree that |
This issue is for a Chrome extension that is an administrative interface for PouchDB (which is implemented on top of IndexedDB). The removal of Note that this use case doesn't require any consistency guarantees. FWIW, most relational databases I've worked with don't provide transactional guarantees for these sorts of functions. |
WebKit would be happy to implement getDatabaseNames() as long as there are no consistency guarantees. |
@aliams Can your comments above be interpreted to mean that Edge would be interested in implementing as well? @bevis-tseng Any thoughts on Firefox interest in a standardized |
@pwnall, yes! |
Shall we at least have some guarantee on |
I think making it so that it would be blocked on version change transactions makes sense. Would it make sense to expose an array of |
It seems like we have 3 proposals.
partial interface IDBFactory {
  IDBRequest getDatabaseNames();
};
The IDBRequest result here is a list of names (
partial interface IDBFactory {
  IDBRequest getDatabases();
};
The IDBRequest result here is a
partial interface IDBFactory {
  IDBRequest getDatabasesInfo();
};
The IDBRequest result here is a
dictionary IDBDatabaseInfo {
  DOMString name;
  unsigned long long version;
};
For the third alternative, given that the use cases we've seen only need names, we have the option to start out with a dictionary whose only attribute is
The first alternative seems to give more flexibility for implementations. For example, using my proposed weak guarantees, an implementation would be free to block on
@aliams If I understand correctly, you prefer the 2nd alternative. What are your thoughts on the 3rd?
@beidson You mentioned you'd be willing to implement the first alternative. What are your thoughts on the newer alternatives? Would they add a significant amount of burden to the implementation work?
@bevis-tseng Do you prefer the 2nd or 3rd alternative?
I'll find time soon to prototype the alternatives and estimate the amount of work, but I'm willing to implement whichever version we agree on. Thank you very much for your thoughts! (Also, I'm not attached in any way to the names in the 2nd and 3rd proposals, so please don't be shy if you have other ideas!) |
I don’t think proposals 2-3 have to block on upgradeneeded any more than #1 does. That’s what I mean by no consistency guarantee: this is just a snapshot in time, with no guarantee of still being true after the IDBRequest is resolved. There’s no transaction for this, so the results might already be invalid by the time you try to use them. And that’s what we’re fine with. |
I'd prefer proposal #2. Regarding "queue a task" in #1, I think this IDBRequest is very special and differs from all the other IDBRequests, which can only be blocked by other requests bound to a specific IDBDatabase connection. |
I don't think we do...? I'm only familiar with WebKit's implementation and in that impl this doesn't concern me. I still don't see how others can have implemented things such that they do have a huge concern here. Here's what we do. No matter how many unique databases are open with an active request queue, we're still piping everything through only a handful of threads all in one process. Handling this "getAllDatabases" request will hijack the database i/o thread for the period of time it takes to tally all of the results. During that period of time literally no other database operations will happen (version change or not). It doesn't matter if any version change transactions are in progress or not. The "getAllDatabases" operation will read the non-committed truth from disk for all in-scope databases. If you have a database at version 1 and it has an open transaction that will be upgrading it to version 2, it will definitely and deterministically be returned as version 1 in the getAllDatabases call. Does anybody else implement things so differently that this isn't automatically the case, or at least wouldn't be easy to make happen...? |
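The behaviour described above can be illustrated with a toy model. This is only a sketch, not WebKit's code: the class `ToyDatabaseStore` and its method names are hypothetical, and it assumes enumeration reads nothing but committed versions, so an in-progress upgrade from version 1 to version 2 is still reported as version 1.

```typescript
// Toy model (not WebKit's implementation): enumeration reports committed state only.
interface DatabaseInfo {
  name: string;
  version: number;
}

class ToyDatabaseStore {
  // Versions that have been committed "to disk".
  private committed = new Map<string, number>();
  // Versions requested by in-progress version change transactions.
  private pending = new Map<string, number>();

  // Start a version change; nothing becomes visible to enumeration yet.
  beginVersionChange(name: string, newVersion: number): void {
    this.pending.set(name, newVersion);
  }

  // Finish the version change; only now does the new version become visible.
  commitVersionChange(name: string): void {
    const version = this.pending.get(name);
    if (version === undefined) {
      throw new Error(`no version change in progress for ${name}`);
    }
    this.committed.set(name, version);
    this.pending.delete(name);
  }

  // Snapshot of committed state; pending upgrades are ignored.
  enumerate(): DatabaseInfo[] {
    return Array.from(this.committed, ([name, version]) => ({ name, version }));
  }
}
```

In this model a caller that enumerates while an upgrade is pending sees the old version, and the snapshot can be out of date as soon as the upgrade commits.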
An obvious difference in implementation I could see would be if the web content processes each hit the database files on disk themselves and used the application process as the gatekeeper for event dispatching to other connections, etc. In that style of implementation there's a lot more variability in what happens if a getAllDatabases call is made while a version change transaction is in progress. But even in WebKit's implementation, the version change that was in progress before getAllDatabases was issued might complete before the getAllDatabases call actually executes. It's racy, it's non-deterministic, and I can't think of a use case for this API where that's not fine. |
Have the use cases been studied further before we add this API in a rush? |
@nolanlawson, any thoughts here from a dev's perspective about the |
I'm starting to think that I made a mistake by reusing the IDBRequest mechanism, and the API would be better served by a Promise. I'd rather not invoke the algorithms related to queuing an IDB request, so we have complete freedom in specifying (the lack of) ordering guarantees.
@beidson -- On your question about why implementations would need to block, I can imagine an implementation where all the IDB metadata for an origin lives in a single table. Can we agree to spec the method in such a way that an implementation that needs to block on
I think we'll need some sort of guarantees, to be able to write WPT tests for this feature. My proposal (roughly the same as before), which I think is a subset of what WebKit offers, is:
@bevis-tseng -- Regarding your comment about use cases and rushing, I'll share my thoughts, and end with a question. I think we had
Are there any use cases you'd like to explore, other than the two listed above? Is there any other information that would make you comfortable reaching a consensus about this API?
Thank you very much for your comments, everyone! IndexedDB is subtle, and thinking about it can be draining. I am grateful for your replies! |
If this API is only used for debugging or removing databases (it doesn't make much sense to open all the IDBDatabase objects at once), it seems over-designed to offer so many consistency guarantees by returning all IDBDatabase objects. |
Then such implementations could not resolve the getAllDatabases() request until the version change transaction(s) are finished, but the other implementations could. That's fine.
This is what I'm voting for.
As an aside, "inconsistent data" is being thrown around a lot here. I think the term is wrong for this. It's "racy data" in all implementations, in that whatever is returned by the getAllDatabases call might be partially (or fully) out of date by the time it is received, and even more out of date by the time it is used. And, again, we're fine with that. Any attempt to guarantee that the results of getAllDatabases() somehow lock all of the relevant databases from use by all other web contexts, thereby getting rid of the raciness, would be overly complex, over-engineered, and of dubious value. |
I prefer getAllDatabases. I also support a Promise over an IDBRequest. I'm pretty sure:
Should be:
|
Thank you very much for the IDL correction! Updated 😄 |
One potential use case from LinkedIn: there's a very real possibility that teams may have widely varying database names, and it's helpful to be able to wipe out any potentially harmful data across ALL databases (and AFAIK there's no path forward without the ability to enumerate over db names). |
@mike-north Thank you very much for the feedback!
Everyone: I intend to put together an explainer and a prototype implementation for Chrome, then ask TAG for a review. Please comment soon if you have any objections to the current API shape, which is:
partial interface IDBFactory {
  Promise<sequence<IDBDatabase>> getAllDatabases();
};
dictionary IDBDatabaseInfo {
  DOMString name;
  unsigned long long version;
}; |
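A rough sketch of how calling code might consume this shape. The method name `getAllDatabases` follows the proposal as quoted in this thread (the name was still under discussion), and `EnumerableFactory` and `listDatabaseNames` are hypothetical names introduced only for illustration. The sketch assumes nothing beyond each returned entry carrying a `name` and a `version`.

```typescript
// Each entry carries a name and a version; both are treated as optional here.
interface DatabaseInfo {
  name?: string;
  version?: number;
}

// Hypothetical minimal view of IDBFactory, using the proposal's method name.
interface EnumerableFactory {
  getAllDatabases(): Promise<DatabaseInfo[]>;
}

// Returns the sorted names from one snapshot. The list may already be stale
// by the time the caller looks at it.
async function listDatabaseNames(factory: EnumerableFactory): Promise<string[]> {
  const infos = await factory.getAllDatabases();
  return infos
    .map((info) => info.name)
    .filter((name): name is string => name !== undefined)
    .sort();
}
```

Taking the factory as a parameter keeps the sketch independent of the final method name and lets it be exercised with a fake object.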
That looks good to me! I was just wondering if making it return a promise would seem inconsistent with how the rest of the spec returns IDBRequest objects. Any thoughts on that? |
@aliams I agree that returning a Promise is inconsistent with the rest of the IndexedDB API. I proposed the Promise approach to avoid the queueing machinery associated with IDBRequest. If we used IDBRequest we'd have a null If I understand correctly, @beidson and @bevis-tseng are in favor of returning a Promise. I am happy to implement either approach, and slightly favor the usability of the Promise approach. Are you opposed to the Promise approach, or just noting the inconsistency? Thank you for your thoughts! |
I think I would prefer being consistent here from a developer's expectation standpoint, but I wouldn't want us to block on it. |
Promises weren't around when IDB1 was done. Developers have never had this API before, so introducing it will result in all new code, so there can be no previous assumption that it ever worked like an IDBRequest. Additionally, IDBRequests are about just that - Indexed Database Requests. Their entire construction is based around a query to a unique instance of an Indexed Database, whereas this call is about querying the state of all indexed databases. IDBRequest doesn't fit. |
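To make the difference between the two styles concrete, here is a minimal sketch of adapting an event-handler style request to a Promise. `RequestLike` is a hypothetical, simplified stand-in for IDBRequest (the real interface dispatches events and has more members), and `requestToPromise` is an illustrative helper, not part of any proposal.

```typescript
// Hypothetical, simplified stand-in for IDBRequest: the outcome is delivered
// through handlers, and the value is read back from the request object.
interface RequestLike<T> {
  result: T;
  error: Error | null;
  onsuccess: (() => void) | null;
  onerror: (() => void) | null;
}

// Adapts the handler style to a Promise so callers can use await.
function requestToPromise<T>(request: RequestLike<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error ?? new Error("request failed"));
  });
}
```

With an IDBRequest-returning method, every caller would need an adapter of this kind to use `await`; a Promise-returning method needs none.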
Fair enough - are there any other objections with the proposal from @pwnall above? |
SGTM. Only thought is to simplify the name further to just |
That sgtm too |
Hello!
TLDR: This new feature will break some security designs and harm privacy. To prevent this, I suggest adding an option to hide/unhide a database from your getAllDatabases() function.
Long story: The API as published (8th Jan 2015) doesn't permit listing databases. Google Chrome has also removed its webkitGetDatabaseNames() to respect your initial API ( https://www.chromestatus.com/features/5725741740195840 ).
An example where it's possible to guess whether a user is registered on some websites: Imagine a family where the boy is gay and registered on a gay website using the family computer. Suppose the website wants to store the user's private information in IndexedDB, but secretly, under a database name derived from the user's login/password. With your new feature, it's possible for the father, with little JavaScript knowledge, to list the databases and discover that his son is registered on this gay website.
The PWA example: There is a big buzz around PWAs, and if they are the future, this means all websites will store more data inside IndexedDB to avoid offline problems. So, if the website is well designed, each user on a device will have a separate database with a random name. With your new feature, it's possible for a JS coder / XSS attack to connect to these private databases and steal this information.
An example of security design: The WebCrypto API permits generating keys, preventing their extraction, and storing the CryptoKey directly inside IndexedDB. This is really useful, but the key needs to be stored inside a database with an unguessable random name. This random name is known only by the server and is sent to the client if the login/password are correct. With your new feature, this security design breaks, making WebCrypto's sign/verify features useless.
Please, so as not to break security and privacy, could you add an option to hide/unhide a database from your getAllDatabases() function? |
Their reason for removing it is quite clearly spelled out at that link and it has nothing to do with security/privacy.
This is pretty much FUD. Let's say the son registers at "https://www.gaybook.com/" and it stores some information in a local IndexedDB. Every browser already lets you natively list which websites have data stored on your local machine, and "https://www.gaybook.com/" would show up in that list already. If this API is added, it doesn't really change the situation. The father now could know ahead of time to open a tab to "https://www.gaybook.com/" so he's executing javascript in the "https://www.gaybook.com/" origin and list all the databases... but he already had the native ability to list databases in the browser. Any time an attacker has physical access to the machine you've already lost. The only protection for this concern is to use different user accounts, encrypted home directories, etc etc.
The bug in this scenario is the XSS attack itself, not the DB feature. If the website is well enough designed to have truly unguessable names for each user's DB (probably not!), then it's probably also well enough designed to not have this XSS vulnerability. In the presence of the XSS attack, the user's data is already vulnerable whether or not this convenience method exists.
Again, you are assuming an XSS vulnerability here. You are also assuming "security by obscurity" which should not be a technique used by any truly secure website. Let's define for a moment that XSS-vulnerable web sites are broken. |
@lakano Thank you for your perspective! The threat model for IndexedDB (and other local storage primitives) is that each origin is a principal, so an origin has full access to all the data that it has written. None of the primitives is designed with the thought of having multiple principals in the same origin. In your first example, the parent can open Dev Tools in any of the major browsers and see all the databases for an origin. The parent can probably look through the browser's history as well. For your second example, it's really difficult (if not impossible) to design a Web application that supports having mutually distrusting users authenticated at the same time. Again, Dev Tools can most likely be used to extract information belonging to other users. Each user should have at least a separate browser profile. Ideally, users should be separated at the operating system level, so they get separate (optionally encrypted) home directories. I hope this helps. |
@beidson You're right about XSS attacks, but you surely know there are a lot of websites that are not fully protected against XSS attacks. If we forget XSS attacks, isn't it still possible for a JS developer to extract a database manually because of this new feature? |
@pwnall OK, you're right, it's possible to see them from Dev Tools, good point! So it's not secure to store data inside IndexedDB, and PWAs should not store any private information (this makes them less useful, BTW). |
This brings to mind another use case we have at LinkedIn. If a user explicitly logs out, we would like the ability to reliably enumerate over and destroy all databases, getting rid of potentially private data. Of course we could attempt to "remember" all DB names and enumerate over that list, but that's a human process that could potentially fail. |
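The logout use case could look roughly like the sketch below. It is written against hypothetical minimal interfaces rather than the real IDBFactory: `getAllDatabases` is the proposal's name from this thread, `DeleteRequestLike` is a simplified stand-in for the request returned by `deleteDatabase()`, and `wipeAllDatabases` is an illustrative helper. It ignores blocked deletions, and it cannot see databases created after the snapshot was taken.

```typescript
interface DatabaseInfo {
  name?: string;
  version?: number;
}

// Hypothetical, simplified stand-in for the request returned by deleteDatabase().
interface DeleteRequestLike {
  onsuccess: (() => void) | null;
  onerror: (() => void) | null;
}

// Hypothetical minimal view of IDBFactory, using the proposal's method name.
interface WipeableFactory {
  getAllDatabases(): Promise<DatabaseInfo[]>;
  deleteDatabase(name: string): DeleteRequestLike;
}

// Wraps one deletion in a Promise.
function deleteOne(factory: WipeableFactory, name: string): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const request = factory.deleteDatabase(name);
    request.onsuccess = () => resolve();
    request.onerror = () => reject(new Error(`failed to delete ${name}`));
  });
}

// Enumerates once, then deletes every named database from that snapshot.
// Returns the names it attempted to delete.
async function wipeAllDatabases(factory: WipeableFactory): Promise<string[]> {
  const infos = await factory.getAllDatabases();
  const names = infos
    .map((info) => info.name)
    .filter((name): name is string => name !== undefined);
  await Promise.all(names.map((name) => deleteOne(factory, name)));
  return names;
}
```

Because the enumeration is only a snapshot, an application that needs stronger assurance would have to stop creating databases before calling a helper like this, or run it again afterwards.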
@mike-north This is definitely something that should be doable. You should (eventually) be able to use the Async Cookies API in a Service Worker and enumerate+clear the databases. You should also (eventually) be able to use an HTTP header to clear the origin's data. |
I tossed a spec PR up at #240 for this. Needs tests and an implementation. |
Implementation in Chrome behind a flag. Test cases: https://github.com/web-platform-tests/wpt/blob/master/IndexedDB/get-databases.any.js |
Firefox Gecko implementation will happen on https://bugzilla.mozilla.org/show_bug.cgi?id=934640 |
Moved from: https://www.w3.org/Bugs/Public/show_bug.cgi?id=16137
Discussions:
http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1528.html
http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1537.html
Critical bit: how do we do this such that the answer isn't obsolete as soon as it's given, since there are no transactions over the set-of-databases.