Extensible web plugin #718

Open
sampsyo opened this issue Apr 22, 2014 · 24 comments

@sampsyo (Member) commented Apr 22, 2014

Following discussion in #456 and on the mailing list here and here, it's time to get slightly more serious about overhauling the web plugin for extensibility.

Here's my wishlist:

  • Start with a clean slate REST API. We should design this with several potential use cases in mind. The old API can stick around for backwards compatibility; perhaps the new one goes under /v2/ or something. The new API should be standards-worthy: clean and well-documented, potentially even for reuse by other tools.
  • Design a new Python API for exposing functionality via the web API.
  • This probably means rolling the web API into beets core instead of keeping it in a plugin.
  • Accordingly, we should separate the UI from the API itself. As I mentioned on the mailing list, this would all work best if beets core could avoid including the frontend HTML/JavaScript/CSS itself; this should be relegated to pluggable client interfaces. We could provide a simple default interface (perhaps hosted somewhere as static files?) but make it easy to switch to others.

How does all that sound to everyone?

@asutherland (Contributor)

This sounds great. Thoughts:

  • I would generally advise against taking on backwards-compatibility burdens unless there are known consumers of the existing API that are unwilling to upgrade and for which it would be more work to provide patches to update them. The existing API is reasonably straightforward and could likely be supported by any v2 API with minimal changes. I would definitely stash the old API under a /v1/ prefix or the like if doing so.

  • For the heavyweight class of web UIs (which I would categorize my efforts at http://www.visophyte.org/blog/?p=875 as) I think the primary desire for database exposure via the web API is for efficient replication of the entire database and any subsequent state changes to the client. This primarily entails being able to characterize the state of the database via a single revision id, indicate the current revision, and provide a mutually efficient means of getting a list of changes since a given revision id. The server indicating the revision id is too old and a bootstrap sync should be performed is an acceptable part of the protocol.

    Which is to say I think the actual REST API can and should start simple, growing to support the explicit needs of a light-weight UI targeted at resource-limited devices. I think web UIs with higher aspirations are likely to slurp the database since it provides them the most representation flexibility, general offline support, and resilience to network suckage.

    A datapoint that informs my opinion is that on the Mozilla Firefox OS project we've found that it's hard to design Web APIs (exposed to web content by the browser) that do everything every consumer wants. Specifically, this has been the case for our contacts database and our SMS/MMS message database. A simpler DataStore API designed around just allowing data replication is winning over specific APIs; limited docs and rationale can be found at https://wiki.mozilla.org/WebAPI/DataStore. This is not appropriate or workable in all cases (I work on the email app and having everyone replicate the mail database via simple sync API is untenable), but I think it's reasonable in this specific case.

  • As an example of an appropriate light-weight API, I think the current "web" plugin's REST API has it largely right. Its query mechanism uses the beets default search mechanism over a couple fields. I'm unsure if this actually exposes all of what http://beets.readthedocs.org/en/v1.3.5/reference/query.html does, but I think that simple Google-esque query mechanism with extensible keywords is the way to go. The lightweight UIs get all of the power of the beets query engine and any smarts it gains over time. The query returns typed artist/album/item id's that the web UI can then do an explicit RESTy fetch on to display.

    As an example of what I think should not (initially) be done: trying to explode the beets query system into a complicated JSON blob that gets POSTed. (A rough sketch of the simpler flavor follows below.)
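
A minimal sketch of that lightweight flavor, assuming a hypothetical /v2/ prefix and a placeholder library path (this is not the existing web plugin's code, just an illustration of "query returns typed IDs, then do a RESTy fetch"):

```python
# Sketch only: hypothetical /v2/ routes, not the existing `web` plugin code.
from flask import Flask, jsonify
from beets import library

app = Flask(__name__)
lib = library.Library('/path/to/musiclibrary.db')  # placeholder path

@app.route('/v2/query/<path:q>')
def query(q):
    # Reuse the normal beets query parser; return only typed IDs.
    return jsonify(results=[{'type': 'item', 'id': item.id}
                            for item in lib.items(q)])

@app.route('/v2/item/<int:item_id>')
def get_item(item_id):
    # Explicit RESTy fetch of one record; serialize a few safe fields only.
    item = lib.get_item(item_id)
    if item is None:
        return jsonify(error='not found'), 404
    return jsonify(id=item.id, title=item.title,
                   artist=item.artist, album=item.album)
```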

@PierreRust (Contributor)

Sounds great to me :)

I agree with @asutherland that the current web API is not bad and is a good starting point. I'm not sure we can design a new API from scratch without developing a web client at the same time to really identify what is needed. Here are my first thoughts:

  • only use the beets API, to isolate the web API from future changes to the database schema (there are some direct SQLite queries in the current implementation)
  • add an extension mechanism to allow plugins to register new endpoints: I cannot really tell whether this would require moving the web API into the core.
  • add write access

I think I'll start hacking on a client and see what else I actually need!

The revision-based replication API would certainly be a nice thing too, but I'm not sure how it applies to web-based UIs: you still have to store the data somewhere, and IMO local storage APIs are not meant for large amounts of data, but I may be wrong. Yet it would be very useful for implementing a native mobile client that provides offline music access and statistics synchronization (like doubleTwist AirSync does, for example). A problem I see with replication, though, is that you have to re-implement the full query logic (which is very powerful!) in the client application. With that in mind, I would only use that approach for offline clients working on a limited subset of the collection.

@sampsyo (Member Author) commented Apr 22, 2014

Thanks for the discussion, both of you! Here are a few brief thoughts:

Backwards compatibility

Good points, @asutherland. The only consumers I know of are:

Not exactly a herculean task to bring these up to speed.

Maybe we can keep the web plugin around as legacy and build the new stuff (if we break compatibility) under a built-in beet serve command.

Heavyweight UIs and sync

This is an important thing to keep in mind and a really hard problem. I agree that it makes sense to add this kind of functionality later on as it will be a significant design challenge in itself.

Current API is mostly fine

Yes, also true. The warts I was referring to are just simple stuff: /item/query/QUERY is a weird endpoint; maybe it should just be /items/QUERY. That sort of thing.

Mozilla's DataStore is an interesting pointer. I've also been intrigued by JSON API, which is an effort to standardize the boring boilerplate in simple APIs. It should theoretically make it easier to write clients by reusing a generic JSON API library.

I also don't exactly know what to do with the query syntax. I've similarly considered encoding queries into crazy JSON objects, but I concur that this is probably overkill and not particularly helpful. One thing to consider, though, would be trying to at least write a specification of the query syntax—or a compatible parser/encoder in JavaScript or something—to prove to ourselves that it's sufficient for interoperability. (It's been designed ad-hoc along the way for human-legibility rather than machine manipulation.)
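
As a taste of how small such a spec-backed parser could start, here's a sketch that handles only bare terms and field:value pairs (the real syntax also has `::` regex matches, ranges, quoting, and negation, so this is just the kind of thing a spec would pin down):

```python
# Minimal sketch of a beets-style query parser: bare terms and field:value
# pairs only; the real syntax is richer.
def parse_query(query):
    parts = []
    for token in query.split():
        field, sep, value = token.partition(':')
        if sep and field:
            parts.append({'field': field, 'value': value})
        else:
            parts.append({'field': None, 'value': token})
    return parts

print(parse_query('beatles albumartist:beatles year:1967'))
# [{'field': None, 'value': 'beatles'},
#  {'field': 'albumartist', 'value': 'beatles'},
#  {'field': 'year', 'value': '1967'}]
```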

@asutherland (Contributor)

Re: the local storage API, I'm mainly referring to the use of IndexedDB, whose semantics are roughly equivalent to LevelDB's (and which is indeed backed by LevelDB on Chrome; Firefox uses a SQLite implementation). Browsers have permission-based quota mechanisms that do a good job of making sure the user opts into non-trivial data use. There is still some work (in at least Firefox) to make sure it's easy for the user to notice the data usage and clean it up. You definitely would not want to use the localStorage API in browsers for anything more than trivial data storage, since it's a foot-gun that should be avoided at all costs.

Re: duplicating the query API, I would not suggest heavy-weight clients try and duplicate the general beets query mechanism. As proposed, they could still use the query API, but they could then check their local cache (which need not be exhaustive and which can be kept up to date via the sync mechanism) to avoid issuing any redundant network requests while also not having to worry about stale data.

Indices and fancier use-cases

Mainly what I meant is that for many fancier use-cases you are going to want to have pre-computed indices/views of data to have minimal latency. Looking at the current SQLite schema for the beets database there are not a lot of indices but even if there were, many complicated queries are going to end up doing a fair amount of computation either in SQLite or beets itself which will affect latency. The lessons around DataStore were basically that 1) you can't predict every index everyone will ever want (especially since the best ones are composite and may involve values computed by app logic) and 2) indexes aren't free to maintain from a performance or disk perspective.

Beets could grow support for pre-computed queries that web UIs could either define in their minimal python extension definitions or that could be dynamically created on the fly. I definitely think it would be awesome for beets to gain this type of facility as arguably it most belongs inside beets. It's just easy to go down an ORM rabbit-hole here and end up complicating beets in a bad way, or just end up with people writing bad SQL that doesn't improve things.

As an example of a good thing beets could do here in the long term, I always liked CouchDB's view model. There are docs on it at http://docs.couchdb.org/en/latest/couchapp/views/intro.html. The basic idea is that it's pre-computed map-reduce, and you don't have to worry about cluttering up the items/albums records with indexing hints.

I'm not sure I'd actually recommend CouchDB as a database backend, though. But as an example of how the browser can be good for experimentation here, http://pouchdb.com/ is a JS implementation of CouchDB. (I can't speak to its efficiency though.)
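
To illustrate the view-model idea with a toy (ignoring the reduce half and any persistence): a view is just a map function whose emitted keys are kept sorted and refreshed as records change, so the item/album records themselves never carry indexing hints. The "albums by year" view below is hypothetical example data, not beets internals.

```python
# Toy illustration of a CouchDB-style view over beets-like album dicts.
from bisect import insort

class View:
    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []                      # (key, value) pairs, kept sorted

    def update(self, record):
        for key, value in self.map_fn(record):
            insort(self.rows, (key, value))

    def query(self, lo, hi):
        # Linear scan for brevity; a real store would binary-search the keys.
        return [row for row in self.rows if lo <= row[0] <= hi]

by_year = View(lambda album: [(album['year'], album['album'])])
by_year.update({'year': 1967, 'album': 'Sgt. Pepper'})
by_year.update({'year': 1973, 'album': 'The Dark Side of the Moon'})
print(by_year.query(1960, 1970))            # [(1967, 'Sgt. Pepper')]
```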

Delta Sync

Implementing this actually wouldn't be all that hard. The main thing needed is to add an "updated" column to "items" and "albums" alongside "added" that stores the timestamp of the last update to the record. CREATE INDEX items_by_update ON items (updated) is the index we need. SELECT id FROM items WHERE updated > ? should be a query that uses the index correctly in a covering fashion that avoids actually fetching anything from the items table, although I'd of course double-check it with EXPLAIN beforehand and maybe just explicitly hint that the index should be used. This has bounded disk-usage and "updated" is potentially a useful thing to have anyways.
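
In code, the schema side of that is tiny. A sketch against the library's SQLite file (placeholder path; beets itself would be responsible for stamping `updated` on every write, and the item ID below is made up):

```python
# One-time migration sketch for delta sync, against the beets SQLite file.
import sqlite3, time

conn = sqlite3.connect('/path/to/musiclibrary.db')  # placeholder path
conn.execute("ALTER TABLE items ADD COLUMN updated REAL DEFAULT 0")
conn.execute("CREATE INDEX IF NOT EXISTS items_by_update ON items (updated)")

# On each modification, stamp the affected row (item ID 123 is made up):
conn.execute("UPDATE items SET updated = ? WHERE id = ?", (time.time(), 123))
conn.commit()

# A client syncing from a known point asks only for newer IDs; this should be
# a covering query on items_by_update (double-check with EXPLAIN QUERY PLAN).
last_seen = 0.0
changed = [row[0] for row in
           conn.execute("SELECT id FROM items WHERE updated > ?", (last_seen,))]
```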

The annoying thing is dealing with deleted albums and items. This requires one of the following: a bounded tombstone table, the client to just ask for the current set of all id's if it infers something might have been deleted, or a wide set of clever options like using merkle trees or probabilistic set stuff. I expect deletions to be rare so just checking the current set of valid id's seems workable.

I'm happy to provide patches along these lines, too. Although I totally agree this is a separate issue that should be thought of as an enhancement.

@geigerzaehler

> Design a new Python API for exposing functionality via the web API.

What do you mean by this, @sampsyo?

@sampsyo (Member Author) commented Apr 24, 2014

Ah, that was unclear, sorry; I just meant adding a plugin hook for extending the web server by exposing new endpoints, etc.
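
A sketch of what such a hook could look like, with the caveat that the `web_blueprints` hook name and the Flask wiring here are hypothetical, not an existing beets API:

```python
# Hypothetical sketch: a plugin contributes a Flask blueprint that the core
# web server mounts. The `web_blueprints()` hook does not exist in beets today.
from flask import Blueprint, jsonify
from beets.plugins import BeetsPlugin

lyrics_api = Blueprint('lyrics_api', __name__)

@lyrics_api.route('/lyrics/<int:item_id>')
def lyrics(item_id):
    return jsonify(item=item_id, lyrics='...')   # placeholder payload

class LyricsWebPlugin(BeetsPlugin):
    def web_blueprints(self):
        # The core server would collect these and register each one, e.g.
        # app.register_blueprint(bp, url_prefix='/v2').
        return [lyrics_api]
```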

@PierreRust (Contributor)

I really like @asutherland's idea of pre-computed queries. I totally agree that we need minimal latency for a web API, and some queries might be too slow with the current implementation. While there's certainly some possible optimization, it would be very hard, as you mentioned, to guess which queries must be lightning-fast, so custom pre-computed queries would be very useful.

I'm not sure I really get how you are planning to implement that, though. Plugins would be able to ask the core to pre-compute an arbitrary query. Then the core, on each modification, would check which pre-computed queries might be affected by the change and run them (probably in the background) so that the result is already computed and available (the result would probably be stored in the database). Is that what you have in mind? That would be great, but it must be used with caution, as a bad plugin could slow everything down.

I also thought of two additional functionalities that would be needed:

  • pagination: on large collections I think paginated queries are a must; you certainly don't want to return a JSON string with several thousand albums, as it would probably be too much for the client to handle at once and the latency would be pretty bad.
    A flexible pagination mechanism could work with two arguments, start and limit (see the sketch after this list):

    • start indicates the index of the first item you want
    • limit indicates the maximum number of items you want in the response (that is, the page size).

    For example, with 460 items and a page size of 200,

    start=0&limit=200 => returns the first 200 elements
    start=200&limit=200 => returns elements 200 through 399
    start=400&limit=200 => returns the last 60 elements

Responses to paginated queries could also contain links to the next and previous pages (with the same page size). This would also provide end-of-list detection.

I'm not sure if every paginated request should return the total number of elements: it can be useful on the client side to know this in advance, but it is not always convenient (or even possible) to compute on the server side, so it could be optional.

  • server-side sorting: sorting on an arbitrary object attribute can be done on the client side, except when using pagination.
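
A sketch of that start/limit scheme with next/prev links, using stand-in data instead of a real beets library (the endpoint name and defaults are made up):

```python
# Sketch of start/limit pagination for a hypothetical /v2/albums endpoint.
from flask import Flask, jsonify, request, url_for

app = Flask(__name__)
ALBUMS = [{'id': i, 'album': 'Album %d' % i} for i in range(460)]  # stand-in data

@app.route('/v2/albums')
def albums():
    start = request.args.get('start', 0, type=int)
    limit = request.args.get('limit', 200, type=int)
    page = ALBUMS[start:start + limit]
    links = {}
    if start + limit < len(ALBUMS):            # there is a next page
        links['next'] = url_for('albums', start=start + limit, limit=limit)
    if start > 0:                              # there is a previous page
        links['prev'] = url_for('albums', start=max(start - limit, 0), limit=limit)
    # `total` is the optional overall count discussed above.
    return jsonify(results=page, links=links, total=len(ALBUMS))
```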

@sampsyo (Member Author) commented Apr 25, 2014

FWIW, sorting and pagination are in the JSON API spec. So if we go with that as a template, we don't have to design the details.

@asutherland (Contributor)

  • pre-computing/indices: Pierre, I have no plans to implement extensible query mechanism stuff at this time. And I agree with you that there is an inherent trade-off where indexes are involved because of the costs in updating them. Most naive systems will just remove the old index values derived from a record and then insert the new derived values afterwards.

  • pagination: I don't actually see a description of how pagination is supposed to work on http://jsonapi.org/format/. I assume when they specify it they will specify the optimal thing, but let me call that out here just in case. start/offset is known to be an efficiency footgun because start=200 effectively requires you to run the full query, seeking to start=0, then just skip over the first 200 records. SQLite is incredibly well written, so it will be as efficient as it can about this, but there's still inherent inefficiency.

    The usual stateless solution for this is for the requester's next request to indicate the directly seekable unique ID to start from. This works well if we're talking about albums stored in their natural ID order. It doesn't work so well for queries. In the latter case, you probably want something to hold onto the state, and you provide the API with a token that will expire at some point. A good compromise for searches/sorts and other complicated queries is to just return the IDs of the matches without returning the related records. The IDs are compact and already known from the query. (And a LIMIT doesn't save any disk I/O, although SQLite optimizes memory usage by bounding its temporary result table and discarding any records that fall out of the table's size.) This works well with the JSON API spec and avoids requiring persistent server resources. (A rough ID-keyed sketch follows at the end of this comment.)

    It's also worth noting that pagination without server state or keying off unique id's potentially opens the door to inconsistent state problems. Ex: you ask for the first 20 items, and the 20th item is B. But then beets imports another item which is now item 5, so when you ask for the next 20 items, you get B again.
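
A sketch of the ID-keyed ("seek") flavor over the beets items table (placeholder database path): the client hands back the last ID it saw, and the query seeks via the primary key instead of skipping rows.

```python
# Sketch of ID-keyed pagination; None as the cursor signals end of list.
import sqlite3

def items_after(conn, after_id=0, limit=200):
    rows = conn.execute(
        "SELECT id, title FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

conn = sqlite3.connect('/path/to/musiclibrary.db')  # placeholder path
page, cursor = items_after(conn)
while cursor is not None:
    # ...hand `page` to the client here, then fetch the following one...
    page, cursor = items_after(conn, after_id=cursor)
```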

@sampsyo (Member Author) commented Apr 25, 2014

Ah, good point; I thought I had read something but apparently made it up. Thanks for the advice here.

@canpolat

It's exciting to see this in development. This may be too high level to consider now, but anyway...

The Artist/Album/Track hierarchy is suitable (and good enough) for the majority of users, I suppose. But it really is not enough for classical music listeners. The composer is always important (sometimes the most important) data that needs to be presented (or queried for). I would really like to see beets have a user-configurable hierarchy instead of a fixed one. WinAmp's filter panes were a very good implementation of this approach: the user has a bunch of pre-configured hierarchies that filter the music collection according to certain queries (e.g., one pane would list all tracks with the "Classical" or "Baroque" genre tag and present them in a Composer/Artist/Album/Track hierarchy).

I don't know if this would be too heavy-weight, but this kind of flexibility is something I really miss among the server-client media libraries. (Next logical step from my point of view would be an MPRIS plugin for beets :))
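
For illustration only (not a concrete proposal for beets or the web API): such a "filter pane" boils down to a query plus an ordered list of grouping fields, which is only a few lines to build over item-like dicts, client- or server-side. The example data below is made up.

```python
# Illustrative sketch: group query results into a configurable hierarchy.
from collections import defaultdict

def build_hierarchy(items, fields):
    if not fields:
        return [item['title'] for item in items]
    groups = defaultdict(list)
    for item in items:
        groups[item[fields[0]]].append(item)
    return {key: build_hierarchy(sub, fields[1:]) for key, sub in groups.items()}

items = [
    {'composer': 'Bach', 'artist': 'Gould', 'album': 'Goldberg Variations', 'title': 'Aria'},
    {'composer': 'Bach', 'artist': 'Gould', 'album': 'Goldberg Variations', 'title': 'Variatio 1'},
]
print(build_hierarchy(items, ['composer', 'artist', 'album']))
# {'Bach': {'Gould': {'Goldberg Variations': ['Aria', 'Variatio 1']}}}
```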

@sampsyo (Member Author) commented Apr 25, 2014

One note on that (configurable hierarchies): beets has for some time shipped with a tiny beets.vfs module precisely for building and querying hierarchies of tracks. Currently, it's only used for BPD. Maybe it can also be of use here.

@jonathanthomas83

Would love to help out with any UI (HTML/jQuery/CSS/Other frontend) stuff. Have loads of ideas for editing tags and fronting the db in browser. Sorry I can't be much help with the coding stuff, but am willing to learn.

@jjrh commented Apr 30, 2014

So what's the status on this? What's our first step?

(am I perhaps missing where discussion is taking place?)

@sampsyo (Member Author) commented Apr 30, 2014

Good question, @jjrh. I've broken this up into a few tickets that I think represent solid units of work:

If you're interested in helping out, I suggest you add yourself to one or several of these tickets. (For example, @jonthomas83 may be interested in #738.) Let the good work begin!

@jjrh commented Apr 30, 2014

@sampsyo Sorry, what do you mean by "add myself" to the ticket? I don't see any way to do this in github. Do you mean 'subscribe' to it?

@sampsyo (Member Author) commented Apr 30, 2014

Yep! Or just comment, which (a) subscribes you and (b) shows you in the "participants" thing on the side.

@jjrh commented Jul 17, 2014

Did I miss some further discussion or did people just get busy with other things too?

I know Pierre Rust had started work (https://github.com/PierreRust/beets/tree/web-api/beetsplug). Is there a newer branch I should work from?

@sampsyo mentioned this issue Oct 11, 2014
@sampsyo (Member Author) commented Nov 1, 2014

I've started designing a new API for this hypothetical new plugin. The idea is to design a good REST API that should be usable by other projects as well. The docs and a reference implementation are currently hosted over in another repository: https://github.com/sampsyo/aura

Please let me know if you're interested in collaborating on that piece of the puzzle.

@jonathanthomas83

Adrian, this looks really exciting! Great work. From what I can understand of it, with my limited knowledge, it looks fantastic. Wish I understood more to be able to help out!

GUI & UX helper here when you need him! :-)

@magne4000

Here's a little feedback on what I learned from a project of mine:
It's really difficult (but not impossible) to have an efficient, web-oriented query system with paging behavior on top of a SQLite database.

The possible solutions are:

  • Use the Full Text Search (FTS) module of SQLite.
    • We could really get a huge performance gain, and it's probably the most cost-effective solution (in terms of impact and development time); see the sketch after this list.
    • This would still require rewriting most of the Python query processing in SQL.
  • Replace SQLite with an index such as Whoosh (really awesome project btw).
    • But this would have a tremendous impact on everything.
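
A sketch of the FTS route, using SQLite's FTS5 module as an external-content table over items (placeholder database path; assumes an SQLite build with FTS5, and the virtual table would need to be kept in sync with triggers or periodic rebuilds):

```python
# Sketch: mirror a few text columns of `items` into an FTS5 virtual table and
# query it with MATCH, which uses the full-text index instead of a LIKE scan.
import sqlite3

conn = sqlite3.connect('/path/to/musiclibrary.db')  # placeholder path
conn.executescript("""
    CREATE VIRTUAL TABLE IF NOT EXISTS items_fts
        USING fts5(title, artist, album, content='items', content_rowid='id');
    INSERT INTO items_fts(items_fts) VALUES ('rebuild');
""")

# Full-text search returns matching item IDs quickly, even on big libraries.
ids = [row[0] for row in conn.execute(
    "SELECT rowid FROM items_fts WHERE items_fts MATCH ?", ('beatles',))]
```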

@sampsyo (Member Author) commented Dec 21, 2016

Neat! Whoosh looks nifty—I'll need to read more.

Out of curiosity, can you elaborate more on where you ran into pain points with SQLite? For example, was pagination just the final straw, or was everything fine until you tried to add pagination?

@magne4000

In fact, it's not related to paging; it's related to the use of LIKE with wildcards on both sides (LIKE '%…%'), which will NEVER use any existing index. FTS solves this.

@sampsyo (Member Author) commented Dec 21, 2016

Oh, right, that makes total sense—it's something that affects us already with ordinary, command-line beets queries. A real index for textual queries would be a nice thing to have someday, not just for web-based uses but for all beets interactions.
