Streaming interface for queries? #41

Open · hvr opened this issue Jul 1, 2015 · 8 comments

Comments

@hvr (Contributor) commented Jul 1, 2015

Currently, I use select as in e.g.

            coms <- runDbConn (select ((ComBoardField ==. BoardIdKey bid) &&.
                                       (ComArtNumField >=. lo') &&.
                                       (ComArtNumField <=. hi'))) cm

to fetch up to about 20,000 rows from a table. However, according to heap profiling this seems to result in around 60 MiB of ARR_WORDS being allocated, which is far more than the actual data stored in the underlying (SQLite) database.

Is there a way to consume result values as soon as they get returned, rather than all at once? Maybe e.g. via a fold-like API?

@lykahb (Owner) commented Jul 2, 2015

The size of the data loaded into memory may be 2-4 times bigger than in the database because of the boxed representation in Haskell.
Adding a streaming interface involves two issues:

  1. Where to put the streaming functions, separating them from the functions that return lists. This can be done by adding selectStream and projectStream to PersistBackend. Alternatively, they may go into another class or module. I am open to name suggestions.
  2. Choosing the interface. There are two popular streaming libraries, Pipes and Conduit, and both have solid infrastructure. Groundhog currently has a type RowPopper that existed in Persistent before its conduit interface. The function selectStream can use either RowPopper or Source/Producer from Conduit/Pipes respectively. In the first case we can have libraries like groundhog-pipes that transform a RowPopper into the library's own type; in the second case Groundhog will be tied to one of those libraries.
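
To make option 1 concrete: a popper is just an action that yields the next row until the result set is exhausted, and a fold can be layered on top of it directly. A rough sketch (the names and exact types here are illustrative, not necessarily Groundhog's):

    {-# LANGUAGE BangPatterns #-}

    -- Illustrative popper type: the action returns Just the next row,
    -- or Nothing once the result set is exhausted.
    type RowPopper m a = m (Maybe a)

    -- A strict left fold over a popper; rows are consumed one at a time
    -- and no intermediate list is ever built.
    foldPopper :: Monad m => (b -> a -> b) -> b -> RowPopper m a -> m b
    foldPopper f = go
      where
        go !acc popper = do
          mrow <- popper
          case mrow of
            Nothing  -> return acc
            Just row -> go (f acc row) popper

With something like this, a groundhog-pipes or groundhog-conduit package would only need to wrap the popper in a Producer or Source.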

@hvr (Contributor, Author) commented Jul 3, 2015

> The size of the data loaded into memory may be 2-4 times bigger than in the database because of the boxed representation in Haskell.

Yeah, but I'm surprised that the majority of heap objects are ARR_WORDS rather than (boxed) constructors...

I've got no strong opinion about the API. At the end of the day I just need a fold-style (and/or foldM-style) API. I work mostly with Builders, and sometimes I want to interface with simple low-overhead ByteString.Builder -> IO ()-style sinks, for which I deliberately avoid any high-level streaming framework. So I guess if I can "dumb down" a pipes tuple-source to a trivial fold-style API without paying too much overhead, I'll be happy :-)
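
Concretely, given a foldM-style entry point, wiring rows into one of those sinks is a one-liner. Something like this sketch (render and sink stand in for my actual code; the foldM-style argument is whatever API Groundhog ends up exposing):

    import qualified Data.ByteString.Builder as B

    -- Stream each row straight into a low-overhead Builder sink, without
    -- ever holding the full result set in memory.
    streamRows :: ((() -> row -> IO ()) -> () -> IO ())  -- foldM-style API, used at accumulator ()
               -> (row -> B.Builder)                     -- render one row
               -> (B.Builder -> IO ())                   -- low-level sink
               -> IO ()
    streamRows foldRowsM render sink =
        foldRowsM (\() row -> sink (render row)) ()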

@hansonkd commented Sep 2, 2015

@lykahb

I'm currently writing a RethinkDB backend and am particularly interested in this issue. Having a streaming instance opens up change subscriptions in RethinkDB and tailable cursors in MongoDB (although that would probably be another class, StreamingSubscribe or something).

I believe the Cursor only needs to be implemented in terms of two functions: close (this is debatable, since cursors should be closed automatically, but it could be handy for the subscriptions) and next, which returns (Just Entity). From those we can build mapping/folding implementations. However, this requires a cursor object, which sql-simple doesn't expose.

We could implement it in terms of a fold (which I think all the sql-simple packages already have). A PersistEntity v -> m () callback would be in the spirit of the library, but the row could also be a type family CursorRow a associated with Cursor a; the fold callback would then be CursorRow a -> m (), with CursorRow a convertible to Entity v. That would open up two functions, foldRows and foldEntityRows.
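
Roughly what I have in mind, as a sketch only (none of these names exist in groundhog today):

    {-# LANGUAGE TypeFamilies #-}

    -- Illustrative cursor interface: a backend only has to provide next and close.
    class Cursor cur where
      type CursorRow cur
      next  :: cur -> IO (Maybe (CursorRow cur))  -- Nothing once the cursor is drained
      close :: cur -> IO ()                       -- optional early release

    -- Mapping/folding helpers can then be built purely on top of next/close.
    foldRows :: Cursor cur => (acc -> CursorRow cur -> IO acc) -> acc -> cur -> IO acc
    foldRows f z cur = go z
      where
        go acc = do
          mrow <- next cur
          case mrow of
            Nothing  -> close cur >> return acc
            Just row -> f acc row >>= go

foldEntityRows would just compose this with the CursorRow-to-entity conversion.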

I don't think groundhog needs to be tied to any other iterator libraries, although a clean implementation would leave open the possibility of a lightweight groundhog-conduit down the road.

EDIT:
Actually, a conduit library would be a pretty cool combination with websockets or other network layers, letting you publish DB changes or events easily. With a StreamingSubscribe it could resemble something like Meteor.

@lykahb (Owner) commented Sep 2, 2015

I have just committed a streaming interface. It was written a while ago; I was going to push it after writing at least one of groundhog-pipes or groundhog-conduit. It has a bracket-like function interface that passes next and closes automatically. Please let me know what you think about ca2e901.
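
The general shape is: the caller gets the popper inside a continuation, and the cursor is released when the continuation returns. As a generic sketch of that pattern (hypothetical names; see the commit for the actual functions):

    import Control.Exception (bracket)

    -- Bracket-like streaming: `consume` receives an action that pops the
    -- next row, and the cursor is closed when `consume` returns (or throws).
    withStream :: IO cur                    -- open a cursor for the query
               -> (cur -> IO ())            -- close the cursor
               -> (cur -> IO (Maybe row))   -- fetch the next row
               -> (IO (Maybe row) -> IO a)  -- consumer gets the popper
               -> IO a
    withStream open closeCur nextRow consume =
        bracket open closeCur (\cur -> consume (nextRow cur))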

@hvr (Contributor, Author) commented Sep 2, 2015

@lykahb What's the ETA for groundhog-pipes btw?

@hansonkd commented Sep 2, 2015

This looks like it should work! Thanks.

When I finish up my RethinkDB backend, I'll take a look at implementing groundhog-conduit.

@lykahb (Owner) commented Sep 4, 2015

Sorry, @hvr, I've just noticed your comment. I got back to groundhog-pipes yesterday. It is harder than I thought to use a bracket-like function there. Perhaps, as @hansonkd suggested, I will use next and close instead.
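
With explicit next and close, the adapter for pipes becomes straightforward; roughly (a sketch, not the final groundhog-pipes code):

    import Control.Monad.Trans.Class (lift)
    import Pipes (Producer, yield)

    -- Turn explicit next/close actions into a pipes Producer: yield rows
    -- until next returns Nothing, then close the cursor.
    cursorToProducer :: Monad m => m (Maybe a) -> m () -> Producer a m ()
    cursorToProducer next close = loop
      where
        loop = do
          mrow <- lift next
          case mrow of
            Nothing  -> lift close
            Just row -> yield row >> loop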

@lykahb (Owner) commented Sep 12, 2015

I've committed groundhog-pipes. After logging is fixed, the packages will be ready for release!
