Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of String for internal representations. #43

Open
hansonkd opened this issue Sep 7, 2015 · 5 comments
Open

Get rid of String for internal representations. #43

hansonkd opened this issue Sep 7, 2015 · 5 comments

Comments

@hansonkd
Copy link

hansonkd commented Sep 7, 2015

I did a quick first pass here: I made library dependent strings into Text and SQL related strings into Utf8.Text in BSON, Redis and ReQL and the like are closer to Text than to String so it might be nice to have a more direct relationship for the mappings to the backend.

I also got rid of StringLike interface on flattenP since it would've converted into string and back. This also makes it easy to swap out what type Utf8 is, making it easy to try out different string representations.

It still needs some more work. So far I converted groundhog, part of groundhog-th and groundhog-sqlite
https://github.com/lykahb/groundhog/compare/master...hansonkd:string-to-text?expand=1

I thought this would have had more of a speed bump, but it looks like it is about the same as current master so I am probably going to put it on hold. Although it still solves the problem of preventing encoding information from being lost (in an intermediate PersistString).

This also removes read for integer conversion in favor of Text.Reader. A parsing library should be found for datetime conversion.

Edit:
This should only mean backend changes. Users can keep using Strings and have the same performance as before.

By my current benchmarks it increases the speed of Text fields 2x-4x (compared with Text fields in Master), speeds up query render (see critLite.hs) times between 1.5x and 3x, String fields remain about the same performance. Switching from String to Text will be about a 1.5x speedup.

@hansonkd
Copy link
Author

hansonkd commented Sep 7, 2015

Actually I'll show my benchmarks:

With this bench:

{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances #-}

import Control.Monad.IO.Class (liftIO)
import Control.Monad
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import Data.Text (Text, pack)
import Data.Text as T
import Data.Maybe (fromJust)
data Person = Person {name :: Text, age :: Int, height :: Int} deriving (Eq, Show)

mkPersist defaultCodegenConfig [groundhog|
- entity: Person
|]

-- First, normal build, then profiling build with "-osuf p_o -hisuf p_hi"
main :: IO ()
main = withSqlitePool ":memory:" 10 $ \gCon -> runDbConn (do { runMigration $ migrate (undefined :: Person); res <- foldM (\acc b -> go >>= return . (+) acc) 0 [1..100]; liftIO $ print res}) gCon
  where go = do
          let person = Person (pack "abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc") 22 180
          k <- insert $ person
          b <- get k
          -- Do some math
          (replicateM_ 1000 $ insert person)
          replicateM_ 1000 $ get k

          -- Force the evaluation of the name field by getting its length
          calc <- (foldM (\acc b -> acc `seq` get k >>= return . ((+) acc . T.length . name . fromJust)) 0 [1..10000])
          calc `seq` return calc

Using master:
http://imgur.com/IYFyvRS (http://i.imgur.com/SHiT1qO.png)

Using Master where name :: String
http://i.imgur.com/UAcwlVV.png (http://i.imgur.com/c3VDqCg.png)

Using Groundhog with No Strings (Over 2x speedup):
http://i.imgur.com/4ux62Tc.png (http://i.imgur.com/JNTdePg.png)

Using No Strings but name :: String (About the same as master)
http://i.imgur.com/3q5BbXA.png (http://i.imgur.com/hCROyjw.png)

So when it comes to text processing I guess it is a bit more than a bump. Seeing as I would suspect most haskellers would use Text for their programs, this would mean an almost 2x speed up.

Edit:

There was a memory bug before, but I added accseq and re ran the benchmarks.

@lykahb
Copy link
Owner

lykahb commented Sep 9, 2015

Thank you, it is an interesting experiment. It looks like the speedup comes mostly from adding Text to PersistValue.

The type Utf8 has somewhat unfortunate name. I did not intend to expose it to the general API, it was just a wrapper for the Builder used for SQL rendering. This synonym helped experimenting with several other datatypes before I chose Builder.

Can you create a pull request that replaces PersistString with PersistText? Then we can see how replacing String with Text in entity description affects performance. It is a big change that needs some more thought.

@hansonkd
Copy link
Author

I agree. It is a big change (It started with PersistValue and then I kinda got carried away). There are essentially 3 parts to this:

  1. ) PerstistValue
  2. ) Internal field representations
  3. ) SQL representation.

Changing PersistString to PersistText is where probably 90% of the performance boost came from. But there is more to moving off strings completely than just performance. Many backends support UTF8 table names and fields (interestingly so do Haskell functions).

I will work on the PR for isolating the PersistString changes.

@lykahb
Copy link
Owner

lykahb commented Sep 27, 2015

I am going to issue a new major release soon. If you don't have time for PersistText PR now, I can do the change myself and mention you in the commit.

@lykahb
Copy link
Owner

lykahb commented Nov 17, 2015

Thanks, @hansonkd
if you rebase your branch on top of HEAD we can see how internal field representation affect performance. If it is noticeable we can do the change. SQL generation has been optimized with Builder for quite a while.
98671a2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants