
JSON with low memory consumption #232

Closed · proyb6 opened this issue Feb 9, 2020 · 8 comments

proyb6 (Contributor) commented Feb 9, 2020

Just tried with jsparser, and I realise it consumes the lowest memory possible, less than 9MB, probably because it avoids extra allocations, IIRC.

macOS Catalina
Go version 1.14rc1
Timing to complete: <2.2s
Memory: <9MB

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"

	"github.com/tamerh/jsparser"
)

func main() {
	f, _ := os.Open("/tmp/1.json")
	br := bufio.NewReaderSize(f, 16384)
	parser := jsparser.NewJSONParser(br, "coordinates").SkipProps([]string{"name", "opts"})

	x, y, z := 0.0, 0.0, 0.0
	count := 0.0

	for json := range parser.Stream() {
		xx, _ := strconv.ParseFloat(json.ObjectVals["x"].StringVal, 64)
		yy, _ := strconv.ParseFloat(json.ObjectVals["y"].StringVal, 64)
		zz, _ := strconv.ParseFloat(json.ObjectVals["z"].StringVal, 64)
		x += xx
		y += yy
		z += zz
		count++
	}

	fmt.Printf("%.8f\n%.8f\n%.8f\n", x/count, y/count, z/count)
}
proyb6 closed this as completed Feb 9, 2020
proyb6 reopened this Feb 9, 2020
nuald (Collaborator) commented Feb 10, 2020

The provided example reads from the file, not from memory. Please update it (or, even better, send a PR) with code that reads the file into memory first and then parses the JSON from memory. The overall time could increase, but the measured interval (for the actual JSON parsing) could decrease.
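
For illustration, a minimal sketch of that change (reusing the jsparser calls, file path, and buffer size from the example above; the error handling and element counting are just placeholders) could look like:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io/ioutil"

	"github.com/tamerh/jsparser"
)

func main() {
	// Read the whole file into memory first, so the measured interval
	// covers only the JSON parsing, not the file I/O.
	data, err := ioutil.ReadFile("/tmp/1.json")
	if err != nil {
		panic(err)
	}

	// jsparser expects a *bufio.Reader, so wrap the in-memory bytes.
	br := bufio.NewReaderSize(bytes.NewReader(data), 16384)
	parser := jsparser.NewJSONParser(br, "coordinates")

	count := 0
	for json := range parser.Stream() {
		_ = json // process each streamed element here
		count++
	}
	fmt.Println("elements:", count)
}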

proyb6 (Contributor, Author) commented Feb 10, 2020

I see. Would you be interested in sending a PR instead?

nuald (Collaborator) commented Feb 10, 2020

Yes, please. As a guide, a PR for a new test usually includes:

  • build.sh changes to compile the binary;
  • run.sh changes to run the binary;
  • notifications in the code around the measured section (the first notification includes the name of the test and the PID; the second may contain anything, as it only signals that the measurement should stop); please use any other Go test as a reference, and see the sketch after this list;
  • code formatted with the official tools (gofmt for Go).
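
A very rough sketch of that notification pattern follows. The address, port, and message layout here are assumptions for illustration only, not the repository's actual convention; copy the exact details from an existing Go test.

package main

import (
	"fmt"
	"net"
	"os"
)

// notify sends a short message to the benchmark harness. The "tcp" address
// and the message format below are assumed, not taken from the repository.
func notify(msg string) {
	if conn, err := net.Dial("tcp", "localhost:9001"); err == nil {
		fmt.Fprint(conn, msg)
		conn.Close()
	}
}

func main() {
	// First notification: test name and PID, sent right before the measured work.
	notify(fmt.Sprintf("jsparser\t%d", os.Getpid()))

	// ... the measured JSON parsing goes here ...

	// Second notification: content does not matter, it only marks the end.
	notify("stop")
}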

proyb6 (Contributor, Author) commented Feb 17, 2020

Sorry, I haven’t had the time to follow up. Could you send the PR instead?

nuald (Collaborator) commented Feb 18, 2020

PR #236 - Please note that it has higher memory consumption because it reads the file into memory first (like all the other tests) rather than from the file system directly. As for performance, it doesn't beat the other Go tests; it's actually the slowest among them, so I have some doubts about including it in the benchmarks. However, if you wish, I'll merge the PR into master.

proyb6 (Contributor, Author) commented Feb 23, 2020

In my opinion, this could be marked as "reads the file from the OS" or listed as a separate JSON benchmark; otherwise, we can ignore the PR.

tamerh commented May 10, 2020

Hi @proyb6 and @nuald

I recently noticed this issue and made some improvements; jsparser is now faster and more efficient, using around 5MB of memory on average. It could be improved further, but for now that is probably enough. You don't need jsparser in your benchmarks, but I want to add a few comments.

Most of the existing libraries, including simdjson, load the whole file into memory, which gives a lot of flexibility for fast parsing but requires a large amount of memory for big files, and you have to wait for all the parsing to finish before processing the data. My use case was better suited to a streaming parser, which is why I wrote jsparser.

Your benchmark counts total memory usage; if you took average memory usage into account, jsparser would probably stand somewhere near the top for average memory, since it only uses a buffered reader.

I was impressed by simdjson and simdjson-go via your benchmarks, thanks for this. They plan to implement stream parsing in the future. Let's see how that works out; once it's implemented there will probably be no need for jsparser, and I would also switch to it.
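
As a general illustration of the streaming idea (using only the standard library's json.Decoder rather than jsparser, and assuming for simplicity a file containing a single top-level JSON array), only the current element has to live in memory at any time:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("/tmp/array.json") // placeholder: a file with a top-level JSON array
	if err != nil {
		panic(err)
	}
	defer f.Close()

	dec := json.NewDecoder(f)

	// Consume the opening '[' token of the array.
	if _, err := dec.Token(); err != nil {
		panic(err)
	}

	// Decode one element at a time; the whole document is never held in memory.
	count := 0
	for dec.More() {
		var elem map[string]interface{}
		if err := dec.Decode(&elem); err != nil {
			panic(err)
		}
		count++
	}
	fmt.Println("elements:", count)
}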

beached (Contributor) commented May 10, 2020

Even if you use memory mapping, which is essentially streaming and doesn't really use much memory when pages aren't resident (it relies on OS paging), the measurement looked very similar with the way the measurements were being done, at least as of a few months ago.

Another approach might be to take the memory measurement prior to parsing, after the file has been loaded, as this would show the memory used by parsing itself, which seems to be the goal.
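
One way to approximate that in Go (a sketch of the idea, not the harness's actual measurement method; parse below is a hypothetical stand-in for the parser under test) is to snapshot runtime.MemStats after loading the file and again after parsing:

package main

import (
	"fmt"
	"io/ioutil"
	"runtime"
)

// parse is a hypothetical stand-in for the JSON parsing being measured.
func parse(data []byte) {
	_ = data
}

func main() {
	data, err := ioutil.ReadFile("/tmp/1.json")
	if err != nil {
		panic(err)
	}

	// Snapshot allocation counters after the file is loaded but before parsing.
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	parse(data)

	runtime.ReadMemStats(&after)
	// TotalAlloc is cumulative, so the difference is what parsing allocated.
	fmt.Printf("bytes allocated during parsing: %d\n", after.TotalAlloc-before.TotalAlloc)
}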

proyb6 closed this as completed May 22, 2021
4 participants