
JSON with low memory consumption #232

Closed · proyb6 opened this issue Feb 9, 2020 · 8 comments

proyb6 (Contributor) commented Feb 9, 2020

Just tried with jsparser, and I realise it consumes the lowest memory possible, less than 9MB, probably because it avoids extra allocations, IIRC.

macOS Catalina
Go version 1.14rc1
Timing to complete: <2.2s
Memory: <9MB

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"

	"github.com/tamerh/jsparser"
)

func main() {
	f, _ := os.Open("/tmp/1.json")
	br := bufio.NewReaderSize(f, 16384)
	parser := jsparser.NewJSONParser(br, "coordinates").SkipProps([]string{"name", "opts"})

	x, y, z := 0.0, 0.0, 0.0
	count := 0.0

	for json := range parser.Stream() {
		xx, _ := strconv.ParseFloat(json.ObjectVals["x"].StringVal, 64)
		yy, _ := strconv.ParseFloat(json.ObjectVals["y"].StringVal, 64)
		zz, _ := strconv.ParseFloat(json.ObjectVals["z"].StringVal, 64)
		x += xx
		y += yy
		z += zz
		count++
	}

	fmt.Printf("%.8f\n%.8f\n%.8f\n", x/count, y/count, z/count)
}
proyb6 closed this as completed Feb 9, 2020
proyb6 reopened this Feb 9, 2020
nuald (Collaborator) commented Feb 10, 2020

The provided example reads from the file, not from memory. Please update it (or, even better, send a PR) with code that reads the file into memory first and then parses the JSON from memory. The overall time could increase, but the measured interval (for the actual JSON parsing) could decrease.
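
For illustration, a minimal sketch of that change (reusing the jsparser calls, file path, and buffer size from the example above; the error handling and element counting are just placeholders) could look like:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io/ioutil"

	"github.com/tamerh/jsparser"
)

func main() {
	// Read the whole file into memory first, so the measured interval
	// covers only the JSON parsing, not the file I/O.
	data, err := ioutil.ReadFile("/tmp/1.json")
	if err != nil {
		panic(err)
	}

	// jsparser expects a *bufio.Reader, so wrap the in-memory bytes.
	br := bufio.NewReaderSize(bytes.NewReader(data), 16384)
	parser := jsparser.NewJSONParser(br, "coordinates")

	count := 0
	for json := range parser.Stream() {
		_ = json // process each streamed element here
		count++
	}
	fmt.Println("elements:", count)
}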

proyb6 (Contributor, Author) commented Feb 10, 2020

I see. Would you be interested in sending a PR instead?

nuald (Collaborator) commented Feb 10, 2020

Yes, please. As a guide, a PR for a new test usually includes:

  • build.sh changes to compile the binary;
  • run.sh changes to run the binary;
  • notifications in the code around the measured section (the first notification includes the name of the test and the PID; the second may contain anything, as it only signals that the measurement should stop); please use any other Go test as a reference, and see the sketch after this list;
  • code formatted with the official tools (gofmt for Go).
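
A very rough sketch of that notification pattern follows. The address, port, and message layout here are assumptions for illustration only, not the repository's actual convention; copy the exact details from an existing Go test.

package main

import (
	"fmt"
	"net"
	"os"
)

// notify sends a short message to the benchmark harness. The "tcp" address
// and the message format below are assumed, not taken from the repository.
func notify(msg string) {
	if conn, err := net.Dial("tcp", "localhost:9001"); err == nil {
		fmt.Fprint(conn, msg)
		conn.Close()
	}
}

func main() {
	// First notification: test name and PID, sent right before the measured work.
	notify(fmt.Sprintf("jsparser\t%d", os.Getpid()))

	// ... the measured JSON parsing goes here ...

	// Second notification: content does not matter, it only marks the end.
	notify("stop")
}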

proyb6 (Contributor, Author) commented Feb 17, 2020

Sorry, I haven’t had the time to follow up. Could you send the PR instead?

nuald (Collaborator) commented Feb 18, 2020

PR #236 - Please note that it has higher memory consumption because it reads the file into memory first (like all the other tests) rather than from the file system directly. As for performance, it doesn't beat the other Go tests; it's actually the slowest among them, so I have some doubts about including it in the benchmarks. However, if you wish, I'll merge the PR into master.

proyb6 (Contributor, Author) commented Feb 23, 2020

In my opinion, this could be marked as "reads the file from the OS" or listed as a separate JSON benchmark; otherwise, we can ignore the PR.

tamerh commented May 10, 2020

Hi @proyb6 and @nuald

I recently noticed this issue and made some improvements; jsparser is now faster and more efficient, using around 5MB of memory on average. It could be improved further, but for now that is probably enough. You don't need jsparser in your benchmarks, but I want to add a few comments.

Most of the existing libraries, including simdjson, load the whole file into memory, which gives a lot of flexibility for fast parsing but requires a large amount of memory for big files, and you have to wait for all the parsing to finish before processing the data. My use case was better suited to a streaming parser, which is why I wrote jsparser.

Your benchmark counts total memory usage; if you took average memory usage into account, jsparser would probably stand somewhere near the top for average memory, since it only uses a buffered reader.

I was impressed by simdjson and simdjson-go via your benchmarks, thanks for this. They plan to implement stream parsing in the future. Let's see how that works out; once it's implemented there will probably be no need for jsparser, and I would also switch to it.
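
As a general illustration of the streaming idea (using only the standard library's json.Decoder rather than jsparser, and assuming for simplicity a file containing a single top-level JSON array), only the current element has to live in memory at any time:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("/tmp/array.json") // placeholder: a file with a top-level JSON array
	if err != nil {
		panic(err)
	}
	defer f.Close()

	dec := json.NewDecoder(f)

	// Consume the opening '[' token of the array.
	if _, err := dec.Token(); err != nil {
		panic(err)
	}

	// Decode one element at a time; the whole document is never held in memory.
	count := 0
	for dec.More() {
		var elem map[string]interface{}
		if err := dec.Decode(&elem); err != nil {
			panic(err)
		}
		count++
	}
	fmt.Println("elements:", count)
}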

beached (Contributor) commented May 10, 2020

Even if you use memory mapping, which is essentially streaming and doesn't really use much memory when pages aren't resident (it relies on OS paging), the measurement looked very similar with the way the measurements were being done, at least as of a few months ago.

Another approach might be to take the memory measurement prior to parsing, after the file has been loaded, as this would show the memory used by parsing itself, which seems to be the goal.
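
One way to approximate that in Go (a sketch of the idea, not the harness's actual measurement method; parse below is a hypothetical stand-in for the parser under test) is to snapshot runtime.MemStats after loading the file and again after parsing:

package main

import (
	"fmt"
	"io/ioutil"
	"runtime"
)

// parse is a hypothetical stand-in for the JSON parsing being measured.
func parse(data []byte) {
	_ = data
}

func main() {
	data, err := ioutil.ReadFile("/tmp/1.json")
	if err != nil {
		panic(err)
	}

	// Snapshot allocation counters after the file is loaded but before parsing.
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	parse(data)

	runtime.ReadMemStats(&after)
	// TotalAlloc is cumulative, so the difference is what parsing allocated.
	fmt.Printf("bytes allocated during parsing: %d\n", after.TotalAlloc-before.TotalAlloc)
}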

proyb6 closed this as completed May 22, 2021
4 participants