
Orca hanging when large JSONs are piped in #110

Closed
sdrap opened this issue Aug 6, 2018 · 22 comments


sdrap commented Aug 6, 2018

I have a dataframe with 20 years of daily data. iplot can render any of the plots without trouble.

However, I can only use orca when I slice the dataframe down to less than 4 years of data.
It fails both in the notebook and on the command line (from a dumped JSON file) with the following error:

A JavaScript error occurred in the main process
Uncaught Exception:
TypeError: path must be a string or Buffer
    at Object.fs.mkdirSync (fs.js:891:18)
    at main (/usr/local/lib/node_modules/orca/bin/graph.js:105:8)
    at Object.<anonymous> (/usr/local/lib/node_modules/orca/bin/orca_electron.js:73:25)
    at Object.<anonymous> (/usr/local/lib/node_modules/orca/bin/orca_electron.js:99:3)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
    at Function.Module._load (module.js:458:3)
    at loadApplicationPackage (/usr/local/lib/node_modules/electron/dist/resources/default_app.asar/main.js:287:12)

The JSON file is about 250 KB for the 20 years of data.
data.zip


etpinard commented Aug 6, 2018

I suspect you're calling orca as:

orca graph data.zip

which won't work, as we currently only accept path/to/json/files, url/to/json/files or json strings.

Is this indeed the case?


sdrap commented Aug 6, 2018

No, I uploaded it as a zip because GitHub doesn't accept plain JSON attachments. I run it as follows:

orca graph data.json -o test.png --debug

or alternatively

cat data.json | orca graph > test.png

which just hangs.

I don't know exactly what the problem is.


etpinard commented Aug 6, 2018

I think something is up with your data encoding:

var fs = require('fs');
var s = fs.readFileSync('./data.json');
typeof JSON.parse(s)
// => returns 'string'

Orca expects the result of JSON.parse to be an object.


sdrap commented Aug 6, 2018

I am a bit puzzled by the JSON. I dumped it to the file using

A = df['S'].iplot(**kwargs)
file = 'data.json'
with open(file, 'w') as outfile:
    json.dump(json.dumps(A, cls=plotly.utils.PlotlyJSONEncoder), outfile)

Strangely enough, the file starts with "{ and has a backslash in front of every double quote.
I edited the file to remove these characters, producing a working JSON that I can process through cat.

I have here the two JSONs (the one-year and the 20-year) and process them with

cat data.json | orca graph > test.png

The small one gets through; the large one does not.

I also copy-pasted the contents of the JSON file and passed it on the command line. However, I get this error from cat:

bash: /bin/cat: Argument list too long

I thought it could be related to my system's ARG_MAX value, which is capped at 2097152. However, when I count the number of characters in the data, it is only 255272.

In the Python notebook, with the command

from subprocess import call

A = df['S'].iplot(**kwargs)
B = json.dumps(A, cls=plotly.utils.PlotlyJSONEncoder)
file = '-o test.png'
call(['orca', 'graph', B, file])

I always get the error [Errno 7] Argument list too long: 'orca', while

A = df['2018']['S'].iplot(**kwargs)
B = json.dumps(A, cls=plotly.utils.PlotlyJSONEncoder)
file = '-o test.png'
call(['orca', 'graph', B, file])

goes through.

Terribly sorry to disturb you; this may well be a problem outside the scope of orca.
data2.zip
data.zip


etpinard commented Aug 7, 2018

Terribly sorry to disturb you; this may well be a problem outside the scope of orca.

No worries at all. The orca CLI is bound to have a few rough edges at the moment. Thanks very much for writing in.

As mentioned in #104 (comment), I'd recommend first saving large JSONs to a temporary file.

Now, your snippet

A = df['S'].iplot(**kwargs)
file = 'data.json'
with open(file, 'w') as outfile:
    json.dump(json.dumps(A, cls=plotly.utils.PlotlyJSONEncoder), outfile)

seems odd. Wouldn't

A = df['S'].iplot(**kwargs)
file = 'data.json'
with open(file, 'w') as outfile:
    json.dump(A, outfile, cls=plotly.utils.PlotlyJSONEncoder)

suffice?
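For illustration, a minimal Python sketch (with a hypothetical figure dict) of why the double dump produces a JSON string rather than an object, which is also why JSON.parse returned 'string' above:

import json

fig = {"data": [{"y": [1, 2, 3]}]}       # hypothetical figure dict

once = json.dumps(fig)                   # '{"data": [{"y": [1, 2, 3]}]}'
twice = json.dumps(once)                 # adds the leading "{ and the escaped \" quotes
print(type(json.loads(once)).__name__)   # dict -> what orca expects
print(type(json.loads(twice)).__name__)  # str  -> what orca was getting here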


sdrap commented Aug 7, 2018

Many thanks, my dump of a dump was totally dumb. Your solution provides the right data format to cat into the pipeline.

However, the problem remains the same: the 1-year dataset can be processed; the 20-year one cannot.

I really don't know where the problem comes from, and I don't know how to pass the debug flag when piping:

cat data.json | orca graph > test.png

If I can be of any help running some commands or tests, do not hesitate to ask.
I'm attaching once again a correct dump of the two datasets (small and large):
datalarge.zip
datasmall.zip


etpinard commented Aug 7, 2018

I got it to work using:

orca graph datalarge.json

[screenshot]

Unfortunately, from Python this means dumping your figure object into a temporary file. But as discussed in #104, piping very large JSONs into orca does not scale well: it will always be slow, as we have to wait for the full JSON to be in memory. In other words, we can't start creating a graph from a partial JSON chunk.

Luckily, @jonmmease is creating an official Python wrapper for orca that should handle all the temporary-file messiness.
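For reference, a minimal sketch of the temporary-file approach (assuming the orca CLI is on the PATH; the figure dict here is hypothetical, and a real figure would be serialized with plotly.utils.PlotlyJSONEncoder as above):

import json
import os
import subprocess
import tempfile

fig = {"data": [{"y": [1, 2, 3]}]}  # hypothetical figure dict

# Dump the figure to a temporary file and hand orca the *path* rather than the
# JSON string, sidestepping both the stdin pipe and the ARG_MAX argument limit.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(fig, f)
    path = f.name
try:
    subprocess.run(["orca", "graph", path, "-o", "test.png"], check=True)
finally:
    os.remove(path)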


sdrap commented Aug 7, 2018

Oh yes! That's great, I got it to work the way you did. I will use this solution with temporary files, since piping doesn't work in the notebook. It's just a matter of writing a small script to handle all the temp files.

Many thanks, I had been waiting a long time for this export solution and it is really nice :).

etpinard changed the title from "Orca hanging on large json files" to "Orca hanging when large JSONs are piped in" on Aug 7, 2018
@jonmmease

@etpinard, a thought just occurred to me. Do you have a sense of what it would take to launch orca in server mode (as a Python subprocess) and then send requests to it from Python? This would save the orca startup time (once the server process is launched the first time) and avoid the temporary-file business.


etpinard commented Aug 8, 2018

Do you have a sense of what it would take to launch orca in server mode (as a Python subprocess) and then send requests to it from Python?

It shouldn't be too hard if you'd like to experiment. The server part of orca predates orca itself. Taking a look at our orca serve tests is probably the best way to get going. As always, let me know if you have any questions.

Note also that orca graph accepts multiple inputs, e.g.

orca graph fig.json fig1.json fig2.json

# which can also be saved in a directory e.g.
orca graph fig.json fig1.json fig2.json -d orca-outputs/

So to improve performance, one could generate all the JSON files to be exported, then call orca on all those files at once, as sketched below.
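A sketch of that batching idea from Python (the figure dicts are hypothetical; -d is the output-directory flag shown above):

import json
import subprocess

figures = [{"data": [{"y": [1, 2, 3]}]},  # hypothetical figure dicts
           {"data": [{"y": [3, 1, 2]}]}]

# Write each figure to its own JSON file...
paths = []
for i, fig in enumerate(figures):
    path = "fig{}.json".format(i)
    with open(path, "w") as f:
        json.dump(fig, f)
    paths.append(path)

# ...then export them all in a single orca invocation.
subprocess.run(["orca", "graph"] + paths + ["-d", "orca-outputs/"], check=True)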


jonmmease commented Aug 8, 2018

Thanks @etpinard, I'll take a look.

Yeah, if we go the temp-file approach, I was planning to work out an API to allow users to batch-convert collections of figures in one go. The --parallel-limit option applies to the batch conversion case, right?


etpinard commented Aug 8, 2018

The --parallel-limit option applies to the batch conversion case, right?

Yep, the --parallel-limit (or --parallelLimit) CLI option sets the limit on the number of parallel tasks run. Its default value is 1.

One note on parallelization: no matter the --parallel-limit value set, orca only creates one Electron instance. Parallelization is especially productive for exporting plotly.js graphs (except PDF and EPS exports), where only one browser window is created regardless of the --parallel-limit value, since we can create multiple graph divs on the same page and export them individually (with Plotly.toImage(gd)). For other export types, parallelization leads to the creation of more browser windows, which can slow down the process in extreme cases.
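For example, a batch export with a higher task limit might look like this from Python (file names hypothetical; flags as named above):

import subprocess

# Export three figures, allowing up to 4 export tasks to run in parallel
# (orca still creates a single Electron instance, per the note above).
subprocess.run(
    ["orca", "graph", "fig.json", "fig1.json", "fig2.json",
     "-d", "orca-outputs/", "--parallel-limit", "4"],
    check=True,
)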

@jonmmease

Image conversion with orca server from Python!

[screenshot]

I haven't tried any large graphs yet, but for small stuff it's impressively responsive!

Here are the only two issues I see at the moment:

  1. Launching the server on my MacBook Pro starts up an "orca" process plus 7 "orca Helper" processes that consume close to 400 MB of memory. Is there any way to control the number of helpers?

[screenshot]

  2. The server processes don't shut down when I use the terminate or kill methods on the subprocess:

# Shutdown process with `SIGTERM`
orca_proc.terminate()
# Shutdown process with `SIGKILL`
orca_proc.kill()

I'll see what I can find on the Python side. Do you expect the server process to respect these signals?
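For context, a sketch of the launch-and-shutdown sequence on the Python side (flags beyond serve omitted; orca_proc is assumed to come from subprocess.Popen):

import subprocess

# Launch `orca serve` as a child process.
orca_proc = subprocess.Popen(["orca", "serve"])

# ... send image-export requests to the server here ...

orca_proc.terminate()          # sends SIGTERM
try:
    orca_proc.wait(timeout=5)  # give it a few seconds to exit cleanly
except subprocess.TimeoutExpired:
    orca_proc.kill()           # escalate to SIGKILL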

@jonmmease

Some rough timing numbers for a single trace (after the server is running, from execution to display in the notebook):

  • Small scatter (non-GL) trace: ~50ms
  • Small scattergl: ~150ms
  • scattergl, 10,000 points: ~200ms
  • scattergl, 100,000 points: ~500ms
  • scattergl, 1,000,000 points: ~3.5s

I think these numbers are awesome! For comparison, using the plotly graph approach in the README with a small plot takes about 1.7s.


etpinard commented Aug 8, 2018

but for small stuff it's impressively responsive!

Great.

"orca" process plus 7 "orca Helper"

Interesting find. From what I'm seeing, only the standalone executable behaves this way; the ./bin/orca.js script does not spin up that many "helpers". This could be related to orca serve booting up one browser window per available export component, but I doubt the idle windows would consume that much memory. We should compare with other Electron-based desktop apps.

The server processes don't shutdown when I use the terminate or kill methods on the subprocess:

orca serve

# and then
<ctrl-c>

seems to kill all processes, though I'm not sure that Ctrl-C (which sends SIGINT) is equivalent to SIGTERM or SIGKILL. Perhaps we'll need to listen for a few more events in bin/serve.js.

scattergl 1,000,000: ~ 3.5s
I think these numbers are awesome!

Fantastic 🎉

@jonmmease

On my side, when I run ./bin/orca.js serve I get the same number of child processes; they're just named Electron rather than orca:

[screenshot]


jonmmease commented Aug 8, 2018

Here's a clue: the returned subprocess PID is a bash process, not the main Electron process. If I run os.kill(electron_pid, ...), the shutdown works.

[screenshot]

Ohhh, it's the orca.sh wrapper script that's getting killed, which isn't killing the orca process. Getting there...

@jonmmease

OK, I figured out a solution based on this article: http://veithen.github.io/2014/11/16/sigterm-propagation.html

In our wrapper bash script, we basically just need to prefix the call to orca with exec. Then the bash process becomes the orca process, and the signals sent from Python make it to orca.

Since we haven't merged it yet, I'll update this in my conda build PR.
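A self-contained Python demonstration of the exec trick (using sleep as a stand-in for the orca binary; the wrapper file here is hypothetical):

import os
import subprocess
import tempfile
import time

# A wrapper script that uses `exec`: the shell *becomes* the wrapped program,
# so the PID Python holds is the program itself and signals reach it directly.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("#!/bin/bash\nexec sleep 1000\n")  # stand-in for the call to orca
    wrapper = f.name
os.chmod(wrapper, 0o755)

proc = subprocess.Popen([wrapper])
time.sleep(0.5)
proc.terminate()    # SIGTERM now reaches `sleep` itself...
print(proc.wait())  # ...so this returns -15 and no orphan process is left behind
os.remove(wrapper)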


etpinard commented Aug 8, 2018

On my side, when I run ./bin/orca.js serve I get the same number of child processes

Confirmed. I accidentally ran orca graph, which spins up 3 Electron processes, which looks like the intended behavior. So I'm not sure why we get to 7 processes in orca serve, but it must be related to opening one window per component. Perhaps we could add a --graph-only flag to orca serve, or something fancier, to reduce that number down to 3?

For comparison, orca serve gives:

[screenshot]

on Ubuntu 18.04, which is significantly less memory than for @jonmmease 🤔

@jonmmease

Are the other components the dashboard/dash/thumbnail parts? If so, then yeah, a --graph-only option might be a nice way to go.

Based on my experiments today, I think the server approach is going to be cleaner (no temp files) and provide a better user experience (more responsive). So I think it is probably worth looking into what it would take to trim the process count down by a few.

@jonmmease

Well, I took a look, and it seemed pretty straightforward, and then I ended up with a PR 🙂 #112

@jonmmease

Closing as #112 was merged months ago.
