Orca hanging when large JSONs are piped in #110

I have a dataframe with 20 years of daily data. iplot can process any of the plots. However, I can only use orca when I slice the dataframe to less than 4 years of data. It fails in the notebook as well as on the command line from a dumped JSON file, with the following text. The JSON file is about 250 KB for the 20 years of data.
data.zip
I suspect you're calling orca as `orca graph data.zip`, which won't work, as we currently only accept `path/to/json/file`, `url/to/json/file`, or JSON strings. Is this indeed the case? |
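For illustration, here is a minimal sketch of those three accepted input forms, driven from Python via `subprocess` (the path, URL, and figure JSON below are placeholders, not values from this thread):

```python
import subprocess

# The three input forms the orca CLI accepts, per the comment above.
# All arguments here are illustrative placeholders.
subprocess.run(['orca', 'graph', 'data.json'], check=True)                      # path to a JSON file
subprocess.run(['orca', 'graph', 'https://example.com/data.json'], check=True)  # URL to a JSON file
subprocess.run(['orca', 'graph', '{"data": [{"y": [1, 2, 3]}]}'], check=True)   # inline JSON string
```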
No, I uploaded it as a zip because GitHub doesn't accept plain JSON attachments. I run it as follows
or alternatively
both of which just hang. I don't know exactly what the problem is. |
I think something is up with your data encoding:

```js
var s = fs.readFileSync('./data.json');
typeof JSON.parse(s)
// => returns 'string'
```

Orca expects the result of `JSON.parse` here to be an object, not a string. |
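To make the encoding issue concrete, here is a small Python sketch (file names are illustrative) showing how a double `dump`/`dumps` produces a JSON string rather than a JSON object:

```python
import json

fig = {"data": [{"y": [1, 2, 3]}]}  # stand-in for a real figure dict

# Double encoding: json.dumps() returns a string, and json.dump() then
# serializes that string, so the file holds a quoted JSON string literal.
with open('bad.json', 'w') as f:
    json.dump(json.dumps(fig), f)
with open('bad.json') as f:
    print(type(json.load(f)))  # <class 'str'> -- parses to a string

# Single encoding: the file holds the JSON object itself.
with open('good.json', 'w') as f:
    json.dump(fig, f)
with open('good.json') as f:
    print(type(json.load(f)))  # <class 'dict'> -- parses to an object
```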
I am a bit puzzled by the JSON. I dumped it into the file using
Strangely enough, it starts with
I have here the two JSONs (the one-year and the 20-year) and process them with
The small one gets through; the large one does not. I also copy-pasted the JSON file and processed it through the pipe.
I thought it could be related to the
In the Python notebook, with the command
I always get the error, while
goes through. Terribly sorry to disturb; it may well be a problem outside the scope of orca. |
No worries at all. The orca CLI is bound to have a few rough edges at the moment. Thanks very much for writing in. As mentioned in #104 (comment), I'd recommend first saving large JSONs to a temporary file.

Now, your snippet

```python
A = df['S'].iplot(**kwargs)
file = 'data.json'
with open(file, 'w') as outfile:
    json.dump(json.dumps(A, cls=plotly.utils.PlotlyJSONEncoder), outfile)
```

seems odd. Wouldn't

```python
A = df['S'].iplot(**kwargs)
file = 'data.json'
with open(file, 'w') as outfile:
    json.dump(A, outfile, cls=plotly.utils.PlotlyJSONEncoder)
```

suffice? |
Many thanks, my dump of dumps was totally dumb. Your solution provides the right data format to cat into the pipeline. However, the problem remains the same: the one-year dataset can be processed, the 20-year one cannot. I really don't know where the problem comes from, and I don't know how to run the debug command in the pipe.
If I can be of any help running some commands or some tests, do not hesitate to ask. |
I got it to work using:
Unfortunately, from Python this means dumping your figure object into a temporary file. But as discussed in #104, piping very large JSONs into orca does not scale well: it will always be slow, since we have to wait for the full JSON to be in memory; in other words, we can't start creating a graph from a partial JSON chunk. Luckily, @jonmmease is creating an official Python wrapper for orca that should handle all the temporary-file messiness. |
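As a sketch of that temp-file workaround from Python — the helper name is hypothetical, and it assumes orca reads the figure from a file path as in the commands above:

```python
import json
import subprocess
import tempfile

import plotly


def export_via_tempfile(fig):
    """Hypothetical helper: write the figure JSON to a temp file so orca
    reads it from disk instead of buffering the whole JSON through a pipe."""
    with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
        json.dump(fig, f, cls=plotly.utils.PlotlyJSONEncoder)
        path = f.name
    # orca picks the figure up from the file path
    subprocess.run(['orca', 'graph', path], check=True)
```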
Oh yes, that's great! I got it to work the way you did. I will use the temporary-file solution, since the pipe doesn't work in the notebook; it's just a matter of writing a small script to handle all the temp files. Many thanks, I had been waiting for this export solution for a long time and it is really nice :). |
@etpinard, a thought just occurred to me. Do you have a sense of what it would take to launch orca in server mode (as a Python subprocess) and then send requests to it from Python? This would save the orca startup time (once the server process is launched the first time) and avoid the temporary-file business. |
It shouldn't be too hard if you'd like to experiment. The server part of orca predates orca itself. Taking a look at our
Note also that
So to improve perf, one could generate all the JSON files to be exported and then call orca on all those files at once. |
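A rough sketch of that batching idea, assuming `orca graph` accepts multiple file paths in one invocation (the paths are illustrative):

```python
import subprocess

# Write every figure to its own JSON file first (e.g. via a helper like
# the one above), then pay the Electron startup cost only once by
# passing all the paths to a single orca call.
paths = ['fig1.json', 'fig2.json', 'fig3.json']
subprocess.run(['orca', 'graph', *paths], check=True)
```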
Thanks @etpinard, I'll take a look. Yeah, if we go with the temp-file approach, I was planning to work out an API that lets users batch-convert collections of figures in one go. The |
Yep, the
One note on parallelization: no matter the |
Image conversion with the orca server from Python! I haven't tried any large graphs yet, but for small stuff it's impressively responsive! Here are the only two issues I see at the moment:

```python
# Shutdown process with `SIGTERM`
orca_proc.terminate()

# Shutdown process with `SIGKILL`
orca_proc.kill()
```

I'll see what I can find on the Python side. Do you expect the server process to respect these signals? |
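For context, a rough sketch of the kind of server-mode wrapper being tried here. The `serve` subcommand is what the thread is discussing, but the port flag, endpoint, and request shape below are assumptions, not details confirmed in this thread:

```python
import json
import subprocess
import time

import requests  # third-party HTTP client

# Launch orca in server mode as a subprocess (port flag assumed).
orca_proc = subprocess.Popen(['orca', 'serve', '-p', '9091'])
time.sleep(2)  # crude wait for the server to come up

# POST figure JSON to the server and write the returned image bytes.
payload = {'figure': {'data': [{'y': [1, 2, 3]}]}, 'format': 'png'}
resp = requests.post('http://localhost:9091/', data=json.dumps(payload))
with open('graph.png', 'wb') as f:
    f.write(resp.content)

# Graceful shutdown: SIGTERM first, SIGKILL if it doesn't exit in time.
orca_proc.terminate()
try:
    orca_proc.wait(timeout=5)
except subprocess.TimeoutExpired:
    orca_proc.kill()
```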
Some rough timing numbers for a single trace (after the server is running, from execution to display in the notebook):
I think these numbers are awesome! For comparison, using the |
Great.
Interesting find. From what I'm seeing, only the standalone executable behaves this way. The
seems to kill all processes; I'm not sure if that's equivalent to
Fantastic 🎉 |
OK, I figured out a solution based on this article: http://veithen.github.io/2014/11/16/sigterm-propagation.html. In our wrapper bash script, we basically just need to prefix the call to orca with `exec`. Since we haven't merged it yet, I'll update this in my conda build PR. |
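The linked article's point is that a wrapper should `exec` the child so signals reach it directly instead of stopping at the wrapper. For illustration, a Python analogue of the same idea:

```python
import os
import sys

# Replace the wrapper process with orca itself (like `exec` in a shell
# script), so SIGTERM/SIGKILL are delivered straight to orca rather
# than to a wrapper that might swallow them.
os.execvp('orca', ['orca'] + sys.argv[1:])
```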
Confirmed. I accidentally ran
For comparison, on Ubuntu 18.04,
which is significantly less memory than for
@jonmmease 🤔 |
Are the other components the dashboard/dash/thumbnail parts? If so, then yeah, a
Based on my experiments today, I think the server approach is going to be cleaner (no temp files) and provide a better user experience (more responsive). So I think it is probably worth looking into what it would take to trim the process count down by a few. |
Well, I took a look, and it seemed pretty straightforward, and then I ended up with a PR 🙂 #112 |
Closing as #112 was merged months ago. |