Warnings + time-out when sending a job to deepcell in Segment_image_data #297

Closed
awedwards opened this issue Oct 24, 2020 · 16 comments
Labels
question Further information is requested

Comments

@awedwards
Contributor

I'm trying to run the create_deepcell_output method to zip a file and send it to the DeepCell server (very minor: there's a typo in the uploading message that says DeppCell, which is kinda great).

I get several warnings and then it just never completes the send and seems to time out. When I kill the process, the .zip file gets removed.

Thanks for the help!

What have you tried so far?
To get around it I've just been zipping the files myself and uploading them through the browser. Haven't been able to figure it out in the code though.

Additional context

Here's the error trace in case that's helpful

Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/local/lib/python3.6/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/tcp.py", line 243, in doRead
    return self._dataReceived(data)
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/tcp.py", line 249, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 132, in dataReceived
    return self._wrappedProtocol.dataReceived(data)
  File "/usr/local/lib/python3.6/site-packages/twisted/protocols/tls.py", line 325, in dataReceived
    self._unbufferPendingWrites()
  File "/usr/local/lib/python3.6/site-packages/twisted/protocols/tls.py", line 498, in _unbufferPendingWrites
    self._producer.resumeProducing()
  File "/usr/local/lib/python3.6/site-packages/twisted/protocols/tls.py", line 101, in resumeProducing
    self._producer.resumeProducing()
  File "/usr/local/lib/python3.6/site-packages/treq/multipart.py", line 119, in resumeProducing
    self._currentProducer.resumeProducing()
  File "/usr/local/lib/python3.6/site-packages/twisted/web/client.py", line 1217, in resumeProducing
    self._task.resume()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 460, in resume
    raise NotPaused()
twisted.internet.task.NotPaused: 

[None]: Encountered RequestTransmissionFailed during UPLOAD ../data/HNDysplasia/input_dir/deepcell_input/fovs.zip: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Unexpected exception from treq.multipart.MultiPartProducer.stopProducing
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/protocols/policies.py", line 125, in connectionLost
    self.wrappedProtocol.connectionLost(reason)
  File "/usr/local/lib/python3.6/site-packages/twisted/web/_newclient.py", line 1050, in dispatcher
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/twisted/web/_newclient.py", line 1734, in _connectionLost_TRANSMITTING
    self._currentRequest.stopWriting()
  File "/usr/local/lib/python3.6/site-packages/twisted/web/_newclient.py", line 951, in stopWriting
    _callAppFunction(self.bodyProducer.stopProducing)
--- <exception caught here> ---
  File "/usr/local/lib/python3.6/site-packages/twisted/web/_newclient.py", line 197, in _callAppFunction
    function()
  File "/usr/local/lib/python3.6/site-packages/treq/multipart.py", line 93, in stopProducing
    self._currentProducer.stopProducing()
  File "/usr/local/lib/python3.6/site-packages/twisted/web/client.py", line 1161, in stopProducing
    self._task.stop()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 497, in stop
    self._checkFinish()
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/task.py", line 507, in _checkFinish
    raise self._completionState
twisted.internet.task.TaskStopped: 
@awedwards awedwards added the question Further information is requested label Oct 24, 2020
@ngreenwald
Member

How many images are you trying to run at once? And how large are they?

@awedwards
Contributor Author

6 images, 2048x2048, 31 channels

@ngreenwald
Member

Can you try sending a single 1024x1024 crop of one of the images? That doesn't seem like it would be too large, but we haven't tried anything larger than 1024x1024 so far.

@awedwards
Contributor Author

Resizing to 1024x1024 completes the call. Unfortunately, the deepcell results zip file is empty (22 bytes).

@awedwards
Contributor Author

I tried again and got a json report (didn't get that last time):

{
    "cpu_node_cost": "",
    "gpu_node_cost": "",
    "total_node_and_networking_costs": "",
    "start_delay": 0.1,
    "num_jobs": 1,
    "time_elapsed": 30.80152876599459,
    "job_data": [
        {
            "input_file": "../data/HNDysplasia/input_dir/deepcell_input/fovs.zip",
            "status": "done",
            "total_time": "None",
            "total_jobs": 1.0,
            "download_url": "https://storage.googleapis.com/deepcell-prod/output/d8edfac196094bc8acf5c9a40f9fc1df.zip",
            "created_at": "2020-10-26T20:09:35.592Z",
            "finished_at": "2020-10-26T20:09:37.954947+00:00",
            "prediction_time": "None",
            "postprocess_time": "None",
            "upload_time": "None",
            "download_time": "None",
            "predict_retries": "None",
            "cleanup_time": 0.26133740300429054,
            "children_upload_time": 0.2788974010036327,
            "model": ":",
            "postprocess": "",
            "preprocess": "",
            "reason": null,
            "job_id": "multiplex-zip:ced14adba657c2a71a1e3acbdf118077.zip:d35658c2-186d-450d-9d0a-f241c63f6160"
        }
    ]
}

@ngreenwald
Member

And the same zip file works on the website?

@awedwards
Contributor Author

Good question -- is there an easy way to stop create_deepcell_output from deleting the input zip file?

@ngreenwald
Member

Yeah we should probably remove that line anyway. In the meantime you can copy the function contents into a cell and edit it:

def create_deepcell_output(deepcell_input_dir, deepcell_output_dir, fovs=None,

@awedwards
Contributor Author

I was mistaken (found a typo in my script)! It works at 1024x1024 resolution. So I guess the original problem was just that it was too big.

Is there any plan to allow queries of images that are bigger than 1024?

@ngreenwald
Member

What was the typo that led to an empty zip file being returned? Ideally we'd like to catch that type of thing earlier.

To clarify, you took a 2048x2048 image with 31 channels, picked 2 of them for the nuclear/segmentation channel, and that image failed with the timeout error?

@awedwards
Contributor Author

My image filename ended in "MassCorrected-Filtered-1.tiff", but my MIBItiff_suffix variable was set to "MassCorrected-Filtered.tiff", without the '-1'.

Yes, except we are using 2 nuclear channels and 3 membrane channels. And this works at 2048x2048 if we zip manually and upload through the browser.
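A suffix mismatch like this could in principle be caught before anything is zipped. The following is only a minimal sketch, not the actual create_deepcell_output code: the helper name `find_fov_tiffs` and its arguments are hypothetical, and it simply assumes that the input files are selected by filename suffix.

```python
from pathlib import Path


def find_fov_tiffs(tiff_dir, suffix):
    """Return the files in tiff_dir whose names end with suffix.

    Hypothetical helper: raises ValueError when nothing matches, so that an
    empty archive is never built and uploaded.
    """
    tiff_dir = Path(tiff_dir)
    files = sorted(p for p in tiff_dir.iterdir() if p.is_file())
    matches = [p for p in files if p.name.endswith(suffix)]
    if not matches:
        present = [p.name for p in files]
        raise ValueError(
            f"No files in {tiff_dir} end with {suffix!r}; found: {present}"
        )
    return matches
```

With a directory holding only "...MassCorrected-Filtered-1.tiff" files, calling this with the suffix "MassCorrected-Filtered.tiff" raises immediately and lists the filenames that are actually present.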

@ngreenwald
Member

Got it. The wrong formatting for input files will be addressed by #281.
Looks like the file size issue is a problem with the kiosk-client. What is the file size of the zip file that did work vs. the one that didn't?

@awedwards
Contributor Author

I realized I had never tried a single 2048x2048 image, and it actually worked through create_deepcell_output. So just to recap:

✓ Single 1024x1024 via both deepcell.org & create_deepcell_output method
✓ Single 2048x2048 via both
✓ 6 2048x2048 images via deepcell.org only (this works for both the manual zip file and the programmatically generated one)
X 6 2048x2048 images via create_deepcell_output

If I zip the 6 files manually (on Linux), the resulting zip file is 30.2 MB. The one generated by the Jupyter notebook is 201.3 MB.
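One possible explanation for a gap like this (an assumption; the thread does not confirm the cause) is compression mode: Python's zipfile module stores members uncompressed (ZIP_STORED) unless a compression method is passed, whereas the command-line zip tool deflates by default. A small sketch comparing the two modes on stand-in data:

```python
import io
import zipfile


def zipped_size(payload, compression):
    """Return the size in bytes of an in-memory zip holding payload as one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr("fov.tiff", payload)
    return buffer.getbuffer().nbytes


# Stand-in for image data with long runs of identical values (an assumption;
# real MIBI TIFFs will compress by a different ratio).
payload = bytes(1_000_000)

stored = zipped_size(payload, zipfile.ZIP_STORED)      # zipfile's default: no compression
deflated = zipped_size(payload, zipfile.ZIP_DEFLATED)  # deflate, as most zip tools use

print(f"stored: {stored} bytes, deflated: {deflated} bytes")
```

If the notebook-generated archive turns out to be roughly the sum of the raw TIFF sizes, that would be consistent with uncompressed storage, though other causes are possible.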

@ngreenwald
Member

Okay, so there's an issue with our zip file creation which is making comically large zip files. This is triggering a known bug in the kiosk-client, which times out on large uploads. We'll address the file size issue first. We know the limit is somewhere around 100 MB, so once we fix the zip issue, that should give us around 75 1024x1024 images before running into issues.
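Until the client is fixed, a size check before uploading would at least turn the silent time-out into an immediate error. This is a hedged sketch only: `check_upload_size` is a hypothetical helper, not part of the kiosk-client, and the 100 MB threshold is just the rough figure quoted above.

```python
import os

# Rough limit from the comment above ("somewhere around 100 MB");
# the exact threshold is not known.
APPROX_UPLOAD_LIMIT_MB = 100


def check_upload_size(zip_path, limit_mb=APPROX_UPLOAD_LIMIT_MB):
    """Return the archive size in MB (10**6 bytes); raise ValueError above limit_mb."""
    size_mb = os.path.getsize(zip_path) / 1e6
    if size_mb > limit_mb:
        raise ValueError(
            f"{zip_path} is {size_mb:.1f} MB, above the approximate "
            f"{limit_mb} MB upload limit; consider sending fewer images per zip"
        )
    return size_mb
```

With this check, the 201.3 MB archive reported above would be rejected before the upload starts, while the 30.2 MB one would pass.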

@awedwards
Contributor Author

Great! So, just going by # of pixels, that's about 20 2048x2048 images right? Or would it scale differently?
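Spelling out the pixel-count arithmetic, under the assumption that zip size scales linearly with the number of pixels (which the thread does not establish) and taking the rough figure of 75 images at 1024x1024 as the reference:

```python
def estimated_image_count(side_px, reference_count=75, reference_side_px=1024):
    """Scale the reference image count by the ratio of pixel areas."""
    area_ratio = (side_px / reference_side_px) ** 2
    return reference_count / area_ratio


# A 2048x2048 image has 4x the pixels of a 1024x1024 one.
print(estimated_image_count(2048))  # 18.75
```

So the linear estimate comes out slightly under 20 images at 2048x2048.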

@ngreenwald
Member

Normally I would say yes, but whatever is creating the massive zip files could be leading to creative, non-linear scaling as well. The newest version of the master branch no longer deletes the zip files, so you can always upload the zip manually to deepcell.org if the problem pops up again.
