timeout error when spike sorting with container #1666
Comments
@shunnnli does this error only appear if you sort by property?
No, I've tried run_sorter as well and it returns the same result.
Ok, can you make sure you followed the instructions here? Basically you need Docker Desktop installed, an account on Docker Hub, and to be logged in (for Windows).
Yes, as you can see from my previous error (issue #1662), I can enter the container normally.
Hi, I restarted my computer and it runs now! But I got the following error: Starting container
Hi @shunnnli This is progress!! I'll look into it, but I think I know where the source of the error is. In the meantime (to test if the container mechanism works), can you try to run spike sorting on one shank only?
Let me know if this works!
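For reference, a minimal sketch of what "one shank only" could look like, reusing `rec`, `base_folder`, and `params_kilosort3` from the snippet below and assuming `si` is `spikeinterface.full` and that the 'group' property holds the shank index (parameter names follow the spikeinterface version in this thread; newer releases rename `output_folder`):

```python
# Sketch: split the recording by the 'group' property (one entry per shank)
# and run the sorter on the first shank only, inside the Docker container.
import spikeinterface.full as si

rec_by_shank = rec.split_by("group")           # dict: group value -> sub-recording
first_group = list(rec_by_shank.keys())[0]
sorting_one_shank = si.run_sorter(
    "kilosort3",
    rec_by_shank[first_group],
    output_folder=base_folder / "kilosort3_one_shank",
    docker_image=True,
    verbose=True,
    **params_kilosort3,
)
```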
Hi! This works!! However, Kilosort has been running for 3 hours with no sign of finishing. I'm willing to stop and rerun for all shanks if you have any suggestions!
Let's let it finish! Still investigating!
Mmmm... it's not supposed to take that long! How long is the recording?
Can you also check if the GPU is being used?
The recording is about 40 minutes, so it shouldn't take that long. How can I check whether the GPU is being used on Windows?
You should have an NVIDIA application or something, but I'm not 100% sure.
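On Windows, the Task Manager's Performance tab also shows GPU utilization; from Python, a quick check like the following works if the NVIDIA driver's `nvidia-smi` tool is on the PATH (illustrative only):

```python
# Print current GPU utilization and memory use via nvidia-smi
# (assumes NVIDIA drivers are installed and nvidia-smi is on the PATH).
import subprocess

result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
print(result.stdout)
```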
Yeah, I have it on my computer.
So I also re-created the timeout error: I stopped the kernel, shut it down, and deactivated conda; then I reactivated conda and re-ran the code. Now running the sorter on a single shank shows me the timeout error as above.
Not really sure what's happening then... it might be due to a problem with the Docker installation itself. Are you able to try the analysis on a Linux machine?
I can try installing it on Linux. Is there any sample output for kilosort after printing 'Running kilosort2_5 sorter inside spikeinterface/kilosort2_5-compiled-base'? I just want to know which step the code got stuck on.
You should have a
Hi! I ran into the same thing and was able to fix it by updating the docker library to 6.1.2. See the relevant issue and PR here: docker/docker-py#3124
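For anyone else hitting this, the installed docker-py version can be checked from the same conda environment before deciding whether to pin it (6.1.2 is the version reported to work above):

```python
# Check the docker-py version used by spikeinterface; if it is older than 6.1.2,
# upgrading (e.g. with `pip install docker==6.1.2`) was the fix reported above.
import docker

print(docker.__version__)
```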
Unfortunately, we are having the same issue running Docker under Windows with larger datasets. The sorter runs indefinitely (although it uses large CPU and memory resources) and only creates chanMap.mat and recording.dat, but doesn't start creating temp_wh.dat. We are running docker 6.1.3, so I don't think updating fully resolves this problem (currently rerunning it with a downgraded 6.1.2 to make sure). Does Docker create a log file that we can use to identify where the code gets stuck?
I'm having the same issue, also running on Windows locally, for a ~90 GB concatenated Neuropixels 1.0 recording. I have docker 6.1.3. Has anyone found a solution or workaround?
Not sure if it's related, but I'm also running into this memory overflow error #2201 when concatenating, so I changed the dtype of the recordings from int16 to int64. I no longer get the warning, but the print statement is still telling me the recording is much smaller than the actual size:
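For context, the on-disk size of a raw recording is just samples × channels × bytes per sample, so changing the dtype changes the expected size (int64 is four times larger than int16) rather than fixing a mismatch. A rough back-of-the-envelope check, with purely illustrative numbers that are not taken from this recording:

```python
# Back-of-the-envelope size check (illustrative values, not the poster's recording).
n_channels = 384                 # Neuropixels 1.0
sampling_frequency = 30_000      # Hz
duration_s = 2 * 3600            # e.g. 2 hours of concatenated data
bytes_per_sample = 2             # int16; int64 would be 8 bytes and ~4x the size
expected_gb = duration_s * sampling_frequency * n_channels * bytes_per_sample / 1e9
print(f"expected binary size: {expected_gb:.1f} GB")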
I'm experiencing a similar issue where Docker has been running for more than 24 hours with no progress. It only created two files, chanMap.mat and recording.dat, and then appeared to be idle without any error messages. I've checked my GPU and CPU, but they're also idle. The recording size is around 136 GB (Neuropixels 1.0), and I've been able to sort it using a locally installed KS3. However, Docker's KS3 seems to run indefinitely, and I would love to set up my workflow on Docker. Do you have any suggestions? Thanks!
Hey all, I'm going to close this issue for lack of activity, but my hypothesis for people working on this in Docker is that the actual writing of the binary file is going slow because of
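If the slow step really is copying the recording to the container's binary format, raising the number of writer jobs is worth trying. A sketch, assuming a spikeinterface version that exposes `set_global_job_kwargs` (adjust `n_jobs` to your CPU):

```python
# Sketch: increase parallelism for the binary-copy step before calling run_sorter
# (assumes a spikeinterface version providing set_global_job_kwargs).
import spikeinterface.full as si

si.set_global_job_kwargs(n_jobs=8, chunk_duration="1s")
```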
Hi,
I'm recording with a Neuropixels 2.0 probe and running Kilosort for each shank in a Docker image. My code is as follows:
sorting = si.run_sorter_by_property('kilosort3', rec, grouping_property='group', working_folder=base_folder / 'kilosort3_output', mode_if_folder_exists="overwrite", docker_image=True, verbose=True, **params_kilosort3)
However, I got a timeout error that says the connection was aborted. Attached is the detailed error output. Thanks in advance!!
Starting container
TimeoutError Traceback (most recent call last)
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,
705 method,
706 url,
707 timeout=timeout_obj,
708 body=body,
709 headers=headers,
710 chunked=chunked,
711 )
713 # If we're going to release the connection in ``finally:``, then
714 # the response doesn't need to know about the connection. Otherwise
715 # it will also try to release it and we'll have a double-release
716 # mess.
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:449, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
445 except BaseException as e:
446 # Remove the TypeError from the exception chain in
447 # Python 3 (including for exceptions like SystemExit).
448 # Otherwise it looks like a bug in the code.
--> 449 six.raise_from(e, None)
450 except (SocketTimeout, BaseSSLError, SocketError) as e:
File <string>:3, in raise_from(value, from_value)
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:444, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
443 try:
--> 444 httplib_response = conn.getresponse()
445 except BaseException as e:
446 # Remove the TypeError from the exception chain in
447 # Python 3 (including for exceptions like SystemExit).
448 # Otherwise it looks like a bug in the code.
File ~.conda\envs\spikeinterface\lib\http\client.py:1377, in HTTPConnection.getresponse(self)
1376 try:
-> 1377 response.begin()
1378 except ConnectionError:
File ~.conda\envs\spikeinterface\lib\http\client.py:320, in HTTPResponse.begin(self)
319 while True:
--> 320 version, status, reason = self._read_status()
321 if status != CONTINUE:
File ~.conda\envs\spikeinterface\lib\http\client.py:281, in HTTPResponse._read_status(self)
280 def _read_status(self):
--> 281 line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
282 if len(line) > _MAXLINE:
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:224, in NpipeFileIOBase.readinto(self, buf)
223 def readinto(self, buf):
--> 224 return self.sock.recv_into(buf)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:25, in check_closed.<locals>.wrapped(self, *args, **kwargs)
22 raise RuntimeError(
23 'Can not reuse socket after connection was closed.'
24 )
---> 25 return f(self, *args, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:151, in NpipeSocket.recv_into(self, buf, nbytes)
150 win32file.CancelIo(self._handle)
--> 151 raise TimeoutError
152 return win32file.GetOverlappedResult(self._handle, overlapped, 0)
TimeoutError:
During handling of the above exception, another exception occurred:
ProtocolError Traceback (most recent call last)
File ~.conda\envs\spikeinterface\lib\site-packages\requests\adapters.py:487, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
486 try:
--> 487 resp = conn.urlopen(
488 method=request.method,
489 url=url,
490 body=request.body,
491 headers=request.headers,
492 redirect=False,
493 assert_same_host=False,
494 preload_content=False,
495 decode_content=False,
496 retries=self.max_retries,
497 timeout=timeout,
498 chunked=chunked,
499 )
501 except (ProtocolError, OSError) as err:
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:787, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
785 e = ProtocolError("Connection aborted.", e)
--> 787 retries = retries.increment(
788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
789 )
790 retries.sleep()
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\util\retry.py:550, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
549 if read is False or not self._is_method_retryable(method):
--> 550 raise six.reraise(type(error), error, _stacktrace)
551 elif read is not None:
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\packages\six.py:769, in reraise(tp, value, tb)
768 if value.traceback is not tb:
--> 769 raise value.with_traceback(tb)
770 raise value
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,
705 method,
706 url,
707 timeout=timeout_obj,
708 body=body,
709 headers=headers,
710 chunked=chunked,
711 )
713 # If we're going to release the connection in ``finally:``, then
714 # the response doesn't need to know about the connection. Otherwise
715 # it will also try to release it and we'll have a double-release
716 # mess.
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:449, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
445 except BaseException as e:
446 # Remove the TypeError from the exception chain in
447 # Python 3 (including for exceptions like SystemExit).
448 # Otherwise it looks like a bug in the code.
--> 449 six.raise_from(e, None)
450 except (SocketTimeout, BaseSSLError, SocketError) as e:
File <string>:3, in raise_from(value, from_value)
File ~.conda\envs\spikeinterface\lib\site-packages\urllib3\connectionpool.py:444, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
443 try:
--> 444 httplib_response = conn.getresponse()
445 except BaseException as e:
446 # Remove the TypeError from the exception chain in
447 # Python 3 (including for exceptions like SystemExit).
448 # Otherwise it looks like a bug in the code.
File ~.conda\envs\spikeinterface\lib\http\client.py:1377, in HTTPConnection.getresponse(self)
1376 try:
-> 1377 response.begin()
1378 except ConnectionError:
File ~.conda\envs\spikeinterface\lib\http\client.py:320, in HTTPResponse.begin(self)
319 while True:
--> 320 version, status, reason = self._read_status()
321 if status != CONTINUE:
File ~.conda\envs\spikeinterface\lib\http\client.py:281, in HTTPResponse._read_status(self)
280 def _read_status(self):
--> 281 line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
282 if len(line) > _MAXLINE:
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:224, in NpipeFileIOBase.readinto(self, buf)
223 def readinto(self, buf):
--> 224 return self.sock.recv_into(buf)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:25, in check_closed.<locals>.wrapped(self, *args, **kwargs)
22 raise RuntimeError(
23 'Can not reuse socket after connection was closed.'
24 )
---> 25 return f(self, *args, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\transport\npipesocket.py:151, in NpipeSocket.recv_into(self, buf, nbytes)
150 win32file.CancelIo(self._handle)
--> 151 raise TimeoutError
152 return win32file.GetOverlappedResult(self._handle, overlapped, 0)
ProtocolError: ('Connection aborted.', TimeoutError())
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
Cell In[12], line 5
2 params_kilosort3 = {'do_correction': True}
4 # Sorting by groups
----> 5 sorting = si.run_sorter_by_property('kilosort3', rec,
6 grouping_property='group',
7 working_folder= base_folder/'kilosort3_output',
8 mode_if_folder_exists="overwrite",
9 docker_image=True, verbose=True, **params_kilosort3)
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\launcher.py:119, in run_sorter_by_property(sorter_name, recording, grouping_property, working_folder, mode_if_folder_exists, engine, engine_kwargs, verbose, docker_image, singularity_image, **sorter_params)
116 assert grouping_property in recording.get_property_keys(), f"The 'grouping_property' {grouping_property} is not "
117 f"a recording property!"
118 recording_dict = recording.split_by(grouping_property)
--> 119 sorting_output = run_sorters([sorter_name], recording_dict, working_folder,
120 mode_if_folder_exists=mode_if_folder_exists,
121 engine=engine,
122 engine_kwargs=engine_kwargs,
123 verbose=verbose,
124 with_output=True,
125 docker_images={sorter_name: docker_image},
126 singularity_images={sorter_name: singularity_image},
127 sorter_params={sorter_name: sorter_params})
129 grouping_property_values = None
130 sorting_list = []
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\launcher.py:269, in run_sorters(sorter_list, recording_dict_or_list, working_folder, sorter_params, mode_if_folder_exists, engine, engine_kwargs, verbose, with_output, docker_images, singularity_images)
266 if engine == 'loop':
267 # simple loop in main process
268 for task_args in task_args_list:
--> 269 _run_one(task_args)
271 elif engine == 'joblib':
272 from joblib import Parallel, delayed
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\launcher.py:38, in _run_one(arg_list)
35 # because we won't want the loop/worker to break
36 raise_error = False
---> 38 run_sorter(sorter_name, recording, output_folder=output_folder,
39 remove_existing_folder=remove_existing_folder,
40 delete_output_folder=delete_output_folder,
41 verbose=verbose, raise_error=raise_error,
42 docker_image=docker_image, singularity_image=singularity_image,
43 with_output=with_output, **sorter_params)
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\runsorter.py:137, in run_sorter(sorter_name, recording, output_folder, remove_existing_folder, delete_output_folder, verbose, raise_error, docker_image, singularity_image, with_output, **sorter_params)
135 else:
136 container_image = singularity_image
--> 137 return run_sorter_container(
138 container_image=container_image,
139 mode=mode,
140 **common_kwargs,
141 )
143 return run_sorter_local(**common_kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\runsorter.py:478, in run_sorter_container(sorter_name, recording, mode, container_image, output_folder, remove_existing_folder, delete_output_folder, verbose, raise_error, with_output, extra_requirements, sorter_params)
476 if verbose:
477 print('Starting container')
--> 478 container_client.start()
480 if verbose and install_si_from_source:
481 print("****")
File ~.conda\envs\spikeinterface\lib\site-packages\spikeinterface\sorters\runsorter.py:305, in ContainerClient.start(self)
303 def start(self):
304 if self.mode == 'docker':
--> 305 self.docker_container.start()
306 elif self.mode == 'singularity':
307 self.client_instance.start()
File ~.conda\envs\spikeinterface\lib\site-packages\docker\models\containers.py:406, in Container.start(self, **kwargs)
397 def start(self, **kwargs):
398 """
399 Start this container. Similar to the ``docker start`` command, but
400 doesn't support attach options.
(...)
404 If the server returns an error.
405 """
--> 406 return self.client.api.start(self.id, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\utils\decorators.py:19, in check_resource.<locals>.decorator.<locals>.wrapped(self, resource_id, *args, **kwargs)
15 if not resource_id:
16 raise errors.NullResource(
17 'Resource ID was not provided'
18 )
---> 19 return f(self, resource_id, *args, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\api\container.py:1126, in ContainerApiMixin.start(self, container, *args, **kwargs)
1120 raise errors.DeprecatedMethod(
1121 'Providing configuration in the start() method is no longer '
1122 'supported. Use the host_config param in create_container '
1123 'instead.'
1124 )
1125 url = self._url("/containers/{0}/start", container)
-> 1126 res = self._post(url)
1127 self._raise_for_status(res)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\utils\decorators.py:46, in update_headers.<locals>.inner(self, *args, **kwargs)
44 else:
45 kwargs['headers'].update(self._general_configs['HttpHeaders'])
---> 46 return f(self, *args, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\docker\api\client.py:233, in APIClient._post(self, url, **kwargs)
231 @update_headers
232 def _post(self, url, **kwargs):
--> 233 return self.post(url, **self._set_request_timeout(kwargs))
File ~.conda\envs\spikeinterface\lib\site-packages\requests\sessions.py:635, in Session.post(self, url, data, json, **kwargs)
624 def post(self, url, data=None, json=None, **kwargs):
625 r"""Sends a POST request. Returns :class:
Response
object.626
627 :param url: URL for the new :class:
Request
object.(...)
632 :rtype: requests.Response
633 """
--> 635 return self.request("POST", url, data=data, json=json, **kwargs)
File ~.conda\envs\spikeinterface\lib\site-packages\requests\sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
582 send_kwargs = {
583 "timeout": timeout,
584 "allow_redirects": allow_redirects,
585 }
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
589 return resp
File ~.conda\envs\spikeinterface\lib\site-packages\requests\sessions.py:701, in Session.send(self, request, **kwargs)
698 start = preferred_clock()
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
703 # Total elapsed time of the request (approximately)
704 elapsed = preferred_clock() - start
File ~.conda\envs\spikeinterface\lib\site-packages\requests\adapters.py:502, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
487 resp = conn.urlopen(
488 method=request.method,
489 url=url,
(...)
498 chunked=chunked,
499 )
501 except (ProtocolError, OSError) as err:
--> 502 raise ConnectionError(err, request=request)
504 except MaxRetryError as e:
505 if isinstance(e.reason, ConnectTimeoutError):
506 # TODO: Remove this in 3.0.0: see #2811
ConnectionError: ('Connection aborted.', TimeoutError())