Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 bugfix fails to copy studies with lots of data #2542

Merged
merged 8 commits into from
Sep 16, 2021

Conversation

pcrespov
Copy link
Member

@pcrespov pcrespov commented Sep 16, 2021

What do these changes do?

Analyzed logs of webserver, storage and datcoreadapter services. Found and fix the folling issues:

webserver.storage_api call times out [webserver]

  • Increases total timeout from 20 secs to 5 mins so it takes time to copy
ERROR: aiohttp.server:log_exception(393) - Error handling request
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 72, in _middleware_handler
    resp = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp_session/__init__.py", line 154, in factory
    response = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 67, in _middleware_handler
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 171, in _middleware_handler
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/products.py", line 98, in discover_product_middleware
    response = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 53, in metrics_handler
    result = await request.loop.run_in_executor(
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
    resp = await self._request_handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_app.py", line 499, in _handle
    resp = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_middlewares.py", line 119, in impl
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 112, in _middleware_handler
    app_name, request.method, request.path, resp.status, exc_name
UnboundLocalError: local variable 'resp' referenced before assignment

🐛 Uncaught timeout exception [storage, datcore_adapter]

  • asyncio.TimeoutEror is not the same as built-in TimeoutError
2021-09-14T22:14:20.678026536Z ERROR:servicelib.rest_middlewares:Unexpected server error "<class 'asyncio.exceptions.TimeoutError'>" from access: 10.11.0.65 "GET /v0/locations". Responding with status 500
2021-09-14T22:14:20.678082623Z Traceback (most recent call last):
2021-09-14T22:14:20.678090699Z   File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 71, in _middleware_handler
2021-09-14T22:14:20.678096383Z     response = await handler(request)
2021-09-14T22:14:20.678101270Z   File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 173, in _middleware_handler
2021-09-14T22:14:20.678106510Z     resp = await handler(request)
2021-09-14T22:14:20.678111943Z   File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_storage/handlers.py", line 88, in get_storage_locations
2021-09-14T22:14:20.678117341Z     locs = await dsm.locations(user_id)
2021-09-14T22:14:20.678122041Z   File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_storage/dsm.py", line 178, in locations
2021-09-14T22:14:20.678126944Z     if await datcore_adapter.check_user_can_connect(
2021-09-14T22:14:20.678131916Z   File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_storage/datcore_adapter/datcore_adapter.py", line 114, in check_user_can_connect
2021-09-14T22:14:20.678136952Z     await _request(app, api_key, api_secret, "GET", "/user/profile")
2021-09-14T22:14:20.678141983Z   File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_storage/datcore_adapter/datcore_adapter.py", line 47, in _request
2021-09-14T22:14:20.678147319Z     async with session.request(
2021-09-14T22:14:20.678152001Z   File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/client.py", line 1117, in __aenter__
2021-09-14T22:14:20.678157021Z     self._resp = await self._coro
2021-09-14T22:14:20.678161660Z   File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/client.py", line 544, in _request
2021-09-14T22:14:20.678166751Z     await resp.start(conn)
2021-09-14T22:14:20.678171923Z   File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 905, in start
2021-09-14T22:14:20.678176933Z     self._continue = None
2021-09-14T22:14:20.678181458Z   File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/helpers.py", line 656, in __exit__
2021-09-14T22:14:20.678186230Z     raise asyncio.TimeoutError from None
2021-09-14T22:14:20.678190862Z asyncio.exceptions.TimeoutError
2021-09-14T22:14:20.678195598Z Stack (most recent call last):
  • Timeout is produced when storage checks for datcore credentials. Speedup by avoiding check if no tokens available. Can read a lot of
INFO:     10.0.44.4:54464 - "GET /v0/user/profile HTTP/1.1" 401 Unauthorized
INFO:     10.0.44.4:54466 - "GET /v0/user/profile HTTP/1.1" 401 Unauthorized

🐛 Fixes UnboundLocalError reraised while handling CancelledError [webserver]

ERROR: aiohttp.server:log_exception(393) - Error handling request
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 72, in _middleware_handler
    resp = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp_session/__init__.py", line 154, in factory
    response = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 67, in _middleware_handler
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/servicelib/rest_middlewares.py", line 171, in _middleware_handler
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/products.py", line 98, in discover_product_middleware
    response = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 53, in metrics_handler
    result = await request.loop.run_in_executor(
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
    resp = await self._request_handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_app.py", line 499, in _handle
    resp = await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiohttp/web_middlewares.py", line 119, in impl
    return await handler(request)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_webserver/diagnostics_monitoring.py", line 112, in _middleware_handler
    app_name, request.method, request.path, resp.status, exc_name
UnboundLocalError: local variable 'resp' referenced before assignment

future exception un retrieved [storage]

ERROR:asyncio:Future exception was never retrieved
future: <Future finished exception=UniqueViolation('duplicate key value violates unique constraint "file_meta_data_pkey"\nDETAIL:  Key (file_uuid)=(c892d04c-1241-11ec-93d1-02420a0b1572/59333091-4d54-54ff-92be-bfe7139e0ee2/TheNumber.txt) already exists.\n')>
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.8/site-packages/tenacity/_asyncio.py", line 41, in __call__
    result = await fn(*args, **kwargs)
  File "/home/scu/.venv/lib/python3.8/site-packages/simcore_service_storage/dsm.py", line 505, in _init_metadata
    await conn.execute(ins)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiopg/sa/connection.py", line 168, in _execute
    await cursor.execute(str(compiled), post_processed_params[0])
  File "/home/scu/.venv/lib/python3.8/site-packages/aiopg/connection.py", line 426, in execute
    await self._conn._poll(waiter, timeout)
  File "/home/scu/.venv/lib/python3.8/site-packages/aiopg/connection.py", line 881, in _poll
    await asyncio.wait_for(self._waiter, timeout)
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/scu/.venv/lib/python3.8/site-packages/aiopg/connection.py", line 788, in _ready
    state = self._conn.poll()
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "file_meta_data_pkey"
DETAIL:  Key (file_uuid)=(c892d04c-1241-11ec-93d1-02420a0b1572/59333091-4d54-54ff-92be-bfe7139e0ee2/TheNumber.txt) already exists.

Related issue/s

@codecov
Copy link

codecov bot commented Sep 16, 2021

Codecov Report

Merging #2542 (f3fd394) into master (4680e0b) will decrease coverage by 0.0%.
The diff coverage is 62.5%.

❗ Current head f3fd394 differs from pull request most recent head 560616b. Consider uploading reports for the commit 560616b to get more accurate results
Impacted file tree graph

@@           Coverage Diff            @@
##           master   #2542     +/-   ##
========================================
- Coverage    77.0%   76.9%   -0.1%     
========================================
  Files         611     611             
  Lines       23445   23455     +10     
  Branches     2297    2300      +3     
========================================
- Hits        18055   18041     -14     
- Misses       4782    4805     +23     
- Partials      608     609      +1     
Flag Coverage Δ
integrationtests 66.1% <30.0%> (+<0.1%) ⬆️
unittests 71.2% <62.5%> (-0.2%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...erver/src/simcore_service_webserver/storage_api.py 48.0% <0.0%> (-1.0%) ⬇️
...service_storage/datcore_adapter/datcore_adapter.py 32.0% <33.3%> (-15.3%) ⬇️
...imcore_service_webserver/diagnostics_monitoring.py 79.4% <55.5%> (-9.3%) ⬇️
...ervices/storage/src/simcore_service_storage/dsm.py 75.4% <60.0%> (-0.3%) ⬇️
...src/simcore_service_storage/s3wrapper/s3_client.py 68.2% <85.7%> (-0.7%) ⬇️
...rage/datcore_adapter/datcore_adapter_exceptions.py 66.6% <0.0%> (-22.3%) ⬇️
...torage/datcore_adapter/datcore_adapter_settings.py 91.6% <0.0%> (-8.4%) ⬇️
...ector_v2/modules/comp_scheduler/background_task.py 92.1% <0.0%> (-2.0%) ⬇️
... and 5 more

@pcrespov pcrespov requested review from sanderegg, GitHK and mguidon and removed request for sanderegg and GitHK September 16, 2021 10:18
@pcrespov pcrespov self-assigned this Sep 16, 2021
@pcrespov pcrespov added this to the Capra delle nevi milestone Sep 16, 2021
@pcrespov pcrespov requested a review from KZzizzle September 16, 2021 10:19
@pcrespov pcrespov added the bug buggy, it does not work as expected label Sep 16, 2021
@pcrespov pcrespov requested a review from odeimaiz September 16, 2021 10:22
@pcrespov pcrespov changed the title 🐛 bugfix ITISFoundation/osparc-issues#542: large data copy 🐛 bugfix fails to copy studies with lots of data Sep 16, 2021
Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Please consider the below request.

Copy link
Contributor

@KZzizzle KZzizzle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fingers crossed! Thank you!

@pcrespov pcrespov merged commit 9addb14 into ITISFoundation:master Sep 16, 2021
@pcrespov pcrespov deleted the fix542/large-data-copy branch September 16, 2021 13:38
pcrespov added a commit that referenced this pull request Sep 16, 2021
Fixes for "Cannot create template studies with large data" ITISFoundation/osparc-issues#542

* Fix: use asyncio.TimeoutError and not built-in TimeoutError
* avoid auth if no tokens
* fixes unbound local error during exception handling of CancelledError
* fixes linter errors and cleanup
* increasing timeout
* Fixes unhandled exception on UniqueViolation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug buggy, it does not work as expected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants