14 unit tests fail with "ConnectionError" (to Aquarius?) #245

Closed
trentmc opened this issue Mar 20, 2021 · 8 comments
Labels
Priority: High Type: Bug Something isn't working

Comments

@trentmc
Member

trentmc commented Mar 20, 2021

Describe the bug
14 unit tests fail due to requests.exceptions.ConnectionError.

This is probably an issue with Aquarius, and maybe Provider; I'm putting it in ocean.py because this is where I discovered it and how to reproduce the bug.

To Reproduce
Steps to reproduce the behavior:

  1. Go to developers.md
  2. Go through all the steps, up to and including running: pytest
  3. pytest will return 14 failed

Expected behavior
All unit tests pass.

Logs

================================================= short test summary info ==================================================
FAILED ocean_lib/assets/test/test_asset_downloader.py::test_ocean_assets_download_failure - requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
FAILED ocean_lib/assets/test/test_asset_downloader.py::test_ocean_assets_download_indexes - requests.exceptions.Connectio...
FAILED ocean_lib/assets/test/test_asset_downloader.py::test_ocean_assets_download_destination_file - requests.exceptions....
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_register_asset - requests.exceptions.ConnectionError: ('Connection...
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_ocean_assets_search - requests.exceptions.ConnectionError: ('Conne...
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_ocean_assets_validate - requests.exceptions.ConnectionError: ('Con...
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_ocean_assets_algorithm - requests.exceptions.ConnectionError: ('Co...
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_ocean_assets_compute - requests.exceptions.ConnectionError: ('Conn...
FAILED ocean_lib/ocean/test/test_ocean_assets.py::test_download_fails - requests.exceptions.ConnectionError: ('Connection...
FAILED tests/integration/test_compute_flow.py::test_compute_raw_algo - requests.exceptions.ConnectionError: ('Connection ...
FAILED tests/integration/test_compute_flow.py::test_compute_multi_inputs - requests.exceptions.ConnectionError: ('Connect...
FAILED tests/integration/test_market_flow.py::test_market_flow[implicit_none] - requests.exceptions.ConnectionError: ('Co...
FAILED tests/integration/test_market_flow.py::test_market_flow[explicit_none] - requests.exceptions.ConnectionError: ('Co...
FAILED tests/integration/test_market_flow.py::test_payer_market_flow - requests.exceptions.ConnectionError: ('Connection ...
================================== 14 failed, 108 passed, 3 skipped in 134.56s (0:02:14) ===================================

Here's a snippet from the log of the first failing unit test.

ocean_lib/ocean/ocean_assets.py:234: in create
    if did in self._get_aquarius().list_assets():
venv/lib/python3.8/site-packages/ocean_utils/aquarius/aquarius.py:65: in list_assets
    response = self.requests_session.get(self._base_url).content
venv/lib/python3.8/site-packages/requests/sessions.py:555: in get
    return self.request('GET', url, **kwargs)
venv/lib/python3.8/site-packages/requests/sessions.py:542: in request
    resp = self.send(prep, **send_kwargs)
venv/lib/python3.8/site-packages/requests/sessions.py:655: in send
    r = adapter.send(request, **kwargs)

Full log in my console (to the extent that my console captured it):
log.txt

Note: the barge console logs didn't have any WARNING or ERROR messages.
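
The traceback above fails on a plain GET to the Aquarius base URL, so one cheap sanity check before running pytest is to confirm that the service answers at all. Below is a minimal sketch using only the standard library. `service_is_reachable` is a hypothetical helper, not part of ocean.py, and the URL in the usage note is an assumption about a local barge setup.

```python
import urllib.error
import urllib.request


def service_is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers an HTTP GET without a server error.

    `opener` is injectable so the check can be exercised without a network.
    (Hypothetical helper for illustration; not part of ocean.py.)
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code < 500
    except OSError:
        # Covers URLError, timeouts, and resets such as
        # ConnectionResetError(104, 'Connection reset by peer').
        return False
```

For example, `service_is_reachable("http://localhost:5000")` could be called before the test run. Port 5000 is an assumption here; use whatever URL your config points Aquarius at.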

High priority because it makes unit tests fail.

Maybe related, just reported today too:

Maybe related, reported >1 week ago, issue is closed:

@trentmc trentmc added Type: Bug Something isn't working Priority: High labels Mar 20, 2021
@trentmc trentmc changed the title 14 unit tests fail with "ConnectionError" 14 unit tests fail with "ConnectionError (to Aquarius?) Mar 20, 2021
@calina-c
Contributor

My leading theory here is memory issues + Docker limits. Investigating.

@calina-c
Contributor

calina-c commented Mar 25, 2021

I remember getting these at one point with tox. The solution was to prune and restart the volumes, which worked. It's possible some issues crept in between v2.2.4 and v2.2.6, by way of v2.2.5, which supported ES queries but still had some wrong assertions that caused failures.

My local system cannot run provider2 (Docker has memory leaks on Mac, and that's apparently the straw that breaks the camel's back), so I always get one failure from compute flow. I've run the tests a few times now, and the error seems to reproduce randomly, and only for one test (market flow).

Clues:

  • with a fresh barge install, I get a failed market_flow test only on the 3rd run of the full tests
  • running only the market flow test, independently, does not fail

Since this takes a lot of precious time (11 minutes per full test run, and it only reproduces in some full runs), I'll give it another 2-3 runs. But if I can't reproduce it, I'll continue working with ocean.py as usual, and see if it pops up again.
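
Because the failure only shows up on some full runs, repeating the suite unattended and recording which runs fail can save some of that time. The following is a rough sketch, not something taken from the ocean.py repo: `run_pytest_once` and `failing_runs` are hypothetical helper names, and the only pytest behavior relied on is that its exit code is 0 when all tests pass.

```python
import subprocess
import sys


def run_pytest_once(pytest_args=()):
    """Run pytest in a subprocess and return its exit code (0 means all passed)."""
    return subprocess.run([sys.executable, "-m", "pytest", *pytest_args]).returncode


def failing_runs(n_runs, run_once=run_pytest_once):
    """Call `run_once` n_runs times; return the 1-based indexes of failing runs."""
    failed = []
    for run_index in range(1, n_runs + 1):
        if run_once() != 0:
            failed.append(run_index)
    return failed
```

Calling `failing_runs(3)` from the repo root would run the full suite three times and return, say, `[3]` if only the third run failed. Passing a different `run_once` makes it possible to target a single test file instead.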

@calina-c
Contributor

calina-c commented Mar 25, 2021

Finally found the most likely issue. I traced the market_flow error I got back to this error in barge:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Read timed out.

I checked usages of s3 in ocean.py and they coincide with the usage of the metadata() fixture, market_flow and compute_flow (a bigger file for the last cases). So even though I can't fully reproduce it, I've reached the following most likely conclusion:

It happens when my Internet connection is slow or S3 is slow, and I can't connect to the sample files on Amazon quickly enough (there's a max timeout set for connections). I tried reproducing it by turning off WiFi during testing, but that doesn't work, since the data provider first checks the connection itself. The issue is not with NO connection, but with a SLOW connection, and depends both on my Internet speed and on S3 itself. I don't know how I can simulate that. And since it only happens in rare cases, I don't know how much time I should keep spending on this one.
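
One possible way to approximate a slow host without touching the real network is a local HTTP server that delays every response, paired with a client that uses a short timeout. The sketch below uses only the standard library and is an assumption-laden illustration: it does not go through barge or the provider, and `SlowHandler`, `start_slow_server`, and `fetch_times_out` are hypothetical names. It only shows that a slow (not absent) server produces a read timeout on the client side.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET only after sleeping, to mimic a slow file host."""

    delay_seconds = 1.0  # overridden per server in start_slow_server

    def do_GET(self):
        time.sleep(self.delay_seconds)
        body = b"sample file contents"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, format, *args):
        pass  # keep console output quiet


def start_slow_server(delay_seconds):
    """Start the slow server on a free localhost port in a background thread."""
    handler = type("Handler", (SlowHandler,), {"delay_seconds": delay_seconds})
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_times_out(url, timeout):
    """Return True if reading `url` hits the timeout, False if it completes."""
    # Empty ProxyHandler: talk to localhost directly, ignoring proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
        return False
    except socket.timeout:
        return True
    except urllib.error.URLError as err:
        if isinstance(err.reason, socket.timeout):
            return True
        raise  # some other failure, e.g. connection refused
```

To use it, start the server with a delay longer than the client timeout, then build the URL from `server.server_address[1]` and pass it to `fetch_times_out`. Pointing a test's sample file URL at such a server would be a separate step that depends on how the fixtures are configured.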

@trentmc
Member Author

trentmc commented Mar 25, 2021

When I saw the error I did not have internet issues.

It happened repeatedly.

I wish I could say "sure ignore" but right now I don't think we can.

@calina-c
Contributor

> When I saw the error I did not have internet issues.
>
> It happened repeatedly.

Could those have been issues with S3? Even if your Internet was fine, if S3 was slow you'd get the same failures. If it happened repeatedly and then stopped, we might be able to check whether S3 was slow that day and/or something was fixed.

@trentmc
Member Author

trentmc commented Mar 26, 2021

[Trent] When I saw the error I did not have internet issues.

[Calina] I remember getting these at one point with tox. And the solution was to prune and restart the volumes, which worked

It is possible that I did not clean my volumes. I thought I did, but you never know.

So I just did everything fresh again.

It worked. :) That is, the errors reported above went away.

So, I'm fine if you close the ticket. We can reopen it if the issue re-emerges. (Or go ahead and keep chasing leads if you have other things you want to investigate).

@calina-c
Contributor

Let's reopen if it arises again. I'll keep in mind the S3 issue if pruning is just a coincidence.

@trentmc
Member Author

trentmc commented Mar 26, 2021

Note: not all tests passed. There were two errors that I hadn't seen before. I reported them in #255, since the scope looks different from this ticket, though they do appear related to Aquarius too.

2 participants