Connection pool is full, discarding connection | 'Connection aborted.', RemoteDisconnected('Remote end closed connection without response #1263

Closed · abinjaik opened this issue Feb 20, 2020 · 20 comments
abinjaik commented Feb 20, 2020

I am new to the Locust load testing framework and in the process of migrating our existing Azure-cloud-based C# performance testing scripts to Locust's Python-based scripts. Our team has almost completed the migration. But during our load tests we get the errors below, and the machine fails to create new requests due to high CPU utilization or because of the many exceptions in Locust. We are running Locust in web-based mode; details are indicated below. The scripts work fine at smaller loads of 50 to 100 users. The issue happens only when we run tests at higher loads, from 500 up to 3500 users.

"Error 1 -('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))"

"Error 2 : Connection pool is full, discarding connection"

Environment

Our load testing configuration is: 3500 users at a hatch rate of 5 users per second, running natively (no Docker container) on an 8-core, 16 GB Linux Ubuntu virtual machine on Azure. ulimit is set to 50,000 on the Linux machine.

Please help us with your thoughts.

### Sample code

```python
import os
import sys
sys.path.append(os.environ.get('WORKDIR', os.getcwd()))

from locust import HttpLocust, TaskSet, task
from locust.wait_time import between

# AppUtil is a local helper class (shown in a comment further down
# this thread); the module name here is assumed
from app_util import AppUtil


class ContactUsBehavior(TaskSet):

    wait_time = AppUtil.get_wait_time_function(2)

    @task(1)
    def post_load_test_contact(self):
        data = {
            "ContactName": "Mane",
            "Email": "[email protected]",
            "EmailVerifaction": "[email protected]",
            "TelephoneContact": "",
            "PhoneNumber": "",
            "ContactReason": "Other",
            "OtherComment": "TEST Comments 2019-12-30",
            "Agree": "true",
        }
        self.client.post("app/contactform", data=data, name='Contact us submission')


class UnauthenticatedUser(HttpLocust):
    task_set = ContactUsBehavior
    # host is override-able
    host = 'https://app.devurl.com/'
```

  • OS: Linux Ubuntu (actually a VM on Azure) - Linux 5.0.0-1032-azure x86_64
  • Python version: Python 3.6.9
  • Locust version: locust 0.14.4
  • Locust command line that you ran: ~/LocustPythonScripts$ locust -f LocustTestCases.py -H https://appURL/ GeneralUser
  • Locust file contents (anonymized if necessary): see above sample file
abinjaik added the bug label Feb 20, 2020
cyberw (Collaborator) commented Feb 20, 2020

Hi!

What is AppUtil?

What kind of throughput are you getting? You'll need to run multiple load generation processes for really high throughputs (Python is limited to one core). See the documentation about distributed runs.

Are there warnings in the log about high cpu usage?

abinjaik (Author) commented Feb 20, 2020

> Hi! What is AppUtil? What kind of throughput are you getting? You'll need to run multiple load generation processes for really high throughputs. See the documentation about distributed runs. Are there warnings in the log about high cpu usage?

It's just a utility class which gets the wait time from global configuration.

```python
import random

class AppUtil:

    @classmethod
    def get_wait_time_function(cls, taskset_avg_time):
        """
        Return a wait_time function: a constant one based on the global
        configuration (int) if it exists, or a randomized one around
        taskset_avg_time otherwise.

        See https://github.com/locustio/locust/blob/master/locust/wait_time.py
        on why the method is defined this way.
        """
        value = Configuration.get_wait_time()  # Configuration: local config helper (not shown)
        if value is not None:
            return lambda instance: value
        else:
            return lambda instance: (taskset_avg_time - 1) + (random.random() * 2)
```

Did you mean running Locust in a master-slave configuration? https://docs.locust.io/en/stable/running-locust-distributed.html

cyberw (Collaborator) commented Feb 20, 2020

Ok!

Yes :)

cyberw (Collaborator) commented Feb 20, 2020

.. but distributed shouldn't be necessary for such a simple test plan unless you're running at least some hundreds of requests/s. You can also try FastHttpLocust: https://docs.locust.io/en/stable/increase-performance.html
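For illustration, FastHttpLocust is a near drop-in swap for the sample above; a minimal sketch, assuming Locust 0.14 (payload abbreviated):

```python
# sketch: the HttpLocust sample above, swapped to FastHttpLocust
from locust import TaskSet, task
from locust.contrib.fasthttp import FastHttpLocust

class ContactUsBehavior(TaskSet):
    @task(1)
    def post_load_test_contact(self):
        # abbreviated form payload from the original sample
        data = {"ContactName": "Mane", "ContactReason": "Other", "Agree": "true"}
        self.client.post("app/contactform", data=data, name='Contact us submission')

class UnauthenticatedUser(FastHttpLocust):
    task_set = ContactUsBehavior
    host = 'https://app.devurl.com/'
```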

abinjaik (Author) commented:

> .. but distributed shouldn't be necessary for such a simple test plan unless you're running at least some hundreds of requests/s. You can also try FastHttpLocust: https://docs.locust.io/en/stable/increase-performance.html

I will try soon and let you know within a day or two. Please wait, and thanks for the directions.

abinjaik (Author) commented Feb 20, 2020

> .. but distributed shouldn't be necessary for such a simple test plan unless you're running at least some hundreds of requests/s. You can also try FastHttpLocust: https://docs.locust.io/en/stable/increase-performance.html

@cyberw - we checked both the FastHttpLocust and geventhttpclient clients, but neither has an attribute on the response object that exposes the URL of the response. That is, we check the URL of the HTTP response with the default Locust HTTP client to compare where the user lands after certain actions. Do you have any thoughts?
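For context, with the default (requests-based) client that check is straightforward, since requests exposes the final URL after redirects on `response.url`; a sketch, with a hypothetical landing-page path:

```python
@task(1)
def post_load_test_contact(self):
    data = {"ContactName": "Mane"}  # abbreviated payload from the sample above
    with self.client.post("app/contactform", data=data,
                          name='Contact us submission', catch_response=True) as response:
        # "thank-you" is a hypothetical landing page, not from the real app
        if "thank-you" not in response.url:
            response.failure("unexpected landing page: %s" % response.url)
        else:
            response.success()
```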

cyberw (Collaborator) commented Feb 21, 2020

If FastHttpLocust doesn't fit your needs, then it is probably best to stay on HttpLocust. It sounds like you have plenty of hardware, if you just run distributed.

What kind of throughput are you at when you get the problems? Are there any cpu usage warnings in your log?

cyberw (Collaborator) commented Feb 21, 2020

I had a look at the requests framework, and apparently "Connection pool is full..." is not so bad: https://stackoverflow.com/questions/53765366/urllib3-connectionpool-connection-pool-is-full-discarding-connection

The second error message seems to be the server side dropping the connection (maybe the server does not allow 3500 concurrent connections?): https://stackoverflow.com/questions/48105448/python-http-server-client-remote-end-closed-connection-without-response-error
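For what it's worth, the pool-is-full warning can be avoided by mounting a larger urllib3 pool on Locust's requests-based session; a minimal sketch (pool sizes here are illustrative, not a recommendation):

```python
from locust import TaskSet
from requests.adapters import HTTPAdapter

class ContactUsBehavior(TaskSet):
    def on_start(self):
        # self.client is a requests.Session subclass, so larger connection
        # pools can be mounted per URL scheme
        adapter = HTTPAdapter(pool_connections=10, pool_maxsize=50)
        self.client.mount("https://", adapter)
        self.client.mount("http://", adapter)
```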

Naren-Hub commented Feb 27, 2020

@cyberw I too have been getting error 1 (('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).
My concern is: since the error is raised by Locust (the client) and not produced by the server, I believe it should be ignored and not counted as a failure. When it shows up as a failure in the output report, it is assumed that the server threw the error.

Additionally, I believe this will not give an accurate RPS either if it is counted in the failure count:
RPS = total requests processed / (EndTime - StartTime)
Counting these as failures means it is assumed that the request was processed but the server rejected it, whereas in fact it never reached the server.

Your thoughts?

cc: @abinjaik

cyberw (Collaborator) commented Feb 27, 2020

Hmm... I'm not sure what you mean, @Naren-Hub; the "Remote end closed connection without response" is a server-side error (or at least not a Locust/client-side one; it could also be the network), and counting it as an error is the only reasonable behaviour.

The error is of course detected/raised on the Locust/client side, but Locust didn't cause the error.

Naren-Hub commented:
@cyberw I got that, apologies.
Googling the error gave me references that were all Python-related, and my server is Spring Boot, so it made me think the error had something to do with Locust or its internal libraries while creating users or something.
Anyway, thanks for answering patiently, cheers!

abinjaik (Author) commented:

@cyberw I tried running my existing code with the regular Locust HttpClient in a master-slave configuration on Linux servers. The master is an 8-core, 16 GB Linux Ubuntu machine, and the 2 slaves are 4 cores, 8 GB each. Every machine easily hits the CPU-maxed-out message. So I tried to determine how many users can be simulated with our code on a single machine in standalone mode: only about 200 users, even on that 8-core machine... Sounds weird? When I checked on the internet, it says Python only executes on a single core at a time, and that seems right, because even though my machine has 8 cores, Locust is only utilizing one specific core.

Now, my question is: can we run multiple slaves on a single 8-core machine, so that I can utilize all cores on each machine?

cyberw (Collaborator) commented Feb 28, 2020

Absolutely. Run one slave per core to ensure good utilization.

abinjaik (Author) commented Feb 28, 2020

> Absolutely. Run one slave per core to ensure good utilization.

How will Locust pick the other cores on the same machine? If we open a new SSH session and execute the same command, will it pick another core?

cyberw (Collaborator) commented Feb 28, 2020

> > Absolutely. Run one slave per core to ensure good utilization.
>
> How will Locust pick the other cores on the same machine? If we open a new SSH session and execute the same command, will it pick another core?

Python/Locust is not bound to a specific core, so you don't have to worry about that; the OS will distribute the processes across all your cores.

abinjaik (Author) commented:

> If we open a new SSH session and execute the same command, will it pick another core?

Thanks for replying. Just confirming one point - so if we open a new SSH session on the same Linux machine to execute the same --slave command, will it work?

cyberw (Collaborator) commented Feb 28, 2020

> > If we open a new SSH session and execute the same command, will it pick another core?
>
> Thanks for replying. Just confirming one point - so if we open a new SSH session on the same Linux machine to execute the same --slave command, will it work?

Yes. You may want to use the --expect-slaves parameter on the master side to wait for all the slaves to connect before starting.
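For example, a sketch of what that might look like on one 8-core machine (filename taken from the thread; flag syntax per Locust 0.14):

```sh
# on the master: wait for 8 slaves to connect before the test can start
locust -f LocustTestCases.py --master --expect-slaves=8

# in 8 separate shells (or backgrounded) on the same machine:
locust -f LocustTestCases.py --slave --master-host=127.0.0.1
```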

abinjaik (Author) commented:

> > > If we open a new SSH session and execute the same command, will it pick another core?
> >
> > Thanks for replying. Just confirming one point - so if we open a new SSH session on the same Linux machine to execute the same --slave command, will it work?
>
> Yes. You may want to use the --expect-slaves parameter on the master side to wait for all the slaves to connect before starting.

Yes, that seems to be working, but I need to test at higher loads... However, the charts on the Locust dashboard have died now. Do you know why the charts and other tabs died?

cyberw (Collaborator) commented Feb 28, 2020

no idea, sorry...

abinjaik (Author) commented Mar 3, 2020

Closing this issue, because the connection aborted errors got solved by adding more slaves... I will create a new issue for the charts/slaves tab crashing problem... Thanks @cyberw for all your support.

abinjaik closed this as completed Mar 3, 2020