Request taking 20+ seconds to timeout #195

Closed

abentley-technolog-ltd opened this issue Nov 19, 2024 · 7 comments
Labels
bug This issue is a bug. closed-for-staleness module/sys-mgr-ext p2 This is a standard priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@abentley-technolog-ltd

abentley-technolog-ltd commented Nov 19, 2024

Describe the bug

Hi,

We are experiencing an intermittent issue where the application fails to load within the expected time frame. The app should load in a couple of seconds, but on some machines, it takes anywhere from 30 to 90 seconds.

Using Fiddler, we traced the issue to the latest/api/token request to 169.254.169.254, which always results in a 502 Bad Gateway error. While the request completes in a few milliseconds on some machines, it takes up to 20+ seconds on others before the 502 error is returned.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The request to latest/api/token should either complete successfully or fail quickly with an appropriate error, rather than taking a prolonged amount of time before failing.
The application should load within a few seconds, not 30-90 seconds, regardless of the machine.

Current Behavior

The latest/api/token request consistently fails with a 502 Bad Gateway error.
On some machines, the request takes a few milliseconds to fail, while on others, it takes 20-90 seconds before the 502 error is returned.
This delay in the request affects the overall application load time, causing significant performance issues.

Reproduction Steps

Launch the application.
Monitor the loading time of the app.
Use Fiddler or another network trace tool to track the request to latest/api/token.
Observe that the request always fails with a 502 Bad Gateway error.
On some machines, the request completes in a few milliseconds; on others, it takes 20+ seconds before failing.
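
If it helps with reproduction, the Fiddler trace can be replaced by probing the IMDSv2 token endpoint directly. This is only a sketch; the 2-second timeout is arbitrary and not something the application itself sets.

using System;
using System.Diagnostics;
using System.Net.Http;

using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) }; // arbitrary short timeout
var request = new HttpRequestMessage(HttpMethod.Put, "http://169.254.169.254/latest/api/token");
request.Headers.Add("X-aws-ec2-metadata-token-ttl-seconds", "21600");

var stopwatch = Stopwatch.StartNew();
try
{
    // On the affected machines this is the call that stalls before the 502 comes back.
    var response = await http.SendAsync(request);
    Console.WriteLine($"{(int)response.StatusCode} after {stopwatch.ElapsedMilliseconds} ms");
}
catch (Exception ex) // e.g. TaskCanceledException when the 2-second timeout fires first
{
    Console.WriteLine($"{ex.GetType().Name} after {stopwatch.ElapsedMilliseconds} ms");
}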

Possible Solution

No response

Additional Information/Context

Slow.txt
Fast.txt

AWS .NET SDK and/or Package version used

Not Affected: Amazon.Extensions.Configuration.SystemsManager 5.0.2
Affected: Amazon.Extensions.Configuration.SystemsManager 5.1.0
Affected: Amazon.Extensions.Configuration.SystemsManager 6.2.2

Targeted .NET Platform

.NET8

Operating System and version

Windows 11

@abentley-technolog-ltd abentley-technolog-ltd added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 19, 2024
@github-actions github-actions bot added the potential-regression Marking this issue as a potential regression to be checked by team member label Nov 19, 2024
@ashishdhingra
Contributor

@abentley-technolog-ltd Good morning. Could you please share supporting data, including verbose logs, that supports this being a potential regression? Verbose logging can be enabled using the statements below:

Amazon.AWSConfigs.LoggingConfig.LogResponses = Amazon.ResponseLoggingOption.Always;
Amazon.AWSConfigs.LoggingConfig.LogTo = Amazon.LoggingOptions.Console;
Amazon.AWSConfigs.AddTraceListener("Amazon", new System.Diagnostics.ConsoleTraceListener());

The 5.x.x versions of Amazon.Extensions.Configuration.SystemsManager were released last year. The latest version is Amazon.Extensions.Configuration.SystemsManager 6.2.2. A potential regression is an issue introduced in a recent version where the scenario was working in an earlier version. Also, per https://www.nuget.org/packages/Amazon.Extensions.Configuration.SystemsManager (and CHANGELOG.md), there is no version 5.2.2.

Regarding IP address 169.254.169.254, it is the EC2 instance metadata endpoint (refer to Access instance metadata for an EC2 instance). This package relies on the AWS SDK for .NET to resolve credentials. Kindly refer to Credential and profile resolution for the exact order in which resolution is done, with EC2 instance metadata being last. For your scenario, if you are executing your application on an EC2 instance, you would need to enable EC2 instance metadata in case credentials are not configured at any of the places checked before the EC2 instance metadata step. If EC2 instance metadata is not enabled, the request might ultimately time out.
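
As an illustration only (the profile name "xyz" and the region below are placeholders, not taken from your setup), credentials that resolve at an earlier step of the chain, for example from a shared credentials file profile, mean the SDK should not need to fall through to the EC2 instance metadata endpoint for credentials:

using Amazon;
using Amazon.Runtime;
using Amazon.Runtime.CredentialManagement;
using Amazon.SimpleSystemsManagement;

// Sketch: load credentials from a named profile in the shared credentials file.
var chain = new CredentialProfileStoreChain();
if (chain.TryGetAWSCredentials("xyz", out AWSCredentials credentials))
{
    // With explicit credentials and region, the credential chain should not
    // need to reach the EC2 instance metadata step.
    var ssmClient = new AmazonSimpleSystemsManagementClient(credentials, RegionEndpoint.EUWest2);
}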

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 19, 2024
@github-actions github-actions bot removed the potential-regression Marking this issue as a potential regression to be checked by team member label Nov 19, 2024
@abentley-technolog-ltd
Author

abentley-technolog-ltd commented Nov 20, 2024

Hi @ashishdhingra,

Apologies, that should have said version 6.2.2, not 5.2.2.
It will be difficult to get the supporting data, as our development machines do not exhibit the connection hang that appears on our production machines.
These are on-premises machines, so the application is not being deployed to EC2. For authentication, we're using access keys with the BasicAWSCredentials class:

bool isDevelopment = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT")! == "development";
if (!isDevelopment) // If not development, add parameter store as config provider.
{
	AwsConfiguration credential = GetAWSCredential();
	configurationBuilder.AddSystemsManager(source =>
	{
		source.Path = "/xyz_software";
		source.ParameterProcessor = new JsonParameterProcessor();
		source.AwsOptions = new()
		{
			Credentials = new BasicAWSCredentials(credential.AccessKey, credential.SecretKey),
			Region = RegionEndpoint.GetBySystemName(credential.Region)
		};
	});
}

@ashishdhingra
Contributor

@abentley-technolog-ltd Thanks for the information. Could you please share the following:

  • Are the production machines also non-EC2 instances?
  • Are you also using BasicAWSCredentials on the production machines? If yes, per Credential and profile resolution, credential resolution should finish successfully at the first step and should not hit the EC2 IMDS endpoint. However, since your network wire log shows this endpoint being queried, I suspect your application might be relying on EC2 instance metadata for temporary credentials (a quick check is sketched below).
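
A quick way to check on an affected machine (sketch only; FallbackCredentialsFactory is the default credential chain resolver in the 3.x SDK core) is to print what the default chain resolves to:

using System;
using Amazon.Runtime;

// If this call hangs, or returns an instance-profile credential type, the default
// chain is falling through to the EC2 instance metadata endpoint on that machine.
var resolved = FallbackCredentialsFactory.GetCredentials();
Console.WriteLine(resolved.GetType().Name);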

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. module/sys-mgr-ext and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Nov 20, 2024
@abentley-technolog-ltd
Author

Yes, the production machines are not EC2 instances, and they are using access keys.

Thanks,
Adam

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 22, 2024
@ashishdhingra
Contributor

ashishdhingra commented Nov 22, 2024

@abentley-technolog-ltd Good morning. We might need the verbose logs to find the exact source of the latency. Your network logs show the latency is caused by requests to IP address 169.254.169.254, which is the EC2 instance metadata endpoint. I'm unsure whether your production network fabric has a different configuration than your development machines.

You may try either of the workarounds below and see if it works for you (taken from aws/aws-sdk-net#2546):

  • Set the AWS_EC2_METADATA_DISABLED environment variable to true to disable the EC2 instance metadata lookup (see the sketch after the code below).

  • Explicitly set the DefaultConfigurationMode, as shown below:

    bool isDevelopment = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT")! == "development";
    if (!isDevelopment) // If not development, add parameter store as config provider.
    {
        AwsConfiguration credential = GetAWSCredential();
        configurationBuilder.AddSystemsManager(source =>
        {
            source.Path = "/xyz_software";
            source.ParameterProcessor = new JsonParameterProcessor();
            source.AwsOptions = new()
            {
                Credentials = new BasicAWSCredentials(credential.AccessKey, credential.SecretKey),
                Region = RegionEndpoint.GetBySystemName(credential.Region),
                // Suggested workaround from aws/aws-sdk-net#2546: set the mode explicitly.
                DefaultConfigurationMode = DefaultConfigurationMode.Standard
            };
        });
    }
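
For the first workaround, the variable can be set at machine or user level, or early in the process; a sketch in code (assuming the SDK reads the variable at the point it would otherwise query the metadata endpoint):

// Sketch: disable the EC2 instance metadata lookup; run this before
// configurationBuilder.AddSystemsManager(...) or any other AWS call.
Environment.SetEnvironmentVariable("AWS_EC2_METADATA_DISABLED", "true");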

Thanks,
Ashish

@ashishdhingra ashishdhingra added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 22, 2024

This issue has not received a response in 5 days. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Nov 28, 2024
@github-actions github-actions bot closed this as completed Dec 1, 2024
@abentley-technolog-ltd
Author

I have done an initial test, and DefaultConfigurationMode = DefaultConfigurationMode.Standard appears to resolve the issue, but I will confirm once we've done a full deployment to production.

Thanks,
Adam
