From 5f3538595190ebc47761ae5ece41722c54785df1 Mon Sep 17 00:00:00 2001
From: George Fu
Date: Thu, 12 Sep 2024 16:35:27 +0000
Subject: [PATCH] docs: create new Node.js performance docs section and clean up upgrading docs

---
 UPGRADING.md                                  |  31 ++-
 supplemental-docs/CLIENTS.md                  |  39 ++++
 supplemental-docs/README.md                   |   5 +
 supplemental-docs/performance/README.md       |   1 +
 .../performance/parallel-workloads-node-js.md | 181 ++++++++++++++++++
 5 files changed, 250 insertions(+), 7 deletions(-)
 create mode 100644 supplemental-docs/performance/parallel-workloads-node-js.md

diff --git a/UPGRADING.md b/UPGRADING.md
index 2c0c4ff9c3333..ab01956cfe316 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -54,12 +54,13 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
     configure them by supplying a new `requestHandler`. Here's the example of setting http options in Node.js runtime. You
     can find more in [v3 reference for NodeHttpHandler](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-node-http-handler/).

-    All v3 requests use HTTPS by default. You only need to provide custom httpsAgent.
+    All v3 requests use HTTPS by default. You can provide a custom agent via the `httpsAgent`
+    field of the `NodeHttpHandler` constructor input.

     ```javascript
     const { Agent } = require("https");
-    const { Agent: HttpAgent } = require("http");
     const { NodeHttpHandler } = require("@smithy/node-http-handler");
+
     const dynamodbClient = new DynamoDBClient({
       requestHandler: new NodeHttpHandler({
         httpsAgent: new Agent({
@@ -71,19 +72,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
     });
     ```

-    If you are passing custom endpoint which uses http, then you need to provide httpAgent.
+    If you are using a custom endpoint which uses http, then you can provide an `httpAgent`.

     ```javascript
     const { Agent } = require("http");
     const { NodeHttpHandler } = require("@smithy/node-http-handler");

     const dynamodbClient = new DynamoDBClient({
+      endpoint: "http://example.com",
       requestHandler: new NodeHttpHandler({
         httpAgent: new Agent({
           /*params*/
         }),
       }),
-      endpoint: "http://example.com",
     });
     ```

@@ -92,6 +93,7 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa

     ```javascript
     const { FetchHttpHandler } = require("@smithy/fetch-http-handler");
+
     const dynamodbClient = new DynamoDBClient({
       requestHandler: new FetchHttpHandler({
         requestTimeout: /*number in milliseconds*/
@@ -121,14 +123,16 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
   - **v3**: **Deprecated**. Requests are _always_ asynchronous.
 - `xhrWithCredentials`
   - **v2**: Sets the "withCredentials" property of an XMLHttpRequest object.
-  - **v3**: Not available. SDK inherits [the default fetch configurations](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch)
+  - **v3**: the `fetch` equivalent field `credentials` can be set via constructor
+    configuration to the `requestHandler` config when using the browser
+    default `FetchHttpHandler`.
 - [`logger`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#logger-property)
   - **v2**: An object that responds to .write() (like a stream) or .log() (like the console object) in order to log
     information about requests.
   - **v3**: No change. More granular logs are available in v3.
 - [`maxRedirects`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRedirects-property)
   - **v2**: The maximum amount of redirects to follow for a service request.
-  - **v3**: **Deprecated**. SDK _does not_ follow redirects to avoid unintentional cross-region requests.
+  - **v3**: **Deprecated**. SDK _does not_ follow redirects to avoid unintentional cross-region requests. S3 region redirects can be enabled separately with `followRegionRedirects=true` in the S3 Client only.
 - [`maxRetries`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRetries-property)
   - **v2**: The maximum amount of retries to perform for a service request.
   - **v3**: Changed to `maxAttempts`. See more in [v3 reference for RetryInputConfig](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-middleware-retry/#maxattempts).
@@ -179,6 +183,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
   - **v2**: Whether to use the Accelerate endpoint with the S3 service.
   - **v3**: No change.

+## Error handling
+
+Top level fields such as `error.code` and http response metadata like the
+status code have slightly moved locations within the thrown error object
+to subfields like `error.$metadata` or `error.$response`.
+
+This is because v3 more accurately follows the service models and avoids
+adding metadata at the top level of the error object, which may conflict
+with the structural error shape modeled by the services.
+
+See how error handling has changed in v3
+here: [ERROR_HANDLING](./supplemental-docs/ERROR_HANDLING.md).
+
 ## Credential Providers

 In v2, the SDK provides a list of credential providers to choose from, as well as a credentials provider chain,
@@ -348,7 +365,7 @@ variable.

 ### File System Credentials

-- **v2**: [`FileSystemCredentials`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/FileSystemCredentials.html)
+- **v2**: [`FileSystemCredentials`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/FileSystemCredentials.html)
   represents credentials from a JSON file on disk.
 - **v3**: **Deprecated**. You can explicitly read the JSON file and supply to the client.
   Please open a [feature request](https://github.com/aws/aws-sdk-js-v3/issues/new?assignees=&labels=feature-request&template=---feature-request.md&title=)
diff --git a/supplemental-docs/CLIENTS.md b/supplemental-docs/CLIENTS.md
index 6aa3598bbd176..c93b5bb309a1e 100644
--- a/supplemental-docs/CLIENTS.md
+++ b/supplemental-docs/CLIENTS.md
@@ -533,6 +533,45 @@ client.middlewareStack.add(
 await client.listBuckets({});
 ```

+### Middleware Caching `cacheMiddleware`.
+
+> Available only in [v3.649.0](https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.649.0) and later.
+
+By default (false), the middleware function stack is resolved on every request,
+because the user may modify the middleware stack by adding middleware to the
+`client` or `command` instances at any time.
+
+By contrast, when `cacheMiddleware=true`, the creation of the middleware function stack
+is cached on a per-client, per-command-class basis.
+
+In the following example, the S3 HeadObject Command is called 10 times, but
+its middleware function stack is only created once, instead of once per request.
+
+```ts
+// example: middleware caching
+import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";
+
+const client = new S3Client({ cacheMiddleware: true });
+
+for (let i = 0; i < 10; ++i) {
+  await client.send(
+    new HeadObjectCommand({
+      Bucket: "...",
+      Key: String(i),
+    })
+  );
+}
+```
+
+This caches the combination of `S3Client+HeadObjectCommand`'s resolved
+`middlewareStack` upon the first request. This has two key effects:
+
+- request creation time is reduced by (up to) a few milliseconds per request
+- modifying the middleware stack after requests have begun will have no effect.
+
+**Only enable this feature if you need the marginal increase in
+request performance, and are aware of its side-effects.**
+
 ### Dual-stack `useDualstackEndpoint`

 This is a simple `boolean` setting that is present in most SDK Clients.
diff --git a/supplemental-docs/README.md b/supplemental-docs/README.md
index 2dd6973b86d9c..26394c29cc9f9 100644
--- a/supplemental-docs/README.md
+++ b/supplemental-docs/README.md
@@ -14,6 +14,11 @@ Upgrading from AWS SDK for JavaScript (v2) (https://github.com/aws/aws-sdk-js).

 Best practices for working within AWS Lambda using the AWS SDK for JavaScript (v3).

+#### [Performance](./performance/README.md)
+
+Details what steps the AWS SDK team has taken to optimize performance of the SDK,
+and includes tips for configuring the SDK to run efficiently.
+
 #### [TypeScript](./TYPESCRIPT.md)

 TypeScript tips & FAQ related to this project.
diff --git a/supplemental-docs/performance/README.md b/supplemental-docs/performance/README.md
index 548fc65e69647..f0b3f233dbfff 100644
--- a/supplemental-docs/performance/README.md
+++ b/supplemental-docs/performance/README.md
@@ -14,3 +14,4 @@ Topics:
 - [Bundle Sizes](./bundle-sizes.md)
 - [Dynamic Imports](./dynamic-imports.md)
 - [Dependency File Count Reduction](./dependency-file-count-reduction.md)
+- [Parallel workloads in Node.js](./parallel-workloads-node-js.md)
diff --git a/supplemental-docs/performance/parallel-workloads-node-js.md b/supplemental-docs/performance/parallel-workloads-node-js.md
new file mode 100644
index 0000000000000..a2495e1663182
--- /dev/null
+++ b/supplemental-docs/performance/parallel-workloads-node-js.md
@@ -0,0 +1,181 @@
+# Performance > Parallel workloads in Node.js
+
+Other sections such as bundle sizing, dependency count, and dynamic imports
+cover aspects of performance related to the initial startup of your application.
+
+This section focuses on post-startup performance of request throughput. Specifically,
+we cover performance configuration of the AWS SDK for JavaScript (v3)
+in Node.js using HTTP/1.1 and the `node:https` module via the SDK's requestHandler
+dependency, `@smithy/node-http-handler`.
+
+## What is a parallel workload?
+
+A parallel workload is any time you make more than one request
+before the first request has completed.
+
+In single-threaded JavaScript, this is accomplished via the asynchronicity of `Promise`s.
+
+## Configuration options related to throughput
+
+Here is an example containing SDK Client configuration options that have
+an effect on request throughput.
+
+```ts
+// example: configuring an SDK client for throughput.
+import { S3 } from "@aws-sdk/client-s3";
+import { NodeHttpHandler } from "@smithy/node-http-handler";
+import { Agent } from "node:https";
+
+const s3 = new S3({
+  /**
+   * Default is false. Setting this to true caches
+   * middleware resolution and prevents modifications
+   * to the middlewareStack from taking effect.
+   *
+   * Use only if you are not adding custom middleware.
+   */
+  cacheMiddleware: true,
+  requestHandler: new NodeHttpHandler({
+    httpsAgent: new Agent({
+      /**
+       * Default is true. This should be left as true
+       * generally speaking, unless you have very specific
+       * use-case needing the alternative.
+       */
+      keepAlive: true,
+      /**
+       * See expanded note below about sockets.
+       * You should use a number that is the size
+       * of your parallel workload batch.
+       */
+      maxSockets: 50,
+    }),
+  }),
+});
+
+// shorthand syntax available since v3.521.0
+const client = new S3({
+  requestHandler: {
+    requestTimeout: 3_000,
+    httpsAgent: { maxSockets: 50 },
+  },
+});
+```
+
+## Client instances
+
+In this SDK, much functionality is cached for performance reasons, but
+the cache is usually associated with the client instance. In particular,
+the following are cached on the client instance:
+
+- credentials fetched by async function calls
+  - if your client is configured to source credentials from a provider that includes
+    a network request and/or file-system read, this work is done once per client until
+    expiration of the credentials. If you instantiate a new client for every request,
+    this will slow things down substantially.
+- middleware function stack when `cacheMiddleware=true`
+- `node:https` Agent and its socket pool
+
+If you do need multiple instances of an SDK client, but don't want to
+have separate credentials and socket pools, you can share
+credentials and requestHandlers between clients.
+
+```ts
+// example: credential and socket pool sharing from primary client.
+import { S3 } from "@aws-sdk/client-s3";
+
+const s3_east = new S3({ region: "us-east-1" });
+
+const { credentials, requestHandler } = s3_east.config;
+
+const s3_west = new S3({
+  region: "us-west-2",
+  credentials,
+  requestHandler,
+});
+```
+
+```ts
+// example: credential and socket pool sharing from user instantiated objects.
+import { S3 } from "@aws-sdk/client-s3";
+import { fromNodeProviderChain } from "@aws-sdk/credential-providers";
+import { NodeHttpHandler } from "@smithy/node-http-handler";
+
+const credentials = fromNodeProviderChain();
+const requestHandler = new NodeHttpHandler({
+  httpsAgent: {
+    maxSockets: 100,
+  },
+});
+
+const s3_east = new S3({ region: "us-east-1", credentials, requestHandler });
+const s3_west = new S3({ region: "us-west-2", credentials, requestHandler });
+```
+
+## Node.js Sockets
+
+The `node:https` Agent class manages sockets on your behalf. The most impactful configuration you can make for parallel workloads is to set
+the value of `maxSockets`.
+
+Configuring the `maxSockets` value for the SDK's requestHandler should
+be based on the parallelism or parallel workload batch size of your application
+and usage scenario.
+
+- Configuring too few sockets leads to a slowdown as this is equivalent to
+  setting a lower cap on the parallel workload batch size.
+- Configuring too many sockets can _also_ slow down your application. This is
+  because the application may open a new socket, which takes some CPU time, when
+  an existing socket was about to become free for reuse.
+  - configuring too many sockets can cause you to hit the file descriptor limit of the
+    operating system. This can manifest as `Error: EMFILE, too many open files`
+    in Node.js.
+
+## Example Scenario
+
+You have 10,000 files to upload to S3.
+
+- Uploading one at a time is too slow.
+- Uploading all at once risks crashing your application process, or
+  being throttled by the server.
+
+#### Recommendation
+
+Test your application to determine the right level of parallel request traffic.
+After that, configure the `maxSockets` value to be equal to the batch size, or
+a factor of it.
+
+```ts
+// example: workload of 10,000 files, batch size of 100.
+import { S3 } from "@aws-sdk/client-s3";
+
+const files = [
+  /*... */
+];
+const BATCH_SIZE = 100;
+
+const s3 = new S3({
+  requestHandler: {
+    httpsAgent: { maxSockets: 100 },
+  },
+});
+
+const promises = [];
+while (files.length) {
+  promises.push(
+    ...files.splice(0, BATCH_SIZE).map((file) => {
+      return s3.putObject({
+        Bucket: "...",
+        Key: file.name,
+        Body: file.contents,
+      });
+    })
+  );
+  await Promise.all(promises);
+  promises.length = 0;
+}
+```
+
+In this example we've adhered to the best practices mentioned in this section:
+
+- use one client instance for repeated requests
+- set a `maxSockets` value that is a factor of the batch size
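
The batching pattern described in the new doc can also be written as a small reusable helper. The following is a minimal sketch under stated assumptions, not something this patch or the SDK provides: `toBatches` and `runInBatches` are hypothetical names introduced here for illustration, and the task callback stands in for a call such as `s3.putObject(...)`.

```typescript
// sketch: generic batch runner (hypothetical helpers, not part of the AWS SDK).

/** Splits `items` into consecutive groups of at most `size` elements without mutating the input. */
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/**
 * Runs `task` for every item, one batch at a time. All tasks in a batch are
 * started together; the next batch starts only after every task in the
 * current batch has resolved. Results are returned in input order.
 */
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, batchSize)) {
    const batchResults = await Promise.all(batch.map((item) => task(item)));
    results.push(...batchResults);
  }
  return results;
}
```

With the client from the doc's example scenario, a call might look like `await runInBatches(files, BATCH_SIZE, (file) => s3.putObject({ Bucket: "...", Key: file.name, Body: file.contents }))`. Because `Promise.all` rejects as soon as any task rejects, one failed upload aborts the run; `Promise.allSettled` is an alternative if the remaining uploads should continue.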