Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket connection fails without connection.on(error) getting called. #445

Open
Epick362 opened this issue Nov 22, 2023 · 5 comments
Open
Assignees
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@Epick362
Copy link

Epick362 commented Nov 22, 2023

Describe the bug

I am using a backend "secrets" endpoint (using STS assumeRole) that generates credentials for MQTT connection for the current user. This works well for our use case except for the fact that these credentials expire and even though there is code on the client to attempt a reconnect (request new set of credentials), the connection.on(error) event handler does not get called. I see this in the console:

GET
wss://x-ats.iot.eu-west-1.amazonaws.com/mqtt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=Z&X-Amz-Date=20231122T152819Z&X-Amz-SignedHeaders=host&X-Amz-Signature=Y&X-Amz-Security-Token=X

Firefox can’t establish a connection to the server at wss://x-ats.iot.eu-west-1.amazonaws.com/mqtt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=Z&X-Amz-Date=20231122T152819Z&X-Amz-SignedHeaders=host&X-Amz-Signature=Y&X-Amz-Security-Token=X.

Is there a way to react to this error? I looked into websocket options in the client config but couldn't find anything relevant.
This is how I build the connection on client after receiving credentials:

    const client_bootstrap = new io.ClientBootstrap()
    const config_builder = iot.AwsIotMqttConnectionConfigBuilder.new_with_websockets()

    const clientId = `X-${user?.id}-${uuidv4()}`
    config_builder.with_client_id(clientId)
    config_builder.with_endpoint(credentialsResponse.credentials.iotEndpoint)
    config_builder.with_credentials(
      credentialsResponse.credentials.region,
      credentialsResponse.credentials.accessKey,
      credentialsResponse.credentials.secretKey,
      credentialsResponse.credentials.sessionToken,
    )

    const config = config_builder.build()
    const client = new mqtt.MqttClient(client_bootstrap)
    const connection = client.new_connection(config)

and these are the relevant event handlers I've tried

        connection.on('disconnect', () => {
          this.connected = false
          log.warn('MQTT connection disconnect')

          this.attemptReconnect()
        })

        connection.on('interrupt', () => {
          if (!this.connected) {
            return
          }

          this.connected = false
          log.warn('MQTT connection interrupt')

          this.attemptReconnect()
        })

        connection.on('error', (e) => {
          this.connected = false
          console.error('MQTT connection error')

          reject(e)

          // Queue exponential backoff reconnect
          this.attemptReconnect()
        })

Expected Behavior

The connection.on(error) handler gets invoked and I can manage new credentials request.

Current Behavior

My client SDK is stuck without websocket connection and thinks it is connected successfully.

Reproduction Steps

Initiate a websocket MQTT connection using temporary credentials and wait for them to timeout.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.17.0

Environment details (OS name and version, etc.)

any browser

@Epick362 Epick362 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 22, 2023
@jmklix
Copy link
Member

jmklix commented Nov 23, 2023

Thanks for the detailed write up @Epick362. As you noticed the sdk tries to reconnect and you should leave it to the sdk to reconnect for you. Rather than trying to handle this error on your end can you help us figure out what is causing this re-connection failure by answering the following questions:

  • How long are you running this before you see this connection failure?
  • Does it happen every time you re-connect?
  • Is the failure to re-connect happen right after the temporary credentials expire?
  • Can you provide a compile minimal sample that reproduces this?

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 2 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 23, 2023
@jmklix jmklix self-assigned this Nov 23, 2023
Copy link

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 5 days unless further comments are made. label Nov 25, 2023
@Epick362
Copy link
Author

Epick362 commented Nov 27, 2023

* How long are you running this before you see this connection failure?

Up to 12 hours, for which the credentials are valid.

* Does it happen every time you re-connect?

No, while the credentials provided by backend service are valid, the service reconnects.

* Is the failure to re-connect happen right after the temporary credentials expire?

Yes.

* Can you provide a compile minimal sample that reproduces this?

Okay, I will try but as parts happen on server and parts in client it is hard to compile.

Server credentials:

    const iot = new AWS.Iot()
    const sts = new AWS.STS()
    
    const [{ endpointAddress: iotEndpoint }, { Account: accountId }] = await Promise.all([
      // https://aws.amazon.com/blogs/iot/aws-iot-core-ats-endpoints/
      iot.describeEndpoint({ endpointType: 'iot:Data-ATS' }).promise(),
      sts.getCallerIdentity().promise()
    ])

    const allowedTopics = [
      `*/notifications/*/${uid}`,
    ]
    const statements = allowedTopics.reduce(
      (acc, topicString) => {
        const subscribeStatement = {
          Effect: 'Allow',
          Action: ['iot:Subscribe'],
          Resource: [`arn:aws:iot:${region}:${accountId}:topicfilter/${topicString}`]
        }

        const recieveStatement = {
          Effect: 'Allow',
          Action: ['iot:Receive'],
          Resource: [`arn:aws:iot:${region}:${accountId}:topic/${topicString}`]
        }

        acc.push(subscribeStatement, recieveStatement)

        return acc
      },
      [
        {
          Effect: 'Allow',
          Action: ['iot:Connect'],
          Resource: ['*']
        }
      ]
    )

    const roleParams = {
      RoleArn: `arn:aws:iam::${accountId}:role/${ROLE_NAME}`,
      RoleSessionName: uid,
      Policy: JSON.stringify({
        Version: '2012-10-17',
        Statement: statements
      })
    }

    const { Credentials } = await sts.assumeRole(roleParams).promise()

   return {
        credentials: {
          iotEndpoint,
          region,
          accessKey: Credentials.AccessKeyId,
          secretKey: Credentials.SecretAccessKey,
          sessionToken: Credentials.SessionToken
        }
      }

Client side:

export class NotificationStore {
  @observable connected = false
  connection?: mqtt.MqttClientConnection
  listeners: Map<string, Subscription[]> = new Map()
  reconnectionTimeout: number = RECONNECT_TIMEOUT_FLOOR

  constructor(main: MainStore) {
    this.main = main

    this.connect()
  }

  private async establishConnection(): Promise<mqtt.MqttClientConnection | undefined> {
    const credentialsResponse = await this.main.notifications.getCredentials()

    if (!credentialsResponse?.credentials) {
      console.warn('Unable to get IoT credentials')
      return
    }

    const client_bootstrap = new io.ClientBootstrap()
    const config_builder = iot.AwsIotMqttConnectionConfigBuilder.new_with_websockets()

    const clientId = `client-${this.main.userStore.user?.id}-${uuidv4()}`
    config_builder.with_client_id(clientId)
    config_builder.with_endpoint(credentialsResponse.credentials.iotEndpoint)
    config_builder.with_credentials(
      credentialsResponse.credentials.region,
      credentialsResponse.credentials.accessKey,
      credentialsResponse.credentials.secretKey,
      credentialsResponse.credentials.sessionToken,
    )

    const config = config_builder.build()
    const client = new mqtt.MqttClient(client_bootstrap)
    const connection = client.new_connection(config)

    return connection
  }

  async connect(): Promise<boolean> {
    if (this.connected) {
      log.warn('Already connected to IoT')
      return false
    }

    const connection = await this.establishConnection()

    if (!connection) {
      log.warn('Unable to establish connection to IoT')

      this.connected = false
      return false
    }

    this.connection = connection

    try {
      await new Promise((resolve, reject) => {
        connection.on('connect', () => {
          log.debug('connected IoT')

          this.connected = true
          this.reconnectionTimeout = RECONNECT_TIMEOUT_FLOOR // reset timeout

          this.resolveSubscriptionQueue()

          resolve(null)
        })

        connection.on('message', (topic: string, payload) => {
          // ...
        })

        connection.on('resume', () => {
          log.debug('resumed IoT')
        })

        connection.on('disconnect', () => {
          this.connected = false
          log.warn('MQTT connection disconnect')

          this.attemptReconnect()
        })

        connection.on('interrupt', () => {
          if (!this.connected) {
            return
          }

          this.connected = false
          log.warn('MQTT connection interrupt')

          this.attemptReconnect()
        })

        connection.on('error', (e) => {
          this.connected = false
          console.error('MQTT connection error')

          reject(e)

          // Queue exponential backoff reconnect
          this.attemptReconnect()
        })
      })

      return true
    } catch (e) {
      if (e instanceof Error) {
        e.message.includes('Connection closed')
        return false
      }
      log.error(e)

      return false
    }
  }

  attemptReconnect() {
    setTimeout(() => {
      log.debug('Attempting reconnect')
      this.connect()
    }, this.reconnectionTimeout * 1000)

    console.log(`Setting reconnect timeout to ${this.reconnectionTimeout} seconds`)

    // increase timeout, exponentially
    this.reconnectionTimeout = this.reconnectionTimeout ** 2
  }
}

@github-actions github-actions bot removed closing-soon This issue will automatically close in 5 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 2 days. labels Nov 27, 2023
@doubliez
Copy link

I ran into a similar issue where I need to use temporary credentials generated with STS AssumeRole to connect to MQTT over WebSockets (using Sigv4). If I simply pass the generated credentials using AwsCredentialsProvider.newStatic(...), I'm going to have an issue when the client reconnects and attempts to re-use expired credentials.

I'm using the SDK in Node.js, and it doesn't seem like AssumeRole is supported natively for reconnects.

But looking at the code, I came up with the following workaround which works for me. I override builder.config.websocketHandshakeTransform to call AssumeRole whenever the WebSocket handshake happens. A bit hacky but until there's a native solution for this, that's what I'm going to use:

import { auth, CrtError, http, iot, mqtt5 } from 'aws-iot-device-sdk-v2';
import { STSClient, AssumeRoleCommand } from '@aws-sdk/client-sts';

const sts = new STSClient();

const builder = iot.AwsIotMqtt5ClientConfigBuilder
    .newWebsocketMqttBuilderWithSigv4Auth(hostName, {
        region: process.env.AWS_REGION
    })
    .withConnectProperties({
        clientId: 'CLIENT_ID',
        keepAliveIntervalSeconds: 1200
    });

// Override `websocketHandshakeTransform`
// Need to cast `builder` as `any` to get around build issues because `builder.config` is private
(builder as any).config.websocketHandshakeTransform = async (
    request: http.HttpRequest, done: (error_code?: number) => void
) => {
    try {
        const command = new AssumeRoleCommand({
            RoleArn: 'arn:aws:iam::012345678901:role/my-role',
            RoleSessionName: 'session-name'
        });
        const response = await sts.send(command);
        const credentialsProvider = auth.AwsCredentialsProvider.newStatic(
            response.Credentials?.AccessKeyId!,
            response.Credentials?.SecretAccessKey!,
            response.Credentials?.SessionToken
        );
        const signingConfig : auth.AwsSigningConfig = {
            algorithm: auth.AwsSigningAlgorithm.SigV4,
            signature_type: auth.AwsSignatureType.HttpRequestViaQueryParams,
            provider: credentialsProvider as auth.AwsCredentialsProvider,
            region: process.env.AWS_REGION || 'us-west-2',
            service: 'iotdevicegateway',
            signed_body_value: auth.AwsSignedBodyValue.EmptySha256,
            omit_session_token: true
        };

        await auth.aws_sign_request(request, signingConfig);
        done();
    } catch (error) {
        if (error instanceof CrtError) {
            done(error.error_code);
        } else {
            done(3); /* TODO: AWS_ERROR_UNKNOWN */
        }
    }
};

const client = new mqtt5.Mqtt5Client(builder.build());

client.start();

@picitujeromanov
Copy link

I usually just setup interval to check when the credential expire and get new ones 15 minutes before that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

4 participants