Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(ecs): default to stable fluentbit image instead of latest #16403

Open
ayozemr opened this issue Sep 7, 2021 · 15 comments
Open

(ecs): default to stable fluentbit image instead of latest #16403

ayozemr opened this issue Sep 7, 2021 · 15 comments
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container breaking-change This issue requires a breaking change to remediate. effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2

Comments

@ayozemr
Copy link

ayozemr commented Sep 7, 2021

❓ General Issue

The Question

I am affected by a bug in Fluentbit that causes DNS problems when using firelens log driver, almost with datadog. (aws/aws-for-fluent-bit#233). As of now, working solution is to not use latest image and use 2.19.0, which I have tested working.

My problem is that using CDK I don't see how I can set to use a fixed fluentbit version in the "log-router" container, as that container definition is created by cloudformation when we specify the logrouter in CDK/CFN in the app container taskdef, the system adds that log-router sidecar container with the version set to latest automaticly.

Its possible to change that sidecar container image from latest to 2.19.0?

Environment

  • CDK CLI Version: 1.121.0 (build 026cb8f)
  • Module Version: 1.121.0
  • Node.js Version: 14
  • OS: macOS
  • Language (Version): TypeScript

Other information

  logging: ecs.LogDrivers.firelens({
    options: {
      Name: 'datadog',
      provider: 'ecs',
      TLS: 'on',
      dd_message_key: 'log',
      dd_service: 'test_service,
      dd_source: 'java',
      dd_tags: 'env:test',
    },
    secretOptions: {
      apikey: ecs.Secret.fromSsmParameter(
        REDACTED
      ),
    },
  }),

Thanks for your time!!

@ayozemr ayozemr added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Sep 7, 2021
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Sep 7, 2021
@peterwoodworth
Copy link
Contributor

Hey @ayozemr,

To help me fully understand your problem, can you share the relevant cdk synth output you'd like to see changed, and can you share the code which creates your container? Thanks!

@peterwoodworth peterwoodworth added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Sep 7, 2021
@ayozemr
Copy link
Author

ayozemr commented Sep 7, 2021

Sure, there we go. I redacted some possible sensitive values.

The thing is the "log-router" container auto created has an image linked to a param out of my control, that checking ECS taskdefinition json, translates to "image": "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:latest"

So the idea is if there is any possibility that "log-router" image is any other tag instead of "latest" until they fix the bug, so I can make that "log-router" container uses "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.19.0" instead of latest.

Relevant code:

CDK:

    const taskdefFamily = `${buildConfig.App}-${buildConfig.Environment}-dbmigrations`;
    const migrationTaskDef = new ecs.TaskDefinition(this, 'MigrationsTaskDef', {
      family: taskdefFamily,
      compatibility: ecs.Compatibility.FARGATE,
      cpu: '256',
      memoryMiB: '512',
      networkMode: ecs.NetworkMode.AWS_VPC,
      taskRole: taskdefRole,
      executionRole: taskExecutionRole,
    });

    migrationTaskDef.addContainer('FlywayContainer', {
      image: ecs.RepositoryImage.fromEcrRepository(ecrRepository, 'latest'),
      memoryLimitMiB: 512,
      environment: {
        ENV: buildConfig.Environment,
        FLYWAY_URL: `jdbc:postgresql://${dbcredentials
          .secretValueFromJson('host')
          .toString()}:${dbcredentials
          .secretValueFromJson('port')
          .toString()}/db`,
      },
      logging: ecs.LogDrivers.firelens({
        options: {
          Name: 'datadog',
          provider: 'ecs',
          TLS: 'on',
          dd_message_key: 'log',
          dd_service: `${buildConfig.App}-Migrations`,
          dd_source: 'java',
          dd_tags: `env:${buildConfig.Environment}`,
        },
        secretOptions: {
          apikey: ecs.Secret.fromSsmParameter(
            ssm.StringParameter.fromStringParameterName(
              this,
              'DatadogApiKey',
              '/datadog/APIKEY'
            )
          ),
        },
      }),
    });

// -----------------------------
Synth cloudformation:

  MigrationsTaskDef1F4C395A:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Environment:
            - Name: ENV
              Value: stage
            - Name: FLYWAY_URL
              Value: REDACTED
          Essential: true
          Image:
            Fn::Join:
              - ""
              - - Fn::Select:
                    - 4
                    - Fn::Split:
                        - ":"
                        - Fn::GetAtt:
                            - MigrationsServiceDECCB9A7
                            - Arn
                - .dkr.ecr.
                - Fn::Select:
                    - 3
                    - Fn::Split:
                        - ":"
                        - Fn::GetAtt:
                            - MigrationsServiceDECCB9A7
                            - Arn
                - "."
                - Ref: AWS::URLSuffix
                - /
                - Ref: MigrationsServiceDECCB9A7
                - :latest
          LogConfiguration:
            LogDriver: awsfirelens
            Options:
              Name: datadog
              provider: ecs
              TLS: "on"
              dd_message_key: log
              dd_service: svc-Migrations
              dd_source: java
              dd_tags: env:stage
            SecretOptions:
              - Name: apikey
                ValueFrom: REDACTED-PARAM
          Memory: 512
          Name: app
        - Essential: true
          FirelensConfiguration:
            Type: fluentbit
          Image:
            Ref: SsmParameterValueawsserviceawsforfluentbitlatestC96584B6F00A464EAD1953AFF4B05118Parameter
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group:
                Ref: MigrationsTaskDeflogrouterLogGroupAFD29B37
              awslogs-stream-prefix: firelens
              awslogs-region: us-east-1
          MemoryReservation: 50
          Name: log-router
      Cpu: "256"
      ExecutionRoleArn:
        Fn::GetAtt:
          - TaskExecRoleA2BBB60C
          - Arn
      Family: REDACTED
      Memory: "512"
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Tags:
        - Key: app
          Value: appname
        - Key: environment
          Value: stage
        - Key: stack
          Value: stackname
      TaskRoleArn:
        Fn::GetAtt:
          - TaskRole30FC0FBB
          - Arn
    Metadata:
      aws:cdk:path: stackname/MigrationsTaskDef/Resource

@peterwoodworth peterwoodworth removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Sep 7, 2021
@peterwoodworth
Copy link
Contributor

peterwoodworth commented Sep 7, 2021

This is where the image is set:

private renderContainers() {
// add firelens log router container if any application container is using firelens log driver,
// also check if already created log router container
for (const container of this.containers) {
if (container.logDriverConfig && container.logDriverConfig.logDriver === 'awsfirelens'
&& !this.containers.find(x => x instanceof FirelensLogRouter)) {
this.addFirelensLogRouter('log-router', {
image: obtainDefaultFluentBitECRImage(this, container.logDriverConfig),
firelensConfig: {
type: FirelensLogRouterType.FLUENTBIT,
},
logging: new AwsLogDriver({ streamPrefix: 'firelens' }),
memoryReservationMiB: 50,
});
break;
}
}

The obtainDefaultFluentBitECRImage function used for the image here will return this image:

const fluentBitImageTag = imageTag || 'latest';
const fluentBitImage = `${fluentBitImageSSMPath}/${fluentBitImageTag}`;
// Not use ContainerImage.fromEcrRepository since it's not support parsing ECR repo URI,
// use repo ARN might result in complex Fn:: functions in cloudformation template.
return ContainerImage.fromRegistry(ssm.StringParameter.valueForStringParameter(task, fluentBitImage));

fluentBitImageTag will always be latest in this case, because imageTag isn't passed in as an argument to obtainDefaultFluentBitECRImage(...). So there's no direct way to modify this by the user before the CDK creates the CloudFormation output with the latest image.

Unfortunately, the use of escape hatches won't work here because CfnTaskDefinition.containerDefinitions is a lazy value. I'll have to research to see if there's a workaround for this.

@skinny85
Copy link
Contributor

skinny85 commented Sep 7, 2021

Hey @ayozemr ,

I think you can use the public addFirelensLogRouter() method of TaskDefinition to supply the FireLens router, using any image you want. Something like:

        migrationTaskDef.addFirelensLogRouter('log-router', {
            image: your-image-here,
            firelensConfig: {
                type: FirelensLogRouterType.FLUENTBIT,
            },
            logging: new AwsLogDriver({ streamPrefix: 'firelens' }),
            memoryReservationMiB: 50,
        });

Thanks,
Adam

@skinny85
Copy link
Contributor

skinny85 commented Sep 8, 2021

BTW, I think this should morph into a feature request to be able to more easily customize the image for the firelens() method.

@ayozemr
Copy link
Author

ayozemr commented Sep 8, 2021

Thanks @peterwoodworth for your time.

@skinny85 I have tried that but ended having problem when pulling the non latest image 🤷 . Had to use fromRepositoryArn because fromRepositoryName was adding my account id by default, ending in an incorrect image uri.

Its weird but:

  • This pulls OK: 906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:latest
  • This is 403 forbidden: 906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.19.0

Reading official repo seems there is a public image, but looking at its URI I dont know how I can build its ARN to be able to use:
https://github.com/aws/aws-for-fluent-bit -> public.ecr.aws/aws-observability/aws-for-fluent-bit:

So your solution works to build using another image, but I had no luck with the pull process in fargate...

Code used:

    const awsFluentRepo = ecr.Repository.fromRepositoryArn(
      this,
      'AwsFluentRepo',
      'arn:aws:ecr:us-east-1:906394416424:/aws-for-fluent-bit'
      // '906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.19.0'
    );

    migrationTaskDef.addFirelensLogRouter('log-router', {
      image: ecs.ContainerImage.fromEcrRepository(awsFluentRepo, '2.19.0'),
      firelensConfig: {
        type: ecs.FirelensLogRouterType.FLUENTBIT,
      },
      logging: new ecs.AwsLogDriver({ streamPrefix: 'firelens' }),
      memoryReservationMiB: 50,
    });

Something to override firelens image would be useful, especially in cases like this related to a bug. Maybe also something to use a container image using its URI instead of arn, would let me use that public ecr image

Thanks for your time all!!

@skinny85
Copy link
Contributor

skinny85 commented Sep 8, 2021

@ayozemr I think what you're looking for here is the ContainerImage.fromRegistry() method.

So something like:

    migrationTaskDef.addFirelensLogRouter('log-router', {
      image: ecs.ContainerImage.fromRegistry('public.ecr.aws/aws-observability/aws-for-fluent-bit:2.19.0'),
      // ...

@ayozemr
Copy link
Author

ayozemr commented Sep 9, 2021

Oh yes! That did it 👏 Much appreciated!! Thanks for your time!

About changing the issue to feature request, I don't know how to do it. I can close this one and create another referencing this if its ok for you

@blimmer
Copy link
Contributor

blimmer commented Sep 9, 2021

I just ran into this problem and opened #16439 to default to the stable image instead of latest.

@peterwoodworth peterwoodworth added feature-request A feature should be added or improved. feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. and removed guidance Question that needs advice or information. labels Sep 9, 2021
@peterwoodworth peterwoodworth changed the title (ecs): Its possible to fix firelens container image version? (ecs): default to stable fluentbit image instead of latest Sep 9, 2021
@peterwoodworth
Copy link
Contributor

peterwoodworth commented Sep 9, 2021

Great, I've converted this into a feature request. Changing the image to default to stable seems like a fine solution to me

@madeline-k
Copy link
Contributor

Closing this issue, since the upstream issue with fluentbit has been resolved. And we are no longer planning to implement this change in the default.

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@tessro
Copy link

tessro commented Dec 7, 2022

I know this is closed as wontfix, but I figured I'd post anyway since the AWS for Fluent Bit image broke again: aws/aws-for-fluent-bit#491

This caused an outage for us that would have been avoided by using stable.

Is using bleeding-edge Fluent Bit, at increased risk of instability, really worth it? stable only tracks behind by 14 days, is this really a breaking change that needs to be avoided? It would probably help a lot of people like me who had to learn way more than they ever wanted about the internals of Fluent Bit images in order to address this issue today.

@blimmer
Copy link
Contributor

blimmer commented Dec 8, 2022

Per the comment visibility warning, cc @madeline-k and @peterwoodworth RE: @paulrosania's comment above.

My PR was closed because changing from latest to stable would technically be a breaking change. However, since it broke again, I think it's worth making the breaking change, even behind a feature flag to make people's lives easier. Having your ECS tasks fail to start up twice in two years because of problems with latest seems unacceptable IMO.

@peterwoodworth peterwoodworth reopened this Dec 9, 2022
@peterwoodworth peterwoodworth added breaking-change This issue requires a breaking change to remediate. p1 effort/small Small work item – less than a day of effort and removed feature/enhancement A new API to make things easier or more intuitive. A catch-all for general feature requests. labels Dec 9, 2022
@MrArnoldPalmer
Copy link
Contributor

We can possibly implement this behind a feature flag to avoid the breakage. Should be a fairly simple contribution if someone wants to take a crack at it.

@MrArnoldPalmer MrArnoldPalmer added p2 and removed p1 labels Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container breaking-change This issue requires a breaking change to remediate. effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants