Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-logs: race condition with AWS Lambda default log groups #32689

Open
1 task
garysassano opened this issue Dec 30, 2024 · 5 comments
Open
1 task

aws-logs: race condition with AWS Lambda default log groups #32689

garysassano opened this issue Dec 30, 2024 · 5 comments
Labels
@aws-cdk/aws-logs Related to Amazon CloudWatch Logs bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@garysassano
Copy link

Describe the bug

my-stack.ts

const lambdaConfigs = [
  { lambda: backendLambda, name: "Backend" },
  { lambda: frontendLambda, name: "Frontend" },
  { lambda: clientNodeLambda, name: "ClientNode" },
  { lambda: clientPythonLambda, name: "ClientPython" },
  { lambda: clientRustLambda, name: "ClientRust" },
];

lambdaConfigs.forEach(({ lambda, name }) => {
  const logGroup = new LogGroup(this, `${name}LambdaLogGroup`, {
    logGroupName: `/aws/lambda/${lambda.functionName}`,
    retention: RetentionDays.ONE_WEEK,
    removalPolicy: RemovalPolicy.DESTROY,
  });
});

In this scenario, all client functions are triggered by the EventBridge Scheduler every minute. During stack destruction via cdk destroy, there is a possibility that one of the Lambda functions could be invoked simultaneously. This results in the recreation of the default log group at /aws/lambda/{function-name}, even if it had just been deleted a few seconds earlier.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

CDK should guarantee that AWS Lambda log groups are always deleted last to avoid race conditions.

Current Behavior

image

image

One of the Log Groups seemed to survive the stack destruction despite having RemovalPolicy.DESTROY set. However, upon closer inspection, you can notice the retention period is set to Never expire, whereas in our stack we defined RetentionDays.ONE_WEEK. And that's because while the CDK was destroying the log group defined in our stack, the Lambda function got invoked, thus creating a new log group not managed by the CDK and with the default retention policy, which is Never expire.

Reproduction Steps

see above.

Possible Solution

One possible workaround is to create the log groups first, and assign them to each Lambda function directly at creation time.

const logGroupConfigs = [
  { name: "Backend", functionName: "backend-lambda" },
  { name: "Frontend", functionName: "frontend-lambda" },
  { name: "ClientNode", functionName: "client-node-lambda" },
  { name: "ClientPython", functionName: "client-python-lambda" },
  { name: "ClientRust", functionName: "client-rust-lambda" },
];

const logGroups = Object.fromEntries(
  logGroupConfigs.map(({ name, functionName }) => [
    name,
    new LogGroup(this, `${name}LambdaLogGroup`, {
      logGroupName: `/aws/lambda/${functionName}`,
      retention: RetentionDays.ONE_WEEK,
      removalPolicy: RemovalPolicy.DESTROY,
    }),
  ]),
);

const clientNodeLambda = new NodejsFunction(this, "ClientNodeLambda", {
  entry: join(__dirname, "..", "functions/client/node", "index.ts"),
  runtime: Runtime.NODEJS_22_X,
  architecture: Architecture.ARM_64,
  memorySize: 1024,
  timeout: Duration.minutes(1),
  loggingFormat: LoggingFormat.JSON,
  logGroup: logGroups.ClientNode,
});

However, this approach introduces additional complexity and makes the code less clean.

A more convenient solution would be to introduce a Function.addLogGroup() method, enabling users to link a log group to a Lambda function as a dependency within the same AWS CDK stack but in a different section of the code.

feature.ts

// Define the lambdas and their configurations
const lambdaConfigs = [
  { lambda: backendLambda, name: "Backend" },
  { lambda: frontendLambda, name: "Frontend" },
  { lambda: clientNodeLambda, name: "ClientNode" },
  { lambda: clientPythonLambda, name: "ClientPython" },
  { lambda: clientRustLambda, name: "ClientRust" },
];

// Create forwarder subscription for each lambda
lambdaConfigs.forEach(({ lambda, name }) => {
  const logGroup = lambda.addLogGroup(
      new LogGroup(this, `${name}LambdaLogGroup`, {
      logGroupName: `/aws/lambda/${lambda.functionName}`,
      retention: RetentionDays.ONE_WEEK,
      removalPolicy: RemovalPolicy.DESTROY,
    }),
  );

  new SubscriptionFilter(this, `${name}LambdaSubscription`, {
    logGroup,
    destination: new LambdaDestination(forwarderLambda),
    filterPattern: FilterPattern.literal("{ $.__otel_otlp_stdout = * }"),
  });
});

Additional Information/Context

No response

CDK CLI Version

2.173.4

Framework Version

No response

Node.js Version

22.12.0

OS

Ubuntu 24.04.1

Language

TypeScript

Language Version

No response

Other information

No response

@garysassano garysassano added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 30, 2024
@github-actions github-actions bot added the @aws-cdk/aws-logs Related to Amazon CloudWatch Logs label Dec 30, 2024
@khushail khushail added needs-reproduction This issue needs reproduction. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 30, 2024
@khushail khushail self-assigned this Dec 30, 2024
@khushail khushail added the p2 label Dec 30, 2024
@khushail
Copy link
Contributor

@garysassano , I think the retentionperiod you are talking about which is set to never expire happens by default when its created by custom resources as mentioned here -

This is implemented using a [CloudFormation custom resource](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html) which pre-creates the log group if it doesn't exist, and sets the specified log retention period (never expire, by default).

@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-reproduction This issue needs reproduction. labels Dec 31, 2024
@garysassano
Copy link
Author

@khushail I'm a bit confused. Where do you see that I'm using the LogRetention construct? When the stack is deployed, all log groups are created with a 1-week retention period as specified in the above stack. However, during stack deletion, it's possible for a log group to be removed while a Lambda function is still being invoked. In such cases, the log group may be deleted and then automatically recreated by the Lambda function. Since this new log group wouldn't be managed by the CDK, it wouldn't inherit the retention settings defined in the stack definition.

@Dreamescaper
Copy link

Something similar to this: aws/serverless-application-model#1216 (comment).
I have a workaround to explicitly disallow lambdas to create log groups:

        myLambda.AddToRolePolicy(new PolicyStatement(new PolicyStatementProps
        {
            Effect = Effect.DENY,
            Actions = ["logs:CreateLogGroup"],
            Resources = ["*"]
        }));

@garysassano
Copy link
Author

@Dreamescaper That's actually a smart solution.

@khushail
Copy link
Contributor

khushail commented Jan 2, 2025

@khushail I'm a bit confused. Where do you see that I'm using the LogRetention construct? When the stack is deployed, all log groups are created with a 1-week retention period as specified in the above stack. However, during stack deletion, it's possible for a log group to be removed while a Lambda function is still being invoked. In such cases, the log group may be deleted and then automatically recreated by the Lambda function. Since this new log group wouldn't be managed by the CDK, it wouldn't inherit the retention settings defined in the stack definition.

Apologies for the confusion here. By retentionPeriod, I referred to this statement of yours,for more clarification around the scenario-

One of the Log Groups seemed to survive the stack destruction despite having RemovalPolicy.DESTROY set. However, upon closer inspection, you can notice **the retention period is set to Never expire**

However I agree in some cases, this might lead to race condition and recreating the log group when its not desired.
Since there is already an alternate suggested by @Dreamescaper (Big thanks !), I would be marking this as P2 which means it would not be immediately addressed by the team, but would be on their radar and contributions are also welcome from the community.

@khushail khushail added effort/small Small work item – less than a day of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 2, 2025
@khushail khushail removed their assignment Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-logs Related to Amazon CloudWatch Logs bug This issue is a bug. effort/small Small work item – less than a day of effort p2
Projects
None yet
Development

No branches or pull requests

3 participants