Serve cache on transient errors (#34)
STS GetCallerIdentity, DescribeSecret, and GetSecretValue requests may
fail because of common network errors such as SdkError::TimeoutError and
server-side errors such as SdkError::ServiceError. This change adds a new
configurable parameter, ignore_transient_errors. When it is enabled, the
agent returns the cached secret on common transient errors like those
above.
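
The fallback described in this message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the agent's implementation: `FetchError`, `StaleOnErrorCache`, and `refresh` are hypothetical names, and the transient/permanent distinction is reduced to a two-variant enum.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for the SDK errors named in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum FetchError {
    /// Timeouts, I/O failures, 5xx responses: worth riding out.
    Transient(String),
    /// Access denied, missing secret, and similar: must be surfaced.
    Permanent(String),
}

/// Hypothetical cache that remembers the last successfully fetched value per secret id.
pub struct StaleOnErrorCache {
    entries: HashMap<String, String>,
    ignore_transient_errors: bool,
}

impl StaleOnErrorCache {
    pub fn new(ignore_transient_errors: bool) -> Self {
        Self {
            entries: HashMap::new(),
            ignore_transient_errors,
        }
    }

    /// Refreshes `id` through `fetch`. On a transient failure, falls back to the
    /// cached value when the flag is set and a cached value exists.
    pub fn refresh<F>(&mut self, id: &str, fetch: F) -> Result<String, FetchError>
    where
        F: FnOnce(&str) -> Result<String, FetchError>,
    {
        match fetch(id) {
            Ok(value) => {
                // A successful refresh replaces whatever was cached before.
                self.entries.insert(id.to_string(), value.clone());
                Ok(value)
            }
            Err(FetchError::Transient(msg)) if self.ignore_transient_errors => {
                // Serve the stale value if there is one; otherwise report the error.
                match self.entries.get(id) {
                    Some(cached) => Ok(cached.clone()),
                    None => Err(FetchError::Transient(msg)),
                }
            }
            // Permanent errors, and transient ones when the flag is off, propagate.
            Err(e) => Err(e),
        }
    }
}
```

In this sketch a transient failure with nothing cached still returns the error, because there is no earlier value to serve.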

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Kai Zhu <kaizuu@amazon.com>
Co-authored-by: Simon Marty <martysi@amazon.com>
Co-authored-by: Simon Marty <simon.marty0@gmail.com>
4 people authored Oct 25, 2024
1 parent c604b4b commit d753a25
Showing 15 changed files with 408 additions and 307 deletions.
300 changes: 113 additions & 187 deletions Cargo.lock

Large diffs are not rendered by default.

20 changes: 12 additions & 8 deletions README.md
@@ -139,7 +139,8 @@ Based on the type of compute, you have several options for installing the Secret

**To install the Secrets Manager Agent**

-1. Use the `install` script provided in the repository\.
+1. `cd aws_secretsmanager_agent/configuration`
+1. Run the `install` script provided in the repository\.

The script generates a random SSRF token on startup and stores it in the file `/var/run/awssmatoken`\. The token is readable by the `awssmatokenreader` group that the install script creates\.

@@ -178,10 +179,13 @@ You can run the Secrets Manager Agent as a sidecar container alongside your appl
1. Create a Dockerfile for your client application\.

1. Create a Docker Compose file to run both containers, being sure that they use the same network interface\. This is necessary because the Secrets Manager Agent does not accept requests from outside the localhost interface\. The following example shows a Docker Compose file where the `network_mode` key attaches the `secrets-manager-agent` container to the network namespace of the `client-application` container, which allows them to share the same network interface\.
-**Important**
-You must load AWS credentials and the SSRF token for the application to be able to use the Secrets Manager Agent\. See the following:
-[Manage access](https://docs.aws.amazon.com/eks/latest/userguide/cluster-auth.html) in the *Amazon Elastic Kubernetes Service User Guide*
-[Amazon ECS task IAM role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html) in the *Amazon Elastic Container Service Developer Guide*
+
+**Important**
+
+You must load AWS credentials and the SSRF token for the application to be able to use the Secrets Manager Agent\. For EKS and ECS, see the following:
+* [Manage access](https://docs.aws.amazon.com/eks/latest/userguide/cluster-auth.html) in the *Amazon Elastic Kubernetes Service User Guide*
+* [Amazon ECS task IAM role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html) in the *Amazon Elastic Container Service Developer Guide*
+

```yaml
version: '3'
@@ -223,7 +227,7 @@ The following instructions show how to get a secret named *MyTest* by using the

**To create a Lambda extension that packages the Secrets Manager Agent**

-1. Create a Python Lambda function that queries `http://localhost:2773/secretsmanager/get?secretId=MyTest` to get the secret\. Be sure to implement retry logic in your application code to accommodate delays in initialization and registration of the Lambda extension\.
+1. Create a Python Lambda function that reads the SSRF token from environment variable `AWS_TOKEN`, and queries `http://localhost:2773/secretsmanager/get?secretId=MyTest` to get the secret. Be sure to specify environment variable `AWS_TOKEN` for your lambda, and additionally, implement retry logic in your application code to accommodate delays in initialization and registration of the Lambda extension\.

1. From the root of the Secrets Manager Agent code package, run the following commands to test the Lambda extension\.

@@ -285,7 +289,7 @@ The following curl example shows how to get a secret from the Secrets Manager Ag
```sh
curl -v -H \
"X-Aws-Parameters-Secrets-Token: $(</var/run/awssmatoken)" \
-'http://localhost:2773/secretsmanager/get?secretId=<YOUR_SECRET_ID>}'; \
+'http://localhost:2773/secretsmanager/get?secretId=<YOUR_SECRET_ID>'; \
echo
```

@@ -301,7 +305,7 @@ import json
# Function that fetches the secret from Secrets Manager Agent for the provided secret id.
def get_secret():
# Construct the URL for the GET request
-url = f"http://localhost:2773/secretsmanager/get?secretId=<YOUR_SECRET_ID>}"
+url = f"http://localhost:2773/secretsmanager/get?secretId=<YOUR_SECRET_ID>"

# Get the SSRF token from the token file
with open('/var/run/awssmatoken') as fp:
15 changes: 8 additions & 7 deletions aws_secretsmanager_agent/configuration/install
@@ -2,7 +2,8 @@

PATH=/bin:/usr/bin:/sbin:/usr/sbin # Use a safe path

-AGENTDIR=/opt/aws/secretsmanageragent
+AGENTTARGETDIR=/opt/aws/secretsmanageragent
+AGENTSOURCEDIR=../../target/release
AGENTBIN=aws_secretsmanager_agent
TOKENGROUP=awssmatokenreader
AGENTUSER=awssmauser
@@ -21,18 +22,18 @@ if [ ! -r ${TOKENSCRIPT} ]; then
exit 1
fi

-if [ ! -r ${AGENTBIN} ]; then
+if [ ! -r ${AGENTSOURCEDIR}/${AGENTBIN} ]; then
echo "Can not read ${AGENTBIN}" >&2
exit 1
fi

groupadd -f ${TOKENGROUP}
-useradd -r -m -g ${TOKENGROUP} -d ${AGENTDIR} ${AGENTUSER} || true
-chmod 755 ${AGENTDIR}
+useradd -r -m -g ${TOKENGROUP} -d ${AGENTTARGETDIR} ${AGENTUSER} || true
+chmod 755 ${AGENTTARGETDIR}

-install -D -T -m 755 ${AGENTBIN} ${AGENTDIR}/bin/${AGENTBIN}
-install -D -T -m 755 ${TOKENSCRIPT} ${AGENTDIR}/bin/${TOKENSCRIPT}
-chown -R ${AGENTUSER} ${AGENTDIR}
+install -D -T -m 755 ${AGENTSOURCEDIR}/${AGENTBIN} ${AGENTTARGETDIR}/bin/${AGENTBIN}
+install -D -T -m 755 ${TOKENSCRIPT} ${AGENTTARGETDIR}/bin/${TOKENSCRIPT}
+chown -R ${AGENTUSER} ${AGENTTARGETDIR}
install -T -m 755 ${TOKENSCRIPT}.service ${SYSTEMDFILES}/${TOKENSCRIPT}.service
install -T -m 755 ${AGENTSCRIPT}.service ${SYSTEMDFILES}/${AGENTSCRIPT}.service

1 change: 1 addition & 0 deletions aws_secretsmanager_agent/src/cache_manager.rs
@@ -33,6 +33,7 @@ impl CacheManager {
asm_client(cfg).await?,
cfg.cache_size(),
cfg.ttl(),
+cfg.ignore_transient_errors(),
)?))
}

23 changes: 21 additions & 2 deletions aws_secretsmanager_agent/src/config.rs
@@ -22,6 +22,7 @@ const DEFAULT_SSRF_ENV_VARIABLES: [&str; 3] = [
"AWS_CONTAINER_AUTHORIZATION_TOKEN",
];
const DEFAULT_PATH_PREFIX: &str = "/v1/";
+const DEFAULT_IGNORE_TRANSIENT_ERRORS: bool = true;

const DEFAULT_REGION: Option<String> = None;

@@ -39,6 +40,7 @@ struct ConfigFile {
path_prefix: String,
max_conn: String,
region: Option<String>,
+ignore_transient_errors: bool,
}

/// The log levels supported by the daemon.
@@ -97,6 +99,9 @@ pub struct Config {

/// The AWS Region that will be used to send the Secrets Manager request to.
region: Option<String>,

+/// Whether the agent should serve cached data on transient refresh errors
+ignore_transient_errors: bool,
}

/// The default configuration options.
@@ -138,7 +143,8 @@ impl Config {
)?
.set_default("path_prefix", DEFAULT_PATH_PREFIX)?
.set_default("max_conn", DEFAULT_MAX_CONNECTIONS)?
-.set_default("region", DEFAULT_REGION)?;
+.set_default("region", DEFAULT_REGION)?
+.set_default("ignore_transient_errors", DEFAULT_IGNORE_TRANSIENT_ERRORS)?;

// Merge the config overrides onto the default configurations, if provided.
config = match file_path {
@@ -232,6 +238,15 @@ impl Config {
self.region.as_ref()
}

+/// Whether the client should serve cached data on transient refresh errors
+///
+/// # Returns
+///
+/// * `ignore_transient_errors` - Whether the client should serve cached data on transient refresh errors. Defaults to "true"
+pub fn ignore_transient_errors(&self) -> bool {
+self.ignore_transient_errors
+}

/// Private helper that fills in the Config instance from the specified
/// config overrides (or defaults).
///
@@ -279,6 +294,7 @@ impl Config {
None,
)?,
region: config_file.region,
+ignore_transient_errors: config_file.ignore_transient_errors,
};

// Additional validations.
@@ -349,7 +365,7 @@ mod tests {
use super::*;
use std::collections::HashMap;

-/// Test helper function that returns the a ConfigFile with default values.
+/// Test helper function that returns a ConfigFile with default values.
fn get_default_config_file() -> ConfigFile {
ConfigFile {
log_level: String::from(DEFAULT_LOG_LEVEL),
@@ -361,6 +377,7 @@ mod tests {
path_prefix: String::from(DEFAULT_PATH_PREFIX),
max_conn: String::from(DEFAULT_MAX_CONNECTIONS),
region: None,
+ignore_transient_errors: DEFAULT_IGNORE_TRANSIENT_ERRORS,
}
}

@@ -386,6 +403,7 @@ mod tests {
assert_eq!(config.clone().path_prefix(), DEFAULT_PATH_PREFIX);
assert_eq!(config.clone().max_conn(), 800);
assert_eq!(config.clone().region(), None);
+assert_eq!(config.ignore_transient_errors(), true);
}

/// Tests the config overrides are applied correctly from the provided config file.
@@ -410,6 +428,7 @@ mod tests {
assert_eq!(config.clone().path_prefix(), "/other");
assert_eq!(config.clone().max_conn(), 10);
assert_eq!(config.clone().region(), Some(&"us-west-2".to_string()));
+assert_eq!(config.ignore_transient_errors(), false);
}

/// Tests that an Err is returned when an invalid value is provided in one of the configurations.
3 changes: 1 addition & 2 deletions aws_secretsmanager_agent/src/main.rs
@@ -79,10 +79,9 @@ fn forever() -> bool {
///
/// # Arguments
///
/// * `addr` - The socket address on which the daemon is listening.
/// * `args` - The command line arguments.
/// * `report` - A call back used to report startup and the listener port.
/// * `end` - A call back used to signal shut down.
///
/// # Returns
///
/// * `Ok(())` - Never retuned when started by the main entry point.
7 changes: 6 additions & 1 deletion aws_secretsmanager_agent/src/utils.rs
@@ -4,6 +4,7 @@ use aws_sdk_secretsmanager::config::interceptors::BeforeTransmitInterceptorConte
use aws_sdk_secretsmanager::config::{ConfigBag, Intercept, RuntimeComponents};
#[cfg(not(test))]
use aws_sdk_secretsmanager::Client as SecretsManagerClient;
+use aws_secretsmanager_caching::error::is_transient_error;
use std::env::VarError;
use std::fs;
use std::time::Duration;
@@ -136,7 +137,11 @@ pub async fn validate_and_create_asm_client(

// Validate the region and credentials first
let sts_client = aws_sdk_sts::Client::from_conf(sts_builder.build());
-let _ = sts_client.get_caller_identity().send().await?;
+match sts_client.get_caller_identity().send().await {
+Ok(_) => (),
+Err(e) if config.ignore_transient_errors() && is_transient_error(&e) => (),
+Err(e) => Err(e)?,
+};

Ok(aws_sdk_secretsmanager::Client::from_conf(
asm_builder.build(),
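
The three-arm `match` in this hunk changes what happens at startup: with the flag on, a transient failure of the STS probe no longer aborts client creation. A minimal sketch of that control flow follows, assuming a hypothetical `ProbeError` enum and helper names (`is_transient_probe_error`, `validate_startup`) in place of the real SDK types.

```rust
/// Hypothetical stand-in for the error returned by the startup credential probe.
#[derive(Debug, PartialEq)]
pub enum ProbeError {
    Timeout,
    AccessDenied,
}

/// Hypothetical classifier: only the timeout counts as transient in this sketch.
pub fn is_transient_probe_error(e: &ProbeError) -> bool {
    matches!(e, ProbeError::Timeout)
}

/// Mirrors the shape of the match in the hunk: success passes, a transient
/// failure is swallowed only when the flag is set, everything else propagates.
pub fn validate_startup(
    probe: Result<(), ProbeError>,
    ignore_transient_errors: bool,
) -> Result<(), ProbeError> {
    match probe {
        Ok(()) => Ok(()),
        // The guard runs left to right, so the classifier is consulted only
        // when the flag is enabled.
        Err(e) if ignore_transient_errors && is_transient_probe_error(&e) => Ok(()),
        Err(e) => Err(e),
    }
}
```

One consequence worth noting in review: when the probe is skipped this way, region and credential problems hidden behind a timeout would surface later, on the first secret request, rather than at startup.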
@@ -6,8 +6,8 @@ ssrf_env_variables = ["MY_TOKEN"]
path_prefix = "/other"
# checking that number with no quotes work.
ttl_seconds = 300
-# checking that numbe with single quote works
+# checking that number with single quote works
cache_size = '1000'
max_conn = 10
region = "us-west-2"

+ignore_transient_errors = false
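
The test fixture above overrides a setting whose default is `true`. The default-plus-override resolution can be sketched with the standard library alone. The agent itself uses a configuration library for this, so the parser below is purely illustrative: `ignore_transient_errors_from` is a hypothetical helper that understands only this one key with a bare `true` or `false` value.

```rust
/// Default taken from the diff of config.rs.
const DEFAULT_IGNORE_TRANSIENT_ERRORS: bool = true;

/// Hypothetical helper: scans TOML-like text for `ignore_transient_errors = <bool>`
/// and falls back to the default when the key is absent.
pub fn ignore_transient_errors_from(config_text: &str) -> Result<bool, String> {
    for line in config_text.lines() {
        // Drop a trailing comment. This naive split would also cut a '#' inside
        // a quoted string, which is acceptable here because only a bare boolean
        // value is ever parsed.
        let line = line.split('#').next().unwrap_or("").trim();
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "ignore_transient_errors" {
                return value
                    .trim()
                    .parse::<bool>()
                    .map_err(|e| format!("invalid ignore_transient_errors value: {}", e));
            }
        }
    }
    // Key not present: keep serving the cache on transient errors by default.
    Ok(DEFAULT_IGNORE_TRANSIENT_ERRORS)
}
```

Operators who prefer failing fast over serving possibly stale secrets would set the key to `false`, as the fixture does.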
3 changes: 2 additions & 1 deletion aws_secretsmanager_caching/Cargo.toml
@@ -10,6 +10,7 @@ readme = "README.md"

[dependencies]
aws-sdk-secretsmanager = "1"
+aws-smithy-runtime-api = "1"
aws-smithy-types = "1"
serde_json = "1"
serde_with = "3"
@@ -21,7 +22,7 @@ aws-config = "1"

[dev-dependencies]
aws-smithy-mocks-experimental = "0"
-aws-smithy-runtime = { version = "1", features = ["test-util"] }
+aws-smithy-runtime = { version = "1", features = ["test-util", "wire-mock"] }
aws-sdk-secretsmanager = { version = "1", features = ["test-util"] }
tokio = { version = "1", features = ["macros", "rt", "sync", "test-util"] }
http = "0"
1 change: 1 addition & 0 deletions aws_secretsmanager_caching/README.md
@@ -72,6 +72,7 @@ let client = match SecretsManagerCachingClient::from_builder(
asm_builder,
NonZeroUsize::new(1000).unwrap(),
Duration::from_secs(300),
+false
)
.await
{
22 changes: 22 additions & 0 deletions aws_secretsmanager_caching/src/error.rs
@@ -0,0 +1,22 @@
+use aws_smithy_runtime_api::client::{orchestrator::HttpResponse, result::SdkError};
+
+/// Helper function to determine transient errors. Transient errors include any timeout error,
+/// unparseable response error, dispatch error due to timeout or IO, and 5xx server-side error.
+///
+/// # Arguments
+/// * `e` - An SDK error
+///
+/// # Returns
+/// * true if transient error, false if not
+pub fn is_transient_error<S>(e: &SdkError<S, HttpResponse>) -> bool
+where
+    S: std::error::Error + 'static,
+{
+    match e {
+        SdkError::TimeoutError(_) => true,
+        SdkError::ResponseError(_) => true,
+        SdkError::DispatchFailure(derr) if derr.is_timeout() || derr.is_io() => true,
+        SdkError::ServiceError(serr) if serr.raw().status().is_server_error() => true,
+        _ => false,
+    }
+}
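
The classification rules in `is_transient_error` can be exercised without the SDK by modelling the error as a plain enum. The sketch below is an assumption-laden stand-in: `SketchError` and `is_transient_sketch` are hypothetical, the variants only loosely mirror the `SdkError` variants matched above, and the dispatch flags and HTTP status are carried as plain fields rather than read from real SDK types.

```rust
/// Simplified, hypothetical stand-in for the SDK error. NOT the real `SdkError`.
#[derive(Debug)]
pub enum SketchError {
    /// The request timed out.
    Timeout,
    /// A response arrived but could not be parsed.
    Response,
    /// The request could not be dispatched; the flags record why.
    Dispatch { timeout: bool, io: bool },
    /// The service answered with an error and this HTTP status.
    Service { status: u16 },
    /// The request could not even be constructed.
    Construction,
}

/// Applies the same rules as the doc comment on `is_transient_error`: timeouts,
/// unparseable responses, dispatch failures caused by timeout or I/O, and 5xx
/// service errors are transient; everything else is not.
pub fn is_transient_sketch(e: &SketchError) -> bool {
    match e {
        SketchError::Timeout | SketchError::Response => true,
        SketchError::Dispatch { timeout, io } => *timeout || *io,
        SketchError::Service { status } => (500..600).contains(status),
        SketchError::Construction => false,
    }
}
```

Under these rules a 4xx service error such as an access-denied response is never treated as transient, so it always reaches the caller.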