[agroal#219] Provide Prometheus metrics cache
This commit refactors how the Prometheus metrics are served so that
responses can be cached for a configurable amount of time.

Two new configuration settings have been introduced:
- `metrics_cache_max_age`, expressed in seconds, defines how long
a cached response can be served before a new set of metrics has to
be computed. This parameter is required to enable the caching
machinery.
- `metrics_cache_max_size` prevents caching of responses larger than
the given value, in bytes.
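A minimal sketch of how the age setting could gate the cache: caching is enabled only for a non-zero `metrics_cache_max_age`, and a cached response is served while the current time has not passed its expiry stamp. The helpers `cache_enabled()`, `cache_expiry()` and `cache_is_valid()` are hypothetical, and treating a zero `valid_until` as "not valid" is an assumption of this sketch, not necessarily what pgagroal does.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Same value as in prometheus.h: 0 seconds disables the cache. */
#define PGAGROAL_PROMETHEUS_CACHE_DISABLED 0

/* Hypothetical: caching is on only when metrics_cache_max_age is non-zero. */
static bool
cache_enabled(unsigned int max_age)
{
   return max_age != PGAGROAL_PROMETHEUS_CACHE_DISABLED;
}

/* Hypothetical: expiry stamp to store once a fresh response is fully cached. */
static time_t
cache_expiry(unsigned int max_age, time_t now)
{
   assert(cache_enabled(max_age));
   return now + (time_t) max_age;
}

/* Hypothetical: a cached response may be served until `now` passes
 * `valid_until`; valid_until == 0 means "never filled or invalidated"
 * (an assumption of this sketch). */
static bool
cache_is_valid(unsigned int max_age, time_t valid_until, time_t now)
{
   if (!cache_enabled(max_age) || valid_until == 0)
   {
      return false;
   }
   return now <= valid_until;
}
```

With `metrics_cache_max_age = 30`, a response cached at time 1000 is served up to and including time 1030.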

In any case, a cached response is limited to 1 MB by a hard-coded
value (`PROMETHEUS_MAX_CACHE_SIZE`).

If a response overflows the cache size limit (at most 1 MB), the cache
is immediately marked as invalid, that is, the system does not serve
the cached result. However, as soon as a new Prometheus request
arrives, the cache is refilled from scratch, so if the new response is
smaller than 1 MB and smaller than the optional
`metrics_cache_max_size`, it is cached.
Whenever the cache overflows, the user is warned. Because of the way
the cache is populated, the user could see more than one such message
in the logs (at DEBUG level).
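The overflow rule can be sketched as follows, assuming a hypothetical `struct sketch_cache` whose `data` buffer of `size` bytes holds a NUL-terminated payload. The `overflowed` flag is an addition of this sketch, not a field of `struct prometheus_cache`: it stops later, smaller chunks of the same refill from leaving a truncated payload behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for struct prometheus_cache. */
struct sketch_cache
{
   time_t valid_until; /* 0 = not valid */
   bool overflowed;    /* sketch-only: the current refill did not fit */
   size_t size;        /* capacity of data, in bytes */
   char* data;         /* NUL-terminated payload */
};

/* Zero-fill the payload and drop the expiry stamp. */
static void
sketch_cache_invalidate(struct sketch_cache* cache)
{
   memset(cache->data, 0, cache->size);
   cache->valid_until = 0;
}

/* Start a new refill from scratch (e.g. on the next Prometheus request). */
static void
sketch_cache_reset(struct sketch_cache* cache)
{
   sketch_cache_invalidate(cache);
   cache->overflowed = false;
}

/* Append `chunk` if it fits together with the terminating NUL;
 * otherwise invalidate the cache and refuse the rest of this refill. */
static bool
sketch_cache_append(struct sketch_cache* cache, const char* chunk)
{
   size_t used;
   size_t extra;

   assert(cache != NULL && cache->data != NULL && cache->size > 0);

   if (cache->overflowed)
   {
      return false;
   }

   used = strlen(cache->data);
   extra = strlen(chunk);

   if (used + extra + 1 > cache->size)
   {
      /* the commit describes warning the user at this point */
      sketch_cache_invalidate(cache);
      cache->overflowed = true;
      return false;
   }

   memcpy(cache->data + used, chunk, extra + 1);
   return true;
}
```

After an overflow, nothing is served from the cache until `sketch_cache_reset()` starts a new refill that fits.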

The `struct prometheus` has been extended with an inner struct,
`struct prometheus_cache`, which holds the cache payload, a lock (for
concurrency), the size of the payload, and the timestamp until which
the cache is valid.

New utility functions have been added:
- `metrics_cache_append(char* data)` appends the data to the cache
when it is safe to do so, that is, when the cache would not overflow.
If appending the data would overflow the cache, the cache is
invalidated.
- `metrics_cache_finalize(void)` sets the timestamp until which the
cache is valid.
- `metrics_cache_invalidate(void)` marks the cache as invalid by
zero-filling the data. It is used, for example, when an overflow is
detected.
- `metrics_cache_size_to_alloc(void)` computes the size, in bytes,
required for the cache payload allocation. The size is the default,
or `metrics_cache_max_size` if configured, and in any case no more
than the maximum size of 1 MB.
- `metrics_cache_alloc(struct prometheus* prometheus)` allocates the
memory for the cache and sets the initial values. This has to be
called when the `struct prometheus` is allocated and initialized.
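A sketch of the size computation described for `metrics_cache_size_to_alloc()`, using the two constants from `prometheus.h`. Unlike the real function, this hypothetical version takes the configured value as a parameter instead of reading the configuration, treats 0 as "not configured" (an assumption), and uses `calloc()` as a stand-in for the shared memory pgagroal allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Values copied from prometheus.h in this commit. */
#define PROMETHEUS_MAX_CACHE_SIZE     1048576 /* hard cap: 1 MB */
#define PROMETHEUS_DEFAULT_CACHE_SIZE 262144  /* used when nothing is configured */

/* Hypothetical: configured size (0 = not configured), capped at 1 MB. */
static size_t
sketch_cache_size_to_alloc(unsigned int configured_max_size)
{
   size_t size = PROMETHEUS_DEFAULT_CACHE_SIZE;

   if (configured_max_size > 0)
   {
      size = configured_max_size;
   }

   if (size > PROMETHEUS_MAX_CACHE_SIZE)
   {
      size = PROMETHEUS_MAX_CACHE_SIZE;
   }

   return size;
}

/* Hypothetical: allocate a zero-filled payload; calloc stands in for shared memory. */
static char*
sketch_cache_alloc(unsigned int configured_max_size, size_t* out_size)
{
   size_t size;
   char* data;

   assert(out_size != NULL);

   size = sketch_cache_size_to_alloc(configured_max_size);
   data = calloc(size, 1);
   *out_size = (data != NULL) ? size : 0;

   return data;
}
```

Zero-filling at allocation time means the payload starts out as an empty, NUL-terminated string.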

The `metrics_page()` method has been refactored to check whether the
cache is valid; if so, the response is served directly from the cache.
If the cache is not valid, the function acquires the cache lock and
invokes the append function several times (through inner method
calls). Finally, it finalizes the cache and releases the lock.
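The serving flow can be sketched like this, assuming the response is produced as an array of text chunks and that writing into `out` stands in for sending to the client socket. The spin lock on an `atomic_schar` mirrors the `lock` field of `struct prometheus_cache`; the lock constants and all `sketch_*` names are hypothetical. As a simplification, this sketch also checks validity while holding the lock, which avoids an unsynchronized read of `valid_until`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SKETCH_LOCK_FREE 0
#define SKETCH_LOCK_HELD 1

/* Hypothetical stand-in for struct prometheus_cache. */
struct sketch_cache
{
   time_t valid_until; /* 0 = not valid */
   atomic_schar lock;  /* SKETCH_LOCK_FREE or SKETCH_LOCK_HELD */
   size_t size;        /* capacity of data, in bytes */
   char* data;         /* NUL-terminated payload */
};

static void
sketch_lock(atomic_schar* lock)
{
   signed char expected = SKETCH_LOCK_FREE;

   while (!atomic_compare_exchange_weak(lock, &expected, SKETCH_LOCK_HELD))
   {
      expected = SKETCH_LOCK_FREE; /* a failed CAS stored the current value */
   }
}

static void
sketch_unlock(atomic_schar* lock)
{
   atomic_store(lock, SKETCH_LOCK_FREE);
}

/* Write the response into `out` (stand-in for the client socket).
 * Returns true on a cache hit, false when the response was rebuilt. */
static bool
sketch_metrics_page(struct sketch_cache* cache, unsigned int max_age, time_t now,
                    const char* const* chunks, size_t n_chunks,
                    char* out, size_t out_len)
{
   bool hit;
   bool fits = true;
   size_t used = 0;

   assert(out != NULL && out_len > 0);
   out[0] = '\0';

   sketch_lock(&cache->lock);

   hit = max_age > 0 && cache->valid_until != 0 && now <= cache->valid_until;
   if (hit)
   {
      snprintf(out, out_len, "%s", cache->data); /* serve straight from the cache */
   }
   else
   {
      memset(cache->data, 0, cache->size);       /* refill from scratch */
      cache->valid_until = 0;

      for (size_t i = 0; i < n_chunks; i++)
      {
         size_t len = strlen(chunks[i]);

         strncat(out, chunks[i], out_len - strlen(out) - 1); /* "send" the chunk */

         if (fits && used + len + 1 <= cache->size)
         {
            memcpy(cache->data + used, chunks[i], len + 1); /* append */
            used += len;
         }
         else if (fits)
         {
            fits = false;                         /* overflow: invalidate */
            memset(cache->data, 0, cache->size);
         }
      }

      if (fits && max_age > 0)
      {
         cache->valid_until = now + (time_t) max_age; /* finalize */
      }
   }

   sketch_unlock(&cache->lock);
   return hit;
}
```

Every chunk still reaches the client when the cache overflows; only the cached copy is dropped, so the next request triggers a fresh refill.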

If the optional configuration parameter `metrics_cache_max_size`
changes on reload, the system does not reallocate memory; rather, it
issues a warning about the need for a restart.

The documentation has been updated to reflect changes.

Close agroal#219

Modified the prometheus_cache structure to handle dynamic data.

Use dynamic cache content.

Not yet protected by the lock.

Shared memory for the cache.

Cache locking

Uncrustify
fluca1978 committed Jun 8, 2022
1 parent 5d14217 commit ad85093
Showing 5 changed files with 405 additions and 36 deletions.
2 changes: 2 additions & 0 deletions doc/CONFIGURATION.md
@@ -39,6 +39,8 @@ See a more complete [sample](./etc/pgagroal.conf) configuration for running `pga
| port | | Int | Yes | The bind port for pgagroal |
| unix_socket_dir | | String | Yes | The Unix Domain Socket location |
| metrics | 0 | Int | No | The metrics port (disable = 0) |
| metrics_cache_max_age | 0 | String | No | The number of seconds to cache the metrics response. Can be a string including a modifier suffix, e.g., `2M` for two minutes. A value of zero disables caching. |
| metrics_cache_max_size | 262144 | String | No | An optional value, in bytes, to control the metrics cache. If a single response is greater than this value, the cache will be temporarily disabled. Changing this parameter will change the way the cache memory is allocated. The maximum value is 1 Megabyte |
| management | 0 | Int | No | The remote management port (disable = 0) |
| log_type | console | String | No | The logging type (console, file, syslog) |
| log_level | info | String | No | The logging level (fatal, error, warn, info, debug, debug1 thru debug5). Debug levels greater than 5 will be set to `debug5`. Unrecognized values will set the log_level to `info` |
27 changes: 26 additions & 1 deletion src/include/pgagroal.h
@@ -228,6 +228,26 @@ struct prometheus_connection
atomic_ullong query_count; /**< The number of queries per connection */
} __attribute__ ((aligned (64)));

/**
* A structure to handle the Prometheus response
* so that it is possible to serve the very same
* response over and over depending on the cache
* settings.
*
* The `valid_until` field stores the result
* of `time(2)`.
*
* The cache is protected by the `lock` field.
*/
struct prometheus_cache
{

time_t valid_until; /**< when the cache will become invalid */
atomic_schar lock; /**< lock to protect the cache */
size_t size; /**< size of the cache */
char* data; /**< the payload */
} __attribute__ ((aligned (64)));

/** @struct
* Defines the Prometheus metrics
*/
@@ -270,7 +290,10 @@ struct prometheus

atomic_ulong server_error[NUMBER_OF_SERVERS]; /**< The number of errors for a server */
atomic_ulong failed_servers; /**< The number of failed servers */
struct prometheus_connection prometheus_connections[]; /**< The number of prometheus connections (FMA) */

struct prometheus_cache cache; /**< The cache for the response */

struct prometheus_connection prometheus_connections[]; /**< The number of prometheus connections (FMA) */

} __attribute__ ((aligned (64)));

@@ -290,6 +313,8 @@ struct configuration
char host[MISC_LENGTH]; /**< The host */
int port; /**< The port */
int metrics; /**< The metrics port */
unsigned int metrics_cache_max_age; /**< Number of seconds to cache the Prometheus response */
unsigned int metrics_cache_max_size; /**< Number of bytes max to cache the Prometheus response */
int management; /**< The management port */
bool gracefully; /**< Is pgagroal in gracefully mode */

21 changes: 21 additions & 0 deletions src/include/prometheus.h
@@ -36,6 +36,27 @@ extern "C" {
#include <ev.h>
#include <stdlib.h>

/*
* Value to disable the Prometheus cache,
* it is equivalent to setting `metrics_cache_max_age`
* to 0 (seconds).
*/
#define PGAGROAL_PROMETHEUS_CACHE_DISABLED 0

/**
* Max size of the cache (in bytes).
* If a response exceeds this size, caching
* of that response is aborted (the cache is invalidated).
*/
#define PROMETHEUS_MAX_CACHE_SIZE 1048576

/**
* The default cache size in the case
* the user did not set any particular
* configuration option.
*/
#define PROMETHEUS_DEFAULT_CACHE_SIZE 262144

/**
* Create a prometheus instance
* @param fd The client descriptor
19 changes: 19 additions & 0 deletions src/libpgagroal/configuration.c
@@ -34,6 +34,7 @@
#include <security.h>
#include <shmem.h>
#include <utils.h>
#include <prometheus.h>

/* system */
#include <ctype.h>
@@ -59,6 +60,7 @@ static int as_bool(char* str, bool* b);
static int as_logging_type(char* str);
static int as_logging_level(char* str);
static int as_logging_mode(char* str);

static int as_logging_rotation_size(char* str, unsigned int* size);
static int as_logging_rotation_age(char* str, unsigned int* age);
static int as_validation(char* str);
@@ -265,6 +267,20 @@ pgagroal_read_configuration(void* shm, char* filename, bool emitWarnings)
unknown = true;
}
}
else if (key_in_section("metrics_cache_max_age", section, key, true, &unknown))
{
if (as_seconds(value, &config->metrics_cache_max_age, PGAGROAL_PROMETHEUS_CACHE_DISABLED))
{
unknown = true;
}
}
else if (key_in_section("metrics_cache_max_size", section, key, true, &unknown))
{
if (as_bytes(value, &config->metrics_cache_max_size, PROMETHEUS_DEFAULT_CACHE_SIZE))
{
unknown = true;
}
}
else if (key_in_section("management", section, key, true, &unknown))
{
if (as_int(value, &config->management))
@@ -2393,6 +2409,8 @@ transfer_configuration(struct configuration* config, struct configuration* reloa
memcpy(config->host, reload->host, MISC_LENGTH);
config->port = reload->port;
config->metrics = reload->metrics;
config->metrics_cache_max_age = reload->metrics_cache_max_age;
restart_int("metrics_cache_max_size", config->metrics_cache_max_size, reload->metrics_cache_max_size);
config->management = reload->management;
/* gracefully */

@@ -2741,6 +2759,7 @@ as_logging_rotation_size(char* str, unsigned int* size)
* The default is expressed in seconds.
* The function sets the rotation age as minutes.
* Returns 1 for errors, 0 for correct parsing.
*
*/
static int
as_logging_rotation_age(char* str, unsigned int* age)