Update PostgreSQL integration to support logs in CSV format #747

Merged 33 commits on Mar 22, 2021
Changes from 28 commits
Commits
9b6ddeb Add docs for CSV (jsoriano, Feb 25, 2021)
5f11c27 Import from beats, and some grooming (jsoriano, Feb 25, 2021)
1e82c8e Fix changelog (jsoriano, Feb 25, 2021)
457e230 Remove release tags (jsoriano, Feb 25, 2021)
cc7d5ab elastic-package build (jsoriano, Feb 25, 2021)
0d35d36 Remove aliases (jsoriano, Feb 25, 2021)
c41f421 Import dashboards with Kibana 7.10 (jsoriano, Mar 2, 2021)
efc85f5 Add the test files (jsoriano, Mar 2, 2021)
9d3ac0d Add config (jsoriano, Mar 2, 2021)
cb829e1 Fix multiline (jsoriano, Mar 2, 2021)
55b5deb Remove numeric_keyword_fields (jsoriano, Mar 3, 2021)
62872ff Unify configs (jsoriano, Mar 3, 2021)
0469efa Update expected files (jsoriano, Mar 3, 2021)
f4f4a82 Fix changelog (jsoriano, Mar 3, 2021)
3e69db5 Fix format (jsoriano, Mar 3, 2021)
8c43f8b Explicitly convert fields to their expected types (jsoriano, Mar 3, 2021)
5c87bc4 Revert changes made by importBeats in metrics manifests (jsoriano, Mar 3, 2021)
aede54d Remove empty temp object (jsoriano, Mar 3, 2021)
e4d1653 Rename test files according to the spec (jsoriano, Mar 3, 2021)
de7eb6d Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 4, 2021)
82d52dc Add event fields from ECS (jsoriano, Mar 4, 2021)
8730f7b Add docker deploy files (jsoriano, Mar 4, 2021)
a47b1a3 Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 5, 2021)
6046262 Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 9, 2021)
77a34a0 Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 10, 2021)
a77033b Rephrase data stream title (jsoriano, Mar 11, 2021)
e31ee6b Increase target stack (jsoriano, Mar 11, 2021)
77a8ad1 Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 11, 2021)
ed898f7 Add error.message ECS field (jsoriano, Mar 11, 2021)
111a53e Add missing period to the statement metricset (jsoriano, Mar 11, 2021)
d917539 Adjust logging options (jsoriano, Mar 11, 2021)
1b61205 Update dashboard screenshots (jsoriano, Mar 11, 2021)
3eaee22 Merge remote-tracking branch 'origin/master' into postgresql-7-12 (jsoriano, Mar 16, 2021)
31 changes: 28 additions & 3 deletions packages/postgresql/_dev/build/docs/README.md
@@ -4,15 +4,40 @@ This integration periodically fetches logs and metrics from [PostgreSQL](https:/

## Compatibility

The `log` dataset was tested with logs from versions 9.5 on Ubuntu, 9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3.
The `log` dataset was tested with logs from versions 9.5 on Ubuntu, 9.6 on Debian, and finally 10.11, 11.4 and 12.2 on Arch Linux 9.3. CSV format was tested using versions 11 and 13 (distro is not relevant here).

The `activity`, `bgwriter`, `database` and `statement` datasets were tested with PostgreSQL 9.5.3 and are expected to work with all versions >= 9.

## Logs

### log

The `log` dataset collects the PostgreSQL logs.
The `log` dataset collects the PostgreSQL logs in plain text format or CSV.

#### Using CSV logs

Because the PostgreSQL CSV log has a well-defined format,
almost no configuration is needed in Fleet beyond the file path.

PostgreSQL itself, however, must be configured to emit `.csv` logs.

The recommended parameters are:
```
logging_collector = 'on'
log_destination = 'csvlog'
log_statement = 'none'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_min_duration_statement = 0
```

On busy servers, `log_min_duration_statement = 0` logs the duration of every statement and can
cause contention, so consider assigning a value greater than 0.

Both `log_connections` and `log_disconnections` can generate a lot of events if clients don't use
persistent connections, so enable them with care.
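
As a rough illustration of why the CSV format needs so little configuration, the sketch below reads a csvlog record with Python's standard `csv` module. It is not the integration's ingest pipeline. The column list is the documented csvlog column order up to PostgreSQL 12 (later versions append columns such as `backend_type`), and the sample record is made up for the example.

```python
import csv
import io

# Documented csvlog column order up to PostgreSQL 12.
# PostgreSQL 13 appends backend_type; later versions append further columns.
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
]


def parse_csvlog(text):
    """Yield one dict per csvlog record; extra trailing columns are ignored."""
    # newline="" leaves line endings untouched so the csv module can handle
    # newlines embedded in quoted fields (multi-line statements).
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        yield dict(zip(CSVLOG_COLUMNS, row))


# Hypothetical sample record (made-up values) with a multi-line message.
SAMPLE_FIELDS = [
    "2021-03-02 10:15:42.123 UTC", '"postgres"', '"postgres"', "4242",
    '"127.0.0.1:50000"', "603e0f5e.1092", "1", '"SELECT"',
    "2021-03-02 10:15:10 UTC", "3/7", "0", "LOG", "00000",
    '"duration: 0.251 ms  statement: SELECT 1\nFROM generate_series(1, 3)"',
    "", "", "", "", "", "", "", "", '"psql"',
]
SAMPLE = ",".join(SAMPLE_FIELDS) + "\n"

records = list(parse_csvlog(SAMPLE))
for record in records:
    print(record["log_time"], record["error_severity"], repr(record["message"]))
```

Because quoted fields may contain newlines, a multi-line statement stays inside a single record, which is why no multiline pattern has to be tuned for this format.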

{{fields "log"}}

@@ -48,4 +73,4 @@ The `statement` dataset periodically fetches metrics from PostgreSQL servers.

{{event "statement"}}

{{fields "statement"}}
{{fields "statement"}}
4 changes: 4 additions & 0 deletions packages/postgresql/_dev/deploy/docker/Dockerfile
@@ -0,0 +1,4 @@
ARG SERVICE_VERSION=${SERVICE_VERSION:-9.5.3}
FROM postgres:${SERVICE_VERSION}
COPY docker-entrypoint-initdb.d /docker-entrypoint-initdb.d
HEALTHCHECK --interval=10s --retries=6 CMD psql -h localhost -U postgres -l
11 changes: 11 additions & 0 deletions packages/postgresql/_dev/deploy/docker/docker-compose.yml
@@ -0,0 +1,11 @@
version: '2.3'
services:
  postgresql:
    # Commented out `image:` below until we have a process to refresh the hosted images from
    # Dockerfiles in this repo. Until then, we build the image locally using `build:` below.
    # image: docker.elastic.co/integrations-ci/beats-postgresql:${POSTGRESQL_VERSION:-9.5.3}-1
    build: .
    ports:
      - 5432
    volumes:
      - ${SERVICE_LOGS_DIR}/postgresql:/var/log/postgresql
@@ -0,0 +1,19 @@
#!/usr/bin/env bash
chmod a+wx /var/log/postgresql

cat <<-EOF >> $PGDATA/postgresql.conf
# Enable some log facilities.
log_statement = 'all'
log_duration = 'on'
log_connections = 'on'
log_disconnections = 'on'

# Give the agent read permissions. Never use this in production.
log_file_mode = '0666'

# Try to imitate the logging behaviour of Debian/Ubuntu, although the logging collector
# is not used there.
logging_collector = 'on'
log_directory = '/var/log/postgresql'
log_line_prefix = '%m [%p] %q%u@%d '
EOF
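
To make the `log_line_prefix` above concrete: `%m` is the timestamp with milliseconds, `%p` the process ID, `%q` stops the prefix for non-session processes, and `%u@%d` is the user and database. The following is a minimal sketch of splitting such a line, assuming this exact prefix. The regular expression and the `parse_line` helper are hypothetical and are not the integration's ingest pipeline; the second sample line is made up.

```python
import re

# Hypothetical pattern for lines written with log_line_prefix = '%m [%p] %q%u@%d '.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+) "  # %m
    r"\[(?P<pid>\d+)\] "                                                # [%p]
    r"(?:(?P<user>[^@\s]+)@(?P<database>\S+) )?"                        # %q%u@%d (optional)
    r"(?P<level>[A-Z]+\d?):\s+(?P<message>.*)$"                         # severity and message
)


def parse_line(line):
    """Return the parsed parts of a log line, or None if it has no prefix."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# First line taken from the test log in this PR; second line is a made-up example.
without_user = parse_line(
    '2020-04-15 12:04:45.416 CEST [24981] FATAL: password authentication failed for user "root"'
)
with_user = parse_line(
    "2021-03-02 10:15:42.123 UTC [4242] postgres@postgres LOG: statement: SELECT 1"
)
print(without_user["level"], without_user["pid"])
print(with_user["user"], with_user["database"], with_user["message"])
```

Lines that do not start with the prefix, such as the continuation of a multi-line statement, return `None` and have to be joined to the previous event first.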
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
cat <<-EOF >> $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
EOF
@@ -0,0 +1 @@
create extension pg_stat_statements;
4 changes: 4 additions & 0 deletions packages/postgresql/_dev/deploy/variants.yml
@@ -0,0 +1,4 @@
variants:
  v9_5_3:
    SERVICE_VERSION: 9.5.3
default: v9_5_3
5 changes: 5 additions & 0 deletions packages/postgresql/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.3.0"
  changes:
    - description: Add support for logs in CSV format
      type: enhancement # can be one of: enhancement, bugfix, breaking-change
      link: https://github.com/elastic/integrations/pull/747
- version: "0.2.7"
  changes:
    - description: Updating package owner
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
51 changes: 51 additions & 0 deletions packages/postgresql/data_stream/activity/fields/ecs.yml
@@ -1,3 +1,54 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.

        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: event
  title: Event
  group: 2
  description: 'The event fields are used for context information about the log or metric event itself.

    A log is defined as an event containing details of something that happened. Log events must include the time at which the thing happened. Examples of log events include a process starting on a host, a network packet being sent from a source to a destination, or a network connection between a client and a server being initiated or closed. A metric is defined as an event containing one or more numerical measurements and the time at which the measurement was taken. Examples of metric events include memory pressure measured on a host and device temperature. See the `event.kind` definition in this section for additional details about metric and state events.'
  type: group
  fields:
    - name: dataset
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the dataset.

        If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from.

        It''s recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.'
      example: apache.access
    - name: duration
      level: core
      type: long
      format: duration
      input_format: nanoseconds
      output_format: asMilliseconds
      output_precision: 1
      description: 'Duration of the event in nanoseconds.

        If event.start and event.end are known this value should be the difference between the end and start time.'
    - name: module
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the module this data is coming from.

        If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.'
      example: apache
- name: service.address
  type: keyword
  description: Service address
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
51 changes: 51 additions & 0 deletions packages/postgresql/data_stream/bgwriter/fields/ecs.yml
@@ -1,3 +1,54 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.

        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: event
  title: Event
  group: 2
  description: 'The event fields are used for context information about the log or metric event itself.

    A log is defined as an event containing details of something that happened. Log events must include the time at which the thing happened. Examples of log events include a process starting on a host, a network packet being sent from a source to a destination, or a network connection between a client and a server being initiated or closed. A metric is defined as an event containing one or more numerical measurements and the time at which the measurement was taken. Examples of metric events include memory pressure measured on a host and device temperature. See the `event.kind` definition in this section for additional details about metric and state events.'
  type: group
  fields:
    - name: dataset
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the dataset.

        If an event source publishes more than one type of log or events (e.g. access log, error log), the dataset is used to specify which one the event comes from.

        It''s recommended but not required to start the dataset name with the module name, followed by a dot, then the dataset name.'
      example: apache.access
    - name: duration
      level: core
      type: long
      format: duration
      input_format: nanoseconds
      output_format: asMilliseconds
      output_precision: 1
      description: 'Duration of the event in nanoseconds.

        If event.start and event.end are known this value should be the difference between the end and start time.'
    - name: module
      level: core
      type: keyword
      ignore_above: 1024
      description: 'Name of the module this data is coming from.

        If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module.'
      example: apache
- name: service.address
  type: keyword
  description: Service address

Review thread on the `ecs.version` description line:

Contributor: nit: do we need the quotes here for all descriptions?

Member Author: Multi-line descriptions need to be enclosed in quotes or written as YAML blocks started with | or |-. I think elastic-package format fixed some of them, but all of these options are in use across these fields, so I'm not sure which one to use. I would leave it as is for now.
@@ -0,0 +1,7 @@
vars:
  hosts:
    - postgres://{{Hostname}}:{{Port}}?sslmode=disable
  username: postgres
  password: postgres
data_stream:
  vars: ~
15 changes: 15 additions & 0 deletions packages/postgresql/data_stream/database/fields/ecs.yml
@@ -1,3 +1,18 @@
- name: ecs
  title: ECS
  group: 2
  description: Meta-information specific to ECS.
  type: group
  fields:
    - name: version
      level: core
      required: true
      type: keyword
      ignore_above: 1024
      description: 'ECS version this event conforms to. `ecs.version` is a required field and must exist in all events.

        When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events.'
      example: 1.0.0
- name: service.address
  type: keyword
  description: Service address
@@ -0,0 +1,20 @@
2020-04-15 12:02:55.244 CEST [23922] LOG: database system was shut down at 2020-04-15 12:02:52 CEST
2020-04-15 12:02:55.247 CEST [23920] LOG: database system is ready to accept connections
2020-04-15 12:04:45.416 CEST [24981] FATAL: password authentication failed for user "root"
2020-04-15 12:04:45.416 CEST [24981] DETAIL: Role "root" does not exist.
Connection matched pg_hba.conf line 80: "local all all md5"
2020-04-15 12:04:45.416 CEST [24981] LOG: could not send data to client: Broken pipe
2020-04-15 12:06:36.719 CEST [25143] ERROR: syntax error at or near "l" at character 1
2020-04-15 12:56:29.569 CEST [25143] STATEMENT: SELECT al.id, al.tenant_id, al.created_by_id, al.create_ip, al.audit_date, al.audit_table, al.entity_id, al.entity_name, al.reason_for_change, al.audit_log_event_type_id,
aet.lookup_code, al.old_value, al.new_value, al.event_crf_id, al.event_crf_version_id, al.study_id, al.study_site_id, ss.rc_oid, al.subject_id, s.unique_identifier,
al.study_event_id, sed.name AS studyEventName, al.user_id, al.value_index, al.crf_version_id, al.global_logs, cv.version_name, crf.id AS crfId, crf.name AS crfName
FROM public.rc_audit_log_events AS al
LEFT JOIN rc_crf_versions AS cv ON cv.id=al.crf_version_id
LEFT JOIN rc_crfs AS crf ON crf.id=cv.crf_id
LEFT JOIN ad_lookup_codes AS aet ON aet.id=al.audit_log_event_type_id
LEFT JOIN rc_study_sites AS ss ON ss.id=al.study_site_id
LEFT JOIN rc_subjects AS s ON s.id=al.subject_id
LEFT JOIN rc_study_events AS se ON se.id=al.study_event_id
LEFT JOIN rc_study_event_definitions AS sed ON sed.id=se.study_event_definition_id
WHERE al.tenant_id=$1 AND al.study_id=$2 AND aet.lookup_code IN ($3, $4, $5, $6) AND al.audit_date >= $7 ORDER BY al.id DESC limit $8
;
@@ -0,0 +1,4 @@
dynamic_fields:
  event.ingested: ".*"
multiline:
  first_line_pattern: '^\d{4}-\d{2}-\d{2} '
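
The `first_line_pattern` above marks where each event begins: a line starting with a date opens a new event, and any other line is a continuation of the previous one. The sketch below imitates that grouping in Python to show the effect on a few lines from the test log in this PR. The `group_events` helper is hypothetical and only approximates what the test runner or the agent does.

```python
import re

# Same pattern as first_line_pattern in the test config.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_events(lines):
    """Group raw lines into events; a line matching FIRST_LINE starts a new event."""
    events = []
    for line in lines:
        if FIRST_LINE.match(line) or not events:
            events.append(line)
        else:
            # Continuation line: append it to the event that is currently open.
            events[-1] += "\n" + line
    return events


# Lines taken from the test log in this PR.
RAW = [
    '2020-04-15 12:04:45.416 CEST [24981] FATAL: password authentication failed for user "root"',
    '2020-04-15 12:04:45.416 CEST [24981] DETAIL: Role "root" does not exist.',
    'Connection matched pg_hba.conf line 80: "local all all md5"',
    "2020-04-15 12:04:45.416 CEST [24981] LOG: could not send data to client: Broken pipe",
]

events = group_events(RAW)
for event in events:
    print(repr(event))
```

Here the `Connection matched pg_hba.conf` line is folded into the preceding `DETAIL` event, so four raw lines become three events.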