From 1af87d1e7651ed0e747e42062603dcb5433f9914 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Wed, 20 Mar 2024 18:42:27 -0700 Subject: [PATCH 01/17] Add documentation about filelog receiver offset tracking file --- receiver/filelogreceiver/README.md | 122 +++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 4ee5d2820e51..261fb035f1c2 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -183,3 +183,125 @@ Exception in thread 2 "main" java.lang.NullPointerException While the storage parameter can ensure that log files are consumed accurately, it is possible that logs are dropped while moving downstream through other components in the collector. For additional resiliency, see [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md) + +### Debugging + +Sometimes, it's useful to take a peek at the `storage` file in which offsets are stored. At the moment, +the simplest way to do this is by printing out the contents of this file using the `strings` utility. + +Consider a collector pipeline that's using the `filelog` receiver with the `storage` extension as shown +below. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. + +```yaml +receivers: + filelog: + include: /tmp/*.log + storage: file_storage/filelogreceiver + +exporters: + ... + +extensions: + file_storage/filelogreceiver: + directory: /tmp/otelcol/file_storage/filelogreceiver + +service: + extensions: [file_storage/filelogreceiver] + pipelines: + logs: + receivers: [filelog] + exporters: [...] +``` + +Assume there are no log files matching `/tmp/*.log` when the above collector pipeline starts executing. 
In this +scenario, the `/tmp/otelcol/file_storage/filelogreceiver` directory contains one file: + +``` +$ ls /tmp/otelcol/file_storage/filelogreceiver +receiver_filelog_ +``` + +This is a binary file, so we can read its contents using the `strings` utility. + +``` +$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ +default +file_input.knownFiles0 +default +file_input.knownFiles0 +default +file_input.knownFiles0 +``` + +When a new log file is created with a single entry in it, and the `filelog` receiver in the collector +pipeline has ingested this entry, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file +change to reflect this new state. + +``` +$ echo "$RANDOM" >> /tmp/1.log +$ cat /tmp/1.log +31079 +$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} +``` + +Take a closer look at the changes, we can infer a few things about the contents of the +`/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: +* The number after `file_input.knownFiles` reflects the number of log files being tracked. +* If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. 
+* The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, + the file name, and some other details. + +When another log entry is added to the same log file, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file +change to reflect this new state. Note that the offset has been incremented by the size of the new entry, in bytes. + +``` +$ echo "$RANDOM" >> /tmp/1.log +$ cat /tmp/1.log +31079 +219 +$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +default +file_input.knownFiles1 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +``` + +When a new log file is created, it also gets tracked in the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file. 
+ +``` +$ echo "$RANDOM" >> 2.log +$ cat /tmp/2.log +24403 +$ strings otelcol/file_storage/filelogreceiver/receiver_filelog_ +default +file_input.knownFiles2 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} +default +file_input.knownFiles2 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} +default +file_input.knownFiles2 +{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} +{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} +``` + +#### TODO +* Document why and how the signature changes when more data is added to the file. +* Document why there are three copies of the tracking information. +* Document contents of tracking file with compaction enabled. 
\ No newline at end of file From d123a7c473912222b0086c1d331825866cdd6b83 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 21 Mar 2024 14:24:09 -0700 Subject: [PATCH 02/17] Document signature --- receiver/filelogreceiver/README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 261fb035f1c2..f4c3c85773de 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -257,8 +257,12 @@ Take a closer look at the changes, we can infer a few things about the contents `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: * The number after `file_input.knownFiles` reflects the number of log files being tracked. * If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. -* The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, + * The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, the file name, and some other details. + * The fingerprint of the log file, stored in the `.Fingerprint.first_bytes` JSON field, is a base64-encoding of the first + `B` bytes of the log file, where `B` corresponds to the value specified by the `fingerprint_size` configuration setting of + the `filelog` receiver. If the log file has fewer bytes than `B`, the fingerprint is calculated from the available bytes + and is re-calculated when the file grows, until it reaches `B` bytes. When another log entry is added to the same log file, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file change to reflect this new state. Note that the offset has been incremented by the size of the new entry, in bytes. 
@@ -302,6 +306,5 @@ file_input.knownFiles2 ``` #### TODO -* Document why and how the signature changes when more data is added to the file. * Document why there are three copies of the tracking information. * Document contents of tracking file with compaction enabled. \ No newline at end of file From bcc1caee795cf7ed5f15e6566b41d167fab09334 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 21 Mar 2024 14:27:38 -0700 Subject: [PATCH 03/17] Remove trailing whitespace --- receiver/filelogreceiver/README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index f4c3c85773de..cba4316d306a 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -179,7 +179,7 @@ Exception in thread 2 "main" java.lang.NullPointerException ## Offset tracking -`storage` setting allows to define the proper storage extension to be used for storing file offsets. +`storage` setting allows to define the proper storage extension to be used for storing file offsets. While the storage parameter can ensure that log files are consumed accurately, it is possible that logs are dropped while moving downstream through other components in the collector. For additional resiliency, see [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md) @@ -234,7 +234,7 @@ file_input.knownFiles0 ``` When a new log file is created with a single entry in it, and the `filelog` receiver in the collector -pipeline has ingested this entry, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file +pipeline has ingested this entry, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file change to reflect this new state. 
``` @@ -253,12 +253,12 @@ file_input.knownFiles1 {"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} ``` -Take a closer look at the changes, we can infer a few things about the contents of the -`/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: +Take a closer look at the changes, we can infer a few things about the contents of the + `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: * The number after `file_input.knownFiles` reflects the number of log files being tracked. -* If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. - * The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, - the file name, and some other details. +* If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. + * The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, + the file name, and some other details. * The fingerprint of the log file, stored in the `.Fingerprint.first_bytes` JSON field, is a base64-encoding of the first `B` bytes of the log file, where `B` corresponds to the value specified by the `fingerprint_size` configuration setting of the `filelog` receiver. 
If the log file has fewer bytes than `B`, the fingerprint is calculated from the available bytes From 634420d84f8c301387c1c320867b5a1ce6d9dca2 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 21 Mar 2024 14:38:45 -0700 Subject: [PATCH 04/17] Clarify that docs are about file storage only --- receiver/filelogreceiver/README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index cba4316d306a..d45d7908481d 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -184,10 +184,12 @@ While the storage parameter can ensure that log files are consumed accurately, i logs are dropped while moving downstream through other components in the collector. For additional resiliency, see [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md) -### Debugging +### File storage -Sometimes, it's useful to take a peek at the `storage` file in which offsets are stored. At the moment, -the simplest way to do this is by printing out the contents of this file using the `strings` utility. +A common storage extension that's used for tracking log file offsets is the +[`filestorage` extension](../../extension/storage/filestorage). Sometimes, typically for debugging reasons, it's useful +to take a peek at the file in which offsets are stored. At the moment, the simplest way to do this is by printing out +the contents of this file using the `strings` utility. Consider a collector pipeline that's using the `filelog` receiver with the `storage` extension as shown below. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. 
From f4712609673dea941eb3ac56f681dc507f23a194 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 21 Mar 2024 16:21:47 -0700 Subject: [PATCH 05/17] Remove TODOs --- receiver/filelogreceiver/README.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index d45d7908481d..e5ecfb48df16 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -306,7 +306,3 @@ file_input.knownFiles2 {"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} {"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} ``` - -#### TODO -* Document why there are three copies of the tracking information. -* Document contents of tracking file with compaction enabled. \ No newline at end of file From 304ece79b673e5a87ca562a5183183bd2bb9cae3 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Fri, 22 Mar 2024 14:08:01 -0700 Subject: [PATCH 06/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index e5ecfb48df16..ca70b3cb35fc 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -179,7 +179,7 @@ Exception in thread 2 "main" java.lang.NullPointerException ## Offset tracking -`storage` setting allows to define the proper storage extension to be used for storing file offsets. +The `storage` setting allows you to define the proper storage extension for storing file offsets. 
While the storage parameter can ensure that log files are consumed accurately, it is possible that logs are dropped while moving downstream through other components in the collector. For additional resiliency, see [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md) From 91dd2be99d2c186cbd9f3b3ab18f18ee255cbd50 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Fri, 22 Mar 2024 14:08:53 -0700 Subject: [PATCH 07/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index ca70b3cb35fc..0b98e0e8e45f 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -186,10 +186,7 @@ For additional resiliency, see [Fault tolerant log collection example](../../exa ### File storage -A common storage extension that's used for tracking log file offsets is the -[`filestorage` extension](../../extension/storage/filestorage). Sometimes, typically for debugging reasons, it's useful -to take a peek at the file in which offsets are stored. At the moment, the simplest way to do this is by printing out -the contents of this file using the `strings` utility. +The [`filestorage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. Consider a collector pipeline that's using the `filelog` receiver with the `storage` extension as shown below. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. 
From 48b9bedbd2cd446cfa049dbdc264827cf9973b47 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Fri, 22 Mar 2024 14:09:27 -0700 Subject: [PATCH 08/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 0b98e0e8e45f..c44e0b6a5b6d 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -188,8 +188,7 @@ For additional resiliency, see [Fault tolerant log collection example](../../exa The [`filestorage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. -Consider a collector pipeline that's using the `filelog` receiver with the `storage` extension as shown -below. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. +The following configuration shows a collector pipeline that's using the `filelog` receiver with the `storage` extension. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. 
```yaml receivers: From a1f82b8a385e8e77505af0f1b533461e068a057e Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Fri, 22 Mar 2024 14:10:04 -0700 Subject: [PATCH 09/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index c44e0b6a5b6d..96d99c092bc0 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -251,7 +251,7 @@ file_input.knownFiles1 {"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} ``` -Take a closer look at the changes, we can infer a few things about the contents of the +Taking a closer look at the changes, we can infer a few things about the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: * The number after `file_input.knownFiles` reflects the number of log files being tracked. * If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. From fa536d16ff4476407c04acbd92bdb86e2b9fdb26 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Fri, 22 Mar 2024 14:10:20 -0700 Subject: [PATCH 10/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 96d99c092bc0..20c4e8aaddf5 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -211,7 +211,7 @@ service: exporters: [...] 
``` -Assume there are no log files matching `/tmp/*.log` when the above collector pipeline starts executing. In this +Assume there are no log files matching `/tmp/*.log` when the previous collector pipeline starts executing. In this scenario, the `/tmp/otelcol/file_storage/filelogreceiver` directory contains one file: ``` From 201e22bb204379006d12836b8224d4fc9c4c8213 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Mon, 25 Mar 2024 16:02:07 -0700 Subject: [PATCH 11/17] Update receiver/filelogreceiver/README.md --- receiver/filelogreceiver/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 20c4e8aaddf5..03ab0bebf9e0 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -231,8 +231,8 @@ default file_input.knownFiles0 ``` -When a new log file is created with a single entry in it, and the `filelog` receiver in the collector -pipeline has ingested this entry, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file +When a new log file is created with one or more entries in it, and the `filelog` receiver in the collector +pipeline has ingested these entries, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file change to reflect this new state. 
``` From a1469678279a564dd44c41938f665f1cb5e74edf Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 28 Mar 2024 13:16:19 -0700 Subject: [PATCH 12/17] Update receiver/filelogreceiver/README.md Co-authored-by: Chris Mark --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 03ab0bebf9e0..2648efd5c1d5 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -188,7 +188,7 @@ For additional resiliency, see [Fault tolerant log collection example](../../exa The [`filestorage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. -The following configuration shows a collector pipeline that's using the `filelog` receiver with the `storage` extension. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. +The following configuration shows a collector pipeline that's using the `filelog` receiver with the `file_storage` extension. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. 
```yaml receivers: From be821606a549953c861b641b41f0ab6354aec9e1 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Thu, 28 Mar 2024 13:17:59 -0700 Subject: [PATCH 13/17] Fix typo --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 2648efd5c1d5..12a5c2b940aa 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -186,7 +186,7 @@ For additional resiliency, see [Fault tolerant log collection example](../../exa ### File storage -The [`filestorage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. +The [`file_storage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. The following configuration shows a collector pipeline that's using the `filelog` receiver with the `file_storage` extension. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. 
From 778b8f7c21c99b279dd7ac0fc52de170373470d2 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Tue, 2 Apr 2024 15:31:07 -0700 Subject: [PATCH 14/17] Abstractly describe information persisted --- receiver/filelogreceiver/README.md | 127 ++--------------------------- 1 file changed, 9 insertions(+), 118 deletions(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 12a5c2b940aa..eed534799377 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -184,121 +184,12 @@ While the storage parameter can ensure that log files are consumed accurately, i logs are dropped while moving downstream through other components in the collector. For additional resiliency, see [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md) -### File storage - -The [`file_storage` extension](../../extension/storage/filestorage) is a common storage extension that's used for tracking log file offsets. Sometimes, typically for debugging reasons, it's useful to view the file in which offsets are stored. The simplest way to do this is by printing out the contents of this file using the `strings` utility. - -The following configuration shows a collector pipeline that's using the `filelog` receiver with the `file_storage` extension. Note that [compaction](../../extension/storage/filestorage/README.md#compaction) is not being used. - -```yaml -receivers: - filelog: - include: /tmp/*.log - storage: file_storage/filelogreceiver - -exporters: - ... - -extensions: - file_storage/filelogreceiver: - directory: /tmp/otelcol/file_storage/filelogreceiver - -service: - extensions: [file_storage/filelogreceiver] - pipelines: - logs: - receivers: [filelog] - exporters: [...] -``` - -Assume there are no log files matching `/tmp/*.log` when the previous collector pipeline starts executing. 
In this -scenario, the `/tmp/otelcol/file_storage/filelogreceiver` directory contains one file: - -``` -$ ls /tmp/otelcol/file_storage/filelogreceiver -receiver_filelog_ -``` - -This is a binary file, so we can read its contents using the `strings` utility. - -``` -$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ -default -file_input.knownFiles0 -default -file_input.knownFiles0 -default -file_input.knownFiles0 -``` - -When a new log file is created with one or more entries in it, and the `filelog` receiver in the collector -pipeline has ingested these entries, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file -change to reflect this new state. - -``` -$ echo "$RANDOM" >> /tmp/1.log -$ cat /tmp/1.log -31079 -$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkK"},"Offset":6,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:15:54.763711-07:00","LastDataLength":0}} -``` - -Taking a closer look at the changes, we can infer a few things about the contents of the - `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file: -* The number after `file_input.knownFiles` reflects the number of log files being tracked. -* If this number is `N`, the subsequent `N` lines contain details of each log file being tracked. Each line is JSON-formatted. 
- * The details contain the fingerprint of the log file, how much of the log file's contents the `filelog` receiver has consumed, - the file name, and some other details. - * The fingerprint of the log file, stored in the `.Fingerprint.first_bytes` JSON field, is a base64-encoding of the first - `B` bytes of the log file, where `B` corresponds to the value specified by the `fingerprint_size` configuration setting of - the `filelog` receiver. If the log file has fewer bytes than `B`, the fingerprint is calculated from the available bytes - and is re-calculated when the file grows, until it reaches `B` bytes. - -When another log entry is added to the same log file, the contents of the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file -change to reflect this new state. Note that the offset has been incremented by the size of the new entry, in bytes. - -``` -$ echo "$RANDOM" >> /tmp/1.log -$ cat /tmp/1.log -31079 -219 -$ strings /tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_ -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -default -file_input.knownFiles1 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -``` - -When a new log file is created, it also gets tracked in the `/tmp/otelcol/file_storage/filelogreceiver/receiver_filelog_` file. 
- -``` -$ echo "$RANDOM" >> 2.log -$ cat /tmp/2.log -24403 -$ strings otelcol/file_storage/filelogreceiver/receiver_filelog_ -default -file_input.knownFiles2 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} -default -file_input.knownFiles2 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} -default -file_input.knownFiles2 -{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}} -{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}} -``` +Here is some of the information the `filelog` receiver stores: +- The number of files it is currently tracking (`knownFiles`). +- For each file being tracked: + - The fingerprint of the file (`Fingerprint.first_bytes`). + - The byte offset from the start of the file, indicating the position in the file from where the + `filelog` receiver will continue reading the file (`Offset`). + - An arbitrary set of file attributes, e.g. the name of the file (`FileAttributes`). 
+ +Exactly how this information is serialized depends on the type of storage being used. \ No newline at end of file From 251e2aa8cb7a9af049c21f55f1aaa849f04717b1 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Wed, 3 Apr 2024 13:23:08 -0700 Subject: [PATCH 15/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index eed534799377..1697794b4cc5 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -189,7 +189,7 @@ Here is some of the information the `filelog` receiver stores: - For each file being tracked: - The fingerprint of the file (`Fingerprint.first_bytes`). - The byte offset from the start of the file, indicating the position in the file from where the - `filelog` receiver will continue reading the file (`Offset`). + `filelog` receiver continues reading the file (`Offset`). - An arbitrary set of file attributes, e.g. the name of the file (`FileAttributes`). Exactly how this information is serialized depends on the type of storage being used. \ No newline at end of file From d4d3546275b7816156e9cd42011d8906a12672a9 Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Wed, 3 Apr 2024 13:23:16 -0700 Subject: [PATCH 16/17] Update receiver/filelogreceiver/README.md Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 1697794b4cc5..3112c06da9fc 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -190,6 +190,6 @@ Here is some of the information the `filelog` receiver stores: - The fingerprint of the file (`Fingerprint.first_bytes`). 
- The byte offset from the start of the file, indicating the position in the file from where the `filelog` receiver continues reading the file (`Offset`). - - An arbitrary set of file attributes, e.g. the name of the file (`FileAttributes`). + - An arbitrary set of file attributes, such as the name of the file (`FileAttributes`). Exactly how this information is serialized depends on the type of storage being used. \ No newline at end of file From ec98397a681b53acd479efa3888f7bc92dbf65ee Mon Sep 17 00:00:00 2001 From: Shaunak Kashyap Date: Wed, 3 Apr 2024 15:20:10 -0700 Subject: [PATCH 17/17] Link to fingerprint doc --- receiver/filelogreceiver/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 3112c06da9fc..b650098a982b 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -187,7 +187,7 @@ For additional resiliency, see [Fault tolerant log collection example](../../exa Here is some of the information the `filelog` receiver stores: - The number of files it is currently tracking (`knownFiles`). - For each file being tracked: - - The fingerprint of the file (`Fingerprint.first_bytes`). + - The [fingerprint](../../pkg/stanza/fileconsumer/design.md#fingerprints) of the file (`Fingerprint.first_bytes`). - The byte offset from the start of the file, indicating the position in the file from where the `filelog` receiver continues reading the file (`Offset`). - An arbitrary set of file attributes, such as the name of the file (`FileAttributes`).
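The fields listed above can be inspected with a few lines of Python. This is a minimal sketch, not part of the receiver: the record is a trimmed copy of the `strings` output shown in the earlier revision of this README, and its JSON shape is an assumption, because how this information is serialized depends on the type of storage being used. The helper name `describe_tracked_file` is hypothetical.

```python
import base64
import json

# Trimmed copy of one tracked-file record from the earlier `strings` output.
# Assumption: the storage in use serializes each tracked file as one JSON object.
RECORD = (
    '{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},'
    '"Offset":10,'
    '"FileAttributes":{"log.file.name":"1.log"}}'
)


def describe_tracked_file(record_json: str) -> dict:
    """Decode one tracked-file record into human-readable fields."""
    record = json.loads(record_json)
    # `first_bytes` holds the base64-encoded leading bytes of the log file.
    first_bytes = base64.b64decode(record["Fingerprint"]["first_bytes"])
    return {
        "name": record["FileAttributes"].get("log.file.name"),
        "fingerprint": first_bytes,
        "fingerprint_size": len(first_bytes),
        "offset": record["Offset"],
    }


if __name__ == "__main__":
    print(describe_tracked_file(RECORD))
```

In this example the decoded fingerprint is the two log entries `31079` and `219`, each followed by a newline. That is 10 bytes, which matches `Offset` because the file is still shorter than the fingerprint size and has been read to the end.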