[LOI-346] Milvus Logs (#19331)
* Add logs assets

* Update README

* Fix logs assets

* Add test results

* Add logs to spec.yaml

* Update manifest

* Add saved views

* Add changelog

* Improve message_rule

* Add second parsing rule

* Replace data with regex in parsing rule 1

* Add quotes to samples

* Fix rules in yaml

* Add test results for rule2

* Use regex in message_rule

* Update parsing rules

* Update test results
dkirov-dd authored Jan 30, 2025
1 parent d3e1bdf commit 54f6114
Showing 9 changed files with 290 additions and 0 deletions.
38 changes: 38 additions & 0 deletions milvus/README.md
@@ -34,6 +34,42 @@ For containerized environments, see the [Autodiscovery Integration Templates][3]
<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->

#### Logs

The Milvus integration can collect logs from the Milvus pods or containers.

<!-- xxx tabs xxx -->
<!-- xxx tab "Host" xxx -->

Apply this if you want to collect logs from Milvus standalone containers.

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

2. Uncomment and edit the logs configuration block in your `milvus.d/conf.yaml` file. Here's an example:

   ```yaml
   logs:
     - type: docker
       source: milvus
       service: milvus
   ```

<!-- xxz tab xxx -->
<!-- xxx tab "Kubernetes" xxx -->

Apply this if you want to collect logs from a Milvus Kubernetes cluster.

Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes Log Collection][10].

Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of [Kubernetes Log Collection][11].
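
As a minimal sketch of the annotation approach, assuming a pod whose Milvus container is named `milvus`: the pod name, container name, and image tag below are placeholders and are not taken from this integration. The container identifier in the annotation key should match the container's `name`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: milvus                  # placeholder pod name
  annotations:
    # Key format: ad.datadoghq.com/<CONTAINER_NAME>.logs
    ad.datadoghq.com/milvus.logs: '[{"source": "milvus", "service": "milvus"}]'
spec:
  containers:
    - name: milvus              # must match <CONTAINER_NAME> in the annotation key
      image: milvusdb/milvus:<TAG>   # placeholder image tag
```

Setting `source` to `milvus` is what routes these logs through the Milvus pipeline, which filters on `source:milvus`.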

<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->

### Validation

[Run the Agent's status subcommand][6] and look for `milvus` under the Checks section.
@@ -66,3 +102,5 @@ Need help? Contact [Datadog support][9].
[7]: https://github.com/DataDog/integrations-core/blob/master/milvus/metadata.csv
[8]: https://github.com/DataDog/integrations-core/blob/master/milvus/assets/service_checks.json
[9]: https://docs.datadoghq.com/help/
[10]: https://docs.datadoghq.com/agent/kubernetes/log/#setup
[11]: https://docs.datadoghq.com/agent/kubernetes/log/#configuration
5 changes: 5 additions & 0 deletions milvus/assets/configuration/spec.yaml
@@ -13,3 +13,8 @@ files:
        openmetrics_endpoint.description: |
          Endpoint exposing Milvus' Prometheus metrics. For more information, refer to
          https://milvus.io/docs/monitor.md#Monitor-metrics-with-Prometheus.
  - template: logs
    example:
    - type: docker
      source: milvus
      service: <SERVICE>
53 changes: 53 additions & 0 deletions milvus/assets/logs/milvus.yaml
@@ -0,0 +1,53 @@
id: milvus
metric_id: milvus
backend_only: false
facets: null
pipeline:
  type: pipeline
  name: Milvus
  enabled: true
  filter:
    query: source:milvus
  processors:
    - type: grok-parser
      name: Grok Parser
      enabled: true
      source: message
      samples:
        - '[2024/11/27 14:07:51.849 +00:00] [INFO] [datacoord/handler.go:341]
          ["channel seek position set from channel checkpoint meta"]
          [channel=by-dev-rootcoord-dml_2_453764875273209568v0]
          [posTs=454221223538458625] [posTime=2024/11/27 14:07:39.421 +00:00]'
        - '[2024/11/27 14:07:01.849 +00:00] [INFO] [datacoord/services.go:833]
          ["datacoord append channelInfo in GetRecoveryInfo"]
          [traceID=ed216b196edf0589f281c4ad800f6565]
          [collectionID=453764875273209568] [partitionIDs="[]"]
          [channel=by-dev-rootcoord-dml_2_453764875273209568v0] ["# of unflushed
          segments"=0] ["# of flushed segments"=1] ["# of dropped segments"=0]
          ["# of indexed segments"=0] ["# of l0 segments"=0]'
        - '[2024/11/27 14:06:51.852 +00:00] [INFO] [datacoord/services.go:818]
          ["get recovery info request received"]
          [traceID=54cda8d3229d00982db785351a12ea7a]
          [collectionID=453764875273212700] [partitionIDs="[]"]'
        - '[2024/11/18 15:15:45.120 +00:00] [INFO] [roles/roles.go:282] [setupPrometheusHTTPServer]'
      grok:
        supportRules: message_rule
          \["?%{regex("[^]^\"]+"):message.body}"?](\s+%{data:message.details:array("[]","] [")})?
        matchRules: |
          rule1 \[%{date("yyyy/MM/dd HH:mm:ss.SSS ZZ"):date}]\s+\[%{word:level}\]\s+\[%{word:component}/%{regex("[^:]+"):file}:%{integer:lineno}\]\s+%{message_rule}
          rule2 %{regex("[IWE]"):level}%{date("yyyyMMdd HH:mm:ss.SSSSSS"):date}\s+%{integer:pid}\s+%{regex("[^:]+"):file}:%{integer:lineno}]\s+%{data:message.details:array("[]","][")}\s+%{data:message.body}
    - type: status-remapper
      name: Define `level` as the official status of the log
      enabled: true
      sources:
        - level
    - type: message-remapper
      name: Define `message.body` as the official message of the log
      enabled: true
      sources:
        - message.body
    - type: date-remapper
      name: Define `date` as the official date of the log
      enabled: true
      sources:
        - date
117 changes: 117 additions & 0 deletions milvus/assets/logs/milvus_tests.yaml
@@ -0,0 +1,117 @@
id: milvus
tests:
 -
  sample: "[2024/11/27 14:07:51.849 +00:00] [INFO] [datacoord/handler.go:341] [\"channel seek position set from channel checkpoint meta\"] [channel=by-dev-rootcoord-dml_2_453764875273209568v0] [posTs=454221223538458625] [posTime=2024/11/27 14:07:39.421 +00:00]"
  result:
    custom:
      component: "datacoord"
      date: 1732716471849
      file: "handler.go"
      level: "INFO"
      lineno: 341
      message:
        details:
         - "channel=by-dev-rootcoord-dml_2_453764875273209568v0"
         - "posTs=454221223538458625"
         - "posTime=2024/11/27 14:07:39.421 +00:00"
    message: "channel seek position set from channel checkpoint meta"
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1732716471849
 -
  sample: "[2024/11/27 14:07:01.849 +00:00] [INFO] [datacoord/services.go:833] [\"datacoord append channelInfo in GetRecoveryInfo\"] [traceID=ed216b196edf0589f281c4ad800f6565] [collectionID=453764875273209568] [partitionIDs=\"[]\"] [channel=by-dev-rootcoord-dml_2_453764875273209568v0] [\"# of unflushed segments\"=0] [\"# of flushed segments\"=1] [\"# of dropped segments\"=0] [\"# of indexed segments\"=0] [\"# of l0 segments\"=0]"
  result:
    custom:
      component: "datacoord"
      date: 1732716421849
      file: "services.go"
      level: "INFO"
      lineno: 833
      message:
        details:
         - "traceID=ed216b196edf0589f281c4ad800f6565"
         - "collectionID=453764875273209568"
         - "partitionIDs=\"[]\""
         - "channel=by-dev-rootcoord-dml_2_453764875273209568v0"
         - "\"# of unflushed segments\"=0"
         - "\"# of flushed segments\"=1"
         - "\"# of dropped segments\"=0"
         - "\"# of indexed segments\"=0"
         - "\"# of l0 segments\"=0"
    message: "datacoord append channelInfo in GetRecoveryInfo"
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1732716421849
 -
  sample: "[2024/11/27 14:06:51.852 +00:00] [INFO] [datacoord/services.go:818] [\"get recovery info request received\"] [traceID=54cda8d3229d00982db785351a12ea7a] [collectionID=453764875273212700] [partitionIDs=\"[]\"]"
  result:
    custom:
      component: "datacoord"
      date: 1732716411852
      file: "services.go"
      level: "INFO"
      lineno: 818
      message:
        details:
         - "traceID=54cda8d3229d00982db785351a12ea7a"
         - "collectionID=453764875273212700"
         - "partitionIDs=\"[]\""
    message: "get recovery info request received"
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1732716411852
 -
  sample: "[2024/11/18 15:15:45.120 +00:00] [INFO] [roles/roles.go:282] [setupPrometheusHTTPServer]"
  result:
    custom:
      component: "roles"
      date: 1731942945120
      file: "roles.go"
      level: "INFO"
      lineno: 282
    message: "setupPrometheusHTTPServer"
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1731942945120
 -
  sample: "I20241118 15:15:45.622637 26 thread_pool.h:178] [KNOWHERE][InitGlobalBuildThreadPool][milvus] Init global build thread pool with size 7"
  result:
    custom:
      date: 1731942945622
      file: "thread_pool.h"
      level: "I"
      lineno: 178
      message:
        details:
         - "KNOWHERE"
         - "InitGlobalBuildThreadPool"
         - "milvus"
      pid: 26
    message: "Init global build thread pool with size 7"
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1731942945622
 -
  sample: "I20241118 15:15:45.626205 33 MmapChunkManager.cpp:302] [SERVER][MmapChunkManager][milvus] Init MappChunkManager with: Path /var/lib/milvus/data/mmap/mmap_chunk_manager, MaxDiskSize 121418772 MB, FixedFileSize 1 MB."
  result:
    custom:
      date: 1731942945626
      file: "MmapChunkManager.cpp"
      level: "I"
      lineno: 302
      message:
        details:
         - "SERVER"
         - "MmapChunkManager"
         - "milvus"
      pid: 33
    message: "Init MappChunkManager with: Path /var/lib/milvus/data/mmap/mmap_chunk_manager, MaxDiskSize 121418772 MB, FixedFileSize 1 MB."
    status: "info"
    tags:
     - "source:LOGS_SOURCE"
    timestamp: 1731942945626
24 changes: 24 additions & 0 deletions milvus/assets/saved_views/error_logs_overview.json
@@ -0,0 +1,24 @@
{
"name": "Milvus Error Logs Overview",
"type": "logs",
"page": "stream",
"query": "source:milvus -status:(warn OR info)",
"timerange": {
"interval_ms": 3600000
},
"visible_facets": [
"source",
"host",
"service"
],
"options": {
"columns": [
"host",
"service"
],
"show_date_column": true,
"show_message_column": true,
"message_display": "inline",
"show_timeline": true
}
}
24 changes: 24 additions & 0 deletions milvus/assets/saved_views/logs_overview.json
@@ -0,0 +1,24 @@
{
"name": "Milvus Logs Overview",
"type": "logs",
"page": "stream",
"query": "source:milvus",
"timerange": {
"interval_ms": 3600000
},
"visible_facets": [
"source",
"host",
"service"
],
"options": {
"columns": [
"host",
"service"
],
"show_date_column": true,
"show_message_column": true,
"message_display": "inline",
"show_timeline": true
}
}
1 change: 1 addition & 0 deletions milvus/changelog.d/19331.added
@@ -0,0 +1 @@
Add Milvus logs
20 changes: 20 additions & 0 deletions milvus/datadog_checks/milvus/data/conf.yaml.example
@@ -634,3 +634,23 @@ instances:
# - <INCLUDE_REGEX>
# exclude:
# - <EXCLUDE_REGEX>

## Log Section
##
## type - required - Type of log input source (tcp / udp / file / windows_event).
## port / path / channel_path - required - Set port if type is tcp or udp.
##                                         Set path if type is file.
##                                         Set channel_path if type is windows_event.
## source - required - Attribute that defines which integration sent the logs.
## encoding - optional - For file specifies the file encoding. Default is utf-8. Other
##                       possible values are utf-16-le and utf-16-be.
## service - optional - The name of the service that generates the log.
##                      Overrides any `service` defined in the `init_config` section.
## tags - optional - Add tags to the collected logs.
##
## Discover Datadog log collection: https://docs.datadoghq.com/logs/log_collection/
#
# logs:
#   - type: docker
#     source: milvus
#     service: <SERVICE>
8 changes: 8 additions & 0 deletions milvus/manifest.json
@@ -16,6 +16,7 @@
"Supported OS::Windows",
"Supported OS::macOS",
"Category::AI/ML",
"Category::Log Collection",
"Offering::Integration",
"Submitted Data Type::Metrics",
"Submitted Data Type::Logs"
@@ -51,6 +52,13 @@
"DML channel lag": "assets/monitors/dml_channel_lag.json",
"Request latency": "assets/monitors/request_latency.json",
"Index build latency": "assets/monitors/index_build_latency.json"
},
"saved_views": {
"Milvus Logs Overview": "assets/saved_views/logs_overview.json",
"Milvus Error Logs Overview": "assets/saved_views/error_logs_overview.json"
},
"logs": {
"source": "milvus"
}
},
"author": {