[Prototype] Convert Oracle Performance Datastream to TSDB #4966
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,5 +4,6 @@ | |
name: ecs.version | ||
- external: ecs | ||
name: service.address | ||
dimension: true | ||
- external: ecs | ||
name: service.type |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,13 @@ | ||
- external: ecs | ||
name: host | ||
- external: ecs | ||
name: host.ip | ||
- external: ecs | ||
name: ecs.version | ||
- external: ecs | ||
name: service.address | ||
dimension: true | ||
- external: ecs | ||
name: service.type | ||
- external: ecs | ||
name: host.name |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,16 @@ | ||
- name: oracle.query | ||
type: keyword | ||
dimension: true | ||
- name: oracle.performance | ||
type: group | ||
release: beta | ||
fields: | ||
- name: query_id | ||
type: keyword | ||
dimension: true | ||
Note that dimension values are limited to 1024 bytes. IIRC, documents that exceed that limit are rejected. The raw query can easily exceed it.

Yes, true. I plan to convert the query to a hash value and use the hash as the dimension field.

Does this data stream contain event-based data, similar to a slow log, which has an entry for each individual slow execution of a query? Or is it a summary of the statistics for each query?

The queries return summary statistics. Reporting slow-running queries is beyond the scope of the Oracle integration.
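A minimal sketch of the hashing idea from this thread, assuming SHA-256 and a hypothetical helper named `query_id`. The PR does not say which hash function will be used or where in the pipeline the hash is computed, so treat both as assumptions. The point is only that the digest has a fixed length, well under the 1024 limit mentioned above, however long the raw query is.

```python
import hashlib

# Limit quoted in the review thread; assumed here to be measured in bytes.
DIMENSION_LIMIT = 1024


def query_id(query: str) -> str:
    """Return a fixed-length hex digest of the query text (hypothetical helper)."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


# An illustrative long query whose raw text exceeds the limit.
long_query = "SELECT " + ", ".join(f"col_{i}" for i in range(500)) + " FROM v$session"
raw_size = len(long_query.encode("utf-8"))

# The digest is always 64 hex characters, regardless of the input length.
digest = query_id(long_query)
```

The same query text always maps to the same ID, so the hash can stand in for the query as a dimension while the full text stays in a non-dimension field.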
||
- name: machine | ||
type: keyword | ||
dimension: true | ||
description: | | ||
Operating system machine name. | ||
- name: buffer_pool | ||
|
@@ -12,6 +19,7 @@ | |
Name of the buffer pool in the instance. | ||
- name: username | ||
type: keyword | ||
dimension: true | ||
Why is the username a dimension? If the username changes, should it be a different time series?

The reason why username is added is mentioned above.
||
description: | | ||
Oracle username | ||
- name: io_reloads | ||
|
@@ -151,9 +159,10 @@ | |
unit: s | ||
description: Amount of time spent in the wait class by the session. | ||
- name: total_waits | ||
type: double | ||
type: integer | ||
metric_type: counter | ||
description: Number of times waits of the class occurred for the session. | ||
- name: wait_class | ||
type: keyword | ||
description: Every wait event belongs to a class of wait event. Wait classes can be one of the following - Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System IO, User IO | ||
dimension: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,5 +4,6 @@ | |
name: ecs.version | ||
- external: ecs | ||
name: service.address | ||
dimension: true | ||
- external: ecs | ||
name: service.type |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,5 +4,6 @@ | |
name: ecs.version | ||
- external: ecs | ||
name: service.address | ||
dimension: true | ||
- external: ecs | ||
name: service.type |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1293,7 +1293,9 @@ Performance metrics give an overview of where time is spent in the system and en | |
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword | | | | ||
| event.dataset | Event module | constant_keyword | | | | ||
| event.module | Event module | constant_keyword | | | | ||
| host | A host is defined as a general computing instance. ECS host.\* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes. | group | | | | ||
| host.ip | Host ip addresses. | ip | | | | ||
| host.name | Name of the host. It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. | keyword | | | | ||
| oracle.performance.buffer_pool | Name of the buffer pool in the instance. | keyword | | | | ||
| oracle.performance.cache.buffer.hit.pct | The cache hit ratio of the specified buffer pool. | double | percent | gauge | | ||
| oracle.performance.cache.get.consistent | Consistent gets statistic. | long | | gauge | | ||
|
@@ -1313,15 +1315,17 @@ Performance metrics give an overview of where time is spent in the system and en | |
| oracle.performance.lock_requests | Average of the ratio between 'gethits' and 'gets', where 'gethits' the number of times an object's handle was found in memory and 'gets' is the number of times a lock was requested for objects of this namespace. | double | | gauge | | ||
| oracle.performance.machine | Operating system machine name. | keyword | | | | ||
| oracle.performance.pin_requests | Average of all pinhits/pins ratios, where 'PinHits' is the number of times all of the metadata pieces of the library object were found in memory and 'pins' is the number of times a PIN was requested for objects of this namespace. | double | | gauge | | ||
| oracle.performance.query_id | | keyword | | | | ||
| oracle.performance.session_count.active | Total count of sessions. | double | | gauge | | ||
| oracle.performance.session_count.inactive | Total count of Inactive sessions. | double | | gauge | | ||
| oracle.performance.session_count.inactive_morethan_onehr | Total inactive sessions more than one hour. | double | | gauge | | ||
| oracle.performance.username | Oracle username | keyword | | | | ||
| oracle.performance.wait.pct_time | Percentage of time waits that are not Idle wait class. | double | percent | gauge | | ||
| oracle.performance.wait.pct_waits | Percentage of number of pct time waits that are not of Idle wait class. | double | percent | gauge | | ||
| oracle.performance.wait.time_waited_secs | Amount of time spent in the wait class by the session. | double | s | gauge | | ||
| oracle.performance.wait.total_waits | Number of times waits of the class occurred for the session. | double | | counter | | ||
| oracle.performance.wait.total_waits | | integer | | counter | | ||
Seems the description disappeared?
||
| oracle.performance.wait.wait_class | Every wait event belongs to a class of wait event. Wait classes can be one of the following - Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System IO, User IO | keyword | | | | ||
| oracle.query | | keyword | | | | ||
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword | | | | ||
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword | | | | ||
|
||
|
It seems this is the only field set as a dimension. You had host.name below but commented it out. Can you share a bit of background on why? What happens if this runs under k8s? Are additional dimensions needed?
I wonder whether we should always add a dimension that uniquely identifies the shipper, something like an ephemeral_id, to prevent two senders from writing data to the same time series.
`service.address` holds the DSN value, which uniquely identifies a database connection. An Oracle DSN comprises three parts: hostname or cluster URL, port, and database name.
Since the Oracle integration does not capture host-level metrics, `service.address`, which identifies a unique DB connection, should be sufficient, I believe.
I am also thinking of including username as an additional dimension, because a database can have multiple users. If someone configures two users to collect data from the same database (identified by the DSN), omitting the username may lead to missing series (data). This is not a common use case but a safeguard for an exceptional one. Currently the username field has no value, and populating it may require additional changes in the Metricbeat code. I added username as a dimension to handle this case in the future.
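To make the "missing series" concern concrete, here is a simplified model of time series identity, assuming a data point is identified by its dimension values plus its timestamp. It is not Elasticsearch's actual implementation: in this sketch a later duplicate simply replaces the earlier one, and how the real index handles such a duplicate is not modeled. The DSN, usernames, and values are hypothetical.

```python
from typing import Dict, Tuple

# Simplified model: a series is identified by its sorted (dimension, value) pairs.
SeriesKey = Tuple[Tuple[str, str], ...]


def series_key(doc: dict, dimensions: list) -> SeriesKey:
    """Build a hashable key from the dimension fields present in the document."""
    return tuple((name, str(doc[name])) for name in sorted(dimensions) if name in doc)


def ingest(docs: list, dimensions: list) -> Dict[Tuple[SeriesKey, str], dict]:
    """Store documents keyed by (series, timestamp); a later duplicate replaces an earlier one."""
    store = {}
    for doc in docs:
        store[(series_key(doc, dimensions), doc["@timestamp"])] = doc
    return store


# Two collectors, configured with different users, report on the same DSN at the same time.
docs = [
    {"@timestamp": "2023-01-01T00:00:00Z", "service.address": "dbhost:1521/ORCLCDB",
     "oracle.performance.username": "monitor_a", "oracle.performance.session_count.active": 4},
    {"@timestamp": "2023-01-01T00:00:00Z", "service.address": "dbhost:1521/ORCLCDB",
     "oracle.performance.username": "monitor_b", "oracle.performance.session_count.active": 7},
]

only_address = ingest(docs, ["service.address"])
with_username = ingest(docs, ["service.address", "oracle.performance.username"])
```

With `service.address` as the only dimension the two documents share one key, so only one data point survives in this model; adding the username keeps both.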
Thanks for the details. I suggest also adding parts of this to fields.yml as a comment, so that anyone who touches this integration in the future and modifies the dimensions fully understands why they were set.
For the username: my understanding is that this applies if the exact same query is run and ingested at the exact same time. For monitoring purposes, I wonder what the use case is for running the same queries with 2 different users. For other data, I could see the use case where different data is returned for the same query depending on the user's permissions. But in the monitoring use case, I would expect the user that runs the query to have access to all the monitoring data?
I see the theoretical reason for adding username, but I don't fully understand yet how it would happen in production.
I think it would be good to mark the fields that are dimensions in the "Exported Fields" documentation.
@ruflin, do you agree?
Adding one more column would further limit the space available for descriptions. Having the "Type" column show values such as "keyword, dimension" would help.
There is one more reason why username is a dimension field: one query returns the session count grouped by username and machine (machine name).
For an N-tier application with multiple web servers or application servers connecting to the database layer, having the machine name in addition to the username helps.
I would need to add the machine name (machine) as an additional dimension field.
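As a rough illustration of why both grouping columns end up as dimensions, the sketch below counts hypothetical session rows per (username, machine) and emits one metric document per group. The rows, the `session_count` key, and the helper name are assumptions for illustration, not the integration's actual query or output.

```python
from collections import Counter

# Hypothetical session rows: (username, machine, status).
sessions = [
    ("APP_USER", "web-01", "ACTIVE"),
    ("APP_USER", "web-01", "INACTIVE"),
    ("APP_USER", "web-02", "ACTIVE"),
    ("REPORTING", "app-01", "ACTIVE"),
]


def session_counts(rows):
    """Count sessions per (username, machine), mirroring a GROUP BY on both columns."""
    return Counter((username, machine) for username, machine, _status in rows)


counts = session_counts(sessions)

# One metric document per group; both grouping columns become dimension fields.
documents = [
    {
        "oracle.performance.username": username,
        "oracle.performance.machine": machine,
        "session_count": count,
    }
    for (username, machine), count in sorted(counts.items())
]
```

Here three documents are produced for two usernames, so keying on username alone could not tell the `web-01` and `web-02` groups apart.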
@felixbarny, if I add the agent's ephemeral_id as a dimension field, wouldn't that create duplicate time series data (for example, when two or more agents receive the same policy)?
Please correct me if I misinterpreted your comment.
@ruflin, all metric types under the Oracle data streams are of type `gauge`. I tested the visualisation having
I didn't notice any issues.
There is a plan to test against every time_series_metric_type, but that may not be possible with the Oracle integration. A different integration must be picked, probably one based on Prometheus (e.g. InfluxDB). Will share the details soon.
Good to hear everything just kept working.
One interesting bit to test later would be what happens if TSDB is enabled with `gauge` and is later turned off again. I expect everything still keeps working.
For the test you did above, I assume you first had it not set and then set it? Did you mix both kinds of data, or did you only have one or the other?
I didn't have the time_series_metric value set, and then I set it to `gauge`.
When I did the visualisation testing, I had some time series without `time_series_metric` set and a few time series records with `time_series_metric` values set. It didn't show any problems in Metric type, Area Chart and Line chart based views.
I will test the scenario you mentioned above and share the outcome.