[Prototype] Convert Oracle Performance Datastream to TSDB #4966

Closed
wants to merge 19 commits into from
Changes from 17 commits
5 changes: 5 additions & 0 deletions packages/oracle/changelog.yml
@@ -1,4 +1,9 @@
 # newer versions go on top
+- version: "1.10.1"
+  changes:
+    - description: TSDB prototyping for performance datastream.
+      type: enhancement
+      link: https://github.com/elastic/integrations/pull/4461
 - version: "1.10.0"
   changes:
     - description: Update package to ECS 8.6.0.
1 change: 1 addition & 0 deletions packages/oracle/data_stream/memory/fields/ecs.yml
@@ -4,5 +4,6 @@
   name: ecs.version
 - external: ecs
   name: service.address
+  dimension: true
 - external: ecs
   name: service.type
4 changes: 4 additions & 0 deletions packages/oracle/data_stream/memory/manifest.yml
@@ -21,3 +21,7 @@ streams:
         show_user: false
         default:
           - oracle_memory_metrics
+
+elasticsearch:
+  index_mode: "time_series"
+  source_mode: "synthetic"
@@ -5,10 +5,6 @@ processors:
       field: sql.driver
       ignore_missing: true
       ignore_failure: true
-  - remove:
-      field: sql.query
-      ignore_missing: true
-      ignore_failure: true
   - rename:
       field: sql
       target_field: oracle
@@ -128,6 +124,15 @@ processors:
       target_field: oracle.performance.wait.wait_class
       ignore_missing: true
       ignore_failure: true
+  - fingerprint:
+      fields: ["oracle.query"]
+      target_field: oracle.performance.query_id
+      ignore_failure: true
+      ignore_missing: true
+  - remove:
+      field: oracle.query
+      ignore_missing: true
+      ignore_failure: true
   - foreach:
       field: oracle.performance
       ignore_missing: true
@@ -169,4 +174,4 @@ processors:
 on_failure:
   - set:
       field: error.message
-      value: "{{ _ingest.on_failure_message }}"
+      value: "{{ _ingest.on_failure_message }}"
5 changes: 5 additions & 0 deletions packages/oracle/data_stream/performance/fields/ecs.yml
@@ -1,8 +1,13 @@
+- external: ecs
+  name: host
+- external: ecs
+  name: host.ip
 - external: ecs
   name: ecs.version
 - external: ecs
   name: service.address
+  dimension: true
Collaborator

It seems this is the only field set as a dimension. You had host.name below but commented it out. Can you share a bit of background on why? What happens if this runs under k8s? Are additional dimensions needed?

Member

I wonder whether we should always add a dimension that uniquely identifies the shipper, something like an ephemeral_id, so that two senders cannot write data to the same time series.

Contributor Author (@agithomas, Jan 23, 2023)

service.address holds the DSN value, which uniquely identifies a database connection. An Oracle DSN comprises three parts: the hostname or cluster URL, the port, and the database name.

Since the Oracle integration does not capture host-level metrics, I believe service.address, which identifies a unique DB connection, is sufficient.

I am also thinking about including username as an additional dimension, because a database can have multiple users created under it. If someone configures two users to collect data from the same database (identified by the DSN), leaving out the username may lead to missing series (data). This is not a common use case; it is a safeguard for an exceptional one. Currently the username field has no value, and populating it may need additional changes in the Metricbeat code. I added username as a dimension to handle this exception in the future.
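
To make the concern concrete, here is a minimal Python sketch of how a time series is identified by its dimension values. The `series_key` and `group_by_series` helpers, the DSN, and the usernames are hypothetical and only model the idea; Elasticsearch derives its internal series identifier differently. Under that assumption, two collectors that differ only in username land in the same series and timestamp slot unless username is also a dimension.

```python
from collections import defaultdict


def series_key(doc, dimensions):
    """Build a hashable series identity from the dimension fields present in doc."""
    return tuple((name, doc[name]) for name in sorted(dimensions) if name in doc)


def group_by_series(docs, dimensions):
    """Group documents by (series identity, timestamp) to expose collisions."""
    slots = defaultdict(list)
    for doc in docs:
        slots[(series_key(doc, dimensions), doc["@timestamp"])].append(doc)
    return slots


# Hypothetical sample: two users collecting from the same DSN at the same instant.
docs = [
    {"@timestamp": "2023-01-23T10:00:00Z", "service.address": "db-host:1521/ORCL",
     "oracle.performance.username": "monitor_a", "value": 10},
    {"@timestamp": "2023-01-23T10:00:00Z", "service.address": "db-host:1521/ORCL",
     "oracle.performance.username": "monitor_b", "value": 12},
]

only_address = group_by_series(docs, {"service.address"})
with_username = group_by_series(
    docs, {"service.address", "oracle.performance.username"}
)

# A slot holding more than one document is a collision within one time series.
collisions_without = sum(1 for v in only_address.values() if len(v) > 1)
collisions_with = sum(1 for v in with_username.values() if len(v) > 1)
print(collisions_without, collisions_with)
```

With only `service.address` as a dimension the two documents share one slot; adding the username separates them.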

Collaborator

Thanks for the details. I suggest adding parts of this to the fields.yml as a comment as well, so that anyone who touches this integration in the future and modifies the dimensions fully understands why they were set.

For the username: my understanding is that this applies only if the exact same query is run and ingested at the exact same time. For monitoring purposes, I wonder what the use case is for running the same queries with two different users. For other data, I could see the same query returning different data for users with different permissions. But in the monitoring use case, I would expect the user that runs the query to have access to all the monitoring data?

I see the theoretical reason for adding username, but I don't fully understand yet how it would happen in production.

Contributor Author

[Screenshot: 2023-01-23 at 2:29:09 PM]

I think it would be good to show which fields are dimensions in the "Exported Fields" documentation.

@ruflin, do you agree?

Adding one more column would further limit the space available for descriptions. Instead, the "Type" column could carry values such as "keyword, dimension".

Contributor Author (@agithomas, Jan 23, 2023)

There is one more reason why username is a dimension field: one query returns the session count grouped by username and machine (machine name).

In an N-tier application with multiple web or application servers connecting to the database layer, having the machine name in addition to the username helps.

I would need to add the machine name (machine) as a dimension field as well.

Contributor Author

@felixbarny, if I add the agent's ephemeral_id as a dimension field, wouldn't that create duplicate time series data (for example, when two or more agents receive the same policy)?

Please correct me if I misinterpreted your comment.
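
The trade-off raised here can be sketched as follows. This is an illustration only: `count_series` is a hypothetical helper, the ephemeral IDs are made up, and `agent.ephemeral_id` is assumed to be the field that would carry the shipper identity. It counts how many distinct series result for one database when two agents run the same policy.

```python
def count_series(docs, dimensions):
    """Count distinct time series: one per unique combination of dimension values."""
    return len({tuple(doc.get(d) for d in sorted(dimensions)) for doc in docs})


# Two agents (hypothetical ephemeral IDs) run the same policy against one database.
docs = [
    {"service.address": "db-host:1521/ORCL", "agent.ephemeral_id": "agent-1", "sessions": 40},
    {"service.address": "db-host:1521/ORCL", "agent.ephemeral_id": "agent-2", "sessions": 40},
]

series_without_id = count_series(docs, ["service.address"])
series_with_id = count_series(docs, ["service.address", "agent.ephemeral_id"])
print(series_without_id, series_with_id)
```

Without the shipper dimension both agents write to a single series; with it, the same database appears as two series carrying the same measurements.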

Contributor Author

@ruflin, all metrics in the Oracle data streams are of type gauge.

I tested the visualisations with

  1. time_series_metric = gauge
  2. time_series_metric value not set

I didn't notice any issues.

There is a plan to test against every time_series_metric type, but that may not be possible with the Oracle integration. A different integration would have to be picked, probably one based on Prometheus (e.g. InfluxDB). I will share the details soon.

Collaborator

Good to hear everything just kept working.

One interesting bit to test later would be what happens if TSDB is enabled with gauge and later turned off again. I expect everything to keep working.

I assume that in the test above you first had it not set and then set it? Did you mix both kinds of data, or did you only have one or the other?

Contributor Author

Initially I didn't have the time_series_metric value set, and then I set it to gauge.

When I did the visualisation testing, I had some time series records without time_series_metric set and a few with it set. It didn't show any problems in the Metric, Area chart, and Line chart views.

I will test the scenario you mentioned above and share the outcome.

 - external: ecs
   name: service.type
 - external: ecs
   name: host.name
11 changes: 10 additions & 1 deletion packages/oracle/data_stream/performance/fields/fields.yml
@@ -1,9 +1,16 @@
+- name: oracle.query
+  type: keyword
+  dimension: true
 - name: oracle.performance
   type: group
   release: beta
   fields:
+    - name: query_id
+      type: keyword
+      dimension: true
Member

Note that dimension values are limited to 1024 bytes. IIRC, documents that exceed that limit are rejected. It seems like the raw query can easily exceed it.

Contributor Author

Yes, true. I plan to convert the query to a hash value and use that hash as the dimension field.
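
As a rough illustration of why hashing addresses the size limit, the Python sketch below maps a query of any length to a short, fixed-length identifier. The pipeline change in this PR uses the ingest `fingerprint` processor for that purpose; the `query_id` function and the sample query here are hypothetical and do not reproduce the processor's exact output. SHA-1 with Base64 encoding is an assumption chosen for the example.

```python
import base64
import hashlib

DIMENSION_LIMIT_BYTES = 1024  # limit quoted in the review comment above


def query_id(query: str) -> str:
    """Return a fixed-length identifier for a SQL query (SHA-1, Base64-encoded)."""
    digest = hashlib.sha1(query.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical long monitoring query that exceeds the dimension size limit.
long_query = "SELECT name, value FROM V$SYSSTAT WHERE name IN (" + ", ".join(
    f"'stat_{i}'" for i in range(200)
) + ")"

qid = query_id(long_query)
query_too_long = len(long_query.encode("utf-8")) > DIMENSION_LIMIT_BYTES
id_fits = len(qid.encode("utf-8")) <= DIMENSION_LIMIT_BYTES
print(query_too_long, id_fits, len(qid))
```

The identifier has the same length regardless of the query text, and the same query always yields the same identifier, so it can serve as a stable dimension value while the raw query is dropped.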

Member

Does this data stream contain event-based data, similar to a slow log, which has an entry for each individual slow execution of a query? Or is it a summary of the statistics for each query?

Contributor Author

The queries return summary statistics. Reporting slow-running queries is beyond the scope of the Oracle integration.

     - name: machine
       type: keyword
+      dimension: true
       description: |
         Operating system machine name.
     - name: buffer_pool
@@ -12,6 +19,7 @@
         Name of the buffer pool in the instance.
     - name: username
       type: keyword
+      dimension: true
Collaborator

Why is the username a dimension? If the username changes, should it be a different time series?

Contributor Author

The reason for adding username is explained in the thread above.

       description: |
         Oracle username
     - name: io_reloads
@@ -151,9 +159,10 @@
           unit: s
           description: Amount of time spent in the wait class by the session.
         - name: total_waits
-          type: double
+          type: integer
           metric_type: counter
           description: Number of times waits of the class occurred for the session.
         - name: wait_class
           type: keyword
           description: Every wait event belongs to a class of wait event. Wait classes can be one of the following - Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System IO, User IO
+          dimension: true
4 changes: 4 additions & 0 deletions packages/oracle/data_stream/performance/manifest.yml
@@ -22,3 +22,7 @@ streams:
         show_user: false
         default:
           - oracle_performance
+
+elasticsearch:
+  index_mode: "time_series"
+  source_mode: "synthetic"
@@ -4,5 +4,6 @@
   name: ecs.version
 - external: ecs
   name: service.address
+  dimension: true
 - external: ecs
   name: service.type
1 change: 1 addition & 0 deletions packages/oracle/data_stream/tablespace/fields/ecs.yml
@@ -4,5 +4,6 @@
   name: ecs.version
 - external: ecs
   name: service.address
+  dimension: true
 - external: ecs
   name: service.type
1 change: 1 addition & 0 deletions packages/oracle/data_stream/tablespace/fields/fields.yml
@@ -14,6 +14,7 @@
           description: Tablespace unique identifier.
         - name: name
           type: keyword
+          dimension: true
           description: Filename of the data file
         - name: size
           type: group
6 changes: 5 additions & 1 deletion packages/oracle/docs/README.md
@@ -1293,7 +1293,9 @@ Performance metrics give an overview of where time is spent in the system and en
| ecs.version | ECS version this event conforms to. `ecs.version` is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword | | |
| event.dataset | Event module | constant_keyword | | |
| event.module | Event module | constant_keyword | | |
| host | A host is defined as a general computing instance. ECS host.\* fields should be populated with details about the host on which the event happened, or from which the measurement was taken. Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes. | group | | |
| host.ip | Host ip addresses. | ip | | |
| host.name | Name of the host. It can contain what `hostname` returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. | keyword | | |
| oracle.performance.buffer_pool | Name of the buffer pool in the instance. | keyword | | |
| oracle.performance.cache.buffer.hit.pct | The cache hit ratio of the specified buffer pool. | double | percent | gauge |
| oracle.performance.cache.get.consistent | Consistent gets statistic. | long | | gauge |
@@ -1313,15 +1315,17 @@ Performance metrics give an overview of where time is spent in the system and en
| oracle.performance.lock_requests | Average of the ratio between 'gethits' and 'gets', where 'gethits' the number of times an object's handle was found in memory and 'gets' is the number of times a lock was requested for objects of this namespace. | double | | gauge |
| oracle.performance.machine | Operating system machine name. | keyword | | |
| oracle.performance.pin_requests | Average of all pinhits/pins ratios, where 'PinHits' is the number of times all of the metadata pieces of the library object were found in memory and 'pins' is the number of times a PIN was requested for objects of this namespace. | double | | gauge |
| oracle.performance.query_id | | keyword | | |
| oracle.performance.session_count.active | Total count of sessions. | double | | gauge |
| oracle.performance.session_count.inactive | Total count of Inactive sessions. | double | | gauge |
| oracle.performance.session_count.inactive_morethan_onehr | Total inactive sessions more than one hour. | double | | gauge |
| oracle.performance.username | Oracle username | keyword | | |
| oracle.performance.wait.pct_time | Percentage of time waits that are not Idle wait class. | double | percent | gauge |
| oracle.performance.wait.pct_waits | Percentage of number of pct time waits that are not of Idle wait class. | double | percent | gauge |
| oracle.performance.wait.time_waited_secs | Amount of time spent in the wait class by the session. | double | s | gauge |
| oracle.performance.wait.total_waits | Number of times waits of the class occurred for the session. | double | | counter |
| oracle.performance.wait.total_waits | | integer | | counter |
Collaborator

Seems the description disappeared?

| oracle.performance.wait.wait_class | Every wait event belongs to a class of wait event. Wait classes can be one of the following - Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System IO, User IO | keyword | | |
| oracle.query | | keyword | | |
| service.address | Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets). | keyword | | |
| service.type | The type of the service data is collected from. The type can be used to group and correlate logs and metrics from one service type. Example: If logs or metrics are collected from Elasticsearch, `service.type` would be `elasticsearch`. | keyword | | |

Binary file modified packages/oracle/img/Oracle-memory-dashboard.png
Binary file modified packages/oracle/img/Oracle-performance-dashboard.png
Binary file not shown.
Binary file modified packages/oracle/img/Oracle-tablespace-dashboard.png