Add IBM i integration (#9992)
* Add integration for IBM i

Co-Authored-By: Pablo Baeyens <pbaeyens31+github@gmail.com>

* Fix QueryManager init

* Add CPU usage & job status queries

* Add subsystem info

* Change log to warning

* Add service check
We report the check as OK if the connection is successful,
and CRITICAL if the connection fails.

* Fix style

* Rename disk usage

* Add logic to get hostname from the remote system

* Fix ibmi.job.cpu_usage, add ibmi.job.active

* Add ibmi.subsystem.active metric

* Add memory metrics

* Add check duration debug-level log

* Coherent tag names

* Add job queue queries

* ASP metadata

* Only run subsystem queries on 7.3+

* Reset connection when there is an error while executing queries

* Fix linter errors

* Refactor query manager initialization

* Use NamedTuple for system info

* Start adding unit tests

* Test fixes

* Fix remaining __delete_connection calls

* Add system message queues info

* Add check duration metric

* Update message queue query

* Refactor message queue queries

* Update job cpu metrics: better calculation, split job name

* Fix style issues

* Fetch system info tests

* Add unit tests for some failure conditions

* Update metadata.csv

* Rename prefix to ibm_i.

* Add complete metadata

* Fix error messages

* Add IBM MQ queue size metric

* Add feature flag for IBM MQ metrics

* Fix connection string

* Update active job query

* Fix severity comparison

* Linting

* Add per-job status in JOBQ

* Fix job.active tagging regression for active jobs

* Add disk usage metrics for 7.3+

* Update tests with the correct number of queries

* Make severity level threshold configurable

* Change ibm_i.job.active to ibm_i.job.status

* Coherent tag names

* Remove superfluous and costly JobsInJobQueueInfo query

* Work around timeout issues in IBM i driver (#9720)

* Add 'timeout' field to QueryManager queries

* Run query on a subprocess to handle timeouts

(This is a squashed commit, see original commits on kserrania/IBM-i-query-script-backup)

Co-Authored-By: Kylian Serrania <kylian.serrania@datadoghq.com>

Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>

* Fix unit tests

Run `ddev validate config -s ibm_i`

* Report active and jobq durations

* Fix disk query on IBM i 7.2

* Add metadata for new metrics

* Fix validation issues

`ddev validate models -s ibm_i`
Fix remaining validation issues

* Add Agent Platform as CODEOWNERS

* Remove TODO for sys.executable

* Remove `ibm_i.check.duration` metric

* Remove `ibm_i.ibm_mq.size`

* Document service check

* Document the IBM i ODBC driver set up

* Increase test coverage

* Review timeout values

Co-authored-by: Kylian Serrania <kylian.serrania@datadoghq.com>

* Apply suggestions from code review

Co-authored-by: Kylian Serrania <kylian.serrania@datadoghq.com>

* Use OpenAPI minimum/maximum for configuration validation

* [docs/architecture] Document IBM i check limitations

* Apply suggestions from docs review

Co-authored-by: ruthnaebeck <19349244+ruthnaebeck@users.noreply.github.com>

* Address remaining comments from docs review
Run `ddev validate config -s ibm_i`

Co-authored-by: Ofek Lev <ofekmeister@gmail.com>
Co-authored-by: Kylian Serrania <kylian.serrania@datadoghq.com>
Co-authored-by: ruthnaebeck <19349244+ruthnaebeck@users.noreply.github.com>
(cherry picked from commit 0fa9e60)
mx-psi committed Oct 12, 2021
1 parent 402598e commit df79cef
Showing 32 changed files with 1,494 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .azure-pipelines/templates/test-all-checks.yml
@@ -204,6 +204,9 @@ jobs:
- checkName: ibm_db2
displayName: IBM Db2
os: linux
- checkName: ibm_i
displayName: IBM i
os: linux
- checkName: ibm_mq
displayName: IBM MQ
os: linux
9 changes: 9 additions & 0 deletions .codecov.yml
@@ -226,6 +226,10 @@ coverage:
target: 75
flags:
- ibm_was
IBM_i:
target: 75
flags:
- ibm_i
IIS:
target: 75
flags:
@@ -818,6 +822,11 @@ flags:
paths:
- ibm_db2/datadog_checks/ibm_db2
- ibm_db2/tests
ibm_i:
carryforward: true
paths:
- ibm_i/datadog_checks/ibm_i
- ibm_i/tests
ibm_mq:
carryforward: true
paths:
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -90,6 +90,8 @@ assets/ @DataDog/agent-integrations
/process/ @DataDog/agent-integrations @DataDog/agent-platform
/otel/ @DataDog/agent-integrations @DataDog/agent-platform
/nvidia_jetson/ @DataDog/agent-integrations @DataDog/agent-platform
/ibm_i/ @DataDog/agent-integrations @DataDog/agent-platform
/ibm_i/*.md @DataDog/agent-integrations @DataDog/agent-platform @DataDog/documentation


# Database monitoring
36 changes: 36 additions & 0 deletions docs/developer/architecture/ibm_i.md
@@ -0,0 +1,36 @@
# IBM i

!!! note
This section is meant for developers who want to understand how the IBM i integration works.

## Overview

The IBM i integration uses [ODBC][1] to connect to IBM i hosts and
query system data through an SQL interface. To do so, it uses the [*ODBC Driver for IBM i Access Client Solutions*][2], a proprietary IBM [ODBC driver][3] that manages connections to IBM i hosts.

Limitations in the IBM i ODBC driver make it necessary to structure the check in a more complex way than would otherwise be expected, to keep the check from hanging or leaking threads.

### IBM i ODBC driver limitations

ODBC drivers can optionally support custom configuration through *connection attributes*, which help configure how a connection works.
One fundamental connection attribute is `SQL_ATTR_QUERY_TIMEOUT`, which sets the timeout for SQL queries made through the driver. Related `_TIMEOUT` attributes set the timeout for other connection steps.
If this connection attribute is not set, there is no timeout, which means the driver gets stuck waiting for a reply when a network issue happens.

As of this writing, the behavior of the IBM i ODBC driver when the `SQL_ATTR_QUERY_TIMEOUT` connection attribute is set is similar to the one described in [ODBC Query Timeout Property][4] for the IBM i DB2 driver: the driver estimates the running time of a query and preemptively aborts the query if the estimate is above the specified threshold, but it does not take into account the actual running time of the query. The attribute is therefore not useful for guarding against network issues.

### IBM i check workaround

To deal with the ODBC driver limitations, the IBM i check needs an alternative way to abort a query once a given timeout has passed.
To do so, the IBM i check runs queries in a subprocess, which it kills and restarts when a query exceeds its timeout. This subprocess runs [`query_script.py`][5] using the embedded Python interpreter.

It is essential that the connection is kept across queries. For a given connection, `ELAPSED_` columns on IBM i views report statistics since the last time the table was queried on that connection, so if each query used a different connection these values would always be zero.

To communicate with the main Agent process, the subprocess and the IBM i check exchange JSON-encoded messages through pipes until the special `ENDOFQUERY` message is received. Special care is needed to avoid blocking on reads and writes of the pipes.

For adding/modifying the queries, the check uses the standard `QueryManager` class used for SQL-based checks, except that each query needs to include a timeout value (since, empirically, some queries take much longer to complete on IBM i hosts).
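
The subprocess workaround described above can be illustrated with a minimal sketch. It is not the code of `query_script.py` or of the check itself: the `QueryWorker` class, the `CHILD_SOURCE` stand-in script, and the `SLOW` query are hypothetical names made up for this example, and a reader thread plus a queue is only one possible way to avoid blocking on pipe reads. The sketch keeps one long-lived child process, reads JSON-encoded rows until `ENDOFQUERY`, and kills and restarts the child when the timeout expires.

```python
import json
import queue
import subprocess
import sys
import threading
import time

# Hypothetical stand-in for query_script.py: reads one query per line on stdin,
# writes one JSON-encoded row to stdout, then the ENDOFQUERY marker.
CHILD_SOURCE = r'''
import json, sys, time
for line in sys.stdin:
    query = line.strip()
    if query == "SLOW":
        time.sleep(60)  # simulates a query stuck on a network issue
    print(json.dumps([query, 1]), flush=True)
    print("ENDOFQUERY", flush=True)
'''


def _pump(stream, sink):
    # Drain the child's stdout into a queue so the caller never blocks on the pipe.
    for line in stream:
        sink.put(line.rstrip("\n"))


class QueryWorker:
    """Hypothetical helper that owns one long-lived query subprocess."""

    def __init__(self):
        self._start()

    def _start(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        self.lines = queue.Queue()
        self.reader = threading.Thread(
            target=_pump, args=(self.proc.stdout, self.lines), daemon=True
        )
        self.reader.start()

    def stop(self):
        self.proc.kill()
        self.proc.wait()
        self.reader.join(timeout=5)  # the reader sees EOF once the child is gone
        self.proc.stdin.close()
        self.proc.stdout.close()

    def run(self, query, timeout):
        """Send one query and collect rows until ENDOFQUERY or the deadline."""
        self.proc.stdin.write(query + "\n")
        self.proc.stdin.flush()
        deadline = time.monotonic() + timeout
        rows = []
        while True:
            remaining = max(deadline - time.monotonic(), 0.0)
            try:
                line = self.lines.get(timeout=remaining)
            except queue.Empty:
                # Timeout: kill the stuck child and start a fresh one.
                self.stop()
                self._start()
                raise TimeoutError("query exceeded %s seconds" % timeout) from None
            if line == "ENDOFQUERY":
                return rows
            rows.append(json.loads(line))
```

In this sketch a restart discards the child's state, which in the real check would mean opening a new ODBC connection and losing the `ELAPSED_` baselines for the next run.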

[1]: https://en.wikipedia.org/wiki/Open_Database_Connectivity
[2]: https://www.ibm.com/support/pages/odbc-driver-ibm-i-access-client-solutions
[3]: https://en.wikipedia.org/wiki/Open_Database_Connectivity#Drivers
[4]: https://www.ibm.com/support/pages/odbc-query-timeout-property-sql0666-estimated-query-processing-time-exceeds-limit
[5]: https://github.com/DataDog/integrations-core/blob/master/ibm_i/datadog_checks/ibm_i/query_script.py
2 changes: 2 additions & 0 deletions ibm_i/CHANGELOG.md
@@ -0,0 +1,2 @@
# CHANGELOG - IBM i

10 changes: 10 additions & 0 deletions ibm_i/MANIFEST.in
@@ -0,0 +1,10 @@
graft datadog_checks
graft tests

include MANIFEST.in
include README.md
include requirements.in
include requirements-dev.txt
include manifest.json

global-exclude *.py[cod] __pycache__
84 changes: 84 additions & 0 deletions ibm_i/README.md
@@ -0,0 +1,84 @@
# Agent Check: IBM i

## Overview

This check monitors [IBM i][1] remotely through the Datadog Agent.

## Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates][2] for guidance on applying these instructions.

### Installation

The IBM i check is included in the [Datadog Agent][2] package.
No additional installation is needed on your server.

#### ODBC driver

The IBM i check uses the IBM i ODBC driver to connect remotely to the IBM i host.

Download the driver from the [IBM i Access - Client Solutions][9] page. Click on `Downloads for IBM i Access Client Solutions` and log in to gain access to the downloads page.

Choose the `ACS App Pkg` package for your platform, such as `ACS Linux App Pkg` for Linux hosts. Download the package and follow the installation instructions to install the driver.

### Configuration

The IBM i check queries an IBM i system remotely from a host running the Datadog Agent. To communicate with the IBM i system, you need to set up the IBM i ODBC driver on the host running the Datadog Agent.

#### ODBC driver

Once the ODBC driver is installed, find the ODBC configuration files: `odbc.ini` and `odbcinst.ini`. The location may vary depending on your system. On Linux they may be located in the `/etc` directory or in the `/etc/unixODBC` directory.

Copy these configuration files to the embedded Agent environment, such as `/opt/datadog-agent/embedded/etc/` on Linux hosts.

The `odbcinst.ini` file defines the available ODBC drivers for the Agent. Each section defines one driver. For instance, the following section defines a driver named `IBM i Access ODBC Driver 64-bit`:
```
[IBM i Access ODBC Driver 64-bit]
Description=IBM i Access for Linux 64-bit ODBC Driver
Driver=/opt/ibm/iaccess/lib64/libcwbodbc.so
Setup=/opt/ibm/iaccess/lib64/libcwbodbcs.so
Threading=0
DontDLClose=1
UsageCount=1
```

The name of the IBM i ODBC driver is needed to configure the IBM i check.
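
As a quick way to list the driver names defined in `odbcinst.ini`, the file can be read with Python's standard `configparser` module, since each section name is a driver name. This is only a sketch: the `list_odbc_drivers` function and the `SAMPLE_ODBCINST` constant are hypothetical names for this example, and the sketch assumes the file follows plain INI syntax as in the section shown above.

```python
import configparser

# Sample content, copied from the section shown above.
SAMPLE_ODBCINST = """\
[IBM i Access ODBC Driver 64-bit]
Description=IBM i Access for Linux 64-bit ODBC Driver
Driver=/opt/ibm/iaccess/lib64/libcwbodbc.so
Setup=/opt/ibm/iaccess/lib64/libcwbodbcs.so
Threading=0
DontDLClose=1
UsageCount=1
"""


def list_odbc_drivers(ini_text):
    """Map each driver name (section name) to its shared library path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Option names are case-insensitive in configparser, so "driver" matches "Driver".
    return {
        name: parser[name]["driver"]
        for name in parser.sections()
        if "driver" in parser[name]
    }


drivers = list_odbc_drivers(SAMPLE_ODBCINST)
for name, library in drivers.items():
    print(name, "->", library)
```

To inspect a real file, pass its contents instead of the sample, for example the text of the `odbcinst.ini` copied into the embedded Agent environment.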

#### IBM i check

1. Edit the `ibm_i.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory, to start collecting your IBM i performance data. See the [sample ibm_i.d/conf.yaml][3] for all available configuration options.
Use the driver name from the `odbcinst.ini` file.

2. [Restart the Agent][4].
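
To illustrate how the instance options relate to each other, the following sketch assembles an ODBC connection string from `driver`, `system`, `username`, and `password`, and lets a raw `connection_string` take precedence over them. The `build_connection_string` function is hypothetical and the exact string the check builds may differ; the sketch assumes the common `Driver`, `System`, `UID`, and `PWD` ODBC keywords, and the host name and credentials are placeholders.

```python
def build_connection_string(instance):
    """Return an ODBC connection string for one check instance (illustrative only)."""
    raw = instance.get("connection_string")
    if raw:
        # A raw connection string overrides all the other connection options.
        return raw

    parts = [
        ("Driver", "{%s}" % instance["driver"]),  # braces protect spaces in the name
        ("System", instance["system"]),
    ]
    if instance.get("username"):
        parts.append(("UID", instance["username"]))
    if instance.get("password"):
        parts.append(("PWD", instance["password"]))
    return ";".join("%s=%s" % (key, value) for key, value in parts)


# Placeholder values; the driver name must match a section in odbcinst.ini.
example_instance = {
    "driver": "IBM i Access ODBC Driver 64-bit",
    "system": "ibmi.example.com",
    "username": "datadog",
    "password": "secret",
}
print(build_connection_string(example_instance))
```

Wrapping the driver name in braces keeps a name that contains spaces in a single `Driver` value.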

### Validation

[Run the Agent's status subcommand][5] and look for `ibm_i` under the Checks section.

## Data Collected

### Metrics

See [metadata.csv][6] for a list of metrics provided by this check.

### Service Checks

See [service_checks.json][8] for a list of service checks provided by this integration.

### Events

The IBM i check does not include any events.

## Troubleshooting

Need help? Contact [Datadog support][7].

[1]: https://www.ibm.com/it-infrastructure/power/os/ibm-i
[2]: https://docs.datadoghq.com/agent/kubernetes/integrations/
[3]: https://github.com/DataDog/integrations-core/blob/master/ibm_i/datadog_checks/ibm_i/data/conf.yaml.example
[4]: https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent
[5]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[6]: https://github.com/DataDog/integrations-core/blob/master/ibm_i/metadata.csv
[7]: https://docs.datadoghq.com/help/
[8]: https://github.com/DataDog/integrations-core/blob/master/ibm_i/datadog_checks/ibm_i/assets/service_checks.json
[9]: https://www.ibm.com/support/pages/ibm-i-access-client-solutions
68 changes: 68 additions & 0 deletions ibm_i/assets/configuration/spec.yaml
@@ -0,0 +1,68 @@
name: IBM i
files:
- name: ibm_i.yaml
options:
- template: init_config
options:
- template: init_config/default
- template: instances
options:
- name: system
description: |
The name of the IBM i system.
value:
type: string
- name: username
description: |
The user profile name used to authenticate to the system.
value:
type: string
- name: password
description: |
The user profile password used to authenticate to the system.
value:
type: string
- name: driver
description: |
The name of the ODBC driver used to connect to the system.
value:
type: string
example: iSeries Access ODBC Driver
- name: connection_string
description: |
The raw connection string used to connect to the system, ignoring all of the above options.
value:
type: string
- name: severity_threshold
description: |
The minimum severity level for a message to be considered 'critical' (see ibm_i.message_queue.critical_size).
value:
type: integer
minimum: 0
maximum: 99
example: 50
- name: job_query_timeout
description: |
The timeout (in seconds) applied to queries on job views (ACTIVE_JOB_INFO, JOB_INFO) made on the IBM i system.
value:
type: integer
minimum: 0
exclusiveMinimum: true
example: 240
- name: system_mq_query_timeout
description: |
The timeout (in seconds) applied to queries on message queue views (MESSAGE_QUEUE_INFO) made on the IBM i system.
value:
type: integer
minimum: 0
exclusiveMinimum: true
example: 80
- name: query_timeout
description: |
The timeout (in seconds) applied to queries made on the IBM i system.
value:
type: integer
minimum: 0
exclusiveMinimum: true
example: 30
- template: instances/default
Empty file.
16 changes: 16 additions & 0 deletions ibm_i/assets/service_checks.json
@@ -0,0 +1,16 @@
[
{
"agent_version": "7.32.0",
"integration": "IBM i",
"groups": [
"host"
],
"check": "ibm_i.can_connect",
"statuses": [
"ok",
"critical"
],
"name": "Can Connect",
"description": "Returns `CRITICAL` if the Agent is unable to connect and collect metrics from the monitored IBM i instance, otherwise returns `OK`."
}
]
4 changes: 4 additions & 0 deletions ibm_i/datadog_checks/__init__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2021-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__path__ = __import__('pkgutil').extend_path(__path__, __name__) # type: ignore
4 changes: 4 additions & 0 deletions ibm_i/datadog_checks/ibm_i/__about__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2021-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__version__ = '0.0.1'
7 changes: 7 additions & 0 deletions ibm_i/datadog_checks/ibm_i/__init__.py
@@ -0,0 +1,7 @@
# (C) Datadog, Inc. 2021-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
from .__about__ import __version__
from .check import IbmICheck

__all__ = ['__version__', 'IbmICheck']