
Add RethinkDB integration #5715

Merged: 165 commits merged into master on Mar 27, 2020
Conversation

@florimondmanca (Contributor) commented Feb 12, 2020

What does this PR do?

Add a new integration for RethinkDB.

Work in progress.

Items still TODO:

  • Investigate using QueryManager for default metrics => Created a DocumentQuery helper for querying metrics from JSON DBs
  • Collect "config totals" metrics.
  • Version metadata.
  • Authentication.
  • TLS support.
  • Collect "current issues" metrics.
  • Custom tags.
  • E2E.
  • Write up README.md.
  • Config spec.
  • metadata.csv.
  • manifest.json.
  • service_checks.json.

Later:

  • Logs.
  • OOTB dashboard.

Motivation

Allow users to monitor RethinkDB clusters with Datadog.

Additional Notes

To run the check locally:

# Must use --dev to install dependencies (they're not yet in master)
ddev env start --dev rethinkdb py38-2.3
ddev env check rethinkdb py38-2.3

Metadata generation

# Generate metadata.csv
cat ../architecture/rfcs/agent-integrations/rethinkdb.md | python rfc_md_to_metadata_csv.py > rethinkdb/metadata.csv

# Edit `rethinkdb.py` by wrapping `config.collect_metrics(conn)` in `dump_metrics()`.
# Then run a check to dump metrics.
ddev test -pa="tests/test_rethinkdb.py::test_check" rethinkdb:py38-2.3

# Compare submitted metrics with metadata.csv
python validate_metrics.py rethinkdb/metadata.csv rethinkdb/metrics.csv
Source code for the helper scripts:
  • rfc_md_to_metadata_csv.py: generate contents of metadata.csv from tables in the ### Metrics section of an RFC.
import csv
import io
import sys

import bs4
import httpx

from datadog_checks.dev.tooling.config import load_config
from datadog_checks.dev.tooling.github import get_auth_info


def markdown_to_html(markdown: str) -> str:
    config = load_config()

    url = "https://api.github.com/markdown"
    payload = {"text": markdown}
    auth = get_auth_info(config)

    response = httpx.post(url, json=payload, auth=auth)
    assert response.status_code == 200, response.json()

    return response.text


def extract_metrics_tables(html: str) -> str:
    soup = bs4.BeautifulSoup(html, 'html.parser')

    anchors = soup.select("h3 a[href='#metrics']")
    assert len(anchors) == 1
    h3 = anchors[0].parent

    tables = []

    node = h3

    while True:
        node = node.next_sibling

        if node is None:
            break

        if isinstance(node, bs4.Tag):
            if node.name == 'h3':
                break
            if node.name == 'table':
                tables.append(str(node))

    return '\n'.join(tables)


def html_table_to_csv(html: str) -> str:
    soup = bs4.BeautifulSoup(html, 'html.parser')
    tables = soup.find_all('table')

    fieldnames = None
    output = io.StringIO()
    writer = csv.writer(output)

    for table in tables:
        head = [th.text.lower().replace(' ', '_') for th in table.select('thead tr th')]

        if fieldnames is None:
            fieldnames = head
            writer.writerow(fieldnames)
        else:
            assert head == fieldnames, f"Table headers mismatch: {head} (expected {fieldnames})"

        rows = [[td.text for td in tr.find_all("td")] for tr in table.select("tbody tr")]
        writer.writerows(rows)

    assert fieldnames is not None

    return output.getvalue()


def rfc_csv_to_metadata_csv(text: str, integration: str) -> str:
    reader = csv.DictReader(text.splitlines())
    assert set(reader.fieldnames).issubset({'name', 'type', 'unit', 'per_unit', 'description'}), reader.fieldnames

    output = io.StringIO()
    fields = (
        'metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name'
    ).split(',')
    writer = csv.DictWriter(output, delimiter=',', fieldnames=fields)
    writer.writeheader()

    for row in reader:
        row = {
            'metric_name': row['name'],
            'metric_type': row['type'],
            'interval': '',
            'unit_name': row['unit'],
            'per_unit_name': row.get('per_unit'),
            'description': row['description'],
            'orientation': '0',
            'integration': integration,
            'short_name': row['name'].replace(f'{integration}.', '').capitalize().replace('.', ' ').replace('_', ' ')
        }
        writer.writerow(row)

    return output.getvalue()


def main() -> None:
    text = sys.stdin.read()
    text = markdown_to_html(text)
    text = extract_metrics_tables(text)
    text = html_table_to_csv(text)
    text = rfc_csv_to_metadata_csv(text, integration='rethinkdb')
    print(text)


if __name__ == "__main__":
    main()
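As a sanity check, the per-row mapping performed by `rfc_csv_to_metadata_csv` can be exercised in isolation. The snippet below re-implements just that mapping as a standalone function; the sample metric name is made up for illustration, not taken from the RFC:

```python
# Standalone sketch of the per-row mapping in rfc_csv_to_metadata_csv.
# The sample metric below is illustrative only.
def rfc_row_to_metadata_row(row, integration):
    return {
        'metric_name': row['name'],
        'metric_type': row['type'],
        'interval': '',
        'unit_name': row['unit'],
        'per_unit_name': row.get('per_unit', ''),
        'description': row['description'],
        'orientation': '0',
        'integration': integration,
        # Strip the integration prefix, then turn dots/underscores into spaces.
        'short_name': (
            row['name']
            .replace(f'{integration}.', '')
            .capitalize()
            .replace('.', ' ')
            .replace('_', ' ')
        ),
    }

row = {
    'name': 'rethinkdb.stats.cluster.queries_per_sec',
    'type': 'gauge',
    'unit': 'query',
    'per_unit': 'second',
    'description': 'Queries per second across the cluster.',
}
meta = rfc_row_to_metadata_row(row, 'rethinkdb')
print(meta['short_name'])  # -> Stats cluster queries per sec
```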
  • dump_metrics(): wrap the stream of RethinkDB metrics to write submitted metrics to a CSV file
def dump_metrics(filename, metrics):
    # type: (str, Iterator[Metric]) -> Iterator[Metric]
    import csv

    seen = set()

    with open(filename, 'w') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'type'])
        writer.writeheader()

        for metric in metrics:
            name = metric['name']
            typ = metric['type']  # type: str

            if typ == 'monotonic_count':
                typ = 'count'

            row = {'name': name, 'type': typ}
            key = (name, typ)

            if typ != 'service_check' and key not in seen:
                writer.writerow(row)
                seen.add(key)

            yield metric
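To illustrate how this wrapper is meant to be used: the generator passes every metric through unchanged, so the check behaves normally, while a CSV of unique (name, type) pairs is written as a side effect. A minimal self-contained run, writing to an in-memory buffer instead of a file (metric names are made up):

```python
import csv
import io

# Same logic as dump_metrics above, but writing to an in-memory buffer so
# the sketch is self-contained. Sample metric names are illustrative.
def dump_metrics_to(buf, metrics):
    seen = set()
    writer = csv.DictWriter(buf, fieldnames=['name', 'type'])
    writer.writeheader()
    for metric in metrics:
        name, typ = metric['name'], metric['type']
        if typ == 'monotonic_count':
            typ = 'count'  # metadata.csv calls monotonic counts 'count'
        if typ != 'service_check' and (name, typ) not in seen:
            writer.writerow({'name': name, 'type': typ})
            seen.add((name, typ))
        yield metric  # pass every metric through unchanged

stream = [
    {'name': 'rethinkdb.connected_servers', 'type': 'gauge'},
    {'name': 'rethinkdb.queries.total', 'type': 'monotonic_count'},
    {'name': 'rethinkdb.connected_servers', 'type': 'gauge'},  # duplicate: not re-written
]
buf = io.StringIO()
consumed = list(dump_metrics_to(buf, iter(stream)))  # the check still sees all 3 metrics
rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
```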
  • validate_metrics.py: compare metadata.csv with a dump of metrics generated when running a check
import csv
import sys


if __name__ == "__main__":
    metadata_dot_csv, metrics_dot_csv = sys.argv[1:3]

    with open(metadata_dot_csv) as f:
        reader = csv.DictReader(f)
        metadata_metrics = {(row['metric_name'], row['metric_type']) for row in reader}

    with open(metrics_dot_csv) as f:
        reader = csv.DictReader(f)
        metrics = {(row['name'], row['type']) for row in reader}

    assert metrics.issubset(metadata_metrics), metrics - metadata_metrics
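The invariant this script enforces is simply a set-subset check: every (name, type) pair submitted by the check must be declared in metadata.csv. A minimal in-memory sketch of that check (sample metric names are illustrative):

```python
# Minimal sketch of the invariant validate_metrics.py enforces: every
# (name, type) pair submitted by the check must be declared in metadata.csv.
def undeclared_metrics(metadata_pairs, submitted_pairs):
    """Return submitted pairs missing from metadata.csv."""
    return set(submitted_pairs) - set(metadata_pairs)

metadata = {
    ('rethinkdb.config.servers', 'gauge'),
    ('rethinkdb.stats.cluster.queries_per_sec', 'gauge'),
}
submitted = {('rethinkdb.config.servers', 'gauge')}

assert not undeclared_metrics(metadata, submitted)  # all declared: validation passes
missing = undeclared_metrics(metadata, submitted | {('rethinkdb.oops', 'count')})
```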

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@codecov bot commented Feb 12, 2020

Codecov Report

Merging #5715 into master will not change coverage.
The diff coverage is n/a.

* Modifiers -> Transformers
* Add docs on `DocumentQuery` parameters and usage.
* Add and test an example script for `DocumentQuery`.
* Drop hard requirement for a logger on `query.run()`.
* Drop trace logs (too noisy to be debug logs).
@florimondmanca (Contributor, Author) commented:
@AlexandreYang Thanks, addressed your feedback, see details in 3b65f86 :-)

@AlexandreYang (Member) previously approved these changes Mar 26, 2020, commenting:
LGTM 👍, thx for all the changes :)

@ofek (Contributor) commented Mar 26, 2020
ofek commented Mar 26, 2020

lgtm but tests failing

AlexandreYang previously approved these changes Mar 26, 2020
l0k0ms
l0k0ms previously approved these changes Mar 27, 2020
@florimondmanca florimondmanca merged commit c38938d into master Mar 27, 2020
@florimondmanca florimondmanca deleted the florimondmanca/rethinkdb branch March 27, 2020 15:29
@florimondmanca florimondmanca mentioned this pull request Dec 4, 2020