Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: add option to enable monitoring stack #2927

Merged
merged 3 commits into from
Sep 27, 2024
Merged

ci: add option to enable monitoring stack #2927

merged 3 commits into from
Sep 27, 2024

Conversation

gartnera
Copy link
Member

@gartnera gartnera commented Sep 26, 2024

Description

Add option to enable monitoring stack in CI. Expose grafana URL via cloudflared. This allows you to easily browse the logs and metrics during a CI run.

Screen.Recording.2024-09-26.at.12.37.09.PM.mov

Summary by CodeRabbit

  • New Features

    • Introduced a new input parameter enable-monitoring for GitHub Actions workflows, allowing dynamic control of monitoring features.
    • Added a new service grafana-cloudflared to enhance monitoring capabilities.
  • Updates

    • Updated image sources for grafana, prometheus, loki, and promtail services to a new repository for improved reliability.

@gartnera gartnera added no-changelog Skip changelog CI check ENABLE_MONITORING Enable monitoring stack in CI E2E runs labels Sep 26, 2024
Copy link
Contributor

coderabbitai bot commented Sep 26, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces significant updates to GitHub Actions workflows and a Docker Compose configuration. It adds a new required input parameter enable-monitoring to the workflows, which allows conditional execution of monitoring features. Additionally, the Docker Compose file updates the image sources for several services to a new repository and introduces a new service for Cloudflare tunneling to Grafana.

Changes

File Change Summary
.github/workflows/e2e.yml Added enable-monitoring input parameter (boolean, required, default: false) and output variable ENABLE_MONITORING. Updated workflow logic to use this parameter.
.github/workflows/reusable-e2e.yml Introduced enable-monitoring input parameter (boolean, required, default: false) with conditional steps based on its value.
contrib/localnet/docker-compose.yml Updated image sources for grafana, prometheus, loki, and promtail to ghcr.io/zeta-chain. Added new service grafana-cloudflared for Cloudflare tunneling. Simplified depends_on for prometheus.

Possibly related PRs

  • chore: upgrade grafana #2906: The changes in this PR involve updates to the grafana service in the docker-compose.yml, which may relate to the monitoring capabilities introduced in the main PR's workflow changes.

Suggested reviewers

  • fbac
  • swift1337
  • skosito
  • kingpinXD
  • ws4charlie
  • brewmaster012
  • lumtis
  • CryptoFewka
  • morde08

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the ci Changes to CI pipeline or github actions label Sep 26, 2024
@gartnera gartnera marked this pull request as ready for review September 26, 2024 19:41
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (8)
.github/workflows/reusable-e2e.yml (2)

96-100: Approve Grafana URL retrieval step with a suggestion for error handling

The step to retrieve the Grafana URL is well-implemented and aligns with the PR objective of exposing the URL through cloudflared. The grep command effectively extracts the URL from the container logs.

To enhance robustness, consider adding error handling in case the URL is not found in the logs:

 - name: Get grafana URL
   if: inputs.enable-monitoring
   run: |
-    docker logs grafana-cloudflared 2>&1 | grep -oP 'https?://\S+\.trycloudflare\.com'
+    url=$(docker logs grafana-cloudflared 2>&1 | grep -oP 'https?://\S+\.trycloudflare\.com' || echo "")
+    if [ -z "$url" ]; then
+      echo "Error: Grafana URL not found in logs"
+      exit 1
+    else
+      echo "Grafana URL: $url"
+      echo "GRAFANA_URL=$url" >> "$GITHUB_ENV"
+    fi

This change adds error handling and stores the URL as an environment variable for potential use in subsequent steps.


Documentation and Related Workflows Require Updates

The verification confirms that the reusable E2E workflow is utilized in .github/workflows/e2e.yml. Additionally, several documentation files and Makefile targets related to E2E testing and monitoring exist and may need updates to accommodate the new monitoring feature.

Action Items:

  • Update Documentation:

    • ./cmd/zetae2e/README.md
    • ./contrib/localnet/README.md
    • All relevant files within the ./docs/ directory, including:
      • ./docs/cli/zetacored/cli.md
      • ./docs/cli/zetatool/filterdeposit.md
      • ./docs/development/DEPLOY_NODES.md
      • ...and others as listed in the script output.
  • Review Related Workflows:

    • .github/workflows/e2e.yml to ensure it correctly references and utilizes the updated reusable E2E workflow.
  • Verify Makefile Targets:

    • Ensure that targets related to E2E testing and monitoring (e2e, test, monitor, etc.) are compatible with the changes in the workflow.
🔗 Analysis chain

Line range hint 1-100: Consider documentation updates and verify impact on related files

The changes effectively implement the optional monitoring stack without disrupting the existing workflow structure. The new steps are logically placed and conditionally executed.

To ensure comprehensive implementation:

  1. Update relevant documentation (e.g., README.md, docs/) to explain the new monitoring feature and how to use it.
  2. Verify if any changes are needed in related files or workflows that might use this reusable workflow.

Run the following script to check for potential impacts and documentation needs:

Please review the script output and update the relevant files as necessary.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for related files and documentation that might need updates

# Check for other workflow files that might use this reusable workflow
echo "Workflows potentially using this reusable workflow:"
grep -R "uses: ./.github/workflows/reusable-e2e.yml" .github/workflows

# Check for potential documentation files that might need updates
echo "\nPotential documentation files to update:"
find . -name "README.md" -o -path "./docs/*"

# Check for Makefile targets related to E2E testing or monitoring
echo "\nRelated Makefile targets:"
grep -n "e2e\|test\|monitor" Makefile

Length of output: 5010

🧰 Tools
🪛 actionlint

90-90: shellcheck reported issue in this script: SC2086:info:1:106: Double quote to prevent globbing and word splitting

(shellcheck)


103-103: shellcheck reported issue in this script: SC2046:warning:3:6: Quote this to prevent word splitting

(shellcheck)

contrib/localnet/docker-compose.yml (5)

316-316: Approve image source change with a minor suggestion.

The modification to use ghcr.io/zeta-chain/grafana-grafana:11.2.0 as the image source is a prudent decision, enhancing control over the deployment pipeline and potentially improving security. However, to further fortify the configuration, consider implementing image digest pinning.

Implement image digest pinning to ensure immutability:

-    image: ghcr.io/zeta-chain/grafana-grafana:11.2.0
+    image: ghcr.io/zeta-chain/grafana-grafana:11.2.0@sha256:<digest>

Replace <digest> with the actual SHA256 digest of the image.


Line range hint 336-348: Approve image source change, but address depends_on modification.

The transition to ghcr.io/zeta-chain/prom-prometheus:v2.53.1 aligns with the strategy applied to the Grafana service, promoting consistency and potentially enhancing security. However, the simplification of the depends_on configuration warrants attention.

The removal of the health check condition in the depends_on configuration may lead to race conditions during service startup. Consider reinstating the health check:

    depends_on:
-      - zetacore0
+      zetacore0:
+        condition: service_healthy

This ensures Prometheus starts only after zetacore0 is fully operational, maintaining system reliability.


379-379: Approve image source change with consistency suggestion.

The modification to use ghcr.io/zeta-chain/grafana-loki:3.1.0 as the image source is consistent with the changes made to other services, enhancing control and potential security. For uniformity across the configuration, consider implementing image digest pinning here as well.

Implement image digest pinning to ensure immutability:

-    image: ghcr.io/zeta-chain/grafana-loki:3.1.0
+    image: ghcr.io/zeta-chain/grafana-loki:3.1.0@sha256:<digest>

Replace <digest> with the actual SHA256 digest of the image.


391-391: Approve image source change with consistency suggestion.

The shift to ghcr.io/zeta-chain/grafana-promtail:2.9.9 as the image source aligns with the changes made to other services, enhancing control and potential security. To maintain consistency across the configuration, consider implementing image digest pinning here as well.

Implement image digest pinning to ensure immutability:

-    image: ghcr.io/zeta-chain/grafana-promtail:2.9.9
+    image: ghcr.io/zeta-chain/grafana-promtail:2.9.9@sha256:<digest>

Replace <digest> with the actual SHA256 digest of the image.


404-421: Approve addition of grafana-cloudflared service with minor suggestion.

The introduction of the grafana-cloudflared service is a commendable enhancement, facilitating secure exposure of the Grafana interface through a Cloudflare tunnel. This addition aligns well with modern security practices by mitigating direct internet exposure of the Grafana port.

For consistency with other services and to ensure immutability, consider implementing image digest pinning:

-    image: ghcr.io/zeta-chain/cloudflare-cloudflared:2024.9.1
+    image: ghcr.io/zeta-chain/cloudflare-cloudflared:2024.9.1@sha256:<digest>

Replace <digest> with the actual SHA256 digest of the image.

Additionally, to enhance the robustness of the service startup process, consider adding a health check:

    healthcheck:
      test: ["CMD", "cloudflared", "version"]
      interval: 30s
      timeout: 10s
      retries: 3

This ensures the service is fully operational before being considered ready.

.github/workflows/e2e.yml (1)

48-48: ENABLE_MONITORING output is correctly implemented, but consider handling all event types.

The new output variable and its setting logic are correctly implemented for pull requests and workflow dispatch events. However, to ensure completeness and consistency, consider handling the ENABLE_MONITORING output for all event types.

Add the following lines to handle other event types:

 } else if (context.eventName === 'merge_group') {
   core.setOutput('DEFAULT_TESTS', true);
   core.setOutput('UPGRADE_LIGHT_TESTS', true);
+  core.setOutput('ENABLE_MONITORING', false);
 } else if (context.eventName === 'push' && context.ref === 'refs/heads/develop') {
   core.setOutput('DEFAULT_TESTS', true);
+  core.setOutput('ENABLE_MONITORING', false);
 } else if (context.eventName === 'push' && context.ref.startsWith('refs/heads/release/')) {
   core.setOutput('DEFAULT_TESTS', true);
   // ... (other outputs)
+  core.setOutput('ENABLE_MONITORING', false);
 } else if (context.eventName === 'schedule') {
   core.setOutput('DEFAULT_TESTS', true);
   // ... (other outputs)
+  core.setOutput('ENABLE_MONITORING', false);
 }

This ensures that the ENABLE_MONITORING output is explicitly set for all event types, maintaining consistency and preventing potential undefined behavior.

Also applies to: 75-75, 120-120

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2892114 and f62386a.

📒 Files selected for processing (3)
  • .github/workflows/e2e.yml (5 hunks)
  • .github/workflows/reusable-e2e.yml (2 hunks)
  • contrib/localnet/docker-compose.yml (6 hunks)
🧰 Additional context used
🪛 actionlint
.github/workflows/reusable-e2e.yml

26-26: input "enable-monitoring" of workflow_call event has the default value "false", but it is also required. if an input is marked as required, its default value will never be used

(events)


90-90: shellcheck reported issue in this script: SC2086:info:1:106: Double quote to prevent globbing and word splitting

(shellcheck)

🔇 Additional comments (2)
.github/workflows/e2e.yml (2)

20-24: Input parameter enable-monitoring is well-defined.

The new input parameter is correctly implemented as a boolean with an appropriate default value of false. This addition aligns with the PR objective of introducing an option to enable the monitoring stack.


175-175: Reusable workflow input enable-monitoring is correctly implemented.

The enable-monitoring input for the reusable workflow is properly set based on the output from the matrix-conditionals job. The boolean conversion ensures that a valid boolean value is passed to the workflow.

.github/workflows/reusable-e2e.yml Outdated Show resolved Hide resolved
.github/workflows/reusable-e2e.yml Show resolved Hide resolved
Copy link
Contributor

@morde08 morde08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@lumtis lumtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks cool

.github/workflows/reusable-e2e.yml Outdated Show resolved Hide resolved
@gartnera gartnera enabled auto-merge September 27, 2024 15:52
@gartnera gartnera added this pull request to the merge queue Sep 27, 2024
Merged via the queue into develop with commit 0e5bca6 Sep 27, 2024
32 checks passed
@gartnera gartnera deleted the monitoring-ci branch September 27, 2024 16:22
morde08 pushed a commit that referenced this pull request Oct 2, 2024
* ci: add option to enable monitoring stack

* start prometheus faster

* update
github-merge-queue bot pushed a commit that referenced this pull request Oct 3, 2024
…guration (#2953)

* update artillery config

* more fixes

* feat: integrate authenticated calls smart contract functionality into protocol (#2904)

* e2e tests and modifications for authenticated call

* extend test with sender check and revert case

* separate tests into separate files

* cleanup

* withdraw and call support and tests

* bump protocol contracts

* split tests into separate files

* small cleanup

* fmt

* generate

* lint

* changelog

* PR  comments

* fix case in proto

* bump vote inbound gas limit in zetaclient

* fix test

* generate

* fixing tests

* call options non empty

* generate

* test fix

* rename gateway caller

* pr comments rename tests

* PR comment

* generate

* tests

* update tests fixes

* tests fixes

* fix

* test fix

* feat!: bank precompile (#2860)

* feat: bank precompile

* feat: add deposit

* feat: extend deposit

* PoC: spend amount on behalf of EOA

* feat: expand deposit with transferFrom

* use CallEVM instead on ZRC20 bindings

* divide the contract into different files

* initialize e2e testing

* remove duplicated funding

* add codecov

* expand e2e

* fix: wait for deposit tx to be mined

* apply first round of reviews

* cover al error types test

* fixes using time.Since

* Include CallContract interface

* fix eth events in deposit precompile method

* emit Deposit event

* add withdraw function

* finalize withdraw

* pack event arguments generically

* add high level event function

* first round of review fixes

* second round of reviews

* create bank account when instantiating bank

* e2e: add good and bad scenarios

* modify fmt

* chore: group input into eventData struct

* docs: document bank's methods

* chore: generate files with suffix .gen.go

* chore: assert errors with errorIs

* chore: reset e2e test by resetting allowance

* test: add first batch of unit test

* test: cover all cases

* test: complete unit test cases

* include review suggestions

* include e2e through contract

* test: add e2e through contract complete

* test: revert balance between tests

* Update precompiles/bank/const.go

Co-authored-by: Lucas Bertrand <lucas.bertrand.22@gmail.com>

* fix: changed coin denom

---------

Co-authored-by: skosito <skostic9242@gmail.com>
Co-authored-by: Lucas Bertrand <lucas.bertrand.22@gmail.com>

* feat: add sender to revert context (#2919)

* e2e tests and modifications for authenticated call

* extend test with sender check and revert case

* separate tests into separate files

* cleanup

* withdraw and call support and tests

* bump protocol contracts

* split tests into separate files

* small cleanup

* fmt

* generate

* lint

* changelog

* PR  comments

* fix case in proto

* bump vote inbound gas limit in zetaclient

* fix test

* generate

* fixing tests

* call options non empty

* generate

* test fix

* rename gateway caller

* pr comments rename tests

* PR comment

* generate

* tests

* add sender in test contract

* extend e2e tests

* generate

* changelog

* PR comment

* generate

* update tests fixes

* tests fixes

* fix

* test fix

* gas limit fixes

* PR comment fix

* fix bad merge

* ci: add option to enable monitoring stack (#2927)

* ci: add option to enable monitoring stack

* start prometheus faster

* update

* ci: add rpcimportable test (#2817)

* ci: add rpcimportable test

* add ci

* fmt

* use github.com/btcsuite/btcd/btcutil in pkg/chains

* remove app imports types tests

* use standalone sdkconfig package

* fix policies test

* move crosschain keeper tests from types to keeper

* never seal config in tests

* use zeta-chain/ethermint#126

* add some comments

* use merged ethermint hash

* show resulting go.mod

* ci: Add SARIF upload to GitHub Security Dashboard (#2929)

* add semgrep sarif upload to GHAS

* added comment to clairfy the usage of the utility script

* use ghcr.io instead

* add tag to image

* bad org name

---------

Co-authored-by: jkan2 <5862123+jkan2@users.noreply.github.com>

* fix: add recover to InitChainer to diplay informative message when starting a node from block 1 (#2925)

* add recover to InitChainer

* generate files

* add docs link to error message

* move InitChainErrorMessage to app.go

* Update app/app.go

Co-authored-by: Francisco de Borja Aranda Castillejo <borja@zetachain.com>

* use const for message

---------

Co-authored-by: Francisco de Borja Aranda Castillejo <borja@zetachain.com>

* test: add wait for block to tss migration test (#2931)

* add wait for block to tss migration test

* add comments

* refactor identifiers

* rename checkNumberOfTssGenerated to checkNumberOfTSSGenerated

* chore: allow full zetaclient config overlay (#2945)

* test(e2e): add gateway upgrade in upgrade test (#2932)

* add gateway upgrade

* change reference

* add v2 setup for all tests

* test v2 in light upgrade

* refactor setup to use custody v2 directly

* chore: improve localnet build performance (#2928)

* chore: improve localnet build performance

* propagate NODE_VERSION and NODE_COMMIT

* update hashes

---------

Co-authored-by: skosito <skostic9242@gmail.com>
Co-authored-by: Francisco de Borja Aranda Castillejo <borja@zetachain.com>
Co-authored-by: Lucas Bertrand <lucas.bertrand.22@gmail.com>
Co-authored-by: Alex Gartner <alexg@zetachain.com>
Co-authored-by: jkan2 <jkan2@users.noreply.github.com>
Co-authored-by: jkan2 <5862123+jkan2@users.noreply.github.com>
Co-authored-by: Tanmay <tanmay@zetachain.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Changes to CI pipeline or github actions ENABLE_MONITORING Enable monitoring stack in CI E2E runs no-changelog Skip changelog CI check
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants