Prometheus grafana & add cirun.io #733

Merged: 35 commits, Aug 13, 2021
f48f364
add grafana traefik route
balast Jun 29, 2021
8144ee0
grafana working
balast Jul 14, 2021
6677c3e
prometheus-helm-chart-working
balast Jul 14, 2021
62b6be4
initial integration - wip
balast Jul 14, 2021
f200ab3
add external-url variable
balast Jul 14, 2021
1529ca9
add external-url variable
balast Jul 14, 2021
29e83a1
add tls var
balast Jul 14, 2021
43fbbc4
add tls var
balast Jul 14, 2021
d1d21bc
add tls var
balast Jul 14, 2021
dc23e26
merge with main
balast Jul 14, 2021
06c7cac
cluster monitoring docs
balast Jul 14, 2021
ddb28e9
fix debug change
balast Jul 14, 2021
6e9e4d4
fix formatting, delete ingress
balast Jul 14, 2021
b9b0eb5
add monitoring by default, fix routing service name
balast Jul 14, 2021
169766d
terraform format
balast Jul 14, 2021
f23e99d
Update monitoring instructions
Adam-D-Lewis Jul 14, 2021
c32074d
don't include helm chart in repo
balast Jul 16, 2021
9da48ee
Merge branch 'prometheus_grafana' of github.com:Quansight/qhub into p…
balast Jul 16, 2021
47be777
terraform format
balast Jul 16, 2021
a851719
terraform format
balast Jul 16, 2021
d08db3f
add the values file back
balast Jul 16, 2021
689ca2a
remove values files
Aug 2, 2021
6d25c36
terraform fmt
Aug 2, 2021
f30c0bf
terraform fmt
Aug 2, 2021
5130237
Merge branch 'main' into prometheus_grafana
Aug 2, 2021
59b0ef0
Merge remote-tracking branch 'origin/main' into prometheus_grafana
Aug 10, 2021
1bf1d19
up minikube memory
Aug 10, 2021
120321e
set CI minikube memory to 6500mb
Aug 10, 2021
6300040
move kubernetes tests to new file
Aug 12, 2021
844aa52
use self-hosted action runner (cirun.io)
Aug 12, 2021
7cb1247
add .cirun.yml
Aug 12, 2021
063ce2a
Misc fixes
aktech Aug 12, 2021
439ea44
Install cypress after k8s tests
aktech Aug 13, 2021
4b8ad35
use cheapest acceptable DO droplet
Aug 13, 2021
5d0e799
add release notes
Aug 13, 2021
15 changes: 15 additions & 0 deletions .cirun.yml
@@ -0,0 +1,15 @@
# Self-hosted GitHub Actions runners on DigitalOcean via Cirun.io
# Reference: https://docs.cirun.io/reference/yaml.html
runners:
  - name: run-k8s-tests
    # Cloud provider: DigitalOcean
    cloud: digitalocean
    # Cheapest acceptable droplet on DigitalOcean
    instance_type: s-4vcpu-8gb
    # Ubuntu 20.04 image
    machine_image: docker-20-04
    region: nyc1
    # Path of the relevant workflow file
    workflow: .github/workflows/kubernetes_test.yaml
    # Number of runners to provision for each triggered Actions job
    count: 1
150 changes: 150 additions & 0 deletions .github/workflows/kubernetes_test.yaml
@@ -0,0 +1,150 @@
name: "Kubernetes Tests"

on:
  pull_request: # Workflow only runs for PR against main anyway
  push:
    branches:
      - '**'
    tags:
      - 'v*'
    paths-ignore:
      - "docs/**"
      - "*.md"
jobs:
  test-kubernetes:
    name: "Kubernetes Tests"
    runs-on: self-hosted
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: 'QHUB_GH_BRANCH set for PR'
        if: ${{ github.event_name == 'pull_request' }}
        run: |
          echo "QHUB_GH_BRANCH=${GITHUB_HEAD_REF}" >> $GITHUB_ENV
          echo "GITHUB_BASE_REF: ${GITHUB_BASE_REF}"
          echo "GITHUB_HEAD_REF: ${GITHUB_HEAD_REF}"
          echo "GITHUB_REF: ${GITHUB_REF}"
      - name: 'QHUB_GH_BRANCH set for a branch (not a tag)'
        if: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/heads/') }}
        # e.g. QHUB_GH_BRANCH="main"
        run: |
          echo "QHUB_GH_BRANCH=${GITHUB_REF:11}" >> $GITHUB_ENV

      - name: 'Checkout Infrastructure'
        uses: actions/checkout@main
      - name: Set up Python
        uses: conda-incubator/setup-miniconda@v2
        env:
          CONDA: /home/runnerx/miniconda3
        with:
          python-version: 3.8
          miniconda-version: "latest"
      - name: Install QHub
        run: |
          conda install -c anaconda pip
          pip install .[dev]

      - name: Download and Install Minikube and Kubectl
        run: |
          mkdir -p bin
          pushd bin
          curl -L https://github.com/kubernetes/minikube/releases/download/v1.22.0/minikube-linux-amd64 -o minikube
          chmod +x minikube

          curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.19.0/bin/linux/amd64/kubectl
          chmod +x kubectl

          echo "$PWD" >> $GITHUB_PATH
          popd
      - name: Start Minikube
        run: |
          minikube start --kubernetes-version=1.19.4 --driver=docker --cpus 2 --memory 6500 --wait=all
      - name: Versions
        run: |
          minikube version
          kubectl version
      - name: Add nfs client to kubernetes docker node
        run: |
          minikube ssh "sudo apt update; sudo apt install nfs-common -y"
      - name: Get routing table for docker pods
        run: |
          ip route
      - name: Configure LoadBalancer IPs
        run: |
          python tests/scripts/minikube-loadbalancer-ip.py
      - name: Enable Minikube metallb
        run: |
          minikube addons enable metallb
      - name: Basic kubectl checks before deployment
        run: |
          kubectl get all,cm,secret,ing -A
      - name: Initialize QHub Cloud
        run: |
          mkdir -p local-deployment
          cd local-deployment
          qhub init local --project=thisisatest --domain github-actions.qhub.dev --auth-provider=password

          # Need smaller profiles on Minikube
          sed -i -E 's/(cpu_guarantee):\s+[0-9\.]+/\1: 0.25/g' "qhub-config.yaml"
          sed -i -E 's/(mem_guarantee):\s+[A-Za-z0-9\.]+/\1: 0.25G/g' "qhub-config.yaml"

          cat qhub-config.yaml
      - name: Deploy QHub Cloud
        run: |
          cd local-deployment
          qhub deploy --config qhub-config.yaml --disable-prompt
      - name: Basic kubectl checks after deployment
        run: |
          kubectl get all,cm,secret,ing -A
      - name: Check github-actions.qhub.dev resolves
        run: |
          nslookup github-actions.qhub.dev
      - name: Curl jupyterhub login page
        run: |
          curl -k https://github-actions.qhub.dev/hub/home -i

      ### CYPRESS TESTS
      - name: Setup Node
        uses: actions/setup-node@v2
        with:
          node-version: '14'
      - name: npm version
        run: |
          npm --version
      - name: Install Cypress dependencies
        run: |
          sudo apt-get -y update
          sudo apt-get install -y libgtk2.0-0 libgtk-3-0 libgbm-dev libnotify-dev libgconf-2-4 libnss3 libxss1 libasound2 libxtst6 xauth xvfb

      - name: Read example-user password
        run: python -c "import tempfile, os; print('CYPRESS_EXAMPLE_USER_PASSWORD='+open(os.path.join(tempfile.gettempdir(), 'QHUB_DEFAULT_PASSWORD')).read())" >> $GITHUB_ENV

      - name: Get qhub-config.yaml full path
        run: echo "QHUB_CONFIG_PATH=`realpath ./local-deployment/qhub-config.yaml`" >> $GITHUB_ENV

      - name: Cypress run
        uses: cypress-io/github-action@v2
        env:
          CYPRESS_BASE_URL: https://github-actions.qhub.dev/
        with:
          working-directory: tests_e2e

      - name: Save Cypress screenshots and videos
        if: always()
        uses: actions/upload-artifact@v2
        with:
          name: e2e-cypress
          path: |
            ./tests_e2e/cypress/screenshots/
            ./tests_e2e/cypress/videos/

      ### CLEANUP AFTER CYPRESS

      - name: Cleanup qhub deployment
        run: |
          cd local-deployment
          qhub destroy --config qhub-config.yaml
      - name: Basic kubectl checks after cleanup
        run: |
          kubectl get all,cm,secret,ing -A
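The `${GITHUB_REF:11}` expansion in the branch step works because the prefix `refs/heads/` is exactly 11 characters long, so dropping that many characters from a branch ref leaves the bare branch name. A minimal Python sketch of the same logic, for illustration only; `branch_from_ref` is a hypothetical helper and not part of QHub or this workflow:

```python
from typing import Optional

# The prefix GitHub puts in front of branch refs; its length is the
# offset used by ${GITHUB_REF:11} in the workflow step.
BRANCH_PREFIX = "refs/heads/"


def branch_from_ref(ref: str) -> Optional[str]:
    """Return the branch name for a branch ref, or None for tags and other refs."""
    if not ref.startswith(BRANCH_PREFIX):
        # Mirrors the step's `startsWith(github.ref, 'refs/heads/')` guard.
        return None
    return ref[len(BRANCH_PREFIX):]


if __name__ == "__main__":
    print(len(BRANCH_PREFIX))                                # 11
    print(branch_from_ref("refs/heads/main"))                # main
    print(branch_from_ref("refs/heads/prometheus_grafana"))  # prometheus_grafana
    print(branch_from_ref("refs/tags/v1.0.0"))               # None
```

The guard matters: applied to a tag ref, a blind 11-character slice would return a mangled name, which is why the workflow step is restricted to refs starting with `refs/heads/`.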
120 changes: 0 additions & 120 deletions .github/workflows/test.yaml
@@ -89,123 +89,3 @@ jobs:
        with:
          name: "qhub-${{ matrix.provider }}-artifact"
          path: "qhub-${{ matrix.provider }}-deployment"

  test-kubernetes:
    name: "Kubernetes Tests"
    runs-on: ubuntu-latest

    steps:
      - name: 'QHUB_GH_BRANCH set for PR'
        if: ${{ github.event_name == 'pull_request' }}
        run: |
          echo "QHUB_GH_BRANCH=${GITHUB_HEAD_REF}" >> $GITHUB_ENV
          echo "GITHUB_BASE_REF: ${GITHUB_BASE_REF}"
          echo "GITHUB_HEAD_REF: ${GITHUB_HEAD_REF}"
          echo "GITHUB_REF: ${GITHUB_REF}"
      - name: 'QHUB_GH_BRANCH set for a branch (not a tag)'
        if: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/heads/') }}
        # e.g. QHUB_GH_BRANCH="main"
        run: |
          echo "QHUB_GH_BRANCH=${GITHUB_REF:11}" >> $GITHUB_ENV

      - name: 'Checkout Infrastructure'
        uses: actions/checkout@main
      - name: Set up Python
        uses: actions/setup-python@v1
        with:
          python-version: 3.8
      - name: Install QHub
        run: |
          pip install .[dev]
      - name: Download and Install Minikube and Kubectl
        run: |
          mkdir -p bin
          pushd bin
          curl -L https://github.com/kubernetes/minikube/releases/download/v1.22.0/minikube-linux-amd64 -o minikube
          chmod +x minikube

          curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.19.0/bin/linux/amd64/kubectl
          chmod +x kubectl

          echo "$PWD" >> $GITHUB_PATH
          popd
      - name: Start Minikube
        run: |
          minikube start --kubernetes-version=1.19.4 --driver=docker --cpus 2 --memory 4096 --wait=all
      - name: Versions
        run: |
          minikube version
          kubectl version
      - name: Add nfs client to kubernetes docker node
        run: |
          minikube ssh "sudo apt update; sudo apt install nfs-common -y"
      - name: Get routing table for docker pods
        run: |
          ip route
      - name: Configure LoadBalancer IPs
        run: |
          python tests/scripts/minikube-loadbalancer-ip.py
      - name: Enable Minikube metallb
        run: |
          minikube addons enable metallb
      - name: Basic kubectl checks before deployment
        run: |
          kubectl get all,cm,secret,ing -A
      - name: Initialize QHub Cloud
        run: |
          mkdir -p local-deployment
          cd local-deployment
          qhub init local --project=thisisatest --domain github-actions.qhub.dev --auth-provider=password

          # Need smaller profiles on Minikube
          sed -i -E 's/(cpu_guarantee):\s+[0-9\.]+/\1: 0.25/g' "qhub-config.yaml"
          sed -i -E 's/(mem_guarantee):\s+[A-Za-z0-9\.]+/\1: 0.25G/g' "qhub-config.yaml"

          cat qhub-config.yaml
      - name: Deploy QHub Cloud
        run: |
          cd local-deployment
          qhub deploy --config qhub-config.yaml --disable-prompt
      - name: Basic kubectl checks after deployment
        run: |
          kubectl get all,cm,secret,ing -A
      - name: Check github-actions.qhub.dev resolves
        run: |
          nslookup github-actions.qhub.dev
      - name: Curl jupyterhub login page
        run: |
          curl -k https://github-actions.qhub.dev/hub/home -i

      ### CYPRESS TESTS

      - name: Read example-user password
        run: python -c "import tempfile, os; print('CYPRESS_EXAMPLE_USER_PASSWORD='+open(os.path.join(tempfile.gettempdir(), 'QHUB_DEFAULT_PASSWORD')).read())" >> $GITHUB_ENV

      - name: Get qhub-config.yaml full path
        run: echo "QHUB_CONFIG_PATH=`realpath ./local-deployment/qhub-config.yaml`" >> $GITHUB_ENV

      - name: Cypress run
        uses: cypress-io/github-action@v2
        env:
          CYPRESS_BASE_URL: https://github-actions.qhub.dev/
        with:
          working-directory: tests_e2e

      - name: Save Cypress screenshots and videos
        if: always()
        uses: actions/upload-artifact@v2
        with:
          name: e2e-cypress
          path: |
            ./tests_e2e/cypress/screenshots/
            ./tests_e2e/cypress/videos/

      ### CLEANUP AFTER CYPRESS

      - name: Cleanup qhub deployment
        run: |
          cd local-deployment
          qhub destroy --config qhub-config.yaml
      - name: Basic kubectl checks after cleanup
        run: |
          kubectl get all,cm,secret,ing -A
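Both the removed job and its replacement shrink the profile resource guarantees with two `sed` substitutions so that the deployment fits on Minikube. A rough Python equivalent of those substitutions, as a sketch only; the sample config text and the `shrink_guarantees` helper are made up for illustration and are not taken from a real `qhub-config.yaml`:

```python
import re

# Hypothetical excerpt of a profile section, used only to exercise the regexes.
SAMPLE_CONFIG = """\
profiles:
  jupyterlab:
    - display_name: Small Instance
      kubespawner_override:
        cpu_limit: 1
        cpu_guarantee: 1
        mem_limit: 1G
        mem_guarantee: 1G
"""


def shrink_guarantees(text: str) -> str:
    """Rewrite every cpu_guarantee to 0.25 and every mem_guarantee to 0.25G."""
    # [ \t]+ keeps each match on a single line, as sed's line-by-line
    # processing does for the \s+ in the workflow's expressions.
    text = re.sub(r"(cpu_guarantee):[ \t]+[0-9.]+", r"\1: 0.25", text)
    text = re.sub(r"(mem_guarantee):[ \t]+[A-Za-z0-9.]+", r"\1: 0.25G", text)
    return text


if __name__ == "__main__":
    print(shrink_guarantees(SAMPLE_CONFIG))
```

Only the `*_guarantee` keys are rewritten; the `*_limit` keys are left alone, so pods can still burst above the reduced guarantee.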
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -12,6 +12,8 @@
## Upcoming Release

### Feature changes and enhancements

- Added basic cluster monitoring capability via Grafana/Prometheus integration

### Bug fixes

1 change: 1 addition & 0 deletions docs/index.md
@@ -63,6 +63,7 @@ source/admin_guide/upgrade.md
source/admin_guide/gpu.md
source/admin_guide/preemptible-spot-instances.md
source/admin_guide/system_maintenance.md
source/admin_guide/monitoring.md
source/admin_guide/clearml.md
source/admin_guide/prefect.md
source/admin_guide/faq.md
16 changes: 16 additions & 0 deletions docs/source/admin_guide/monitoring.md
@@ -0,0 +1,16 @@
# Monitoring

Cluster monitoring via Grafana and Prometheus is built into QHub and is enabled by default.

## Accessing the Grafana Dashboards

The monitoring dashboards are available through Grafana at `your-qhub-domain.com/monitoring`. The initial login credentials are username `admin` and password `prom-operator`; change the admin password immediately after the first login.

## Disabling Cluster Monitoring

To disable cluster monitoring on a QHub deployment, set `monitoring.enabled` to `false` in your `qhub-config.yaml` file. For example:

```yaml
monitoring:
  enabled: false
```
2 changes: 2 additions & 0 deletions docs/source/introduction.md
@@ -43,6 +43,8 @@ with.

+ [**prefect**](https://www.prefect.io/) workflow management
+ [**clearml**](https://clear.ml/) machine learning platform
+ [**prometheus**](https://prometheus.io/) cluster monitoring
+ [**grafana**](https://grafana.com/) cluster monitoring visualizations

# Why use QHub?

3 changes: 3 additions & 0 deletions qhub/initialize.py
@@ -57,6 +57,9 @@
            "h2_color": "#652e8e",
        }
    },
    "monitoring": {
        "enabled": True,
    },
    "cdsdashboards": {
        "enabled": True,
        "cds_hide_user_named_servers": True,
8 changes: 8 additions & 0 deletions qhub/schema.py
@@ -56,6 +56,13 @@ class CICD(Base):
    after_script: typing.Optional[typing.List[str]]


# ============== Monitoring =============


class Monitoring(Base):
    enabled: bool


# ============== ClearML =============


@@ -362,6 +369,7 @@ class Main(Base):
    theme: Theme
    profiles: Profiles
    environments: typing.Dict[str, CondaEnvironment]
    monitoring: typing.Optional[Monitoring]
    clearml: typing.Optional[ClearML]
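The schema change makes the `monitoring` section optional, but when the section is present it must carry a boolean `enabled` field. The sketch below illustrates that contract with a standard-library dataclass standing in for the schema's `Base` class, whose definition is not shown in this diff. `parse_monitoring` is a hypothetical helper, and its strict type check is an assumption that may differ from how the real schema coerces values:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Monitoring:
    """Stand-in for the schema's Monitoring model (assumption: plain dataclass)."""

    enabled: bool


def parse_monitoring(config: Dict[str, Any]) -> Optional[Monitoring]:
    """Return the parsed `monitoring` section, or None when it is absent."""
    section = config.get("monitoring")
    if section is None:
        # The field is Optional in Main, so a missing section is valid.
        return None
    if not isinstance(section, dict):
        raise TypeError(f"monitoring must be a mapping, got {section!r}")
    enabled = section.get("enabled")
    if not isinstance(enabled, bool):
        raise TypeError(f"monitoring.enabled must be a bool, got {enabled!r}")
    return Monitoring(enabled=enabled)


if __name__ == "__main__":
    print(parse_monitoring({}))
    print(parse_monitoring({"monitoring": {"enabled": False}}))
```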


3 changes: 3 additions & 0 deletions qhub/template/cookiecutter.json
@@ -13,6 +13,9 @@
    "certificate": {
        "type": "self-signed"
    },
    "monitoring": {
        "enabled": null
    },
    "clearml": {
        "enabled": null
    },
@@ -238,6 +238,19 @@ module "prefect" {
}
{% endif -%}

{% if cookiecutter.monitoring.enabled -%}
module "monitoring" {
  source = "./modules/kubernetes/services/monitoring"
  namespace = var.environment
  external-url = var.endpoint
  tls = module.qhub.tls
  depends_on = [
    module.qhub
  ]
}
{% endif -%}


{% if cookiecutter.clearml.enabled -%}
module "clearml" {
  source = "./modules/kubernetes/services/clearml"
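The `{% if cookiecutter.monitoring.enabled -%}` guard emits the `monitoring` module only when the flag is truthy. With the `null` default in `cookiecutter.json`, a config without a `monitoring` section would therefore leave the module out, while `qhub/initialize.py` writes `enabled: True` into newly initialized configs. A rough Python model of that decision follows; the merge step in `monitoring_module_rendered` is a simplifying assumption about how config values override the template defaults, not QHub's actual rendering code:

```python
from typing import Any, Dict

# Defaults mirrored from the diff: cookiecutter.json ships `null`,
# while initialize.py writes `True` into newly generated configs.
TEMPLATE_DEFAULTS: Dict[str, Any] = {"monitoring": {"enabled": None}}
INIT_DEFAULTS: Dict[str, Any] = {"monitoring": {"enabled": True}}


def monitoring_module_rendered(config: Dict[str, Any]) -> bool:
    """Model the Jinja guard: the module block is emitted only for a truthy flag."""
    merged = dict(TEMPLATE_DEFAULTS["monitoring"])
    # Assumed override behaviour: values from the config replace template defaults.
    merged.update(config.get("monitoring") or {})
    return bool(merged["enabled"])


if __name__ == "__main__":
    print(monitoring_module_rendered(INIT_DEFAULTS))                      # True
    print(monitoring_module_rendered({}))                                 # False
    print(monitoring_module_rendered({"monitoring": {"enabled": False}})) # False
```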