perforce · p4misc · Nov 20, 2019 · Nov 20, 2019 · Nov 21, 2019 · Nov 21, 2019
diff --git a/README.md b/README.md
@@ -1,341 +1,49 @@
-# p4prometheus
+# Docker for Helix Core
+This environment is derived from https://github.com/rcowham/p4prometheus.
 
-Utility which integrates Perforce (Helix Core) with Prometheus. If performs real-time analysis of p4d log files feeding to a dashboard and for system alerting.
-
-It continuously parses p4d log files and write a summary to 
-a specified Prometheus compatible metrics file which can be handled via the `node_exporter`
-textfile collector module.
-
-Uses [go-libp4dlog](https://github.com/rcowham/go-libp4dlog) for actual log file parsing.
-
-## Overview
-
-This is part of a solution consisting of the following components:
-
-* Prometheus - time series metrics management system: https://prometheus.io/
-* Grafana - The leading open source software for time series analytics - https://grafana.com/
-* node_exporter - Prometheus collector for basic Linux metrics - https://github.com/prometheus/node_exporter
-
-Two custom components:
-
-* p4prometheus - This component.
-* monitor_metrics.sh - [SDP](https://swarm.workshop.perforce.com/projects/perforce-software-sdp) compatible bash script to generate simple supplementary metrics - [monitor_metrics.sh](https://swarm.workshop.perforce.com/files/guest/perforce_software/sdp/dev/Server/Unix/p4/common/site/bin/monitor_metrics.sh)
-
-Check out the ![Prometheus architecture](https://prometheus.io/assets/architecture.png) - the custom components are "Prometheus targets".
-
-# Grafana Dashboards
-
-When installed and setup, you can get dashboards such as the following to appear.
-
-Commands Summary:
-
-![Commands Summary](images/p4stats_cmds_summary.png)
-
-Rates for command durations and count:
-
-![Commands](images/p4stats_cmds.png)
-
-Active commands (monitor):
-
-![Commands](images/p4stats_monitor.png)
-
-Replication status:
-
-![Commands](images/p4stats_replication.png)
-
-Read/write locks held/waiting status:
-
-![Commands](images/p4stats_table_read_locks.png)
-
-Dashboard alerts can be defined, as well as alert rules which are actioned by [alertmanager](https://prometheus.io/docs/alerting/alertmanager/)
-
-# Detailed Installation
-
-You need to install Prometheus and Grafana using standard methods. This is typically done on a seperate VM/machine to the Perforce server itself (for security and HA reasons).
-
-For example:
-
-* https://www.howtoforge.com/tutorial/how-to-install-grafana-on-linux-servers/
-* https://www.howtoforge.com/tutorial/how-to-install-prometheus-and-node-exporter-on-centos-7/
-
-## Install node_exporter
-
-Use above instructions, or these. This must be done on the Perforce (Helix Core) server machine (ditto for any other servers such as replicas which are being monitored).
-
-Run the following as root:
-
-    sudo useradd --no-create-home --shell /bin/false node_exporter
-
-    export PVER="0.18.0"
-    wget https://github.com/prometheus/node_exporter/releases/download/v$PVER/node_exporter-$PVER.linux-amd64.tar.gz
-
-    tar xvf node_exporter-$PVER.linux-amd64.tar.gz 
-
-    mv node_exporter-$PVER.linux-amd64/node_exporter /usr/local/bin/
-
-Create a metrics directory, give ownership to account writing metrics, and make sure it has global read access (so `node_exporter` account can read entries)
-
-    mkdir /hxlogs/metrics
-
-    chown perforce:perforce /hxlogs/metrics
-
-    ls -al /hxlogs/metrics
-
-Ensure the above has global read access (perforce user will write files, node_exporter will read them).
-
-Create service file:
-
-```ini
-cat << EOF > /etc/systemd/system/node_exporter.service
-[Unit]
-Description=Node Exporter
-Wants=network-online.target
-After=network-online.target
-
-[Service]
-User=node_exporter
-Group=node_exporter
-Type=simple
-ExecStart=/usr/local/bin/node_exporter --collector.textfile.directory="/hxlogs/metrics"
-
-[Install]
-WantedBy=multi-user.target
-EOF
-```
-
-Start and enable service:
-
-    sudo systemctl daemon-reload
-    sudo systemctl start node_exporter
-    sudo systemctl status node_exporter
-    sudo systemctl enable node_exporter
-
-Check logs for service in case of errors:
-
-    journalctl -u node_exporter --no-pager | tail
-
-Check that metrics are being exposed:
-
-    curl http://localhost:9100/metrics | less
-
-## Install p4prometheus - details
-
-This must be done on the Perforce (Helix Core) server machine (and any replica machines).
-
-This assumes SDP structure is in use on the server, and thus that user `perforce` exists.
-
-Get latest release download link: https://github.com/rcowham/p4prometheus/releases
-
-Run the following as `root` (using link copied from above page):
-
-    wget https://github.com/rcowham/p4prometheus/files/3446515/p4prometheus.linux-amd64.gz
-
-    gunzip p4prometheus.linux-amd64.gz
-
-    chmod +x p4prometheus.linux-amd64
-
-    mv p4prometheus.linux-amd64 /usr/local/bin/p4prometheus
-
-As user `perforce`:
+Once the environment is up, the following servers are running:
+- Helix Core commit server (no SSL, Unicode mode, with the sample depot and p4prometheus installed)
+- Helix Core edge server
+- Prometheus
+- Grafana
 
+Run docker-compose as follows:
 ```bash
-cat << EOF > /p4/common/config/p4prometheus.yaml
-# SDP instance - typically integer, but can be
-# See: https://swarm.workshop.perforce.com/projects/perforce-software-sdp for more
-sdp_instance:   1
-# Path to p4d server log
-log_path:       /p4/1/logs/log
-# Name of output file to write for processing by node_exporter
-metrics_output: /hxlogs/metrics/p4_cmds.prom
-# Optional - serverid for metrics - typically read from /p4/<sdp_instance>/root/server.id
-server_id:      
-EOF
+docker-compose build
+docker-compose up -d
 ```
 
-As user `root`:
-
-Create service file:
-
-```ini
-cat << EOF > /etc/systemd/system/p4prometheus.service
-[Unit]
-Description=P4prometheus
-Wants=network-online.target
-After=network-online.target
-
-[Service]
-User=perforce
-Group=perforce
-Type=simple
-ExecStart=/usr/local/bin/p4prometheus --config=/p4/common/config/p4prometheus.yaml
-
-[Install]
-WantedBy=multi-user.target
-EOF
-```
-
-Start and enable service:
-
-    sudo systemctl daemon-reload
-    sudo systemctl start p4prometheus
-    sudo systemctl status p4prometheus
-    sudo systemctl enable p4prometheus
-
-Check logs for service in case of errors:
-
-    journalctl -u p4prometheus --no-pager | tail
-
-Check that metrics are being written:
-
-    cat /hxlogs/metrics/p4_cmds.prom
+After that, containers based on the following images are started.
 
-# Alerting
+Hostname | Image | Port mapping 1 | Port mapping 2
+--- | --- | --- | ---
+grafana | grafana/grafana | 3000:3000 | 
+monitor | p4prometheus_monitor | 9090:9090 | 9100:9100
+master | p4prometheus_master | 2166:1999 | 9101:9100
+replica_edge | p4prometheus_replica_edge | 2266:1999 | 9102:9100
 
-Done via alertmanager
+The containers are linked to each other as follows:
+- grafana -> monitor
+- monitor -> master
+- replica_edge -> master 
 
-Setup is very similar to the above.
+Starting the containers alone does not start the Helix Core commit server or edge server.
 
-Sample `/etc/systemd/system/alertmanager.service`:
-
-```ini
-[Unit]
-Description=Alertmanager
-Wants=network-online.target
-After=network-online.target
-
-[Service]
-User=alertmanager
-Group=alertmanager
-Type=simple
-ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager --log.level=debug
-
-[Install]
-WantedBy=multi-user.target
-```
-
-* create alertmanager user
-* create /etc/alertmanager directory
-
-
-## Prometheus config
-
-```yaml
-global:
-  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
-  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
-  # scrape_timeout is set to the global default (10s).
-
-# Alertmanager configuration
-alerting:
-  alertmanagers:
-  - static_configs:
-    - targets:
-        - localhost:9093
-
-# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
-rule_files:
-  - "perforce_rules.yml"
-
-# A scrape configuration containing exactly one endpoint to scrape:
-# Here it's Prometheus itself.
-scrape_configs:
-  - job_name: 'prometheus'
-    static_configs:
-    - targets: ['localhost:9090']
-
-  - job_name: 'node_exporter'
-    static_configs:
-    - targets: ['p4hms:9100', 'p4main:9100', 'p4_ha:9100']
-
-```
-
-## Alerting rules
-
-This is an example, assuming simple email and local postfix or equivalent setup.
-
-```yaml
-groups:
-- name: alert.rules
-  rules:
-  - alert: NoLogs
-    expr: 100 > rate(p4_prom_log_lines_read{sdpinst="1",serverid="master"}[1m])
-    for: 1m
-    labels:
-      severity: "critical"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} too few log lines"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been below target for more than 1 minutes."
-  - alert: Replication Slow HA
-    expr: p4_replica_curr_pos{instance="p4master:9100",job="node_exporter",sdpinst="1",servername="master"} - ignoring(serverid,servername) p4_replica_curr_pos{instance="p4master:9100",job="node_exporter",sdpinst="1",servername="p4d_ha_bos"} > 5e+7
-    for: 10m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} replication warning"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been above target for more than 1 minutes."
-  - alert: Replication Slow London
-    expr: p4_replica_curr_pos{instance="p4master:9100",job="node_exporter",sdpinst="1",servername="master"} - ignoring(serverid,servername) p4_replica_curr_pos{instance="p4master:9100",job="node_exporter",sdpinst="1",servername="p4d_fr_lon"} > 5e+7
-    for: 10m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} replication warning"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been above target for more than 1 minutes."
-  - alert: Checkpoint slow
-    expr: p4_sdp_checkpoint_duration{sdpinst="1",serverid="master"} > 50 * 60
-    for: 5m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} checkpoint job duration longer than expected"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been above target for more than 1 minutes."
-  - alert: Checkpoint not taken 
-    expr: time() - p4_sdp_checkpoint_log_time{sdpinst="1",serverid="master"} > 25 * 60 * 60
-    for: 5m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} checkpoint not taken in 25 hours warning"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been above target for more than 1 minutes."
-  - alert: P4D service not running
-    expr: node_systemd_unit_state{state="active",name="p4d_1.service"} != 1
-    for: 5m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} p4d service not running"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for 5 minutes."
-  - alert: DiskspaceLow
-    expr: node_filesystem_free_bytes{mountpoint=~"/hx.*"} / node_filesystem_size_bytes{mountpoint=~"/hx.*"} * 100 < 10
-    for: 5m
-    labels:
-      severity: "warning"
-    annotations:
-      summary: "Endpoint {{ $labels.instance }} disk space below 10%"
-      description: "{{ $labels.instance }} of job {{ $labels.job }} has been below limit for 5 minutes."
+Log in to the containers and run the setup shell scripts for the commit server and the edge server.
+```bash
+# Example
+docker exec -it p4prometheus_master_1 /bin/bash
+cd /p4
+./configure_master.sh
 ```
 
-## Alertmanager config
-
-This is an example, assuming simple email and local postfix or equivalent setup - `/etc/alertmanager/alertmanager.yml`
+Afterwards, the Helix Core commit server runs inside the master container and the Helix Core edge server runs inside the replica_edge container.
 
-```yaml
-global:
-  smtp_from: alertmanager@perforce.com
-  smtp_smarthost: localhost:25
-  smtp_require_tls: false
-  # Hello is the local machine name
-  smtp_hello: p4hms
+Assuming the Docker host's IP address is 192.168.1.2, each tool can be accessed as follows.
 
-route:
-  group_by: ['alertname']
-  group_wait: 30s
-  group_interval: 5m
-  repeat_interval: 60m
-  receiver: mail
-
-receivers:
-- name: mail
-  email_configs:
-  - to: p4-group@perforce.com
-```
+Tool | Client | Address | User | Password
+--- | --- | --- | --- | ---
+grafana | Web browser | http://192.168.1.2:3000 | admin | admin
+prometheus | Web browser | http://192.168.1.2:9090 | none | none
+Helix Core commit server | P4V or similar | 192.168.1.2:2166 | bruno | none
+Helix Core edge server | P4V or similar | 192.168.1.2:2266 | bruno | none
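Reviewer note: the access table in the new README can be turned into a small lookup, which may help when scripting checks against the running containers. This is only a sketch. The function name `endpoint_for` and its arguments are hypothetical and not part of this change. The default host IP is the 192.168.1.2 example value assumed in the README, and the ports are the host-side ports from its tables.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): map a service name from the
# README access table to its host-side endpoint.
# The second argument is the Docker host IP; it defaults to the README's
# example value 192.168.1.2.
endpoint_for() {
  local service="$1" host="${2:-192.168.1.2}"
  case "$service" in
    grafana)      echo "http://${host}:3000" ;;  # grafana      3000:3000
    prometheus)   echo "http://${host}:9090" ;;  # monitor      9090:9090
    master)       echo "${host}:2166" ;;         # master       2166:1999
    replica_edge) echo "${host}:2266" ;;         # replica_edge 2266:1999
    *)            echo "unknown service: ${service}" >&2; return 1 ;;
  esac
}
```

With this in place, something like `p4 -p "$(endpoint_for master)" -u bruno info` would target the commit server once `configure_master.sh` has been run, assuming the `p4` client is installed on the machine issuing the command.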
diff --git a/p4d.sdp/Dockerfile b/p4d.sdp/Dockerfile
@@ -18,6 +18,8 @@ RUN yum install -y openssh-server openssh-clients passwd; \
 RUN yum install -y https://centos7.iuscommunity.org/ius-release.rpm; \
     yum update; \
     yum install -y python36u python36u-libs python36u-devel python36u-pip; \
+    rm -f /usr/bin/python3; \
+    rm -f /usr/bin/pip3; \
     ln -s /usr/bin/python3.6 /usr/bin/python3; \
     ln -s /usr/bin/pip3.6 /usr/bin/pip3;
 
@@ -47,8 +49,11 @@ FROM sdpbase as sdpmaster
 USER root
 RUN pip3.6 install ansible 
 
+ADD configure_sample_depot_for_sdp.sh /root
 RUN mkdir -p /hxdepots/reset && \
     cd /hxdepots/reset && \
+    mv /root/configure_sample_depot_for_sdp.sh . && \
+    chmod +x configure_sample_depot_for_sdp.sh && \
     curl -k -s -O https://swarm.workshop.perforce.com/download/guest/perforce_software/helix-installer/main/src/reset_sdp.sh && \
     chmod +x reset_sdp.sh && \
     ./reset_sdp.sh -fast -no_ssl
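Reviewer note: the first Dockerfile hunk removes `/usr/bin/python3` and `/usr/bin/pip3` before recreating them as symlinks, presumably so that `ln -s` does not fail when the path already exists. The sketch below reproduces that pattern in a scratch directory rather than `/usr/bin`. The helper name `relink` and the `demo_dir` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the Dockerfile's "rm -f, then ln -s" step.
relink() {
  local target="$1" link="$2"
  rm -f "$link"            # succeeds even if the link does not exist yet
  ln -s "$target" "$link"  # would fail here if the link were still present
}

# Demonstration in a scratch directory (assumption: not the real /usr/bin).
demo_dir="$(mktemp -d)"
touch "$demo_dir/python3.6"
relink "$demo_dir/python3.6" "$demo_dir/python3"
relink "$demo_dir/python3.6" "$demo_dir/python3"   # second run also succeeds
```

`ln -sf` should give the same result in a single command for plain files and links, so the two-step form in the Dockerfile looks like a readability choice.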