GNIP-50: GeoNode monitoring #3137
Comments
+1
Great. Have also a look at Hypermap: https://github.com/cga-harvard/HHypermap We use it to track health check of thousands of services and layers, including our GeoNode instance (WorldMap). Here is our live instance: http://hh.worldmap.harvard.edu/ For example here is the situation for WorldMap: http://hh.worldmap.harvard.edu/registry/hypermap/service/2a96b71c-96b2-4432-b31f-219c45f3fc52/
@capooti thanks. looks interesting, but correct me if i'm wrong here: this is just external visibility check, right?
We test services and layers using OWSLib and ArcREST.
code merged in, closing
GNIP: geonode monitoring
Overview
GeoNode monitoring is an infrastructure to extract and present information on an installation's health status and resource (layers, maps, documents) usage. Monitoring is an additional Django/GeoNode application which will:
GeoNode monitoring is not limited to plain GeoNode: it will also collect data from accompanying GeoServer instances, and hardware resource usage data from the operating system.
Proposed by
Cezary Statkiewicz GeoSolutions
Assigned to release
None yet.
Motivation
GeoNode lacks information on resource usage and system health, which is a problem whenever operators need insight into a running system. This was formulated as a significant problem by GFDRR’s Innovation Lab, which, through the Open Data for Resilience Initiative, has assisted in the creation of National and Regional Geospatial data sharing platforms since 2010. Many of these platforms were deployed outside formal data centers and have administrators with other responsibilities unrelated to GeoNode. Some real problems raised:
Technically, the data needed to deduce such information could be extracted in various ways (client-side analytics, log parsing, external monitoring), but each has its drawbacks, and none would show the full picture. Also, existing or previous attempts (GeoHealthCheck, geonode-monitor) are quite incomplete and focus mostly on measuring external visibility/state only.
This proposal introduces a contrib monitoring application, which would provide insight into actual data usage and perform health checks of the underlying system. The application should be optional, although there are a few integration points in GeoNode core.
Note that this is not a replacement for full-fledged monitoring systems like Zabbix or Nagios. GeoNode monitoring is simplified, especially from the user's perspective. However, while GeoNode monitoring can work in stand-alone mode, it could also be integrated with 3rd party systems as a data source (not covered by this GNIP).
Proposal
GN monitoring has two main tasks:
Collecting data starts with recording each request with its context: besides the basic HTTP context, it should also contain information about the resources used, the service that was used, more detailed information on the client, etc. A similar data structure is already available in GeoServer.
Data collection may be implemented in several ways. By default, data will be pulled from probes, although there should be a way for probes to push data to the collector. Also, monitoring should be ready to handle data exchange through AMQP, which will be the future default way of handling notifications. Data collection can be performed periodically or continuously in real time.
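The pull and push modes could look roughly like the sketch below. It is only an illustration: the Collector class shape, the register/pull/push method names, and the event dictionaries are assumptions made for this example and are not taken from the GeoNode code base. AMQP transport is left out.

```python
class Collector:
    """Gathers events from probes, either by pulling or by accepting pushes.

    Hypothetical sketch: class and method names are illustrative only.
    """

    def __init__(self):
        self._probes = {}   # probe name -> zero-argument callable returning events
        self.events = []    # collected (probe name, event) pairs

    def register(self, name, read):
        """Register a probe that the collector polls on each pull() call."""
        self._probes[name] = read

    def pull(self):
        """Default mode: ask every registered probe for its new events."""
        for name, read in self._probes.items():
            self.events.extend((name, event) for event in read())

    def push(self, name, events):
        """Alternative mode: a probe delivers events on its own initiative."""
        self.events.extend((name, event) for event in events)
```

A periodic job would call pull(), while a probe that cannot be polled would call push() with its own batch of events; both paths end in the same event store.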
Statistics are calculated periodically and stored as aggregated data in fixed-length periods. Aggregated data would contain general statistics and per-resource statistics, so the presentation layer can present system status from overview down to layer level without much recalculation.
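As a minimal sketch of this aggregation step, the code below groups request events into fixed-length periods and keeps both a general counter and per-resource counters. The function names, the (timestamp, resource) event shape, and the 5-minute default are assumptions for illustration, not the proposal's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def period_start(ts, length):
    """Return the start of the fixed-length period that contains ts."""
    # timedelta // timedelta yields the number of whole periods since EPOCH.
    return EPOCH + ((ts - EPOCH) // length) * length


def aggregate(events, length=timedelta(minutes=5)):
    """Group (timestamp, resource) events into per-period counters.

    Returns {period_start: {"total": n, "by_resource": {resource: n}}}.
    """
    periods = defaultdict(lambda: {"total": 0, "by_resource": defaultdict(int)})
    for ts, resource in events:
        bucket = periods[period_start(ts, length)]
        bucket["total"] += 1                    # general statistics
        bucket["by_resource"][resource] += 1    # per-resource statistics
    return periods
```

Because each period already carries both levels, a view can show the overview from "total" and drill down through "by_resource" without re-reading the raw events.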
Architecture overview
Monitoring is composed of several components, described below. Note that these are logical units. Code should reside in the geonode.contrib.monitoring module as a Django application.
GeoNode probes
Probes are points of integration in GeoNode core, which will record its activity. This is built with:
GeoServer probes
GeoServer provides a Monitoring/Audit API, which can be used. GeoServer improvements will be handled outside this GNIP.
System-level probes
GeoNode monitoring can collect system-level data (CPU usage, memory usage, disk usage). System-level data can be extracted by reading system indicators from the GeoNode and GeoServer processes and exposed with the Status API in GeoServer. GeoNode would have an Expose API, which is a set of views that present system-level data at the moment of the request.
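A host-level snapshot of this kind can be approximated with the Python standard library alone. The sketch below is an assumption about what such a view might return, not the actual Expose API: the function name and the dictionary keys are hypothetical. Memory usage is omitted because the standard library has no portable call for it; a third-party package such as psutil would be one option.

```python
import os
import shutil
import time


def read_host_metrics(path="/"):
    """Return a snapshot of host-level indicators at the moment of the call.

    Hypothetical helper: the key names are illustrative only.
    """
    usage = shutil.disk_usage(path)
    snapshot = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": usage.total,
        "disk_used_bytes": usage.used,
        "disk_free_bytes": usage.free,
    }
    # os.getloadavg() exists only on Unix-like systems and may raise OSError.
    if hasattr(os, "getloadavg"):
        try:
            load_1m, load_5m, load_15m = os.getloadavg()
            snapshot.update(load_1m=load_1m, load_5m=load_5m, load_15m=load_15m)
        except OSError:
            pass
    return snapshot
```

Since the values describe only the current moment, the collector would have to poll such a view regularly to build a history.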
Collector
This is the core element of monitoring, because it connects both main functionalities. The collector provides the following facilities:
This can be any of the following:
The collector can be run as:
collect_metrics
Dashboard/Status UI
(Note: these are designs, not the actual implementation.)
main view
list of captured exceptions
exception details
notifications configuration
response statistics
resources statistics
Status UI is a set of views and a client-side application that will present metrics. The user should get the main indicators in simplified form (to judge whether the system is working properly), and have a way to see more detailed data a few clicks away. Status UI should also provide a way to configure notifications and the collector.
Notifications
Monitoring Notifications shouldn't be confused with the GeoNode notifications app, which is a separate entity. However, Monitoring Notifications will use general notifications as a backend for sending alerts. The user should be able to configure thresholds for certain indicators, each of which can consist of several metrics. Notifications will check the metrics for each indicator after each metrics calculation, and send alerts in alarm conditions.
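The threshold check could be sketched as follows. Everything here is an assumption for illustration: the Threshold class, the check_indicator function, the "greater than maximum" rule, and the metric name response.time are hypothetical (request.count is the only name that appears in this proposal). Sending the alert through the notifications backend is left out.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical user-configured limit for one metric of an indicator."""
    metric: str
    max_value: float


def check_indicator(thresholds, metrics):
    """Return alert messages for every metric above its configured maximum.

    `metrics` maps metric names to the values from the latest calculation.
    Metrics with no value in this round are skipped.
    """
    alerts = []
    for threshold in thresholds:
        value = metrics.get(threshold.metric)
        if value is not None and value > threshold.max_value:
            alerts.append(f"{threshold.metric}={value} exceeds {threshold.max_value}")
    return alerts
```

Running this once after each metrics calculation matches the order described above: metrics are computed first, then each indicator's thresholds are evaluated against the fresh values.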
Beacon
Beacon is an API that exposes the current status of GeoNode for external monitoring.
Data model
Collected data
There are different types of probes and of data they provide. Two base types are distinguished: service type and host type. A service type probe provides a stream of events from a service (GeoNode, GeoServer). The stream can contain data from the past or be provided in real time. A host type probe provides only data for the current moment.
The collector will get the following data from probes:
Data will be aggregated and stored in fixed-length periods. For near-present data, periods should be 1-5 minutes; for older data, periods could be longer.
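One way to express "short periods for recent data, longer periods for older data" is a small lookup table keyed by data age. The cut-off ages and period lengths below are invented for illustration; the proposal only states 1-5 minutes for near-present data and does not fix the other values.

```python
from datetime import timedelta

# (maximum data age, period length) pairs, newest first.
# All values are illustrative assumptions, not part of the proposal.
PERIOD_RULES = [
    (timedelta(days=1), timedelta(minutes=1)),
    (timedelta(days=14), timedelta(minutes=10)),
]
DEFAULT_PERIOD = timedelta(hours=1)


def period_length_for(age):
    """Return the aggregation period length to use for data of a given age."""
    for max_age, length in PERIOD_RULES:
        if age <= max_age:
            return length
    return DEFAULT_PERIOD
```

With such a table, a periodic job could re-aggregate short periods into longer ones once the data crosses an age boundary.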
Metrics
A metric is an aggregated value for a specific indicator. There are three types of metrics:
While the metric types seem similar, they are handled differently when they are aggregated in the API.
A metric has several main properties:
a validity period (valid_from, valid_to),
a name, such as request.ip or request.count (which is defined in the MetricName model),
Additionally, a metric can be associated with:
This metric organization allows different levels of granularity (per-service, per-metric, per-resource, etc.) and further aggregation (longer intervals, a total request count aggregated from the sum of requests to specific resources, etc.).
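To make the aggregation concrete, here is a minimal sketch of a metric record and a roll-up over it. The valid_from, valid_to and request.count names come from this proposal; the MetricValue class, its service and resource fields, and the total_for_period function are assumptions for illustration and do not describe the actual Django models.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MetricValue:
    """Hypothetical in-memory stand-in for one stored metric value."""
    name: str                        # e.g. "request.count"
    valid_from: datetime
    valid_to: datetime
    value: float
    service: Optional[str] = None    # assumed association
    resource: Optional[str] = None   # assumed association


def total_for_period(values, name, valid_from, valid_to):
    """Sum all values of one metric whose period lies inside the given range."""
    return sum(
        v.value
        for v in values
        if v.name == name and v.valid_from >= valid_from and v.valid_to <= valid_to
    )
```

Summing over a wider range gives the longer-interval aggregate, and summing without filtering on resource gives the total request count from the per-resource values.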
Errors
Errors captured by GN or GS are stored along with request details, and are exposed through a dedicated API endpoint. Error information contains:
Monitoring API
Detailed API description: https://github.com/geosolutions-it/geonode/wiki/Monitoring:-API