Author: Antonio Di Mariano - antonio.dimariano@gmail.com
This service is part of the Websites Monitoring Application, a small distributed application that aims to produce and collect metrics about the availability of one or more websites. This service is responsible for consuming messages about metrics being produced to an Apache Kafka Avro Topic by a different service. The main action of this service is to store the incoming message into a PostgreSQL database.
The application is made of three services that can run in different systems.
There are main two services.
The first one https://github.com/antoniodimariano/websites_metrics_collector is responsible for fetching and collecting the information from a list of URLs. The information collected is
HTTP Status returned HTTP response time regexp pattern that is expected to be found on the HTML content. For each record, a message is produced to an Apache Kafka Topic. This service exposes a REST API Service with a POST method to accept a list of URLs to fetch and process.
The second service is https://github.com/antoniodimariano/metrics_consumer that is responsible for consuming messages about metrics being produced to an Apache Kafka Avro Topic by a different service. The main action of this service is to store the incoming message into a PostgreSQL database.
The last one is a Celery Beat based service https://github.com/antoniodimariano/crontab_request that periodically send a POST request with a list of URLS to the websites_metrics_collector
So, to run all the whole application, you don't need to clone the three repos, You can use these two public python packages
- https://pypi.org/project/websites-metrics-consumer/
- https://pypi.org/project/websites-metrics-collector/
Create two python
applications. One will consume messages
pip3 install websites-metrics-consumer
The other will produce metrics
pip3 install websites-metrics-collector
In order to produce metrics, the https://github.com/antoniodimariano/websites_metrics_collector runs a REST Server with a POST
/api/v1/websites_metrics
endpoint that accepts a list of URLs to fetch.
For the complete documentation go here https://github.com/antoniodimariano/websites_metrics_collector/blob/master/README.md
The last application (https://github.com/antoniodimariano/crontab_request) uses Celery Beat to periodically run the task of reading a list or URLs from a local json
file and will send it to as payload.
of the POST
request to /api/v1/websites_metrics
It requires Redis
as broker.
You can decide not to use https://github.com/antoniodimariano/crontab_request and implements your own way of requesting a list or URLs to monitor.
As long as you send a POST Request to the endpoint /api/v1/websites_metrics
metrics will be collected, messages will be produced and data will be stored.
I decided to use Celery and not, for instance, a simple timer implemented with Thread or 3-party libs because Celery is robust, scalable and production ready.
I know it comes at the price of having a broker, but I prefer to pay a small price for a significant advance. Not to mention, I am a big fan of Celery!
If you want to run from the source code, go to the directory websites_metrics_consumer
and run python main
If you want to use it as package (suggested method) install pip3 install websites_metrics_consumer
Then
- Set the ENV variables as show in this README.md
- Then use it this way
def consume_message():
from websites_metrics_consumer.main import start_consumer
start_consumer()
- Python >=3.8
- confluent_kafka==1.7.0
- psycopg2==2.9.1
NOTE: for simplicity I am assuming you have a running local instance of PostgreSQL with the following information
- database='test'
- user='postgresql'
- password='test123'
- host='localhost'
python -m unittest tests/test_db_operations.py
python -m unittest test/test_events_handler_class.py
If you are using a Broker that uses SSL
, like Aiven.com, you need to download the following certificates files and copy them to a folder
- Access Key
- Access Certificate
- CA Certificate
for more information about SSL and Kafka please read https://docs.confluent.io/3.0.0/kafka/ssl.html
Example of your project folder
myprojectname/
|------configuration/
__init__.py
ca.pem
service.cert
service.key
|------venv
|main.py
|requirements.txt
ENV Variable | VALUE | DESCRIPTION |
---|---|---|
brokers | string | Required. The FQDN of the Apache Kafka Brokers. |
schema_registry | string | Required. The FQDN of the Service Registry. |
ssl_ca_location | string | Required. he relative path to your ca.pem |
ssl_certificate_location | string | Required. The relative path to your service.cert |
ssl_key_location | string | Required. The relative path to your service.key |
security_protocol | string | Required. SSL. |
persistence_conf | string | Required. The full string for connecting to your PostgreSQL database. |
group_id | string | Optional. The consumer group id to assign to the consumer(s) . The default value is metrics_consumer |
consumer_topics | string or list | Optional. The Kafka Avro Topic where to consume messages from. The defaul value is websites_metrics |
logging_level | string | Optional. The level of logging to use for the built-in logging package. The default is logging.INFO |
Example of mandatory ENV variables to use
- brokers=kafka-xxx-yyy-abc.com:12026
- schema_registry=https://user:password@kafka-xxx-yyy-abc.com:12029
- ssl_ca_location=configuration/ca.pem
- ssl_certificate_location=configuration/service.cert
- ssl_key_location=configuration/service.key
- security_protocol=SSL
- persistence_conf=postgres://user:password@mydbhost.com:12024/metrics?sslmode=require
If your broker requires SASL authentication, like Confluent Cloud, these are the ENV variables to include
ENV Variable | VALUE | DESCRIPTION |
---|---|---|
brokers | string | Required. The FQDN of the Apache Kafka Brokers. |
schema_registry | string | Required. The FQDN of the Service Registry. |
sasl_username | string | Required. YOUR USERNAME HERE |
sasl_password | string | Required. YOUR PASSWORD HERE |
schema_registry_basic_auth_user_info | string | Required. AUTH HERE |
schema_registry_basic_auth_credentials_source | string | Required. USER_INFO. |
sasl_mechanisms | string | Required. PLAIN. |
security_protocol | string | Required. SASL_SSL. |
persistence_conf | string | Required. The full string for connecting to your PostgreSQL database. |
group_id | string | Optional. The consumer group id to assign to the consumer(s) . The default value is metrics_consumer |
consumer_topics | string or list | Optional. The Kafka Avro Topic where to consume messages from. The defaul value is websites_metrics |
logging_level | string | Optional. The level of logging to use for the built-in logging package. The default is logging.INFO |