Skip to content

Database Tables

chuckablack edited this page Nov 24, 2020 · 8 revisions

Overview

Quokka uses a database to store information about devices, hosts, services, time-series data regarding status, hourly summary data of statuses, events, response data from packet captures, port scans, and traceroutes, as well as status of remote workers.

The database type chosen for holding this data is SQL. Quokka has been tested with both SQLite3 (a small DB used mainly for small testing) and Postgresql (an open source production-ready DB). The VM that ships with quokka is using Postgresql (often shortened verbally to just "postgres").

SQL databases are considered "relational", in that different tables can hold separate, but related, information, and functionality is often implemented when multiple tables are "joined". However, quokka chooses to keep things as simple as possible, so relationships are few, generally in the form of a main table (e.g. devices) being referenced in a time-series data table (e.g. device-status).

SQLAlchemy

SQLAlchemy is one of the most popular python projects for allowing developers to interface with SQL databases from Python. In fact, it is so popular that even Flask has its own package, 'flask_sqlalchemy', which quokka uses for its SQL access.

The general idea with SQLAlchemy is that you define Python classes, which inherit from SQLAlchemy classes, giving these classes the ability to perform create, read, update, and delete (often called CRUD) operations of data in the database. In general, these classes map to SQL tables. When you create an object of this type, and then do a commit, SQLAlchemy performs all the hard work of writing the data to the database.

The same is true if you are reading from the database; you can set filters, order the results, etc., and SQLAlchemy will perform the associated SQL operations on the DB and return you the results, in Python objects.

Quite often in quokka, those objects are translated into simpler Python data structures, without all of the object-oriented and SQL-database-oriented information, for ease of use and manipulation.

DB at Quokka Startup

Initially, when quokka started up, it would delete all existing data in the tables - a "fresh start". This served to reduce potential errors with old table data and definitions. And it overcame issues related to some of the status data pointing at 'old' entries in the devices, hosts, or services tables.

However, it was found that it was possible to preserve the existing data, and quokka would still function reasonably across restarts. And since some of the data - e.g. status, and hourly summaries - can take a long time to re-create, it was decided to allow as much data as possible to persist across restarts of quokka.

So for now, the following occurs at startup:

  • Devices are imported from its YAML configuration file. Since device definitions have their own ID, then the existing status information will still point at the right device in the table, based on that ID.
  • Hosts are preserved in the DB across reboots of quokka. Since searching for hosts involves looking for a match on both IP address and hostname, this helps to minimize the likelihood that a host will acquire a new IP address, which might cause us to confuse one host for another should this occur over time.
  • Services are also imported from its YAML configuration file. And since services have their own ID, the existing status information is preserved across reboots.

Note that if the data is stale, or you suspect that it might be stale, there are a couple options as described below.

Resetting DB data

If you end up in a situation where the existing data needs to be removed for some reason, and you are not comfortable doing this using SQL commands manually, there are a couple options.

  • dropdb and createdb: Fortunately for us, postgres provides simple commands to completely blow away the database, called 'dropdb' and 'createdb'. So if you want to, or believe you need to, completely blow away the existing database, and start afresh, you can do this:

$ dropdb quokka
$ createdb quokka

This will remove the DB entirely, and then will create a new and completely empty DB, without any tables even. When you then run quokka, it will be starting completely from scratch.

Note that if there are any structural changes to the DB - i.e. adding a table, adding a column to a table, etc. - then you must recreate the DB in this way.

  • Reset: In the quokka UI, at the top left there is a menu that allows you to selectively delete specific data - devices, hosts, services, events, etc. Selecting one of these will delete the existing data for whatever was selected.

Note: This functionality was added more as a demonstration of the menu functionality, than anything really pragmatic, so there may or may not be issues with the implementation. The last time I tested it, it worked - but that's all I can say at this time! :-)

DB maintenance

In a 'real' implementation and production environment, the continual gathering of time-series data, events, packet captures, etc., could pose a problem, and ultimately eat of all disk space on the quokka server. In my own environment, running on a laptop 100% of the time, quokka data doesn't even make a minor dent in the disk space usage; but in a real environment, this could become an issue.

Therefore, quokka has a DB maintenance task, which runs every hour, and trims data from the tables mentioned above. The process is generally as follows:

  • Query the table for records older than a certain age (e.g. older than 2 hours for status data, older than 24 hours for diagnostic results)
  • Delete any records that match the query above

This is entirely arbitrary of course, and could be made configurable. A couple notes to address questions you may have:

  • I am not trimming events - these last forever or until manually deleted.
  • I can trim status data after two hours because currently, because by default we are only displaying the last 30 datapoints, and we are getting status every minute (that's 120 datapoints for two hours). Also, we summarize the data (for hosts and services), and we maintain that data forever.

Command table

You may notice a table called "Command". This is where commands are stored, that need to be given to remote workers that are connected to the quokka server via HTTP (i.e. the direction is from the remote worker, to the quokka server, sending heartbeat, and saying "do you have any commands for me?").

In that situation, the way it works is that the user requests an operation such as packet capture, portscan, or traceroute; if the remote worker is configured to use HTTP for communication, then we put the command into the command table. When the worker does the heartbeat, the quokka server checks for outstanding commands, and if any exist, sends it down to the worker with the HTTP response.

This is described in more detail in the page describing workers, but the point here is simply that the "command" table gets trimmed of commands that are greater than 24 hours old.

Clone this wiki locally