Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigtable-backup-tool: Improvements #895

Merged

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
Added some metrics for monitoring of backup jobs
Added a command to delete out of range backups otherwise, there would be more backups than expected and alerts would start firing for unexpected backups

Added some metrics for monitoring of backup jobs
Added a command to delete out of range backups otherwise, there would be more backups than expected and alerts would start firing for unexpected backups
def set_ensure_backups_specific_metrics(args, num_backups_deleted, newest_table_number, active_table_number):
# ensure-backups job specific metrics
bigtable_backup_job_tables_backed_up = Gauge('bigtable_backup_job_tables_backed_up', 'Number of backups for active and inactive tables found during last run', ['kind'], registry=registry)
bigtable_backup_job_backups = Gauge('bigtable_backup_job_backups', 'Sum of number of backups for all active and inactive tables found during last run', ['kind'], registry=registry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number of backups for all active and inactive tables.

bigtable_backup_job_tables_backed_up = Gauge('bigtable_backup_job_tables_backed_up', 'Number of backups for active and inactive tables found during last run', ['kind'], registry=registry)
bigtable_backup_job_backups = Gauge('bigtable_backup_job_backups', 'Sum of number of backups for all active and inactive tables found during last run', ['kind'], registry=registry)
bigtable_backup_job_backups_deleted = Gauge('bigtable_backup_job_backups_deleted', 'Number of backups deleted during last run', registry=registry)
bigtable_backup_job_expected_backups = Gauge('bigtable_backup_job_expected_backups', 'Number of backups expected for active and inactive tables', ['kind'], registry=registry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the issue being that this script expects itself to be run every day. This'll fail for active table if it's being run ever other day for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we get by knowing the expected backups for active table?

Further, I think it's better to expose the last backup time for the active table. That way will be able to an alert like active_backup is less than 36hrs old, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the script should not know how frequently it is going to run.
removing expected number of backups for active table and adding timestamp of last last backup of active table.

@sandeepsukhani sandeepsukhani merged commit e65fd2e into grafana:master Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants