-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bigtable-backup-tool: Improvements #895
bigtable-backup-tool: Improvements #895
Conversation
Added some metrics for monitoring of backup jobs Added a command to delete out of range backups otherwise, there would be more backups than expected and alerts would start firing for unexpected backups
def set_ensure_backups_specific_metrics(args, num_backups_deleted, newest_table_number, active_table_number): | ||
# ensure-backups job specific metrics | ||
bigtable_backup_job_tables_backed_up = Gauge('bigtable_backup_job_tables_backed_up', 'Number of backups for active and inactive tables found during last run', ['kind'], registry=registry) | ||
bigtable_backup_job_backups = Gauge('bigtable_backup_job_backups', 'Sum of number of backups for all active and inactive tables found during last run', ['kind'], registry=registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of backups for all active and inactive tables.
bigtable_backup_job_tables_backed_up = Gauge('bigtable_backup_job_tables_backed_up', 'Number of backups for active and inactive tables found during last run', ['kind'], registry=registry) | ||
bigtable_backup_job_backups = Gauge('bigtable_backup_job_backups', 'Sum of number of backups for all active and inactive tables found during last run', ['kind'], registry=registry) | ||
bigtable_backup_job_backups_deleted = Gauge('bigtable_backup_job_backups_deleted', 'Number of backups deleted during last run', registry=registry) | ||
bigtable_backup_job_expected_backups = Gauge('bigtable_backup_job_expected_backups', 'Number of backups expected for active and inactive tables', ['kind'], registry=registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the issue being that this script expects itself to be run every day. This'll fail for active table if it's being run ever other day for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we get by knowing the expected backups for active table?
Further, I think it's better to expose the last backup time for the active table. That way will be able to an alert like active_backup is less than 36hrs old, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, the script should not know how frequently it is going to run.
removing expected number of backups for active table and adding timestamp of last last backup of active table.
What this PR does / why we need it:
Added some metrics for monitoring of backup jobs
Added a command to delete out of range backups otherwise, there would be more backups than expected and alerts would start firing for unexpected backups