QOLDEV-863 Fix solr HA #454

Merged: 4 commits, Aug 7, 2024

Contributor

@ThrawnCA ThrawnCA commented Aug 7, 2024

  • Import backups into secondary server(s) by stopping the server and wholesale replacing the index, rather than trying to dynamically import.
  • Export backups as archives rather than exploded directories.

TODO: Use S3 to pass backups, rather than EFS.

- Export to EFS as an archive, not an exploded directory
- Import by stopping Solr and wholesale replacing the index, not via replication restore endpoint
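
The import approach described above (stop the server, replace the index wholesale, start it again) can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function name `import_snapshot`, its arguments, and the `SOLR_STOP_CMD` / `SOLR_START_CMD` variables are assumptions made for the example. Both commands default to `true` so the sketch can be tried against a scratch directory; the `sudo -u solr` ownership handling seen in the diff is left out.

```shell
#!/bin/sh
# Hypothetical helper: replace a core's index directory with the contents
# of a snapshot archive. All names here are illustrative assumptions.
import_snapshot () {
  archive="$1"      # path to the snapshot archive (.tgz)
  core_data="$2"    # directory that contains the "index" subdirectory

  # Nothing to import if the archive is missing.
  [ -f "$archive" ] || return 1

  # Stop the server first so nothing holds the old index open.
  # Defaults to a no-op; a real run might set "sudo service solr stop".
  ${SOLR_STOP_CMD:-true} || return 1

  # Discard the old index entirely and unpack the archive in its place.
  rm -rf "$core_data/index" &&
    mkdir -p "$core_data/index" &&
    tar -xzf "$archive" -C "$core_data/index"
  status=$?

  # Restart the server whether or not the unpack succeeded.
  # The start command is an assumption; the diff only shows the stop.
  ${SOLR_START_CMD:-true} || return 1
  return $status
}
```

Restarting even after a failed unpack is a choice of this sketch; a stricter version could leave the server down so that a broken index is never served.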
@ThrawnCA ThrawnCA requested a review from a team August 7, 2024 02:21
return 1
if [ -f "$SYNC_SNAPSHOT" ]; then
sudo service solr stop
sudo -u solr mkdir $LOCAL_DIR/index
Member

Should this mkdir have a -p for safety?

Contributor Author

Nah, line 80 already does that.

code <<-EOS
rsync -a --delete #{efs_data_dir}/ #{real_data_dir}/
LATEST_INDEX=`ls -dtr #{efs_data_dir}/data/#{core_name}/data/snapshot.* |tail -1`
rsync $LATEST_INDEX/ #{real_data_dir}/data/#{core_name}/data/index/
CORE_DATA="#{real_data_dir}/data/#{core_name}/data"
Member

How many snapshots do we keep on the EFS?
Could we also stop storing the full file on EFS and use an S3 pointer file instead, to reduce EFS costs?

Contributor Author

The sync script, on export, removes all snapshots except the current one (solr-sync.sh line 90).

We can probably just drop EFS and use S3 without too much trouble. I didn't do it here because it wasn't needed, but it should be fairly straightforward. We don't use EFS for anything that demands high I/O performance; it's just putting timestamps in heartbeat files, and passing snapshots in the background.
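
The pruning behaviour described above, where an export leaves only the current snapshot behind, might look something like this. It is a sketch under assumptions: `prune_snapshots` and its arguments are hypothetical names, and the `snapshot.*` naming is taken from the `ls` pattern quoted earlier in this thread, not from solr-sync.sh itself.

```shell
#!/bin/sh
# Hypothetical helper: delete every snapshot.* entry in a directory
# except the one passed as the entry to keep.
prune_snapshots () {
  dir="$1"    # directory holding the snapshots
  keep="$2"   # full path of the snapshot to retain

  for entry in "$dir"/snapshot.*; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    if [ ! -e "$entry" ]; then
      continue
    fi
    # Remove everything that is not the current snapshot.
    if [ "$entry" != "$keep" ]; then
      rm -rf "$entry"
    fi
  done
  return 0
}
```

Because only one snapshot survives each export, the shared storage never holds more than a single copy of the index, which is relevant to the cost question raised above.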

Member

@duttonw duttonw left a comment

Seems like a positive step forward. Just wondering why the Solr API for backup/restore did not work (was it due to the index being corrupted and not booting?).

https://solr.apache.org/guide/6_6/making-and-restoring-backups.html

@@ -52,18 +52,18 @@ function export_snapshot () {
if [ "$REPLICATION_STATUS" != "0" ]; then
return $REPLICATION_STATUS
fi
-sudo -u solr sh -c "$LUCENE_CHECK $LOCAL_SNAPSHOT && rsync -a --delete $LOCAL_SNAPSHOT/ $SYNC_SNAPSHOT/" || return 1
+sh -c "$LUCENE_CHECK $LOCAL_SNAPSHOT && sudo -u solr tar --force-local --exclude=write.lock -czf $SYNC_SNAPSHOT -C $LOCAL_SNAPSHOT ." || return 1
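
For readers skimming the diff, the archive export in the changed line boils down to a `tar` call that packs the snapshot directory while leaving out the Lucene `write.lock` file. A stripped-down sketch follows. The function name `export_archive` and its arguments are hypothetical, and the Lucene check and the `sudo -u solr` wrapper from the real line are omitted. The original also passes `--force-local`, a GNU tar option that stops an archive name containing a colon from being treated as a remote host; it is left out here only to keep the sketch portable.

```shell
#!/bin/sh
# Hypothetical helper: pack a snapshot directory into a gzipped tar archive,
# excluding the index lock file.
export_archive () {
  snapshot_dir="$1"   # exploded snapshot directory to pack
  archive="$2"        # destination archive path (.tgz)

  # Refuse to create an archive from a missing directory.
  [ -d "$snapshot_dir" ] || return 1

  # -C changes into the snapshot directory so members are stored
  # relative to it, which lets the importer unpack straight into "index".
  tar --exclude=write.lock -czf "$archive" -C "$snapshot_dir" . || return 1
}
```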
Member

I've forgotten why we did not go with snapshot over backup. I know that a backup is a full copy rather than a partial one, but it is also more disk/resource intensive.

Are we still running this every 2 minutes, or did we slow it down to every 10?

Contributor Author

@ThrawnCA ThrawnCA Aug 7, 2024

I can't see any distinction in the docs between snapshot and backup. The commands are just 'backup' and 'restore'.

It runs every 5 minutes.

@ThrawnCA ThrawnCA merged commit 2b06233 into develop Aug 7, 2024