Backing Up Your C-Lightning Node
================================

Lightning Network channels get their scalability and privacy benefits
from the very simple technique of *not telling anyone else about your
in-channel activity*.
This is in contrast to onchain payments, where you have to tell everyone
about each and every payment and have it recorded on the blockchain,
leading to scaling problems (you have to push data to everyone) and
privacy problems (everyone knows every payment you were ever involved in).

Unfortunately, this removes a property that onchain users are so
accustomed to that they are surprised when they learn it is gone.
Your onchain activity is recorded by every archival fullnode, so if you
lose all your onchain history because your storage got fried, you can
just redownload it from the nearest archival fullnode.

But on Lightning, since *you* are the only one storing all your
financial information, you ***cannot*** recover this financial
information anywhere else.

This means that on Lightning, you **have to** responsibly back up your
financial information yourself, using various processes and automation.

The discussion below assumes that you know where you put your
`$LIGHTNINGDIR`, and that you know the directory structure within.
By default your `$LIGHTNINGDIR` will be in `~/.lightning`, and will
contain a subdirectory for the coin you are using.

`hsm_secret`
------------

You need a copy of the `hsm_secret` file regardless of which backup
strategy you use.
The `hsm_secret` is just 32 bytes, so you can do something like the
below and write the hexadecimal digits a few times on a piece of paper:

    xxd hsm_secret

You can re-enter the hexdump into a text file later and use `xxd` to
convert it back to a binary `hsm_secret`:

    cat > hsm_secret_hex.txt <<HEX
    00: 30cc f221 94e1 7f01 cd54 d68c a1ba f124
    10: e1f3 1d45 d904 823c 77b7 1e18 fd93 1676
    HEX
    xxd -r hsm_secret_hex.txt > hsm_secret
    chmod 0400 hsm_secret

Notice that you need to ensure that the `hsm_secret` is only readable by
the user, and is not writable, as otherwise `lightningd` will refuse to
start.
Hence the `chmod 0400 hsm_secret` command.

Alternatively, if you are deploying a new node that has no funds and
channels yet, you can generate BIP39 words using any process, and
create the `hsm_secret` using the `hsmtool generatehsm` command.
If you did `make install` then `hsmtool` is installed as
`lightning-hsmtool`; otherwise you can find it in the `tools/` directory
of the build directory.

    lightning-hsmtool generatehsm hsm_secret

Then enter the BIP39 words, plus an optional passphrase.

You can regenerate the same `hsm_secret` file using the same BIP39
words, which, again, you can back up on paper.

Recovery of the `hsm_secret` is sufficient to recover any onchain
funds.
Recovery of the `hsm_secret` is necessary, but not sufficient, to
recover any in-channel funds.
To recover in-channel funds, you need to use one or more of the other
backup strategies below.

Database File Backups
---------------------

This is the least desirable backup strategy, as it *can* lead to loss
of all in-channel funds if you use it.
However, having *no* backup strategy at all *will* lead to loss of all
in-channel funds, so this is still better than nothing.

While `lightningd` is not running, just copy the `lightningd.sqlite3` file
in the `$LIGHTNINGDIR` to backup media somewhere.

To recover, just copy the backed-up `lightningd.sqlite3` into your new
`$LIGHTNINGDIR` together with the `hsm_secret`.

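As a minimal sketch of the above, assuming your node runs on Bitcoin
mainnet (so `$LIGHTNINGDIR` is `~/.lightning/bitcoin`), `lightning-cli`
can reach your node, and the backup media is mounted at the hypothetical
path `/mnt/backup`:

    # Stop the node so sqlite3 is no longer writing to the database.
    lightning-cli stop

    # Back up the database.
    cp ~/.lightning/bitcoin/lightningd.sqlite3 /mnt/backup/
    sync

    # On a fresh machine, restore by copying the files back before
    # starting lightningd again.
    cp /mnt/backup/lightningd.sqlite3 ~/.lightning/bitcoin/
    cp /path/to/paper-restored/hsm_secret ~/.lightning/bitcoin/
    chmod 0400 ~/.lightning/bitcoin/hsm_secret
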
You can also use any automated backup system, as long as it includes the
`lightningd.sqlite3` file (and optionally `hsm_secret`, but note that
this is a secret key: anyone who gets a copy of your backups may be able
to steal your funds, even in-channel funds).

This backup method is undesirable, since it cannot recover the following
channels:

* Channels with peers that do not support `option_dataloss_protect`.
  * Most nodes on the network already support `option_dataloss_protect`
    as of November 2020.
  * If the peer does not support this option, then the entire channel
    funds will be revoked by the peer.
  * Specially-coded peers can pretend to support this option, but end
    up taking the entire channel funds (or otherwise more than what they
    should have) when you try to recover.
    Standard releases of Lightning Network node software do not do this
    (they just support this option according to the standard), but note
    that anyone can modify their node software, especially with our
    well-documented open-source code.
* Channels created *after* the copy was made are not recoverable.
  * Data for those channels does not exist in the backup, so your node
    cannot recover them.

Because of the above, this strategy is discouraged: you *can* potentially
lose all funds in open channels.

However, again, note that a "no backups #reckless" strategy leads to
*definite* loss of funds, so you should still prefer this strategy over
not having **any** backups at all.

Even if you have the better options below, you might still want to do
this as a fallback, as long as you keep the following in mind:

* Attempt to recover using the other backup options below first.
  Any one of them will be better than this backup option.
* Recovering by this method ***MUST*** always be the ***last*** resort.
* Recover using the most recent backup you can find.
  Take time to look for the most recent available backup.

Again, this strategy can lead to only ***partial*** recovery of funds,
or even to complete failure to recover, so use the other methods below
first to recover!

### Backing Up While `lightningd` Is Running

Since `sqlite3` will be writing to the file while `lightningd` is running,
`cp`ing the `lightningd.sqlite3` file while `lightningd` is running may
result in the file not being copied properly if `sqlite3` happens to be
committing database transactions at that time, potentially leading to a
corrupted backup file that cannot be recovered from.

Instead, the **proper** way to do this would be:

* `echo "VACUUM INTO 'backup.sqlite3';" | sqlite3 lightningd.sqlite3`
* `sync`
* `mv backup.sqlite3 ${YOURBACKUPLOCATION}/lightningd.sqlite3`

This creates a consistent snapshot of the database, sampled in a
transaction, that is assured to be openable later by `sqlite3`.
The operation of the `lightningd` process will be paused while `sqlite3`
is creating the backup as well, since the `VACUUM INTO` command runs in
a transaction that prevents the `lightningd` process from starting
its own database transactions.

You can also give the path to the backup media directly in the `VACUUM
INTO` command; just remember to `sync` afterwards, and if the backup
media is removable, properly unmount it before removing it.

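A minimal sketch of this, assuming the backup media is mounted at the
hypothetical path `/mnt/backup` (note that `VACUUM INTO` requires
`sqlite3` version 3.27 or newer):

    cd ${LIGHTNINGDIR}
    echo "VACUUM INTO '/mnt/backup/lightningd.sqlite3';" | sqlite3 lightningd.sqlite3
    sync
    # Optionally check that the snapshot is a well-formed database.
    echo "PRAGMA integrity_check;" | sqlite3 /mnt/backup/lightningd.sqlite3
    umount /mnt/backup
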
An advantage of the `VACUUM INTO` command is that it also removes the
space used by deleted rows, meaning the backup it creates is smaller.

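Since this snapshot is safe to take while `lightningd` is running, you
can also automate it. A sketch, assuming you keep a small script at the
hypothetical path `/usr/local/bin/backup-lightningd.sh` and your
`$LIGHTNINGDIR` and backup mount are as in the example above:

    #!/bin/sh
    # /usr/local/bin/backup-lightningd.sh (hypothetical path)
    # Snapshot the running database, then move it onto the backup media.
    set -e
    cd /home/user/.lightning/bitcoin
    echo "VACUUM INTO 'backup.sqlite3';" | sqlite3 lightningd.sqlite3
    sync
    mv backup.sqlite3 /mnt/backup/lightningd.sqlite3

A `crontab` entry for the user running `lightningd` could then run this
hourly:

    0 * * * * /usr/local/bin/backup-lightningd.sh
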
Filesystem Redundancy
---------------------

You can set up a RAID-1 with multiple storage devices, and point the
`$LIGHTNINGDIR` to the RAID-1 setup.
That way, failure of one storage device will still let you recover
funds.

On a Linux system, one of the simpler things you can do is to use a
BTRFS RAID-1 setup between a partition on your primary storage and a USB
flash disk.
The checklist below "should" work (a consolidated command sketch follows
it), but it assumes you are comfortable with low-level Linux
administration.
If you are on a system that would make you cry if you break it, you **MUST**
stop your Lightning node and back up all files before doing the below.

* Install `btrfs-progs` or `btrfs-tools` or equivalent.
* Get a 32GB USB flash disk.
* Stop your Lightning node and back up everything; do not be stupid.
* Repartition your hard disk to have a 30GB partition.
  * This is risky and may lose your data, so this is best done with a
    brand-new hard disk that contains no data.
* Connect the USB flash disk.
* Find the `/dev/sdXX` devices for the HDD 30GB partition and the flash disk.
  * `lsblk -o NAME,TYPE,SIZE,MODEL` should help.
* Create a RAID-1 `btrfs` filesystem.
  * `mkfs.btrfs -m raid1 -d raid1 /dev/${HDD30GB} /dev/${USB32GB}`
  * You may need to add `-f` if the USB flash disk is already formatted.
* Create a mountpoint for the `btrfs` filesystem.
* Create an `/etc/fstab` entry.
  * Use the `UUID` option instead of `/dev/sdXX`, since the exact device
    letter can change across boots.
  * You can get the UUID with `lsblk -o NAME,UUID`.
    Specifying *either* of the devices is sufficient.
  * e.g. `UUID=${UUID} ${BTRFSMOUNTPOINT} btrfs defaults 0 0`
* `mount -a`, then `df` to confirm it got mounted.
* Copy the contents of the `$LIGHTNINGDIR` to the BTRFS mount point.
  * Copy the entire directory, then `chown -R` the copy to the user who will
    run the `lightningd`.
  * If you are paranoid, run `diff -r` on both copies to check.
* Remove the existing `$LIGHTNINGDIR`.
* `ln -s ${BTRFSMOUNTPOINT}/lightningdirname ${LIGHTNINGDIR}`.
  * Make sure the `$LIGHTNINGDIR` has the same structure as what you
    originally had.
* Add `crontab` entries for `root` that perform regular `btrfs` maintenance
  tasks.
  * `0 0 * * * /usr/bin/btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 ${BTRFSMOUNTPOINT}`
    This prevents BTRFS from running out of blocks even if it has unused
    space *within* blocks, and is run at midnight every day.
    You may need to change the path to the `btrfs` binary.
  * `0 0 * * 0 /usr/bin/btrfs scrub start -B -c 2 -n 4 ${BTRFSMOUNTPOINT}`
    This detects bit rot (i.e. bad sectors) and auto-heals the filesystem,
    and is run on Sundays at midnight.
* Restart your Lightning node.

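A consolidated sketch of the filesystem-creation steps above, using the
same placeholder variables as the checklist plus a hypothetical
`${LIGHTNINGUSER}` for the user that runs `lightningd`:

    # Create the RAID-1 filesystem across the HDD partition and the USB disk.
    mkfs.btrfs -m raid1 -d raid1 /dev/${HDD30GB} /dev/${USB32GB}

    # Mount it via /etc/fstab using the filesystem UUID.
    mkdir ${BTRFSMOUNTPOINT}
    lsblk -o NAME,UUID        # note the UUID of either device
    echo "UUID=${UUID} ${BTRFSMOUNTPOINT} btrfs defaults 0 0" >> /etc/fstab
    mount -a && df

    # Move the Lightning directory onto the new filesystem.
    cp -a ${LIGHTNINGDIR} ${BTRFSMOUNTPOINT}/lightningdirname
    chown -R ${LIGHTNINGUSER}: ${BTRFSMOUNTPOINT}/lightningdirname
    rm -rf ${LIGHTNINGDIR}
    ln -s ${BTRFSMOUNTPOINT}/lightningdirname ${LIGHTNINGDIR}
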
If one or the other device fails completely, shut down your computer, boot
from a recovery disk or similar, then:

* Connect the surviving device.
* Mount the partition/USB flash disk in `degraded` mode:
  * `mount -o degraded /dev/sdXX /mnt/point`
* Copy the `lightningd.sqlite3` and `hsm_secret` to new media.
  * Do **not** write to the degraded `btrfs` mount!
* Start up a `lightningd` using the recovered `hsm_secret` and
  `lightningd.sqlite3`, close all channels, move all funds to onchain
  cold storage you control, and then set up a new Lightning node.

If the device that fails is the USB flash disk, you can replace it using
BTRFS commands.
You should probably stop your Lightning node while doing this.

* `btrfs replace start /dev/sdOLD /dev/sdNEW ${BTRFSMOUNTPOINT}`.
* Monitor status with `btrfs replace status ${BTRFSMOUNTPOINT}`.

More sophisticated setups with more than two devices are possible.
Take note that "RAID 1" in `btrfs` means "data is stored on exactly two
devices", meaning only one device can fail safely.
You may be interested in the `raid1c3` and `raid1c4` modes if you have
three or four storage devices.
BTRFS would probably work better if you were purchasing an entire set
of new storage devices to set up a new node.

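For instance, a sketch of creating a three-copy filesystem across three
hypothetical devices (the `raid1c3` profile requires a reasonably recent
kernel and `btrfs-progs`):

    mkfs.btrfs -m raid1c3 -d raid1c3 /dev/sdX /dev/sdY /dev/sdZ

With `raid1c3`, every block is stored on three of the devices, so up to
two devices can fail without losing the filesystem.
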
On BSD you can use a ZFS RAID-Z setup, which is probably better than BTRFS
on Linux (in particular, the equivalent of the `crontab` entries for Linux
BTRFS is run automatically in the background by ZFS).
ZFS *can* be installed and used on Linux, but requires greater effort to
install on a typical Linux system;
try looking at [ZFSonLinux](https://zfsonlinux.org).

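A minimal sketch, assuming three hypothetical disks `ada1`, `ada2`, and
`ada3` dedicated to the pool, with the pool name and mountpoint chosen
arbitrarily:

    # Create a single-parity RAID-Z pool mounted at /lightning-pool.
    zpool create -m /lightning-pool lightningpool raidz ada1 ada2 ada3

    # Then move $LIGHTNINGDIR onto the pool and symlink it back,
    # as in the BTRFS checklist above.

A single-parity RAID-Z pool survives the failure of any one member disk.
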
Filesystem RAID-1 like the above protects against failure of a single storage
device, but does not protect you in case of certain disasters, such as fire or
computer confiscation.

`backup` Plugin And Remote NFS Mount
------------------------------------

You can get the `backup` plugin here:
https://github.com/lightningd/plugins/tree/master/backup

The `backup` plugin requires Python 3.

* `cd` into its directory and install requirements.
  * `pip3 install -r requirements.txt`
* Figure out where you will put the backup files.
  * Ideally you have an NFS or other network-based mount on your system,
    into which you will put the backup.
* Stop your Lightning node.
* `/path/to/backup-cli init ${LIGHTNINGDIR} file:///path/to/nfs/mount`.
  This creates an initial copy of the database at the NFS mount.
* Add these settings to your `lightningd` configuration (see the sketch
  after this list):
  * `important-plugin=/path/to/backup.py`
  * `backup-destination=file:///path/to/nfs/mount`
* Restart your Lightning node.

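A sketch of what the relevant part of your `lightningd` configuration
file (by default `~/.lightning/config`; the plugin path and mount point
here are hypothetical) might look like afterwards:

    # Load the backup plugin and refuse to start without it.
    important-plugin=/home/user/plugins/backup/backup.py
    # Write backups to the NFS mount.
    backup-destination=file:///mnt/nfs/lightningd-backup
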
It is recommended that you use a network-mounted filesystem for the backup
destination.

Alternatively, you *could* put it on another storage device (e.g. USB flash
disk) in the same physical location, but if so, you might want to use the
above `btrfs` RAID-1 instead, as that tends to be more performant
(though using this `backup` plugin does not require you to repartition an
existing hard disk, which might matter to you if you are using an
everyday-use laptop or desktop).

To recover:

* Re-download the `backup` plugin and install Python 3 and the
  requirements of `backup`.
* `/path/to/backup-cli restore file:///path/to/nfs/mount ${LIGHTNINGDIR}`

If your backup destination is a network-mounted filesystem in a remote
location, then even the loss of all hardware in one location will still
allow you to recover your Lightning funds.

However, if instead you are just replicating the database onto another
storage device in a single location, you remain vulnerable to disasters
like fire or computer confiscation.

PostgreSQL Cluster
------------------

`lightningd` may also be compiled with PostgreSQL support.
PostgreSQL is generally faster than SQLite3, and also supports running a
PostgreSQL cluster to be used by `lightningd`, with automatic replication
and failover in case an entire node of the PostgreSQL cluster fails.

Setting this up, however, is more involved.

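As a brief sketch (the connection details below are hypothetical, and the
PostgreSQL cluster itself must already be set up separately), you point
`lightningd` at the cluster with the `--wallet` option instead of letting
it use the default SQLite3 file:

    lightningd --wallet=postgres://lightninguser:password@dbhost:5432/lightningdb
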
TODO: @cdecker