jhofstee edited this page Jan 15, 2020 · 51 revisions

1. Budgeting the data partition

Available

  • CCGX: about 28MB
  • Venus GX 1st version: 100MB
  • Cerbo GX: 512MB

Usage

  • Log files: around 40 processes always run and log. As of v2.23, the maximum space taken by the log files of a single process has been reduced to 4 files of 25kB each, i.e. 100kB. For all 40 processes together this amounts to 4,000kB (about 4MB). Details of the change are in commit 9ce14ef1e, which was backported to v2.23.
  • Firmware & settings file cache (mqtt-rpc)
  • Settings
  • Factory installed files (negligible from a size point of view)
  • VRM Logger backlog
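
The log-file figure above is plain arithmetic; a small sketch makes it easy to redo when the process count or rotation settings change. The function names and the 1MB = 1000kB conversion are assumptions made for illustration, and the partition sizes are the approximate ones listed under "Available".

```python
# Rough log-file budget for /data, using the figures quoted in this section.
PROCESSES = 40          # always-running processes that log
FILES_PER_PROCESS = 4   # log files kept per process (v2.23 and later)
KB_PER_FILE = 25        # maximum size of each log file, in kB

# Approximate data partition sizes in MB, as listed under "Available".
PARTITION_MB = {
    "CCGX": 28,
    "Venus GX 1st version": 100,
    "Cerbo GX": 512,
}


def log_budget_kb(processes=PROCESSES, files=FILES_PER_PROCESS, kb_per_file=KB_PER_FILE):
    """Worst-case space taken by all log files together, in kB."""
    return processes * files * kb_per_file


def share_of_partition(budget_kb, partition_mb):
    """Fraction of a partition used by the log budget (assumes 1MB = 1000kB)."""
    return budget_kb / (partition_mb * 1000)


total_kb = log_budget_kb()
print(total_kb)  # 4000
for device, size_mb in PARTITION_MB.items():
    print(f"{device}: {share_of_partition(total_kb, size_mb):.1%} of the data partition")
```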

2. Factory installed files on the data partition

# cat /data/venus/installer-version 
v2.11
Victron Energy

# cat /data/venus/serial-number     
HQ1825ZUT5T

# cat /data/venus/wpa-psk       
gt5nyede

# cat /data/venus/part-number
BPP900400100

In the same folder there is also one other file, which is auto generated:

# cat /data/venus/unique-id  
985dxxxxx3a1

3. Writing files to data

Filesystems in Linux are typically asynchronous: data is reported as written once it is in the page cache, even though it is not yet on the storage itself. For user settings, and for keys that are generated once and then distributed, it is important that both the data and the metadata are on disk before the data is used. See "Ensuring data reaches disk". The procedure is:

  1. create a new temp file (on the same file system!)
  2. write data to the temp file
  3. fsync() the temp file
  4. rename the temp file to the appropriate name
  5. fsync() the containing directory

If the fsync on the data is omitted, the file metadata might point to an invalid area after a power cycle, and the file becomes a zero-length file on ubifs or a zero-filled file on ext4. If the fsync on the directory is omitted, the file might still have the temp name after a power cycle. The directory fsync may be omitted if the application also looks for the temp file and can verify that it was written completely.
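
A minimal sketch of the five steps in Python, assuming a POSIX system such as Linux. The function name atomic_write is made up for illustration and is not taken from any Venus component. Note that tempfile.mkstemp creates the file with mode 0600, so a real implementation may need to set permissions explicitly.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to path so that a power cut leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))

    # 1. Create the temp file in the same directory, hence on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            # 2. Write the data to the temp file.
            tmp.write(data)
            tmp.flush()              # move Python's buffer into the kernel
            # 3. fsync() the temp file so the data itself is on storage.
            os.fsync(tmp.fileno())
        # 4. Rename the temp file to the final name (atomic on POSIX).
        os.rename(tmp_path, path)
    except BaseException:
        # Writing or renaming failed: do not leave the temp file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # 5. fsync() the containing directory so the rename itself is on storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage would look like `atomic_write("/data/conf/example.txt", b"value\n")`, where the path is only an example.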

4. Handling failures related to the data partition

In v2.30, various improvements have been added.

vrmlogger reads /run/data-partition-state, translates its content to a number, and sends it to VRM on boot and thereafter only when it differs from the previously submitted value.

If the data partition is not mounted (state == failed or state == failed-to-mount), an init script stops vrmlogger, since it can't run without the data partition anyway, and uses curl to send dps to VRM itself.

VRM normally checks all data against a device-authorisation-token, but it always accepts dps transmissions.

Note that curl sends it as a DPS-TRANSMISSION (c=100), which causes it to be stored in the events table. vrmlogger sends it as a normal data transmission, so it is not stored in the events table but in the normal databases.

In VRM, this status is saved as dataAttribute dps. Its possible values are:

State                Description
0 - fine
1 - failed-once      Set on device reboot. A run-time read-only remount occurred and was stored in u-boot var `data-failed-count`, and a second fail was not detected.
2 - recovered        This follows a 'failed-once', after 24 hours of no failure.
3 - failed           Set on device reboot on a second run-time read-only remount, based on the u-boot var `data-failed-count`.
4 - failed-to-mount  Set if `/data` wasn't even mounted at boot. A tmpfs is then mounted for `/var/log`.
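
The translation from state to dps number, together with the "send on boot, then only on change" rule, can be sketched as follows. This is not the vrmlogger source: the function names are hypothetical, and it assumes that /run/data-partition-state holds the state name from the table as plain text.

```python
# Mapping taken from the table above.
DPS_VALUES = {
    "fine": 0,
    "failed-once": 1,
    "recovered": 2,
    "failed": 3,
    "failed-to-mount": 4,
}


def read_dps(path="/run/data-partition-state"):
    """Return the numeric dps value, or None if the file is missing or the state is unknown.

    Assumption: the file contains just the state name, e.g. "failed-once".
    """
    try:
        with open(path) as f:
            state = f.read().strip()
    except OSError:
        return None
    return DPS_VALUES.get(state)


def should_send(current, last_sent):
    """Send on boot (last_sent is None) and afterwards only when the value changed."""
    return current is not None and current != last_sent
```

With last_sent initialised to None at boot, the first valid reading is always sent, and a later reading is sent only when it differs from the one submitted before.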

Primary reporting is done with report-data-failure.sh, whose reports end up in the eventLog MySQL table. vrmlogger also reports the state from /run/data-partition-state, but failed and failed-to-mount are not (reliably) sent by vrmlogger, because vrmlogger won't operate properly with a malfunctioning /data. See report-data-failure.sh.

The test-data-partition.sh script contains more explanation of how the conclusions are reached.

To analyse the status in the field, there is a Grafana dashboard.

Note: there is no authoritative status of which Venus/GX devices are broken. As it stands, the curl script reports 'broken' events but no recovery, and vrmlogger doesn't report 'failed' and 'failed-to-mount'. So it's half here, half there. Changes in vrmlogger are underway so that it can handle /data becoming read-only, after which the curl reporting is no longer needed.

5. Used data files

vrmlogger

  • db/vrmlogger-backlog.sqlite3

sqlite3 makes sure that both the data and the metadata are on disk; this is done in unixSync.

vebus

  • var/lib/mk2-dbus/mkxport.settings

Does an fsync on the data and a sync after the rename.

serial starter

  • var/lib/serial-starter/ttyUSB0
  • var/lib/serial-starter/ttyUSB1
  • var/lib/serial-starter/ttyUSB2
  • var/lib/serial-starter/ttyUSB3
  • var/lib/serial-starter/ttyUSB4
  • var/lib/serial-starter/ttyO0
  • var/lib/serial-starter/ttyO1
  • var/lib/serial-starter/ttyO2

Not important to be on disk. Files will be recreated when missing.

Connman / glib

  • var/lib/connman/settings
  • var/lib/connman/ethernet_7c38665aa305_cable/data
  • var/lib/connman/ethernet_7c38665aa305_cable/settings

Connman uses glib's g_file_set_contents(), which only fsyncs the data when an existing file gets replaced. This can lead to zeroed files when the file didn't exist yet. It also doesn't do the directory / metadata fsync after the rename, because of concerns about performance on spinning disks etc. The glib in Venus is patched to make sure the connman settings hit the disk directly.

Qt4

  • home/root/Settings/Trolltech.conf

Fine; this file is not important.

VNC

  • conf/vncpassword.txt
  • conf/vrm_auth_token.txt
  • home/vnctunnel/.ssh/id_rsa
  • home/vnctunnel/.ssh/authorized_keys
  • home/vnctunnel/.ssh/id_rsa.pub

localsettings

  • conf/settings.xml

localsettings didn't fsync the rename, but used to look at the tmp file as well. For consistency, it was changed to flush the rename as well.
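
Looking at the tmp file as well could be done roughly as below. This is an illustrative sketch, not the localsettings code: the function name and the ".tmp" suffix are assumptions, and a successful XML parse stands in for "verify it is written completely".

```python
import xml.etree.ElementTree as ET


def load_settings(path, tmp_suffix=".tmp"):
    """Parse path; fall back to a leftover temp file if path is missing or corrupt.

    Returns an ElementTree, or None when neither file parses. A truncated or
    empty file fails to parse, which serves as the completeness check here.
    """
    for candidate in (path, path + tmp_suffix):
        try:
            return ET.parse(candidate)
        except (OSError, ET.ParseError):
            continue
    return None
```

The sketch prefers the final file whenever it parses. When both files are valid, the temp file may actually hold the newer content, so a real implementation has to decide which one wins.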

MQTT

  • conf/mqtt_password.txt
  • conf/mosquitto.d/vrm_bridge.conf
  • keys/mosquitto.crt
  • keys/mosquitto.key

Fine; will be recreated by start-mosquitto when corrupt.

opensshd

  • keys/ssh_host_dsa_key
  • keys/ssh_host_rsa_key.pub
  • keys/ssh_host_ecdsa_key.pub
  • keys/ssh_host_rsa_key
  • keys/ssh_host_dsa_key.pub
  • keys/ssh_host_ecdsa_key

Fine, files will be regenerated when invalid.

boot

  • venus/unique-id
  • etc/timestamp
  • var/lib/random-seed

production

  • venus/part-number
  • venus/installer-version
  • venus/serial-number