Avoid Catch 22 with BTRFS
On 1.1.4.29 I ended up in a deadlock through the dreaded metadata-full condition: 8GB of the internal MMC was used, but btrfs fi show reported 13.75GB of 13.75GB as full. I could (for a time) still delete some files (though I didn't have much to delete), but this didn't make it possible to start any balancing beyond the filter -dusage=3, where it had no chunks to move.
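For anyone wanting to check for the same symptom, a minimal sketch of the inspection and filtered-balance steps might look like the following. It assumes the affected filesystem is mounted at `/`. The `run` and `inspect_and_balance` helpers and the list of usage thresholds are illustrative choices, not something taken from the report above. By default the script only prints the commands (dry run); setting `DRY_RUN=0` would execute them, which needs root.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command
# unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the allocation, then balance data chunks with a rising usage
# filter. The thresholds are arbitrary (assumption); stop at the
# first command that fails.
inspect_and_balance() {
    mnt="$1"
    run btrfs filesystem show || return 1
    run btrfs filesystem df "$mnt" || return 1
    for pct in 0 3 10 25; do
        run btrfs balance start -dusage="$pct" "$mnt" || return 1
    done
}

inspect_and_balance /
```

Starting with a low usage filter only touches nearly empty chunks, so it needs little free space; in the situation described above even that had nothing to move.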
Following some old advice on this forum (https://together.jolla.com/question/30822/root-and-home-disks-full-and-causing-various-problems/), and after reading some less than pellucid discussion of btrfs elsewhere on the interwebs, I created a loop device on the sdcard and added it to the btrfs volume. I was now able to balance the disk as a whole (though an unfiltered 'btrfs balance start /' still failed with a no-space error). However, I could not delete the added loop device as per the instructions: although I had not instructed btrfs to do this, it had now arranged the metadata as RAID 1 rather than single. Unfortunately, at this point, rather than discovering the command to turn the metadata back to single, I rebooted. Disaster! The filesystem was now corrupt and it was impossible to run the recovery operation.
Eventually I saved the system by booting into recovery console; creating an additional loop device; adding that to the btrfs volume; deleting the missing device from the btrfs volume; and then rebalancing for metadata to be single (at this point it needed the force flag, since metadata integrity could not be guaranteed given the missing device). I then deleted the new loop device from the btrfs volume and issued a (successful) factory reset command from the recovery console.
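The recovery sequence above can be sketched roughly as follows. This is a reconstruction under assumptions, not a record of the exact commands typed: the mount point, image path, image size and loop device are placeholders, and mounting with `-o degraded` is assumed to be how a volume with a missing device gets mounted. Only the partition name `mmcblk0p28` comes from this post. The script prints the commands by default (dry run) and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command
# unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_missing_device() {
    root_dev="$1"   # btrfs partition, e.g. /dev/mmcblk0p28
    mnt="$2"        # temporary mount point (placeholder)
    img="$3"        # backing file for the new loop device (placeholder)
    loop="$4"       # a free loop device node (placeholder)

    run mkdir -p "$mnt" || return 1
    # Assumption: a volume with a missing device needs a degraded mount.
    run mount -o degraded "$root_dev" "$mnt" || return 1
    # Create and attach a fresh loop device (size is arbitrary).
    run dd if=/dev/zero of="$img" bs=1M count=1024 || return 1
    run losetup "$loop" "$img" || return 1
    run btrfs device add "$loop" "$mnt" || return 1
    # Drop the device that vanished on reboot.
    run btrfs device delete missing "$mnt" || return 1
    # Convert metadata back to single; -f is needed because this
    # reduces metadata redundancy.
    run btrfs balance start -f -mconvert=single "$mnt" || return 1
    # Only now can the temporary loop device be removed again.
    run btrfs device delete "$loop" "$mnt" || return 1
    run umount "$mnt" || return 1
    run losetup -d "$loop"
}

recover_missing_device /dev/mmcblk0p28 /mnt/btrfs-root /sdcard/extra.img /dev/loop1
```

The factory reset itself was issued afterwards from the recovery console and is not part of this sketch.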
Note that issuing btrfs from the recovery console did not solve the original problem of a frozen fs (before I used the trick of adding a loop device). Nor did it solve the missing-device problem after I had managed to re-balance but foolishly rebooted. Moreover, it was not possible to do a factory reset at that point until I had gone through the procedure described above, which seems to me a very serious bug.
I think Jolla should not leave the end user in this kind of situation. At the very least, the recovery script should be prepared to reformat mmcblk0p28, which would remove the missing-device error.
I wrote this (long screed) mainly to warn people that, at least as of btrfs 3.16, adding an additional device in order to balance carries the risk of turning your system into a RAID 1 system, which will prevent you from removing the added device until you re-balance the metadata as single. This was not flagged on together.jolla.com.
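A small sketch of the check that would have avoided the trap: read the metadata profile from `btrfs filesystem df` before trying to remove the temporary device, and convert back to single first if it has become RAID1. The helper names `metadata_profile` and `removal_plan` are made up for illustration, the sample output is an invented example of the format (not the real numbers from this phone), and the script only prints a plan rather than running anything.

```shell
#!/bin/sh
# Extract the metadata profile from `btrfs filesystem df` output on stdin.
# Real use (as root):  btrfs filesystem df / | metadata_profile
metadata_profile() {
    awk -F'[,:] *' '$1 == "Metadata" { print $2; exit }'
}

# Print the commands to run before and for removing a temporary device.
removal_plan() {
    profile="$1"; dev="$2"; mnt="$3"
    case "$profile" in
        RAID*)
            # Multi-device profile: convert back first. -f is required
            # because the conversion reduces metadata redundancy.
            echo "btrfs balance start -f -mconvert=single $mnt"
            ;;
    esac
    echo "btrfs device delete $dev $mnt"
}

# Illustrative sample only (assumed values, typical output format).
sample='Data, single: total=12.00GiB, used=7.60GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=1.00GiB, used=420.00MiB'

profile="$(printf '%s\n' "$sample" | metadata_profile)"
removal_plan "$profile" /dev/loop0 /
```

With the sample above the plan contains the conversion step followed by the device removal; with a `single` profile it contains only the removal.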
And that's why LVM thin provisioning is a much better and more robust CoW implementation than what BTRFS is trying to do - an LVM thin pool has separate internal metadata and data volumes.
This way you always know how many data and metadata blocks are available, and you can (separately) extend the volumes if you are running out of space. Or you can of course remove something (files, old snapshots, etc.) to free some space.
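As a rough illustration of that point, checking and growing a thin pool could look like the sketch below. The volume group `vg0`, the pool name `thinpool` and the sizes are hypothetical, as are the helper function names; the script prints the commands by default (dry run) and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command
# unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Report data and metadata usage of a thin pool as separate columns.
thin_pool_usage() {
    vg="$1"; pool="$2"
    run lvs -o lv_name,data_percent,metadata_percent "$vg/$pool"
}

# Grow the pool's metadata and data volumes independently.
grow_thin_pool() {
    vg="$1"; pool="$2"; meta_add="$3"; data_add="$4"
    run lvextend --poolmetadatasize "+$meta_add" "$vg/$pool" || return 1
    run lvextend -L "+$data_add" "$vg/$pool"
}

thin_pool_usage vg0 thinpool
grow_thin_pool vg0 thinpool 256M 1G
```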
As for your issue - LVM would just not convert your metadata volume to RAID1 without telling you.
On the other hand, if you added a loop file as a PV to a volume group and used it (say, for expanding a thin pool metadata or data LV), you would still be in trouble after a reboot (there is nothing LVM can do about such a user-induced error). Still, the data would be there; you would just need to boot the system somehow and attach the loop file again, so that LVM could find it and reconstruct the volume group.
In short: if you are adding temporary block devices to a volume manager (BTRFS, LVM, ZFS, etc.), always properly remove them before rebooting!
MartinK ( 2015-06-01 00:12:14 +0200 )
A drawback with LVM is that a snapshot has the same UUID as the device you took the snapshot of.
Trizt ( 2015-07-17 23:08:30 +0200 )
Could you please elaborate on which commands you used to fix it? I'm stuck at the same point!
OK -> "Eventually I saved the system by booting into recovery console;"
? -> "creating an additional loop device": use losetup on what?
? -> "and then rebalancing for metadata to be single": what command does that?
Help much appreciated
ozzi776 ( 2016-05-10 18:33:51 +0200 )