Root and home disks full and causing various problems
SailfishOS 1.1.4.28 (Äijänpäivänjärvi) and onwards: Jolla has implemented a workaround
Jolla has added filesystem checks to the upgrade process. A balance is run whenever you update your phone to the latest version of SailfishOS.
This does not fix any issues that arise between update releases. It is still a good idea to avoid filling the phone's own filesystem beyond 7 GB, and to use an sdcard for storing large amounts of data instead. Whenever you have gone past the 7 GB limit, even temporarily, you should check the btrfs allocation level on your phone and run a balance if necessary.
The btrfs-balancer command requires developer mode.
devel-su
btrfs-balancer allocation
Should the allocation (used) show a higher value than 13153337344 bytes, you should run the balance operation. Before doing this, make sure you can connect to your phone via SSH.
btrfs-balancer balance
If you've successfully upgraded to SailfishOS 1.1.7.24 or later, this operation is unlikely to take very long, as your filesystem was balanced recently during the upgrade.
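As a rough sketch of the check described above, the helper below compares a byte figure against the 13153337344-byte limit. The function name needs_balance is made up for illustration. Because the exact output format of 'btrfs-balancer allocation' is not shown in this article, the value is passed in by hand rather than parsed. The limit equals 12.25 x 1024^3 bytes, which matches the 13.75 GB volume less the 1.5 GB reserve discussed later in this article; that reading of the number is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the "used" allocation calls for a balance.
# LIMIT is the figure quoted in this article (12.25 * 1024^3 bytes).
LIMIT=13153337344

needs_balance() {
    # $1: allocation (used) in bytes, copied by hand from 'btrfs-balancer allocation'
    if [ "$1" -gt "$LIMIT" ]; then
        echo "balance recommended"
    else
        echo "allocation ok"
    fi
}

needs_balance 14763950080   # 13.75 * 1024^3 bytes, i.e. a fully allocated volume
```

The comparison is strictly "higher than", as in the text, so a value exactly at the limit still counts as ok.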
Problems and notes:
- Btrfs-balancer's prerequisites seem to be in order now. Vitaminj found the issue and opened a bug report regarding the btrfs-balance command here: https://together.jolla.com/question/90031/new-btrfs-balance-service-in-114-seems-to-ignore-battery-percentage-gate/
- The btrfs-balance command is scheduled to run as a service on Tuesdays at 3:00 am. Unfortunately it does not seem to work. It did in 1.1.4.28 but not in later versions, regardless of whether the phone is in sleep state or not.
Below are the files introduced with the 'btrfs-balancer' command. See the rest of this article for the plain 'btrfs balance' command.
/lib/systemd/system/btrfs-balancer.timer
/lib/systemd/system/btrfs-balance.service
/usr/sbin/btrfs-balancer
/usr/share/btrfs-balancer/btrfs-balancer.conf
Applicable: SailfishOS 1.1.2.16 (Yliaavanlampi) or earlier
This is a report about a btrfs filesystem issue on the Jolla device's internal sdcard partition /dev/mmcblk0p28. The device contains the /, swap and home mountpoints. The issue manifests itself as an inability to write to the root and home mounts and their subdirectories. This can happen even when there are several gigabytes of free space left on the device: you may want to write a change of only 10 kilobytes, and that fails too. With btrfs, the usual methods of checking free space on the device do not apply, such as the Sailfish OS built-in report or the commands du and df.
There is no clear indication to the user that btrfs is unable to write the requested changes to the volume, but many of the usability problems it causes are very noticeable. Because most of the symptoms are not unique to this issue, it is necessary to enable developer mode and have a look at the filesystem status and logs.
It is not known whether this is an unfortunate design feature of btrfs or a bug in the btrfs implementation of the current kernel. Internal mass memory as small as 16 GB may simply not be big enough for a filesystem as complicated as btrfs, at least not if the user saves any reasonable amount of their own data on the same volume.
If you have constant stability problems with your phone and suspect you are having this issue, it is best to try to fix it before the situation gets worse. Please note that trying to upgrade or factory reset the device instead may complicate your situation considerably. So far, neither updates nor a factory reset fix the ongoing issue on your device! There is a high probability that this problem will be fixed or worked around in coming Jolla updates. See the notes at the end of this article.
If you want to rule out the possibility of btrfs allocation problems on your device, the 'btrfs fi show' command is completely safe to use. See 'How to evaluate the situation' below.
Unfortunately the actual btrfs balance operation is NOT SAFE. According to Dez's answer in this article, running balance is not safe with the current kernel version of Sailfish OS.
Running a full balance without filters can cause high CPU and IO loads. This may lead to restarts of Sailfish services, or to something that looks like a reboot but is only GUI related (green led, black screen). Actual reboots may happen too if you are very unlucky. Reboots or power cuts during filesystem operations can cause loss of data and lead to a bricked device, recoverable only by Jolla Care with a firmware flasher. However, a reboot during balance is not an "autobrick"; btrfs recovery and balance operations may start and continue after the reboot without problems. This seems to be the most common case.
To keep the balance operation as light as possible, running it with a balance filter (parameter '-dusage') is highly recommended. With a filter, the CPU and IO load and the risks are significantly lower than with a full balance.
The btrfs balance operation requires enabling developer mode on the Jolla phone and some basic experience of working with the Linux command line. For how to get devel-su (i.e. root), see this wiki article.
It is vital that you have the SSH connection enabled beforehand. You cannot enable it afterwards if the graphical interface freezes, or if your screen turns and stays black because of high CPU and IO load or crashing services. In addition, it is best to be familiar with the recovery steps using telnet in case things do not go smoothly.
Under no circumstances should you remove the battery during the balance process. You have to cancel the operation or confirm that the operation has finished. A previously enabled and established SSH connection is the way to do this should you no longer be able to access the terminal window locally on the Jolla phone.
General symptoms
When the user runs out of disk space, at least the following symptoms may occur:
- Sailfish browser may crash or fail to load pages. The back button may stop working.
- Messages application crashes
- Email application crashes
- Any program that wants to write on disk and fails may freeze and crash
- Any application database may corrupt while the writes fail
- Sailfish user interface freezes, crashes and restarts (green blinking led, blank screen)
- Flight-mode does not engage (possibly connman fails to bring interfaces down). Could affect any connection changes.
- Connecting Jolla to charger causes GUI crash and blinking green led
- Factory reset fails
- Serious issues after the factory reset when trying to update the phone back to current level
What happens under the bonnet
In short, the thesis is that the shortage of disk space is caused by how btrfs allocates 1 GB chunks of raw space for filesystem use. When no existing blockgroup is suitable for writing a specific change, a new 1 GB chunk gets allocated.
When the user fills up the /home mountpoint with data, all free chunks get allocated very easily. Of the 14 GB of space available on the Jolla device, writing only ~9 GB (estimate) of data is enough to allocate the whole 14 GB. Generally speaking, the longer the device has been in use, the more likely it is that all space has already been allocated.
When the btrfs filesystem does not find a suitable blockgroup to write the data in, and it cannot allocate a new 1 GB chunk, it gives an error that there is no space left. For a regular Jolla phone user this is visible as crashing applications, including the graphical interface of the phone itself.
The chunk allocation size for filesystem metadata is smaller, 256 MB, but on the Jolla device metadata is duplicated and so requires 512 MB.
Note that data and metadata can each run out of space separately, yet the error and symptoms are the same. They also allocate blockgroups from the same unallocated pool on the internal sdcard of the Jolla device. A fully allocated device does not have enough space to allocate a new metadata blockgroup. This is a more difficult situation than with data blockgroups alone: you may not be able to delete files, as there is no space to record the change in the metadata, and the filesystem refuses the operation.
According to the answer below, posted in November 2013, the unallocated space on a btrfs filesystem should be at least 1.5 GB: 1 GB for data and 256+256 MB for metadata.
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/30066
(Original question http://comments.gmane.org/gmane.comp.file-systems.btrfs/30047)
For more information about btrfs, including space problems, see:
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
https://btrfs.wiki.kernel.org/index.php/FAQ
https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29
https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F
How to evaluate the situation
Useful commands to evaluate the situation are (run as devel-su):
btrfs fi show
btrfs fi df /
btrfs sub list /
Check also these for errors:
journalctl
dmesg | less
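A quick way to narrow the logs down is to filter them for btrfs-related lines. This is only a minimal sketch: the function name and the search patterns ('btrfs', 'no space left') are assumptions chosen for illustration, not an exhaustive list of what the kernel or services may print.

```shell
#!/bin/sh
# Hypothetical filter: keep only log lines that mention btrfs or a
# "no space left" condition (case-insensitive). Reads stdin, writes stdout.
btrfs_log_filter() {
    grep -i -E 'btrfs|no space left'
}

# Usage (as devel-su):
#   dmesg | btrfs_log_filter
#   journalctl | btrfs_log_filter
```

If nothing matches, the function prints nothing; that alone does not prove the filesystem is healthy.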
When the 'btrfs fi show' command shows 13.75GB used of 13.75GB for 'devid 1' (see example below), new chunks cannot be allocated anymore and the random problems begin. Depending on what is written, when and where, a write either succeeds or fails.
[root@localhost nemo]# btrfs fi show
Label: 'sailfish' uuid: 0f8a2490-53ed-4ff6-ba34-b81df3430387
Total devices 1 FS bytes used 6.42GB
devid 1 size 13.75GB used 13.75GB path /dev/mmcblk0p28
Btrfs v0.20-rc1
At this point you should also try copying ~500 MB of files under your home directory mountpoint with the 'cp' command. If the copy fails with disk space errors, you have hit the situation where btrfs decides it cannot use any existing blockgroup and tries to allocate a new one, no matter what.
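The 'devid 1' line can be turned into an unallocated-space figure by subtracting 'used' from 'size'. The sketch below does that with awk. The function name is hypothetical, and it assumes the exact line layout and GB suffix shown in the sample output above; other btrfs-progs versions may print the line differently.

```shell
#!/bin/sh
# Hypothetical helper: print unallocated space (size - used) for each devid
# line of 'btrfs fi show' output read from stdin.
# Assumes lines like: devid 1 size 13.75GB used 13.75GB path /dev/mmcblk0p28
unallocated_gb() {
    awk '$1 == "devid" && $3 == "size" && $5 == "used" {
        size = $4; used = $6
        sub(/Gi?B$/, "", size)      # strip the unit suffix
        sub(/Gi?B$/, "", used)
        printf "%.2f\n", size - used
    }'
}

# Usage (as devel-su):
#   btrfs fi show | unallocated_gb
```

Going by the 1.5 GB guideline in this article, a result under 1.50 suggests it is time to free up space and balance.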
Measures to free up raw space
Deleting big files helps free up space both in the visible filesystem and within the allocated space. However, it does not shrink the allocated space automatically. When you run out of space it is recommended to delete several gigabytes of data to make sure that the filesystem has enough space to "breathe".
The method for freeing up space reserved by unused or sparsely used blockgroups is called 'balancing'. It resembles a filesystem 'defrag' operation, but instead of only speeding up the filesystem by regrouping fragmented files, balance also frees up space. Classic filesystems do not suffer from space usage problems to the same extent as btrfs.
The balance operation can consume more space than you have available. If you do not first free up enough space in existing blockgroups manually, you may run out of space during the balancing operation described below.
The btrfs balance command manual can be found here:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance
Before running the balance operation
Take backups of everything you can't afford to lose.
Close all unnecessary applications. Ensure that network connections work.
Remove any excess files you may have copied onto the device earlier. Try to free up preferably several gigabytes of space in the filesystem.
Look out especially for directories with very large numbers of files. It has been observed that these can prolong the balance operation and cause it to run out of disk space (it stops, which is not harmful). For instance, Android's Tripadvisor can cache tens of thousands of files under its data directory, and the btrfs balance operation has a hard time going through those with IO rates as poor as sdcards have. Simply uninstalling Android programs does not help; you have to find and delete the files manually.
If you are unable to delete files, you may be in a chicken-and-egg situation of having run out of metadata space. You can free up metadata space in an already used block group by overwriting the data in files you do not need.
echo > /path/to/yourfile
After this you can try normal deleting again.
If you have previously done a factory reset on your device, you may have old snapshots of the filesystem taking up lots of space. To get rid of these, follow instructions in this case:
https://together.jolla.com/question/14633/bug-factory-reset-no-storage-left/
The balance operation is heavy on the CPU and MMC, and your phone may become unresponsive during the process. Since it can take from 10 minutes to hours, or even days(!), it is best to have the phone connected to a computer or directly to mains power. Start the process through an SSH session, so you can see what is happening at all times.
- If the screen goes blank and you can't wake it up with a double tap on the screen or a short press of the power button, do not long-press the power button or remove the battery from the phone! You would risk losing crucial data or causing a boot loop. The phone should work normally when the process is finished. No forced reboots should be needed. Absolutely do not remove the battery!
Open two terminal windows or SSH connections. In the second one you can run the dmesg command to see any errors during the operation.
As devel-su, first try to balance any block groups that have 0% or under 5% of their space in use. This should be a lot faster than a full balance. The command tells you how many chunks were relocated, but you can check the result with 'btrfs fi show' too.
btrfs balance start -dusage=0 /
btrfs balance start -dusage=5 /
Keep moving to bigger -dusage values in increments of 5 or 10 % until you see a difference. After each balance operation, check the situation with 'btrfs fi show'. It really shouldn't be necessary to go higher than 25 %. Run the full balance without the -dusage parameter only if the previous steps did not free up enough block group chunks (at least 1.5 GB).
Since both / and /home reside on the same btrfs volume, you can use 'btrfs balance start -dusage=xx /home' too. The result is the same.
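The stepping procedure above could be wrapped in a small script along these lines. This is a sketch under stated assumptions, not a tested tool: the function name and the DRY_RUN switch are made up for illustration, and the step list (0 to 25 in increments of 5) is just one reading of the advice above. With DRY_RUN left at its default of 1, the commands are only printed and nothing is balanced.

```shell
#!/bin/sh
# Hypothetical wrapper: step through rising -dusage filters, stopping
# whenever the user is satisfied with the 'btrfs fi show' output.
# DRY_RUN=1 (the default here) prints the commands without running them.
stepped_balance() {
    for pct in 0 5 10 15 20 25; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "btrfs balance start -dusage=$pct /"
            continue
        fi
        # Stop at the first failing balance (for example, out of space).
        btrfs balance start -dusage="$pct" / || return 1
        btrfs fi show
        printf 'Continue with a higher -dusage value? [y/N] '
        read -r answer
        [ "$answer" = "y" ] || return 0
    done
}

# Usage (as devel-su, over SSH):
#   stepped_balance              # dry run, prints the six commands
#   DRY_RUN=0 stepped_balance    # actually runs them, one step at a time
```

Asking before each step keeps the decision with the user, which fits the advice to check 'btrfs fi show' after every balance rather than running all steps blindly.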
When the balance operation is finished (normally in under an hour), you get the result:
[root@localhost nemo]# btrfs balance start /
Done, had to relocate 13 out of 13 chunks
Now run 'btrfs fi show' again:
[root@localhost nemo]# btrfs fi show
Label: 'sailfish' uuid: 0f8a2490-53ed-4ff6-ba34-b81df3430387
Total devices 1 FS bytes used 6.42GB
devid 1 size 13.75GB used 10.13GB path /dev/mmcblk0p28
Btrfs v0.20-rc1
As we can see here, with 6.42 GB of actual data, balancing freed up about 3.6 GB of raw space (13.75 GB - 10.13 GB) for future allocations. You should always have at least 1.5 GB of unallocated disk to spare on the volume 'devid 1': 1 GB for data and 0.5 GB for the next metadata blockgroup.
Cancelling balance operation
If the balance operation runs out of space, it ends with an error. From experience this is not destructive. You can clear more files from the disk and try again.
It has also been observed that if your device reboots during the balance, the balance is triggered to run again automatically after the reboot.
If the balance operation runs too long, you can cancel, pause or resume it with these commands:
btrfs balance cancel /
btrfs balance pause /
btrfs balance resume /
Use another SSH session to run them if you did not start the balance operation as a shell background process.
Please note that the result message from these commands appears in the original shell window where you started the balance. It may take a minute or two until the balance gets cancelled. If the command is not successful, you get an error message in the same window where you tried to cancel/pause/resume.
Notes and tips
According to Nekron's find here and here, Jolla is planning to add a systemd service and scheduler script for btrfs balance in coming updates.
To be absolutely certain of the health of the filesystem, it would be best to keep Jolla's internal sdcard as empty as possible, and to use only an external sdcard for storing large amounts of space-consuming data.
Before installing any big Jolla upgrades, it might be wise to clear up lots of free space on disk. Running out of space during the upgrade could cause a reboot loop curable only through the recovery mode and factory reset.
When clearing up space before balance, the btrfs FAQ recommends clobbering files instead of deleting them, e.g. with either:
true > /path/to/file
echo > /path/to/file
This clears up space without needing new metadata allocation. This is useful if you are running out of metadata space. Try the '-dusage=0' parameter in the balance command first, as it deletes all unused block groups.
The -dusage parameter after the start command is useful, and a lot faster, in situations where you have freed up lots of space by deleting files and just want to free up unused block groups quickly.
If you fail to clear up enough space for balance via the previous methods, just keep trying. If nothing else works, you can try Juho's method. It is risky if the phone happens to reboot during the operation.
In case you get "no space left on device" when you are running the balance, you can use the external SD card to get the extra space. Please note that if balance fails while the loop device is in use, and you cannot delete it properly, you will end up with a divided system partition and your device may not boot anymore. At this stage you cannot factory reset either, because factory reset on Jolla depends on btrfs snapshots on the now broken filesystem.
dd if=/dev/zero of=/media/sdcard/0000-BDCB/btrfs bs=100M count=5
losetup -v -f /media/sdcard/0000-BDCB/btrfs
btrfs device add /dev/loop0 /
That way the balance can get the extra space it needs. You can remove the device like this:
btrfs device delete /dev/loop0 /
It might be a good idea to use a one-liner, as suggested by lpr, to make the process a bit less prone to failure:
btrfs device add /dev/loop0 / && btrfs-balancer balance && btrfs device delete /dev/loop0 /
The 1 GB chunk allocation size was observed on a device with lots of unallocated space left, while copying a large file (4.6 GB) to the device through ssh.
It has been observed with 'btrfs fi df /' that during the balance operation the filesystem's disk usage slowly but steadily rises to a figure 1 GB higher than the original disk usage. When it reaches 1 GB, the usage drops back to the original level, only to start rising again. It is assumed that the filesystem is moving files between block groups, and that the process does not discard the old block group until the new one has been completely written. This makes balance less prone to losing data if the system shuts down, but it makes it all the more important to have enough free space before starting.
The Jolla device does have a recovery mode of its own where you can run btrfs recovery (option 5 in recovery mode). Do this if you cannot access the command line through the phone or SSH: https://together.jolla.com/question/22079/howto-all-pc-users-recover-or-reset-a-device-that-is-stuck-in-boot-loop/
Ok, now this problem happened a second time. It started again with a crashing browser, then the GUI going down a couple of times with the green led. I was able to make one copy of the 75 MB file I've used for testing, but after that the filesystem claims that /home is full. Yet the df command claims over 6 gigs of free space. I think many of the problems reported here at together.jolla.com about the phone crashing all the time, and the led blinking green, may be related to this same filesystem or mmc device problem.
Manatus ( 2014-03-02 01:27:32 +0200 )
In the linked article the problem was resolved by upgrading the kernel from 3.2.x to 3.11.x. As Jolla in 1.0.3.8 is running Linux kernel 3.4.0, does this suggest that the kernel may be too old for reliable btrfs (which is still in heavy development)?
foss4ever ( 2014-03-02 04:35:42 +0200 )
Possibly yes. But of course there is a possibility that Jolla could have backported things from newer kernels.
Being just an enthusiast and not knowing kernel development, it is hard to say what the current status of btrfs is. Discussions two years ago mentioned that the autorecovery that seems to be enabled (according to dmesg) could make things worse. But that was then, and I expect Jolla devs knew its current status when they chose btrfs for the release version.
Manatus ( 2014-03-02 13:25:52 +0200 )
Strange. I got a UI reset twice. And my disk space is the same, strange ..
And removing the rec* snapshots did not help with this ..
Kaacz ( 2014-03-03 19:52:00 +0200 )
Thanks Kaacz! What does
say?
Have you ever filled up your internal storage knowingly on Jolla?
I suspect the cause of this problem could be that btrfs tries to increase the btrfs volume size instead of writing to already existing free space. The reasons could vary, from trimming to something else. If you haven't ever filled up your device, then that figure has kept rising for reasons other than pure file size.
Manatus ( 2014-03-03 20:14:38 +0200 )