Root and home disks full and causing various problems

asked 2014-02-27 13:37:14 +0200

Manatus

updated 2016-09-07 19:12:00 +0200

SailfishOS 1.1.4.28 (Äijänpäivänjärvi) and onwards: Jolla has implemented a workaround

Jolla has added filesystem checks to the upgrade process. A balance is run when you update your phone to the latest versions of SailfishOS.

This does not fix issues that arise between update releases. It is still a good idea to keep less than 7 GB of data on the phone's own filesystem and to use an sdcard for storing large amounts of data instead. Whenever you have gone past the 7 GB limit, even temporarily, you should check the btrfs allocation level on your phone and run a balance if necessary.

The btrfs-balancer command requires developer mode.

devel-su
btrfs-balancer allocation

If the allocation (used) shows a value higher than 13153337344 bytes, you should run a balance operation. Before doing this, make sure you can connect to your phone via SSH.

btrfs-balancer balance

If you've successfully upgraded to SailfishOS 1.1.7.24 or later, this operation is unlikely to take very long, as your filesystem was balanced recently during the upgrade.
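
As a rough sketch of the check described above: the threshold of 13153337344 bytes is exactly 12.25 GiB, which works out to the 13.75 GB partition size shown later in this article minus the 1.5 GB reserve recommended there. The helper name `allocation_verdict` is made up for illustration, and the byte count is assumed to be read by hand from the output of `btrfs-balancer allocation`, since the exact output format of that command is not shown here.

```shell
# Threshold quoted above, in bytes (12.25 GiB).
THRESHOLD_BYTES=13153337344

# allocation_verdict BYTES
# Hypothetical helper: BYTES is the allocated ("used") figure, read by hand
# from 'btrfs-balancer allocation'.
allocation_verdict() {
    if [ "$1" -gt "$THRESHOLD_BYTES" ]; then
        echo "balance recommended"
    else
        echo "ok"
    fi
}

# Example: a fully allocated 13.75 GiB partition (13.75 * 1024^3 bytes).
allocation_verdict 14763950080
```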

Problems and notes:

  • Btrfs-balancer's prerequisites seem to be in order now. (Vitaminj found the problem and opened a bug report regarding the btrfs-balance service here: https://together.jolla.com/question/90031/new-btrfs-balance-service-in-114-seems-to-ignore-battery-percentage-gate/)
  • The btrfs-balance service is scheduled to run on Tuesdays at 3:00 am. Unfortunately it does not seem to work. It did in 1.1.4.28 but not in later versions, regardless of whether the phone is in sleep state or not.
  • Below are the files introduced with the 'btrfs-balancer' command. See also the rest of this article for the plain 'btrfs balance' command.

    /lib/systemd/system/btrfs-balancer.timer

    /lib/systemd/system/btrfs-balance.service

    /usr/sbin/btrfs-balancer

    /usr/share/btrfs-balancer/btrfs-balancer.conf
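
To see whether a given installation actually has these files, something like the following could be used. This is only a sketch: the function name `check_balancer_files` and its optional root-prefix argument are invented for illustration, and the paths are simply the four listed above.

```shell
# Paths listed above for the btrfs-balancer tooling.
BALANCER_FILES="/lib/systemd/system/btrfs-balancer.timer
/lib/systemd/system/btrfs-balance.service
/usr/sbin/btrfs-balancer
/usr/share/btrfs-balancer/btrfs-balancer.conf"

# check_balancer_files [ROOT]
# Hypothetical helper: prints "present" or "missing" for each path.
# ROOT is an optional prefix (leave it empty on the phone itself).
check_balancer_files() {
    root="${1:-}"
    for f in $BALANCER_FILES; do
        if [ -e "$root$f" ]; then
            echo "present $f"
        else
            echo "missing $f"
        fi
    done
}

check_balancer_files
```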


Applicable: SailfishOS 1.1.2.16 (Yliaavanlampi) or earlier

This is a report about a btrfs filesystem issue on the Jolla device's internal sdcard partition /dev/mmcblk0p28, which contains the /, swap and home mountpoints. The issue manifests itself as an inability to write to the root and home mounts and their subdirectories. This can happen even when several gigabytes of free space are left on the device: you may only want to write a 10-kilobyte change, and that fails too. With btrfs, the usual methods of checking free space on the device, such as the Sailfish OS built-in report or the du and df commands, do not apply.

There is no clear indication to the user that btrfs is unable to write the requested changes to the volume, but many of the usability problems it causes are very noticeable. Because most of the symptoms are not unique to this issue, you need to enable developer mode and look at the filesystem status and logs.

It is not known whether this is an unfortunate design feature of btrfs or a bug in the btrfs implementation of the current kernel. Internal mass memory as small as 16 GB may simply not be big enough for a filesystem as complicated as btrfs, at least not if the user saves any reasonable amount of their own data on the same volume.

If you have constant stability problems with your phone and suspect you are having this issue, it is best to try to fix it before the situation gets worse. Please note that trying to upgrade or factory reset the device instead may complicate your situation considerably. So far, neither updates nor a factory reset fix the ongoing issue on your device! There is a high probability that this problem is going to be fixed or worked around in coming Jolla updates. See the notes at the end of this article.

If you want to rule out the possibility of btrfs allocation problems on your device, the 'btrfs fi show' command is completely safe to use. See 'How to evaluate the situation' below.

Unfortunately the actual btrfs balance operation is NOT SAFE. According to Dez's answer in this article, running balance is not safe with the current kernel version of Sailfish OS.

Running a full balance without filters can cause high CPU and IO load. This may lead to restarts of Sailfish services, or to something that looks like a reboot but is only GUI related (green led, black screen). Real reboots may happen too if you are very unlucky. Reboots or power cuts during filesystem operations can cause loss of data and lead to a bricked device, recoverable only by Jolla Care with a firmware flasher. However, a reboot during balance is not an "autobrick"; btrfs recovery and balance operations may start and continue after the reboot without problems. This seems to be the most common case.

To keep the balance operation as light as possible, running it with a balance filter (parameter '-dusage') is highly recommended. With a filter, the CPU and IO load and the risks are significantly lower than with a full balance.

The btrfs balance operation requires enabling developer mode on the Jolla phone and some basic experience of working with the Linux command line. For how to get devel-su (i.e. root) access, see this wiki article.

It is vital that you have an SSH connection enabled. You cannot enable it afterwards if the graphical interface freezes, or if your screen turns and stays black because of high CPU and IO load or crashing services. In addition, it would be best to be familiar with the recovery steps using telnet in case things do not go smoothly.

Under no circumstances should you remove the battery during the balance process. You have to cancel the operation or confirm that it has finished. A previously enabled and established SSH connection is a way to do that should you no longer be able to access the terminal window locally on the Jolla phone.


General symptoms

When the user runs out of disk space, at least the following symptoms may occur:

  1. Sailfish browser may crash or fail to load pages. The back button may stop working.
  2. Messages application crashes
  3. Email application crashes
  4. Any program that wants to write on disk and fails may freeze and crash
  5. Any application database may corrupt while the writes fail
  6. Sailfish user interface freezes, crashes and restarts (green blinking led, blank screen)
  7. Flight-mode does not engage (possibly connman fails to bring interfaces down). Could affect any connection changes.
  8. Connecting Jolla to charger causes GUI crash and blinking green led
  9. Factory reset fails
  10. Serious issues after the factory reset when trying to update the phone back to current level

What happens under the bonnet

In short, the thesis is that the shortage of disk space is caused by how btrfs allocates raw space for filesystem use in 1 GB chunks. When no existing block group is suitable for writing a specific change, a new 1 GB chunk gets allocated.

When the user fills up the /home mountpoint with data, all free chunks get allocated very easily. Of the 14 GB of space available on the Jolla device, writing only ~9 GB (estimate) of data is enough to allocate the whole 14 GB. Generally speaking, the longer the device has been in use, the more likely it is that all space has already been allocated.

When the btrfs filesystem does not find a suitable block group to write the data in, and it cannot allocate a new 1 GB chunk, it gives an error that there is no space left. For a regular Jolla phone user this is visible as crashing applications, including the graphical interface of the phone itself.

The chunk allocation size for filesystem metadata is smaller, 256 MB, but on the Jolla device metadata is duplicated, so each allocation requires 512 MB.

Note that data and metadata can each run out of space separately, yet the error and the symptoms are the same. They also allocate block groups from the same unallocated pool on the internal sdcard of the Jolla device. A fully allocated device does not have enough space to allocate a new metadata block group. This is a more difficult situation than with data block groups alone: you may not be able to delete files, as there is no space to record this change in the metadata, and the filesystem refuses the operation.

According to the answer linked below, posted in November 2013, the btrfs filesystem should have at least 1.5 GB of unallocated space. That would be 1 GB for data and 256+256 MB for metadata.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/30066

(Original question http://comments.gmane.org/gmane.comp.file-systems.btrfs/30047)

For more information about btrfs, including space problems, see:

http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

https://btrfs.wiki.kernel.org/index.php/FAQ

https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

https://btrfs.wiki.kernel.org/index.php/FAQ#Aaargh.21_My_filesystem_is_full.2C_and_I.27ve_put_almost_nothing_into_it.21

https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F


How to evaluate the situation

Useful commands to evaluate the situation are (run as devel-su):

btrfs fi show
btrfs fi df /
btrfs sub list /

Check also these for errors:

journalctl
dmesg | less
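
Both logs can be long, so it may help to narrow them down to lines that mention btrfs. A minimal sketch, where the function name `btrfs_lines` is made up for illustration and the filter is nothing more than a case-insensitive grep:

```shell
# btrfs_lines
# Hypothetical helper: keeps only the lines that mention btrfs
# (case-insensitive) from whatever is piped in.
btrfs_lines() {
    grep -i 'btrfs'
}

# Usage on the phone, as devel-su:
#   dmesg | btrfs_lines
#   journalctl | btrfs_lines | less
```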

When the 'btrfs fi show' command shows 13.75GB used of 13.75GB for 'devid 1' (see example below), new chunks cannot be allocated anymore and the random problems begin. Depending on what is written, when and where, a write either succeeds or fails.

[root@localhost nemo]# btrfs fi show
Label: 'sailfish'  uuid: 0f8a2490-53ed-4ff6-ba34-b81df3430387
    Total devices 1 FS bytes used 6.42GB
    devid    1 size 13.75GB used 13.75GB path /dev/mmcblk0p28
Btrfs v0.20-rc1

At this point you should also try copying ~500 MB of files under your home directory mountpoint with the 'cp' command. If the copying fails with disk space errors, you have encountered a situation where btrfs thinks it cannot use any existing block group and tries to allocate a new one, no matter what.
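
The figure that matters in the 'btrfs fi show' output is the difference between 'size' and 'used' on the devid line, i.e. the space not yet allocated to any chunk. The sketch below extracts it; the function name `unallocated_gb` is invented for illustration, and it assumes the output format shown in the example above, with both figures in the same unit.

```shell
# unallocated_gb
# Hypothetical helper: reads 'btrfs fi show' output on stdin and prints
# size minus used for each devid line. Assumes both figures use the same
# unit, as in the example above.
unallocated_gb() {
    LC_ALL=C awk '$1 == "devid" {
        size = $4; used = $6
        sub(/[A-Za-z]+$/, "", size)   # strip the unit suffix, e.g. "GB"
        sub(/[A-Za-z]+$/, "", used)
        printf "%.2f\n", size - used
    }'
}

# Usage on the phone, as devel-su:
#   btrfs fi show | unallocated_gb
```

With the example output above this prints 0.00, meaning that no further chunk can be allocated.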


Measures to free up raw space

Deleting big files helps free up space both in the visible filesystem and within the allocated space. However, it does not shrink the allocated space automatically. When you run out of space, it is recommended to delete several gigabytes of data to make sure that the filesystem has enough space to "breathe".

The method for freeing up space reserved by unused or sparsely used block groups is called 'balancing'. It resembles a filesystem 'defrag' operation, but instead of just speeding up the filesystem by regrouping fragmented files, balance also frees up space. Classic filesystems do not suffer from space usage problems to the same extent as btrfs.

The balance operation can consume more space than you have available. If you do not manually free up enough space in existing block groups, you may run out of space during the balancing operation described below.

The btrfs balance command manual can be found here:

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance

Before running the balance operation

  1. Take backups of everything you can't afford to lose.

  2. Close all unnecessary applications. Ensure that network connections work.

  3. Remove any excess files you may have copied onto the device earlier. Try to free up preferably several gigabytes of space in the filesystem.

    • Look out especially for directories with very large numbers of files. It has been observed that these can prolong the balance operation and cause it to run out of disk space (it stops, which is not harmful). For instance, Android's Tripadvisor can cache tens of thousands of files under its data directory, and the btrfs balance operation has a hard time going through those with IO rates as bad as sdcards have. Simply uninstalling Android programs does not help; you have to find and delete the files manually.

    • If you are unable to delete files, you may be in a chicken-and-egg situation of running out of metadata space. You can free up metadata space in an already used block group by overwriting the data in files you do not need.

      echo > /path/to/yourfile

      After this you can try normal deleting again.

    • If you have previously done a factory reset on your device, you may have old snapshots of the filesystem taking up lots of space. To get rid of these, follow the instructions in this case:

      https://together.jolla.com/question/14633/bug-factory-reset-no-storage-left/

  4. The balance operation is heavy on the CPU and MMC, and your phone may become unresponsive during the process. Since it can take from 10 minutes to hours, and even days(!), it is best to have the phone connected to a computer or directly to mains power. Start the process directly through an SSH session, so you can see what is happening all the time.

    • If the screen goes blank and you can't wake it up with a double tap on the screen or a short press of the power button, do not long-press the power button or remove the battery from the phone! You would risk losing crucial data or causing a boot loop. The phone should work normally when the process is finished. No forced reboots should be needed. Absolutely do not remove the battery!

  5. Open two terminal windows or SSH connections. In the second one you can run the dmesg command to see any errors during the operation.

  6. As devel-su, first try to balance any block groups that have 0% or under 5% of their space in use. This should be a much faster operation than a full balance. The command tells you how many chunks were relocated, but you can check the result with 'btrfs fi show' too.

    btrfs balance start -dusage=0 /

    btrfs balance start -dusage=5 /

Keep increasing the -dusage value in increments of 5 or 10 % until you see a difference. After each balance operation, check the situation with 'btrfs fi show'. It should not really be necessary to go higher than 25 %. Run a full balance without the -dusage parameter only if the previous steps did not free up enough block group chunks (at least 1.5 GB).

Since both / and /home reside on the same btrfs volume, you can use 'btrfs balance start -dusage=xx /home' too. The result is the same.
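
The stepwise approach described above could be laid out roughly like this. The sketch deliberately only prints the commands instead of running them, given the risks described earlier; the function name `balance_steps` and its arguments are invented for illustration, while the printed command is the one used in this article.

```shell
# balance_steps [MAX_USAGE] [STEP] [MOUNTPOINT]
# Hypothetical helper: only PRINTS the filtered balance commands in the
# order suggested above (0 first, then STEP, 2*STEP, ... up to MAX_USAGE).
# It does not run anything. Run the printed commands one at a time as
# devel-su and check 'btrfs fi show' after each one.
balance_steps() {
    max="${1:-25}"
    step="${2:-5}"
    mnt="${3:-/}"
    [ "$step" -gt 0 ] || return 1    # avoid an endless loop
    usage=0
    while [ "$usage" -le "$max" ]; do
        echo "btrfs balance start -dusage=$usage $mnt"
        usage=$(( usage + step ))
    done
}

balance_steps 25 5 /
```

Stop working through the list as soon as 'btrfs fi show' reports enough unallocated space.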

When the balance operation is finished (normally in under an hour), you get the result:

[root@localhost nemo]# btrfs balance start /
Done, had to relocate 13 out of 13 chunks

Now run 'btrfs fi show' again:

[root@localhost nemo]# btrfs fi show
Label: 'sailfish'  uuid: 0f8a2490-53ed-4ff6-ba34-b81df3430387
Total devices 1 FS bytes used 6.42GB
devid    1 size 13.75GB used 10.13GB path /dev/mmcblk0p28
Btrfs v0.20-rc1

As we can see here, with 6.42 GB of actual data, balancing can clear up about 3.6 GB of raw space for future allocations. You should always have at least 1.5 GB of unallocated space to spare on the volume 'devid 1': 1 GB for data and 0.5 GB for the next metadata block group.
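
The 1.5 GB rule can be checked against the two 'btrfs fi show' outputs in this article with a little arithmetic. This is only a sketch: the function name `reserve_verdict` is made up, and the size and used figures are assumed to be copied by hand from the devid line.

```shell
# reserve_verdict SIZE_GB USED_GB [RESERVE_GB]
# Hypothetical helper: compares the unallocated space (SIZE_GB - USED_GB,
# from the devid line of 'btrfs fi show') with the reserve recommended
# above (1.5 GB by default).
reserve_verdict() {
    LC_ALL=C awk -v size="$1" -v used="$2" -v reserve="${3:-1.5}" 'BEGIN {
        free = size - used
        if (free >= reserve)
            printf "ok: %.2f GB unallocated\n", free
        else
            printf "low: %.2f GB unallocated\n", free
    }'
}

reserve_verdict 13.75 13.75   # figures from before the balance
reserve_verdict 13.75 10.13   # figures from after the balance
```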


Cancelling balance operation

If the balance operation runs out of space, it ends with an error. From experience, this is not destructive. You can clear more files from the disk and try again.

It has also been observed that if your device reboots during the balance, the balance is triggered to run again automatically after the reboot.

If the balance operation runs too long, you can cancel, pause or resume it with these commands:

btrfs balance cancel /
btrfs balance pause /
btrfs balance resume /

Use another SSH session to run them if you did not start the balance operation as a shell background process.

Please note that the success message from these commands appears in the original shell window where you started the balance. It may take a minute or two until the balance gets cancelled. If the command is not successful, you get an error message in the same window where you tried to cancel, pause or resume.


Notes and tips

  • According to Nekron's find here and here, Jolla is planning to add a systemd service and scheduler script for btrfs balance in coming updates.

  • To be absolutely certain of the health of the filesystem, it would be best to keep Jolla's internal sdcard as empty as possible, and use only external sdcard for storing lots of space consuming data.

  • Before installing any big Jolla upgrades, it might be wise to clear up lots of free space on disk. Running out of space during the upgrade could cause a reboot loop curable only through the recovery mode and factory reset.

  • When clearing up space before balance, the btrfs FAQ recommends clobbering files instead of deleting them, e.g. with either:

    true > /path/to/file

    echo > /path/to/file

    This clears up space without causing need for new metadata allocation. This is useful if you are running out of metadata space. Try '-dusage=0' parameter in balance command first, as it deletes all unused block groups.

  • The -dusage parameter after the start command is useful and a lot faster in situations where you have freed up lots of space by deleting files, and just want to free up unused block groups fast.

  • If you fail to clear up enough space for balance with the previous methods, just keep trying. If nothing else works, you can try Juho's method. It is risky if the phone happens to reboot during the operation.

In case you get "no space left on device" when you are running the balance, you can use the external SD card to get the extra space. Please note that if balance fails while the loop device is in use, and you cannot delete it properly, you will end up with a divided system partition and your device may not boot anymore. At this stage you cannot factory reset either, because factory reset on Jolla depends on btrfs snapshots on the now broken filesystem.

dd if=/dev/zero of=/media/sdcard/0000-BDCB/btrfs bs=100M count=5
losetup -v -f /media/sdcard/0000-BDCB/btrfs
btrfs device add /dev/loop0 /

That way the balance can get the extra space it needs. You can remove the device like this:

btrfs device delete /dev/loop0 /

It might be a good idea to use a one-liner, as suggested by lpr, to make the process a bit less prone to failure:

btrfs device add /dev/loop0 / && btrfs-balancer balance && btrfs device delete /dev/loop0 /

  • The 1 GB chunk allocation size was observed on a device with lots of unallocated space left, while copying a large file (4.6 GB) onto the device through ssh.

  • It has been observed with 'btrfs fi df /' that during the balance operation the filesystem's disk usage slowly but steadily rises to a figure 1 GB higher than the original disk usage. When it reaches 1 GB, the usage drops to the original level, only to start rising again. It is assumed that the filesystem is moving files between block groups, and that the process does not discard the old block group until the new block group has been completely written. This makes balance less prone to losing data if the system shuts down, but makes it all the more important to have enough free space before starting it.

  • Jolla device does have a recovery mode of its own where you can run Btrfs recovery (option 5 in recovery mode). Do this if you cannot access command line through the phone or SSH: https://together.jolla.com/question/22079/howto-all-pc-users-recover-or-reset-a-device-that-is-stuck-in-boot-loop/
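
The file-clobbering tip from the notes above can be tried safely on a scratch file first. A minimal sketch, with the helper name `clobber_file` invented for illustration. Note that 'true > file' (or ': > file') leaves a zero-byte file, while 'echo > file' leaves a file containing a single newline.

```shell
# clobber_file PATH
# Hypothetical helper: truncates PATH to zero bytes without deleting it,
# the same effect as 'true > /path/to/file' above.
clobber_file() {
    : > "$1"
}

# Demonstration on a scratch file, not on real data:
scratch=$(mktemp)
echo "some data" > "$scratch"
clobber_file "$scratch"
[ -s "$scratch" ] || echo "file is now empty"
rm -f "$scratch"
```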


Comments


Ok, now this problem has happened a second time. It started again with a crashing browser, then the GUI went down a couple of times with the green led. I was able to make one copy of the 75 MB file I've used for testing, but after that the filesystem claims that /home is full. Yet the df command claims over 6 gigs of free space. I think many of the problems reported here at together.jolla.com about the phone crashing all the time, and the led blinking green, may be related to this same filesystem or mmc device problem.

Manatus ( 2014-03-02 01:27:32 +0200 )

In the linked article the problem was resolved by upgrading the kernel from 3.2.x to 3.11.x. As Jolla 1.0.3.8 is running Linux kernel 3.4.0, does this suggest that the kernel is maybe too old for reliable btrfs (which is still in heavy development)?

foss4ever ( 2014-03-02 04:35:42 +0200 )

Possibly yes. But of course there is a possibility that Jolla could have backported things from newer kernels.

Being just an enthusiast and not knowing kernel development, it is hard for me to say what the current status of btrfs is. Discussions two years ago mentioned that the autorecovery that seems to be enabled (according to dmesg) could make things worse. But that was then, and I expect the Jolla devs knew its current status when they chose btrfs for the release version.

Manatus ( 2014-03-02 13:25:52 +0200 )

Strange. I got a UI reset twice. And my disk space looks strange in the same way:

# btrfs filesystem show 
Label: 'sailfish'  uuid: 0f8a2490-53ed-4ff6-ba34-b81df3430387
        Total devices 1 FS bytes used 3.92GB
        devid    1 size 13.75GB used 13.75GB path /dev/mmcblk0p28
# df -h | grep /dev/mmcblk0p28
/dev/mmcblk0p28        14G  4.1G  9.3G  31% /
/dev/mmcblk0p28        14G  4.1G  9.3G  31% /swap
/dev/mmcblk0p28        14G  4.1G  9.3G  31% /home

And removing the rec* snapshots did not help with this.

Kaacz ( 2014-03-03 19:52:00 +0200 )

Thanks Kaacz! What does

btrfs fi df /

say?

Have you ever knowingly filled up the internal storage on your Jolla?

I'm suspecting the cause for this problem could be that btrfs tries to increase the btrfs volume size instead of writing on already existing free space. The reasons could vary, starting from trimming to something else. If you haven't ever filled up your device, then that figure has kept rising for other reasons than pure file size.

Manatus ( 2014-03-03 20:14:38 +0200 )

11 Answers


answered 2015-04-29 10:22:28 +0200

cy8aer

updated 2015-04-29 10:45:32 +0200

Please forgive me for this answer (it is not really an answer) and possibly a "rant". The information in this btrfs balance thread is important for the end user when shipping 1.1.4:

I myself was surprised that my battery suddenly and drastically drained overnight with 1.1.4.28 and later. When you look at the recent posts about battery problems, you may find that they become more frequent around Thursday/Wednesday, and it does not really look like dirty contacts every time.

I am running production machines with btrfs and I know that it is necessary to run btrfs balance frequently (otherwise you may hit the remount-read-only problem when the disk fills up). But I also know that this needs CPU power, and that drains the battery on a battery-powered device.

With the update of this post:

Btrfs-balance command is scheduled to run in Tuesdays 3:00am as a service. For now this requires that the phone is connected to a charger, as the timer service does not get executed when the phone is in deep sleep state.

it is clear why my battery drains. Does it really not balance without a charger? Are there states where the device is out of deep sleep at this time? So for publishing 1.1.4.x the following is needed:

  • a warning in the changelog/release notes of the final version that there may be battery drain when the device does not sleep
  • possibly, in a later release, a UI for the end user to set the time of balancing and/or temporarily suppress it

Sometimes you need your device to last through Monday/Thursday night without charging, and after heavy data use (which causes metadata changes) you would be unhappy the next morning.

Update (forgot the rant ;-)): I hate devices/software that change their normal behaviour without giving the user any control (who looks into systemd confs after an update?) or any information about it.


Comments


Are there states where the device is out of deep sleep at this time? Settings > System > Display: "Keep display on while charging".

So for publishing 1.1.4.x it is needed... Thanks for the hint. Release notes to be improved.

jovirkku ( 2015-04-29 10:46:06 +0200 )

"does it really not balance without charger?" - No, it might also run without a charger, as long as the device happens to be awake. The charger is only needed to make sure the systemd timer unit gets run, since systemd cannot wake up the device (in that version at least).

Jolly-Jo ( 2015-04-29 10:55:31 +0200 )

During critical operations (including OS updates, filesystem balance, device reset) it is recommended to keep the phone connected to a charger. This is the best way to avoid sudden shutdowns.

jovirkku ( 2015-06-05 11:53:40 +0200 )