
file corruption caused by flash memory or btrfs?

asked 2015-07-14 10:52:54 +0200

ugeuder

updated 2015-07-15 14:34:26 +0200

I am a bit astonished that the search does not turn up any match for "btrfs scrub".

After using my device for less than 6 months with pretty modest data volumes (not a single piece of music, just some pictures and videos from the device camera), my Jolla got pretty slow compared to when it was new.

So I decided to run "btrfs balance", as mentioned repeatedly in this forum. It never ran to completion; at first it reported being out of space (although several GB were free). So I copied a couple of bigger videos off the device using scp. scp already complained that 2 of the videos could not be read because of I/O errors. (I deleted only those that could be copied successfully.) Further "btrfs balance" attempts got a bit further, but eventually I got stuck with I/O errors.

So I found the "btrfs scrub" command. It also seems to stop at the first uncorrectable I/O error. I found 2 videos that could not be read by scp, and their inodes were also reported by scrub. I found out that ffmpeg happily copies them up to the first error (the Jolla player also plays that part), so I could at least salvage the first parts of the videos (actually more than 80%). After deleting those offenders, both "btrfs scrub" and "btrfs balance" completed without further complaints.
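
For files that are not videos, a generic way to salvage the readable parts is dd with error skipping. This is not what the post did (it used ffmpeg); dd is swapped in here as a sketch, and the function name salvage_copy is hypothetical. Blocks that cannot be read are replaced by zeros, and the last partial block is padded with zeros, so the copy can be slightly longer than the original.

```shell
# Copy a damaged file block by block, continuing past read errors.
# $1 = damaged source file, $2 = destination (ideally on another filesystem).
# conv=noerror keeps going after an I/O error; conv=sync pads every short or
# failed 4096-byte block with zeros so later data stays at its original offset.
salvage_copy() {
    dd if="$1" of="$2" bs=4096 conv=noerror,sync 2>/dev/null
}

# Example (hypothetical paths):
#   salvage_copy /home/nemo/Videos/broken.mp4 /media/sdcard/broken.mp4
```

dd returns a non-zero status if it had to skip anything, which is a simple way to tell a complete copy from a partial one.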

(I'm really astonished that "btrfs scrub" is so much faster than "btrfs balance".)
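
The order that worked above (scrub first, balance only once scrub is clean) can be sketched as a small wrapper. This is an assumption-laden sketch, not what the automated Jolla maintenance job does: the function name and the BTRFS variable are hypothetical, the -dusage=50 filter is just an example value, and the btrfs-progs version on the device may not support every option.

```shell
# Allow the btrfs binary to be overridden (handy for a dry run with "echo").
BTRFS=${BTRFS:-btrfs}

# Run a foreground scrub on the mount point in $1 and start a balance only
# if the scrub finished without reporting a failure.
scrub_then_balance() {
    # -B keeps scrub in the foreground and prints statistics at the end;
    # a non-zero exit status means it could not run or found errors it
    # could not correct.
    "$BTRFS" scrub start -B "$1" || return 1
    # Rewrite only data chunks that are at most 50% full (example value).
    "$BTRFS" balance start -dusage=50 "$1"
}

# Example, as root on the device:
#   scrub_then_balance /
```

Setting BTRFS=echo prints the commands instead of running them.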

That was about 3 weeks ago. Yesterday I again had a corrupted file [1]. I am not sure whether the corruption happened on the network or on the storage side. Anyway, I ran "btrfs scrub" again and it reported one

WARNING: errors detected during scrubbing, corrected.

I could not find an inode or any other info from dmesg this time.

So long story, shorter questions:

Has anybody else seen file corruption? Any opinions on whether these are HW problems (flash memory) or software problems (btrfs in Jolla's somewhat dated kernel probably still has quite a few bugs)? Of course flash has a limited lifetime, but 6 months with very moderate usage sounds a bit too short.


[1] currently the last comment on the original post at https://together.jolla.com/question/30926/howto-install-google-play/ (I cannot find out how to create direct links to comments, and "13 hours ago" is not a good reference either...)


Addition 15-Jul-2015:

More of them. This is what they look like in dmesg:

[45175.680384] btrfs csum failed ino 135610 off 63725568 csum 1759428533 private 68200767
[45175.682734] btrfs csum failed ino 135610 off 63725568 csum 1759428533 private 68200767
[45178.607440] btrfs csum failed ino 111540 off 241664 csum 3887731566 private 2339636708
[45178.609332] btrfs csum failed ino 111540 off 241664 csum 3887731566 private 2339636708
[45180.916428] btrfs csum failed ino 135565 off 48885760 csum 3301422542 private 2819203908
[45180.918565] btrfs csum failed ino 135565 off 48885760 csum 3301422542 private 2819203908
[45184.222120] btrfs csum failed ino 111523 off 90112 csum 253123999 private 1675119381
[45184.224501] btrfs csum failed ino 111523 off 90112 csum 253123999 private 1675119381
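
Each message is logged twice, so a list of the distinct inode numbers is easier to work with. A minimal sketch, assuming the message format shown above; the function name is hypothetical.

```shell
# Read kernel log text on stdin and print every inode number mentioned in a
# "btrfs csum failed" message, once each, in ascending order.
csum_failed_inodes() {
    sed -n 's/.*btrfs csum failed ino \([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# Example:
#   dmesg | csum_failed_inodes
# Each number can then be mapped to a file name, for example with
#   btrfs inspect-internal inode-resolve 135610 /home
# (the mount point /home is an assumption, and the btrfs-progs build on the
# device may be too old to provide this subcommand).
```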

I don't have time to debug that now; I want to leave for my holidays...


Comments


Have you raised this with Jolla care so they can assist you in investigating? Did you allow the automated process to complete (Tuesday mornings if I remember right)?

Whilst doing it yourself is possible, if there is a hardware issue with your device it may be worth getting it done under a ticket so that the results can be analysed on completion!

timearp ( 2015-07-14 11:14:37 +0200 )

A corrupted download can have various causes. The device running full is only one of them, although it is a likely one, especially with alien-dalvik (Android), because those apps and the whole Android system leak into Sailfish RAM and internal storage. I do not have much installed and I keep Android's /data/media on the uSD, which is actually how I noticed it: my uSD has been running at its limit ever since I started sharing its space with Android. If you have that on internal storage, you will frequently run into trouble. Android needs its own subvolume with a quota to prevent it from caching the internet on your device!

chemist ( 2015-07-14 12:55:57 +0200 )

@timearp I have seen this automated balance(?) mentioned before. I haven't investigated how it works and I have certainly not touched it. So I assume it completed as long as that was still possible. However, once the out-of-space and I/O errors started, I would assume that the automatic process stops at the same point as my manual runs. I am not sure whether there is any way to find out after the device has been rebooted.

ugeuder ( 2015-07-14 17:20:40 +0200 )

@chemist Yeah, I can imagine that alien-dalvik leaks. After having seen its logs from one night (when I was asleep and did not touch the device), I usually keep it stopped (or even masked). I'm quite sure it is a battery killer at least. I start it only on the rare occasions when I want to use some Android app.

ugeuder ( 2015-07-14 17:24:57 +0200 )

1 Answer


answered 2015-12-12 23:36:28 +0200

piezpai

updated 2015-12-12 23:43:35 +0200

After upgrading to version SailfishOS 1.1.9.30 (Eineheminlampi) (armv7hl) the device rebooted rather often without obvious reason. I did not investigate further, but after several days, it did not boot up any more. It stopped with the sailfish logo.

Recovery mode (remove battery, press volume down ..., enter bash) showed an unmountable root volume:

mkdir /mymmc/
mount /dev/mmcblk0p28 /mymmc/

This did not work; dmesg reported btrfs errors. I copied the volume to my computer to investigate further:

# copy filesystem from Jolla to SD-card
# start Jolla in rescue mode
# - Take battery out
# - Press volume down button, insert battery. Still holding volume down press power button until the device vibrates.
# - Connect to PC via USB, log into recovery menu:
telnet 10.42.66.66
# Option "4" starts a shell
# Copy root filesystem volume
mkdir /mysd
mount /dev/mmcblk1p1 /mysd
dd if=/dev/mmcblk0p28 of=/mysd/mmcblk0p28.img
# wait (this will create a 14761836032 byte image (~14GB))
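
Before working on the image it can be worth checking that the copy matches the source byte for byte. A minimal sketch, assuming cmp is available in the recovery shell; the function name is hypothetical. On a device with failing flash, a mismatch may also mean that two reads of the same block returned different data.

```shell
# Compare a source device (or file) with its copy, byte for byte.
# $1 = source, $2 = copy. Returns 0 only if both are readable and identical.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "copy is identical"
    else
        echo "copy differs or could not be read" >&2
        return 1
    fi
}

# Example, using the paths from the steps above:
#   verify_copy /dev/mmcblk0p28 /mysd/mmcblk0p28.img
```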

This image was unmountable on PC, until I cleared the transaction log:

mkdir rootfs/
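# try to clear the log (newer btrfs-progs: btrfs rescue zero-log mmcblk0p28.img)
btrfs-zero-log mmcblk0p28.img
# try to mount read-only, with recovery options and without resuming a balance
mount -t btrfs -o loop,ro,recovery,skip_balance mmcblk0p28.img rootfs/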

As the image looked rather damaged, I copied all data to create a new filesystem (format).

# extract data from the image
mkdir btrfs_backup/
rsync --archive --xattrs --acls --hard-links rootfs/ btrfs_backup/

#############

# create filesystem without extra features
losetup /dev/loop0 mmcblk0p28.img
mkfs.btrfs -O ^extref,^skinny-metadata -f /dev/loop0
losetup -d /dev/loop0

# mount the new filesystem
mount -o loop mmcblk0p28.img rootfs/

# create subvolumes
cd rootfs/
btrfs subvolume create factory-@
btrfs subvolume create factory-@home
btrfs subvolume create @
btrfs subvolume create @home

# set default subvolume to "@" (use the ID that "btrfs subvolume list ." reports for "@"; 260 in the listing below)
btrfs subvolume set-default 260 .

# it should look like this
btrfs subvolume list .
ID 258 gen 22 top level 5 path factory-@
ID 259 gen 22 top level 5 path factory-@home
ID 260 gen 1253 top level 5 path @
ID 261 gen 1251 top level 5 path @home

# copy data into it (same rsync options as above)
rsync -aHAX btrfs_backup/ rootfs/
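
Subvolume IDs depend on the order of creation, so instead of typing the ID for set-default by hand it can be read from the listing. A minimal sketch, assuming the "ID ... path NAME" output format shown above; the function name is hypothetical.

```shell
# Read "btrfs subvolume list" output on stdin and print the ID of the
# subvolume whose path (last column) equals $1.
subvol_id() {
    awk -v path="$1" '$1 == "ID" && $NF == path { print $2 }'
}

# Example, run inside the mounted filesystem:
#   btrfs subvolume set-default "$(btrfs subvolume list . | subvol_id @)" .
```

The exact match on the last column keeps "@" from also matching "@home" or "factory-@".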

Finally, the fstab has to be adapted as it contains the UUID, which changed due to formatting.

UUID=45e2b400-9634-4a7f-8fa2-5575f2368def  /  btrfs  defaults,autodefrag,noatime 0 0
UUID=45e2b400-9634-4a7f-8fa2-5575f2368def  /home  btrfs  noatime,subvol=@home 0 0
devpts     /dev/pts  devpts  gid=5,mode=620   0 0
tmpfs      /dev/shm  tmpfs   defaults         0 0
proc       /proc     proc    defaults         0 0
sysfs      /sys      sysfs   defaults         0 0

The real UUID of the newly formatted volume can be read with blkid. This way my Jolla is currently running again, although it produces csum errors from time to time.
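
The fstab update can also be scripted. A minimal sketch, assuming the fstab layout shown above (UUID= entries of type btrfs); the function name is hypothetical, and the paths in the example depend on where the image is mounted.

```shell
# Replace the UUID of every btrfs entry in an fstab file.
# $1 = path to the fstab file, $2 = new UUID. Other lines are left untouched.
set_fstab_uuid() {
    fstab=$1
    new_uuid=$2
    sed "s/^UUID=[^[:space:]]*\([[:space:]].*[[:space:]]btrfs[[:space:]]\)/UUID=$new_uuid\1/" \
        "$fstab" > "$fstab.new" &&
        cat "$fstab.new" > "$fstab" &&   # overwrite in place, keeping owner and mode
        rm -f "$fstab.new"
}

# Example on the PC, with the new filesystem's default subvolume mounted at rootfs/:
#   set_fstab_uuid rootfs/etc/fstab "$(blkid -s UUID -o value mmcblk0p28.img)"
```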

In short, yes

As the new, freshly formatted volume showed csum errors even after writing the image to the phone, I currently believe that it is a hardware issue.

The volume ran clean through btrfs scrub rootfs/ on the PC, but after it was copied to the device in recovery mode, a btrfs scrub mymmc/ from inside recovery mode showed csum errors.


Comments


Shouldn't eMMCs replace bad blocks with unused ones? How about reducing the size of the volume to give it a chance to do that? (Just throwing out some ideas; no clue what could work in the end, though.)

chemist ( 2015-12-14 12:58:08 +0200 )

I know this from (spinning) hard disks: the controller replaces a defective sector with a spare sector from outside the usual addressable range. This requires that there are spare sectors. Reducing the size of the space used by the filesystem inside the partition will not help, as the controller has no knowledge of which space is in use and which will never be used. One exception is the TRIM command: it tells the controller which space inside a file system is unused. Reducing the size of the volume still does not affect anything.

piezpai ( 2015-12-14 21:22:18 +0200 )

@piezpai That is non-spinning storage only; HDDs (spinning) just mark a block as bad after a failed write, and the number of usable blocks just degrades over time. The TRIM command is already in place on Jolla1 (mount options), so I am wondering what is going sideways here.

chemist ( 2015-12-14 21:29:30 +0200 )