We have moved to a new Sailfish OS Forum. Please start new discussions there.

Revision history [back]

click to hide/show revision 1
initial version

posted 2015-06-17 02:02:40 +0200

Data loss during IO in deep sleep

Preface

I have created a script for regular checking for updates, downloading of zip archive with update and uncompressing it's contents to ~/android_storage/XXX. It worked perfectly during manual testing, but when I finally configured it with timed to run without manual interaction I found that most of the time the script produces corrupted file. WHAT?!

Main point

Long story short: after excessive/exhaustive debugging I found, that writing large amounts of data (~100MB) while Jolla is in sleep mode are likely to fail. The result may vary: you may get truncated file, no file at all or in my case I got file 64 KiB shorter than proper result with 64 KiB chunk missing somewhere around the middle of the file. That was really "WTF?!?!" experience.

IO in ~/android_storage is much more likely to fail than IO without fuse intermediate layer. Most of the time I've noticed write failures, but once I got read failure. Looks like system load (read/write proportion, CPU usage) affects somehow, but I found sequence that always reproduces this issue on my device.

For me this looks like a bug. Operating system might postpone actions when going to sleep, OS might consider long running apps as hanging and ask them (politely for the first time!) to stop, but reporting file system error is a bug. Entering sleep mode should be imperceptible for processes.

Guide to reproduce

Download sample zip file (~100MB). curl -o /home/nemo/android_storage/test.obf.zip -L 'http://download.osmand.net/download.php?standard=yes&file=Russia_moskovskaya-oblast_asia_2.obf.zip'

Save following script as /home/nemo/android_storage/test_unpack.sh

#!/bin/sh

DIR=$(dirname "$0")
ARCHIVE='test.obf.zip'
OUTFILE="$DIR/archive_unzipped_$(date +%T)"

exec >> "$DIR/test_unpack.log"
exec 2>> "$DIR/test_unpack.log"

unzip -p "$DIR/$ARCHIVE" | tee "$OUTFILE" |md5sum
ls -l "$OUTFILE"

echo ====

Schedule it execution for several times: for t in {300..1800..300};do timedclient-qt5 -a'whenDue;runCommand=/bin/sh /home/nemo/android_storage/test_unpack.sh' -e"APPLICATION=test_unpack;ticker=$t"; done

Make sure your device is not connected to charger or USB, switch off screen, put it aside and don't touch for 40 minutes.

When the time passes check ~/android_storage/test_unpack.logHere is my sample output. Only one uncompress out of six was successful!

Workaround and afterword

Now as a workaround I have scheduled the script to run as media_rw user and added several uncompression attempts to the script. I hope it would be robust now. I have neither skills nor courage to investigate the issue further and hope someone will pick it. Well, at least I would be glad if someone tries to follow reproduce guide and check if this issue happens on his device as well.

Data loss during IO in deep sleep

Preface

I have created a script for regular checking for updates, downloading of zip archive with update and uncompressing it's contents to ~/android_storage/XXX. It worked perfectly during manual testing, but when I finally configured it with timed to run without manual interaction I found that most of the time the script produces corrupted file. WHAT?!

Main point

Long story short: after excessive/exhaustive debugging I found, that writing large amounts of data (~100MB) while Jolla is in sleep mode are likely to fail. The result may vary: you may get truncated file, no file at all or in my case I got file 64 KiB shorter than proper result with 64 KiB chunk missing somewhere around the middle of the file. That was really "WTF?!?!" experience.

IO in ~/android_storage is much more likely to fail than IO without fuse intermediate layer. Most of the time I've noticed write failures, but once I got read failure. Looks like system load (read/write proportion, CPU usage) affects somehow, but I found sequence that always reproduces this issue on my device.

For me this looks like a bug. Operating system might postpone actions when going to sleep, OS might consider long running apps as hanging and ask them (politely for the first time!) to stop, but reporting file system error is a bug. Entering sleep mode should be imperceptible for processes.

Guide to reproduce

Download sample zip file (~100MB). curl -o /home/nemo/android_storage/test.obf.zip -L 'http://download.osmand.net/download.php?standard=yes&file=Russia_moskovskaya-oblast_asia_2.obf.zip'

Save following script as /home/nemo/android_storage/test_unpack.sh

#!/bin/sh

DIR=$(dirname "$0")
ARCHIVE='test.obf.zip'
OUTFILE="$DIR/archive_unzipped_$(date +%T)"

exec >> "$DIR/test_unpack.log"
exec 2>> "$DIR/test_unpack.log"

unzip -p "$DIR/$ARCHIVE" | tee "$OUTFILE" |md5sum
ls -l "$OUTFILE"

echo ====

Schedule it execution for several times: for t in {300..1800..300};do timedclient-qt5 -a'whenDue;runCommand=/bin/sh /home/nemo/android_storage/test_unpack.sh' -e"APPLICATION=test_unpack;ticker=$t"; done

Make sure your device is not connected to charger or USB, switch off screen, put it aside and don't touch for 40 minutes.

When the time passes check ~/android_storage/test_unpack.logHere is my sample output. Only one uncompress out of six was successful!

Workaround and afterword

Now as a workaround I have scheduled the script to run as media_rw user to access /opt/alien/data/media/ directly and added several uncompression attempts to the script. I hope it would be robust now. I have neither skills nor courage to investigate the issue further and hope someone will pick it. Well, at least I would be glad if someone tries to follow reproduce guide and check if this issue happens on his device as well.

Data loss during IO in deep sleep

Preface

I have created a script for regular checking for updates, downloading of zip archive with update and uncompressing it's contents to ~/android_storage/XXX. It worked perfectly during manual testing, but when I finally configured it with timed to run without manual interaction I found that most of the time the script produces corrupted file. WHAT?!

Main point

Long story short: after excessive/exhaustive debugging I found, that writing large amounts of data (~100MB) while Jolla is in sleep mode are likely to fail. The result may vary: you may get truncated file, no file at all or in my case I got file 64 KiB shorter than proper result with 64 KiB chunk missing somewhere around the middle of the file. That was really "WTF?!?!" experience.

IO in ~/android_storage is much more likely to fail than IO without fuse intermediate layer. Most of the time I've noticed write failures, but once I got read failure. Looks like system load (read/write proportion, CPU usage) affects somehow, but I found sequence that always reproduces this issue on my device.device (see below).

For me this looks like a bug. Operating system might postpone actions when going to sleep, OS might consider long running apps as hanging and ask them (politely for the first time!) to stop, but reporting file system error is a bug. Entering sleep mode should be imperceptible for processes.

Guide to reproduce

Download sample zip file (~100MB). curl -o /home/nemo/android_storage/test.obf.zip -L 'http://download.osmand.net/download.php?standard=yes&file=Russia_moskovskaya-oblast_asia_2.obf.zip'

Save following script as /home/nemo/android_storage/test_unpack.sh

#!/bin/sh

DIR=$(dirname "$0")
ARCHIVE='test.obf.zip'
OUTFILE="$DIR/archive_unzipped_$(date +%T)"

exec >> "$DIR/test_unpack.log"
exec 2>> "$DIR/test_unpack.log"

unzip -p "$DIR/$ARCHIVE" | tee "$OUTFILE" |md5sum
ls -l "$OUTFILE"

echo ====

Schedule it execution for several times: for t in {300..1800..300};do timedclient-qt5 -a'whenDue;runCommand=/bin/sh /home/nemo/android_storage/test_unpack.sh' -e"APPLICATION=test_unpack;ticker=$t"; done

Make sure your device is not connected to charger or USB, switch off screen, put it aside and don't touch for 40 minutes.

When the time passes check ~/android_storage/test_unpack.logHere is my sample output. Only one uncompress out of six was successful!

Workaround and afterword

Now as a workaround I have scheduled the script to run as media_rw user to access /opt/alien/data/media/ directly and added several uncompression attempts to the script. I hope it would be robust now. I have neither skills nor courage to investigate the issue further and hope someone will pick it. Well, at least I would be glad if someone tries to follow reproduce guide and check if this issue happens on his device as well.