locale set to .utf8, not .UTF-8, causes problems in some applications

asked 2020-06-01 01:55:07 +0200

Kabouik

updated 2020-06-03 21:46:58 +0200

[Edit] According to RFC documents, *.UTF-8 is the standard. In most cases *.utf8 should be recognized equally well, but my humble tests suggest that some apps expect only the standard format and fail to display any UTF-8 characters when the locale is set using the *.utf8 form.


The locale is set to xx_XX.utf8 in Sailfish. This causes issues in some applications.

Setting my language in my environment manually as follows solves it because it overrides the default locale name:

LANG="en_GB.UTF-8"
export LANG

Is there a reason why changing the language in the SFOS UI sets it to xx_XX.utf8? This format should work in most cases on Linux, but there are some issues that the standard format should prevent.
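
A minimal sketch of the manual override generalized to any language, assuming the only difference is the spelling of the codeset suffix. The helper name normalize_utf8_suffix is hypothetical, not something Sailfish ships. Names with a modifier after the codeset (such as @euro) are deliberately left untouched.

```shell
#!/bin/sh
# normalize_utf8_suffix NAME: rewrite a trailing ".utf8" or ".utf-8" (any case)
# to ".UTF-8" and print the result; any other name is printed unchanged.
# Hypothetical helper for illustration.
normalize_utf8_suffix() {
    case $1 in
        *.[Uu][Tt][Ff]8|*.[Uu][Tt][Ff]-8)
            # ${1%.*} drops the codeset suffix, e.g. en_GB.utf8 -> en_GB
            printf '%s.UTF-8\n' "${1%.*}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

# Only touch LANG when it is set, so an unset LANG stays unset.
if [ -n "${LANG:-}" ]; then
    LANG=$(normalize_utf8_suffix "$LANG")
    export LANG
fi
```

Placed in ~/.bashrc or ~/.bash_profile, this keeps whatever language the UI selected and only changes how the suffix is spelled.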


Comments

I recall having problems when I wrote

LANG=en_IE.utf8 LC_ALL=en_IE.utf8; export LANG LC_ALL; unset LANGUAGE

in my shell scripts -- it worked fine on Linux but failed on Mac OS X (and probably *BSD) machines.

After I changed the above to

LANG=en_IE.UTF-8 LC_ALL=en_IE.UTF-8; export LANG LC_ALL; unset LANGUAGE

then the scripts worked much better on any of the above systems.

(Note that LC_ALL should override LANG (and probably LANGUAGE), but that doesn't always happen, so I also set LANG and unset LANGUAGE; I cannot remember just now what that did...)
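
For reference, POSIX resolves each locale category in the order LC_ALL, then the category's own LC_* variable, then LANG. LANGUAGE is a separate GNU gettext variable that only affects which message translations are picked, which is one reason to unset it in scripts. A small sketch of the POSIX order follows; the function name effective_locale is hypothetical, and the final "POSIX" fallback is an assumption, since the default is implementation-defined when nothing is set.

```shell
#!/bin/sh
# effective_locale CATEGORY: print the locale name that applies to CATEGORY
# (for example LC_CTYPE), following the POSIX order
# LC_ALL > LC_<category> > LANG.
# Hypothetical helper; CATEGORY must be a trusted variable name (eval is used).
effective_locale() {
    category=$1
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
        return
    fi
    eval "value=\${$category:-}"
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation-defined default is the POSIX locale.
        printf 'POSIX\n'
    fi
}

effective_locale LC_CTYPE
```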

too ( 2020-06-28 19:19:36 +0200 )

4 Answers


answered 2020-06-03 21:41:35 +0200

Kabouik

updated 2020-06-03 22:46:54 +0200

I might have described the issue incorrectly in the FP. However, some interesting news:


Missing locale.alias

Considering this, I created a /usr/share/X11/locale/locale.alias file containing:

en_GB.utf8       en_GB.UTF-8

And this immediately solved the issue with apps that couldn't properly interpret LANG. No need for any export LANG in ~/.bashrc to override the locale name anymore. I believe locale.alias should be here by default.
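
To illustrate what such a file provides, here is a rough sketch of an alias lookup. It assumes the two-column "alias full-name" layout used in the line above, with '#' starting a comment, and also tolerates an alias written with a trailing colon. The function name resolve_alias is hypothetical; real consumers do this lookup inside their own libraries.

```shell
#!/bin/sh
# resolve_alias FILE NAME: print the full locale name that FILE maps NAME to,
# or NAME unchanged when FILE has no entry for it.
# Hypothetical helper for illustration.
resolve_alias() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        NF >= 2 {
            alias = $1
            sub(/:$/, "", alias)           # tolerate the "alias:" spelling
            if (alias == name) { print $2; found = 1; exit }
        }
        END { if (!found) print name }     # no entry: keep the name as given
    ' "$1"
}

# Example call (the path is the one created in this answer):
# resolve_alias /usr/share/X11/locale/locale.alias en_GB.utf8
```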

No /usr/share/X11/locale/ folder

The /usr/share/X11/locale directory is typically shipped with libX11, but Sailfish uses Wayland rather than X11, so for us it would make sense to ship these files with xkbcommon. This would solve the above issue with applications that rely on the alias to canonicalize the locale name between .utf8 and .UTF-8.

Consequences for compose

This folder also normally contains compose.dir, as well as per-language subdirectories, which are used to enable compose with hardware keyboards. I copied the files for my language from my PC and added them manually to that folder on the device. Then, in a Wayland application currently being ported from desktop Linux to SailfishOS, I noticed full compose capability with a hardware keyboard and an xkb layout using dead keys. Is there a chance stock Sailfish apps can pick up those files too?
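
A quick way to see what is missing on a given device is to check for the index files that a desktop libX11 installation normally places in that directory. This is only a sketch: the function name missing_locale_files is hypothetical, and locale.dir is a file name taken from desktop libX11 installs rather than from this thread.

```shell
#!/bin/sh
# missing_locale_files DIR: print each of the three index files that is
# absent from DIR, one per line; print nothing when all are present.
# Hypothetical helper for illustration.
missing_locale_files() {
    for name in locale.alias locale.dir compose.dir; do
        [ -f "$1/$name" ] || printf '%s\n' "$name"
    done
}

missing_locale_files /usr/share/X11/locale
```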

Can this please be checked @jolla? This could solve the present issue, as well as that one on compose with hardware keyboards.


answered 2020-06-26 03:09:45 +0200

Kabouik

updated 2020-06-26 03:10:41 +0200

I found a solution regarding the other issue mentioned with SSH from computer to phone defaulting to POSIX (and hence breaking utf8 characters):

Add the following somewhere in /etc/ssh/ssh_config on the client (computer):

SendEnv LANG LC_*

And add the following somewhere in /etc/ssh/sshd_config on the host (phone):

AcceptEnv LANG LC_*

No more issues with utf8 characters incorrectly displayed from SSH!
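
If you would rather script the change, here is a hedged sketch. In both ssh_config and sshd_config, a directive placed before the first Host or Match block applies globally, so the helper below puts the line at the top of the file and does nothing if an identical line is already there. The function name prepend_once is hypothetical. The same idea works for the per-user ~/.ssh/config, and sshd has to re-read its configuration before AcceptEnv takes effect.

```shell
#!/bin/sh
# prepend_once FILE LINE: put LINE at the top of FILE unless an identical
# line is already present. Hypothetical helper, not part of OpenSSH.
prepend_once() {
    file=$1
    line=$2
    [ -f "$file" ] || : > "$file"
    # -x matches whole lines only, -F treats LINE literally (so '*' is not a pattern)
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Build the new content in a temp file, then rewrite FILE in place
    # so its owner and permissions are kept.
    { printf '%s\n' "$line"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example calls (run as root for the system-wide files):
# prepend_once /etc/ssh/ssh_config  'SendEnv LANG LC_*'     # on the client
# prepend_once /etc/ssh/sshd_config 'AcceptEnv LANG LC_*'   # on the phone
```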


Comments

I just noticed that when ssh'ing from a Sailfish phone to a remote computer, LANG was not set on the remote, and UTF-8 characters in Emacs then showed up as '?'s. Setting LANG to anything ending in .UTF-8 (or .utf8) on that Linux remote made those characters work fine...

too ( 2020-06-28 19:23:33 +0200 )

answered 2020-06-01 02:22:12 +0200

coderus

echo $LANG says ru_RU.utf8 for me, so check your setup.


Comments

Didn't you set it manually from command line? Because I asked someone else with the Pro¹, it was set to POSIX too like mine, and the old Jolla C I have at home is also saying POSIX. All are updated to 3.3.0.16 and the Jolla C was never tweaked, it's not mine.

Kabouik ( 2020-06-01 03:24:44 +0200 )

Please specify whether you are talking about Fingerterm or SSH; they have totally different setups.

coderus ( 2020-06-01 04:38:28 +0200 )

Huh, good question. I checked from SSH, I think. I'll ask @mosen if he checked directly on the device, but he probably did it from SSH too.

I have already edited my ~/.bash_profile to set it to en_GB.UTF-8 manually. However, I tried commenting out those lines and rebooting. It indeed shows POSIX from SSH and en_GB.UTF-8 in Fingerterm.

What I don't understand is that, before I edited it myself, tmux would invariably fail to display UTF-8 characters. Editing and then sourcing ~/.bash_profile to set LANG immediately solved it. It is still solved if I comment out the manual setting. So somehow I imagine the locale was not properly set before, and now sticks after reboot?

In any case, your answer confirms there's no risk in changing the locale. I just don't know what I had before since I never modified it before, and why it was different.
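
For the case where an SSH session arrives with LANG unset or set to POSIX while Fingerterm has a proper value, a conditional fallback in ~/.bash_profile avoids overriding a locale that is already usable. A minimal sketch follows; the function name ensure_utf8_lang is hypothetical and en_GB.UTF-8 is just the value used in this thread.

```shell
#!/bin/sh
# ensure_utf8_lang FALLBACK: if LANG is unset, empty, "C" or "POSIX",
# set it to FALLBACK and export it; otherwise leave it alone.
# Hypothetical helper for illustration.
ensure_utf8_lang() {
    case ${LANG:-} in
        ''|C|POSIX)
            LANG=$1
            export LANG ;;
    esac
}

ensure_utf8_lang en_GB.UTF-8
```

Note that this only helps when LC_ALL is not set, since LC_ALL takes precedence over LANG.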

Kabouik ( 2020-06-01 04:50:27 +0200 )

According to RFC documents, *.UTF-8 is the standard. In most cases *.utf8 should be recognized equally well, but my humble tests show that some apps expect only the standard format and fail to display any UTF-8 characters when the locale is set using the *.utf8 form.

Would this be a valid request for a future SFOS update?

Kabouik ( 2020-06-02 14:06:26 +0200 )

(deleted, posted as answer)

Kabouik ( 2020-06-26 03:08:32 +0200 )

answered 2020-06-26 09:10:10 +0200

spiiroin

AFAIK it is perfectly fine to have a locale or alias whose name contains no trace of the encoding it uses.

So, applications that need to know which encoding the selected locale uses could / should query it instead of applying heuristics to the locale name, e.g.

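/* Requires <locale.h>, <langinfo.h> and <string.h>. */
setlocale(LC_ALL, "");                 /* adopt the locale from the environment */
char *codeset = nl_langinfo(CODESET);  /* codeset actually in effect */
if( !strcmp(codeset, "UTF-8") )
    do_utf8_things();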
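
The same query is available from the shell through the standard locale utility, which is handy for checking what a given session really resolves to regardless of how LANG is spelled. A small sketch follows; the function name is_utf8_locale is hypothetical.

```shell
#!/bin/sh
# is_utf8_locale: succeed when the codeset of the current locale is UTF-8.
# "locale charmap" prints the codeset in effect, so the spelling of the
# locale name (.utf8 or .UTF-8) does not matter here.
# Hypothetical helper for illustration.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if is_utf8_locale; then
    echo "UTF-8 locale in effect"
else
    echo "not a UTF-8 locale"
fi
```

Running it once in Fingerterm and once over SSH shows directly whether the two sessions differ.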