Internationalization

21 Aug 2012 17:40 #1173 by PhracturedBlue
Internationalization was created by PhracturedBlue
I have started work on internationalization support.
My idea is as follows:
Wrap each string with a '_tr()' call. Initially this is just a macro that does nothing.
Allow the definition of a strings.<lng> file that contains translations. The strings file will be UTF-8 encoded and will basically consist of pairs of lines:
:<english string>
<translated string>
This file will be parsed, and the English string will be hashed. The hash result will be used as the key to the translated string.

Then the _tr() function will hash the input string, look for a match in the translation table, and return the translated string if it exists or the English string if not.
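
The scheme described above could be sketched roughly as follows. This is only an illustration under assumptions: the names tr_add, tr_load, tr_hash and TR_TABLE_SIZE are hypothetical, the hash is a placeholder (the post does not pick one at this point), and the file is assumed to be already loaded into a writable memory buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TR_TABLE_SIZE 64 /* hypothetical capacity */

struct tr_entry {
    uint16_t hash;          /* hash of the English string */
    const char *translated; /* points into the loaded file buffer */
};

static struct tr_entry tr_table[TR_TABLE_SIZE];
static size_t tr_count;

/* Placeholder hash (djb2 folded to 16 bits); any string hash would do. */
static uint16_t tr_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s)
        h = h * 33u + (uint8_t)*s++;
    return (uint16_t)(h ^ (h >> 16));
}

/* Register one pair. Returns 0 on success, -1 if the table is full. */
int tr_add(const char *english, const char *translated)
{
    if (tr_count >= TR_TABLE_SIZE)
        return -1;
    tr_table[tr_count].hash = tr_hash(english);
    tr_table[tr_count].translated = translated;
    tr_count++;
    return 0;
}

/* Parse a buffer of ":<english>" / "<translated>" line pairs in place.
 * Newlines are overwritten with terminators. Returns the pairs added. */
int tr_load(char *buf)
{
    int added = 0;
    const char *english = NULL;
    char *line = buf;
    while (line && *line) {
        char *next = strchr(line, '\n');
        if (next)
            *next++ = '\0';
        if (line[0] == ':') {
            english = line + 1;   /* key line */
        } else if (english) {
            if (tr_add(english, line) == 0)
                added++;
            english = NULL;
        }
        line = next;
    }
    return added;
}

/* Return the translation if the hash matches, else the input string. */
const char *_tr(const char *english)
{
    uint16_t h = tr_hash(english);
    for (size_t i = 0; i < tr_count; i++)
        if (tr_table[i].hash == h)
            return tr_table[i].translated;
    return english;
}
```

Because only the hash is stored, two English strings that collide would map to the same translation, so a real implementation would need to check for collisions when the file is built or loaded.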

I need to actually try it to see if the performance is acceptable though.

After that there will be issues with not enough memory being allocated for UTF-8 strings, issues with the fonts not containing internationalized characters, and issues with sprintf and friends not understanding UTF-8.
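
To illustrate the memory issue: in UTF-8 the byte length of a string (what strlen reports and what buffers must hold) can be larger than the number of characters drawn on screen. A minimal sketch, assuming valid UTF-8 input and a hypothetical helper name utf8_strlen:

```c
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
 * which always have the bit pattern 10xxxxxx. Assumes valid UTF-8. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, the Hungarian word "Kész" is 4 characters but 5 bytes, so a buffer sized for 4 characters plus the terminator would be one byte short.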

I expect this to be a long process to get right

21 Aug 2012 18:34 #1174 by FDR
Replied by FDR on topic Internationalization
Why not the same .ini format as usual?
<english string>=<translated string>

21 Aug 2012 20:09 #1177 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization
Because the strings can have special characters that would break the ini parser. This format also potentially allows me to store file offset pointers if it turns out that we can't store all the strings in RAM.

21 Aug 2012 20:23 #1181 by FDR
Replied by FDR on topic Internationalization
Yep, they can contain at least %s, %d and \n type formatters, I guess...

22 Aug 2012 05:49 #1186 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization
I just committed a major code reorg to make handling internationalization easier (and to reduce some duplication for multiple Tx, though a lot more is still needed on this front).

Here is a list of all the strings I've marked so far that need translation.
I generated this list via:
find . -name "*.c" | xargs xgettext -o - --omit-header -k --keyword=_tr --keyword=_tr_noop --no-wrap | grep -v 'msgstr ""' | grep -v "^#" | grep -v "^$"
This probably won't work on Windows. The output isn't in the necessary syntax, but it gives a good idea of what is needed.
Attachments:

22 Aug 2012 06:03 #1187 by FDR
Replied by FDR on topic Internationalization
Introduced a compile error in the emulator:
+ Compiling 'target/emu_devo8/fltk.cpp'
target/emu_devo8/fltk.cpp: In function 'void LCD_Init()':
target/emu_devo8/fltk.cpp:235:63: error: 'asprintf' was not declared in this scope
make: *** [objs/emu_devo8-w32/fltk.o] Error 1

22 Aug 2012 06:18 #1188 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization

FDR wrote: Introduced a compile error in the emulator:

+ Compiling 'target/emu_devo8/fltk.cpp'
target/emu_devo8/fltk.cpp: In function 'void LCD_Init()':
target/emu_devo8/fltk.cpp:235:63: error: 'asprintf' was not declared in this scope
make: *** [objs/emu_devo8-w32/fltk.o] Error 1

That one is Windows-specific. I knew there was a reason I took it out before.
No time to look at it tonight though.

22 Aug 2012 13:59 #1189 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization

This is fixed now, but I made even more sweeping changes in the name of reducing duplication (mostly just code moves, no functionality changes). Hopefully I didn't break anything else.

22 Aug 2012 14:11 #1190 by FDR
Replied by FDR on topic Internationalization

PhracturedBlue wrote: This is fixed now, but I made even more sweeping changes in the name of reducing duplication (mostly just code moves, no functionality changes). Hopefully I didn't break anything else.

Yep, it compiles now...

Does that mean that the tx fw and the emulator code have more in common?
Then they should behave more alike, I guess, not that they were too different... ;)

22 Aug 2012 14:14 #1191 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization

FDR wrote: Does that mean that the tx fw and the emulator code have more in common?
Then they should behave more alike, I guess, not that they were too different... ;)

No, they were already about as close as I know how to make them. But it will allow me to merge the Devo10 code (someday) with minimal code duplication.

24 Aug 2012 06:18 #1202 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization
I've committed a bunch of Unicode related work.
With an expanded font (not yet included) I was able to actually display a UTF-8 string (mostly).
Since I know next to nothing about European character sets, I am unsure which characters need to be supported.
So far I have ascii characters 0x20 - 0x7f and 0xa1 - 0xff but I don't think that is sufficient. I don't have enough ROM to include complete fonts, so I need to figure out the best ranges of characters to meet our needs.
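
Displaying a UTF-8 string means turning byte sequences into code points before looking up glyphs. A minimal sketch of that step, assuming a hypothetical helper utf8_next and only light validation (malformed input yields U+FFFD); it is not taken from the committed code:

```c
#include <stdint.h>

/* Decode one code point starting at s, store it in *cp, and return a
 * pointer to the byte after it. The caller must not call this on the
 * terminating NUL. Malformed sequences produce U+FFFD. */
const char *utf8_next(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t c = *p++;
    int extra;

    if (c < 0x80) {
        extra = 0;                      /* plain ASCII */
    } else if ((c & 0xE0) == 0xC0) {
        c &= 0x1F; extra = 1;           /* 2-byte sequence */
    } else if ((c & 0xF0) == 0xE0) {
        c &= 0x0F; extra = 2;           /* 3-byte sequence */
    } else if ((c & 0xF8) == 0xF0) {
        c &= 0x07; extra = 3;           /* 4-byte sequence */
    } else {
        *cp = 0xFFFD;                   /* stray continuation byte */
        return (const char *)p;
    }
    while (extra-- > 0) {
        if ((*p & 0xC0) != 0x80) {      /* truncated sequence */
            *cp = 0xFFFD;
            return (const char *)p;
        }
        c = (c << 6) | (*p++ & 0x3F);
    }
    *cp = c;
    return (const char *)p;
}
```

A drawing loop would call this repeatedly while the current byte is non-zero and pass each code point to the font lookup.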

Still, the capability is coming along. I have no idea what to do about the keyboard yet. We'll likely start without an internationalized keyboard.

24 Aug 2012 07:40 #1203 by wuselfuzz
Replied by wuselfuzz on topic Internationalization
ASCII ends at 0x7f. If you're referring to ISO 8859-1, you should have Western Europe covered. The German Wikipedia page actually lists the languages for which 8859-1 is sufficient:

Wikipedia wrote:
Afrikaans (È/è, É/é, Ê/ê, Ë/ë, Î/î, Ï/ï, Ô/ô, Û/û),
Albanian (Ç/ç, Ë/ë),
Basque (Ñ/ñ),
Danish (Å/å, Æ/æ, Ø/ø),
German (Ä/ä, Ö/ö, Ü/ü, ß; in loanwords: É/é; not the euro sign or, where applicable, ſ),
English (£, ¢; becoming dated: Æ/æ, ä, ë, ï, ö, ü; not Œ/œ),
Faroese (Á/á, Ð/ð, Í/í, Ó/ó, Ú/ú, Ý/ý, Æ/æ, Ø/ø),
Finnish (Ä/ä, Ö/ö; in loanwords: Å/å; not Š/š, Ž/ž),
French (Æ/æ, À/à, Â/â, È/è, É/é, Ê/ê, Ë/ë, Î/î, Ï/ï, Ô/ô, Ù/ù, Û/û, Ç/ç, Ü/ü, ÿ; not Œ/œ, Ÿ),
Irish Gaelic, new orthography (Á/á, É/é, Í/í, Ó/ó, Ú/ú),
Icelandic (Á/á, Ð/ð, É/é, Í/í, Ó/ó, Ú/ú, Ý/ý, Þ/þ, Æ/æ, Ö/ö),
Italian (À/à, È/è, É/é, Ò/ò, Ù/ù),
Catalan (À/à, Ç/ç, È/è, É/é, Í/í, Ï/ï, Ò/ò, Ó/ó, Ú/ú, Ü/ü; but not Ŀl/ŀl),
Dutch (not IJ/ij, but ÿ),
Norwegian, Bokmål and Nynorsk (Å/å, Æ/æ, Ø/ø, Ò/ò),
Portuguese (À/à, Á/á, Â/â, Ã/ã, Ç/ç, É/é, Ê/ê, Í/í, Ó/ó, Ô/ô, Õ/õ, Ú/ú, Ü/ü),
Rhaeto-Romanic,
Scottish Gaelic (À/à, È/è, Ì/ì, Ò/ò, Ù/ù),
Swedish (Å/å, Ä/ä, Ö/ö),
Spanish (¡, ¿, ª, º, Á/á, É/é, Í/í, Ñ/ñ, Ó/ó, Ú/ú, Ü/ü; formerly also Ç/ç),
Swahili, and
Walloon (Â/â, Å/å, Ç/ç, È/è, É/é, Ê/ê, Î/î, Ô/ô, Û/û).

24 Aug 2012 08:23 - 24 Aug 2012 08:23 #1204 by FDR
Replied by FDR on topic Internationalization
Yep, my language is outside that range; we use ISO 8859-2, but I don't want to use my tx in Hungarian anyway. I learned these terms in English and don't even know what they are in Hungarian! :lol:

But I think that supporting the eastern languages would be important, because English is not always common knowledge there (and here ;) ), not to mention the Far East...
Last edit: 24 Aug 2012 08:23 by FDR.

24 Aug 2012 08:39 #1205 by wuselfuzz
Replied by wuselfuzz on topic Internationalization
For German at least, there are rules on how to avoid umlauts if you don't have them for some reason, e.g. you just write "ae" instead of "ä".

I don't really think the font will be an issue for languages that use the roman alphabet as a base.

Now, for Cyrillic, Chinese, and Japanese letters/symbols, an entirely different font would be necessary. One idea would be to make the font loadable from USB storage and to create a PC-side converter tool that maps a certain Unicode range to the ASCII range.

About Arabic and other languages writing from the right to left: I have no idea, except mirroring the whole screen. :S

24 Aug 2012 14:17 #1206 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization
I don't know how to load the font via USB without a massive performance impact. The one possible option I can think of would be a 'font replace' capability that loads a font from SPI and overwrites the ones in the STM32 ROM. We'd need to align the fonts on a page boundary, and it would make it hard to specify fonts in the ini file. I think I'm going to avoid that option for now.

So my thought is to provide a latin font that is mostly good enough. Other languages will require a firmware upgrade in order to add in the requisite font.

As it turned out, I tried Hungarian first (because I was using Google Translate on a string and none of the Western European languages had any special characters for my string). It was not an ideal choice, because my font did not have sufficient coverage.

The code supports a segmented font, so one font can support multiple non-contiguous ranges. However, if a character doesn't exist in the specified font, the code just skips it (we should probably do something better than that, I guess).
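
A segmented font lookup of this kind might look like the sketch below. The struct layout, the names font_range, font_glyph_index and font_glyph_or_fallback, and the '?' fallback are all assumptions for illustration; the two ranges are the ones mentioned earlier in the thread (0x20 - 0x7f and 0xa1 - 0xff).

```c
#include <stddef.h>
#include <stdint.h>

struct font_range {
    uint32_t first;      /* first code point in this segment */
    uint32_t last;       /* last code point in this segment (inclusive) */
    uint16_t glyph_base; /* glyph index of 'first' within the font data */
};

/* Two non-contiguous segments packed back to back in the glyph table. */
static const struct font_range font_ranges[] = {
    { 0x20, 0x7f, 0x00 },
    { 0xa1, 0xff, 0x60 }, /* 0x60 glyphs precede this segment */
};

#define FONT_NO_GLYPH (-1)

/* Return the glyph index for a code point, or FONT_NO_GLYPH if the
 * font has no segment covering it. */
int font_glyph_index(uint32_t cp)
{
    for (size_t i = 0; i < sizeof font_ranges / sizeof font_ranges[0]; i++) {
        const struct font_range *r = &font_ranges[i];
        if (cp >= r->first && cp <= r->last)
            return (int)(r->glyph_base + (cp - r->first));
    }
    return FONT_NO_GLYPH;
}

/* One possible "something better" than skipping: draw '?' instead. */
int font_glyph_or_fallback(uint32_t cp)
{
    int g = font_glyph_index(cp);
    return g != FONT_NO_GLYPH ? g : font_glyph_index('?');
}
```

With these ranges, a Hungarian character such as ő (U+0151) is not covered and would come out as the fallback glyph rather than silently disappearing.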

Anyhow, I found this:
en.wikipedia.org/wiki/ISO/IEC_8859-1
so I think the initial font set will include 8859-1 with the listed special characters, and should give reasonable coverage to start with.

On the devo-8 LCD there is an option to write conditionally in flipped mode (I use this for bmps, which are stored bottom-line-first). You can also write right-to-left, but I'm not sure I want to enable that for fonts (since I think I would then need to actually store the characters backwards too). It might be easier to have the translation file written reversed (or to have a flag in the file that reverses it as we read it).
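
The "reverse it as we read it" idea has to reverse by character rather than by byte, or multi-byte UTF-8 sequences get scrambled. A rough sketch under assumptions: utf8_reverse is a hypothetical helper, the input is valid UTF-8, and real right-to-left text (contextual letter shaping, embedded numbers, mixed direction) needs more than this.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst with the order of its UTF-8 code points reversed.
 * dst must have room for strlen(src) + 1 bytes and must not overlap src. */
void utf8_reverse(char *dst, const char *src)
{
    size_t end = strlen(src);
    while (end > 0) {
        /* Walk back over continuation bytes (10xxxxxx) to the lead byte. */
        size_t start = end - 1;
        while (start > 0 && ((unsigned char)src[start] & 0xC0) == 0x80)
            start--;
        memcpy(dst, src + start, end - start);
        dst += end - start;
        end = start;
    }
    *dst = '\0';
}
```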

I'm also making a lot of assumptions here. Like that anyone will care enough to translate into those languages.

25 Aug 2012 06:34 #1209 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization
Well, I've just committed the first full internationalization code.
There is a new font '12narrow' that has all of 8859-1 and a smattering of other characters, which should handle Western European languages pretty well. Building a font that could be internationalized and also fit within the 16-pixel height was quite challenging.

I've switched to using Ubuntu's font as the base, as it is a bit thinner than Arial and was shorter in the tallest characters without being shorter in the smaller ones. The current font is a composite of 12-, 11-, 10-, and 9-point fonts, which is a little weird but seems to give a reasonable overall result.

I used FNV1 in 16-bit mode as the hash function for the font. It seems to work well so far.
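
For reference, one common way to get a 16-bit FNV-1 value is to compute the standard 32-bit FNV-1 hash and XOR-fold the upper half into the lower half. The sketch below does that; whether the committed code folds this way or truncates differently is an assumption, and the name fnv1_16 is hypothetical.

```c
#include <stdint.h>

/* 32-bit FNV-1 (multiply, then XOR each byte), XOR-folded to 16 bits. */
uint16_t fnv1_16(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV 32-bit offset basis */
    while (*s) {
        h *= 16777619u;                /* FNV 32-bit prime */
        h ^= (uint8_t)*s++;
    }
    return (uint16_t)((h >> 16) ^ (h & 0xFFFFu));
}
```

With only 65536 possible values, collisions become likely once the string count reaches the low hundreds, so it would be worth checking the full string list for collisions whenever strings are added.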
The file language/lang.hu is just a test case to see if it worked. It contains a Google Translate version of the binding dialog. Obviously we won't keep it that way.

I have not tested on the Tx, just in the emulator so far.

You do need to reload the filesystem (at least config.ini and the language dir) in the Tx to use the latest release.

25 Aug 2012 08:34 #1210 by FDR
Replied by FDR on topic Internationalization
There are a lot of warnings like this. Is that OK?
pages/trim_page.c:103:5: warning: passing argument 8 of 'GUI_CreateLabelBox' discards 'const' qualifier from pointer target type [enabled by default]
In file included from pages/mixer_page.h:5:0,
                 from pages/pages.h:4,
                 from pages/trim_page.c:17:
./gui/gui.h:310:14: note: expected 'void *' but argument is of type 'const char *'

25 Aug 2012 09:39 #1211 by FDR
Replied by FDR on topic Internationalization
I wanted to edit the language file, but it is quite hard to edit because (if I'm right) carriage returns indicate the new lines within a string and line feeds separate the entries. In a text editor they all look the same, so I can't see the difference.
Wouldn't it be simpler to keep the \n to represent the new lines and keep each string entry on one line pair?

25 Aug 2012 13:14 #1212 by PhracturedBlue
Replied by PhracturedBlue on topic Internationalization

FDR wrote: I wanted to edit the language file, but it is quite hard to edit because (if I'm right) carriage returns indicate the new lines within a string and line feeds separate the entries. In a text editor they all look the same, so I can't see the difference.
Wouldn't it be simpler to keep the \n to represent the new lines and keep each string entry on one line pair?

Simpler, yes, but more complicated to implement. In Linux there is a difference between CR and LF, so it is easy to write the files there. I'll figure something out.
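
One way FDR's suggestion could be handled is to decode the two-character sequence \n when a line is read, so each entry stays on a single line in the file. A minimal in-place sketch; the helper name tr_unescape and the choice to also accept \\ for a literal backslash are assumptions:

```c
#include <stddef.h>

/* Replace the escapes "\n" and "\\" in s with a newline and a single
 * backslash, in place. Returns the new string length. Any other
 * backslash sequence is copied through unchanged. */
size_t tr_unescape(char *s)
{
    char *out = s;
    for (const char *in = s; *in; in++) {
        if (in[0] == '\\' && in[1] == 'n') {
            *out++ = '\n';
            in++;               /* skip the 'n' */
        } else if (in[0] == '\\' && in[1] == '\\') {
            *out++ = '\\';
            in++;               /* skip the second backslash */
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Since the output is never longer than the input, this can run directly on the buffer the file was read into. The English key would have to be unescaped the same way before hashing so that it matches the string compiled into the firmware.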

The warnings are expected for now. I'll be cleaning them up today.

25 Aug 2012 13:57 #1214 by FDR
Replied by FDR on topic Internationalization
I've made some changes to the binding message in both languages and pushed them.
It seems that it can't calculate the string lengths correctly with the new font.
My message has the same row count, but it is drawn in a much larger box...
