[Request] Word prediction engine for Cyrillic languages (and not only)

asked 2014-09-16 18:18:07 +0200

virgi26

Hey sailors!

This question has already been raised to some extent, but nevertheless here are my thoughts on the subject of word prediction. Please do not hesitate to read the whole article.

Problem: most Cyrillic languages have a lot of different cases for words, each with different suffixes/inflections. And since the average word length is pretty big, what we get now is the following: when you start typing a word, you get a lot of predictions with the same root but different endings, and the word I want to type could have something like 15 letters, so I have to swipe the prediction line pretty far to find the correct form. More often than not I end up typing the whole word myself. That's really annoying =(

Solution: there were several proposals about grouping with drop-down menus, but the interface could get messy with long words. What I propose is something we may call partial word completion: the first proposed word for auto-completion would be the common root (ending with, for example, ...), and when I tap it, the different endings for the word appear, but without the common root.

I will make a small example with the English word "predict".

I type: pred

Now, on the word prediction line, I get: predict prediction predictable predicting etc.

What I should get: predict predict... prediction predictable predicting etc. (notice the "predict..." option)

So when I tap "predict...", the root is entered in the text area where I'm typing, but the word prediction isn't complete yet (no space after the entered text).

So now, on the prediction line, I get something like this: ...ion, ...able, ...ing, and then I just tap one of the choices to complete the process.
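To make the two-step flow above a bit more concrete, here is a rough Python sketch. The function names `suffix_choices` and `complete` are made up for illustration and are not part of any real prediction engine; it only assumes the engine can hand over its current candidate list as plain strings.

```python
ELLIPSIS = "..."  # marker used in the post for a partial entry


def suffix_choices(stem, candidates):
    """Endings to show after the user taps the 'stem...' entry.

    Hypothetical helper: keeps only candidates that extend the stem
    and strips the common part, so 'prediction' becomes '...ion'.
    """
    return [
        ELLIPSIS + word[len(stem):]
        for word in candidates
        if word.startswith(stem) and word != stem
    ]


def complete(stem, choice):
    """Join the already-entered stem with the tapped ending.

    The trailing space is only added here, because the word is
    not finished until an ending has been picked.
    """
    return stem + choice[len(ELLIPSIS):] + " "


candidates = ["predict", "prediction", "predictable", "predicting"]
choices = suffix_choices("predict", candidates)
print(choices)                          # ['...ion', '...able', '...ing']
print(repr(complete("predict", choices[0])))  # 'prediction '
```

Since Python strings are sequences of Unicode code points, the same slicing works for Cyrillic words without any changes.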

Implementation:

  1. The best way to do it is probably some kind of linguistic or morphological reconstruction of the word, but that probably won't happen =)

  2. This one can be easily implemented, IMHO: the prediction engine can just analyze all the words that it is currently proposing to the user. If there are more than 3 words with a common beginning (root), and that root has more than 3 letters, it will combine them in the manner I've proposed earlier.
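Option 2 could look roughly like the Python sketch below. This is just one way to read the rule, not how any existing engine does it: the names `common_stem` and `with_stem_entry` are hypothetical, the two thresholds come straight from the "more than 3" wording above, and "root" here simply means the longest shared prefix (no real morphology involved). I also assume the root should be longer than what the user has already typed, otherwise the extra entry is useless.

```python
from collections import Counter

MIN_GROUP = 4  # "more than 3 words" sharing the beginning
MIN_STEM = 4   # "more than 3 letters" in the shared beginning


def common_stem(candidates, typed):
    """Longest prefix shared by at least MIN_GROUP candidates, or None."""
    counts = Counter()
    shortest = max(MIN_STEM, len(typed) + 1)  # must extend the typed text
    for word in dict.fromkeys(candidates):    # de-duplicate, keep order
        for end in range(shortest, len(word) + 1):
            counts[word[:end]] += 1
    shared = [prefix for prefix, n in counts.items() if n >= MIN_GROUP]
    return max(shared, key=len) if shared else None


def with_stem_entry(candidates, typed):
    """Return the prediction line with a 'stem...' entry added if it applies."""
    stem = common_stem(candidates, typed)
    if stem is None:
        return list(candidates)
    result = list(candidates)
    # Put the partial entry right after the bare stem if it is itself a
    # candidate (as in the "predict predict..." example), else first.
    position = result.index(stem) + 1 if stem in result else 0
    result.insert(position, stem + "...")
    return result


print(with_stem_entry(
    ["predict", "prediction", "predictable", "predicting"], "pred"))
# ['predict', 'predict...', 'prediction', 'predictable', 'predicting']
```

A real implementation would probably also weigh how likely each candidate is, so that a rare long word does not pull the shared prefix in a strange direction, but that is beyond this sketch.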

Thanks for reading this. Please comment and upvote if you like it. I understand that in English (and probably most European languages) this is probably not an issue, but for some languages it could be a salvation. Cheers!
