Aspell

From the OSSO org Wikipedia

Somali language package for aspell

(Khalfoof:) Here is the skeleton of an aspell package for Somali:

http://borel.slu.edu/obair/aspell6-so-0.01-0.tar.bz2


Unpack the tar.bz2 file, then run

$ ./configure
$ make
$ sudo make install

as you would for other packages.


The Aspell manual explains how affix information is used in word lists:

http://aspell.net/man-html/Working-With-Affix-Info-in-Word-Lists.html#Working-With-Affix-Info-in-Word-Lists


To spell-check a file (here a file named tijaabo, "test") with the Somali dictionary:

aspell -c -l so tijaabo


Aspell does not keep its dictionary as a plain text word list, but there is a way to dump the list:

aspell dump master

This will print the entire word list for your default language. You can specify the language used with -l:

aspell -l so dump master

The argument to -l is the ISO 639 language code (see man aspell for details). The argument master tells aspell to use the systemwide dictionary, not your personal wordlist. The dictionary must be installed on your system; on Ubuntu the Dutch language package is called aspell-nl.

When we run aspell dump master for Dutch, the result is unexpected: many words have strange tags attached to the end. These tags are affix flags, and each one represents variations of the word. (Although there is an English affix file, no affix tags are printed when we dump an English dictionary.) We can expand the affix tags into all possible variations by sending them through aspell expand:

aspell -l so dump master | aspell -l so expand

If we now pipe this through tr we get all variations on separate lines as well. Thus the final command to get a word list for any aspell-supported language becomes:

aspell -l so dump master | aspell -l so expand | tr ' ' '\n' > so_SO.list

To count all words:

grep -c . so_SO.list


(Note that this breaks for words that originally contained spaces.)
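The splitting step and the caveat about spaces can be tried without aspell installed. The sketch below is only an illustration: the sample lines are hand-written stand-ins for the output of aspell expand (not real dictionary output), and split_words is a helper name introduced here.

```shell
# split_words: put every space-separated word on its own line and drop
# empty lines (the same tr step as in the pipeline above).
split_words() {
    tr ' ' '\n' | grep .
}

# Hand-written stand-in for two lines of `aspell expand` output.
sample='both bother
qayb qaybta qaybtii'

printf '%s\n' "$sample" | split_words               # one word per line
printf '%s\n' "$sample" | split_words | grep -c .   # prints 5

# A hypothetical entry that contains a space is split into two "words",
# which is the breakage mentioned in the note above.
printf 'New York\n' | split_words | grep -c .       # prints 2
```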


5.6 Working With Affix Info in Word Lists

5.6.1 The Munch Command

The munch command takes a list of words from standard input and outputs a list of possible root words and affixes. The roots may, however, be invalid, as munch does not check them against the existing dictionary. For example the command:

    echo brother | aspell -l en munch

produces

    brother broth/R brothe/R

5.6.2 The Expand Command

The expand command is the reverse of munch: it expands affix flags to produce a list of words. For example:

    echo both/R | aspell -l en expand

produces

    both bother

The formal usage is:

    aspell expand [level] [limit]

Where level is the expansion level. Valid values are between 1 and 3. Level 1 is the default if not otherwise specified. Level 2 causes the original root/affix to be included, for example:

    both/R both bother

Level 3 causes multiple lines to be printed, one for each generated word, with the original root/affix combination followed by the word it creates:

    both/R both
    both/R bother

Levels larger than 3 may also be supported, but should not be used as they may eventually be removed.
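Because level 3 prints one root/affix and word pair per line, its output is easy to post-process. As a rough sketch (group_level3 is a name made up for this example, and the input is simply the two lines shown above rather than live aspell output), awk can collect the words generated by each entry:

```shell
# Sample level 3 output, copied from the example above.
level3='both/R both
both/R bother'

# group_level3: read "root/flag word" lines from standard input and print
# each root/flag entry followed by all the words it generated.
group_level3() {
    awk '{
             if ($1 in words) words[$1] = words[$1] " " $2
             else             words[$1] = $2
         }
         END { for (entry in words) print entry ": " words[entry] }'
}

printf '%s\n' "$level3" | group_level3   # prints: both/R: both bother
```

With more than one entry in the input, the order of the output lines is not fixed, since awk does not guarantee the iteration order of an array.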

If a limit parameter is given then only expansions which affect the first limit letters will be expanded. If a base word is not completely expanded for a given affix flag that flag will be left on the word. Note that prefixes are always expanded.

5.6.3 The Munch-list Command

The munch-list command will reduce the size of a word list via affix compression. It will reduce a list of words to a minimal (or close to it) set of roots and affixes that will match the same list of words. The list of words is read from standard input and the result, the “munched” list, is written to standard out. Its usage is:

    aspell munch-list [keep] [single|multi] [simple] < infile > outfile

where simple, single, multi, and keep are literal values.

The default algorithm used should give near optimum results. In some cases the set of words returned is, provably, the minimum number possible. In the typical case the number of words returned is within 1% of the optimum number.

By default Aspell will remove redundant affix flags. The keep flag will avoid removing them, which can be useful if you want to include all possible expansions for each base word.

When cross products are involved it may be beneficial to list a base word more than once. Unfortunately, the current version of Aspell cannot correctly handle multiple base words in a dictionary. Therefore, the current default behavior is to only include the one with the most expansions. All of them can be included via the multi flag. Once Aspell is able to handle multiple base words the default will be to include them all. The single flag can be used to only include one of them.

The simple flag will select an alternate, faster algorithm. This algorithm is very similar to the munch command distributed with MySpell (the Open Office spell checker); however, it doesn't give nearly as good results. It does okay for the English word list but not for some other languages such as German; the normal algorithm reduced a list of 312,002 German words to 79,420 base words while the simple algorithm only reduced it to 115,927 words. This algorithm may disappear in a future version of Aspell.
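To put the German figures quoted above in proportion, the following sketch computes the munched size as a percentage of the original list. The ratio helper is a name introduced here for illustration; it only does arithmetic and does not call aspell.

```shell
# ratio TOTAL BASE: print BASE as a percentage of TOTAL, one decimal place.
# LC_ALL=C keeps the decimal separator a dot regardless of the locale.
ratio() {
    LC_ALL=C awk -v total="$1" -v base="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * base / total }'
}

ratio 312002 79420     # normal algorithm: prints 25.5%
ratio 312002 115927    # simple algorithm: prints 37.2%
```

The same helper could be fed line counts from real files, for example ratio "$(grep -c . words.txt)" "$(grep -c . munched.txt)", where both file names are placeholders.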


Sorting and merging the word lists

cat list-1 list-2 list-3 | sort | uniq > final.list

#Concatenates the list files,
#sorts them,
#removes duplicate lines,
#and finally writes the result to an output file.
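The same merge can be tried on throwaway data. The sketch below assumes nothing beyond standard sort; the three sample words and the scratch directory are made up for the example. It uses sort -u, which sorts and removes duplicates in one step, and sets LC_ALL=C so the order does not depend on the locale.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf 'qayb\nbuug\n' > "$dir/list-1"
printf 'buug\naqal\n' > "$dir/list-2"

# sort -u is equivalent to "sort | uniq"; LC_ALL=C fixes plain byte order.
LC_ALL=C sort -u "$dir/list-1" "$dir/list-2" > "$dir/final.list"

merged=$(cat "$dir/final.list")
printf '%s\n' "$merged"    # prints aqal, buug, qayb on separate lines
rm -r "$dir"
```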

Unpacking the .cwl file

so that it becomes a .wl file (the same as so_SO.dic)

preunzip so.cwl
cat so.wl

Creating the aspell .rws file (hash file)

sudo aspell create master --lang=so so.rws < so.wl

The number of words in the spell checker can be found with the following command:

aspell -l so dump master | aspell -l so expand | tr ' ' '\n' | grep -c .
# The aspell program must be installed on the computer.

  1. Finding the rule that fits a word
echo qaybahaas | aspell -l so munch
-> qaybahaas qaybo/H qaybo/h qayb/H qaybah/E
  2. Testing the rules that have been created
echo qayb/T | aspell -l so expand
-> qayb qaybtaan qaybtee qaybtaas qaybtii qaybta qaybtooda qaybtiina qaybteena qaybteeda qaybtiisa qaybtaada qaybteyda
  3. Copying the affix file
sudo cp so_SO.aff /usr/lib/aspell/so_affix.dat
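The commands in this section only work when aspell and the Somali dictionary are installed. A minimal sketch of a check to run first is given below; have_dict is a helper name invented for this example, and it relies on aspell dicts, which lists the names of the installed dictionaries.

```shell
# have_dict LANG: succeed only if aspell is installed and lists a
# dictionary named exactly LANG.
have_dict() {
    command -v aspell >/dev/null 2>&1 || return 1
    aspell dicts 2>/dev/null | grep -qx -e "$1"
}

if have_dict so; then
    echo "Somali dictionary found"
else
    echo "Somali dictionary not found"
fi
```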