abiword UK spellcheck

Jasper

#1 Post by Jasper »

Hi all,

Using abiword 2.9.4 (inbuilt into Precise 5.6) I have been unable to change spell checking from US to UK.

I have googled, studied and failed trying apparently promising methods - so now I hope for winning advice.

My regards

31st May [update] - just a courtesy note to say I'll be away for a few days from today.

npierce
Posts: 858
Joined: Tue 29 Dec 2009, 01:40

#2 Post by npierce »

Jasper wrote: I have googled, studied and failed trying apparently promising methods . . .
One would think that it shouldn't be so hard to find out how to do this. But in this case, the answer was rather elusive.

A little background:

The dictionaries that were first supported by and shipped with AbiWord, years ago, were made with the Ispell spell-checker. Support for Pspell dictionaries was added for Unix versions of AbiWord (but, I think, Pspell dictionaries were only available from third-party sources). Now AbiWord uses "Enchant", another project from the developers of AbiWord, which allows AbiWord (and other applications) to use dictionaries from, I think, eight spell-checkers.

But in the Puppy community, and probably elsewhere where AbiWord is used, the Ispell dictionaries are still the most commonly used. On a brief search through old forum posts and the Puppy repositories, a sampling of links to dictionaries for AbiWord were all links to Ispell dictionaries. You've probably tried some of them. And you've probably found don570's fine collection of AbiWord dictionary .pet files at Abispell dictionaries for Abiword. Perhaps you have even gone directly to the collection that the AbiWord folks have available at http://abisource.com/downloads/dictiona ... /archives/.

There is no shortage of Ispell dictionaries available for AbiWord. Yet none of these mentioned above work on Precise Puppy.

Why not?

Ispell dictionaries may be available in a number of slightly different formats. The format required for AbiWord is specified in the following document:

Spell Checking and AbiWord
. . . Unless your distributor has packaged AbiWord specially, AbiWord expects its Ispell dictionaries to be in the following format:

128 byte long strings
52 "flags"
capitalization enabled
Proper endianness (little endian if you're on an i386 class machine, or big endian for Alpha, PPC, MIPS)
And, in fact, that is the format of all the dictionaries mentioned above.

So what's the problem?

The problem is that the AbiWord and Enchant packages from Ubuntu (and Debian) are not the official AbiWord and Enchant packages. (Remember that "Unless your distributor has packaged AbiWord specially" bit in the above quote?) They have been modified from the originals for various things, including the use of a different format for Ispell dictionaries (which was a change made by Debian in October 2011).

This may or may not be an improvement. It is said that the new format will properly support more languages. (The price may be a larger file size, although I've not yet* looked into that.) And, in fact, this format has been the default format for Ispell dictionaries for over a decade. So it is not a radical change. And yet, people who have been using AbiWord for years suddenly find that their dictionaries no longer work, which results in a fair bit of confusion. That confusion only increases when users take the obvious step of going to the AbiWord web site to see if there has been some change that explains this. No related information is found there, because no such change has been made to the official AbiWord and Enchant.

To finally figure out what was going on, I had to go find the package at Ubuntu, then download the corresponding source package for it, and wander through it.

This AbiWord/Ispell problem isn’t specific to AbiWord 2.9.4. The problem also appears when using AbiWord 2.8.6 on Precise 5.5, although the exact same binary works fine on Wary 5.2.2 and Slacko 5.5. The problem is related to the /usr/lib/enchant/libenchant_ispell.so library. That is part of Enchant. And that library is where the code lives that actually reads the Ispell dictionaries.



Here are four methods you can use to get a British dictionary that AbiWord will use. (Obviously, anyone who finds this post while looking for a way to install a different dictionary can also use these methods with the appropriate changes for the file names.)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

1. Install an Ispell dictionary which is formatted to make the Debian/Ubuntu modified build of Enchant happy.

Open the Puppy Package Manager and enter ibritish in the Find: box (or ispell to find all languages).
Then click Go and choose to search all repositories.

Install one of the ibritish* dictionaries and its five dependencies.

Now the data is installed, but there is not yet a dictionary in the required hash format. To create the .hash file, use these three commands (if you didn't install the standard, medium-sized dictionary, you will need to replace "med" with "sml", "lrg", "xlg", or "xxl" in the last two commands):

Code:

cd /usr/share/ispell
gunzip british.med+.mwl.gz
buildhash british.med+.mwl english.aff /var/lib/ispell/british.hash
(Alternatively, you may create the .hash file using the ispell-autobuildhash command instead of the above commands, but you would need to have first loaded your devx_precise_5.6.sfs file. The above commands avoid the need to load that. Also, using ispell-autobuildhash with anything other than the standard, medium-sized dictionary will require that the .hash file be renamed, or a symlink added.)
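
For anyone who wants to script method 1, here is a minimal sketch that only prints the three commands for a chosen dictionary size, so they can be checked before being run. The function name print_buildhash_cmds is made up for this illustration; the paths and file names are the ones used in the commands above.

```shell
#!/bin/sh
# Sketch only: print the three commands from method 1 for a given size.
# print_buildhash_cmds is a hypothetical helper, not part of Ispell or Puppy.
print_buildhash_cmds() {
    size=${1:-med}
    # Only the sizes named in the post are accepted.
    case "$size" in
        sml|med|lrg|xlg|xxl) ;;
        *) echo "unknown size: $size" >&2; return 1 ;;
    esac
    echo "cd /usr/share/ispell"
    echo "gunzip british.${size}+.mwl.gz"
    echo "buildhash british.${size}+.mwl english.aff /var/lib/ispell/british.hash"
}

# Show the commands for the large dictionary.
print_buildhash_cmds lrg
```

Printing rather than executing is a deliberate choice here: the commands modify files under /usr/share/ispell and /var/lib/ispell, so it seems safer to review them first.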

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

-- or --

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

2. Install a MySpell dictionary.

This is perhaps the easiest method. The MySpell dictionaries seem to be popular with the people creating Puppy's language packs.

Open the Puppy Package Manager and enter myspell-en-gb in the Find: box (or myspell to find all languages). Then click Go and choose to search all repositories.

Install myspell-en-gb_3.3.0 and its three dependencies.

That’s it!

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

-- or --

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

3. Install an Aspell dictionary.

Be aware that in order to build Aspell dictionaries you will need to load the appropriate devx .sfs file (for Precise 5.6 it is here: http://distro.ibiblio.org/quirky/precise-5.6/).

Open the Puppy Package Manager and enter aspell-en in the Find: box (or aspell to find all languages).
Then click Go and choose to search all repositories.

Install the aspell-en_6.0-0 dictionary and its five dependencies.

Now the data is installed, but there is not yet a dictionary in the required hash format. To create the .rws files, use this command:

Code:

aspell-autobuildhash
(There may be a way to build the .rws files manually, without the need to load your devx .sfs file, but I've not looked into that.)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

-- or --

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

4. Replace the Debian/Ubuntu libenchant_ispell with the official one.

Using a version of /usr/lib/enchant/libenchant_ispell.so from another Puppy may not be the most "correct" solution, since there could be issues related to the fact that a library from another Puppy was built in an environment different from Precise. But it does work for me.

I have had success with overwriting libenchant_ispell.so with the file of the same name from Racy 5.2.2 and from Slacko 5.5.

Normally, when replacing a library for use with only one application, I would leave the original where it is so that it could be found by other applications, and put the one needed for that one application somewhere outside the usual library path. Then I would write a wrapper script to start the application using LD_PRELOAD or LD_LIBRARY_PATH so that that one application could find it. But in this case that won't work because this library is only loaded when it is needed, so isn't loaded by the dynamic linker when the application starts.

I don't expect that overwriting this library would cause problems for other applications since I doubt that it is used by anything other than Enchant. The only potential issue I can see would occur if another application called on Enchant to read a dictionary for a different language that happened to be in the new format. But no such dictionaries come pre-installed in Precise 5.6.
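
If you do try method 4, it is probably wise to keep a copy of the Debian/Ubuntu library so that the change can be undone. A minimal sketch follows; swap_lib is a hypothetical helper name, and the source path in the commented example is only an assumption about where the other Puppy's copy might be mounted.

```shell
#!/bin/sh
# Sketch only: overwrite a library with a replacement, keeping a backup.
# swap_lib is a hypothetical helper; both paths are supplied by the caller.
swap_lib() {
    new=$1
    target=$2
    [ -f "$new" ] || { echo "missing replacement: $new" >&2; return 1; }
    # Keep only the first backup, so a repeat run cannot clobber the original.
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp -p "$new" "$target"
}

# Example (the first path is an assumption; adjust it to your own setup):
# swap_lib /mnt/other/usr/lib/enchant/libenchant_ispell.so \
#          /usr/lib/enchant/libenchant_ispell.so
```

To undo the change, copy the .orig file back over the library.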

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

If you wanted to, you could install British dictionaries for Aspell, Ispell, and MySpell, and experiment to see which you liked best by adding a line like this to the end of /usr/share/enchant/enchant.ordering, and changing the order as desired:

Code:

en_GB:aspell,ispell,myspell
This example would cause Enchant to always use the Aspell dictionary unless there was a problem reading it, in which case Enchant would fall back to Ispell, etc.
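
When experimenting with the order, it is easy to end up with several en_GB lines in the file. Here is a minimal sketch of a helper that replaces any existing line for a language and appends the new one at the end. The name set_enchant_order is hypothetical (it is not an Enchant tool), and I am assuming that one line per language is what you want.

```shell
#!/bin/sh
# Sketch only: set the provider order for one language in an ordering file.
# set_enchant_order is a hypothetical helper, not an Enchant tool.
set_enchant_order() {
    file=$1
    lang=$2
    order=$3
    tmp=$(mktemp) || return 1
    # Drop any existing line for this language (grep exits 1 if nothing is left).
    if [ -f "$file" ]; then
        grep -v "^${lang}:" "$file" > "$tmp" || true
    fi
    # Append the new ordering at the end of the file.
    printf '%s:%s\n' "$lang" "$order" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, using the file and ordering from the post:
# set_enchant_order /usr/share/enchant/enchant.ordering en_GB aspell,ispell,myspell
```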

If you are unsure of which dictionary you are using, you can check by typing a couple of words to see if they are in the dictionary. Ispell accepts "gray". MySpell accepts "apologize". Aspell accepts neither of those. (This assumes that the three dictionaries are the three British dictionaries installed in items 1, 2, and 3 above.)



You can skip reading the remainder of this post. I'm just going to document some of the details of what I've found as I was looking into this. Read on if you need something to help you fall asleep.


The AbiWord/Ispell problem resulted from a change to the /usr/lib/enchant/libenchant_ispell.so library. That is part of a project called "Enchant" which was also developed by the AbiWord developers. And that library is where the code lives that actually reads the Ispell dictionaries.

On Precise 5.5 and Precise 5.6, that library has problems reading the header for the dictionary, and spits out this error message:

Code:

Illegal format hash table /usr/share/enchant/ispell/british.hash - expected magic2 0x9602, got 0x5952
(The last number will differ for different dictionaries.)

The Ispell dictionary .hash files that we have always used in the past with AbiWord have a header which is a 6604 byte block of data at the start of the file. But the header for the .hash files required by the Debian/Ubuntu modified Enchant has, instead, a length of 17740.

At the beginning of the header is a value used as a signature (a.k.a. "magic number"). That value is repeated at the end of the header. Any software reading the header may check these values to help ensure that it is reading the header correctly. The value used for Ispell dictionaries is 0x9602.

If the value at the start (called "magic") is wrong, the file is either not an Ispell dictionary or, if the value read is 0x0296, it is probably an Ispell dictionary with a different byte order, intended for use with another computer architecture. In our case, this first value is read correctly, so doesn't cause an error.

It is the value at the end of the header (called "magic2") that, in our case, is read incorrectly. This usually means that the size of the header is different than what is expected. So the software has read a value that it thinks is at the end of the header, but is really somewhere else, since the length of the header is not what the software expected.

So the header cannot be properly read, and therefore the dictionary data cannot be read.
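
The check on the first magic value can be illustrated with a few lines of shell. This is only a sketch: it assumes the magic number is a 16-bit value stored in the first two bytes of the file, and hash_magic is a made-up name. It does not look at "magic2", since finding that requires knowing the header length.

```shell
#!/bin/sh
# Sketch only: compare the first two bytes of a file with the Ispell
# magic number 0x9602. Assumes the magic is a 16-bit value at offset 0.
hash_magic() {
    # Dump the first two bytes as hex, e.g. "0296" for bytes 0x02 0x96.
    bytes=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case "$bytes" in
        0296) echo "little endian ispell hash" ;;  # 0x9602, low byte first
        9602) echo "big endian ispell hash" ;;     # 0x9602, high byte first
        *)    echo "not an ispell hash" ;;
    esac
}

# Example:
# hash_magic /usr/share/enchant/ispell/british.hash
```

On an i386 class machine, a file reported as big endian here is the case described above where software reading the value natively would see 0x0296.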

The sad thing is that quite possibly there is no need for this problem to have occurred. The difference in the two formats all hinges on the value of a variable called "maxstringchars". The official Enchant wants that to be set to 128. The Debian/Ubuntu modified Enchant wants it to be 512. The greater the value, the bigger the header.

But since Ispell can create dictionaries with different formats, the creators of Ispell planned for this, and put that value in the header near its beginning. It is in the same place in all formats. So any software that reads the file may read that value and calculate the size of the header. Unfortunately, the creators of Enchant did not take advantage of this, but tied their code to a specific format. And the folks at Debian, when adding support for the larger format, also missed the opportunity to fix this. They removed support for the smaller format when they could have allowed the code to support both formats, as well as others of any size.

In fairness, I have not looked very closely at the code, so may have overlooked some reason that support for multiple formats is not possible. And I just tried Ispell itself, and it only supports one format. So maybe supporting multiple formats wouldn't be as easy as I would like to think.


To check to see which format a .hash file is, use the file command. For example:

Official Enchant format:

Code:

# file /usr/share/enchant/ispell/british.hash
/usr/share/enchant/ispell/british.hash: little endian ispell 3.1 hash file, 7-bit, capitalization, 52 flags and 128 string characters
Debian/Ubuntu modified Enchant format:

Code:

# file /var/lib/ispell/british.hash
/var/lib/ispell/british.hash: little endian ispell 3.1 hash file, 7-bit, capitalization, 52 flags and 512 string characters
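
If you have several .hash files to check, the output of the file command can be classified with a small helper. This is a minimal sketch: hash_format is a hypothetical name, and it relies only on the "128 string characters" and "512 string characters" wording shown in the two examples above.

```shell
#!/bin/sh
# Sketch only: classify a description printed by the file command.
# hash_format is a hypothetical helper; it matches the wording shown above.
hash_format() {
    case "$1" in
        *ispell*"128 string characters"*)
            echo "official Enchant format" ;;
        *ispell*"512 string characters"*)
            echo "Debian/Ubuntu modified Enchant format" ;;
        *)
            echo "unrecognised" ;;
    esac
}

# Example:
# hash_format "$(file /var/lib/ispell/british.hash)"
```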

Precise 5.6 has no libaspell.so.15 library, and AbiWord spits out this message (if started from a terminal window):

Code:

** (abiword:11290): WARNING **: Error loading plugin: libaspell.so.15: cannot open shared object file: No such file or directory
This doesn't cause a problem unless you are trying to use an Aspell dictionary. Installing an Aspell dictionary and its dependencies as shown in method 3 above will install the library and so eliminate the error message.


Here are some related links:

Spell Checking and AbiWord (Modified: Fri 03 Dec 2004)
Enchant (Modified 2010?)
International Ispell (Modified: Thu 04 Apr 2013)
GNU Aspell (Modified: Mon 19 Sep 2011)



* (By the way, after writing the above, I investigated the size difference for the two .hash file formats and found that the file size differs by 28416 bytes for the British and American dictionaries included with the Ispell source code. For the small-sized British and American dictionaries (which increased from 449716 to 478132 bytes, and 448676 to 477092 bytes, respectively) this was about a 6.3% increase in size. For the medium-sized British and American dictionaries (which increased from 819028 to 847444 bytes, and 817764 to 846180 bytes, respectively) this was about a 3.5% increase in size. So there is an increase in file size, but not a major one.)
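
The percentages in that footnote can be reproduced with a one-line awk calculation. This is just a sketch of the arithmetic; pct_increase is a made-up helper name, and the byte counts are the ones quoted in the footnote.

```shell
#!/bin/sh
# Sketch only: report the size difference and percentage increase.
# pct_increase is a hypothetical helper; arguments are old and new sizes.
pct_increase() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%d bytes, %.1f%%\n", new - old, (new - old) / old * 100 }'
}

pct_increase 449716 478132   # small British:  28416 bytes, 6.3%
pct_increase 819028 847444   # medium British: 28416 bytes, 3.5%
```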

Jasper

#3 Post by Jasper »

Hi npierce,

Success - and my untold and endless thanks.

Of the four solutions you gave, I chose your easy option No:
2. Install a MySpell dictionary [using the PPM].

Your research was deep and your explanation extensively
rich (2,330 words and 14,168 characters, including spaces, -
as reported when I copied into abiword to do the counting).

After you went to such trouble I read everything you wrote,
though I have yet to follow your links. I'll do that very soon.

I hope BarryK will start awarding a PPhD [Puppy Doctor of
Philosophy] to all deserving pupstars such as your good self.

My regards

npierce
Posts: 858
Joined: Tue 29 Dec 2009, 01:40

#4 Post by npierce »

Jasper,

You're welcome.

I'm glad the MySpell dictionary was a satisfactory solution for you.

Actually, at one point I was thinking of giving up on figuring out how to get Ispell dictionaries working again, and just advise you to install a MySpell dictionary.

But I dislike walking away from mysteries, so I hung in there until I figured out what happened to cause the problem with Ispell, and how to fix it. But since the MySpell solution was somewhat easier, I included that as well. And since Aspell dictionaries are also widely available, I figured that, while I was at it, I would toss that option in as well.

(Gee, 2,330 words! I won't ask how many of them were misspelled. :) :) )

GullyProber
Posts: 28
Joined: Sun 12 Aug 2007, 00:00

#5 Post by GullyProber »

thank you, npierce,
this is something that has been bugging me for seven years. No.2 makes it so easy.
ta muchly

:P

npierce
Posts: 858
Joined: Tue 29 Dec 2009, 01:40

#6 Post by npierce »

GullyProber,

You're welcome. 'Tis always good to hear when others are helped by old threads.

By the way, for Puppy Precise users who find this thread but can't find a MySpell dictionary in their language, another source of dictionaries for AbiWord can be found by searching for "hunspell" in Precise's PPM. Ubuntu's Precise repositories don't have a British Hunspell dictionary, probably because it would be redundant: they already have the British MySpell dictionary, which should be compatible with any application that uses Hunspell, since Hunspell is based on MySpell. But there are some Hunspell dictionaries available for languages which are not available as MySpell dictionaries.

croclip
Posts: 3
Joined: Tue 01 Sep 2009, 02:37

#7 Post by croclip »

This is the first error that I have found. My errors would probably be less than 5% without spellcheck.
I use the term "spelling error" as that would be the most common search term.
Actually it is a typing error.
I was using the English (UK) dictionary.
clumpiness # highlighted as error
c lumpiness # as a suggested correction. NOTE: contains a SPACE!

ardvark
Posts: 1448
Joined: Tue 02 Jul 2013, 03:43
Location: USA

#8 Post by ardvark »

Hi npierce...

I agree with Jasper. You have my compliments on such a thorough post! One doesn't often see something like that in any forum! :D

Best wishes...
Last edited by ardvark on Wed 13 Nov 2013, 18:18, edited 1 time in total.

npierce
Posts: 858
Joined: Tue 29 Dec 2009, 01:40

#9 Post by npierce »

ardvark,

Thanks for the kind words. I wish I had the time more often to delve into a subject as deeply as I apparently did when I wrote that post.


croclip,
croclip wrote: c lumpiness # as a suggested correction. NOTE: contains a SPACE!
Actually, the spell-checker isn't suggesting that "c lumpiness" is the correct spelling of "clumpiness", or any other single word. What has happened is that "clumpiness" is not in your spelling dictionary, but "c", and "lumpiness" are in it, so the spell-checker is suggesting that you might have omitted the space between these two words, just as it would if you had typed "twowords".
