Posted: Sun 04 Mar 2018, 00:23
by musher0
Hi guys.

Here is a script that does what greengeek was doing manually above. Basically, an
RTF file is plain text (not HTML or XML code), so it can be parsed easily with a
shell script.
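
As a small illustration of that point (a sketch, not part of the script below), the colour table of an RTF file can be listed with standard tools. Each \highlightN (and \cfN) control word refers to entry N of the \colortbl group, counting from 0. The function name list_colortbl is made up for this example, and it assumes the whole \colortbl group sits on one line with no nested braces, which holds for simple files but is not guaranteed for every RTF writer.

```shell
#!/bin/sh
# list_colortbl FILE  (hypothetical helper)
# Prints "index: entry" for every entry of the first \colortbl group, so you
# can see which colour a \highlightN or \cfN index points to.
# Assumption: the \colortbl group is on a single line and contains no '}'.
list_colortbl() {
    grep -o '{\\colortbl[^}]*}' "$1" | head -n 1 |
        sed -e 's/^{\\colortbl//' -e 's/;*}$//' |
        tr ';' '\n' |
        awk '{ gsub(/^[ \t]+|[ \t]+$/, "")
               printf "%d: %s\n", NR - 1, ($0 == "" ? "(auto)" : $0) }'
}

# Example: list_colortbl abiword_bug_test.rtf
```

An empty first entry is shown as "(auto)", since RTF writers usually leave entry 0 blank.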

I think the script is well commented, but if you have questions, please ask.
Also, please test it on various RTF files you may have, for further validation. TIA.

An illustration of the result, using Smithy's test document, is attached. Also attached
is the replaceit utility, in 32-bit and 64-bit versions, in case you do not have it.

IHTH.

~~~~~~~~~~~~~~~~~

#!/bin/ash
# abi-highlight-correction.sh # (Store in /opt/local/bin, or
# /root/my-applications/bin, or any "/bin" directory.)
#
# Required: 
# replaceit, by P.L. Daniels, at http://www.pldaniels.com/replaceit.
#
# Purpose: 
# correct the highlighting bug in abiword.
#
# Usage: 
# abi-highlight-correction.sh name-of-document
# 
# Note: 
# The user should be in the document's directory before running this script.
#
# Credits: 
# PuppyLinux forum members Smithy (for the 95 KB test document) and greengeek
# (for insight into the problem).
#
# Discussion thread: 
# http://murga-linux.com/puppy/viewtopic.php?t=112434&start=15
#
# (c) musher0, 2018-03-04.
####

export LC_ALL=C # For greater speed. We may need it on large documents.
                # (Exported, so that grep and replaceit also run in the C locale.)

A="$1"
[ -f "$A" ] || { echo "Usage: ${0##*/} name-of-document" >&2; exit 1; } # Stop if no valid file.

cp -f "$A" "$A".original # We make a back-up beforehand. Remove if process successful.

n=$(grep -o highlight0 "$A" | wc -l) # Gives us the number of occurrences
############################### of the expression in the document.
############################### (grep -c would only count the matching lines.)

for i in `seq $n`;do
	replaceit --input="$A" "highlight0" "highlight25"

# replaceit tends to replace one instance at a time. So, to make
# sure the expression is entirely replaced, we do as many iterations
# as there are occurrences of the expression.

done

############################### 
# Time of process up to this point: ~ 0.83 second. (YMMV)
# Test doc was abiword_bug_test.rtf provided by PuppyLinux
# forum member Smithy.
############################### 

# No need to reset LC_ALL: the change only lasts while this script runs.

# Optional:
# exec abiword "$A"

### 30 ###
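
For comparison, here is a sketch of the same substitution done in one pass with sed instead of replaceit (my substitution, not musher0's method). It assumes a sed with -i (GNU sed and BusyBox sed both have it). Strictly speaking it only changes the control word \highlight0 itself, whereas a search for the bare string highlight0 would also hit an unusual parameter such as \highlight05. The function name highlight0_to_25 is hypothetical.

```shell
#!/bin/sh
# highlight0_to_25 FILE  (hypothetical helper, sed instead of replaceit)
# Replaces every \highlight0 control word in FILE with \highlight25, in place,
# after saving a back-up as FILE.original (as the original script does).
highlight0_to_25() {
    local f="$1"
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    cp -f "$f" "$f.original" || return 1
    # \highlight0 must be followed by a non-digit (or the end of the line),
    # so \highlight05 is left alone. Because the match also consumes that
    # next character, two adjacent \highlight0 words need a second round:
    # ':a' ... 'ta' repeats the substitution until nothing changes.
    sed -i -e ':a' -e 's/\\highlight0\([^0-9]\)/\\highlight25\1/g' -e 'ta' \
        -e 's/\\highlight0$/\\highlight25/' "$f"
}

# Example: highlight0_to_25 abiword_bug_test.rtf && abiword abiword_bug_test.rtf
```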

Posted: Sun 04 Mar 2018, 13:57
by Smithy
Ah, is that what it was? Well done Sherlock Holmes and Watson.

So if one clicks on a 'faulty' .rtf file (and sometimes .doc files), will it just load up properly, or am I jumping the gun a bit?

Posted: Sun 04 Mar 2018, 18:05
by greengeek
Smithy wrote: "So if one clicks on a 'faulty' .rtf file (and sometimes .doc files), will it just load up properly, or am I jumping the gun a bit?"

When you find a faulty file, you could run musher's script against it to change the "highlight" variables, or you could do it manually using Geany or similar.

The downside of changing all of the "highlight" variables to white color is that you may lose some genuinely important highlights, e.g. there may be fluoro green highlights that someone has used to make certain text stand out, or red highlighting to indicate safety concerns.

It may not always be appropriate to ignore the original highlights.
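
Before converting anything, it may help to see which highlight indices a document actually uses. A minimal sketch (the function name highlight_usage is made up): grep -o prints every match on its own line, so this counts occurrences rather than lines.

```shell
#!/bin/sh
# highlight_usage FILE  (hypothetical helper)
# Prints how many times each \highlightN index occurs in an RTF file.
# Indices other than 0 are presumably highlight colours someone chose,
# which you may want to keep.
highlight_usage() {
    # One match per line, then keep just the number N and count each value.
    grep -o '\\highlight[0-9][0-9]*' "$1" |
        sed 's/^\\highlight//' |
        sort -n | uniq -c |
        awk '{ printf "index %s: %s occurrence(s)\n", $2, $1 }'
}

# Example: highlight_usage abiword_bug_test.rtf
```

The index can then be looked up in the file's \colortbl group to see which colour it stands for.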

The background to this issue seems to be that, for at least the last decade, there has been disagreement about how to define highlighted text. Apparently MS Word 2003 did not even have full highlight abilities, but MS Word 2010 does, so the spec has been changing over time.

There was also disagreement about whether or not "background color", "shading" and "highlight" were the same thing or different things. The consensus in the end was that they are different and have to be handled differently. It is not surprising that the result is confusion among non-MS word processors.

Here is an example of the bug tracking that has resulted from the confusion: https://bugs.launchpad.net/ubuntu/+sour ... bug/295014
and this is worth a full read too: https://bz.apache.org/ooo/show_bug.cgi?id=24317

I suspect that these highlight misunderstandings will always affect us no matter which alternative WP software we use. Most of the bug reports relate to .doc format but obviously RTF is similarly affected.

Perhaps the best approach is exactly what Abiword has done: provide a way to turn highlighting on or off with the tickbox in the menu Format/Font/Highlight/SetNoHighlightColor. That way you could view an affected file without permanently changing the existing highlights.

Posted: Sun 04 Mar 2018, 18:09
by musher0
Smithy wrote: "Ah, is that what it was? Well done Sherlock Holmes and Watson.

So if one clicks on a 'faulty' .rtf file (and sometimes .doc files), will it just load up properly, or am I jumping the gun a bit?"

Hi Smithy.

If you have a faulty RTF, you have to run the above script beforehand, or do the
change manually from within Abiword as greengeek explained.

Now, no divination is involved; you have to load an RTF file in Abiword to know if
it is faulty according to Abiword. (If this sentence sounds like a truism, then it is!)
:lol:

Also, this is for RTF documents only, not DOCs. *.doc documents are another can of worms;
I will not try to script anything for faulty *.doc documents in Abiword. Greengeek's
approach will probably work, though.

Have a great Sunday! :)

BFN.

Posted: Sun 04 Mar 2018, 18:18
by musher0
@greengeek

There's good research in your post above. Thanks.

Posted: Sun 04 Mar 2018, 18:29
by greengeek
Clever script, musher 8) - I had no idea I even had replaceit in my Puppy, but it's already there, so the script worked fine.
I did not put it in my PATH; I just left it in the downloads directory along with Smithy's affected file and ran the following in a terminal:

./abi-highlight-correction.sh abiwordbugtest
Can I suggest an improvement? Maybe consider writing the corrected file under a slightly different name, leaving the original file intact?
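
That suggestion could look something like this: a sketch (the name abi_highlight_copy is hypothetical, and sed stands in for replaceit) that never touches the original and writes the corrected version next to it as NAME-fixed.rtf, then prints the new file name.

```shell
#!/bin/sh
# abi_highlight_copy FILE  (hypothetical helper)
# Leaves FILE untouched and writes the corrected version to NAME-fixed.rtf
# (or FILE-fixed when there is no .rtf extension), then prints that name.
abi_highlight_copy() {
    local src="$1" dst
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    case "$src" in
        *.rtf|*.RTF) dst="${src%.*}-fixed.${src##*.}" ;;
        *)           dst="$src-fixed" ;;
    esac
    # \highlight0 (when not followed by a digit) becomes \highlight25;
    # ':a' ... 'ta' repeats the substitution until nothing changes, which
    # also catches two \highlight0 words written back to back.
    sed -e ':a' -e 's/\\highlight0\([^0-9]\)/\\highlight25\1/g' -e 'ta' \
        -e 's/\\highlight0$/\\highlight25/' "$src" > "$dst" || return 1
    echo "$dst"
}

# Example: abiword "$(abi_highlight_copy abiwordbugtest)"
```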

cheers!

Posted: Mon 05 Mar 2018, 09:07
by greengeek
This webpage offers some basic examples of how RTF handles highlight:
http://www.pindari.com/rtf2.html

One example on offer is this:

{\rtf1\ansi\deff0 {\fonttbl {\f0 Courier;}{\f1 ProFontWindows;}}
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;\red255\green255\blue0;}
This line is font 0 which is courier\line
\f1
This line is font 1\line
\f0
This line is font 0 again\line
This line has a \cf2 red \cf1 word\line
\highlight3 while this line has a \cf2 red \cf1 word and is highlighted in yellow\highlight0\line
Finally, back to the default color.\line
}
If you paste that into Geany and save it as test.rtf, Abiword fails to display the last line correctly, giving the same blackout as in Smithy's example (but only on the last line). (See pic 1)
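
If you would rather not go through Geany, the same test file can be written from a terminal. A sketch: the quoted 'EOF' delimiter keeps the shell from interpreting the backslashes, so the file comes out exactly as listed above.

```shell
#!/bin/sh
# Writes the pindari.com example to test.rtf in the current directory.
cat > test.rtf <<'EOF'
{\rtf1\ansi\deff0 {\fonttbl {\f0 Courier;}{\f1 ProFontWindows;}}
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;\red255\green255\blue0;}
This line is font 0 which is courier\line
\f1
This line is font 1\line
\f0
This line is font 0 again\line
This line has a \cf2 red \cf1 word\line
\highlight3 while this line has a \cf2 red \cf1 word and is highlighted in yellow\highlight0\line
Finally, back to the default color.\line
}
EOF
```

Then open it with abiword test.rtf to see the effect.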

If you then add \highlight25 to the beginning of the last line, it displays correctly. (See pic 2)

\highlight25 Finally, back to the default color.\line
}
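
That manual edit can also be scripted for a quick experiment. A sketch, assuming (as in this test file) that the text to fix is on the line just before the closing brace; the function name prefix_last_text_line is hypothetical, and the input file is left untouched.

```shell
#!/bin/sh
# prefix_last_text_line IN OUT  (hypothetical helper)
# Copies IN to OUT, putting "\highlight25 " in front of the second-to-last
# line, i.e. the last text line when the final line is just the closing "}".
prefix_last_text_line() {
    awk '{ line[NR] = $0 }
         END {
             for (i = 1; i <= NR; i++) {
                 prefix = (i == NR - 1) ? "\\highlight25 " : ""
                 print prefix line[i]
             }
         }' "$1" > "$2"
}

# Example: prefix_last_text_line test.rtf test-fixed.rtf && abiword test-fixed.rtf
```
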
Maybe it genuinely is an Abiword bug? Or is the RTF spec just not well enough defined?

Posted: Mon 12 Mar 2018, 14:07
by musher0
Hello greengeek.

That should not be a problem if one uses the script to replace all "highlight0" expressions
with "highlight25" expressions.

ONLY "highlight0" expressions are changed (not "highlight14" or "highlight7"
expressions), so the other highlighting is not disturbed.

ALL "highlight0" expressions anywhere in the RTF document (at the top, in the middle
or at the very end) are changed.
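
One way to check this on a real document is to compare the highlight counts before and after. A sketch with made-up function names (hl_counts, verify_highlight_fix), assuming you still have the back-up made by the script (FILE.original) next to the corrected FILE. It passes only if no \highlight0 is left, every index other than 0 and 25 has the same count as before, and the new number of \highlight25 words equals the old \highlight0 plus \highlight25 counts.

```shell
#!/bin/sh
# hl_counts FILE  (hypothetical helper): prints "index count" for each
# \highlightN control word in FILE, sorted by index.
hl_counts() {
    grep -o '\\highlight[0-9][0-9]*' "$1" | sed 's/^\\highlight//' |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}

# verify_highlight_fix ORIGINAL FIXED  (hypothetical helper)
verify_highlight_fix() {
    local orig="$1" fixed="$2" n0 n25_before n25_after
    # 1. No \highlight0 control word may remain.
    if grep -Eq '\\highlight0([^0-9]|$)' "$fixed"; then
        printf '%s\n' 'a \highlight0 is still there' >&2; return 1
    fi
    # 2. Indices other than 0 and 25 must be untouched.
    if [ "$(hl_counts "$orig" | awk '$1 != 0 && $1 != 25')" != \
         "$(hl_counts "$fixed" | awk '$1 != 0 && $1 != 25')" ]; then
        printf '%s\n' 'other highlights were changed' >&2; return 1
    fi
    # 3. Every old \highlight0 must now be a \highlight25.
    n0=$(hl_counts "$orig" | awk '$1 == 0 { print $2 }')
    n25_before=$(hl_counts "$orig" | awk '$1 == 25 { print $2 }')
    n25_after=$(hl_counts "$fixed" | awk '$1 == 25 { print $2 }')
    if [ $(( ${n0:-0} + ${n25_before:-0} )) -ne "${n25_after:-0}" ]; then
        printf '%s\n' '\highlight25 count does not add up' >&2; return 1
    fi
    printf '%s\n' 'OK: only \highlight0 was changed'
}

# Example: verify_highlight_fix abiword_bug_test.rtf.original abiword_bug_test.rtf
```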

I hope this answers your question.

BFN.