(old)Puppy Linux Discussion Forum

Posted: **Thu 12 May 2011, 20:59**

This is a 'hidden' text. Can you please help me decode it with a bash script?
Lyricwiki.org finds it best to return these values instead of clear text, and I think there has to be an existing converter for this... Thank you.

Sigmund

Posted: **Thu 12 May 2011, 22:20**

Hello,
It is ASCII... the first word is "look", but I'm too lazy to write a script...

Posted: **Fri 13 May 2011, 04:51**

If you want to read it without creating a script to decode it, just give the file an "htm" extension.
It will then open decoded in your browser.
It is poetry of sorts.
It lacks carriage returns/line feeds, though.

The first line is "Look into my eyes - You will see"
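The browser trick works because each &#NN; (decimal) or &#xHH; (hexadecimal) is an HTML numeric character reference. A rough bash-only sketch of decoding both forms, offered as an illustration rather than anything posted in this thread: a regular expression finds each entity, base-16 or base-10 arithmetic turns it into a number, and printf writes the byte, so no external process is started per entity. The function name decode_entities is made up, and code points below 256 are assumed.

```shell
#!/bin/bash
# decode_entities (hypothetical name): decode &#NN; and &#xHH; entities in $1
# using bash built-ins only. Assumes code points below 256.
decode_entities() {
	local s=$1 out='' m code n ch
	local re='&#([0-9]+|[xX][0-9a-fA-F]+);'
	while [[ $s =~ $re ]]; do
		m=${BASH_REMATCH[0]}            # whole entity, e.g. &#x4C;
		code=${BASH_REMATCH[1]}         # "76" or "x4C"
		if [[ $code == [xX]* ]]; then
			n=$(( 16#${code:1} ))       # hex entity
		else
			n=$(( 10#$code ))           # decimal; 10# stops "076" being read as octal
		fi
		printf -v ch "\\$(printf '%03o' "$n")"   # -v keeps a decoded newline intact
		out+=${s%%"$m"*}$ch             # text before the entity, then the character
		s=${s#*"$m"}                    # continue after the entity
	done
	printf '%s' "$out$s"
}
```

For example, decode_entities "$(cat htmltext.html)" prints the decoded text. Because each entity is consumed before the search continues, &#38;#65; comes out as the literal text &#65; instead of being decoded twice.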

Posted: **Fri 13 May 2011, 05:30**

8-bit wrote: If you want to read it without creating a script to decode it, just give the file an "htm" extension.
It will then open decoded in your browser.
It is poetry of sorts.
It lacks carriage returns/line feeds, though.

The first line is "Look into my eyes - You will see"

Looks great in the links text browser.

Posted: **Fri 13 May 2011, 10:55**

Dear zigbert,

This does the trick, but it is incredibly slow. Wait for it to finish... and you will see the text:

Code:

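#!/bin/bash

text=$(cat htmltext.html)
# pick out every entity with 2 to 5 characters between "&#" and ";"
for i in $(echo "$text" | grep -o "&#..;\|&#...;\|&#....;\|&#.....;");
do
# hex entity (&#x4C;) -> 0x4C -> decimal via printf; decimal entity -> just strip "&#" and ";"
[[ "$(echo $i | grep "x")" != "" ]] && export j=$(printf "%d" "$(echo "$i" | sed -e "s/&#\(.*\);/0\1/")") || export j=$(echo "$i" | sed -e "s/&#\(.*\);/\1/")
# perl prints code point $j as UTF-8 and sed swaps it in for the entity
# (one sed + perl call per entity is what makes this slow; a decoded "/" or "&" would also upset the sed expression)
text=$(echo "$text" | sed -e "s/$i/$(perl -CS -e 'print chr("$ENV{j}")')/");
done;
echo "$text"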

Here is a BaCon version that is MUCH faster (almost instant):

OPEN "htmltext.html" FOR READING AS MyFile
MyVar$ = ""
REM read the whole file into one string
WHILE NOT(ENDFILE(MyFile)) DO
READLN MyLine$ FROM MyFile
MyVar$ = CONCAT$(MyVar$, MyLine$, NL$)
WEND
REM encode the real line breaks as entities too, so they survive the split below
MyVar$ = REPLACE$(MyVar$, NL$, "&#10;")
CLOSE FILE MyFile
REM one array element per entity, e.g. "&#76"
SPLIT MyVar$ BY ";" TO MyArray$ SIZE MyArraySize
FOR i = 0 TO MyArraySize - 1
MyArray$[i] = REPLACE$(MyArray$[i], "&#", "")
PRINT CHR$(VAL(MyArray$[i]));
NEXT i

With kind regards,
vovchik

PS. BBCode is messing up the BaCon code. In the line MyVar$ = REPLACE$(MyVar$, NL$, ...), the last argument should be double quote, ampersand, hash, 10, semicolon, double quote, i.e. "&#10;". BBCode also swallows the [i] index after MyArray$ inside the FOR loop.

Posted: **Fri 13 May 2011, 17:51**

Thank you guys

Vovchik
Could your BaCon code be compiled into a general html2txt converter that takes the file as $1, i.e. html2txt "/root/myfile.html"?

An alternative might be a huge sed command, but that is probably slow as well.

Sigmund
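A bash-only sketch of the html2txt idea, modelled loosely on the BaCon loop above (split on ";", strip "&#", print the character), with the file name taken from $1. The name html2txt and the assumption that the file holds only decimal entities below 256 plus ordinary line breaks are illustrative, not from the thread.

```shell
#!/bin/bash
# html2txt (hypothetical): a bash take on the BaCon loop, reading the file named in $1.
# Assumes decimal entities below 256; unencoded line breaks are passed through.
html2txt() {
	local line tok
	local -a toks
	while IFS= read -r line || [[ -n $line ]]; do
		IFS=';' read -ra toks <<< "$line"          # like SPLIT ... BY ";"
		for tok in "${toks[@]}"; do
			tok=${tok#'&#'}                         # like REPLACE$(..., "&#", "")
			[[ $tok =~ ^[0-9]+$ ]] || continue      # ignore anything that is not a number
			printf "\\$(printf '%03o' "$((10#$tok))")"   # like CHR$(VAL(...))
		done
		printf '\n'                                 # keep the original line break
	done < "$1"
}
```

Saved as a script whose last line is html2txt "$1", it could be called as html2txt "/root/myfile.html".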

Posted: **Fri 13 May 2011, 19:01**

httpd will do regular encoding, but its decoding (httpd -d) expects the %<hex> form, not &#<decimal> (strangely, it encodes to the &#<decimal> format, though)... so something like this should do it:

Code:

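#!/bin/sh
IFS=";"                         # split the text on ";"
for x in `cat htmltext` ; do
	y=${x//\&#/}                # remove the "&#"
	a=${a}`printf '%%''%x' $y`  # decimal -> %hex, the %% is for httpd
done
httpd -d "$a"                   # httpd URL-decodes the %hex string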

It misses the line returns since they aren't encoded... You'd have to go line by line to do it exactly right, but this is the gist of it, and it's pretty quick too.
It would need an outer loop with:
IFS="
"
(that would be the line return)

Something like:

Code:

#!/bin/sh
IFS="
"                                   # use the newline character as our word separator
for outer in `cat htmltext` ; do    # this splits it up by line
	IFS=";"                         # use ";" as our word separator
	for x in $outer ; do            # this separates each character
		y=${x//\&#/}                # remove the "&#" from each string
		a=${a}`printf '%%''%x' $y`  # %x prints hex, the %% is for httpd
	done
	a=$a"
"                                   # add the newline character
done
httpd -d "$a"                       # httpd decodes strings in standard %hex format

Edit: added some comments to help people understand my borked-up coding style.
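If busybox httpd is not available, the same %hex idea can be sketched in plain bash: build \xHH escapes instead of %HH and let printf '%b' do the decoding. Reading stdin with while read keeps the unencoded line returns that the single-loop version loses. The function name decode_via_hex is hypothetical, and decimal entities below 256 are assumed.

```shell
#!/bin/bash
# decode_via_hex (hypothetical): technosaurus's approach without httpd.
# Each decimal entity becomes "\xHH" and printf '%b' decodes the line.
decode_via_hex() {
	local line x a
	local -a parts
	while IFS= read -r line || [[ -n $line ]]; do
		a=''
		IFS=';' read -ra parts <<< "$line"         # one piece per entity
		for x in "${parts[@]}"; do
			x=${x//'&#'/}                           # remove the "&#"
			[[ $x =~ ^[0-9]+$ ]] || continue
			a+=$(printf '\\x%02x' "$((10#$x))")     # 76 -> \x4c
		done
		printf '%b\n' "$a"                          # %b expands the \xHH escapes
	done
}
```

Usage: decode_via_hex < htmltext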

Posted: **Sat 14 May 2011, 15:40**

Thank you, technosaurus, for your solution. It works very well.

Dougal has made a simple sed function that is REALLY fast and has no new dependencies. This brings us closer to the next Pmusic.

Thank you
Sigmund

Posted: **Sun 15 May 2011, 10:37**

Dear zigbert,

Technosaurus did a very nice job, and it works. Here is a little mod that dispenses with the double loop.

Code:

#!/bin/bash

# read file into memvar and replace each newline with &#10;
mytext=`cat htmltext.html | sed ':a;N;$!ba;s/\n/\&#10;/g'`
IFS=";"
# loop used to parse character
for x in $mytext ; do
	# remove the "&#" from each string
	y=${x//\&#/}
	# %x prints hex; the %% is for httpd
	a=${a}`printf '%%''%x' $y`
done
# fix newline char in text and append trailing newline
a=`echo -e "$a" | sed 's/\%a/\n/g'`$'\012'
# output the result
httpd -d $a

I don't know why, but I get %a in place of the newline, so I had to do the "a=`echo -e" business. I am certain that could also be fixed. I would very much like to see Dougal's terse and quick solution. Could you please post it here so we can learn something?

With thanks and kind regards,
vovchik
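One possible cause of the %a, offered as an untested guess: printf '%x' writes the newline's code 10 as the single digit a, while URL encoding uses exactly two hex digits (%0a), so httpd -d presumably leaves %a undecoded. Padding with %02x should make the echo -e/sed step unnecessary. The sketch only covers the printf side; to_url_hex is a hypothetical helper.

```shell
#!/bin/bash
# to_url_hex (hypothetical): turn decimal codes into %HH escapes for httpd -d
to_url_hex() {
	local y a=''
	for y in "$@"; do
		a+=$(printf '%%%02x' "$y")   # %02x pads 10 to 0a; plain %x would give just "a"
	done
	printf '%s' "$a"
}
```

With busybox httpd, httpd -d "$(to_url_hex 76 111 10)" should then print Lo followed by a newline.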

Posted: **Sun 15 May 2011, 11:27**

The following script is much faster than using httpd. I have used bash and nothing else.

Code:

#!/bin/sh
while read line; do
	echo "$line" |
	while read -d ";" char; do
		#Trim "&#" from beginning
		ascii="${char#&\#}"
		#ascii appears to be decimal, octal conversion necessary
		num=""
		while true; do
			digit="$(($ascii%8))"
			num="$digit$num"
			ascii="$(($ascii/8))"
			if test "$ascii" = "0"; then
				#Now write to terminal
				echo -ne "\\$num"
				break
			fi
		done
	done
	#Add non-coded newline
	echo
done < htmltext
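The inner while loop above converts decimal to octal by hand because the backslash escapes used to print the character are octal. As a sketch (the function names are hypothetical), printf's %o conversion gives the same digits in one step, and printf can then print the character directly from a \NNN escape in its format string.

```shell
#!/bin/bash
# to_octal_loop: the manual base-8 loop from the post, wrapped in a function
to_octal_loop() {
	local ascii=$1 num=''
	while true; do
		num="$((ascii % 8))$num"      # prepend the lowest octal digit
		ascii=$((ascii / 8))
		[ "$ascii" = 0 ] && break
	done
	printf '%s' "$num"
}
# to_octal_printf: printf's %o conversion does the same job in one call
to_octal_printf() {
	printf '%o' "$1"
}
# print_char: print the character for a decimal code via a \NNN escape
print_char() {
	printf "\\$(printf '%03o' "$1")"
}
```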

Posted: **Sun 15 May 2011, 19:17**

Dear akash,

Your version works beautifully. Since I use bash 4.2, I had to change the "echo -ne ..." in the innermost loop to a simple printf, as follows:

Code:

#!/bin/bash

while read line; do 
	echo "$line" | 
	while read -d ";" char; do 
		# trim "&#" from beginning 
		ascii="${char#&\#}" 
		# ascii appears to be decimal, octal conversion necessary 
		num="" 
		while true; do 
			digit="$(($ascii%8))" 
			num="$digit$num" 
			ascii="$(($ascii/8))" 
			if test "$ascii" = "0"; then 
				# now write to terminal 
				printf "\\$num"
				break 
			fi
		done
	done 
	# add non-coded newline 
	echo 
done < htmltext.html

Until I see Dougal's version, I think this is the way to go. Thanks.

With kind regards,
vovchik

Posted: **Mon 16 May 2011, 14:07**

vovchik wrote: Your version works beautifully. Since I use bash 4.2, I had to change the "echo -ne ..." in the innermost loop to a simple printf,

Why? Did they change the behaviour of echo? I can't find anything about it on the net.

In any case, HTML numbers go beyond ASCII...
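To illustrate Dougal's point: the octal trick writes one byte per entity, so a code point above 127 (é is &#233;, € is &#8364;) does not come out as valid UTF-8, which may be related to the Pmusic GUI problem reported further down. A hedged sketch of encoding a code point as UTF-8 bytes in bash, following the standard UTF-8 bit layout and independent of the locale; the names byte and utf8_char are hypothetical.

```shell
#!/bin/bash
# byte (hypothetical): print one raw byte given its decimal value
byte() { printf "\\$(printf '%03o' "$1")"; }
# utf8_char (hypothetical): print code point $1 as 1 to 4 UTF-8 bytes
utf8_char() {
	local cp=$1
	if (( cp < 0x80 )); then
		byte "$cp"                                    # plain ASCII
	elif (( cp < 0x800 )); then
		byte $(( 0xC0 | cp >> 6 )); byte $(( 0x80 | cp & 0x3F ))
	elif (( cp < 0x10000 )); then
		byte $(( 0xE0 | cp >> 12 )); byte $(( 0x80 | (cp >> 6) & 0x3F ))
		byte $(( 0x80 | cp & 0x3F ))
	else
		byte $(( 0xF0 | cp >> 18 )); byte $(( 0x80 | (cp >> 12) & 0x3F ))
		byte $(( 0x80 | (cp >> 6) & 0x3F )); byte $(( 0x80 | cp & 0x3F ))
	fi
}
```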

Posted: **Mon 16 May 2011, 15:40**

Dear Dougal,

I am not sure that it is bash 4.2 that causes this, but I am otherwise at a loss to explain why I get no results with echo -ne - just some blank lines. The moment I use printf, everything works as expected. What about your sed solution? I am interested.

With kind regards,
vovchik

Posted: **Mon 16 May 2011, 17:23**

Hmm... I notice that some programs use Lynx as a dependency to convert HTML to text. Would this be applicable?

Posted: **Mon 16 May 2011, 17:37**

Dear r1tz,

Lynx does the job, but we are after a minimal solution, i.e. bash with no dependencies or as few as possible, since it is far from certain that lynx would be on a user's machine, and requiring it would not be desirable in this particular context. But you are right that lynx does decode those HTML escapes.

With kind regards,
vovchik

Posted: **Mon 16 May 2011, 17:44**

sed -f "/path/html_numbers"
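The html_numbers file itself is not posted here, so its exact contents are unknown. A sed script for this job would presumably be a list of s commands, one per entity; the hypothetical generator below writes such a list for printable ASCII (32 to 126). It escapes / and \, which are special in a sed replacement, and emits the &#38; rule last so that text such as &#38;#65; is not decoded twice.

```shell
#!/bin/bash
# make_sed_script (hypothetical): print "s/&#NN;/C/g" lines for printable ASCII
make_sed_script() {
	local n c
	for (( n = 32; n <= 126; n++ )); do
		(( n == 38 )) && continue                 # "&" is handled last, below
		printf -v c "\\$(printf '%03o' "$n")"     # the character for code n
		case $c in
			'/' | '\') c="\\$c" ;;                # escape sed's delimiter and backslash
		esac
		printf 's/&#%d;/%s/g\n' "$n" "$c"
	done
	printf '%s\n' 's/&#38;/\&/g'                  # & means "whole match" in sed, so escape it
}
```

Usage: make_sed_script > html_numbers, then sed -f html_numbers htmltext.html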

Posted: **Mon 16 May 2011, 18:19**

akash_rawal
I tested your elegant solution, but it seems to work only in a terminal, not in the Pmusic GUI, which only supports UTF-8.

Sigmund

Posted: **Mon 16 May 2011, 21:39**

Dear Zigbert,

Of all the solutions posted, Dougal's sed version seems to be the fastest. I had no problems getting the text into a GTK GUI when I used printf (my little mod to akash's version). It is also pretty fast and has no external dependencies. Dougal's external sed file could be incorporated into your own script, obviating the need to read an external file and making it even faster!

With kind regards,
vovchik
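The idea of folding the sed file into the script itself can be sketched with the rules held in a shell variable and passed to sed -e, so no extra file is read. Only a handful of rules are shown; the names html_sed and decode_sed are hypothetical, and a full table would need one line per entity.

```shell
#!/bin/bash
# html_sed (hypothetical): a few sed rules kept inside the script itself
html_sed='s/&#32;/ /g
s/&#76;/L/g
s/&#107;/k/g
s/&#111;/o/g
s/&#38;/\&/g'
# decode_sed (hypothetical): read stdin, write the decoded text
decode_sed() { sed -e "$html_sed"; }
```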

Posted: **Tue 17 May 2011, 05:07**

My guess is that zigbert's intention is to use it to display lyrics from a website in a gtkdialog?
I never cease to be amazed by sed.

BTW, the reason I used a single call to httpd rather than sed was that I couldn't come up with a way to do it with a single call to sed (sed and httpd aren't nofork/noexec applets in busybox)... Busybox builds are a whole lot faster than bash when the prefer-applets and nofork/noexec options are enabled (most distros just disable them rather than fix the scripts they break; not that I blame them, it's always harder to read code than to write it). If you use bash rather than ash as /bin/sh, my script is 5x slower, probably more if you have the full httpd.

Posted: **Wed 18 May 2011, 19:39**

vovchik wrote: I am not sure that it is bash 4.2 that causes this, but I am otherwise at a loss to explain why I get no results with echo -ne - just some blank lines.

I doubt it is bash itself, or the internet would have a lot of noise about it... you could try \echo to see if the external echo works.
Was the new bash compiled with a different kernel? At some stage I started having problems with the builtin sleep not working for me in rc.shutdown... using \sleep fixed it, but I still don't know why it broke.
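A caveat on \echo, based on bash's documented behaviour rather than on anything in this thread: a leading backslash only suppresses alias expansion, so \echo still runs the builtin. To try the external echo, call it by path (for example /bin/echo) or through env, which only runs programs found on PATH. A small hedged check, with hypothetical helper names:

```shell
#!/bin/bash
# which_echo (hypothetical): show what "echo" resolves to in bash
which_echo() {
	type -t echo      # "builtin": \echo would bypass an alias, not the builtin
	type -P echo      # path of the external echo found on PATH, if any
}
# ext_echo (hypothetical): env runs the external echo, never the shell builtin
ext_echo() { env echo "$@"; }
```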
