a bash riddle - for you [solved]

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway


#1 Post by zigbert »

This is a 'hidden' text. Can you please help decode it with a bash script?
Lyricwiki.org chooses to return these values instead of clear text, and I think there must be an existing converter for this... thank you.


Sigmund
Attachments
htmltext.gz
the gz-extension is fake - just to fool this forum
(5.9 KiB) Downloaded 344 times
Last edited by zigbert on Sat 14 May 2011, 15:40, edited 1 time in total.

puppyluvr
Posts: 3470
Joined: Sun 06 Jan 2008, 23:14
Location: Chickasha Oklahoma

#2 Post by puppyluvr »

:D Hello,
It is ASCII... the first word is "look", but I'm too lazy to write a script... :oops:
Close the Windows, and open your eyes, to a whole new world
I am Lead Dog of the Puppy Linux Users Group on Facebook. Join us!

Puppy since 2.15CE...

8-bit
Posts: 3406
Joined: Wed 04 Apr 2007, 03:37
Location: Oregon

#3 Post by 8-bit »

If you want to read it without creating a script to decode it, just give the file an "htm" extension.
It will then open decoded in your browser.
It is poetry of sorts.
It lacks carriage returns/line feeds, though.

The first line is "Look into my eyes - You will see"
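Since the first line reads "Look into my eyes", the encoding can be checked by hand: each &#NNN; is a decimal character code. A minimal bash sketch for a single reference converts the number to three-digit octal with printf and lets a second printf interpret the resulting \nnn escape. The references in the loop are a constructed sample spelling "Look", not bytes copied from the attachment, and the helper name decode_one is hypothetical. Only codes 0 to 255 can be expressed this way.

```shell
#!/bin/bash
# Decode one decimal HTML numeric reference such as "&#76;" (codes 0-255 only).
decode_one() {
    local ref=$1 num
    num=${ref#"&#"}      # strip the leading "&#"
    num=${num%";"}       # strip the trailing ";"
    # printf '%03o' turns the decimal code into 3-digit octal (76 -> 114);
    # the outer printf then interprets "\114" as a single byte.
    printf "\\$(printf '%03o' "$num")"
}

# Constructed sample, not taken from the attachment:
for ref in '&#76;' '&#111;' '&#111;' '&#107;'; do
    decode_one "$ref"
done
echo    # the loop above prints: Look
```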

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#4 Post by jpeps »

8-bit wrote:If you want to read it without creating a script to decode it, just give it a "htm" extension to the filename.
It will then open decoded in your browser.
It is a poetry of sorts.
But it lacks carriage return/line feeds though.

The first line is "Look into my eyes - You will see"
It looks great in the links text browser.

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

an example in bash

#5 Post by vovchik »

Dear zigbert,

This does the trick but is incredibly slow. Wait for it to finish... and you will see the text:


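#!/bin/bash
# Needs grep, sed and perl; several processes start per reference, hence the slowness.
text=$(cat htmltext.html)
# grep -o lists every reference with 2 to 5 characters between "&#" and ";"
for i in $(echo "$text" | grep -o "&#..;\|&#...;\|&#....;\|&#.....;");
do
# hex references (&#x41;) become 0x41 for printf "%d"; decimal ones just lose "&#" and ";"
[[ "$(echo $i | grep "x")" != "" ]] && export j=$(printf "%d" "$(echo "$i" | sed -e "s/&#\(.*\);/0\1/")") || export j=$(echo "$i" | sed -e "s/&#\(.*\);/\1/")
# perl prints code point $j as UTF-8; sed swaps it in for the first occurrence of $i
# (a decoded "/" or "&" would confuse this sed expression)
text=$(echo "$text" | sed -e "s/$i/$(perl -CS -e 'print chr("$ENV{j}")')/");
done;
echo "$text"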
Here is a BaCon version that is MUCH faster (almost instant):
REM read the whole file, then mark each line break as "&#10;" so it survives the split
OPEN "htmltext.html" FOR READING AS MyFile
MyVar$ = ""
WHILE NOT(ENDFILE(MyFile)) DO
    READLN MyLine$ FROM MyFile
    MyVar$ = CONCAT$(MyVar$, MyLine$, NL$)
WEND
MyVar$ = REPLACE$(MyVar$, NL$, "&#10;")
CLOSE FILE MyFile
REM split on ";" so each array element holds one "&#NNN" reference
SPLIT MyVar$ BY ";" TO MyArray$ SIZE MyArraySize
FOR i = 0 TO MyArraySize - 1
    MyArray$[i] = REPLACE$(MyArray$[i], "&#", "")
    PRINT CHR$(VAL(MyArray$[i]));
NEXT i


With kind regards,
vovchik

PS. BBCode is messing up the BaCon code. In the line MyVar$ = REPLACE$(MyVar$, NL$, ...), the last argument should be: double quote, ampersand, hash, 10, semicolon, double quote.
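vovchik's bash loop is slow mainly because every reference starts several processes (echo, grep, sed, perl) and pipes the whole text through sed again. A minimal sketch of an alternative, assuming bash 3.2 or newer: it scans the text left to right with [[ =~ ]] and BASH_REMATCH and builds each character with printf -v, so no external process is started. Unlike the perl version it only decodes codes 1 to 127 and copies anything else through unchanged; html_decode is a hypothetical name.

```shell
#!/bin/bash
# Decode &#NNN; (decimal) and &#xHH; (hex) references without spawning any
# external process. Only codes 1-127 (ASCII) are decoded; anything else is
# copied through unchanged.
html_decode() {
    local rest=$1 out="" code hex ch len
    local re_dec='^([0-9]+);' re_hex='^[xX]([0-9A-Fa-f]+);'
    while [[ $rest == *'&#'* ]]; do
        out+=${rest%%'&#'*}              # copy the text before the next "&#"
        rest=${rest#*'&#'}               # drop everything up to and including "&#"
        if [[ $rest =~ $re_dec ]]; then
            code=$((10#${BASH_REMATCH[1]}))
        elif [[ $rest =~ $re_hex ]]; then
            code=$((16#${BASH_REMATCH[1]}))
        else
            out+='&#'                    # not a numeric reference: keep it literally
            continue
        fi
        len=${#BASH_REMATCH[0]}
        if (( code >= 1 && code <= 127 )); then
            printf -v hex '%02x' "$code"
            printf -v ch '%b' "\\x$hex"  # %b expands the \xHH escape
            out+=$ch
        else
            out+="&#${BASH_REMATCH[0]}"  # out of range: put the reference back
        fi
        rest=${rest:len}
    done
    printf '%s\n' "$out$rest"
}
```

For example, html_decode "$(cat htmltext.html)" should print the decoded text; line breaks that are not encoded pass through untouched.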

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#6 Post by zigbert »

Thank you guys

Vovchik
Could your BaCon code be compiled into a general html2txt converter, where MyFile = $1? For example: html2txt "/root/myfile.html"

An alternative might be a huge sed command, but that is probably slow as well.


Sigmund

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#7 Post by technosaurus »

httpd will do regular decoding, but it decodes the form %<hex>, not &#<decimal> (strangely, it does encode to that format, though?)... so something like this should do it.


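#!/bin/sh
# split the whole file on ";" so each word is one "&#NNN" reference
IFS=";"
for x in `cat htmltext` ; do
y=${x//\&#/}                # drop the "&#" (${var//...} needs bash or busybox ash)
a=${a}`printf '%%''%x' $y`  # append "%" plus the code in hex
done
httpd -d "$a"               # busybox httpd -d decodes %hex strings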
It misses the line breaks, since they aren't encoded... you'd have to go line by line to do it exactly right, but this is the gist of it, and it's pretty quick too.
It would need an outer loop with:
IFS="
"
(that is, the newline character)

something like:


#!/bin/sh
IFS="
"                                   # use the newline character as our word separator
for outer in `cat htmltext` ; do    # this splits the file up by line
	IFS=";"                         # use ";" as our word separator
	for x in $outer ; do            # this separates each character
		y=${x//\&#/}                # remove the "&#" from each string
		a=${a}`printf '%%''%x' $y`  # %x prints hex, the %% is for httpd
	done
	a=$a"
"                                   # add the newline character
done
httpd -d "$a"                       # httpd decodes strings in standard %hex format
Edit: added some comments to help people understand my borked-up coding style.
Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).
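For reference, URL decoding expects exactly two hex digits after each %. As a sketch of the same %hex idea without busybox httpd, assuming bash, the codes can be encoded with printf '%%%02x' and decoded by turning every % into \x and expanding the result with printf %b. The function names to_percent and percent_decode are hypothetical, and the decoding only makes sense for codes up to 255.

```shell
#!/bin/bash
# Encode decimal character codes as %XX, the form "httpd -d" expects.
to_percent() {                 # usage: to_percent 76 111 111 107
    local n hex out=""
    for n in "$@"; do
        printf -v hex '%%%02x' "$n"   # "%%" is a literal %, "%02x" gives two hex digits
        out+=$hex
    done
    printf '%s\n' "$out"
}

# Decode %XX without httpd: turn each % into \x and let printf %b expand it.
percent_decode() {             # usage: percent_decode '%4c%6f%6f%6b'
    local bs='\x'              # replacement kept in a variable to avoid quoting surprises
    local esc=${1//[%]/$bs}    # [%] matches a literal percent sign
    printf '%b\n' "$esc"
}

percent_decode "$(to_percent 76 111 111 107)"   # prints: Look
```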

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#8 Post by zigbert »

Thank you technosaurus for your solution. It works very well.

Dougal has made a simple sed function that is REALLY fast and without any new dependencies. This brings us closer to the next Pmusic.


Thank you
Sigmund

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

bash riddle

#9 Post by vovchik »

Dear zigbert,

Technosaurus did a very nice job, and it works. Here is a little mod that dispenses with the double loop.


#!/bin/bash

# read the file into a variable and replace each newline with &#10;
# (":a;N;$!ba" is the GNU sed idiom for joining all lines; "\&" is a literal &)
mytext=`cat htmltext.html | sed ':a;N;$!ba;s/\n/\&#10;/g'`
IFS=";"
# loop used to parse each character
for x in $mytext ; do
	# remove the "&#" from each string
	y=${x//\&#/}
	# %x prints hex; the %% is for httpd
	a=${a}`printf '%%''%x' $y`
done
# fix newline char in text and append trailing newline
a=`echo -e "$a" | sed 's/\%a/\n/g'`$'\012'
# output the result
httpd -d "$a"
I don't know why, but I get %a in place of newline, so I had to do the "a=`echo -e" business. I am certain that could also be fixed. I would very much like to see Dougal's terse and quick solution. Can you please post it here so we can learn something?

With thanks and kind regards,
vovchik
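A likely reason for the %a, not confirmed in the thread: printf '%x' writes decimal 10 (newline) as the single hex digit a, so the loop produces %a, whereas URL decoding expects two hex digits (%0a). Zero-padding with %02x, as in this sketch, should remove the need for the echo/sed fix-up, though it has not been checked against busybox httpd here. The codes in the loop are illustrative.

```shell
#!/bin/bash
# Decimal 10 (newline) printed as unpadded vs zero-padded hex.
printf '%%%x\n'   10    # one hex digit:  %a
printf '%%%02x\n' 10    # two hex digits: %0a

# The same kind of loop as in the script above, with padding added.
a=""
for y in 76 111 10; do
    printf -v hex '%%%02x' "$y"
    a+=$hex
done
echo "$a"               # %4c%6f%0a
```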

akash_rawal
Posts: 229
Joined: Wed 25 Aug 2010, 15:38
Location: ISM Dhanbad, Jharkhand, India

#10 Post by akash_rawal »

The following script is much faster than using httpd. I have used bash and nothing else.


#!/bin/sh
while read line; do
	echo "$line" |
	while read -d ";" char; do
		#Trim "&#" from beginning
		ascii="${char#&\#}"
		#ascii appears to be decimal, octal conversion necessary
		num=""
		while true; do
			digit="$(($ascii%8))"
			num="$digit$num"
			ascii="$(($ascii/8))"
			if test "$ascii" = "0"; then
				#Now write to terminal
				echo -ne "\\$num"
				break
			fi
		done
	done
	#Add non-coded newline
	echo
done < htmltext
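The hand-written base-8 loop can be replaced by printf's own %o conversion, which removes the innermost loop entirely. A sketch keeping akash_rawal's line-by-line structure, assuming bash (for printf -v, read -d and here-strings); decode_line is a hypothetical name, and like the original it only handles codes 0 to 255. The 10# prefix makes a reference with leading zeros, such as &#076;, count as decimal rather than octal.

```shell
#!/bin/bash
# Decode one line of &#NNN; references, using printf '%o' for the octal step.
decode_line() {
    local char oct
    while IFS= read -r -d ';' char; do
        char=${char#'&#'}                     # trim "&#" from the beginning
        printf -v oct '%03o' "$((10#$char))"  # decimal -> 3-digit octal, e.g. 108 -> 154
        printf "\\$oct"                       # printf expands the \nnn octal escape
    done <<< "$1"
    echo                                      # add the non-coded newline
}

# Usage on the attachment (file name as in the thread):
# while IFS= read -r line; do decode_line "$line"; done < htmltext
```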


vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

works beautifully

#11 Post by vovchik »

Dear akash,

Your version works beautifully. Since I use bash 4.2, I had to change the "echo -ne ..." in the innermost loop to a simple printf, as follows:


#!/bin/bash

while read line; do 
	echo "$line" | 
	while read -d ";" char; do 
		# trim "&#" from beginning 
		ascii="${char#&\#}" 
		# ascii appears to be decimal, octal conversion necessary 
		num="" 
		while true; do 
			digit="$(($ascii%8))" 
			num="$digit$num" 
			ascii="$(($ascii/8))" 
			if test "$ascii" = "0"; then 
				# now write to terminal 
				printf "\\$num"
				break 
			fi
		done
	done 
	# add non-coded newline 
	echo 
done < htmltext.html
Until I see Dougal's version, I think this is the way to go. Thanks.

With kind regards,
vovchik

Dougal
Posts: 2502
Joined: Wed 19 Oct 2005, 13:06
Location: Hell more grotesque than any medieval woodcut

Re: works beautifully

#12 Post by Dougal »

vovchik wrote:your version works beautifully. Since I use bash 4.2 I had to change the "echo -ne ..." in the inntermost loop to a simple print f,
Why, did they change the behaviour of echo? I can't see anything about it on the net.

In any case, HTML numbers go beyond ASCII...
What's the ugliest part of your body?
Some say your nose
Some say your toes
But I think it's your mind

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

echo/printf

#13 Post by vovchik »

Dear Dougal,

I am not sure that it is bash 4.2 that causes this, but I am otherwise at a loss to explain why I get no results with echo -ne - just some blank lines. The moment I use printf, everything works as expected. What about your sed solution? I am interested. :D

With kind regards,
vovchik
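A possible explanation, hedged since the thread never confirms it: the GNU Bash manual documents echo -e octal escapes only in the \0nnn form (with a leading zero), while printf's format string accepts a plain \nnn. akash_rawal's script emits escapes like \154, so bash's builtin echo may not decode them, whereas a shell whose echo also accepts bare \nnn would. If that is the cause, writing echo -ne "\\0$num" instead should also work in bash. The two commands below should print the same letter:

```shell
#!/bin/bash
# "l" is decimal 108, octal 154.
printf '\154\n'     # printf's format accepts \nnn octal escapes: prints l
echo -e '\0154'     # bash's echo -e documents octal as \0nnn: prints l
```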

r1tz
Posts: 162
Joined: Thu 09 Sep 2010, 05:19
Location: In #puppylinux (IRC)

#14 Post by r1tz »

Hmm... I notice that some programs use "Lynx" as a dependency to convert HTML to text. Would this be applicable?

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

lynx

#15 Post by vovchik »

Dear r1tz,

Lynx does the job, but we are after a solution that is minimal, i.e. bash without dependencies or as few of them as possible, since it is far from certain that lynx would be on a user's machine, and requiring it would not be desirable in this particular context. But you are right that lynx does decode those HTML escapes.

With kind regards,
vovchik

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway

#16 Post by zigbert »

sed -f "/path/html_numbers"
Attachments
html_numbers.gz
(787 Bytes) Downloaded 232 times
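Dougal's html_numbers file is only available as the attachment above, so its exact contents are not reproduced here. As a guess at the general shape of such a file, a sed script with one s command per reference can be generated in bash; make_sed_script is a hypothetical name, and this version only covers printable ASCII (32 to 126). The characters /, \ and & are escaped because they are special in a sed replacement.

```shell
#!/bin/bash
# Print a sed script with one substitution per printable ASCII code,
# e.g. "s/&#76;/L/g" (hypothetical helper, not Dougal's actual file).
make_sed_script() {
    local n hex ch
    for (( n = 32; n <= 126; n++ )); do
        printf -v hex '%02x' "$n"
        printf -v ch '%b' "\\x$hex"          # decimal code -> character
        case $ch in
            '/' | '\' | '&') ch="\\$ch" ;;   # special in a sed replacement
        esac
        printf 's/&#%d;/%s/g\n' "$n" "$ch"
    done
}

# Usage: make_sed_script > html_numbers.sed; sed -f html_numbers.sed htmltext
```

One caveat of this approach: once &#38; has become a literal &, a later command could decode text such as #76; that happened to follow it.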

zigbert
Posts: 6621
Joined: Wed 29 Mar 2006, 18:13
Location: Valåmoen, Norway
Contact:

#17 Post by zigbert »

akash_rawal
I tested your elegant solution, but it seems to work only in the terminal, not in the Pmusic GUI, which only supports UTF-8.


Sigmund
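One way to get UTF-8 output, offered as an untested suggestion for Pmusic rather than a confirmed fix: the octal \nnn approach in akash_rawal's script writes a single byte per reference, which is Latin-1 rather than UTF-8 for codes 128 to 255 and breaks down entirely above 255 (Dougal's point that HTML numbers go beyond ASCII, e.g. &#8217;). The sketch below encodes a code point into its UTF-8 bytes with shell arithmetic and prints them as \xHH escapes. Bash 4.2 also accepts printf '\uXXXX', but its output depends on the current locale, which is why this version does the byte arithmetic itself; utf8_char is a hypothetical name.

```shell
#!/bin/bash
# Print the UTF-8 bytes for one Unicode code point given in decimal.
utf8_char() {
    local cp=$1 fmt
    if (( cp < 0x80 )); then              # 1 byte:  0xxxxxxx
        printf -v fmt '\\x%02x' "$cp"
    elif (( cp < 0x800 )); then           # 2 bytes: 110xxxxx 10xxxxxx
        printf -v fmt '\\x%02x\\x%02x' \
            $(( 0xC0 | cp >> 6 )) $(( 0x80 | (cp & 0x3F) ))
    elif (( cp < 0x10000 )); then         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf -v fmt '\\x%02x\\x%02x\\x%02x' \
            $(( 0xE0 | cp >> 12 )) $(( 0x80 | (cp >> 6 & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else                                  # 4 bytes: 11110xxx then three 10xxxxxx
        printf -v fmt '\\x%02x\\x%02x\\x%02x\\x%02x' \
            $(( 0xF0 | cp >> 18 )) $(( 0x80 | (cp >> 12 & 0x3F) )) \
            $(( 0x80 | (cp >> 6 & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
    printf "$fmt"    # fmt contains only \xHH escapes, so it is safe as a format
}
```

For example, utf8_char 8217 should print the three bytes E2 80 99, a right single quotation mark.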

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

html decoding

#18 Post by vovchik »

Dear Zigbert,

Of all the solutions posted, Dougal's sed version seems to be the fastest. I had no problems getting the text into a GTK GUI when I used printf (my little mod to akash's version). It is also pretty fast and has no external dependencies. Dougal's external sed file could be incorporated into your own script, obviating the need to read an external file and making it even faster!

With kind regards,
vovchik

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#19 Post by technosaurus »

My guess is that zigbert's intention is to use it to display lyrics from a website in a gtkdialog?
I never cease to be amazed by sed.

BTW, the reason I used a single call to httpd rather than sed is that I couldn't come up with a way to do it with a single call to sed (sed and httpd aren't nofork/noexec applets in busybox). Busybox builds are a whole lot faster than bash with the prefer-applets and nofork/noexec options enabled (most distros just disable those options rather than fix the scripts they break; not that I blame them, it's always harder to read code than to write it). If you use bash rather than ash as /bin/sh, my script is 5x slower, probably more if you have the full httpd.

Dougal
Posts: 2502
Joined: Wed 19 Oct 2005, 13:06
Location: Hell more grotesque than any medieval woodcut

Re: echo/printf

#20 Post by Dougal »

vovchik wrote:I am not sure that it is bash 4.2 that causes this, but I am otherwise at a loss to explain why I get no results with echo -ne - just some blank lines.
I doubt it is bash itself, or the internet would have a lot of noise about it... you could try \echo to see if the external echo works.
Was the new bash compiled with a different kernel? At some stage I started having problems with the builtin sleep not working for me in rc.shutdown... using \sleep fixed it, but I still don't know why it broke.
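A side note on the \echo suggestion, hedged: in bash, a leading backslash only suppresses alias expansion, so \echo still runs the builtin echo rather than the external binary. To try the external echo, running it via env (or by its full path) should work, assuming an echo binary is on the PATH:

```shell
#!/bin/bash
type -a echo                   # lists the builtin first, then any echo binaries on PATH
type -t \echo                  # prints "builtin": the backslash only blocks aliases
env echo "external echo"       # env cannot see shell builtins, so this runs the binary
```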
