sed scratch pad -- A thread of sed examples

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

sed scratch pad -- A thread of sed examples

#1 Post by s243a »

I find sed very difficult to grasp. This thread is meant to demonstrate how to do things in sed for which one might not be able to find examples online.

My first post is an example of how to call an external command from sed. Here is my example:

Code: Select all

echo a | sed -ne 's/\(.*\)/echo a\1/' -e 'e' -e 'p'
or alternatively:

Code: Select all

echo a | sed -ne '
# Replace "a" with "echo aa"
s/\(.*\)/echo a\1/
# Execute the pattern space as a shell command
e
# Print the result
p
'
The -n option is needed to keep sed from auto-printing. Otherwise sed would print each line that it reads.

* The 's' denotes string substitution.
* The brackets "\(...\)" capture the text matched by the regular expression inside them. In our case the regular expression is .*, which matches any string (here 'a'). The match can be retrieved with the back reference "\1". The backslash in front of each bracket isn't needed if you use extended regular expressions (-r or -E). However, with extended regular expressions, characters such as ( ) + ? | { } then have to be escaped when they are meant literally.
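To make the difference between the two regex flavours concrete, here is a small sketch. It assumes a sed that accepts -E for extended regular expressions (GNU sed does; I have not checked every busybox version), and the sample strings are made up.

```shell
# Basic regular expressions: the capture group needs backslashes.
bre=$(echo 'a' | sed 's/\(.*\)/[\1]/')

# Extended regular expressions (-E): plain parentheses capture.
ere=$(echo 'a' | sed -E 's/(.*)/[\1]/')

# In extended mode a literal parenthesis has to be escaped instead.
lit=$(echo 'f(x)' | sed -E 's/\(x\)/(y)/')

printf '%s\n' "$bre" "$ere" "$lit"
```

The first two commands should give the same result; only the amount of escaping differs.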

Next we execute the external command, which is the result of our substitution. In our case that is echo aa. The "e" command means: execute the contents of the pattern space as a shell command and replace the pattern space with its output.

Finally we print the result with the 'p' command.

The output is "aa".
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

Re: sed scratch pad -- A thread of sed examples

#2 Post by MochiMoppel »

s243a wrote:My first post is an example on how to call an external function in sed. Here is my example:

Code: Select all

echo a | sed -ne 's/\(.*\)/echo a\1/' -e 'e' -e 'p'
Alternatively

Code: Select all

echo a | sed -nr 's/(.*)/echo a\1/ep'
or even shorter

Code: Select all

echo a | sed 's/.*/echo a&/e'
Beware that the e command is a GNU extension and most likely will only work with GNU sed. Does not work with busybox sed.
In my experience calling shell commands from within sed is very slow. Should probably be used only when no other alternatives exist.
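For this particular example no shell call is needed at all, since the substitution can build the final text by itself. A minimal sketch that should behave the same in GNU sed and busybox sed:

```shell
# Prepend "a" to whatever was read; '&' stands for the whole match.
out=$(echo a | sed 's/.*/a&/')
printf '%s\n' "$out"
```

This avoids starting a shell for every input line, which is where the slowness of "e" comes from.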

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

Re: sed scratch pad -- A thread of sed examples

#3 Post by s243a »

MochiMoppel wrote:
Beware that the e command is a GNU extension and most likely will only work with GNU sed. Does not work with busybox sed.
In my experience calling shell commands from within sed is very slow. Should probably be used only when no other alternatives exist.
Thanks for the tips and warnings. I want to hone my sed skills because sed is used a lot. This means learning both standard sed and the extensions.

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#4 Post by s243a »

I'm borrowing the next one, which simply numbers the lines of a file:
sed '/./=' test | sed '/./N; s/\n/ /'
http://tuxthink.blogspot.com/2012/01/ad ... -file.html

I spent a fair bit of time trying to google better ways of doing this, and while it can probably be done without a pipe, the code to do so is probably more complex. The tricky thing about this problem is that the "=" command prints the line number but inserts a new line character after it.

In the above example the "/./" means "match any non-empty line". Said match (pun unintended) wouldn't be necessary if we wanted to match every line.

The syntax is pattern (e.g. /./) followed by command (e.g. "="). When the pattern matches, the command is executed (i.e. the line number is printed, followed by a new line character).
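To see the effect of the pattern in isolation, here is a small sketch with a made-up three-line input whose second line is empty. The -n option suppresses the lines themselves, so only the numbers printed by "=" remain.

```shell
# Sample input with an empty second line (made up for illustration).
input=$(printf 'Hi\n\nYou.')

# '/./=' prints the line number only for lines that contain a character.
nonempty=$(printf '%s\n' "$input" | sed -n '/./=')

# A bare '=' has no address, so it runs on every line.
all=$(printf '%s\n' "$input" | sed -n '=')

printf 'non-empty lines: %s\n' "$(echo $nonempty)"
printf 'all lines: %s\n' "$(echo $all)"
```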

The input file (i.e. test) is:
Hi
How are
You.
The first sed command in the pipe outputs:
1
Hi
2
How are
3
You.
The second sed command reads two lines and then replaces the new line character between them with a space. The second line is read with the "N" command, which appends the next line of input to the pattern space. When sed prints, it automatically inserts a new line character at the end of the output, unless the "-z" option is used, in which case the null character (i.e. $'\0') is used instead of the new line character.

Many of the sed man pages don't mention that you can use the null character as the line separator. One way to perhaps do this in a single sed script is to use the "-z" option, but then there will be hidden null characters in the output.

As a final note, the fact that we need two sed commands to do this means that some other utility is probably preferable for this application. However, there may be times where one has good reason to pipe sed to sed, in which case this example might be a good starting point.
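As a sketch of the "other utility" route, awk can do the same job in one process. This assumes the goal is the same as with the sed-to-sed pipe: prefix each non-empty line with its line number and pass empty lines through unchanged. The sample input is made up.

```shell
input=$(printf 'Hi\n\nHow are\nYou.')

# sed piped to sed, as in the example from this post.
with_sed=$(printf '%s\n' "$input" | sed '/./=' | sed '/./N; s/\n/ /')

# The same result with awk: NR is the current line number.
with_awk=$(printf '%s\n' "$input" | awk '/./ { print NR " " $0; next } { print }')

printf '%s\n' "$with_awk"
```

Both variables should hold the same text, which makes it easy to compare the two approaches on your own files.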
Last edited by s243a on Mon 30 Dec 2019, 21:31, edited 2 times in total.

rockedge
Posts: 1864
Joined: Wed 11 Apr 2012, 13:32
Location: Connecticut, United States
Contact:

#5 Post by rockedge »

thanks guys for the sed tips....I'm beginning to use it more often

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#6 Post by MochiMoppel »

s243a wrote:I'm borrowing the next one, which simply numbers the lines of a file
1) The code numbers only non-empty lines of a file. Intentionally?
2) The linked page shows the output having periods after the line numbers, which is not the output of this code
3) Not a mistake but still bad: Naming a file 'test' can lead to nasty errors since test is also the name of a shell command.
s243a wrote:In the above example the "/./" means "match any line".
No, it means "match any line containing at least one character"
For matching any line the code could have used "/^/" or simply no match pattern at all:

Code: Select all

sed = filename | sed 'N;s/\n/ /'
s243a wrote:When sed prints it automatically inserts a new line character at the end of the output, unless the "-z" option is used, in which case the null character (i.e. $'\0') is used instead of the new line character.
???
It never adds a new line character at the end of the output and I doubt that the -z option would add a null character. Have you tried this?
s243a wrote:Many of the sed man pages don't mention that you can use the null character as the line separator. One way to perhaps do this in a single sed script is to use the "-z" option, but then there will be hidden null characters in the output.
I assume that one reason for not mentioning this option is the fact that it's relatively new. My GNU sed version 4.2.1 knows nothing about it. My understanding is that it treats null characters in the input like it would treat linefeeds without this option. It would treat "real" linefeeds as normal characters. Neither null characters nor linefeeds would be stripped or changed for the output, unless explicitly changed by the code.
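A quick way to check this understanding, assuming a GNU sed new enough to know -z (the sample bytes are made up):

```shell
# Two NUL-terminated records; the first one contains a linefeed.
# tr makes the NUL bytes visible as '|' afterwards.
out=$(printf 'a\nb\000c\000' | sed -z 's/\n/,/g' | tr '\000' '|')
printf '%s\n' "$out"
```

If the linefeed inside the first record is replaced and each record still ends in a NUL, then sed has treated the NUL as the line separator and the linefeed as an ordinary character.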

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#7 Post by s243a »

MochiMoppel wrote:
s243a wrote:In the above example the "/./" means "match any line".
No, it means "match any line containing at least one character"
Yes. I realized this after reading point "1" above. I suppose it is cleaner to not number empty lines.
For matching any line the code could have used "/^/" or simply no match pattern at all:
Agreed.

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#8 Post by s243a »

MochiMoppel wrote:
???
It never adds a new line character at the end of the output and I doubt that the -z option would add a null character. Have you tried this?
We'll look into how sed actually works here later, but for now consider the following:

Code: Select all

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -z p | tr '\0' '\n'; echo ""
a
a
b
b
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -z p; echo ""
aabb
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -nz p; echo ""
ab
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -nz p | tr '\0' '\n'; echo ""
a
b
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne 's/\(.*\)/c\1/;p' | tr '\0' '\n'; echo ""
ca
cb
I need some time to ponder this and part of pondering it is figuring out how to properly test it.

Note that I had to use the printf command because apparently in bash you can't store a null character in a variable (or even in a string?).

BTW on dpup buster64 we have "sed (GNU sed) 4.7"


Edit: so considering the above here is how we can do it in a single sed command:

Code: Select all

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p'
1a
2b
Of course there are hidden null characters here.

Code: Select all

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p' | tr '\0' '.'
1.a
.2.b

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#9 Post by MochiMoppel »

s243a wrote:Note that I had to use the printf function
:?:
Instead of
{ echo -n a; printf '\0'; echo -n b; }
try
echo -ne "a\x00b"

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#10 Post by s243a »

MochiMoppel wrote:
s243a wrote:Note that I had to use the printf function
:?:
Instead of
{ echo -n a; printf '\0'; echo -n b; }
try
echo -ne "a\x00b"
That also worked.

Code: Select all

# echo -ne "a\x00b" | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p' | tr '\0' '.'
1.a
.2.b
Thanks for the tip. :) Do you have any documentation on those kinds of codes with the echo command?

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#11 Post by s243a »

The next example, I will also borrow:

Code: Select all

sed -e '/./{H;$!d;}' -e 'x;/Administration/!d' thegeekstuff.txt
https://www.thegeekstuff.com/2009/12/un ... perations/

This example prints the paragraphs that contain the word Administration. The two ways to solve this problem that are apparent to me are as follows:
1. Either use the hold space or alternatively
2. Use Loops.

The above example uses approach #1. I will also try this in another post using approach #2.

The reason that we use the "hold space" here is that when sed reads the next line of input [1], as part of a new cycle, the previous line that is in pattern space is replaced by the line of text read in the next cycle (see execution cycle). The two ways around this -- as noted above -- are to either append the previous line to hold space before starting the next cycle, or alternatively use the "N" command to append the next line of text (as a new line), into pattern space.

So anyway, recall that /./ matches non-blank lines. If there is a match, we use the "H" command to append the line we just read from standard input (which is currently in pattern space) to hold space. After this you'll notice "$!d", which means: if we are not at the last line, delete the pattern space and start the next cycle. See "Relations between d, p, and !" at:
https://www.grymoire.com/Unix/Sed.html
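A tiny sketch of what "$!d" does on its own (the sample input is made up):

```shell
# '$' addresses the last line and '!' negates the address, so '$!d'
# deletes every line except the last one.
last=$(printf '1\n2\n3\n' | sed '$!d')

# Without the '!', only the last line is deleted.
rest=$(printf '1\n2\n3\n' | sed '$d')

printf 'kept by $!d: %s\n' "$last"
printf 'kept by $d: %s\n' "$(echo $rest)"
```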

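At first I wasn't sure of the point of doing this, since the next command (i.e. 'x') replaces the pattern space with the contents of the hold space, which would effectively discard the previous pattern space anyway. The point is that "d" also ends the current cycle, so for every non-blank line except the last one the second expression is never reached and the paragraph keeps accumulating in hold space. Only at a blank line, or at the last line of input, does 'x' swap the collected paragraph into pattern space. The final action in the script is:

Code: Select all

/Administration/!d
which means "delete the paragraph if it doesn't contain the word Administration".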
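Here is the command run on a small made-up sample instead of thegeekstuff.txt, as a sketch for experimenting. Only the sample text is an assumption; the sed expressions are the ones from the example.

```shell
# Made-up sample: three paragraphs separated by empty lines.
sample=$(printf 'Linux\nAdministration\n\nDatabase\nMySQL\n\nWindows\nAdministration tips')

# Collect each paragraph in hold space; at an empty line (or at the
# last line) swap it into pattern space and drop it unless it matches.
out=$(printf '%s\n' "$sample" | sed -e '/./{H;$!d;}' -e 'x;/Administration/!d')
printf '%s\n' "$out"
```

Each printed paragraph starts with an empty line, because "H" always puts a new line character in front of the line it appends.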

Notes
---------------------
1. We call it the "next line of input" but the lines can be separated either by a new line character, or in the case of the "-z" option a null character. The -z option is only available in newer versions of sed.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#12 Post by MochiMoppel »

s243a wrote:Do you have any documentation on those kinds of codes with the echo command?
Not sure what you mean by "those kinds of codes". You'll find a good starting point right at your fingertips:

Code: Select all

help echo
I recommend staying away from octal codes and always using hex codes, since with hex the syntax in bash echo, busybox echo and bash printf is the same. And unless you know what you are doing you should avoid abbreviated codes like '\0'. You'll always be safe when you use 3 digits for octal and 2 digits for hex.
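As a small sketch of the "3 digits for octal" rule: three-digit octal escapes are part of the POSIX printf format, so the following should work even in shells whose printf does not know hex escapes (an assumption worth checking on your own system).

```shell
# Writes the bytes 'a', NUL, 'b', NUL; tr turns the NULs into
# linefeeds so that the result can be stored in a variable.
out=$(printf 'a\000b\000' | tr '\000' '\n')

# With exactly three digits there is no doubt where the code ends:
# octal 101 is the letter A.
letter=$(printf '\101')

printf '%s\n' "$out" "$letter"
```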

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#13 Post by MochiMoppel »

Looks like another abandoned thread :cry:

I'll give it a try anyway since I don't know where to ask.
The challenge is to remove all comments from an XML/HTML document, using only sed.

Example text:

Code: Select all

<JWM>
	<Tray  autohide="false" insert="right" x="0" y="-1" border="1" height="28" >
		<!-- Additional TrayButton attribute: label -->
		<TrayButton label="Menu" icon="logo-mini.png" border="true">root:3</TrayButton>
border="true">exec:urxvt</TrayButton>
		<Pager/>
		<!-- Additional TaskList attribute: maxwidth -->
		<TaskList maxwidth="200"/>
		<Dock/>
		<!-- Additional Swallow attribute: height -->
	<!--	<Swallow name="blinky">
			blinkydelayed -bg "#DCDAD5"
		</Swallow> -->
	<!--	<Swallow name="xtmix-launcher">
			xtmix -launch
		</Swallow> -->
	<!--	<Swallow name="asapm">
			asapmshell -u 4
		</Swallow> -->
	<!--	<Swallow name="freememapplet" width="34">
			freememappletshell
		</Swallow> -->
		<Swallow name="xload" width="32">
			xload -nolabel -bg "#888888" -fg red -hl white
		</Swallow>
		<Clock format="%H:%M">minixcal</Clock>
	</Tray>
</JWM>
The problem is that these comments can be multiline. My rough idea is to let sed move a line to the hold buffer when a '<!--' tag is detected, then continue to fill the hold buffer until a '-->' is detected, load the hold buffer into the pattern space and remove the comment, clear the hold buffer and continue with the next cycle. This may not be the right way, and I'm not even close to achieving the goal. Does anybody know how to do this?

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#14 Post by s243a »

MochiMoppel wrote:Looks like another abandoned thread :cry:

I'll give it a try anyway since I don't know where to ask.
The challenge is to remove all comments from a XML/HTML document, using only sed.

I have to go to work so something like:

Code: Select all

#If we don't yet have a terminating comment just append to the hold space and start the next cycle. 
/.*-->.*/!{
  H #Append pattern space to hold space
  d #Delete pattern space and start next cycle 
  }
#If we have a closing comment append data to hold space and copy the hold space to the pattern space to see if we can match both an opening and closing comment in pattern space. 
/.*-->.*/ { 
    H #Append new data to hold space  
    x #Exchange hold space with pattern space
    h #Copy pattern space to hold space
  }
#If this block matches the previous block has already been executed and this block will be executed next. 
/.*<!--.* -->.*./ { 
    s/<!--.* -->// #Delete comment
    p #Print pattern space
    s/.*//g #delete pattern space
    x #exchange pattern space with hold space
    d #delete pattern space and start next cycle.
  }
I might test this later. We'll see.

6502coder
Posts: 677
Joined: Mon 23 Mar 2009, 18:07
Location: Western United States

#15 Post by 6502coder »

Isn't this essentially the same as the problem of using sed to remove comments from a C program, for which Googling turns up a bunch of suggestions? I haven't looked into this carefully, just making an observation.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#16 Post by MochiMoppel »

Yes, essentially the same, but the suggestions I've seen so far are either not for sed, are crap or a combination of both.

Keef
Posts: 987
Joined: Thu 20 Dec 2007, 22:12
Location: Staffordshire

#17 Post by Keef »

Is this any good?
https://stackoverflow.com/questions/405 ... ing-regexp

Using your example, the output I get is:

Code: Select all

# cat file.html | sed -e :a -re 's/<!--.*?-->//g;/<!--/N;//ba'
<JWM>
   <Tray  autohide="false" insert="right" x="0" y="-1" border="1" height="28" >
      
      <TrayButton label="Menu" icon="logo-mini.png" border="true">root:3</TrayButton>
border="true">exec:urxvt</TrayButton>
      <Pager/>
      
      <TaskList maxwidth="200"/>
      <Dock/>
      
   
   
   
   
      <Swallow name="xload" width="32">
         xload -nolabel -bg "#888888" -fg red -hl white
      </Swallow>
      <Clock format="%H:%M">minixcal</Clock>
   </Tray>
</JWM>
# 

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

html minifier in sed

#18 Post by sc0ttman »

I'd love to get this working:

An HTML minifier...

This thing nearly does the job, except that it minifies stuff inside <pre> tags...

I would love love love to fix that!!

Code: Select all

function minify_html {
  # temp fix to IFS, just in case the html file names contain spaces
  OLD_IFS=$IFS
  IFS="
"
  for html_file in $html_files
  do
    :
    # dont minify HTML until we can skip contents of <pre>..</pre>
    #sed ':a;N;$!ba;/<div class="highlight"><pre>\.*<\/pre><\/div>/! s@>\s*<@><@g' $html_file > ${html_file//.html/.minhtml}
    #mv ${html_file//.html/.minhtml} ${html_file}
  done
  IFS=$OLD_IFS
}
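One possible direction for skipping <pre> blocks, as a rough sketch rather than a finished minifier: it works line by line, so it does not join lines, and it assumes that <pre> and </pre> never share a line with markup that should be minified. The sample page is made up.

```shell
# Made-up sample page; the <pre> block must keep its spacing.
page=$(printf '<div>\n  <p>hi</p>   <p>there</p>\n  <pre>\n  keep   this\n  </pre>\n</div>')

# Lines inside <pre>...</pre> branch past the remaining commands ('b'
# with no label jumps to the end of the script); all other lines lose
# their leading whitespace and the whitespace between tags.
min=$(printf '%s\n' "$page" |
  sed -e '/<pre>/,/<\/pre>/b' \
      -e 's/^[[:space:]]*//' \
      -e 's/>[[:space:]]*</></g')
printf '%s\n' "$min"
```

The range address is what protects the block: every line from the one containing <pre> up to the one containing </pre> is printed untouched.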
[b][url=https://bit.ly/2KjtxoD]Pkg[/url], [url=https://bit.ly/2U6dzxV]mdsh[/url], [url=https://bit.ly/2G49OE8]Woofy[/url], [url=http://goo.gl/bzBU1]Akita[/url], [url=http://goo.gl/SO5ug]VLC-GTK[/url], [url=https://tiny.cc/c2hnfz]Search[/url][/b]

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48
Contact:

#19 Post by s243a »

Keef wrote:Is this any good?
https://stackoverflow.com/questions/405 ... ing-regexp

I get the same output with:

Code: Select all

#Match the last line
$,/.*/ {    
	H #Append new data to hold space 
    x #Exchange hold space with pattern space
    s/<!--.*-->//g #Delete comment
    p #Print pattern space
  }    
#If we don't yet have a terminating comment just append to the hold space and start the next cycle.
/.*-->.*/! {
  H #Append pattern space to hold space
  d #Delete pattern space and start next cycle
  }
#If we have a closing comment append data to hold space and copy the hold space to the pattern space to see if we can match both an opening and closing comment in pattern space.
/.*-->.*/ {
    H #Append new data to hold space 
    x #Exchange hold space with pattern space
    h #Copy pattern space to hold space
  }
#If this block matches the previous block has already been executed and this block will be executed next.
/.*<!--.*-->.*/ {
    s/<!--.*-->//g #Delete comment
    p #Print pattern space
    s/.*//g #delete pattern space
    x #exchange pattern space with hold space
    d #delete pattern space and start next cycle.
  }
Test program:
https://pastebin.com/tNttFjyT

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#20 Post by jamesbond »

MochiMoppel wrote:The challenge is to remove all comments from a XML/HTML document, using only sed.
Challenge accepted.

This removes the comments and cleans up stray newlines.

Code: Select all

sed -n 'H;x;s/<!--.*-->//;x;${x;s/\n//;s/\n[ \n\t]*\n/\n/g;p}' test.html
If you only want to remove the comments and don't worry about how it looks, this will do.

Code: Select all

sed -n 'H;x;s/<!--.*-->//;x;${x;p}' test.html
Confirmed to work with gnu sed and busybox sed.
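For anyone who wants to follow what the first command does, here it is on a small made-up document, as a sketch. Only the sample text is an assumption.

```shell
# Made-up sample with a one-line and a two-line comment.
doc=$(printf '<a>\n<!-- one -->\n<b/>\n<!-- two\nlines -->\n</a>')

# Every line is appended to hold space (H); the swap (x) lets the
# substitution work on everything collected so far, so a comment is
# removed as soon as its closing "-->" has arrived. On the last line
# ($) the result is cleaned of empty lines and printed.
clean=$(printf '%s\n' "$doc" |
  sed -n 'H;x;s/<!--.*-->//;x;${x;s/\n//;s/\n[ \n\t]*\n/\n/g;p}')
printf '%s\n' "$clean"
```

One caveat, as far as I can tell: ".*" is greedy, so if a single line contains two complete comments, the text between them would be removed as well.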
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]
