Thank you all for the different approaches you have suggested; I am using all of them in different ways. I am now trying to extend the scripts to work on the original SMS backup files rather than the partially extracted ones that I collated manually. The image below shows the format of the original raw "backup" of a full text message. When I run the various scripts you have given me against one of these files they behave differently, apparently because a character at the end of the line is disrupting the deletion of the "\nTEXT:" field. (I'm not explaining this very well...)
EDIT: Maybe the \n that works when I search the data in Geany is actually a different character in the original file, perhaps a CR, or a CR LF pair?
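A minimal sketch of how the line endings could be checked from the shell. It assumes the raw files use CR LF (DOS-style) endings, which is only a guess; the sample file and its contents are made up for illustration and are not one of the real backups:

```shell
# Build a tiny made-up sample with CR LF line endings (hypothetical data).
sample=$(mktemp)
printf 'Date:2012-01-01 10:15:00\r\nTEXT:hello world\r\n' > "$sample"

# od -c prints each byte; a carriage return shows up as \r before the \n.
od -c "$sample"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
cr_lines=$(grep -c "${cr}\$" "$sample")
echo "lines ending in CR: $cr_lines"

rm -f "$sample"
```

If the count is non-zero for a real backup file, the character in front of each line feed is a CR, and a search for "\nTEXT:" alone would leave it behind.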
Anyway, I did some tests to see whether, instead of searching for the string "\nTEXT:", I could just search for "TEXT:" and replace it with as many backspaces as I needed. What I tried (manually) was to use Geany's replace function to replace "TEXT:" with two backspaces. I tried the regular expression \\ in the hope that it represented two backspaces, but of course it doesn't; it represents a backslash. So how do I create two backspaces?
After a bit of googling I tried to replace TEXT: with \u0008 or \b but that gives me the strange character you can see in the picture. So is there a way to form a regular expression that Geany understands, that will allow me to apply multiple backspaces sequentially (followed by one or two tabs)?
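For what it's worth, a backspace in replacement text is just another byte (0x08) written into the file, not an instruction to delete whatever precedes it, which would explain the odd character. A small sketch from the shell, using a made-up string rather than real data:

```shell
# Write a string containing a literal backspace and dump its bytes.
# od -c shows the backspace as \b sitting between the two words:
# nothing before it has been deleted.
out=$(printf 'TIME\bTEXT' | od -An -c)
echo "$out"
```

So to pull the text up onto the previous line, the line break itself has to be part of what gets matched and removed.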
EDIT 2: Maybe I need to be searching for \RTEXT: instead, as \R apparently matches any type of line break, whereas \n is more limited.
I am currently trying 6502coder's method, using the dt.awk file together with the following script to process a whole directory of files containing raw SMS data like that shown in the image. I just need to fine-tune the deletion of the TEXT: label and add an extra backspace to bring that line up onto the previous line.
(The script is working for me, but it leaves the text data on the line below the date/time.)
Code:
# Run dt.awk over every backup file, appending the results to data2.txt.
for i in /root/sms/*
do
    awk -f dt.awk "$i" >> data2.txt
done
dt.awk is:
Code:
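# Join each Date: line with the TEXT: line that follows it.
{
    if (substr($1, 1, 5) == "Date:") {
        # Print the date and time, tab-separated, with no newline yet.
        printf("%s\t%s", substr($1, 6), $2)
    } else if (substr($1, 1, 5) == "TEXT:") {
        # Print the message body, minus the "TEXT:" label, and end the line.
        printf("%s\n", substr($0, 6))
    }
}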
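If the stray character does turn out to be a carriage return, one possible variation is to strip it at the top of the awk program before anything is printed. This is only a sketch under that assumption: the sample data is made up, and the extra tab after the time is my own addition to separate it from the message text.

```shell
# Made-up sample in the assumed raw format, with CR LF line endings.
sample=$(mktemp)
printf 'Date:2012-01-01 10:15:00\r\nTEXT:hello world\r\n' > "$sample"

result=$(awk '
{
    # Assumption: lines end in CR LF. Drop the trailing CR, if present.
    # Changing $0 makes awk re-split the fields, so $2 loses the CR too.
    sub(/\r$/, "")
    if (substr($1, 1, 5) == "Date:")
        printf("%s\t%s\t", substr($1, 6), $2)
    else if (substr($1, 1, 5) == "TEXT:")
        printf("%s\n", substr($0, 6))
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With the CR gone, the date, time and message should come out on a single tab-separated line. The same sub() line could be placed at the top of dt.awk itself and used with the existing loop.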