Change
Code: Select all
sed -n '/^Date/ N;s/Date://;s/[^0-9]*TEXT:/\t/p' /mnt/home/test/*.vmg > /mnt/home/test/result.txt
to
Code: Select all
sed -n '/^Date/ N;s/Date://;s/[\n\r]\+TEXT:/\t/p' /mnt/home/test/*.vmg > /mnt/home/test/result.txt
Code: Select all
sed -n '/^Date/ N;s/Date:\(.*\)\.2015 \(.*\)\r\nTEXT:\(.*\)\r/\1\t\2\t\3/p' /pathtosmsdir/*.vmg >yourextracts
Code: Select all
tb=$'\t'
cr=$'\r'
FILES=/path2smsdir/*.vmg
for f in $FILES; do
[ ! -f "$f" ] && continue #optionally
while IFS=": " read -r tkn rest || [ "$tkn" ]; do
rest="${rest/$cr/}"
case $tkn in
Date) first="${rest/.2015 /$tb}";;
TEXT) printf "%s\t\t%s\n" "$first" "$rest";;
esac
done < "$f" >>yourextracts
done
Code: Select all
tb=$'\t'
cr=$'\r'
FILES=/path2smsdir/*.vmg
for f in $FILES; do
[ ! -f "$f" ] && continue #optionally
while IFS=":" read -r tkn rest || [ "$tkn" ]; do
case $tkn in
Date) rest="${rest/$cr/}";first="${rest/.2015 /$tb}";;
TEXT) printf "%s\t\t%s\n" "$first" "${rest/$cr/}";;
esac
done < "$f" >>yourextracts
done
Code: Select all
rest="${rest/$cr/}"
@seaside
Wouldn't fdate=${fdate%% *} remove the time strings?

Yes, starting from the end of the string, remove all chars up to and including the furthermost space.
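
To make the answer above concrete, here is a small sketch of how the `%` and `%%` suffix-removal expansions behave in bash. The sample values are made up in the style of the Date: lines in this thread; the variable names other than fdate are only for illustration.

```shell
#!/bin/bash
# Hypothetical sample value, formatted like the Date: lines in the .vmg files
fdate="23.07.2015 19:32:59"

date_only=${fdate%% *}   # remove the longest suffix matching " *": everything from the first space on
time_only=${fdate##* }   # remove the longest prefix matching "* ": everything up to the last space

echo "$date_only"        # 23.07.2015
echo "$time_only"        # 19:32:59

# With more than one space the difference between % and %% becomes visible
s="one two three"
echo "${s% *}"           # shortest match removed: one two
echo "${s%% *}"          # longest match removed:  one
```

So with a single space in the date string, `%` and `%%` give the same result, and the time part is indeed dropped.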
seaside wrote:
This is probably not a good time to bring this up, but isn't this SMS data in a SQL database? Couldn't you pull whatever data you wished with sqlite3 queries?

Sorry for the delay, I'm busy babysitting my granddaughter for a couple of days and not getting enough computer time.
some1 wrote:
As MochiMoppel indicated:
Hitherto - (all?) code pieces are based on the assumption that
the input-data are NOT jumbled i.e:
We ASSUME that
1) a Date: -line is immediately followed by
2) a TEXT:-line - which is NOT a multiline-field.

My awk solutions (as well as some of the other solutions) do not require that the TEXT line immediately follows the Date line, although they DO assume that the Date line occurs before its corresponding TEXT line, and that Date-TEXT pairs aren't interwoven (i.e. once a Date line occurs, it is assumed that the matching TEXT line will occur before any other Date line occurs).
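
The ordering assumption discussed here could be checked per file with something like the sketch below. It tests the weaker condition (each Date line gets exactly one TEXT line before the next Date line), not strict adjacency. check_pairs is a hypothetical helper name, not one of the scripts posted in this thread, and the sample lines are made up.

```shell
#!/bin/bash
# check_pairs FILE: succeed if Date and TEXT lines strictly alternate,
# starting with Date. (Hypothetical helper, for illustration only.)
check_pairs() {
    local expect=Date line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            Date:*) [ "$expect" = Date ] || return 1; expect=TEXT ;;
            TEXT*)  [ "$expect" = TEXT ] || return 1; expect=Date ;;
        esac
    done < "$1"
    [ "$expect" = Date ]    # a trailing Date without its TEXT also fails
}

# Demo on a throwaway file with made-up content
tmp=$(mktemp)
printf 'Date:01.07.2015 10:00:00\r\nTEXT:hello\r\n' > "$tmp"
check_pairs "$tmp" && echo "paired"
printf 'Date:02.07.2015 11:00:00\r\n' >> "$tmp"
check_pairs "$tmp" || echo "unpaired"
rm -f "$tmp"
```

A continuation line of a multi-line message that happens to start with "TEXT" or "Date:" would confuse this check, so treat a failure as a hint to look at the file rather than as proof.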
Code: Select all
sed '/^BEGIN:VBODY/,/^END:VBODY/!d;//d' msgfile
some1 wrote:
Let greengeek run it on ALL files.
Then - based on the statistics - the questions about the reliability of
the structure would likely be answered

There are a couple of anomalies that I have seen which I may need to handle manually if they are too difficult to fix programmatically. One is the \, that you mentioned. The strange thing is that this odd way of presenting a comma only occurs in messages from some of the cellphones that text me, not all of them. Maybe it is caused by a difference in language selection or encoding in those phones? I will try to find an explanation for this later on. In the meantime I plan to use Geany to resolve these artifacts with a simple manual replace, after the main script has stripped the text from the messages.
Code: Select all
TEXT;encoding non-standardUTF8:Wow that seems like a lively thing to say :-) %^$%#@$
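
As an alternative to the manual replace in Geany, the `\,` artifact mentioned above could probably be cleaned up with sed once the text has been extracted. This is only a sketch: the sample line is invented, and it assumes that every `\,` in the output really stands for a plain comma.

```shell
#!/bin/bash
# Hypothetical extracted line containing the backslash-comma artifact
line='Yes\, see you at 5\, bring the keys'

# In the sed pattern, \\ matches one literal backslash
fixed=$(printf '%s\n' "$line" | sed 's/\\,/,/g')
echo "$fixed"

# The same substitution applied in place to an output file would look like:
#   sed -i 's/\\,/,/g' result.txt
```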
greengeek wrote:
Code: Select all
TEXT;encoding non-standardUTF8:Wow that seems like a lively thing to say :-) %^$%#@$

See 2nd example.
Code: Select all
for f in *.vmg;do
fmsg=$(<"$f")
fbody=${fmsg#*BEGIN:VBODY[[:cntrl:]]}
fbody=${fbody%[[:cntrl:]]END:VBODY*}
fdate=${fbody#*Date:}
fdate=${fdate%%[[:cntrl:]]*}
space=${fdate//?/ }
ftext=${fbody#*TEXT*:}
ftext=${ftext//$'\n'/$'\n'$space $'\t'}
echo -e "$fdate \t$ftext" >> result.txt
done
Code: Select all
$ cat dt3.awk
# This version handles multi-line TEXT sections
BEGIN { RS="\r\n" }
/^Date:/ { printf( "%s\t%s", substr($1,6), $2) }
/^TEXT:/ { text = sprintf( "\t%s\n", substr($0,6))
getline y
while (y != "END:VBODY")
{
text = text y "\n"
getline y
}
printf( "%s", text )
}
some1 wrote:
tb=$'\t'
cr=$'\r'
FILES=/path2smsdir/*.vmg
for f in $FILES; do
[ ! -f "$f" ] && continue #optionally
while IFS=":" read -r tkn rest || [ "$tkn" ]; do
case $tkn in
Date) rest="${rest/$cr/}";first="${rest/.2015 /$tb}";;
TEXT) printf "%s\t\t%s\n" "$first" "${rest/$cr/}";;
esac
done < "$f" >>yourextracts
done

This works well, but does not include the .2015 portion of the date (which is in line with my original request, but I now feel I should include the whole date for clarity).
some1 wrote:
Hitherto - (all?) code pieces are based on the assumption that
the input-data are NOT jumbled i.e:
We ASSUME that
1) a Date: -line is immediately followed by
2) a TEXT:-line - which is NOT a multiline-field.
Can we/you rely on that assumption, greengeek?

Based on what I have seen so far, yes, that is a fair assumption. However, I have seen a couple of weird texts which, although they did follow this pattern, were not quite the same as the others. Unfortunately I accidentally erased these texts and they have also been deleted from the sender's outbox (quite annoying), so I cannot get them re-sent for further testing. I will post any future oddities that I find.
MochiMoppel wrote:
#!/bin/bash
for f in /root/Message/*.vmg;do
fmsg=$(<"$f")
fbody=${fmsg#*BEGIN:VBODY[[:cntrl:]]}
fbody=${fbody%[[:cntrl:]]END:VBODY*}
fdate=${fbody#*Date:}
fdate=${fdate%%[[:cntrl:]]*}
space=${fdate//?/ }
ftext=${fbody#*TEXT*:}
ftext=${ftext//$'\n'/$'\n'$space $'\t'}
echo -e "$fdate \t$ftext" >> result.txt
done

This works perfectly. Thanks.
6502coder wrote:
$ cat dt3.awk
# This version handles multi-line TEXT sections
BEGIN { RS="\r\n" }
/^Date:/ { printf( "%s\t%s", substr($1,6), $2) }
/^TEXT:/ { text = sprintf( "\t%s\n", substr($0,6))
getline y
while (y != "END:VBODY")
{
text = text y "\n"
getline y
}
printf( "%s", text )
}

This works perfectly, although I have no multiline texts to run against it at the moment. I'm not sure how to generate some for testing. I will have to give this a go in my next lot of tests.
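
One way to get a multi-line message for testing might be to write a small file by hand with printf. The sketch below assumes that a multi-line text is stored as raw CRLF-terminated lines between TEXT: and END:VBODY, which is what dt3.awk expects but has not been confirmed against a real phone export. The file name and message contents are made up.

```shell
#!/bin/bash
# Hypothetical test file; the name and contents are invented for testing
testfile="${TMPDIR:-/tmp}/multiline_test_$$.vmg"

# printf reuses the format for every argument, so each line ends in CR LF
printf '%s\r\n' \
    'BEGIN:VBODY' \
    'Date:07.07.2015 12:00:00' \
    'TEXT:first line' \
    'second line' \
    'third line' \
    'END:VBODY' > "$testfile"

echo "wrote $testfile"
# Then, for example:  awk -f dt3.awk "$testfile"
# Remove the file afterwards with:  rm -f "$testfile"
```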
some1 wrote:
# if contents of SEEN = POSSIBLES = EXPECTED
# - we have no jumbling,no multilines-TEXT.

I ran the script against my Messages directory and came up with 3 identical files as follows (a cluster of 40 messages):
Code: Select all
/root/Message/20150702190004_SendersPhNumber.vmg
/root/Message/20150703142734_SendersPhNumber.vmg
/root/Message/20150703143608_SendersPhNumber.vmg
/root/Message/20150703145102_SendersPhNumber.vmg
/root/Message/20150703150341_SendersPhNumber.vmg
/root/Message/20150703151149_SendersPhNumber.vmg
/root/Message/20150703151302_SendersPhNumber.vmg
/root/Message/20150704033934_SendersPhNumber.vmg
/root/Message/20150704034204_SendersPhNumber.vmg
/root/Message/20150704034738_SendersPhNumber.vmg
/root/Message/20150704101034_SendersPhNumber.vmg
/root/Message/20150704102452_SendersPhNumber.vmg
/root/Message/20150704104226_SendersPhNumber.vmg
/root/Message/20150704105157_SendersPhNumber.vmg
/root/Message/20150704110739_SendersPhNumber.vmg
/root/Message/20150704111553_SendersPhNumber.vmg
/root/Message/20150704173318_SendersPhNumber.vmg
/root/Message/20150704173519_SendersPhNumber.vmg
/root/Message/20150704173618_SendersPhNumber.vmg
/root/Message/20150704173903_SendersPhNumber.vmg
/root/Message/20150704174047_SendersPhNumber.vmg
/root/Message/20150704174235_SendersPhNumber.vmg
/root/Message/20150704174715_SendersPhNumber.vmg
/root/Message/20150705113110_SendersPhNumber.vmg
/root/Message/20150705113658_SendersPhNumber.vmg
/root/Message/20150705191705_SendersPhNumber.vmg
/root/Message/20150705192812_SendersPhNumber.vmg
/root/Message/20150705193532_SendersPhNumber.vmg
/root/Message/20150705194401_SendersPhNumber.vmg
/root/Message/20150705195247_SendersPhNumber.vmg
/root/Message/20150705195554_SendersPhNumber.vmg
/root/Message/20150705195911_SendersPhNumber.vmg
/root/Message/20150705200221_SendersPhNumber.vmg
/root/Message/20150705200538_SendersPhNumber.vmg
/root/Message/20150706170304_SendersPhNumber.vmg
/root/Message/20150706170656_SendersPhNumber.vmg
/root/Message/20150706173741_SendersPhNumber.vmg
/root/Message/20150706175602_SendersPhNumber.vmg
/root/Message/20150706184930_SendersPhNumber.vmg
/root/Message/20150706222731_SendersPhNumber.vmg
I ran the script against my Messages directory and came up with 3 identical files as follows (a cluster of 40 messages):
(Therefore "no jumbling,no multilines-TEXT" I guess?)

Yes - if you don't see any ERRTYPE-files - and the 3 files mentioned above
technosaurus wrote:
Just thought I would mention that as of 1.25, Geany (released this month) has a checkbox to allow multiline regex or otherwise uses sed-style matching.

Thanks, I have also added your comment to the previous thread here for reference. Sounds like a handy new function.
Code: Select all
BEGIN:VMSG
VERSION:1.1
X-IRMC-STATUS:READ
X-IRMC-BOX:INBOX
BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8:倀愀甀氀 挀攀氀氀;;;;
FN;CHARSET=UTF-8:
TEL:+64212xxxxxx
END:VCARD
BEGIN:VENV
BEGIN:VBODY
Date:23.07.2015 19:32:59
TEXT;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:not sure on =E2=80=9Clexmark=E2=80=9D but the red one is with him and the ye=
llow one with me (for later sharing). Paul
END:VBODY
END:VENV
END:VMSG
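
The TEXT field in the sample above is quoted-printable: bytes are written as =XX and a trailing = marks a soft line break. A rough bash sketch for decoding such a field is shown below. decode_qp is a hypothetical helper, not part of any script posted in this thread, and it relies on bash's printf %b understanding \xHH escapes.

```shell
#!/bin/bash
# decode_qp: read quoted-printable text on stdin, write decoded text to stdout.
# (Hypothetical helper, for illustration only.)
decode_qp() {
    local data
    data=$(cat)                      # note: $(...) strips trailing newlines
    data=${data//$'\r'/}             # drop carriage returns from CRLF endings
    data=${data//$'=\n'/}            # join soft line breaks ("=" at end of line)
    data=${data//\\/\\\\}            # protect literal backslashes from %b
    printf '%b\n' "${data//=/\\x}"   # turn =E2 into \xE2 and let printf expand it
}

# The TEXT payload from the sample message, including its soft line break
sample=$'not sure on =E2=80=9Clexmark=E2=80=9D but the red one is with him and the ye=\r\nllow one with me (for later sharing). Paul'
decoded=$(printf '%s' "$sample" | decode_qp)
echo "$decoded"
```

This only handles the payload after the colon; stripping the TEXT;CHARSET=...;ENCODING=...: prefix is already covered by the `${fbody#*TEXT*:}` expansion used earlier in the thread.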