Chatterbox - STT / TTS / TTA project. Part 2
Yes, I think the next step involves extending the functionality. I do have some issues I want to look at more closely before I continue - for example I found that if I manually cleared the chatdump.txt file it stopped sphinx from doing any further updating to that file so I want to resolve why that is.
I also want to fine tune the code so that the script that reads the extracted command also clears the extracted_command.txt file (or "voice_prompt" or whatever we want to call it...) ready for the next extraction. (I think that step is pretty easy using sed)
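One way to empty the file between extractions is to truncate it in place rather than edit it. This is a minimal sketch, and it rests on a guess that is not confirmed anywhere in this thread: that clearing the file by hand in an editor replaces it with a new file, which might be why sphinx stops updating it. `clear_file` is a hypothetical helper name.

```shell
#!/bin/sh
# clear_file is a hypothetical helper: empty a file without deleting or
# replacing it, so the file keeps the same inode.
clear_file() {
    : > "$1"
}
```

It would be called as `clear_file /root/extracted_command.txt`. If the program writing the file was started with `>` rather than `>>`, it may carry on writing at its old offset after the truncation, so launching it with `>>` is probably the safer pairing.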
One of the things on my list is to flesh out the first few posts in each of these chatterbox threads so that the important info is viewable without searching too far.
I also want to find ways to improve the integrity of word / phrase recognition so that it is possible to offer this as a pet which is reliable enough to make it pretty easy for new users to set up their own preferred command set/function at boot time, even if it is only for a single function.
In terms of what to do next I am keen to keep a similar informal format as this simple project but do two main things:
1) Produce a number of vocab files that are tailored for more reliable word recognition and with a word set that is appropriate to various specific functions (eg: a post-boot/main menu command set and/or vocab list, a browsing-specific set, and maybe a dictation set. Probably also a FileManager set.)
2) I want to create scripts that allow a more interactive and multilevelled menu/protocol system eg: after boot I feel the computer should ask something like the following:
"Please choose between Music, Browsing, File Manager or Puppy menu" Once the user chooses their preference the main menu would hand control to the next menu and so on.
I have a few experiments in mind to test out what is possible so I hope to launch into those in the coming days. Feel free to offer your own suggestions or preferences. The more thoughts the better...
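The idea of one menu handing control to the next could be sketched as a small lookup. This is only an illustration: `next_menu` and the menu names (music, browsing, filemanager, puppymenu) are made up for the example, and the word "back" as a way to return to the main menu is an assumption.

```shell
#!/bin/sh
# next_menu is a hypothetical helper: given the current menu and the word
# that was heard, print the name of the menu that should take over.
next_menu() {
    menu=$1
    word=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$menu:$word" in
        main:music)    echo music ;;
        main:browsing) echo browsing ;;
        main:file*)    echo filemanager ;;
        main:puppy*)   echo puppymenu ;;
        *:back)        echo main ;;
        *)             echo "$menu" ;;   # unrecognised word: stay where we are
    esac
}
```

For example `next_menu main Music` prints `music`, and each menu name could then select its own vocab file.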
EDIT: decided to start part 4 here:
http://murga-linux.com/puppy/viewtopic.php?t=89360
- technosaurus
- Posts: 4853
- Joined: Mon 19 May 2008, 01:24
- Location: Blue Springs, MO
- Contact:
My sound doesn't work in Linux on 1 computer without a lot of manual setup and the other has a really crappy mic, but I think I can program it blind.
To make it a bit trekky, I will call my generic command "computer" so that it only "does stuff" when you begin your sentence with "Computer ..."
for the text2speech try one of these:
http://www.murga-linux.com/puppy/viewto ... 601#573601
Code: Select all
# Handle whatever was said after the keyword "computer"
computer(){
  case "$1" in
    open) shift; which "$1" >/dev/null && "$@" || text2speech "I can't find that program.";;
    disregard) exit;;
    *) text2speech "I can't handle the $* command yet.";;
  esac
}
# read splits each line into ROW (the "0000...:" ID), COMMAND (first word), ARGS (the rest)
pocketsphinx_continuous $SOMERANDOMOPTIONS | while read -r ROW COMMAND ARGS; do
  case "$ROW$COMMAND" in
    [0-9]*:computer) $COMMAND $ARGS;;
    [0-9]*:dictate) [ "$DICTATE" ] && DICTATE="" || DICTATE=true;;
    [0-9]*:*) [ "$DICTATE" ] && echo "$COMMAND $ARGS" >>"$HOME/dictations";;
  esac
done
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].
"Puppy empty trash"...
...makes me think of this scene from Family Guy --> http://www.youtube.com/watch?v=17K6izfGMn0
LOL.
Quote: what about puppy in place of computer...
the only problem i have with that is that "puppy" sounds like all sorts of other words ending with the long E sound, and that could lead to problems.
the star trek computer is a good model to follow...and it has the advantage of being a three syllable word ending in a short R (uncommon) versus a two syllable word ending with the long E (very common). i see why you would suggest it though ted dog.
there are star trek computer sounds here...
http://www.starbase51.co.uk/starbase51/wav/wav.asp
and technosaurus...we are using espeak, not text2speech. its already part of the package...
and a question....does the code you have there do this...
Code: Select all
#!/bin/bash
#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do
#allow time after boot:
sleep 5
#Ask the question:
espeak -f /root/Qplay.txt &
#Allow time for user to reply
sleep 7
#play a noise to indicate the user is finifhed recording
/usr/share/chatterbox/sounds/c811.wav
#Use sed to extract last 3 lines of chatdump.txt, pipe the result to awk which extracts the single word command
#and writes it to sed2awk_extract_command.txt
sed -e :a -e '$q;N;4,$D;ba' /root/chatdump.txt | awk '/^0000/ { print $2 }' > /root/extracted_command.txt
#use sed to extract the command word from the sed2awk_extract_command.txt file
#and call it the "command" variable
command=$(sed '$!d' /root/extracted_command.txt)
#Test if the command word equals the word we want to hear
#if [ $command=yes ]
if test "$command" = "computer"
then
condition_to_check="True"
#If there is a match then make a noise to confirm:
mplayer /usr/share/chatterbox/sounds/c810.wav &
# espeak -f /root/Music.txt &
# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
# run the menu program
# ARGUEMENT to run menu program MISSING HERE!!
else
condition_to_check="False"
# echo "Failed to process chat_command."
espeak -f /root/CommandFail.txt &
fi
I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
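The extraction step in the script above (last three lines of the dump, then the first word after the ID) can also be written with tail and awk. This is a sketch rather than a drop-in replacement: `last_command` is a hypothetical helper name, and it assumes recognised lines look like "000000001: music".

```shell
#!/bin/sh
# last_command is a hypothetical helper: print the first word of the most
# recent recognised line among the last three lines of a sphinx dump file.
# Assumes recognised lines look like "000000001: music" (an ID, then words).
last_command() {
    tail -n 3 "$1" | awk '/^0000/ { word = $2 } END { print word }'
}
```

It could be used as `command=$(last_command /root/chatdump.txt)`, which avoids the intermediate extracted_command.txt file altogether.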
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
- technosaurus
- Posts: 4853
- Joined: Mon 19 May 2008, 01:24
- Location: Blue Springs, MO
- Contact:
H4LF82 wrote: and technosaurus...we are using espeak, not text2speech. its already part of the package...
and a question....does the code you have there do this...
....
..i think we were doing the same thing at the same time and came up with 2 different ways to do it i was going to add a second script for the menu of options beyond the word computer...
I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
I meant for text2speech to be a shell function wrapper like the ones in my link. The ultralight espeak version I built only uses standard puppy libs (no portaudio, ...), so the wav output option can be sent to stdout and piped through aplay (I like the unix philosophy)
There are quite a few other examples in that post, reading html docs by stripping the tag, getting text from the clipboard (it gets filled every time you highlight something, so can be annoying unless you _need_ it) and a few more.
btw, I wonder if espeak's -f option would work like echo "my text" |espeak -f /dev/stdin
I'm sure my code is duplicated effort, but all the code I was seeing was becoming overly complex.
I bet it wouldn't be too difficult to use my .desktop file parsing code from jwm_tools (its in jwm_menu_create) to create a voice menu... and probably parse the PuppyPin or combine with wget to google stuff
technosaurus wrote: I meant for text2speech to be shell function wrapper like the ones in my link. The ultralight espeak version I built only uses standard puppy libs (no portaudio, ...), so the wav output option can be sent to stdout and piped through aplay (I like the unix philosophy)
...ooooh. i see now!...
technosaurus wrote: I wonder if espeak's -f option would work like echo "my text" |espeak -f /dev/stdin
...i dont know, but it says...
If neither -f nor --stdin, then <words> are spoken, or if none then text
is spoken from stdin, each line separately.
in the helpfile, so id think it would.
technosaurus wrote: I'm sure my code is duplicated effort, but all the code I was seeing was becoming overly complex.
im still having trouble following along. were all going in the same direction tho i think...
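Going by the helpfile text quoted in this post, a text2speech wrapper might not need -f at all, because espeak reads stdin when it is given no words. This is a minimal sketch under that assumption; `TTS_CMD` is a made-up override hook so the function can be tried on a machine without working sound.

```shell
#!/bin/sh
# text2speech is a hypothetical wrapper: speak its arguments by piping them
# to a speech command on stdin. TTS_CMD is an assumed override hook; it
# defaults to plain "espeak", which reads stdin when given no words.
text2speech() {
    printf '%s\n' "$*" | ${TTS_CMD:-espeak}
}
```

With `TTS_CMD=cat` the function simply prints the text, which is a quick way to check a script's logic before any audio is involved.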
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
technosaurus wrote: for the text2speech try one of these:
http://www.murga-linux.com/puppy/viewto ... 601#573601
Hi technosaurus, I have just tried "speak" and wondered if the problem I experienced is normal - I have a text file called /root/Qplay.txt and it contains the following sentence:
"Welcome to puppy. Please say the word Music if you want me to play music"
If I use the syntax:
Code: Select all
speak_files /root/Qplay.txt
However, if I use the following syntax:
Code: Select all
speak /root/Qplay.txt
I get an error message telling me to use the -w option "because the program was built without a sound interface"
If I then use the following syntax:
Code: Select all
speak -w /root/Qplay.txt
the txt file gets emptied and has no contents.
Is that what you would expect?
- technosaurus
- Posts: 4853
- Joined: Mon 19 May 2008, 01:24
- Location: Blue Springs, MO
- Contact:
greengeek wrote: However, if I use the following syntax:
Code: Select all
speak /root/Qplay.txt
I get an error message telling me to use the -w option "because the program was built without a sound interface"
If I then use the following syntax:
Code: Select all
speak -w /root/Qplay.txt
the txt file gets emptied and has no contents.
Is that what you would expect?
IIRC the -w flag indicates the name for the output wav file.
The reason speak_* work differently is that I wrote my own puppy helper scripts to use stdout as the output file and piped them through aplay. You can still use speak -w /root/Qplay.wav -f /root/Qplay.txt && aplay /root/Qplay.wav, but it will take unnecessary disk space and have additional delay compared to using stdout.
H4LF82 wrote:
Code: Select all
#!/bin/bash
#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do
Hi H4LF82, does this addition mean that the "TEST FOR KEYWORD" is now going on continuously, or have I misunderstood? (The one most important step I want to achieve at the moment is to get the keyword testing running continuously rather than just a single test/single action event)
H4LF82 wrote:
sleep 7
#play a noise to indicate the user is finifhed recording
/usr/share/chatterbox/sounds/c811.wav
"mplayer" has been inadvertently left off here right? Or are you using a different function somehow?
H4LF82 wrote:
# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
Nice touch. Have you been testing this script live or are you still in process of writing? I'm keen to know if the programmatic clearing of the file works correctly. When I manually clear the chatdump.txt I usually seem to get the outcome that sphinx stops writing to the file...
H4LF82 wrote: I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
We have a chatterbox directory in /usr/share? Oooooh, that sounds great! Almost like a REAL program now...
technosaurus wrote: IIRC the -w flag indicates the name for the output wav file.
Just a heads-up for anyone else using "speak" then - don't do what I did and launch into reading your text file with the 'speak -w' syntax - in my case this WROTE to the textfile and I lost the contents. A minor problem in this case, but a different matter if it was an eBook...
Use 'speak_files' to do the reading.
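A small guard can make that mistake harder to repeat. This is a sketch only: `speak_to_wav` is a hypothetical wrapper, `SPEAK_CMD` is a made-up override hook, and the only flags used are the -w and -f forms already shown in this thread.

```shell
#!/bin/sh
# speak_to_wav is a hypothetical wrapper around "speak": the text file is
# always passed with -f and the wav name with -w, so the text file can
# never end up as the -w target. SPEAK_CMD is an assumed override hook.
speak_to_wav() {
    txt=$1
    wav=${2:-$txt.wav}   # default output name: the text file name plus .wav
    if [ "$txt" = "$wav" ]; then
        echo "refusing to overwrite $txt" >&2
        return 1
    fi
    ${SPEAK_CMD:-speak} -w "$wav" -f "$txt"
}
```

Calling `speak_to_wav /root/Qplay.txt` would write /root/Qplay.txt.wav and leave the text file alone.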
technosaurus wrote: To make it a bit trekky, I will call my generic command "computer" so that it only "does stuff" when you begin your sentence with "Computer ..."
Code: Select all
# Handle whatever was said after the keyword "computer"
computer(){
  case "$1" in
    open) shift; which "$1" >/dev/null && "$@" || text2speech "I can't find that program.";;
    disregard) exit;;
    *) text2speech "I can't handle the $* command yet.";;
  esac
}
# read splits each line into ROW (the "0000...:" ID), COMMAND (first word), ARGS (the rest)
pocketsphinx_continuous $SOMERANDOMOPTIONS | while read -r ROW COMMAND ARGS; do
  case "$ROW$COMMAND" in
    [0-9]*:computer) $COMMAND $ARGS;;
    [0-9]*:dictate) [ "$DICTATE" ] && DICTATE="" || DICTATE=true;;
    [0-9]*:*) [ "$DICTATE" ] && echo "$COMMAND $ARGS" >>"$HOME/dictations";;
  esac
done
Hi technosaurus - are you able to explain to my untrained brain a bit about what this is doing please? Is this MONITORING for the output from sphinx, or is this about PROCESSING the previously detected output? or maybe both?
(I'm still struggling with trying to get continuous sampling of the sphinx output...)
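As far as I can tell the pipe does both jobs at once: the while loop waits on the output of the program on its left and handles each line as soon as it arrives. The sketch below shows the same shape with canned input, so it can be tried without a microphone. `fake_sphinx`, `handle` and `process_stream` are hypothetical names, and the "ID: words" line format is an assumption based on the output shown in this thread.

```shell
#!/bin/sh
# fake_sphinx stands in for pocketsphinx_continuous; the line format
# ("ID: words") is an assumption based on the output shown in this thread.
fake_sphinx() {
    printf '%s\n' '000000001: computer open editor' \
                  '000000002: hello there' \
                  '000000003: computer disregard'
}

# handle is a hypothetical stub that only reports what it would do.
handle() {
    echo "would run: $*"
}

# Read the stream one line at a time; each line is processed as it arrives.
process_stream() {
    while read -r ROW COMMAND ARGS; do
        case "$ROW$COMMAND" in
            [0-9]*:computer) handle $ARGS ;;
        esac
    done
}
```

Running `fake_sphinx | process_stream` reports the two "computer" lines and ignores the other one. Swapping fake_sphinx for the real recogniser should keep the loop running for as long as the recogniser does.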
greengeek wrote:
H4LF82 wrote:
Code:
#!/bin/bash
#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do
Hi H4LF82, does this addition mean that the "TEST FOR KEYWORD" is now going on continuously, or have I misunderstood? (The one most important step I want to achieve at the moment is to get the keyword testing running continuously rather than just a single test/single action event)
yes. its wrapped up in a loop checking for condition_to_check to equal true; provided i dont include any syntax errors ... :/
Quote:
sleep 7
#play a noise to indicate the user is finifhed recording
/usr/share/chatterbox/sounds/c811.wav
"mplayer" has been inadvertently left off here right? Or are you using a different function somehow?
not a different function. THIS IS MY PROBLEM. THIS is why I cannot program....its not that i cant program. i cannot see...so my code is chock full of errors and hangs up in the stupidest mistakes. im glad i gave you this code example now--it illustrates my point perfectly.
Quote:
# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
Nice touch. Have you been testing this script live or are you still in process of writing? I'm keen to know if the programmatic clearing of the file works correctly. When I manually clear the chatdump.txt I usually seem to get the outcome that sphinx stops writing to the file...
you SHOULD not have to manually clear it now....but again i add the caveat that i cannot see, and i can guarantee there are errors in my code. triple check my code....
im glad you like it tho
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
Quote:
I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
We have a chatterbox directory in /usr/share? Oooooh, that sounds great! Almost like a REAL program now... :)
...yeah, that was getting necessary. i have a script running somewhere in all of this that is filling my root folder with blank directories every hour...ive rebooted from the live cd and started a new savefile just for this project, and since it now has its own sfs, it might as well be structured correctly too.
/usr/share/chatterbox/ is now the directory for it, if there are no objections?
cheers!
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
- technosaurus
- Posts: 4853
- Joined: Mon 19 May 2008, 01:24
- Location: Blue Springs, MO
- Contact:
H4LF82 wrote: ...yeah, that was getting necessary. i have a script running somewhere in all of this that is filling my root folder with blank directories every hour...ive rebooted from the live cd and started a new savefile just for this project, and since it now has its own sfs, it might as well be structured correctly too.
If you take a look at my little example, it uses no disk space unless "dictate" is toggled and then it only writes to a single file in the user's $HOME directory. *nix OSs (including puppy linux) can operate on streams, so unless you are planning to use the output data from pocketsphinx_continuous for analysis to maybe patch the source there is really no need to use a temporary file(s).
With that being said, I realize pocketsphinx_continuous has a lot of superfluous options, but without a decent sound system it is difficult for me to separate the wheat from the chaff. If anyone cares to take note of what command line args and output strings are of limited value, I'd be willing to thresh them out of the source code. If we are always needing to set an arg to a certain value, I can hard code it, if an arg is never used I can remove it and if the output would be better in a different format, that can be done (for example using a time-since-epoch style integer time stamp instead of 0000000001: ....)
in shell that would be date +%s
or in C
#include <sys/time.h>
struct timeval tp;
gettimeofday(&tp, NULL);
long seconds = tp.tv_sec;
to convert them to a date string in shell
date -d @1382162295 <options_here>
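Until the source is patched, the same kind of timestamp could be added from the shell side of the pipe. This is a minimal sketch; `stamp_lines` is a hypothetical filter name and nothing here changes pocketsphinx itself.

```shell
#!/bin/sh
# stamp_lines is a hypothetical filter: prefix each line read from stdin
# with the current time in seconds since the epoch.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s: %s\n' "$(date +%s)" "$line"
    done
}
```

Any program's output can be piped through it, for example `echo "computer open" | stamp_lines`.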
H4LF82 wrote: /usr/share/chatterbox/ is now the directory for it, if there are no objections?
cheers!
I certainly have no objections. I am a little conscious that chatterbox may end up being a messy collection of poorly coded (yet hopefully functional) scripts that represent our attempts to achieve our various goals...
But then, if that happens, there is nothing to prevent a better coder improving things and maybe in the end chatterbox just becomes a testing ground that makes way for a more professional effort which could have a better name (VoiceBox maybe...). What do you think?
I'm kind of enjoying being able to throw my 'chatterbox' ideas into the ring and learning some basics of scripting but I don't want to be blamed for filling the puppy coffers with bad code
technosaurus wrote: so unless you are planning to use the output data from pocketsphinx_continuous for analysis to maybe patch the source there is really no need to use a temporary file(s).
That's excellent - I felt bad about using the temp file. Seemed a bit clunky. At least it helped me get to first base though...
technosaurus wrote: If you take a look at my little example, it uses no disk space unless "dictate" is toggled and then it only writes to a single file in the user's $HOME directory.
...oh, please dont misunderstand. I get it! im not complaining. i expect that this will end up as a Frankenstein of code and be as cringe-worthy as it gets to the trained eye...and i dont care. im as happy as a pig in filth if the code is sloppy and im prepared to create new sfs files a thousand times over if thats what it takes.
and im happy for the testing folder to contain a million empty directories. just not my root folder. that folder is cluttered enough and i have a terrible time navigating folders now as it is. buried in /usr/share/chatterbox is a good place for testing files IMHO...thats all i was saying.
forgive me if it sounded like i was whingeing!
and i agree that txt files are clunky. it was my suggestion, and i suggested it because it gives me a physical place to put the stdout without having to use a console where i can physically SEE it the moment it gets created. by all means, remove the text file and use the stdout ...someone with a console who trusts their eyes, please!
Cheers!
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson