PuppyOCR now with improved GUI interface
Another improved version of PuppyOCR, bumped to V1.21
Size: 2.2MB
Get the English only version here:
http://www.datafilehost.com/download-1d4a03af.html
edit: Get the German/English version here:
Size: 2.6MB
http://www.datafilehost.com/download-4f3b0de4.html
This is slightly larger than the English-only version, but the trade-off is OK since this version should cover both languages.
Improvements:
Load the document into the scanner and press the red SCAN button. The document is then scanned, OCR'd, and displayed in the Geany editor, all with one click.
Progress of job requests is properly displayed in the message window.
Scan and OCR files can be renamed using the user input boxes.
A file chooser dialogue can be accessed via the new FILE button.
Button tool tips are provided.
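For anyone curious what a one-click scan, OCR, and display chain looks like when run by hand, here is a minimal Python sketch. It is not PuppyOCR's actual code. It assumes the scan is done with SANE's scanimage, the OCR with the tesseract command line, and the result opened in geany; the function names, the default /root/a.tif and /root/a paths, and the language codes are assumptions chosen for illustration only.

```python
import subprocess


def build_pipeline(scan_path="/root/a.tif", text_base="/root/a", lang="eng"):
    """Return the three commands of a scan -> OCR -> edit chain.

    Assumed tools (not confirmed as PuppyOCR internals):
    scanimage (SANE), tesseract, geany.
    """
    return [
        # 1. Scan one page as TIFF; scanimage writes the image to stdout.
        ["scanimage", "--format=tiff"],
        # 2. OCR the TIFF; tesseract appends ".txt" to the output base name.
        ["tesseract", scan_path, text_base, "-l", lang],
        # 3. Open the recognised text in the Geany editor.
        ["geany", text_base + ".txt"],
    ]


def run_pipeline(scan_path="/root/a.tif", text_base="/root/a", lang="eng"):
    """Run the three steps in order, stopping at the first failure."""
    scan_cmd, ocr_cmd, edit_cmd = build_pipeline(scan_path, text_base, lang)
    with open(scan_path, "wb") as tif:
        subprocess.run(scan_cmd, stdout=tif, check=True)
    subprocess.run(ocr_cmd, check=True)
    subprocess.run(edit_cmd, check=True)
```

Calling run_pipeline("/mnt/home/a.tif", "/mnt/home/a", lang="deu") would attempt a German OCR run, provided the German language data is installed.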
edit:
If you have installed PuppyOCR on a Puppy that boots fully from a USB stick, you have no /root folder. In that case, load and save your input/output files as /mnt/home/a.* instead.
Last edited by tronkel on Tue 11 Oct 2011, 12:22, edited 1 time in total.
Life is too short to spend it in front of a computer
I downloaded version 1.21 and tried to scan in a page from some crochet instructions.
I then cropped the tif file to just the text.
What I got as a result was far from being OCR.
So the question is: is it limited to the fonts it recognizes, as well as to the type size of the fonts displayed in the tif file?
I was expecting at least some of the text in the tif file to be recognized.
Also, the English only version had all the languages listed in the desktop file.
Or am I missing a setup option to choose the language to use for the conversion to text?
Hi 8-bit
Thanks for your feedback.
The a.tif that you attached did not scan for me as it stood either - but the reason for that was quite simple.
This tif contains text character images that are too small for the Tesseract engine to recognise and compare.
To solve this, open your tif in, say, MTPAINT, go to the menu item Image->Scale, and increase the image width to something over, say, 1000 pixels. Let it scale the height automatically to preserve the aspect ratio. It won't look very good on-screen at this size, but that's OK. Save the new image under a different name and then OCR it using PuppyOCR. The result for me had a few artifacts and was therefore not 100% correct, probably because of the rather poorly rendered fonts in this specific example, but it nevertheless did OCR. The txt file is attached with a pet extension so that the forum will allow it to be uploaded; it is actually a txt file though.
Ideally, an image that is to be OCR'd for its text content should also consist of an unbroken block of text, with no randomly included graphics and no multiple columns. Graphics software such as MTPAINT or the GIMP is often needed to crop, format, or resize the image before it is OCR'd.
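If you would rather script the enlarging step than do it by hand in a paint program, something along these lines should work. This is only a sketch: it assumes Python with the third-party Pillow library is available, and the function names and the 1000-pixel default are illustrative choices of mine, not anything PuppyOCR provides.

```python
def scaled_size(width, height, min_width=1000):
    """Return (new_width, new_height) at least min_width wide, same aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width >= min_width:
        # Already wide enough: leave the image size alone.
        return width, height
    # Scale the height by the same factor as the width.
    new_height = round(height * min_width / width)
    return min_width, max(1, new_height)


def upscale_for_ocr(src, dst, min_width=1000):
    """Enlarge the image at src and save it under a different name, dst."""
    from PIL import Image  # Pillow (third-party), imported only when needed

    with Image.open(src) as img:
        new_size = scaled_size(img.width, img.height, min_width)
        img.resize(new_size, Image.LANCZOS).save(dst)
```

For example, upscale_for_ocr("a.tif", "a_big.tif") writes an enlarged copy that can then be given to PuppyOCR, leaving the original file untouched.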
Languages?
PuppyOCR at the moment is available for 2 languages, English and German - see higher up in this thread for that.
The Tesseract OCR engine (V2.04 in this case) can handle 5 main European languages provided that the relevant language databases are included. It would be possible to package these 5 as individual dotpets if required. I didn't think to include them all in the PuppyOCR pet itself because of size constraints for Puppy.
Tesseract can be trained to recognise any language in theory, over and above these main 5. This I have yet to play with.
The languages mentioned in the *.desktop file are completely irrelevant to your question, so you can forget about that.
Thanks again for the question.
Attachment: a1txt.pet (545 Bytes)
I will have to wait until I am back in Lucid 520, as I am currently in 01micko's Slacko and he did not compile mtPaint with tif support.
In Lucid 520, mtPaint has tif support.
But anyway, thank you for the suggestion of blowing up the text size for recognition.
Although, that also means that a direct OCR of a scanned document will most likely not work as most scanned documents do not have large text. That is unless the scan is done at a high DPI setting.
Does your GUI set the DPI for the scan?
Because PuppyOCR has been conceived primarily as an OCR program, scanner configuration as such has been omitted. It could be included, but that would involve something of a rewrite of the code.
Scanners can normally be configured via their panel LCD display, assuming the driver software includes this capability. For example, for scanners and multifunction devices manufactured by Brother there is an extra proprietary package called "Scan-Key" that can be installed to enable this feature.
The XSANE utility included with Puppy has this feature in its interface, but I have always simply used the default settings, which have always worked for me for OCR. Whether or not the scan resolution settings actually work for a particular piece of hardware is hard to say. Simply try it and see. I wonder if the code-base of XSANE is currently being maintained. Its version numbers hardly alter, if at all. When I get a minute I'll get hold of the XSANE source and try to recompile it with TIFF support enabled.
Also on the subject of XSANE, I get an error message if I try to save in TIFF format, so I have to save in PNM format first and then convert using a graphics conversion program.
The XSANE settings dialogue contains an option to configure the default OCR software, so you could point it to PuppyOCR instead of the default GOCR, which I believe is not enabled in the version of XSANE that is included with modern Puppy versions. GOCR would probably require big dependencies for it to work, a no-no for vanilla Puppy versions.
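On the DPI question, a bit of arithmetic shows why resolution and text size are tied together. A point is 1/72 of an inch, so the nominal height of a font in scanned pixels is point size x DPI / 72. The short Python sketch below only does that calculation; the function names are mine, and it makes no claim about what minimum size Tesseract actually needs.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(point_size, dpi):
    """Nominal font height in pixels for a given point size and scan DPI."""
    return point_size * dpi / POINTS_PER_INCH


def dpi_for_height(point_size, target_px):
    """Scan DPI needed so a font of point_size comes out target_px tall."""
    return target_px * POINTS_PER_INCH / point_size


print(text_height_px(12, 75))   # 12.5 pixels at 75 DPI
print(text_height_px(12, 300))  # 50.0 pixels at 300 DPI
```

So the same 12-point text comes out four times taller at 300 DPI than at 75 DPI, which has much the same effect as enlarging a low-resolution image afterwards.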
PuppyOCR gets further than the others
Something to know: PuppyOCR does the job in Puppy...
At Pelo's home... Others promised, but did not deliver results.
PuppyOCR is not perfect; sometimes you wonder whether you would be better off typing the text yourself... But PuppyOCR gets further than the others (in Puppy).
My grandparents in Vendée were terrorists!
Hmm, somebody who actually practises! Greengeek, I'll share my real-world feedback.
Contrary to what I believed, a larger size gives worse results!
The documents were typeset around 1850. In French there are a lot of characters such as é, à, è and %, and PuppyOCR does not like them.
But a spell checker will find about 95% of the words.
When you copy a book, the edges are often garbled. It's worth spending time on getting a good picture.
Then you correct the wrong words by hand. It's fun; I mainly work on old judgments of royalist "terrorists" by the revolutionary tribunal around 1794.
Blah blah blah (important, however, but not technical)!
My Pelo ancestors in Vendée were terrorists! 300,000 were killed, burned, drowned...
DPI I don't remember. Font size must be at least 14
I don't remember the DPI. Background color should not be a reason for getting bad results. I had a topic about that somewhere. These things are an old hobby. I'll come back.
From the French forum, 2013:
The font size must be at least 14, otherwise PuppyOCR is half-blind.
Your text is on screen.
1. Take a screenshot and save it in TIFF format.
2. Open this image with mtPaint and, if necessary, zoom in until the text reaches size 14.
3. Take a screenshot again (as .tiff, of course) and save it in /root.
4. Launch PuppyOCR and enter the file name, including the tif extension.
5. Give the output file a name.
Attachment: puppyocr.jpg (103.91 KiB), a glance at what the documents look like
Preprocessing scanned pages
A good program for preprocessing scanned pages - prior to feeding them to OCR, and/or making a PDF or DJVU out of them:
http://scantailor.org/
Scan Tailor
"Scan Tailor processed books can be found on Google Books and the Internet Archive. Provided here are some examples to show you what Scan Tailor is capable of."
The latest version is 0.9.11.1, and was released on February 27, 2012.
We provide Windows binaries and the source code to build both Windows and GNU/Linux versions.
It will be a long time to get a pet for it.
Re: Scan Tailor
Pelo wrote: "Scan Tailor .... It will be a long time to get a pet for it."
Hi Pelo, I have been using Scantailor on a Slacko 5.6 derivative for about 6 months. It is really quite good, although I have found that my Scantailor pet is incompatible with Librecad. There is some conflict between Qt config files if I have them both installed at the same time.
I load the pet when I need it, then uninstall it afterwards (I use Librecad much more than I use Scantailor).
If you want to test it my scantailor pet is here:
http://www.mediafire.com/download/x0e7w ... .9.7.2.pet
I have not tested it on any other pups.
I recommend testing it first with your savefile hidden (i.e. a live boot from CD) if you use Librecad or have other Qt apps installed. Take a backup of your savefile first (same as for any other pet testing!).