From ddd3cad8c6802d016a48aa21e6303e8004d50955 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Zdenko=20Podobn=C3=BD?=
Date: Mon, 14 Mar 2016 23:03:44 +0100
Subject: [PATCH] update ChangeLog; remove ReleaseNotes (a relevant information are in Changelog file and there is Release note wiki online)

---
 ChangeLog    |  42 ++++++-
 Makefile.am  |   2 +-
 ReleaseNotes | 323 ---------------------------------------------------
 3 files changed, 41 insertions(+), 326 deletions(-)
 delete mode 100644 ReleaseNotes

diff --git a/ChangeLog b/ChangeLog
index 7a665dfc04..3fea2af8ae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,4 +1,42 @@
-2014-02-04 v3.03
+2015-02-17 - V3.04.01
+ * Added OSD renderer for psm 0. Works for single page and multi-page images.
+ * Improve tesstrain.sh script.
+ * Simplify build and run of ScrollView.
+ * Improved PDF output for OS X Preview utility.
+ * INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
+ * Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
+ * Enable OpenMP support.
+ * Many bug fixes.
+
+2015-07-11 - V3.04.00
+ * Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).
+ * Tesseract now requires leptonica 1.71 or a higher version.
+ * Removed official support for VS 2008.
+ * Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
+ * Major updates to training system as a result of extensive testing on 100 languages.
+ * New training data for over 100 languages
+ * Improved performance with PIC compilation option.
+ * Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript.
+ * Improved font identification.
+ * Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
+ * Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
+ * Major refactor to improve speed on difficult images, especially when running a heap checker.
+ * Moved params from global in page layout to tesseractclass.
+ * Improved single column layout analysis.
+ * Allow ocr output to multiple formats using tesseract command line executable.
+ * Fixed issues with mixed eng+ara scripts.
+ * Improved script consistency in numbers.
+ * Major refactor of control.cpp to enable line recognition.
+ * Added tesstrain.sh - a master training script.
+ * Added ability to text2image training tool to just list available fonts.
+ * Added ability to text2image to underline words.
+ * Improved efficiency of image processing for PDF output.
+ * Added parameter description for each parameter listed with 'print-parameters' command line option.
+ * Added font info to hOCR output.
+ * Enabled streaming input and output of multi-page documents.
+ * Many bug fixes.
+
+2014-02-04 - V3.03(rc1)
  * Added new training tool text2image to generate box/tif file pairs from text and truetype fonts.
  * Added support for PDF output with searchable text.
@@ -16,7 +54,7 @@
  * Many bug fixes.
  * More training source data included.
 
-2012-02-01 - v3.02
+2012-02-01 - V3.02
  * Moved ResultIterator/PageIterator to ccmain.
  * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
  * Added paragraph detection in layout analysis/post OCR.
diff --git a/Makefile.am b/Makefile.am
index 1a5f9d3df3..e328c58dcb 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -22,7 +22,7 @@ if !NO_CUBE_BUILD
 endif
 SUBDIRS += ccmain api . tessdata doc
 
-EXTRA_DIST = ReleaseNotes README.md\
+EXTRA_DIST = README.md\
  aclocal.m4 config configure.ac autogen.sh contrib \
  tesseract.pc.in $(TRAINING_SUBDIR) java doc testing
 
diff --git a/ReleaseNotes b/ReleaseNotes
deleted file mode 100644
index b2400f6a55..0000000000
--- a/ReleaseNotes
+++ /dev/null
@@ -1,323 +0,0 @@
-= Tesseract release notes July 11 2015 - V3.04.01 =
- * Added OSD renderer for psm 0. Works for single page and multi-page images.
- * Improve tesstrain.sh script.
- * Simplify build and run of ScrollView.
- * Improved PDF output for OS X Preview utility.
- * INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
- * Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
- * Enable OpenMP support.
- * Many bug fixes.
-
-= Tesseract release notes July 11 2015 - V3.04.00 =
- * Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).
- * Tesseract now requires leptonica 1.71 or a higher version.
- * Removed official support for VS 2008.
- * Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
- * Major updates to training system as a result of extensive testing on 100 languages.
- * New training data for over 100 languages
- * Improved performance with PIC compilation option.
- * Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript.
- * Improved font identification.
- * Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
- * Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
- * Major refactor to improve speed on difficult images, especially when running a heap checker.
- * Moved params from global in page layout to tesseractclass.
- * Improved single column layout analysis.
- * Allow ocr output to multiple formats using tesseract command line executable.
- * Fixed issues with mixed eng+ara scripts.
- * Improved script consistency in numbers.
- * Major refactor of control.cpp to enable line recognition.
- * Added tesstrain.sh - a master training script.
- * Added ability to text2image training tool to just list available fonts.
- * Added ability to text2image to underline words.
- * Improved efficiency of image processing for PDF output.
- * Added parameter description for each parameter listed with 'print-parameters' command line option.
- * Added font info to hOCR output.
- * Enabled streaming input and output of multi-page documents.
- * Many bug fixes.
-
-= Tesseract release notes Feb 4 2014 - V3.03(rc1) =
- * Added OpenCL support (experimental).
- * Added new training tool text2image to generate box/tif file pairs from text and truetype fonts.
- * Added support for PDF output with searchable text.
- * Removed entire IMAGE class and all code in image directory.
- * Tesseract executable: support for output to stdout; limited support for one page images from stdin (especially on Windows)
- * Added Renderer to API to allow document-level processing and output of document formats, like hOCR, PDF.
- * Major refactor of word-level recognition, beam search, eliminating dead code.
- * Refactored classifier to make it easier to add new ones.
- * Generalized feature extractor to allow feature extraction from greyscale.
- * Improved sub/superscript treatment.
- * Improved baseline fit.
- * Added set_unicharset_properties to training tools.
- * Many bug fixes.
- * More training source data included.
-
-= Tesseract release notes Feb 01 2012 - V3.02 =
- * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
- * Added paragraph detection in layout analysis/post OCR.
- * Added simultaneous multi-language capability.
- * Added experimental equation detector.
- * Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding.
- * Improved line detection and removal.
- * Added fixed pitch chopper for CJK.
- * Added word bigram correction.
- * Added new uniform classifier API.
- * Added new training error counter.
- * More detailed changes recorded in ChangeLog.
-
-
-= Tesseract release notes Oct 21 2011 - V3.01 =
- * Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.
- * Added Cube, a new recognizer for Arabic. Cube can also be used in combination with normal Tesseract for other languages with an improvement in accuracy at the cost of (much) lower speed. *There is no training module for Cube yet.*
- * `OcrEngineMode` in `Init` replaces `AccuracyVSpeed` to control cube.
- * Greatly improved segmentation search with consequent accuracy and speed improvements, especially for Chinese.
- * Added `PageIterator` and `ResultIterator` as cleaner ways to get the full results out of Tesseract, that are not currently provided by any of the `TessBaseAPI::Get*` methods. All other methods, such as the `ETEXT_STRUCT` in particular are deprecated and will be deleted in the future.
- * ApplyBoxes totally rewritten to make training easier. It can now cope with touching/overlapping training characters, and a new boxfile format allows word boxes instead of character boxes, BUT to use that you have to have already boostrapped the language with character boxes. "Cyclic dependency" on traineddata.
- * Auto orientation and script detection added to page layout analysis.
- * Deleted *lots* of dead code.
- * Fixxht module replaced with scalable data-driven module.
- * Output font characteristics accuracy improved.
- * Removed the double conversion at each classification.
- * Upgraded oldest structs to be classes and deprecated PBLOB.
- * Removed non-deterministic baseline fit.
- * Added fixed length dawgs for Chinese.
- * Handling of vertical text improved.
- * Handling of leader dots improved.
- * Table detection greatly improved.
- * Fixed a couple of memory leaks.
- * Fixed font labels on output text. (Not perfect, but a lot better than before.)
- * Cleanup and more bug fixes
- * Special treatments for Hindi.
- * Support for build in VS2010 with Microsoft Windows SDK for Windows 7 (thanks to Michael Lutz)
-
-Tesseract release notes Sep 30 2010 - V3.00
- * Preparations for thread safety:
- * Changed TessBaseAPI methods to be non-static
- * Created a class hierarchy for the directories to hold instance data,
- and began moving code into the classes.
- * Moved thresholding code to a separate class.
- * Added major new page layout analysis module.
- * Added HOCR output.
- * Added Leptonica as main image I/O and handling. Currently optional,
- but in future releases linking with Leptonica will be mandatory.
- * Ambiguity table rewritten to allow definite replacements in place
- of fix_quotes.
- * Added TessdataManager to combine data files into a single file.
- * Some dead code deleted.
- * VC++6 no longer supported. It can't cope with the use of templates.
- * Many more languages added.
- * Doxygenation of most of the function header comments.
-
-Tesseract release notes June 30 2009 - V2.04
-Integrated patches for portability and to remove some of the
-"access" macros.
-Removed dependence on lua from the viewer making it a *lot*
-faster. Also the viewer now compiles and works (on Linux.)
-Fixed the following issues:
-1, 63, 67, 71, 76, 79, 81, 82, 84, 106, 108, 111, 112, 128, 129, 130, 133, 135,
-142, 143, 145, 146, 147, 153, 154, 160, 165, 169, 170, 175, 177, 187, 192,
-195, 199, 201, 205, 209.
-This is the last version to support VC++6!
-This may also be the last version to compile without leptonica!
-Windows version now outputs to stderr by default, fixing a lot of the problems with lack of visible meaningful error messages.
-
-Tesseract release notes April 22 2008 - V2.03
-2.02 was unrunnable, due to a last-minute "simple" change.
-2.03 fixes the problem and also adds an include check for leptonica
-to make it more usable.
-
-Tesseract release notes April 21 2008 - V2.02
-Improvements to clustering, training and classifier.
-Major internationalization improvements for large-character-set
-languages, eg Kannada.
-Removed some compiler warnings.
-Added multipage tiff support for training and running.
-Updated graphics output to talk to new java-based viewer.
-Added ability to save n-best lists.
-Added leptonica support for more file types.
-Improved Init/End to make them safe.
-Reduced memory use of dictionaries.
-Added some new APIs to TessBaseAPI.
-Fixed namespace collisions with jpeg library (INT32).
-Portability fixes for Windows for new code.
-Updates to autoconf system for new code.
-
-Tesseract release notes August 27 2007 - V2.01
-Fixed UTF8 input problems with box file reader.
-Fixed various infinite loops and crashes in dawg code.
-Removed include of config_auto.h from host.h.
-Added automatic wctype encoding to unicharset_extractor.
-Fixed dawg table too full error.
-Removed svn files from tarball.
-Added new functions to tessdll.
-Increased maximum utf8 string in a classification result to 8.
-
-Tesseract release notes July 17, 2007 - V2.00
-
-First release of the International version.
-This version recognizes the following languages:
-English - eng
-French - fra
-Italian - ita
-German - deu
-Spanish - spa
-Dutch - nld
-The language codes follow ISO 639-2. The default language is English.
-To recognize another language:
-tesseract inputimage outputbase -l langcode
-
-To train on a new language, see separate documentation.
-More languages will be appearing over time.
-
-List of changes in this release:
- Converted internal character handling to UTF8.
- Trained with 6 languages.
- Added unicharset_extractor, wordlist2dawg.
- Added boxfile creation mode.
- Added UNLV regression test capability.
- Fixed problems with copyright and registered symbols.
- Fixed extern "C" declarations problem.
- Made some improvements to consistency of accuracy across platforms.
- Added vc++ express support.
-
-Instructions for downloading and building version 2.00.
-Things have changed quite a bit since the previous versions so please read carefully.
-*All users*
-The tarballs are split into pieces.
-tesseract-2.00.tar.gz contains all the source code.
-tesseract-2.00..tar.gt contains the data files for . You need at least one of these or tesseract will not work.
-tesseract-2.00.exe.tar.gz is not for the 'exe' language. It is windows executables. They are built with VC++ express and come with absolutely no warranty. If they work for you then great, otherwise get visual C++ express (and the platform sdk) and build from the source.
-
-*Non-windows users*
-As with 1.04, this version works with make install.
-*New* there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.)
-It might work with your OS if you know how to do that sort of thing.
-If you are linking to the libraries, as with Ocropus, there is now a single master
-library called libtesseract_full.a.
-
-*Windows users*
-If you are building from the sources, there are still dsw and dsp files for vc++6 and also
-sln and vcproj files for vc++ express.
-The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.)
-
-
-Tesseract release notes May 15, 2007 - V1.04.
-
-=== Windows users only ===
-Added a dll interface for windows. Thanks to Glen at Jetsoft for contributing
-this. To use the dll, include tessdll.h, import tessdll.lib and put tessdll.dll
-somewhere where the system can find it. There is also a small dlltest program
-to test the dll. Run with:
-dlltest phototest.tif phototest.txt
-It will output the text from phototest.tif with bounding box information.
-**New for Windows** the distribution now includes tesseract.exe and tessdll.dll
-which *might* work out of the box! There are no guarantees as you need
-VC++6 versions of mfc and crt (at least) for it to work. (Batteries not
-included, and certainly no installshield.)
-
-== Important note for anyone building with make: i.e. anyone except devstudio
-users ==
-This release includes new standardization for the data directory. To enable
-Tesseract to find its data files, you must either:
-./configure
-make
-make install
-to move the data files to the standard place, or:
-export TESSDATA_PREFIX="directory in which your tessdata resides/"
-(or equivalent) in your .profile or whatever or setenv to set the environment
-variable. Note that the directory must end in a /
-HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE.
-
-== All users ==
-Fixed a bunch of name collisions - mostly with stl.
-Made some preliminary changes for unicode compatibility. Includes a new data
-file (unicharset) and renaming of the other data files to eng.* to support
-different languages.
-There are also several other minor bug fixes and portability improvements
-for 64 bit, the latest visual studio compiler etc. Thanks to all who have
-contributed these fixes.
-
-NOTE: This is likely to be the last English-only release!
-Apologies in advance to non-windows users for bloating the distribution with
-windows executables. This will probably get fixed in the next release with
-the multi-language capability, since that will also bloat the distribution.
-
-
-Tesseract release notes Feb 2, 2007 - V1.03.
-Added mftraining and cntraining. Using an image with a box file, tesseract
-generates .tr output files. cntraining runs on the .tr files to make
-normproto that lives in tessdata. mftraining runs on the .tr files to
-make inttemp and pffmtable in tessdata. These are the main data files
-that tesseract uses to recognize characters. At present, the code to make
-dictionary files is not yet available, nor are any sample box files or
-rebuilt inttemp or documentation to create any of these. Recognition is
-still limited to the ASCII set, but when this problem is fixed, documentation
-will follow.
-
-Added a new API with adaptive thresholding for grey and color images.
-See ccmain/baseapi.h/cpp for details. The main program has been converted
-to use the API as an example. See main() in ccmain/tesseractmain.cpp for
-details. The API is designed to make it easy to add subclasses with ability
-to output the bounding boxes etc from the internal structures. The adaptive
-thresholding improves accuracy (most of the time) on non-binary images.
-
-Many memory leaks have been fixed. There are no known leaks left from using
-the API correctly.
-
-The adaptive classifier was not operating correctly. This bug, and several
-others have been fixed, including poor chopping, an indefinite (if not quite
-infinite) loop in the number parser, and a couple of crash bugs. Thanks to
-all that have contributed bugs and bug fixes.
-
-It is now possible to build without any of the graphics support to save code
-size using #define GRAPHICS_DISABLED. There is also a new EMBEDDED define
-for use on operating systems with limited library support.
-
-64-bit and Mac OSX buildability is now included in the mainline source tree.
-Thanks to all that have contributed patches and comments to help with that.
-1.03 is also endian-independent, apart from the tiff i/o, so if you use
-libtiff, the code should run on all platforms, even if you get/create new
-data files of a different endinanness.
-
-Some of the bug fixes improve accuracy, and so do some of the changes to
-DangAmbigs and user-words.
-
-Tesseract release notes, Oct 4 2006 - V1.02.
-Removed dependency on aspirin. *All* code is now licensed under Apache2.0.
-
-Tesseract release notes, Sep 7 2006 - V1.01.
-
-Fixes for this release:
-Added mfcpch.cpp and getopt.cpp for VC++.
-Fixed problem with greyscale images and no libtiff.
-Stopped debug window from being used for the usage output.
-Fixed load of inttemp for big-endian architectures.
-Fixed some Mac compilation issues.
-
-This version should read uncompressed 8 bit grey and 24 bit color tiffs
-without having to have libtiff. It does a dumb threshold though, so don't
-expect good results from poor contrast or images of natural scenes etc.
-
-If you just run tesseract with no command line args you should now get a
-sensible usage message on linux, with or without X-windows.
-
-If you can get it to compile on a PPC Mac, it may now run correctly,
-although not all the build issues are fixed yet.
-
-Building Tesseract:
-Windows:
-Unpack the tar.gz archive
-Open tesseract.dsw in DevStudio (preferably version 6, higher versions will be more difficult)
-Set Win32 - Release as the active configuration.
-Build.
-Copy tesseract.exe from bin.rel up one directory level.
-Run tesseract phototest.tif phototest
-This will create phototest.txt.
-
-Linux:
-Unpack the tar.gz archive
-./configure
-make
-Copy tesseract from ccmain up one directory level (or create a symbolic link)
-Run tesseract phototest.tif phototest
-This will create phototest.txt.