Capture2Text enables users to quickly OCR a portion of the screen using a keyboard shortcut. The resulting text will be saved to the clipboard by default.
The latest version can be found on the Capture2Text download page hosted by SourceForge. Source code is included.
Capture2Text can OCR the following languages:
| Afrikaans (afr) | Greek (ell) | Odiya (ori) |
| Albanian (sqi) | Gujarati (guj) | Panjabi (pan) |
| Amharic (amh) | Haitian (hat) | Persian (fas) |
| Ancient Greek (grc) | Hebrew (heb) | Polish (pol) |
| Arabic (ara) | Hindi (hin) | Portuguese (por) |
| Assamese (asm) | Hungarian (hun) | Pushto (pus) |
| Azerbaijani (aze) | Icelandic (isl) | Romanian (ron) |
| Basque (eus) | Indic (inc) | Russian (rus) |
| Belarusian (bel) | Indonesian (ind) | Sanskrit (san) |
| Bengali (ben) | Inuktitut (iku) | Serbian (srp) |
| Bosnian (bos) | Irish (gle) | Sinhala (sin) |
| Bulgarian (bul) | Italian (ita) | Slovak (slk) |
| Burmese (mya) | Japanese (jpn) | Slovenian (slv) |
| Catalan (cat) | Javanese (jav) | Spanish (spa) |
| Cebuano (ceb) | Kannada (kan) | Swahili (swa) |
| Central Khmer (khm) | Kazakh (kaz) | Swedish (swe) |
| Cherokee (chr) | Kirghiz (kir) | Syriac (syr) |
| Chinese - Simplified (chi_sim) | Korean (kor) | Tagalog (tgl) |
| Chinese - Traditional (chi_tra) | Kurukh (kru) | Tajik (tgk) |
| Croatian (hrv) | Lao (lao) | Tamil (tam) |
| Czech (ces) | Latin (lat) | Telugu (tel) |
| Danish (dan) | Latvian (lav) | Thai (tha) |
| Dutch (nld) | Lithuanian (lit) | Tibetan (bod) |
| Dzongkha (dzo) | Macedonian (mkd) | Tigrinya (tir) |
| English (eng) | Malay (msa) | Turkish (tur) |
| Esperanto (epo) | Malayalam (mal) | Uighur (uig) |
| Estonian (est) | Maltese (mlt) | Ukrainian (ukr) |
| Finnish (fin) | Marathi (mar) | Urdu (urd) |
| Frankish (frk) | Math/Equations (equ) | Uzbek (uzb) |
| French (fra) | Middle English (1100-1500) (enm) | Vietnamese (vie) |
| Galician (glg) | Middle French (1400-1600) (frm) | Welsh (cym) |
| Georgian (kat) | Nepali (nep) | Yiddish (yid) |
| German (deu) | Norwegian (nor) | |
By default, only English, French, German, Italian, Japanese, and Spanish are installed.
How to install additional languages:
How to OCR:
To cancel an OCR capture, press ESC.
To move the entire OCR capture box, hold down the right mouse button and drag.
To nudge the OCR capture box, use the arrow keys.
To toggle the active OCR capture box corner, press the space bar.
To change the OCR language, right-click the Capture2Text tray icon, select the OCR Language option and then select the desired language.
To quickly switch between 3 languages, use the OCR language quick access keys: Windows Key + 1, Windows Key + 2, and Windows Key + 3.
When Chinese or Japanese is selected, you should specify the text direction (vertical/horizontal/auto) using the text direction key: Windows Key + W. If auto is selected, horizontal will be used when the capture width is more than twice the height, otherwise vertical will be used. The text direction also affects how furigana is stripped from Japanese text.
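The auto rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not Capture2Text's actual code; the function name and the pixel-based width/height parameters are assumptions.

```python
def auto_text_direction(width: int, height: int) -> str:
    """Pick a text direction for a capture box (hypothetical helper).

    Mirrors the rule in the text: horizontal when the capture width is
    more than twice the height, otherwise vertical.
    """
    return "horizontal" if width > 2 * height else "vertical"
```

Under this reading, a box exactly twice as wide as it is tall still counts as vertical, because the width must be strictly more than twice the height.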
(For Japanese) When OCR pre-processing is enabled, by default, Capture2Text will attempt to strip out furigana. You may disable this behavior in "Preferences... -> OCR -> Strip Furigana".
Using the Preferences dialog, you can change the following OCR settings:
By default, the OCR'd text will be placed on the clipboard.
Three other output methods are also available.
To send the text to a pop-up window you can right-click the Capture2Text tray icon and select Show Popup Window.
To send the text to whichever textbox currently contains the blinking cursor/I-beam, right-click the Capture2Text tray icon and select Send to Cursor.
(Advanced) To send the text directly to a window/control (for example, Notepad++), first fill in the Send to Control settings in the Preferences dialog.
Using the Preferences dialog, you can change the following output settings:
Right-click the Capture2Text tray icon in the bottom-right of your screen and then select the "Preferences..." option to bring up the Preferences dialog.
Sometimes Capture2Text consistently makes the same OCR mistakes such as recognizing an "M" as "I\/|".
By editing the substitutions.txt file in the Capture2Text directory, you may tell Capture2Text to substitute one text string for another text string.
Just find the appropriate language section and add one substitution per line in this format:
from_text = to_text
Example (adding 3 substitutions to the English section):
To create a substitution regardless of language, add the substitution to the "All:" section.
Special tokens and escape characters:
| %space% | Space character |
| %tab% | Tab character |
| %eq% | Equals (=) |
| %perc% | Percent sign (%) |
| %lf% | Linefeed character (\n) |
| %cr% | Carriage return character (\r) |
You may disable a substitution by adding a "#" in front.
When done editing substitutions.txt, either restart Capture2Text or switch language for the substitutions to take effect.
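To make the format concrete, here is a minimal Python sketch of how a substitution line might be read and applied. It is not Capture2Text's own parser. The function names are hypothetical, and it assumes that whitespace around the "=" is ignored (which would explain why %space% exists) and that lines without an "=", such as section names, are skipped.

```python
import re

# Special tokens from the table above, mapped to the characters they stand for.
TOKENS = {
    "%space%": " ",
    "%tab%": "\t",
    "%eq%": "=",
    "%perc%": "%",
    "%lf%": "\n",
    "%cr%": "\r",
}
_TOKEN_RE = re.compile("|".join(re.escape(t) for t in TOKENS))


def expand_tokens(text):
    """Replace every special token in a single left-to-right pass."""
    return _TOKEN_RE.sub(lambda m: TOKENS[m.group(0)], text)


def parse_line(line):
    """Return (from_text, to_text), or None if the line is not an active substitution."""
    line = line.strip()
    if not line or line.startswith("#"):  # blank line or disabled substitution
        return None
    frm, sep, to = line.partition("=")
    if not sep:  # no "=", e.g. a section name such as "All:"
        return None
    frm, to = expand_tokens(frm.strip()), expand_tokens(to.strip())
    if not frm:  # nothing to search for
        return None
    return frm, to


def apply_substitutions(text, pairs):
    """Apply each (from_text, to_text) pair to the OCR'd text in order."""
    for frm, to in pairs:
        text = text.replace(frm, to)
    return text
```

Expanding the tokens in a single pass means that a "%" produced by %perc% is never re-read as the start of another token.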
You may OCR the screen from the command line by calling Capture2Text in this format:
Capture2Text.exe x1 y1 x2 y2 [output_file]
Capture2Text will read settings.ini to determine settings such as OCR language and output options (clipboard, popup, etc.).
Note: Make sure that you wait for Capture2Text to finish processing before attempting to start a new instance of Capture2Text. Batch files automatically do this. If you are typing directly into the console window, be sure to use the "start /wait" command, for example: start /wait Capture2Text.exe x1 y1 x2 y2 [output_file]
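If you drive Capture2Text from a script rather than a batch file, the same wait-for-completion rule applies. The Python sketch below builds the argument list in the documented order and runs it with subprocess.run, which blocks until the process exits. The helper names and the default executable path are assumptions; adjust the path to wherever you unzipped Capture2Text.

```python
import subprocess


def build_command(x1, y1, x2, y2, output_file=None, exe="Capture2Text.exe"):
    """Build the argument list: Capture2Text.exe x1 y1 x2 y2 [output_file]."""
    cmd = [exe, str(x1), str(y1), str(x2), str(y2)]
    if output_file is not None:  # the output file is optional
        cmd.append(output_file)
    return cmd


def capture(x1, y1, x2, y2, output_file=None, exe="Capture2Text.exe"):
    """Run one capture and return the process exit code.

    subprocess.run waits for the process to finish, so the next capture
    cannot start before this one is done.
    """
    return subprocess.run(build_command(x1, y1, x2, y2, output_file, exe)).returncode


# Example call (the coordinates and file name are made up for illustration):
# capture(100, 100, 400, 150, "ocr_result.txt")
```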
Possible solutions:
Make sure that you have unzipped Capture2Text to a path that does not contain Asian or other non-English characters. Search Google if you do not know how to unzip a file.
Make sure that you did not unzip Capture2Text to the "Program Files" directory. This will avoid possible issues related to write privileges.
Try unzipping Capture2Text to a very short path such as C:\Capture2Text. Some computers supposedly have issues with longer paths. I have never actually seen this happen though.
Make sure that your antivirus software is not blocking Capture2Text. Refer to the documentation that was bundled with your antivirus software.
Make sure that you have downloaded the latest version from SourceForge: http://sourceforge.net/projects/capture2text/files/Capture2Text/
Restart your computer.
Right-click Capture2Text.exe and select "Run as administrator". Capture2Text shouldn't need administrator privileges, but if all else fails it won't hurt to try.
Ask one of your grandchildren to help you :)
If none of these things work for you, I suggest deleting Capture2Text and looking for some other OCR software (do not ask me for recommendations).
Just restart it. I have never actually seen this happen though.
Read the OCR section of this document to learn how to add new languages.
Click the "Show hidden icons" button (it looks like a triangle).
Right-click it instead.
Since I don't maintain the OCR training files, there is nothing that I can do about it. If you have a technical background and a lot of free time, feel free to create your own training files: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Capture2Text is Windows-only software. If you have a technical background, feel free to port it (but don't ask me to help).
There isn't one. Capture2Text doesn't have an installer either. To remove Capture2Text from your computer, simply delete the Capture2Text directory.
| Tesseract | OCR engine |
| Leptonica | Image processing and analysis library |
| ScreenCapture | AutoHotKey screen capture script |
Automatically look up Japanese words that you have OCR'd with Capture2Text. Supports de-inflected expressions, readings, audio pronunciation, example sentences, pitch accent, word frequency, kanji information, and grammar analysis. Supports both EDICT and EPWING dictionaries.
Free and open-source manga reader app for Android that allows you to quickly OCR and look up Japanese words in real time. There are no ads and no mysterious network permissions. Supports both EDICT and EPWING dictionaries.