Automated Document Conversion Solution | ABBYY Recognition Server

ABBYY Recognition Server provides powerful server-based OCR functionality for automated document capture and PDF conversion. Designed for mid- to high-volume batch processing, it enables organizations and scanning service providers to establish cost-efficient processes for converting paper documents, as well as TIFF, JPEG, and PDF image documents, into electronic files suitable for full-text search and long-term digital archiving.

 Why ABBYY Recognition Server?

Automated Conversion to PDF and PDF/A formats

Enables digitization of large document collections and automated conversion of scans into PDF or PDF/A formats that can be electronically stored and archived.

An Enterprise-level Document Conversion Service

Provides an on-demand OCR and document conversion service for employees or customers – available online at any time and from anywhere.

Creation of Full-text Searchable SharePoint Libraries

Generates searchable text for scanned or faxed documents stored in Microsoft® SharePoint® libraries to enable indexing by the SharePoint search engine.

OCR that’s empowering.

 Product Highlights: Typical Usage

1. Create digital archive with easy access to information

Challenge: Convert a huge volume of paper documents to PDF
Solution: Enable fast and seamless conversion of a vast range of non-text files into a searchable format
Keep your archives lean and cut storage costs
Enable fast access to valuable information
Ensure Compliance

2. Make existing digital archive full-text searchable & compliant

Challenge: Convert scanned, faxed or MS Office documents directly within a digital archive
Solution: Crawl document libraries in the background on a schedule and convert all images or MS Office documents to searchable PDF or PDF/A
Standardize corporate documents
Enable fast access to valuable information
Ensure Compliance

3. Provide employees with a centralized document conversion service

Challenge: Employees need to convert documents into files ready for updating, re-using, sharing, approving or archiving
Solution: Provide a centralized conversion service that every employee can use
No user training is needed
One service for multiple departments
Access via e-mail, shared folders, portals or MFP panels

4. Prepare corporate documents for eDiscovery, DLP, Fraud Detection

Challenge: Convert a huge number of documents to PDF or plain text formats within limited timeframes, and minimize the risk of missing important evidence
Solution: Enable fast and seamless conversion of a vast range of non-text files into a searchable format.
Speed up the overall process
Eliminate the need for human intervention or manual conversion tools
Never lose a document

5. Create digital libraries

Challenge: Preserve unique and rare materials, make them available for a wide range of readers
Solution: Digitize all the materials, convert them to PDF, Alto XML or EPUB files and publish them to online media storage
Enable fast access to valuable information
Recognize materials in various languages, including rare and ancient ones
The digital version faithfully reproduces the original document

Powerful server-based OCR software for automated document capture and PDF conversion.

Product Advantages

Automated Conversion to PDF & PDF/A Formats

In ABBYY Recognition Server, precision OCR and PDF conversion processes are server-based and fully automated. ABBYY Recognition Server crawls specified “hot folders”, file shares and SharePoint libraries, converts discovered image documents into searchable files and delivers the results back into the same SharePoint library or a custom-specified destination. Minimized user involvement in the high-volume document conversion tasks significantly reduces the cost of the business process.
ABBYY Recognition Server comes with advanced PDF and PDF/A creation features that meet the standards of long-term digital document archiving. Enhanced MRC compression technology creates small PDFs with high visual quality that are well suited for publishing online. PDF encryption can be used to prevent unauthorized viewing, printing or modification of the created PDF files. ABBYY Recognition Server can detect PDFs that were generated by a scanner and add a text layer to those files, making them full-text searchable. If the PDF file already contains a text layer, ABBYY Recognition Server will evaluate its quality and replace it with a higher-quality text layer if needed. At the same time, all bookmarks, annotations, metadata and attachments of the original PDF file will be preserved intact.
Defining custom file names, destination folders or metadata fields is easy with the convenient indexing and metadata extraction tools offered by ABBYY Recognition Server. Barcodes on the cover sheets or data that is contained within the documents can be captured, used for document classification and routing, and stored along with the document in the digital archive.

Creation of Full-Text Searchable SharePoint Libraries

State-of-the-art ABBYY OCR technology delivers the best results even on low-quality documents and ensures high recognition accuracy. All scanned or faxed documents can be converted to searchable PDF or PDF/A so that they can be indexed by Microsoft SharePoint and become discoverable.
ABBYY Recognition Server provides integration with Microsoft SharePoint on multiple levels. It can be set up as a front end to the SharePoint server to consistently convert all incoming image documents into searchable PDFs before they are uploaded to Microsoft SharePoint. Image documents that are already stored in SharePoint libraries can be automatically converted within the libraries without any user intervention. In addition, newly added documents that users upload as images will be found, processed by the OCR engine and stored back in the library in a searchable format.
Document conversion is performed as a background process and is fully invisible to SharePoint end users. Their experience of working with SharePoint does not have to change, while ABBYY Recognition Server ensures that all incoming documents are consistently processed and made full-text searchable.

An Enterprise-Level Document Conversion Service

With ABBYY Recognition Server, OCR is not limited to a desktop PC or its operator’s work hours. The service that resides on a server is available to all users or select groups of users regardless of their location or access hours.
Centralized installation, configuration, and administration of ABBYY Recognition Server make it a cost-efficient enterprise-wide solution as opposed to desktop applications that need to be maintained by IT personnel on multiple workstations.
Users can start using the document conversion service right away, without having to learn what OCR is. They simply select a format they want to convert their document into (searchable PDF, PDF/A, Microsoft Word or Excel), and receive the requested file.
Because the document conversion process is completely automatic and hidden from the user, ABBYY Recognition Server is equally well suited for a single- or multi-tenant user environment. The installation can be easily scaled to process documents from added clients without a decline in productivity. A flexible system of priorities allows important documents to be moved automatically to the front of the queue.
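
The priority mechanism described above can be pictured as a priority queue. The following Python sketch is purely illustrative: the `ConversionQueue` class, its method names and the numeric priority values are assumptions made for this example and are not part of the ABBYY Recognition Server API.

```python
import heapq
import itertools


class ConversionQueue:
    """Hypothetical queue: higher priority values are served first,
    and documents with equal priority keep their arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, document, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), document))

    def next_document(self):
        _, _, document = heapq.heappop(self._heap)
        return document

    def __len__(self):
        return len(self._heap)


queue = ConversionQueue()
queue.submit("newsletter.tif", priority=0)
queue.submit("contract.pdf", priority=5)   # important document
queue.submit("invoice.tif", priority=0)

order = [queue.next_document() for _ in range(len(queue))]
print(order)
```

In this sketch the important document is processed first, and the remaining documents are handled in the order in which they arrived.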

Automatically digitize your documents.

How It Works

1. Flexible Import Options

Import from Network/FTP Folders
ABBYY Recognition Server can automatically import images from the following network resources:

  • Network folder
  • FTP folder (e.g. if images are to be uploaded from remote locations)
  • E-mail folder (e.g. if users send images for conversion by e-mail)

File Input Formats

  • TIFF / Multipage TIFF
    Compression methods: Unpacked, CCITT Group 3, CCITT Group 3 FAX(2D), CCITT Group 4, PackBits, JPEG, ZIP, LZW
  • JPEG, JPEG 2000
  • PDF
  • DjVu
  • BMP
  • PNG
  • PCX, DCX
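
As a rough illustration of how a watched folder could be filtered for the input formats listed above, consider the following Python sketch. The set of file extensions and the `find_importable_files` helper are assumptions made for this example; they do not describe the product's own import logic.

```python
from pathlib import Path

# Assumed file extensions for the input formats listed above.
SUPPORTED_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".jp2",
    ".pdf", ".djvu", ".bmp", ".png", ".pcx", ".dcx",
}


def find_importable_files(folder):
    """Return the files in `folder` whose extension matches a supported format."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

A call such as `find_importable_files("/scans/incoming")` (a hypothetical path) would return only the image and PDF files and skip everything else in the folder.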

2. Scanning Station

Scanning Station provides functionality for batch scanning and preparation of images for further processing:

  • Scanning via TWAIN, WIA and ISIS.
  • Quick image preview.
  • Image preprocessing (rotation, deskew, despeckle, etc.).
  • Document separation by barcodes / blank pages / fixed number of pages.

For images scanned in a batch, ABBYY Recognition Server offers several built-in document separation options: by blank sheets, barcode sheets, or barcodes stuck or printed on the first page of each document. Additional custom rules based on the recognized text can be created using scripting.
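
Two of the separation options mentioned above, blank separator sheets and a fixed number of pages, can be sketched in a few lines of Python. This is a simplified illustration: pages are plain strings, and the `is_blank` callback stands in for whatever blank-page detection the product performs. The function names are assumptions, not product API.

```python
def split_by_blank_pages(pages, is_blank):
    """Start a new document at every blank separator page; separators are dropped."""
    documents, current = [], []
    for page in pages:
        if is_blank(page):
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def split_by_page_count(pages, pages_per_document):
    """Cut the batch into documents of a fixed number of pages."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[start:start + pages_per_document]
        for start in range(0, len(pages), pages_per_document)
    ]


batch = ["a1", "a2", "", "b1", "", "c1"]
print(split_by_blank_pages(batch, lambda page: page == ""))
print(split_by_page_count(["p1", "p2", "p3", "p4", "p5"], 2))
```

With a fixed page count, the last document simply contains the remaining pages if the batch does not divide evenly.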

OCR is done on a Processing Station automatically. It is possible to connect several computers to the Server Manager as Processing Stations, and the Server Manager will balance the workload among these stations evenly. This will result in much faster processing of documents.
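
One simple way to picture even workload distribution is a "least-loaded station first" strategy. The Python sketch below assumes that each job is described by a name and a page count and that load is measured in pages; the actual balancing algorithm used by the Server Manager is not described in this document.

```python
import heapq


def distribute(jobs, station_names):
    """Assign each (job_name, page_count) to the station with the fewest pages so far."""
    heap = [(0, name) for name in station_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in station_names}
    for job_name, page_count in jobs:
        load, station = heapq.heappop(heap)          # least-loaded station
        assignment[station].append(job_name)
        heapq.heappush(heap, (load + page_count, station))
    return assignment


jobs = [("a.tif", 10), ("b.tif", 4), ("c.tif", 3), ("d.tif", 5)]
print(distribute(jobs, ["station-1", "station-2"]))
```

Here the large first job occupies one station while the three smaller jobs are routed to the other, which keeps the page totals of the two stations close to each other.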

The OCR and barcode recognition technologies implemented in Recognition Server deliver unprecedented accuracy and support various types of text as well as the most popular 1D and 2D barcodes. The OCR process has extensive language support: 198 languages, including Latin, Cyrillic, Greek, Arabic, Chinese, Japanese, Korean, Vietnamese, Hebrew, Yiddish and Thai. European languages written in Gothic fonts are also supported.

To preserve the original document layout, ABBYY Recognition Server uses Adaptive Document Recognition Technology (ADRT). ADRT significantly improves document layout retention when saving documents to DOC and RTF formats. The logical structure of an entire document is reproduced, including headers, footers, footnotes, page numbers, table of contents linked to document sections and notes to pictures and diagrams.

Support for Many Recognition Languages

  • 43 main languages with dictionary support: Arabic (Saudi Arabia), Armenian (Eastern), Armenian (Grabar), Armenian (Western), Azeri (Latin), Bashkir, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Dutch (Belgian), English, Estonian, Finnish, French, German, German (new spelling), Greek, Hebrew, Hungarian, Indonesian, Italian, Latvian, Lithuanian, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Polish, Portuguese, Portuguese (Brazilian), Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tatar, Thai, Turkish, Ukrainian, Vietnamese;
  • 133 additional languages without dictionary support: Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altai, Avar, Aymara, Azerbaijani (Cyrillic), Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Buryat, Cebuano, Chamorro, Chechen, Chukchee, Chuvash, Corsican, Crimean Tatar, Crow, Dargwa, Dungan, Eskimo (Cyrillic), Eskimo (Latin), Even, Evenki, Faroese, Fijian, Frisian, Friulian, Gagauz, Galician, Ganda, German (Luxembourg), Guarani, Hani, Hausa, Hawaiian, Icelandic, Indonesian, Ingush, Irish, Jingpo, Kabardian, Kalmyk, Karachay-balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakass, Khanty, Kikuyu, Kirghiz, Kongo, Koryak, Kpelle, Kumyk, Kurdish, Lak, Latin, Lezgi, Luba, Macedonian, Malagasy, Malay (Malaysian), Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minangkabau, Mohawk, Moldavian, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Nyanja, Ojibway, Ossetian, Papiamento, Provencal, Quechua, Rhaeto-Romanic, Romany, Rundi, Russian (Old Spelling), Rwanda, Sami (Lappish), Samoan, Scottish Gaelic, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux (Dakota), Somali, Sorbian, Sotho, Sunda, Swahili, Swazi, Tabasaran, Tagalog, Tahitian, Tajik, Tok Pisin, Tongan, Tswana, Tun, Turkmen, Tuvinian, Udmurt, Uigur (Cyrillic), Uigur (Latin), Uzbek (Cyrillic), Uzbek (Latin), Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, and Zulu;
  • 5 East Asian languages: Chinese (Traditional, Simplified), Japanese, Korean and Hangul (Korean);
  • 6 languages for recognition of old European documents and Gothic fonts in books printed in the 18th-20th centuries:
    • English,
    • French,
    • German,
    • Italian,
    • Spanish,
    • Latvian;
  • 4 artificial languages: Esperanto, Ido, Interlingua, and Occidental;
  • 6 programming languages: Basic, C/C++, COBOL, Fortran, Java, and Pascal;
  • Simple chemical formulas
  • Digits
  • 1D Barcodes
    • Check Code 39, Check Interleaved 25, Code 128, Code 39, EAN 13, EAN 8, Interleaved 25, CODABAR (without checksum), UCC Code 128, Code 2 of 5 (Industrial, IATA, Matrix), Code 93, UPC-A, UPC-E, Patch Code and Postnet;
  • 2D Barcodes
    • PDF 417, Aztec, Data Matrix, QR Code
  • Multiple Text Types
    • Normal, Fax (mode for low-resolution texts), Typewriter, Dot Matrix Printer, OCR-A, OCR-B, MICR (E13B), Gothic

Sometimes there is a need to process important documents which have to be recognized with exceptional accuracy. At the same time, the quality of the scans may not be perfect, suffering from low resolution and unwanted noise. In this case it is very important to have a reliable quality assurance mechanism.

Automatic quality control allows the administrator to set a threshold for recognition accuracy: documents with poor-quality text will not be converted, but rather stored in a separate folder for special treatment.
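
The threshold idea can be illustrated with a short Python sketch. It assumes, purely for the sake of the example, that quality is measured as the share of characters the OCR engine marked as uncertain; the `RecognizedDocument` type, the field names and the 5% default are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class RecognizedDocument:
    name: str
    uncertain_chars: int   # characters the OCR engine was unsure about (assumed metric)
    total_chars: int


def route_by_quality(doc, max_uncertain_ratio=0.05):
    """Return "export" for acceptable documents and "review" for poor-quality ones."""
    if doc.total_chars == 0:
        return "review"    # nothing was recognized, so a person should take a look
    ratio = doc.uncertain_chars / doc.total_chars
    return "export" if ratio <= max_uncertain_ratio else "review"


print(route_by_quality(RecognizedDocument("clean.tif", 10, 1000)))
print(route_by_quality(RecognizedDocument("noisy_fax.tif", 200, 1000)))
```

Documents routed to "review" correspond to the ones that would be set aside in a separate folder for special treatment.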

Verification Station

A client station for proofreading recognition results. Verification can be enabled for all pages or it can be based on the accuracy threshold. Verification permissions management is supported.

Indexing Station

A client station for document indexing and classification.

1. Multi-Export Destinations

ABBYY Recognition Server enables multiple destinations for data and images as well as generation of searchable PDFs.

2. Flexible File Output Formats

  • PDF, PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u
  • RTF
  • DOC, DOCX
  • XLS, XLSX
  • TXT, CSV
  • HTML
  • TIFF
  • JPEG, JPEG 2000
  • JBIG2
  • PNG
  • EPUB
  • XML, Alto XML
  • FineReader internal format (FineReader Engine-compatible)

3. Available Connectors to Enterprise Systems

  • Export to Microsoft SharePoint
  • IFilter for TIFF files
  • Connector to Google Search Appliance

4. Available Customization and Integration Options

  • Custom processing parameters defined via XML files (XML Tickets)
  • Web API
  • COM API
  • Scripting in VBScript and JScript

ABBYY Recognition Server is server-based software for automating document processing, OCR and PDF conversion in enterprise and service-based environments. Its architecture makes it easy to deploy document processing solutions that scale to any size, with significant time and cost savings.

ABBYY Recognition Server automatically converts large volumes of paper documents or document images into fully searchable electronic text suitable for business processes including archiving, e-discovery, and enterprise search. It enables automated, unattended document processing that can be managed and accessed from within an organization or remotely. Recognition Server can also connect with a variety of back-end systems and third-party applications, integrating via scripts, XML tickets, a Web-service API or a COM-based API. ABBYY intelligent OCR and PDF conversion technology delivers highly accurate document conversion with recognition of up to 190 languages.

Architecture

ABBYY Recognition Server consists of several components, which can be installed on one or many computers in a LAN. The main components are:

  • Server Manager — a central service component, which controls the document processing queue and distributes the tasks among the stations.
  • Processing Station — a service that performs recognition and document conversion.
  • Scanning Station — a client station for batch scanning and image pre-processing.
  • Indexing Station — a client station for document indexing and classification.
  • Connector to Google Search Appliance™ (GSA) — a component that allows Google Search Appliance to use ABBYY Recognition Server for extracting content from document images.
  • Connector to Microsoft® Search Systems (IFilter) — a component that allows Microsoft Office SharePoint Server and Windows Search to use ABBYY Recognition Server for extracting content from document images.
  • Remote Administration Console — a client console used for configuring and monitoring Recognition Server.

Document Processing

ABBYY Recognition Server processes each image file according to a workflow — a set of processing parameters predefined by the administrator. ABBYY Recognition Server can run several workflows with different parameters simultaneously. Each workflow corresponds to a unique input source (a folder, a SharePoint library or a mailbox).
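
The relationship between workflows and input sources can be modelled as a one-to-one mapping. The Python sketch below is an assumption-laden illustration: the `Workflow` fields, the `WorkflowRegistry` class and the example paths are invented for this example and do not reflect the product's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    input_source: str      # a folder, a SharePoint library or a mailbox
    output_format: str
    languages: tuple
    priority: int = 0


class WorkflowRegistry:
    """Keeps one workflow per input source, as described above."""

    def __init__(self):
        self._by_source = {}

    def register(self, workflow):
        if workflow.input_source in self._by_source:
            raise ValueError(f"input source already in use: {workflow.input_source}")
        self._by_source[workflow.input_source] = workflow

    def workflow_for(self, input_source):
        return self._by_source[input_source]


registry = WorkflowRegistry()
registry.register(Workflow("Invoices", r"\\fileserver\invoices", "PDF/A-1b", ("English",), priority=5))
registry.register(Workflow("Library", r"\\fileserver\books", "EPUB", ("German", "Latin")))
print(registry.workflow_for(r"\\fileserver\invoices").output_format)
```

Rejecting a second workflow for the same source keeps the mapping unambiguous: every incoming file has exactly one set of processing parameters.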

Processing Steps

A workflow in ABBYY Recognition Server typically includes up to six configurable stages. Each workflow runs independently of others according to its own schedule and priority.

Six Stages of Document Processing

1. Scanning/Import of images. Images can be either scanned by an operator on the Scanning Station and then sent to ABBYY Recognition Server, or automatically imported by ABBYY Recognition Server from an input folder (network folder, FTP folder, SharePoint® library, or mailbox). ABBYY Recognition Server arranges image files in a queue to process them automatically according to priorities.

2. Recognition. The OCR process runs automatically on the Processing Station. If several Processing Stations are installed in the system, the files will be distributed among these Processing Stations evenly for optimal performance. Deploying additional Processing Stations brings a linear increase in OCR speed.

3. Verification (optional). In some cases, for example when digitizing books, verification of the recognition results might be necessary. Verification Stations allow operators to check all documents or only documents below a certain accuracy threshold.

4. Document separation (optional). When batch scanning or import is performed, document separation may be required. The documents can be separated using blank separator sheets, barcodes or a fixed number of pages per document. Separation can also be done according to a scripted rule.

5. Classification and indexing (optional). Indexing of documents can be done either automatically by a script, or by an operator on the Indexing Station, which allows the operator to manually select the document type and assign document attributes. The operator can also verify the data that has been populated by the script.

6. Export. In the final stage, ABBYY Recognition Server delivers the output documents to their destination (which can be a network folder, a SharePoint document library, or an e-mail address). Additionally, scripts can be applied for intelligent routing and delivery of documents to ECM systems based on document types and attributes.
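
The script-based routing mentioned in the export stage can be thought of as a list of rules that are checked in order. The product itself offers scripting in VBScript and JScript; the Python sketch below only illustrates the logic, and the rule structure, attribute names and destination paths are all hypothetical.

```python
def choose_destination(document_type, attributes, rules, default):
    """Return the destination of the first rule that matches the type and attributes."""
    for rule in rules:
        if rule["document_type"] != document_type:
            continue
        required = rule.get("attributes", {})
        if all(attributes.get(key) == value for key, value in required.items()):
            return rule["destination"]
    return default


# Hypothetical routing rules; more specific rules come first.
rules = [
    {"document_type": "invoice", "attributes": {"region": "EU"}, "destination": "ecm/invoices/eu"},
    {"document_type": "invoice", "destination": "ecm/invoices/other"},
]

print(choose_destination("invoice", {"region": "EU"}, rules, "ecm/unsorted"))
print(choose_destination("contract", {}, rules, "ecm/unsorted"))
```

Because the first matching rule wins, the order of the rules matters, and a default destination catches any document that no rule covers.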

Recognition Server is administered via a convenient interface based on the Microsoft Management Console. It allows the administrator to configure the system and monitor its activity: to set processing parameters, to manage licenses, stations, user permissions, processing queues and to view logs.

With the priority management and scheduling features, the administrator can control the order in which the documents are processed and use the stations’ hardware resources efficiently by scheduling OCR for night hours or weekends.
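
A schedule that limits OCR to night hours and weekends boils down to a simple time check. The following Python sketch assumes a night window from 20:00 to 06:00 and treats Saturday and Sunday as the weekend; both are example values chosen for illustration, not product defaults.

```python
from datetime import datetime, time


def in_processing_window(moment, night_start=time(20, 0), night_end=time(6, 0)):
    """Return True if `moment` falls on a weekend or inside the nightly window."""
    if moment.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        return True
    current = moment.time()
    # The window wraps around midnight, so either condition is enough.
    return current >= night_start or current < night_end


print(in_processing_window(datetime(2024, 1, 3, 12, 0)))   # Wednesday at noon
print(in_processing_window(datetime(2024, 1, 3, 23, 0)))   # Wednesday at 23:00
print(in_processing_window(datetime(2024, 1, 6, 12, 0)))   # Saturday at noon
```

A scheduler could call such a check before handing queued documents to the Processing Stations, leaving the hardware free during working hours.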

Benefits

Increase your business competitiveness
This high-performance, highly scalable technology helps you speed up decision-making processes, provide instant and efficient services to your clients, and attract new customers and businesses.
Enjoy the new level of Arabic recognition
ABBYY Recognition Server provides fast document capture with 99%* accuracy for 190 languages including Arabic, which is an unprecedented success in OCR technologies.
Reduce cost of your business processes
Digitizing your workflow and archive reduces the cost of paper, hard-copy storage, manual entry and processing, saving you money and man-hours.
Easily set and forget
ABBYY Recognition Server has an intuitive user interface that provides quick, simple setup and implementation, including ready-to-use demo projects, with no training needed and fast technical support.
Get fast ROI
The flexible licensing of ABBYY technologies, their high scalability and 24/7 automated performance allow you to achieve a fast ROI.
ABBYY technology seamlessly integrates with your existing environment
ABBYY Recognition Server has a built-in mechanism for integrating with Microsoft SharePoint, and thanks to its open API you can also smoothly integrate it into other existing workflow systems with no additional expense or effort.