Introduction
My team and I at Mikrocop d.o.o. have been testing a variety of products for use in our enterprise content management system. We needed products that would let us extract document contents to build a search index and convert documents to PDF or thumbnails, so that we could provide a unified interface through which our users can view documents on various devices.
Content extraction
A cornerstone of every content management system is the search engine used to locate documents once they are stored. To give users the widest possible range of search queries, it is necessary to build a search index that contains not only the metadata of each file, but also its contents.
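As a rough illustration of indexing metadata and contents together, the following minimal sketch builds a small in-memory inverted index. It is not our production implementation and does not use any Aspose API; the `SearchIndex` class, the `tokenize` helper and the sample documents are hypothetical, and the text passed as `content` is assumed to have already been extracted from the file.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Hypothetical in-memory inverted index over metadata and contents."""

    def __init__(self):
        # Maps each token to the set of document ids that contain it.
        self._postings = defaultdict(set)

    def add(self, doc_id, metadata, content):
        """Index the metadata values and the extracted text of one document."""
        for value in metadata.values():
            for token in tokenize(str(value)):
                self._postings[token].add(doc_id)
        for token in tokenize(content):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            docs = self._postings.get(token, set())
            result = set(docs) if result is None else result & docs
        return result


# Sample data, invented for illustration only.
index = SearchIndex()
index.add("doc-1", {"title": "Annual report", "author": "Novak"},
          "Revenue grew in the third quarter.")
index.add("doc-2", {"title": "Meeting notes", "author": "Kovac"},
          "Discussed the annual budget.")
```

With this sketch, a query such as `index.search("annual budget")` matches a document whether the words appear in its metadata, its contents, or both, which is the reason for indexing the two together.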
Since we were required to support a wide variety of file types, including Microsoft Office, OpenOffice, TIF, PDF, MSG and RTF files, we were looking for products that would support as many of those file types as possible, without dependencies that would prevent them from being used on servers. We tested a variety of products, of which Aspose proved to be both the easiest to use and, in most cases, the fastest at extracting text. Aspose proved capable of extracting content from all of those file types, including any attachments embedded within them. Support for older Microsoft Office formats will also allow us to provide the same functionality for older files archived within our system.
Thumbnail generation
To make it easier for our customers to find the documents they are looking for, and to reduce the number of documents downloaded, we decided to convert the first page of each document to a thumbnail. This allows users to preview a document before they start a download, reducing the time it takes them to locate the document they need.
As with our requirements for text extraction, we were looking for a product that would support a wide variety of document types without any external dependencies. Aspose once again proved to be the best choice. We have not found any other product that matches Aspose in terms of speed or ease of use.
Conversion to PDF
Because our system is required to support a wide variety of devices, from desktop computers to tablets, we faced the challenge of providing a unified interface for reading documents. We decided to convert documents to PDF and display them using PDF.js.
The excellent support for document conversion in Aspose proved more than capable of this task and, just as with thumbnail generation, proved faster than the competing products we tested.
Conclusion
Aspose.Total supports all our needs, making it easy to handle various types of documents. It offers more than adequate performance, which is critical in big data cloud systems. While reviewing the components we also ran into problems, which Aspose technical support resolved quickly.