SOURCE: Connotate


October 03, 2012 08:00 ET

Connotate Expands Automation Solution to Include Precise PDF Extraction

Transformation of Content Offers Significant Productivity Gains for Clients in Publishing, Finance, Legal Services, Science, Technology and Government

NEW BRUNSWICK, NJ--(Marketwire - Oct 3, 2012) - Connotate, the leading provider of solutions that help organizations automate data collection from the Web, today announced new capabilities for extracting data from unstructured flat-file documents by transforming them into higher-value structured content with style information and metadata.

Firms in many industries have been searching for a way to automate data extraction and do it with precision. For example, PDF files are widely used in financial reporting and countless hours are spent manually extracting tables and commentary to repurpose documents for new uses.

Connotate's unique solution precisely analyzes input from PDF, HTML, RTF, OCR output and other formats, transforming it into easy-to-manage output with style information and metadata. This enables organizations to process and analyze higher volumes of data, reducing costs and increasing productivity. Use cases include:

  • Finance - convert PDF to output that can be validated against schemas based on International Financial Reporting Standards (IFRS) or Generally Accepted Accounting Principles (GAAP)

  • Magazines, newspapers and scientific journals - automatically create a searchable directory of articles by identifying and extracting critical data for the abstract (author, publisher, etc.)

  • Legal - for PDFs containing multiple case documents, segment data case-by-case, extracting court name, case number, jurisdiction, presiding judge, date, judgment, etc.

  • Government, aerospace and defense technical documentation - convert legacy content (PDF) into standard military or government data modules

"In listening to our customer base, we heard very clearly the need for automated collection and transformation of more types of data, and PDF extraction was at the top of the list," said Isai Shenker, vice president of product management for Connotate. "We are delighted to offer a proven solution to meet this need by working with our partner, Khemeia Technologies."

Today, customers in the publishing, financial services, life sciences, automotive, retail, government, manufacturing and consumer packaged goods sectors are using Connotate to reduce costs and generate new revenue streams. This fall, Connotate is conducting a series of webinars focusing on best practices in Web data extraction. The next free webinar in the series, titled "Stop Searching and Start Selling: How Automation Enhances B2B Sales Intelligence," will be held on October 3, 2012.


About Connotate
Connotate puts the power of Web data monitoring and collection into the hands of the business user. Connotate delivers the scalability, reliability and resiliency necessary to drive strategic value from dynamic Web sources. Offering benefits that include increased productivity, competitive advantage and dramatic operational cost savings, Connotate counts global businesses such as McGraw-Hill, Associated Press and Thomson Reuters among its growing list of customers. Connotate has been named a KMWorld "Trend-Setting Product" for the past seven years.

Contact Information

  • Media Contacts:
    Gina Cerami
    Vice President of Marketing, Connotate

    Heather Fitzsimmons
    Mindshare PR