SOURCE: Connotate, Inc.

April 09, 2014 09:46 ET

Patent for Automated Quality Control of Webdata Awarded to Connotate, Inc.

Innovations in Machine Learning and Advanced Pattern Recognition to Ensure Consistency of Large-Scale Webdata Extractions

NEW BRUNSWICK, NJ--(Marketwired - Apr 9, 2014) -  Connotate, the enterprise-grade datapipe for Web-sourced information, or Webdata, today announced the award of its sixth patent, U.S. Patent No. 8,666,913. Connotate received this patent for its innovative use of advanced pattern recognition techniques that automatically identify inconsistencies in data formats during large-scale Web data extractions.

Connotate's newest patent focuses on a critical aspect of high-scale Web data delivery: automating the quality control process that ensures extracted data is properly structured and consistently formatted. The approach relies on advanced machine learning algorithms and pattern detection. Its algorithms first monitor the flow of extracted data in the background to "learn" the appropriate formats for data -- for instance, if a date consistently appears as "mm/dd/yyyy," the system notes this formatting. The algorithms then search for exceptions in subsequent data flows, either automatically correcting the issue or alerting human operators to more extensive complications.
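
The learn-then-check flow described above can be illustrated with a small Python sketch. This is not Connotate's patented method, which the release does not detail; it is a minimal illustration that assumes a format can be summarized as a character "shape" (digits become 9, letters become A). The function names shape, learn_format, and check, and the 90% threshold, are hypothetical choices made for this example.

```python
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a string to a format signature: digits -> 9, letters -> A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def learn_format(samples, min_share=0.9):
    """Return the dominant shape if it covers at least min_share of samples.

    min_share is an assumed threshold, not a value from the source.
    """
    counts = Counter(shape(s) for s in samples)
    total = sum(counts.values())
    if total == 0:
        return None
    top, n = counts.most_common(1)[0]
    return top if n / total >= min_share else None


def check(values, expected):
    """Split values into those matching the learned shape and exceptions."""
    ok, exceptions = [], []
    for v in values:
        (ok if shape(v) == expected else exceptions).append(v)
    return ok, exceptions


# Learning phase: observe previously extracted dates.
history = ["04/09/2014", "03/15/2014", "12/01/2013", "01/22/2014"]
expected = learn_format(history)

# Checking phase: flag values in a later batch that break the learned format.
ok, exceptions = check(["04/10/2014", "2014-04-11", "April 12"], expected)
print(expected, ok, exceptions)
```

In this sketch the exceptions list is where a real system would either attempt an automatic repair or alert an operator; that step is left out because the release does not say how corrections are made.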

The validity check can be applied to a broad range of data formats, including dates, names, addresses, phone numbers, and part numbers, because the platform learns a new format each time it encounters one. The patented technique extends to anomalies in any large-scale data flow beyond Webdata, so it has broad potential application to help enterprises tame and structure their Big Data flows from any source.

"This patent is a critical advancement that enables Connotate's Web scraping platform to automatically identify patterns and regularities in data formats -- like a date -- and immediately repair inconsistencies or alert responsible parties while training or in production," said Vince Sgro, the company's co-founder and chief technology officer. "The ability to run a validity check at the time of training or execution leads to a persistent, scalable extraction. It's one of the things that truly sets us apart," Mr. Sgro continued.

This is the latest addition to Connotate's Web data extraction technology patent portfolio. Its core technology is based on visual abstraction techniques that enable users to quickly identify and automate the extraction of data from Web pages through a point-and-click interface. The platform handles millions of extractions daily, delivering terabytes of clean Webdata to fuel analytics applications and large-scale information aggregation for enterprise and government.

About Connotate

Connotate puts the power of Web data monitoring and collection into the hands of the business user. Connotate is a Web data scraping technology that delivers the scalability, reliability and resiliency necessary to derive strategic value from dynamic Web sources. Connotate's growing customer list includes global businesses such as McGraw-Hill, Associated Press and Thomson Reuters. The Connotate solution has been named a KMWorld "Trend-Setting Product" for the past eight years.

For more information, please visit www.connotate.com or www.connotate.co.uk and follow us on Twitter at @Connotate.

Contact Information