SOURCE: Basis Technology


February 22, 2011 09:00 ET

Basis Technology Releases Rosette Linguistics Platform 7.3 Expanding Language Coverage and Search Engine Connectivity

Advanced Text Analytics for Search-Based Applications Powered by Apache Lucene, Apache Solr, dtSearch, and LucidWorks Enterprise

CAMBRIDGE, MA--(Marketwire - February 22, 2011) - Basis Technology Corporation, the leading provider of multilingual text analytics for search-based applications, today released the Rosette® Linguistics Platform version 7.3 with significant improvements in performance, language coverage, and search engine connectivity. Rosette now supports 28 languages with the addition of Finnish, Hebrew, Thai, and Turkish. Bundled connectors enable applications built with Apache Lucene, Apache Solr, dtSearch Text Retrieval Engine, and LucidWorks Enterprise to incorporate advanced linguistic capabilities, including document language identification, multilingual search, entity extraction, and entity resolution.

Integration with Search Engine Frameworks
Rosette connectors enable a variety of linguistic capabilities to be added to search-based applications. The Rosette Language Identifier determines the written language and character encoding of each indexed document, and is capable of recognizing 55 languages and 40 encodings. Rosette Base Linguistics tokenizes and lemmatizes text in 28 languages at index or query time. Determining the "lemma," or "dictionary form," of each indexed word is essential for increasing search engine relevancy. This technique enables queries containing, for example, the word "speak" to match documents containing the word "spoke" (when used as a past tense verb instead of a noun). The Rosette Entity Extractor automatically extracts "entities" -- i.e., names of people, places, organizations -- to enable document clustering and faceted search.
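The lemma-based matching described above can be illustrated with a minimal Python sketch. It is not Rosette code and does not use any Rosette API: the lookup table LEMMAS and the functions lemmatize, build_index, and search are hypothetical names invented for this illustration. The toy table also maps "spoke" to "speak" unconditionally, whereas a real lemmatizer would use context to tell the verb from the noun.

```python
# Toy lemma table (an assumption for illustration only); a real lemmatizer
# relies on morphological analysis and context, not a fixed lookup.
LEMMAS = {
    "spoke": "speak",
    "spoken": "speak",
    "speaks": "speak",
    "speaking": "speak",
}


def lemmatize(token):
    """Return the dictionary form of a token, or the token itself if unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs):
    """Map each lemma to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for raw in text.split():
            lemma = lemmatize(raw.strip('.,!?"'))
            index.setdefault(lemma, set()).add(doc_id)
    return index


def search(index, query):
    """Lemmatize the query term the same way as the indexed text, then look it up."""
    return sorted(index.get(lemmatize(query), set()))


docs = {
    1: "She spoke at the conference.",
    2: "The report was filed.",
}
index = build_index(docs)
print(search(index, "speak"))  # document 1 matches even though it contains "spoke"
```

Because the same normalization is applied at index time and at query time, a query for "speak" and a query for "spoke" both reach the same index entry.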

Rosette 7.3 includes connectors for Apache Lucene versions 2.3 to 3.0; Apache Solr versions 1.3 and 1.4; dtSearch Text Retrieval Engine version 7.66 and earlier; and LucidWorks Enterprise version 1.5.

Expanded Language Coverage
With version 7.3, Rosette Base Linguistics adds tokenization support for Finnish, Turkish, and Thai, and lemmatization support for Hebrew. The Rosette Entity Extractor for Urdu has been upgraded to take advantage of improved statistical modeling techniques, and accuracy for several existing languages has been improved with the addition of new data.

"The world's most successful global companies recognize that high-quality text search is key to using any application. That's why Basis Technology is committed to expanding and improving our linguistics and search engine support," said Steve Kearns, Product Manager at Basis Technology. "The search market as a whole is becoming more sophisticated, and search-savvy developers demand the linguistic analysis Rosette provides."

Rosette 7.3 is available now for evaluation. Contact Basis Technology for license and pricing information at +1-617-386-2090.

About Basis Technology
Basis Technology develops innovative products and solutions incorporating multilingual text analytics and digital forensics. Our Rosette® Linguistics Platform provides morphological analysis, entity extraction, name matching, and name translation, yielding useful information from unstructured data in such fields as information retrieval, government intelligence, e-discovery, and financial compliance. Our digital forensics team pioneers better, faster, and cheaper techniques to extract forensic evidence, keeping government and law enforcement ahead of the exponential growth of data storage volumes.

Our products and services are used by over 250 major organizations, including EMC, Endeca, Exalead/Dassault, Fujitsu, Google, Hewlett-Packard, Microsoft, Oracle, and governments around the world. To learn more, call 800-697-2062.
