MONTREAL, QC--(Marketwired - Dec 21, 2016) - MALUUBA, a Canadian deep-learning company helping machines think, reason and communicate with human-like intelligence, today announced the public release of two sophisticated natural language understanding datasets. In making these resources available, the company seeks to further advance and facilitate breakthrough innovation in artificial intelligence research.
Created by people rather than generated synthetically, Maluuba's new datasets explore fundamental aspects of human capabilities in literacy and conversation. The datasets are complex and have been developed for research in machine reading comprehension, goal-oriented dialogue systems and conversational interfaces.
"We believe that language understanding is fundamental to solving artificial intelligence," said Kaheer Suleman, cofounder and CTO of Maluuba. "Our hope is that the Maluuba datasets will push forward the field of AI and natural language, so that collectively we can reach our goal of a world where machines communicate intuitively with humans."
Maluuba's first dataset, NewsQA, was developed to train algorithms capable of answering complex questions that require human-level comprehension and reasoning skills. Leveraging CNN articles from the DeepMind Q&A Dataset, Maluuba prepared a crowd-sourced machine reading corpus of 120,000 question-answer pairs. The questions were collected from contributors working with incomplete information about each article, a methodology designed to foster curiosity. Answering the questions requires reasoning such as synthesis, inference and the handling of ambiguity, unlike other datasets that have focused on larger volumes of simpler questions. The result is a robust dataset that will further drive natural language research.
"The efforts put into developing this dataset will help drive progress in machine reading comprehension," said Dr. Aaron Courville, Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal.
Maluuba's second dataset, Frames, consists of 19,986 turns that can be used to help train deep-learning algorithms on natural conversations. These text-based conversations were recorded between two humans, simulating the conversation between a vacation seeker and a travel agent. The conversations are free-flowing, moving from topic to topic, such as flights, dates, accommodation and other questions that naturally occur in such a conversation. Other dialogue datasets assume that the dialogue is memoryless: only one set of user constraints is considered and remembered at each step of the dialogue. The Frames dataset does not make this assumption, and will therefore require the development of completely new state-tracking models.
"This is an important new dataset that extends standard dialogue tasks into areas such as comparison and exploration of different customer options," said Dr. Oliver Lemon, Professor, School of Mathematical and Computer Sciences (MACS), Heriot-Watt University. "Building conversational systems which can support such tasks is a fascinating challenge, and this dataset will help us to do that."
"Having access to datasets such as Maluuba's Frames is invaluable in helping AI researchers drive breakthroughs in goal-oriented dialogue," said Dr. Verena Rieser, Associate Professor, School of Mathematical and Computer Sciences (MACS), Heriot-Watt University. "At the MACS Interaction Lab, this dataset will greatly benefit the academic research we are conducting in spoken dialogue systems and response generation."
Maluuba's datasets are available to the research community at https://datasets.maluuba.com.
Maluuba Inc. is a global natural language understanding company founded in 2011. The company's goal is to create a world where intelligent machines work hand-in-hand with humans to advance the collective intelligence of the human species. In 2016, Maluuba opened a research lab in Montreal dedicated to solving fundamental problems in language understanding for innovative products that will further advance AI systems. For more information, visit www.maluuba.com.