
Infobright Blog


The Conference in Moscow - Part 2

by Dominik Slezak     Thu, Jun 30, 2011

Hello,

Although I had to leave Moscow two days ago, the events there continued until today. The rough set conference transitioned smoothly into the pattern recognition conference, which opened with a keynote by Rakesh Agrawal, widely known as one of the inventors of association rules. Interestingly, Rakesh did not talk about association rules at all. Rather than on any specific data mining algorithms, he focused on their potential applications. In this respect, he cited Crossing the Chasm, the famous book about the positioning problems of start-up companies, which, according to Rakesh, may also be relevant to domains of computer science such as data mining.

Clearly, the algorithmic foundations are important, but the only chance for data mining to remain a domain of active research is to find its way into modern applications. As an example, Rakesh described his project on enriching textbook content with the most relevant links acquired from the Internet. More precisely, the idea is to use data mining techniques to find, for each section of an input textbook, the most relevant versions of the most relevant Wikipedia articles. The electronic textbook is then made available together with those additional links and images, which matters especially for sections whose coverage of the presented topics is incomplete. Actually, the choice of appropriate images opened a very interesting discussion during the talk. How many images should be displayed? How should images be diversified across the textbook sections? These are only a few of the technical problems to be solved. More generally, the search for the most relevant materials can indeed be stated as a data mining problem (first you need to define what relevance means; then you need to optimize the search process; and then you should expect verification and a need to re-tune the algorithms), yet it is just one of the components serving the ultimate purpose of providing end users with good results.

Before leaving, I also had the occasion to attend some industry talks delivered by the main conference sponsors. Talking about commercial products in front of an academic audience is certainly not an easy task, as you should speak at a relatively high, intuitive level, but at the same time remain ready to answer deeply technical questions. From this perspective, I especially enjoyed the lecture by Aram Pakhchanian from ABBYY, one of the leading Russian IT companies, specializing in OCR, capturing data from documents, and automatic translation, among other software tools. Aram explained how the so-called IPA (Integrity, Purposefulness, Adaptivity) principle helped ABBYY develop their products. Let me cite a fragment of Aram's abstract published in the online expert session materials (together with abstracts of talks contributed by such companies as Yandex, Avicomp, Witology, Forecsys, and Infobright, as well as a number of speeches by top-level academic experts attending both conferences):

This core idea of IPA was borrowed from the observation of how living species detect and identify objects around them. The object identification process goes through the following steps: first, an attempt to identify the object in general, as an integral set of its features. At that step we use only features that apply to the object as a whole. The result is not yet a final decision but a set of hypotheses. As each of them is being addressed, the system purposefully uses only detectors that measure features that could help to support or reject the specific hypothesis. This process leads to each of the detectors assigning an integral measure of probability. The final decision is made by comparing weighted average probabilities coming from each of the detectors. As the system goes through this process, it learns how to better fit its recognition process to specific overall conditions, which allows the system to adapt and improve its behaviour in these conditions.
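To make the quoted steps a bit more concrete, here is a minimal Python sketch of one possible reading of them. It is not ABBYY's implementation, and everything in it is a hypothetical assumption: holistic hypotheses are taken as the nearest class prototypes on whole-object features, each hypothesis is scored by a weighted average over only the detectors relevant to it, and adaptivity is modeled as a simple multiplicative weight update after feedback. The names (`Detector`, `holistic_hypotheses`, `identify`, `adapt`), the features, and the toy letter-recognition data are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Features = Dict[str, float]


@dataclass
class Detector:
    """A targeted detector (hypothetical): scores how strongly the features
    support one hypothesis, as a probability-like value in [0, 1]."""
    name: str
    relevant_to: Set[str]                     # hypotheses this detector can test
    score: Callable[[str, Features], float]   # (hypothesis, features) -> [0, 1]
    weight: float = 1.0                       # adjusted by adapt()


def holistic_hypotheses(features: Features, prototypes: Dict[str, Features],
                        top_k: int = 2) -> List[str]:
    """Step 1: compare only whole-object features (the keys of each prototype)
    and keep the top_k closest classes as hypotheses, not yet a decision."""
    def distance(proto: Features) -> float:
        return sum((features[k] - v) ** 2 for k, v in proto.items())
    return sorted(prototypes, key=lambda label: distance(prototypes[label]))[:top_k]


def identify(features: Features, prototypes: Dict[str, Features],
             detectors: List[Detector], top_k: int = 2) -> Tuple[str, Dict[str, float]]:
    """Steps 2-3: for each hypothesis run only the detectors relevant to it,
    combine their outputs as a weighted average, and pick the best hypothesis."""
    scores: Dict[str, float] = {}
    for h in holistic_hypotheses(features, prototypes, top_k):
        used = [d for d in detectors if h in d.relevant_to]
        if not used:
            continue  # no detector can test this hypothesis
        total = sum(d.weight for d in used)
        scores[h] = sum(d.weight * d.score(h, features) for d in used) / total
    if not scores:
        raise ValueError("no detector covers any of the hypotheses")
    return max(scores, key=scores.get), scores


def adapt(features: Features, true_label: str, detectors: List[Detector],
          rate: float = 0.1) -> None:
    """Adaptivity (assumed update rule): after feedback, strengthen detectors
    that scored the true label high and weaken those that scored it low.
    Weights stay positive as long as rate < 2."""
    for d in detectors:
        if true_label in d.relevant_to:
            d.weight *= 1.0 + rate * (d.score(true_label, features) - 0.5)


# Toy example: telling the letters O, D and I apart (all values invented).
prototypes = {"O": {"aspect": 1.0}, "D": {"aspect": 1.1}, "I": {"aspect": 4.0}}
detectors = [
    Detector("loop", {"O", "D"}, lambda h, f: f["loop"]),
    Detector("straight_left", {"O", "D", "I"},
             lambda h, f: 1.0 - f["straight_left"] if h == "O" else f["straight_left"]),
]
glyph = {"aspect": 1.08, "loop": 1.0, "straight_left": 0.9}

best, scores = identify(glyph, prototypes, detectors)
print(best, scores)  # "I" is never scored: the holistic step already ruled it out
adapt(glyph, "D", detectors)  # suppose feedback confirms the glyph is a D
print({d.name: round(d.weight, 3) for d in detectors})
```

In this sketch, keeping the cheap holistic step separate means the targeted detectors run only for the few surviving hypotheses, which is one plausible way to interpret the "purposefulness" part of the principle.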

In my opinion, the two talks described above, by Rakesh and Aram, are perfect examples of how to embed various types of intelligent algorithms into the internals of technologies that seem to provide quite standard (or, if not fully standard, then still very simple and intuitive) functionality to end users. This is also the key objective for us at Infobright, for instance when developing all those intelligent techniques of interaction between the exact and rough data layers. You can be sure that I will write more about it in the next couple of days: the rough set and pattern recognition events are now over, but in the meantime my favorite conference on intelligent systems has begun in Warsaw...

Best greetings,

Dominik



Infobright Community Edition 4.0 Released

by Jeff Kibler     Wed, Jun 29, 2011


It's hot out of the oven, and best of all, it's early! Infobright Community Edition 4.0 is now generally available on our download page. For your convenience, the links below will guide you through the incredible enhancements provided in 4.0 and point you to the download location. Our virtual machines will be updated within the next two weeks to include not only Infobright CE 4.0 but also the upgrades to Jaspersoft, Talend, Pentaho, and Actuate/BIRT.

Release Announcement: Blog Posting which Details Features
Release Notes: ICE 4.0.3 Release Notes (PDF)
Release Video: YouTube Video


Registration Page: Register for User Guide, Bloor Report, and potentially win an Enterprise Edition Dev License
Download Page: Download Binaries

