
Infobright Blog


CIKM 2010 - Our Paper / Keynote 3

by Dominik Slezak     Sun, Oct 31, 2010

Hello!

Finally, let me write about our paper on Injecting Domain Knowledge into a Granular Database Engine. CIKM 2010 is over. (CIKM 2011 will be in Glasgow.) I’m flying back to Poland and I have lots of time to gather thoughts.

The “Granular Database Engine” in the title is our ICE / IEE. In recent blogs, on forums and in publications we have started to refer to our knowledge grid based operations as an example of the granular computing methodology. See slides 8 and 9 in the attachment below. This is our CIKM poster presentation… Yes, I know it’s not a perfect layout for a poster. :-) You can see that we made an effort to standardize the rough operations (at the level of data packs or, in other words, blocks or granules) and the exact operations (on single values and rows) applied during query execution. Rough functions work with rough values (aka knowledge nodes) and help the query engine minimize and optimize data access while filtering, joining, aggregating et cetera.
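To make the rough / exact distinction concrete, here is a minimal sketch in Python. It assumes the simplest possible rough value, a min/max pair per data pack, and a tiny pack size; the names `DataPack`, `build_packs` and `count_greater_than` are invented for this illustration and are not part of ICE / IEE.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataPack:
    values: List[int]  # exact level: the rows themselves
    rough_min: int     # rough level: a summary kept alongside the pack
    rough_max: int


def build_packs(column: List[int], pack_size: int) -> List[DataPack]:
    """Split a column into packs and compute a min/max rough value for each."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(DataPack(chunk, min(chunk), max(chunk)))
    return packs


def count_greater_than(packs: List[DataPack], threshold: int) -> Tuple[int, int]:
    """Count rows with value > threshold; also report how many packs were opened."""
    count = 0
    opened = 0
    for pack in packs:
        if pack.rough_max <= threshold:
            # Irrelevant pack: no row can qualify, so skip it entirely.
            continue
        if pack.rough_min > threshold:
            # Fully relevant pack: every row qualifies, no need to read values.
            count += len(pack.values)
            continue
        # Suspect pack: the rough value cannot decide, fall back to exact values.
        opened += 1
        count += sum(1 for v in pack.values if v > threshold)
    return count, opened


if __name__ == "__main__":
    column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 20, 1]
    packs = build_packs(column, pack_size=4)
    print(count_greater_than(packs, threshold=5))  # prints (6, 1)
```

Only one of the three packs has to be opened here; the other two are resolved from their rough values alone, which is the kind of saving rough operations are meant to deliver.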

So what about Injecting Domain Knowledge? It can mean many things. In my previous post, I referred to examples of expressing domain knowledge by means of data schema. However, I also mentioned that users often want to build their applications over schemas that are as simple as possible, where domain knowledge (for example knowledge about different sub-types of documents) is not exposed. We'd like to go even further. In order to maintain the simplicity and scalability of our architecture, we assume that our query engine modules should not be aware of any domain knowledge either. In other words, even if we know that the values of a column declared as VARCHAR(700) are actually URLs, the query engine should treat that column like any other VARCHAR. On the other hand, both data storage and the creation of rough values may use this knowledge quite efficiently. (See slides 10-13.)

Our paper also discusses other motivations for injecting domain knowledge in this sense. Generally, we want to encourage domain experts and application developers to specify semantic rules that would guide us towards better data storage and representation. This way we want to bridge the so-called semantic gap, which in our case means taking advantage of the actual meaning of values stored in long VARCHAR columns, even if users and the software applications built on top of the RDBMS are not prepared to deal with this type of knowledge and, more importantly, with changes in the available knowledge.

Speaking of semantics, it is high time to write about the keynote speech by Gregory Grefenstette on Use of Semantics in Real Life Applications. Drawing on many years of industry experience, most recently at Exalead (a provider of enterprise search solutions), Gregory explained what semantics may mean to ordinary people and presented several examples of semantics-based applications. He also proposed his own definition of (the usage of) semantics in text search:

  1. Anything added that was not in original texts
  2. Anything fancier than indexing original texts
  3. Anything that has some form of a “language”

Let me emphasize that this definition is formulated for a specific area of research and applications. For example, indexing refers to text - not databases. On the other hand, it’s tempting to reinterpret the above criteria in database terminology, particularly for the case of ICE / IEE and our CIKM paper. So, let’s try:

  1. Blocks of values of VARCHAR columns are better stored using additional knowledge
  2. Rough values do not refer to original values but take advantage of their structure
  3. Experts use a “language” of semantic rules (so far we considered simplified regular expressions) to specify knowledge about data content and internal modules use those rules to decompose and recompose original values, as well as to create and correctly interpret rough values
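
The third point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule below is an ordinary Python regular expression rather than the simplified rule language from our paper, and `URL_RULE`, `store_block`, `load_block` and `may_contain_host` are names invented for this example. Values that do not match the rule are kept verbatim as outliers, so the query engine still gets back exactly the VARCHAR values that were loaded.

```python
import re
from typing import Dict, List, Optional

# Hypothetical semantic rule supplied by a domain expert: "this column holds URLs".
URL_RULE = re.compile(
    r"(?P<scheme>[a-z][a-z0-9+.-]*)://(?P<host>[^/]*)(?P<path>/.*)?"
)


def decompose(value: str) -> Optional[Dict[str, str]]:
    """Split a value into parts according to the rule, or return None if it does not match."""
    m = URL_RULE.fullmatch(value)
    if m is None:
        return None
    return {"scheme": m.group("scheme"), "host": m.group("host"),
            "path": m.group("path") or ""}


def recompose(parts: Dict[str, str]) -> str:
    """Rebuild the original value from its parts."""
    return f"{parts['scheme']}://{parts['host']}{parts['path']}"


def store_block(values: List[str]) -> dict:
    """Store a block of values as parts, plus outliers and a rough value over hosts."""
    rows = [decompose(v) for v in values]
    return {
        "parts": rows,  # None marks a value that did not match the rule
        "outliers": {i: v for i, (v, r) in enumerate(zip(values, rows)) if r is None},
        "rough_hosts": {r["host"] for r in rows if r is not None},
    }


def load_block(block: dict) -> List[str]:
    """Return the original values, recomposing those that were decomposed."""
    return [block["outliers"][i] if r is None else recompose(r)
            for i, r in enumerate(block["parts"])]


def may_contain_host(block: dict, host: str) -> bool:
    """Rough check: False means the block can be skipped without reading its values."""
    # Stay conservative when outliers exist, since the rule says nothing about them.
    return host in block["rough_hosts"] or bool(block["outliers"])


if __name__ == "__main__":
    values = ["http://www.infobright.com/blog", "https://example.org", "not a url"]
    block = store_block(values)
    print(load_block(block) == values)
    print(may_contain_host(store_block(values[:2]), "cikm2010.org"))
```

The design choice worth noting is that only `store_block`, `load_block` and the rough check know about the rule; anything that consumes the loaded values sees plain strings, in line with keeping the query engine unaware of domain knowledge.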

Surely, our experience with understanding and using semantics is far shorter than that of Gregory and many other attendees of CIKM 2010. We are just at the beginning of our “semantic adventure”. But this conference convinced me that we are on the right path. Do you have other thoughts or suggestions? I’m opening a new forum thread on using data content semantics in databases.

Best greetings,

Dominik

Attachment: Our CIKM 2010 paper poster/presentation
