Can somebody explain in one or two sentences?
One aspect is an extension of set theory, where an element (1) can belong to a set, (2) can belong to the complement of a set, or (3) can belong to the boundary of the set, so we cannot precisely say whether it belongs to the set or not.
It is interesting to note that rough sets were introduced more than 25 years ago and have so far been used mainly in data mining and knowledge discovery. As an example, consider the automatic detection of some important events in data (like fraud detection or credit risk assessment). In such an application, the rough set approach attempts to create a decision model that automatically identifies: (1) the cases that support the given event; (2) the cases that do not support the given event; and (3) the cases that remain undecided.
Methodologies based on rough sets are, however, applicable to many other domains too.
At Infobright, for example, we follow the rough set approach to identify: (1) the data portions that are fully relevant to the given query execution; (2) the data portions that are fully irrelevant to it; and (3) the data portions that remain undecided. The corresponding data identification mechanism is based on Infobright's Knowledge Grid, where compact information about the particular data portions is gathered. The data portions themselves are highly compressed and - for analytical queries - only the undecided ones need to be accessed.
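As a toy illustration of this three-way classification, here is a sketch assuming simple per-pack (min, max) statistics and a range condition; both are hypothetical simplifications, not the actual Knowledge Grid format:

```python
# Rough-set-style pruning of data packs against a range condition.
# The (min, max) pack statistics and the condition are made up for the
# example; the real Knowledge Grid stores much richer information.

def classify_pack(pack_min, pack_max, lo, hi):
    """Classify a data pack against the condition lo <= value <= hi."""
    if pack_max < lo or pack_min > hi:
        return "irrelevant"  # no row can match: skip the pack entirely
    if lo <= pack_min and pack_max <= hi:
        return "relevant"    # every row matches: no need to decompress
    return "suspect"         # undecided: the raw data must be accessed

# Compact (min, max) info about four compressed packs.
packs = [(0, 9), (13, 20), (20, 29), (30, 40)]
labels = [classify_pack(mn, mx, 12, 27) for mn, mx in packs]
print(labels)  # ['irrelevant', 'relevant', 'suspect', 'irrelevant']
```

Only the one "suspect" pack would have to be decompressed to answer the query exactly.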
It is an online database gathering rough set-related academic papers.
Actually, I remember a funny story about Infobright and that database:
A long time ago, just after Infobright had been founded, I was asked whether we were really so unique in using rough sets in database systems. When I answered: “yes, we are fully unique!”, the response was: “but there is that Rough Set Database System available online at http://rsds.univ.rzeszow.pl/ !!!”
Actually, it would be nice to see whether the database system with rough set documents could be run using a database engine based on rough set principles. I know the creators of that online system pretty well, so maybe some day we will try? I guess we’d need better support for document search queries, especially at the level of the Knowledge Grid. However, with ICE open, perhaps we may count on some help from the Community in this respect? It would be really, really, really great!!
Rough sets is an extension of traditional set theory.
In traditional set theory, it is clear whether an element x is in a given set A. Rough sets, by contrast, look at the relationship between a given set A and the class that x belongs to under the indiscernibility relation. The essence is that a higher level, i.e., sets of elements rather than single elements, is studied. Therefore, it might be more accurate to say “an element’s indiscernibility class (1) can be contained in a set ...” rather than “an element (1) can belong to a set”.
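A minimal sketch of this view in code, with a made-up universe and attributes: the indiscernibility classes are formed first, and a target set is then approximated from below and above.

```python
from collections import defaultdict

def indiscernibility_classes(objects, attributes):
    """Group objects sharing the same values on the chosen attributes."""
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)
    return list(classes.values())

def approximations(classes, target):
    """Lower: union of classes inside target; upper: union of classes touching it."""
    lower, upper = set(), set()
    for c in classes:
        if c <= target:
            lower |= c
        if c & target:
            upper |= c
    return lower, upper

objects = {
    "x1": {"color": "red",  "size": "S"},
    "x2": {"color": "red",  "size": "S"},
    "x3": {"color": "blue", "size": "L"},
    "x4": {"color": "blue", "size": "L"},
}
classes = indiscernibility_classes(objects, ["color", "size"])
lower, upper = approximations(classes, {"x1", "x3"})
# Here lower is empty and upper is the whole universe: {"x1", "x3"} is a
# rough set, since each of its elements is indiscernible from one outside it.
```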
You are right that the theory of rough sets is more about working with sets than with elements. This is why, at the level of foundations, rough sets are sometimes discussed together with mereology (http://en.wikipedia.org/wiki/Mereology) or, as another example, with granular computing (http://en.wikipedia.org/wiki/Granular_computing). What we do at Infobright can actually be interpreted as a kind of granular computing but, if you don’t mind, I’ll keep this topic for another post…
One of the points now is how those sets are defined in particular approaches and applications. In the original rough set applications, the sets of interest are the classes of rows with the same values on some subsets of the available columns. In rough set extensions, they may gather the rows with similar but not necessarily the same values, et cetera. At Infobright, in both the ICE and IEE editions, we also work with sets. They can be defined either as the blocks of rows gathered and compressed together, or as the sets of rows satisfying some conditions defined by the SQL statements arriving at the system.
I attach the presentation we did at the 2007 rough set conference in Toronto.
It was more than a year before going ICE. Our product was called Brighthouse then…
Please have a look at the last two slides. They illustrate the foundations of rough sets in two scenarios described previously in this thread. By the way, please let me know in case you’re interested in any of the Toronto 2007 papers.
When you visit the homepage of IRSS (International Rough Set Society), please have a look at IRSS Resources. In “Guides, tutorials etc.”, there are two presentations. You may say that there’s “too much math” there. However, the whole idea is really straightforward, and easy to implement and extend.
The first presentation introduces basic notions. (I suggest reading slides 1-33 and going back to the rest a bit later.) In particular, there’s the framework for removing superfluous attributes (columns) and shortening IF-THEN rules while learning decision (classification, prediction, etc.) models from data. I’ve already mentioned this way of optimization (simplification, clarification) of data-based models in the data mining thread (#3). Such a tendency is visible not only among rough setters but also in many areas of machine learning.
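As a rough sketch of the attribute-removal idea mentioned above (the tiny decision table below is invented for illustration): an attribute is superfluous if, after dropping it, objects that are identical on the remaining attributes still agree on the decision.

```python
def consistent(rows, attrs):
    """True if rows identical on attrs never disagree on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row["decision"]:
            return False  # two indiscernible rows decide differently
        seen[key] = row["decision"]
    return True

# A made-up decision table: two condition attributes and one decision.
rows = [
    {"temp": "high", "wind": "yes", "decision": "stay"},
    {"temp": "high", "wind": "no",  "decision": "stay"},
    {"temp": "low",  "wind": "yes", "decision": "go"},
]
# 'wind' is superfluous here: 'temp' alone still determines the decision,
# while 'wind' alone does not.
```

A reduct is then a minimal subset of attributes that keeps the table consistent; removing any attribute from it would break the property checked above.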
For those of you who may be interested, there are several pages of THE BOOK, i.e. the original book on rough sets by Zdzislaw Pawlak, available at Google Books: http://books.google.com/books?id=MJPLCqIniGsC&hl=en
There are other RS books there as well, unfortunately only a few of them with full text.
Thanks a lot! This is indeed a very good reference. There were some rough set publications before, but in this book Professor Pawlak was able to put everything together very clearly. It’s a pity that only the first few pages are available in the free preview. Still, it’s worth reading them. On the one hand, it’s visible that the original way of constructing knowledge granulations based on indiscernibility relations is different from what we do at Infobright. On the other hand, we do rely on the notions of rough approximations in our technology, and we’ll surely continue to refer even more to the rough set foundations in the future.
It reminds me of another publication. Sorry if I mentioned it before. I hope I’m not getting boring. When I attended SIGMOD 2008, I noticed some familiar printouts at the conference registration desk. I came closer and… those were copies of an even earlier paper (from a decade before the book about rough sets):
Zdzislaw Pawlak: Information systems theoretical foundations. Inf. Syst. 6(3): 205-218 (1981)
It was about the interpretation of data, the information derivable from data, and the querying methodologies related to both. Pawlak’s approach in this respect can be regarded as an alternative to Codd’s framework for relational data models. There are some interesting similarities and dissimilarities.
I’ll try to get a reasonable-quality electronic version of this paper.