Hello,
Some of my academic colleagues still remember the first Joint Rough Set (JRS) Conference held in 2007 in Toronto, opened by speeches from the presidents/CEOs of York U, MaRS and Infobright. Obviously, international rough set conferences have a far longer tradition, but 2007 was the first time that originally separate rough set events were organized at the same time and place. After five years, the rough set community is going back to this idea: JRS 2012 will be held in Chengdu, China. It will be the only major international rough set event in 2012. Given the popularity of rough sets in China and neighboring countries, it has a chance to become the biggest rough set meeting since the first international rough set workshop in 1992.
August is a very good time to visit Chengdu, a historical city surrounded by natural wonders hidden high in the mountains. I still remember seeing Jiuzhaigou, located near Chengdu, in 2006. (I attach a picture of my wife and me taken near the valley.) On the other hand, in 2008, Chengdu suffered a huge earthquake that happened just before another international rough set conference (RSKT 2008). Those participants who managed to reach the city at that time could see that nature can be both beautiful and brutal. Therefore, for many of my friends and colleagues, attending JRS 2012 will mean something more than just yet another academic meeting.
JRS 2012 may also turn out to be very interesting for researchers and practitioners who have never thought about visiting a rough set event before. As an example, let me briefly describe the Data Mining Competition organized in conjunction with the main conference. The classification problem formulated as the goal of the competition has its origin in very recent experiences of my U of Warsaw colleagues with the analysis of biomedical research papers gathered within the PubMed Central repository. (Here is a link to one of our papers about it.) The data set provided for the competition contains information about 20,000 journal articles (split into training and test samples) described by 25,640 attributes: numeric columns expressing the association strengths of articles to the MeSH ontology terms. The task is to train a classifier that automatically tags research papers with the most appropriate MeSH subheadings; in other words, one that automatically classifies papers into some predefined topic categories. The classifier with the highest accuracy (measured by comparison with tags assigned manually by medical experts) wins the competition.
One might say that such data mining competitions are not so interesting for database people. However, there are at least two important database-related aspects here. First of all, the above-described data set was created using database tools. As you may remember (I have blogged about it a few times), my U of Warsaw colleagues use Infobright, MongoDB and several other technologies to store very detailed information about documents downloaded from various sources. In particular, this enabled the competition organizers to compute the weights of associations between articles and ontology terms more thoroughly. Secondly, it is interesting to discuss whether standard algorithms for classifier learning (for example, algorithms aimed at extracting optimal decision trees or SVMs from data) can still be efficient for truly large data volumes and whether SQL-based analytic scripts might be used to speed them up. (I have blogged about that as well.) In the case of this particular data set and the corresponding classification problem, it may actually not be a bad idea to use the provided training sample to construct a random forest of trees based on different subsets of attributes. How can one heuristically search through the space of all forests using SQL in order to find the most promising classifier ensemble? That is a good question for potential competition participants.
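To make the idea of trees built on different subsets of attributes a bit more concrete, here is a minimal sketch in plain Python (not SQL). Everything in it is an assumption made for illustration: the data are synthetic rather than the competition set, the "trees" are one-level decision stumps, the function names are made up, and greedy forward selection on a validation sample is just one possible heuristic for searching the space of ensembles.

```python
import random
from collections import Counter

def make_data(n_rows, n_attrs, rng):
    """Synthetic stand-in for the real data: the label depends on attributes 0 and 1 only."""
    rows = []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_attrs)]
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows

def train_stump(rows, attrs):
    """Return the best (attribute, threshold, left_label, right_label) among `attrs`."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    best = (-1, attrs[0], 0.5, majority, majority)  # fallback: constant prediction
    for a in attrs:
        for t in (0.25, 0.5, 0.75):
            left = [y for x, y in rows if x[a] <= t]
            right = [y for x, y in rows if x[a] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            correct = sum(y == l_lab for y in left) + sum(y == r_lab for y in right)
            if correct > best[0]:
                best = (correct, a, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, x):
    a, t, l_lab, r_lab = stump
    return l_lab if x[a] <= t else r_lab

def ensemble_accuracy(stumps, rows):
    """Fraction of rows where the majority vote of the stumps matches the label."""
    hits = 0
    for x, y in rows:
        votes = Counter(predict_stump(s, x) for s in stumps)
        hits += votes.most_common(1)[0][0] == y
    return hits / len(rows)

def greedy_select(candidates, valid_rows, max_size):
    """Forward selection: keep adding the stump that improves validation accuracy most."""
    chosen, best_acc = [], 0.0
    while len(chosen) < max_size:
        acc, i = max((ensemble_accuracy(chosen + [c], valid_rows), i)
                     for i, c in enumerate(candidates))
        if acc <= best_acc:  # no candidate improves the ensemble any further
            break
        chosen.append(candidates[i])
        best_acc = acc
    return chosen, best_acc

rng = random.Random(0)
data = make_data(300, 20, rng)
train, valid = data[:200], data[200:]
# Each candidate stump sees only a random subset of 5 of the 20 attributes.
candidates = [train_stump(train, rng.sample(range(20), 5)) for _ in range(15)]
chosen, best_acc = greedy_select(candidates, valid, max_size=7)
print(len(chosen), "stumps selected, validation accuracy:", best_acc)
```

In a database setting, the counting inside train_stump (how many rows of each class fall on each side of a split) is the part that one would presumably try to express as aggregate queries, while the search loop stays outside.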
Merry Christmas!
Dominik
Hello,
Last week I attended the FGIT 2011 multi-conference organized by the SERSC society. (FGIT and SERSC stand for Future Generation Information Technology and Science & Engineering Research Support soCiety, respectively.) As usual, it was a good opportunity to interact with experts in different areas of the foundations and applications of computer science. I enjoyed a couple of interesting database presentations. (See for example the paper on recursive extensions of SQL-based analytics.) This time I contributed an article on a new type of rough-set-based framework for classifier ensembles, which are now quite trendy in machine learning. I also saw several inspiring papers that were not directly related to my research interests. (See for instance the paper on blended nurture.) I do like such a variety of topics!
Of course everyone paid special attention to the keynote and plenary talks. There were nine of them. See the conference homepage for their brief descriptions. Let me start with Dr. Hamid Arabnia, who perfectly adjusted his lecture on parallel multiprocessor systems to the varied background of the audience, including myself. Honestly speaking, I had not realized that it is possible to think about topologies of processor connections that are adaptively reconfigurable (during runtime!). I found the analogy between human neurons and processing units truly inspiring; indeed, it's not only about their number but also about the ability to learn how to communicate optimally. I also liked the way the resulting multiprocessor system model was applied to image analysis, specifically to breast cancer detection. Medical applications also dominated a few other invited talks, including the one on capturing the behavior of medical staff by Dr. Shusaku Tsumoto (a very impressive usage of temporal data mining methods within hospital information systems and warehouses) and the one on privacy issues in public health informatics by Dr. Sabah Mohammed (I can see here some interesting opportunities for analytic databases, at the edge of bio-surveillance and knowledge discovery).
Besides the interesting presentations, there were of course also discussions about the future of the FGIT conference series. Let me recall that so far SERSC has applied the following strategy: one big annual multi-conference and a few smaller events scattered in time and space, all published in the form of proceedings in a way that is standard for the majority of international academic events. However, for 2012 they have planned something new and pretty ambitious. They intend to organize events of roughly equal size (almost) every month, in different parts of the world. They have also made an agreement with an American journal publisher and will do their best to put accepted conference articles directly into scientific journal issues, without publishing regular proceedings any longer. Take a look from time to time at the SERSC conference calendar. I hope it will get updated soon. I will try to attend at least one of the events (especially those exposing science-industry connections). Let's hope that the above-described fast journal publication strategy turns out to be successful. It may encourage other scientific conference organizers to follow the same path in the near future.
Dominik