One of our new technologies, DomainExpert, has really begun to catch on. Besides enhanced compression, DomainExpert improves overall query speed because less time is spent on decompression. Plus, it doesn't require any DDL changes. We will continue to enhance this technology over time.
To see how simple DomainExpert is to use, take a look at this short video. If you have any questions about DomainExpert, reach out on our forums.
Over the past 15 months, I have proudly served as Infobright's open-source community manager. In recent months, I began looking at other opportunities to broaden my experience within the organization. Following a recent move, we are now looking for our next community manager.
While I will continue to blog, vlog, attend conferences, and speak, the position requires someone to ensure that our community members have a strong voice. To that end, we are actively looking for an individual with strong database experience and business acumen. For more details about the role (and to apply), check out our careers website: http://www.infobright.com/Careers/community_manager/.
Many have asked, "What does a community manager do?" (I tend to get that question most often from my wife.) Long story short, the community manager educates users, helps them with technical challenges, and celebrates their successes. For example, the community manager makes sure every forum post gets a proper response; five minutes later, they may be preparing a talk for the upcoming OSCON. The community manager also heavily influences the Infobright Intern Program at the University of Illinois, serving as the program manager for the team.
You, as our community manager, own our open-source analytics database. From the website to the install procedure, you heavily influence how our users interact with Infobright. Honestly, it's been the most fulfilling role I've ever had with any company. The position hones your engineering, business, and leadership skills. Plus, you work with an amazing group of people.
Since you will help lead the intern program, the position requires you to be in Champaign, IL. Our office is located in EnterpriseWorks, the incubator facility within the University of Illinois Research Park, which recently received the Research Park of the Year Award; EnterpriseWorks has also been rated one of the top ten incubators.
If you have any questions about the position, please feel free to e-mail me.
Hello,
Some of my academic colleagues still remember the first Joint Rough Set (JRS) Conference, held in 2007 in Toronto and opened with speeches by the presidents/CEOs of York U, MaRS, and Infobright. International rough set conferences have, of course, a much longer tradition, but 2007 was the first time that previously separate rough set events were held at the same time and place. Five years later, the rough set community is returning to this idea: JRS 2012 will be held in Chengdu, China. It will be the only major international rough set event in 2012. Given the popularity of rough sets in China and neighboring countries, it has a chance to become the biggest rough set meeting since the first international rough set workshop in 1992.
August is a very good time to visit Chengdu, a historic city surrounded by natural wonders hidden high in the mountains. I still remember visiting Jiuzhaigou, near Chengdu, in 2006. (I am attaching a picture of my wife and me taken near the valley.) In 2008, however, Chengdu was hit by a huge earthquake just before another international rough set conference (RSKT 2008). Those participants who still managed to reach the city could see that nature can be both beautiful and brutal. For many of my friends and colleagues, then, attending JRS 2012 will mean more than just another academic meeting.
JRS 2012 may also turn out to be very interesting for researchers and practitioners who have never thought about attending a rough set event before. As an example, let me briefly describe the Data Mining Competition organized in conjunction with the main conference. The classification problem posed in the competition originates from my U of Warsaw colleagues' very recent experience analyzing biomedical research papers gathered in the PubMed Central repository. (Here is a link to one of our papers about it.) The data set provided for the competition contains information about 20,000 journal articles (split into training and test samples), described by 25,640 attributes: numeric columns expressing the strength of association between each article and MeSH ontology terms. The task is to train a classifier that automatically tags research papers with the most appropriate MeSH subheadings, in other words, that automatically classifies papers into predefined topic categories. The classifier with the highest accuracy (measured against tags assigned manually by medical experts) wins the competition.
One might say that such data mining competitions are not very interesting for database people. However, there are at least two important database-related aspects here. First, the data set described above was created using database tools. As you may remember (I have blogged about it a few times), my U of Warsaw colleagues use Infobright, MongoDB, and several other technologies to store very detailed information about documents downloaded from various sources. In particular, this enabled the competition organizers to compute the weights of associations between articles and ontology terms more thoroughly. Second, it is worth discussing whether standard classifier-learning algorithms (for example, algorithms for extracting optimal decision trees or SVMs from data) can still be efficient for truly large data volumes, and whether SQL-based analytic scripts might be used to speed them up. (I have blogged about this as well.) For this particular data set and classification problem, it may actually not be a bad idea to use the provided training sample to construct a random forest of trees, each based on a different subset of attributes. How can one heuristically search the space of all such forests using SQL in order to find the most promising classifier ensemble? That is a good question for potential competition participants.
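To make the last idea a bit more concrete, here is a minimal Python sketch of one possible approach, built on several assumptions of my own. It follows the random subspace idea (each model sees only a random subset of attributes), but uses one-level decision trees (stumps) instead of full trees, and it "searches the space of forests" with simple greedy forward selection on a validation sample rather than with SQL. It handles a single-label toy problem; for a multi-label task such as tagging papers with MeSH subheadings, one could, for example, build one such ensemble per subheading. The function names (`train_stump`, `random_subspace_stumps`, `greedy_select`) and the toy data are hypothetical, not the organizers' setup.

```python
import random
from collections import Counter

# Hypothetical sketch: a random-subspace ensemble of decision stumps plus a
# greedy search for a good sub-ensemble. Not the competition's reference method.


def train_stump(rows, labels, attrs):
    """Fit a one-level decision tree using only the attribute indices in `attrs`.

    Every observed value of every allowed attribute is tried as a threshold;
    the split with the fewest training errors wins, and each side predicts its
    majority label. (Per-threshold label counts like these are the kind of
    aggregation one could, in principle, push down into SQL GROUP BY queries.)
    """
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for a in attrs:
        for t in sorted({r[a] for r in rows}):
            left = Counter(y for r, y in zip(rows, labels) if r[a] <= t)
            right = Counter(y for r, y in zip(rows, labels) if r[a] > t)
            lp = left.most_common(1)[0][0] if left else overall
            rp = right.most_common(1)[0][0] if right else overall
            errors = (sum(left.values()) - left[lp]) + (sum(right.values()) - right[rp])
            if best is None or errors < best[0]:
                best = (errors, a, t, lp, rp)
    _, a, t, lp, rp = best
    return lambda r: lp if r[a] <= t else rp


def random_subspace_stumps(rows, labels, n_models, subset_size, seed=0):
    """Train `n_models` stumps, each restricted to a random attribute subset."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    return [train_stump(rows, labels, rng.sample(range(n_attrs), subset_size))
            for _ in range(n_models)]


def vote(models, row):
    """Majority vote; ties go to the label predicted first in `models` order."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


def accuracy(models, rows, labels):
    """Fraction of rows whose majority-vote prediction matches the label."""
    return sum(vote(models, r) == y for r, y in zip(rows, labels)) / len(labels)


def greedy_select(pool, rows, labels, max_size):
    """Greedy forward selection over the pool, scored on a validation sample.

    Repeatedly adds the model that most improves ensemble accuracy and stops
    when no addition helps or `max_size` models have been chosen.
    """
    chosen, best_acc = [], 0.0
    while len(chosen) < max_size:
        scored = [(accuracy([pool[j] for j in chosen + [i]], rows, labels), i)
                  for i in range(len(pool)) if i not in chosen]
        if not scored:
            break
        acc, i = max(scored)
        if acc <= best_acc:
            break
        chosen.append(i)
        best_acc = acc
    return [pool[j] for j in chosen], best_acc


if __name__ == "__main__":
    rng = random.Random(42)

    def make(n):
        # Hypothetical toy data: 5 numeric attributes; the label depends on attribute 0 only.
        rows = [[rng.random() for _ in range(5)] for _ in range(n)]
        return rows, [int(r[0] > 0.5) for r in rows]

    train_rows, train_labels = make(150)
    valid_rows, valid_labels = make(100)
    pool = random_subspace_stumps(train_rows, train_labels, n_models=20, subset_size=2)
    ensemble, acc = greedy_select(pool, valid_rows, valid_labels, max_size=5)
    print(f"selected {len(ensemble)} of {len(pool)} stumps, validation accuracy {acc:.2f}")
```

Greedy forward selection is only one cheap heuristic for searching the space of ensembles; with 25,640 attributes, the inner counting loops are exactly the part where a column-oriented database and SQL aggregation might help, though that remains an open question rather than something this sketch demonstrates.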
Merry Christmas!
Dominik