Hello rainbow2009,
Thanks!
Actually, there will be a short paper summarizing Infobright at SIGMOD 2009 (Industry Track).
The paper’s objective is to discuss Infobright technology with respect to its main features and architectural differentiators, as well as to introduce the upcoming research and development projects that may be of special interest to the academic and industry communities.
One of the sections is about approximate querying. Let me paste it below, with the cited papers at the very bottom and with Figure 4 attached. Let me also thank all the community members participating in this thread for inspiration!
Best greetings,
Dominik
===================
7. APPROXIMATE QUERIES
In [26], we introduced an abstract notion of a rough query. We did it to illustrate the internal processing of Rough Rows (treated as higher-level objects) by means of their Knowledge Grid entries (treated as those higher-level objects’ attributes). Surely, one may also think about it in the context of extending SQL. It is related to the popular topic of approximate querying [6,12], which also receives some attention on the Infobright forums. (Here I put a reference to this thread.) Principles of rough sets can be useful in this area too. Actually, some rough set-based approximate querying extensions have already been proposed [20].
Although approximate querying is not currently on our product roadmap, we have investigated several possibilities in this area. For example, as illustrated by Figure 4, we tested inexact Data Pack Nodes that describe Data Packs almost correctly, neglecting a certain percentage of local outliers in order to provide crisper min-max intervals. The results obtained in terms of speed versus precision of analytic query answers are quite promising. It is worth noting that in some applications query results are actually more reliable when outliers are removed (cf. [7]). Outlier handling can also be useful in, e.g., data compression (cf. [28,30]) or organization (Section 8). (I’ll write more about it pretty soon.) One just needs to remember that Figure 4 refers to outliers as odd values, while the mechanism introduced in Section 8 refers to outliers as odd data rows.
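(To make the idea of an inexact Data Pack Node a bit more concrete for this thread, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the Infobright implementation: the names PackNode, exact_node, inexact_node and may_contain are hypothetical, and choosing the narrowest interval that covers all but a given fraction of values is just one possible way of "neglecting local outliers".)

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PackNode:
    """Hypothetical stand-in for a Data Pack Node: min-max statistics of one pack."""
    lo: float
    hi: float
    ignored: int  # number of values that may lie outside [lo, hi]


def exact_node(values: Sequence[float]) -> PackNode:
    """Exact min-max interval: every value of the pack lies inside it."""
    return PackNode(min(values), max(values), 0)


def inexact_node(values: Sequence[float], outlier_fraction: float) -> PackNode:
    """Narrowest interval covering all but `outlier_fraction` of the values.

    Assumption: outliers are simply the values that fall outside the
    tightest window of consecutive sorted values.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = min(int(n * outlier_fraction), n - 1)  # values we are allowed to neglect
    keep = n - k                               # values the interval must cover
    # Slide a window of `keep` sorted values and pick the narrowest one.
    start = min(range(k + 1), key=lambda i: ordered[i + keep - 1] - ordered[i])
    return PackNode(ordered[start], ordered[start + keep - 1], k)


def may_contain(node: PackNode, low: float, high: float) -> bool:
    """Rough check based on the node only: can the pack hold values in [low, high]?"""
    return node.hi >= low and node.lo <= high


pack = [10, 11, 12, 13, 14, 15, 16, 17, 18, 1000]
exact = exact_node(pack)
inexact = inexact_node(pack, outlier_fraction=0.1)

# A filter such as "value BETWEEN 500 AND 600" cannot skip the pack
# with the exact node, but can skip it with the inexact one.
print(exact, may_contain(exact, 500, 600))
print(inexact, may_contain(inexact, 500, 600))
```

(With the inexact node, the pack is skipped for the range 500 to 600 at the price of possibly missing up to `ignored` values per pack. This is the speed versus precision trade-off mentioned above, shown here on toy data only.)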
(...)
10. REFERENCES (only those used in Section 7)
[6] S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Estimating progress of long running SQL queries. In SIGMOD Conference, pages 803-814, 2004.
[7] A. Deligiannakis, Y. Kotidis, V. Vassalos, V. Stoumpos, and A. Delis. Another outlier bites the dust: Computing meaningful aggregates in sensor networks. In ICDE, pages 988-999, 2009.
[12] Y. Hu, S. Sundara, and J. Srinivasan. Supporting time-constrained SQL queries in Oracle. In VLDB, pages 1207-1218, 2007. (Krzysztof, thanks again for this reference!)
[20] S. Naouali and M. Quafafou. Concepts approximation for information retrieval from databases and pragmatic OLAP. In ICTTA, pages 401-402, 2004.
[26] D. Slezak, J. Wroblewski, V. Eastwood, and P. Synak. Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB, 1(2):1337-1345, 2008. (Brighthouse was the original name of our product.)
[28] M. Wojnarski, C. Apanowicz, V. Eastwood, D. Slezak, P. Synak, A. Wojna, and J. Wroblewski. Method and system for data compression in a relational database, 2008. US Patent Application, 2008/0071818 A1.
[30] M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar RAM-CPU cache compression. In ICDE, page 59, 2006.
===================