Joinutility seperatorLogin utility separator Infobright.com
   
 
tutorial
Posted: 13 February 2009 10:15 AM   Ignore ]  
Newbie
Rank
Total Posts:  1
Joined  2009-02-13

Is there some kind of tutorial I could follow to develop my test warehouse?

I have such scenario:
- I have a large amount of logs; I need to grep these logs in order to retrieve a piece of knowledge on a regular basis.
- the idea is to load all the data into IB database and build a simple web interface to query the database; this way I could take the burden of knowledge retrieval off me (other people could use the interface after a short tutorial)
- an additional surplus would be the size; IB can compress better than gzip.

The questions:
- how should I organize my data? Every single record contains a timestamp, 3 strings (always present), and a line of comment of variable length; I need to store 400 million records. A record is approx. 240B wide
- should I use a single table to store all my rows or should I split the data across several tables?
- how does IB behave when there are several (5-10) queries at once to the same database/table?
- what will be the response time of a query of an idle DB, or a busy DB (running 5 other queries)
- what hardware do I need to run my system smoothly?
- how do I import the data?
- how do I delete the oldest data?

To say the truth I’ve been experimenting with IB for some time, but the results are not satisfactory.

Profile
 
Posted: 13 February 2009 10:28 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  736
Joined  2008-08-18

Hi Hannibal,

To quick answer to at least part of your questions: you should rather store all data in one table - it is faster than joining many tables by UNION. On the other hand, if you really need to delete old data, and dyou don’t switch to IEE, partition into several tables will be the best solution. You can drop table in ICE, but not delete rows. A multiuser performance depends on hardware (memory, cores) and queries; for 10 users probably something like 16+ GB, 8 cores would be OK. With a bit of luck, 5 queries on such machine will not be much slower than 1.

Regards,

Signature 
Profile
 
Posted: 15 February 2009 12:11 PM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  15
Joined  2009-01-29
hannibal - 13 February 2009 10:15 AM

Is there some kind of tutorial I could follow to develop my test warehouse?

I have such scenario:
- I have a large amount of logs; I need to grep these logs in order to retrieve a piece of knowledge on a regular basis.
- the idea is to load all the data into IB database and build a simple web interface to query the database; this way I could take the burden of knowledge retrieval off me (other people could use the interface after a short tutorial)
- an additional surplus would be the size; IB can compress better than gzip.

The questions:
- how should I organize my data? Every single record contains a timestamp, 3 strings (always present), and a line of comment of variable length; I need to store 400 million records. A record is approx. 240B wide
- should I use a single table to store all my rows or should I split the data across several tables?
- how does IB behave when there are several (5-10) queries at once to the same database/table?
- what will be the response time of a query of an idle DB, or a busy DB (running 5 other queries)
- what hardware do I need to run my system smoothly?
- how do I import the data?
- how do I delete the oldest data?

To say the truth I’ve been experimenting with IB for some time, but the results are not satisfactory.

400 million rows seems to be quite within the range of Infobright.  We are doing something similiar, but primarily doing some analtyic calculations on logs and we found IB to be quite fast. 

You need to figure out how much information do you need.  Do you need the timestamp by time of day or is by date sufficient?  Are your strings repetetive?  Are they unique across the few million or are there less than 10k different values for the strings per column? 
As posted, you can store it all in 1 table. 
For importing it needs to be in CSV format and you can use standard mysql load function. 
For delete you will need the enterprise version.  Or you could get around it by doing queries and reloading your data into new tables. 
As for performance you will have to load the data and try for yourself, keeping in mind some optimizations like possibly using lookup tables. 

There is another important optimization.  You should have the data loaded in the order you will query by.  If you will do queries on small date ranges, then load the data by date.  If you will query on particular values in certain columns, load the data sorted by that column.

Ankur

Profile
 
Posted: 19 August 2009 01:37 PM   Ignore ]   [ # 3 ]  
Newbie
Rank
Total Posts:  1
Joined  2009-08-19

Thanks for the information data warehousing,If u have any projects need to be done thru data warehousing feel free to consult us,b coz we are into it for more than 3 years with great technical experience.
Thanks and regards
datawarehousing

Profile