Joinutility seperatorLogin utility separator Infobright.com
   
 
dump from ICE and reload (in order to add columns)
Posted: 28 May 2010 01:11 AM   Ignore ]  
Newbie
Rank
Total Posts:  5
Joined  2010-05-20

I’m trying to add one or several columns ( with known data from some file) to existing table.

Since i know ICE does not support ANY alter statements, i decide to dump all the data from the table into a text file, merge this file with ‘the other columns’ data’,  then drop the table, recreate the table, and load data from the merged file.

This should take a long time, can any one tell me if there is any better ways to do this?(for example, using binary data, using pipes, or any possible way that may increase the speed)

Thanks!

Profile
 
Posted: 29 July 2010 10:38 AM   Ignore ]   [ # 1 ]  
Member
RankRankRank
Total Posts:  175
Joined  2008-08-18

ICE only supports text based LOAD files. However, the Infobright Loader does load very fast, so hopefully the effort won’t take too long.

You could also try the Eval of IEE, which supports both ALTER TABLE and binary export/load.

Cheers,
Andrew

Signature 
Profile
 
Posted: 10 August 2010 06:32 PM   Ignore ]   [ # 2 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

We seem to do this fairly regularly (ok ~every 3 months). We keep the original load files. Then using PERL we add columns, subtract columns, fix bad data & then load into a new table (well actually we use the passive server and do a drop table/create new table & load the data). Our data is approximately 3.5 billion rows. We keep our load files fairly small (under 200MB) so that ICE can load them quickly. We’ll do this on a passive server that we are not using. (it’s kept for backup & for these sorts of changes). Between the perl & load it takes approx 3-4 days. Listening to talks about alters on a craigslist table (over a week!) I think 3-4 days is acceptable.

I’ve found doing a mysql-ib -e “select *” >data-dump.sql takes too long. keeping the original load files means we can easily setup another server for reads or backup w/o stress to the production servers.

Profile
 
   
 
 
‹‹ top-n queries?      distinct and group by ››