Joinutility seperatorLogin utility separator Infobright.com
   
1 of 2
1
Physical data space is not released
Posted: 24 February 2011 12:52 PM   Ignore ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Hi all,

I’m running IEE 3.4 with the following setting in brighthouse.ini:
ClusterSize = 2

(this value was chosen for testing purposes)

I’ve created the table with 27M rows, and table’s folder size was 192Mb
After this I removed all records using DELETE statement.
Physical size of the folder has not changed.
I understand that server marks rows as “deleted”, however it is mentioned in “How To Use DML Effectively in Infobright”:

When is the data physically removed?
The data packs in the file system are organized
in separate files, one per column, up to the maximum size specified in the
brighthouse.ini config file parameter “ClusterSize” (defaults to 2GB). If all of the data
rows in all packs comprising one of these clusters is marked as deleted, then the
files are deleted and the storage is reclaimed
.

Doesn’t this mean that in my case (ClusterSize is 2Mb) it should have deleted the files physically?

Profile
 
Posted: 28 February 2011 09:05 AM   Ignore ]   [ # 1 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

Hi !

Can you do ls -l on your table directory ( <datadir>/<database-name>/<table-name>.bht ) before and after the delete ?

Profile
 
Posted: 28 February 2011 09:43 AM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Hi Janusz!

Sure, I tried this, but the only file gets changed is del_mask.flt

Profile
 
Posted: 28 February 2011 11:57 AM   Ignore ]   [ # 3 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

I meant can you post the results of ls -l? I am curious what are file sizes….
J.

Profile
 
Posted: 01 March 2011 01:33 PM   Ignore ]   [ # 4 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Hi. Today I performed additional test to get the file list you asked for.
ClusterSize was set to 200.
After this I restarted IB instance.

I created the table with about 442,385,008 records. Table folder size was 3,237,026,970.
Full list of files is below:

total 3164768
-rw-rw---- 1 mysql mysql         0 Feb 28 15:40 ab_switch
-rw-rw---- 1 mysql mysql      6760 Feb 28 15:40 del_mask_alt.flt
-rw-rw---- 1 mysql mysql      5072 Feb 28 15:38 del_mask.flt
-rw-rw---- 1 mysql mysql 161225428 Feb 28 15:40 TA00000000000000.ctb
-rw-rw---- 1 mysql mysql 160862982 Feb 28 15:38 TA00000000000001.ctb
-rw-rw---- 1 mysql mysql       131 Feb 28 15:38 TA00000.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00000DPN.ctb
-rw-rw---- 1 mysql mysql       133 Feb 28 15:38 TA00001.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00001DPN.ctb
-rw-rw---- 1 mysql mysql 542699587 Feb 28 15:40 TA00002000000000.ctb
-rw-rw---- 1 mysql mysql 542374739 Feb 28 15:38 TA00002000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00002.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00002DPN.ctb
-rw-rw---- 1 mysql mysql       371 Feb 28 15:39 TA00003000000000.ctb
-rw-rw---- 1 mysql mysql       326 Feb 28 15:37 TA00003000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00003.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00003DPN.ctb
-rw-rw---- 1 mysql mysql 284627938 Feb 28 15:40 TA00004000000000.ctb
-rw-rw---- 1 mysql mysql 284467894 Feb 28 15:38 TA00004000000001.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00004.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00004DPN.ctb
-rw-rw---- 1 mysql mysql 287195569 Feb 28 15:40 TA00005000000000.ctb
-rw-rw---- 1 mysql mysql 287025335 Feb 28 15:38 TA00005000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00005.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00005DPN.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00006.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00006DPN.ctb
-rw-rw---- 1 mysql mysql 177671279 Feb 28 15:40 TA00007000000000.ctb
-rw-rw---- 1 mysql mysql 177552551 Feb 28 15:38 TA00007000000001.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00007.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00007DPN.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00008.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00008DPN.ctb
-rw-rw---- 1 mysql mysql  98190040 Feb 28 15:40 TA00009000000000.ctb
-rw-rw---- 1 mysql mysql  98133319 Feb 28 15:38 TA00009000000001.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:38 TA00009.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00009DPN.ctb
-rw-rw---- 1 mysql mysql   6643198 Feb 28 15:40 TA00010000000000.ctb
-rw-rw---- 1 mysql mysql   6636201 Feb 28 15:38 TA00010000000001.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:38 TA00010.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00010DPN.ctb
-rw-rw---- 1 mysql mysql  59381351 Feb 28 15:40 TA00011000000000.ctb
-rw-rw---- 1 mysql mysql  59326059 Feb 28 15:38 TA00011000000001.ctb
-rw-rw---- 1 mysql mysql       124 Feb 28 15:38 TA00011.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00011DPN.ctb
-rw-rw---- 1 mysql mysql        69 Feb 28 15:25 Table.ctb
-rw-rw---- 1 mysql mysql       131 Feb 28 15:40 TB00000.ctb
-rw-rw---- 1 mysql mysql       133 Feb 28 15:40 TB00001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00002.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00003.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00004.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00005.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00006.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00007.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00008.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:40 TB00009.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:40 TB00010.ctb
-rw-rw---- 1 mysql mysql       124 Feb 28 15:40 TB00011.ctb 
Profile
 
Posted: 01 March 2011 01:34 PM   Ignore ]   [ # 5 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

After this I deleted ALL records.

Folder size after delete: 3,237,021,898

List of the files after delete:

total 3164756
-rw-rw---- 1 mysql mysql         0 Feb 28 15:40 ab_switch
-rw-rw---- 1 mysql mysql      6760 Mar  1 10:57 del_mask_alt.flt
-rw-rw---- 1 mysql mysql 161225428 Feb 28 15:40 TA00000000000000.ctb
-rw-rw---- 1 mysql mysql 160862982 Feb 28 15:38 TA00000000000001.ctb
-rw-rw---- 1 mysql mysql       131 Feb 28 15:38 TA00000.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00000DPN.ctb
-rw-rw---- 1 mysql mysql       133 Feb 28 15:38 TA00001.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00001DPN.ctb
-rw-rw---- 1 mysql mysql 542699587 Feb 28 15:40 TA00002000000000.ctb
-rw-rw---- 1 mysql mysql 542374739 Feb 28 15:38 TA00002000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00002.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00002DPN.ctb
-rw-rw---- 1 mysql mysql       371 Feb 28 15:39 TA00003000000000.ctb
-rw-rw---- 1 mysql mysql       326 Feb 28 15:37 TA00003000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00003.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00003DPN.ctb
-rw-rw---- 1 mysql mysql 284627938 Feb 28 15:40 TA00004000000000.ctb
-rw-rw---- 1 mysql mysql 284467894 Feb 28 15:38 TA00004000000001.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00004.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00004DPN.ctb
-rw-rw---- 1 mysql mysql 287195569 Feb 28 15:40 TA00005000000000.ctb
-rw-rw---- 1 mysql mysql 287025335 Feb 28 15:38 TA00005000000001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00005.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00005DPN.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:38 TA00006.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00006DPN.ctb
-rw-rw---- 1 mysql mysql 177671279 Feb 28 15:40 TA00007000000000.ctb
-rw-rw---- 1 mysql mysql 177552551 Feb 28 15:38 TA00007000000001.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00007.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00007DPN.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:38 TA00008.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00008DPN.ctb
-rw-rw---- 1 mysql mysql  98190040 Feb 28 15:40 TA00009000000000.ctb
-rw-rw---- 1 mysql mysql  98133319 Feb 28 15:38 TA00009000000001.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:38 TA00009.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00009DPN.ctb
-rw-rw---- 1 mysql mysql   6643198 Feb 28 15:40 TA00010000000000.ctb
-rw-rw---- 1 mysql mysql   6636201 Feb 28 15:38 TA00010000000001.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:38 TA00010.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00010DPN.ctb
-rw-rw---- 1 mysql mysql  59381351 Feb 28 15:40 TA00011000000000.ctb
-rw-rw---- 1 mysql mysql  59326059 Feb 28 15:38 TA00011000000001.ctb
-rw-rw---- 1 mysql mysql       124 Feb 28 15:38 TA00011.ctb
-rw-rw---- 1 mysql mysql    249824 Feb 28 15:40 TA00011DPN.ctb
-rw-rw---- 1 mysql mysql        69 Feb 28 15:25 Table.ctb
-rw-rw---- 1 mysql mysql       131 Feb 28 15:40 TB00000.ctb
-rw-rw---- 1 mysql mysql       133 Feb 28 15:40 TB00001.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00002.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00003.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00004.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00005.ctb
-rw-rw---- 1 mysql mysql       128 Feb 28 15:40 TB00006.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00007.ctb
-rw-rw---- 1 mysql mysql       123 Feb 28 15:40 TB00008.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:40 TB00009.ctb
-rw-rw---- 1 mysql mysql       119 Feb 28 15:40 TB00010.ctb
-rw-rw---- 1 mysql mysql       124 Feb 28 15:40 TB00011.ctb 
Profile
 
Posted: 01 March 2011 01:37 PM   Ignore ]   [ # 6 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

It’s curious that you can see the files having the size about 500MB despite ClusterSize option being set to 200MB.

Profile
 
Posted: 01 March 2011 03:49 PM   Ignore ]   [ # 7 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

Hi !

The space is not released because for each column up to 2 recently modified (by load or INSERT) files must be kept. Only older files can be removed. Unfortunately there are just 2 files per column created and they must stay (though I must recall why both files stay, not just the freshest one).

The problem is - as you have noticed - that ClusterSize setting is not effective. It looks like a bug.
BTW - Have you tried any newer release? I will check ClusterSize in 3.5.2

Profile
 
Posted: 02 March 2011 03:04 AM   Ignore ]   [ # 8 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Thanks for keeping the interest to this topic.
Answering to your last question: I forgot to mention that last test has been performed on IEE 3.5.2

Profile
 
Posted: 02 March 2011 05:41 AM   Ignore ]   [ # 9 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

Hi !

Oh, it is so simple. But not obvious. ClusterSize parameter in brighthouse.ini is depricated. It is still accepted to let the server start, but ignored.

The real value can be set in .infobright (<ClusterSize> 2000 </ClusterSize> ), but this is not intended to be changed by users. So you may to play with it, but keep in mind that editing the contents of .infobright is not supported.

Profile
 
Posted: 02 March 2011 08:39 AM   Ignore ]   [ # 10 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Thank you Janusz for quick and helpful answer.
Actually I see I need to expand the question now, because the real task is more complex smile
The real problem that I’m working now on is to find the reliable approach of managing the space consumed by the database.

This means that when database size is close to critical value (say 500GB which is constraint of physical storage) we would like to have a job that purges old data from large tables. Obviously we expect to release the storage space this way. Or at least to able to reuse the space.

What I see now is the following:
- I can manage the space consumed by the table files (via ClusterSize)
- when I delete the records from the table they are just marked as deleted, and space is not reused by further inserts
- even having the data distributed through the files I can’t predict when these files will me removed because they contain data from one column

In other words I can remove the records from the table, but their count doesn’t correlate with space marked for delete in physical files.
Currently I’m looking for the answer to such question:
- which records and how many of them I need to delete to release X GB of disk space?

Here I come up with two questions to you:
Q1. Do you know the way/approach of predictive releasing of the table space?
Q2. Is there any kind of maintenance utilities for BH engine to shrink the unused space?

Profile
 
Posted: 02 March 2011 09:00 AM   Ignore ]   [ # 11 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

Hi !

The most reliable space management is user-level performed with explicit data partitioning. Eg. a table holds data from one day, another table from the next day etc. Then, the queries take a form of multiple queries combined with UNION ALL. When old data are not necessary any more the tables containing them are dropped. This approach requires some design and implementation effort of course.

Otherwise ,the physical space is recovered when all data in a file have been deleted. If you load fresh data and you delete old data (so called rolling delete) then naturally - as data are loaded and stored in consecutive files - some files (oldest ones) will get deleted. The problem is that if you have ClusterSize 2GB and e.g. 5 tables with 10 columns each, you can assume that 2 files per column will persists = 5x10x2x2GB = 200GB will be always occupied. Fortunately usually there are only 1-2 large tables and many small ones, so in practice less space will be occupied.

Profile
 
Posted: 02 March 2011 10:16 AM   Ignore ]   [ # 12 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Thanks for sharing these thoughts.
Actually we already decided to split our large tables into partitions - so I agree with the approach you suggested.
And the first driver to this was have the way to release the space predictively.
However I don’t agree that traditional partitioning techniques can be applied to BH tables.

For example, I created fact table with 5M records and named it F_5M.
The same records were also loaded into 5 partitions F_1M_0, F_1M_1, F_1M_2, F_1M_3, F_1M_4.
Each partition contains 1M of records.

Now compare the performance results of these queries.

This one works 0.000 sec:

select count(*) from dw.F_5M

And this gives the same result in 20 sec:

select count(*) from (
    
select from dw.F_1M_0
    union all
    select 
from dw.F_1M_1
    union all
    select 
from dw.F_1M_2
    union all
    select 
from dw.F_1M_3
    union all
    select 
from dw.F_1M_4
parts 

Sure, I can optimize this query:

select sum(cntfrom (
    
select count(*) as cnt from dw.F_1M_0
    union all
    select count
(*) from dw.F_1M_1
    union all
    select count
(*) from dw.F_1M_2
    union all
    select count
(*) from dw.F_1M_3
    union all
    select count
(*) from dw.F_1M_4
parts 

But you must admit it is not true partitioning when you place query logic into each partition.

Profile
 
Posted: 02 March 2011 11:03 AM   Ignore ]   [ # 13 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  964
Joined  2008-08-18

Yuriy,
As I wrote this type of explicit partitioning requires design and implementation effort. Indeed it is not true partitioning. While you can view it as a workaround, this approach has been successfully used in a couple of cases already.

We are aware of these kind of problems you are experiencing and partitioning capabilities are on our roadmap. The target is to be able to define a partitioning scheme transparent for user level queries, but supporting dropping chosen partitions. Unfortunately I cannot tell when it will get released…

Profile
 
Posted: 02 March 2011 12:01 PM   Ignore ]   [ # 14 ]  
Newbie
Rank
Total Posts:  10
Joined  2011-02-24

Janusz, thank you for your answers to this thread. They were very sincere and informative.
I’ll create separate thread for my further questions about optimizing the queries.

Profile
 
Posted: 02 March 2011 06:59 PM   Ignore ]   [ # 15 ]  
Newbie
Rank
Total Posts:  1
Joined  2011-03-02
Janusz Borkowski - 02 March 2011 09:00 AM

The problem is that if you have ClusterSize 2GB and e.g. 5 tables with 10 columns each, you can assume that 2 files per column will persists = 5x10x2x2GB = 200GB will be always occupied.

If I have a CDR table with 42 columns in it, and I manually partition it into 12 monthly tables, load the tables, and later delete all the records, I will still have a disk footprint of 12 tables X 42 columns/table X 2 files/column X 2 GB/file = 2016 GB, or more than 2 TB, for my empty table?

That is definitely unique, but not clever.  In fact, our sysadmin says it can’t be right.  Is it?

Profile
 
   
1 of 2
1