Hi again Jakub, thanks for the replies.
(I was writting down this post by the time of your last answer, and yes, I agree there is some mistery still to find out)
I’ve been researching a bit, and I cannot say I’m convinced of the order of the rows explaining the x5 compression.
I need some help trying to figure out what happened here. I will try to give all the info I’m collecting.
This is the table that I’m testing (I have to mention, that for the other 25 tables in the database, the same behaviour occurs…)
CREATE TABLE `passiveHsdpaRadioParameters_201005` (
`idPdpConnection` bigint(20) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '1985-01-19 00:00:00',
`idUsedLocation` int(11) NOT NULL,
`cqi` tinyint(4) DEFAULT NULL,
`hsDschQam16Packets` smallint(6) DEFAULT NULL,
`hsDschQam64Packets` smallint(6) DEFAULT NULL,
`hsDschQpskPackets` smallint(6) DEFAULT NULL,
`hsDschSchedPackets` smallint(6) DEFAULT NULL,
`hsDschNackPackets` smallint(6) DEFAULT NULL,
`hsDschTraffic` int(11) DEFAULT NULL,
`hsDschUplinkTotal` mediumint(9) DEFAULT NULL,
`numberOfCodes` tinyint(4) DEFAULT NULL
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=utf8;
There is no relation between columns, I mean, if by any reason rows would be sorted by any column, that could improve compression by that column specifically, but not for the other colums… So I’m not sure column order could explain a x5 compression.
shell> du -h passiveHsdpaRadioParameters_201005.bht/
288M passiveHsdpaRadioParameters_201005.bht/
shell> zip -r passiveHsdpaOriginalTable.zip passiveHsdpaRadioParameters_201005.bht/
...
shell> ls -lh passiveHsdpaOriginalTable.zip
-rw-r--r-- 1 root root 257M 2010-08-04 12:09 passiveHsdpaOriginalTable.zip
I’ve zipped folder passiveHsdpaRadioParameters_201005.bht and a little more compression archieved, but nothing compared with export/import.
But if I do what I posted in fist place:
mysql> SET @bh_dataformat = 'txt_variable';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT count(1) FROM passiveHsdpaRadioParameters_201005;
+----------+
| count(1) |
+----------+
| 13798497 |
+----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE passiveHsdpaRadioParameters_201005_reimported LIKE passiveHsdpaRadioParameters_201005;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT * FROM passiveHsdpaRadioParameters_201005 INTO OUTFILE '/tmp/passiveHsdpaRadioParameters_201005.csv';
Query OK, 13798497 rows affected (45.69 sec)
LOAD DATA INFILE '/tmp/passiveHsdpaRadioParameters_201005.csv' INTO TABLE passiveHsdpaRadioParameters_201005_reimported;
Query OK, 13798497 rows affected (49.00 sec)
Records: 13798497 Deleted: 0 Skipped: 0 Warnings: 0
And now let’s look at .bht folder disk size.
shell> du -h passiveHsdpaRadioParameters_201005_reimported.bht/
57M passiveHsdpaRadioParameters_201005_reimported.bht/
From 288M to 57M…
I’m going to test loading table with single LOAD DATA INFILE for each row vs LOAD DATA INFILE for all rows at a time, and see if compression is affected.
I will post the results shortly.
By the way, I could upload the zipped folders for both tables (original and reimported) if it could be helpfull to see what this is happening.