Hi!
I just installed ICE and am doing some testing. My test data consists of two sets of 25 files, and each file has roughly 1 million lines. Each 25-file set is loaded into one table, so in the end I have two tables, each with roughly 25 million rows. The relationship between the two tables is almost 1-to-1, but not quite; otherwise I would load all the data into a single table.
I can load the data in two different ways: 1) I call the bulk loader 25 times for each table, once per file, or 2) I combine all 25 files into one big file and call the bulk loader once for each table.
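To make the two scenarios concrete, here is a minimal Python sketch of what I mean. It only builds the statements and the combined file; it does not talk to the server. The table name, the file paths, the `;` delimiter, and the helper names (`combine_files`, `load_statement`, `scenario_1`, `scenario_2`) are placeholders I made up for illustration, and I am assuming the bulk loader is driven through the usual MySQL-style `LOAD DATA INFILE` statement.

```python
import shutil
from pathlib import Path


def combine_files(parts, combined):
    """Concatenate the chunk files, in order, into one big file.

    Assumption: every chunk ends with a newline, so rows from
    consecutive files do not run together.
    """
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return combined


def load_statement(path, table, delimiter=";"):
    """Build one MySQL-style LOAD DATA INFILE statement (delimiter is an assumption)."""
    return (
        f"LOAD DATA INFILE '{Path(path).as_posix()}' "
        f"INTO TABLE {table} FIELDS TERMINATED BY '{delimiter}';"
    )


def scenario_1(parts, table):
    """Scenario 1: one bulk-loader call per chunk file."""
    return [load_statement(p, table) for p in parts]


def scenario_2(parts, combined, table):
    """Scenario 2: combine all chunks first, then a single bulk-loader call."""
    return [load_statement(combine_files(parts, combined), table)]
```

With 25 chunk files, `scenario_1` yields 25 statements for a table and `scenario_2` yields exactly one, which is the only difference between the two loads.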
Here is the problem: query performance is dramatically worse in scenario 1 than in scenario 2. Why?
Possible solutions would be: a) always “pool” the data together before loading, although this is not always possible; b) load the data in chunks and then “optimize” the table, although I could not find any optimizing tool for engine=brighthouse tables; c) some other solution that I am not yet aware of.
I searched the different ICE forums, but could not find a definitive answer to this matter.
Any and all help is highly appreciated!
Martin.
PS: If relevant, the test data set is 25 million rows. Production data sets will be in the billions of rows (note that “data sets” is plural).

