Hi,
There are several tricks to take a look inside KNs. A roughstats command may be used as an alias of the first column:
SELECT COUNT(*) AS roughstats FROM t1;
The query itself works as usual, but on the server console (or *.err log file) the statistics of the first used table will be displayed in the following form:
-------- RSIndices for trans (tab.no. 371, 17 packs) ----------
Name Triv.packs Span Hist.dens.
----------------------------------------------------------------
trans_id 0 99.99% 73.31%
account_id 0 99.94% 69.78%
tdate 0 5.88% 14.19%
(...)
bank 0 94.11% 100%
account 0
-------------------------------------------------------------------
RSI_PP: col. account_id joined with tab. 365, col. 0 dens. 100%
RSI_PP: col. account_id joined with tab. 369, col. 2 dens. 100%
RSI_PP: col. tdate self-joined with col. tdate dens. 100%
RSI_PP: col. amount joined with tab. 373, col. 2 dens. 100%
-------------------------------------------------------------------
Note that to see these statistics, “ControlMessages = 2” must be set in brighthouse.ini. The following information is displayed:
* the name of the table and its number (used for identifying KN files),
* the number of packs,
* each line describes one column:
—its name
—number of trivial packs, i.e. containing nulls only, or the same numerical value for each row,
—average span, i.e. “width” of the numerical pack (max. value minus min. value), comparing to the overall span of values in this column,
—histogram/CMAP density (if one exists), i.e. percentage of 1’s in histograms.
* The list of all Pack2Pack nodes, together with their density (% of ones), for any column from the table, joining with any other table/column. Note that there is a column name displayed only for self-joins (the table to itself).
Statistics of histograms and span are calculated for nontrivial packs only.
In general, small % values are better. It may be considered as a probability that a random value will not be excluded by min/max or other KN (histogram/CMAP).
The example above means that all 17 packs are nontrivial and each pack is included in an interval covering (in average) 5.8% of the whole domain of tdate and each such interval is additionally scattered such that only 14% of its values are actually present. Thus, if we have random condition “tdate = x”, where x is somewhere between minimal and maximal value of tdate, then we have 100% - (5.88% * 14.19%) = 99.1% chance that this condition will be resolved by either DPN or KN, without touching the data (for a particular pack). I.e. probably just one pack will be opened by such query.
Another method to display statistics is roughattrstats alias:
SELECT col AS roughattrstats FROM tab LIMIT 1;
It displays information about the selected column, separately for each data pack:
-- examples from TPC-H 1 GB database
select l_orderkey as roughattrstats from lineitem limit 1;
Column 0, table 71
Pack Rows Nulls Min Max Hist.
-------------------------------------------------------------------------------
0 65536 0 1 65380 1024/1024
1 65536 0 65381 130530 1024/1024
...
90 65536 0 5896963 5962531 1024/1024
91 37439 0 5962531 6000000 1024/1024
-------------------------------------------------------------------------------
select p_container as roughattrstats from part limit 1;
Column 6, table 66
Pack Rows Nulls Min Max Hist.
-------------------------------------------------------------------------------
0 65536 0 0 39 n/a
1 65536 0 0 39 n/a
...
3 3392 0 0 39 n/a
-------------------------------------------------------------------------------
Displayed information (numerical column) involve column and table internal number, pack number, number of rows in the pack, number of null values in the pack, minimal and maximal value (internally encoded), and histogram density. The density is displayed as ‘x/y’, where x is a number of ones in histogram (the value may occur in data pack), and y is a total number of histogram positions. Default value 1024 means that the interval for the data pack is wider than 1024, so histogram is approximate. Value ‘n/a’ means that the histogram is not available for some reason.
Another example:
select vin as roughattrstats from fact_sales limit 1;
Column 0, table 140
Pack Rows Nulls Min Max CMap
-------------------------------------------------------------------------------
0 65536 0 0AAHL653 8ZZYU286 9-26-26-26-26-10-10...
1 65536 0 0AAFF860 8ZZUM680 9-26-26-26-26-10-10...
...
15258 51712 0 0AABB747 8ZZHU603 9-26-26-26-26-10-10...
-------------------------------------------------------------------------------
String columns are described by minimal and maximal value (8 first characters, which may define a common prefix), and CMap statistics. The latter is displayed as a series of numbers indicating a number of different characters on consecutive string positions (example above: only 9 character on the first position, probably digits ‘0’-‘8’). Up to 7 positions of CMap is displayed. If a common prefix is found, the CMap starts on the first character after prefix.
Regards,