I have tried ICE 3.1 for Win32. Chinese text encoded in the default Windows (China locale) code page can be loaded and exported correctly, but neither the mysql client tool nor the Toad for MySQL freeware can display the loaded Chinese words.
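A minimal Python sketch of why the bytes can round-trip correctly and still display wrongly. It assumes the China-locale default code page is GBK (cp936) and that the client decodes results as UTF-8. Neither assumption is stated in the post, and the helper names are hypothetical.

```python
# Sketch: bytes written in the Windows China-locale code page (assumed GBK/cp936)
# survive a load/export round trip unchanged, but a client that decodes them
# with a different charset (assumed UTF-8 here) cannot show them.

def stored_bytes(text: str, load_encoding: str = "gbk") -> bytes:
    """Hypothetical helper: bytes as they would sit in the column after loading."""
    return text.encode(load_encoding)


def client_view(raw: bytes, client_encoding: str) -> str:
    """Hypothetical helper: what a client displays if it decodes with client_encoding."""
    return raw.decode(client_encoding, errors="replace")


if __name__ == "__main__":
    raw = stored_bytes("中文")
    print(raw)                        # raw GBK bytes, 2 bytes per character
    print(client_view(raw, "gbk"))    # decoded with the right charset: readable
    print(client_view(raw, "utf-8"))  # decoded with the wrong charset: replacement characters
```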
Currently you can load any character set into CHAR/VARCHAR columns, but ICE does not recognize character sets and treats the data as a sequence of 1-byte values. Therefore:
- sorting and comparisons do not take collation into account,
- a 3-byte character is treated as three 1-byte characters,
- character set information is not passed consistently from the engine to clients, so necessary character conversions may be omitted. An example is given in http://www.infobright.org/index.php/Forums/viewthread/551/ (results of MIN() aggregation).
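The first two points can be illustrated with a small Python sketch. It assumes the stored values are UTF-8 and models a byte-oriented engine by comparing and counting raw encoded bytes. It is an illustration, not ICE's actual code.

```python
# Sketch of byte-oriented string handling: length, MIN() and ordering are
# computed over raw UTF-8 bytes instead of characters and collation rules.

def byte_length(value: str) -> int:
    """Length as a byte-oriented engine would count it."""
    return len(value.encode("utf-8"))


def byte_min(values: list[str]) -> str:
    """MIN() computed by comparing raw UTF-8 byte strings (no collation)."""
    return min(values, key=lambda v: v.encode("utf-8"))


if __name__ == "__main__":
    word = "中"                              # one character, three bytes in UTF-8
    print(len(word), byte_length(word))      # 1 3

    names = ["émile", "Zoe", "adam"]
    # Byte order puts uppercase ASCII before lowercase ASCII,
    # and "é" (0xC3 0xA9) after every ASCII letter.
    print(byte_min(names))                   # Zoe
    print(sorted(names, key=lambda v: v.encode("utf-8")))
```

A case- and accent-insensitive collation would instead return "adam" as the minimum, which is the kind of discrepancy the MIN() example in the linked thread refers to.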
However, if you set up the environment correctly, selecting a UTF-8 column from a table should succeed: the byte sequences are retrieved and passed to MySQL, which should take care of converting them into the client's local character set. An example in http://www.infobright.org/index.php/Forums/viewthread/551/ is SELECT DISTINCT [...]
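The conversion step MySQL is expected to perform can be sketched in Python as "decode with the column's charset, re-encode with the client's charset". The client charset GBK is an assumption for a Chinese-locale Windows client. In MySQL the client side of this is governed by session variables such as `character_set_client` and `character_set_results`. The function name is hypothetical.

```python
# Sketch of the conversion the MySQL layer is expected to perform:
# column bytes are UTF-8 and are re-encoded into the client's charset
# (assumed here to be GBK for a Chinese-locale Windows client).

def convert_for_client(column_bytes: bytes, column_charset: str, client_charset: str) -> bytes:
    """Hypothetical helper: decode with the column charset, re-encode for the client."""
    return column_bytes.decode(column_charset).encode(client_charset)


if __name__ == "__main__":
    column_bytes = "中文".encode("utf-8")     # bytes as stored in the column
    sent = convert_for_client(column_bytes, "utf-8", "gbk")
    print(sent.decode("gbk"))                 # 中文

    # If the charset information is lost and the conversion is skipped,
    # the client decodes the UTF-8 bytes as GBK and shows garbled text.
    print(column_bytes.decode("gbk", errors="replace"))
```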
One important clarification on UTF-8 in the current version: to store it properly, columns must be declared with an appropriate width in bytes. For example, to safely store 20-character UTF-8 values, use VARCHAR(80), since every character may be up to 4 bytes wide.
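The sizing rule is simple arithmetic, width in bytes = characters × 4. A short Python sketch, assuming (as the post states) that the declared width is counted in bytes, shows the rule and checks whether a value fits. The helper names are hypothetical.

```python
# Sketch of the sizing rule: declared byte width = max characters x 4,
# since one UTF-8 character takes 1 to 4 bytes.

MAX_UTF8_BYTES_PER_CHAR = 4


def required_varchar_bytes(max_chars: int) -> int:
    """Hypothetical helper: byte width needed so any max_chars-character value fits."""
    return max_chars * MAX_UTF8_BYTES_PER_CHAR


def fits(value: str, declared_bytes: int) -> bool:
    """True if the UTF-8 encoding of value fits in declared_bytes bytes."""
    return len(value.encode("utf-8")) <= declared_bytes


if __name__ == "__main__":
    width = required_varchar_bytes(20)
    print(f"VARCHAR({width})")                  # VARCHAR(80)

    print(len("😀".encode("utf-8")))            # 4: a character outside the BMP
    value = "中" * 20                           # 20 characters, 60 bytes
    print(fits(value, 20), fits(value, width))  # False True
```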
Hi, it’s June now. Will UTF-8 support come out at the end of this month as planned?
Hi Amber,
Regrettably, UTF-8 has been moved into Q3 in favour of enhanced SQL function support for complex expressions in the upcoming release in July. As soon as I have a firm new date for UTF-8, I will let you know.
By any chance, is this enhanced support going to include optimized grouping by the available aggregates? I really hope so; that would be -excellent-. That has been my biggest stumbling block so far, because I pretty much refuse to enable the MySQL fallback, and putting all of this into subqueries takes a lot of extra work and often results in rather poor performance.
Supported Character Sets
Infobright storage supports all ANSI character sets. This means that Infobright can store and retrieve data encoded in any 8-bit character set. Infobright does not support any other character sets, such as the Unicode, UTF-8 or Latin1 (cp1252) character sets.