informix-community


Character length semantics in Informix IDS

sebflaesch
Mon, 11 Mar 2019 15:41:49 GMT

Hi all,

Informix IDS supports the UTF-8 encoding, but the database uses byte length semantics. For example, the length() function and the column subscript operator col[s,e] work in bytes: in UTF-8, SELECT length("forêt") returns 6 (bytes), and if col contains "forêt", SELECT col[4,4] returns a whitespace; you have to write SELECT col[4,5] to get the "ê".

We know there is a server configuration parameter (SQL_LOGICAL_CHAR) that applies a ratio to CHAR/VARCHAR sizes when defining table columns. For example, with a ratio of 3, a column created as VARCHAR(10) becomes a VARCHAR(30).

This is far from what other database brands offer for character length semantics and Unicode support. Is there a plan for better (complete) support for character length semantics in a future version?

Best regards,
Seb
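For readers who want to reproduce the numbers in the post above without an Informix instance, here is a minimal Python sketch of the same byte-versus-character arithmetic. It is only an illustration, not Informix code: the helper names (`byte_length`, `byte_subscript`, `logical_to_byte_size`) are hypothetical, and the sketch assumes the subscript operator is 1-based and inclusive over the UTF-8 bytes, as the examples in the post suggest.

```python
# Illustration only: plain Python, not Informix.
# All helper names below are hypothetical and exist only for this sketch.

def byte_length(text: str) -> int:
    """Length in bytes of the UTF-8 encoding (byte length semantics)."""
    return len(text.encode("utf-8"))


def byte_subscript(text: str, start: int, end: int) -> bytes:
    """Mimic a 1-based, inclusive col[start,end] applied to UTF-8 bytes."""
    return text.encode("utf-8")[start - 1:end]


def logical_to_byte_size(declared: int, ratio: int) -> int:
    """Declared size times the ratio, e.g. VARCHAR(10) with ratio 3."""
    return declared * ratio


word = "forêt"

print(len(word))                   # 5 characters (character semantics)
print(byte_length(word))           # 6 bytes: "ê" takes two bytes in UTF-8
print(byte_subscript(word, 4, 4))  # b'\xc3': first half of "ê" only
print(byte_subscript(word, 4, 5).decode("utf-8"))  # ê
print(logical_to_byte_size(10, 3))                 # 30
```

The lone byte returned for positions 4 to 4 is not a valid UTF-8 sequence on its own, which would explain why the server cannot give back a usable character for col[4,4].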

I am Pradeep
Mon, 11 Mar 2019 18:01:28 GMT

The short answer is "yes". Introducing character-based lengths, rather than byte lengths, is in our plan. However, this requires a lot of redesign and changes to the internal architecture. Moreover, we need to keep in mind how to handle backward compatibility. Stay tuned for future announcements and our roadmap discussions at various worldwide events.

sebflaesch
Tue, 12 Mar 2019 08:31:51 GMT

That's good news! And I can imagine char length semantics is not easy to implement. Seb