Redrock Postgres Documentation
Home Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Back to homepage

Collation

The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.

Concepts

Conceptually, every expression of a collatable data type has a collation. (The built-in collatable data types are text, varchar, and char. User-defined base types can also be marked collatable, and of course a domain over a collatable data type is collatable.) If the expression is a column reference, the collation of the expression is the defined collation of the column. If the expression is a constant, the collation is the default collation of the data type of the constant. The collation of a more complex expression is derived from the collations of its inputs, as described below.

The collation of an expression can be the “default” collation, which means the locale settings defined for the database. It is also possible for an expression’s collation to be indeterminate. In such cases, ordering operations and other operations that need to know the collation will fail.

When the database system has to perform an ordering or a character classification, it uses the collation of the input expression. This happens, for example, with ORDER BY clauses and function or operator calls such as <. The collation to apply for an ORDER BY clause is simply the collation of the sort key. The collation to apply for a function or operator call is derived from the arguments, as described below. In addition to comparison operators, collations are taken into account by functions that convert between lower and upper case letters, such as lower, upper, and initcap; by pattern matching operators; and by to_char and related functions.

For a function or operator call, the collation that is derived by examining the argument collations is used at run time for performing the specified operation. If the result of the function or operator call is of a collatable data type, the collation is also used at parse time as the defined collation of the function or operator expression, in case there is a surrounding expression that requires knowledge of its collation.

The collation derivation of an expression can be implicit or explicit. This distinction affects how collations are combined when multiple different collations appear in an expression. An explicit collation derivation occurs when a COLLATE clause is used; all other collation derivations are implicit. When multiple collations need to be combined, for example in a function call, the following rules are used:

  1. If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination.
  2. Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.
  3. If there are conflicting non-default implicit collations among the input expressions, then the combination is deemed to have indeterminate collation. This is not an error condition unless the particular function being invoked requires knowledge of the collation it should apply. If it does, an error will be raised at run-time.

Supported Collations

The following table shows the collations available for use in PostgreSQL.

Collation Provider Description
default libc The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time.
C libc A collations specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.
POSIX libc A collations specify “traditional C” behavior.
ucs_basic libc SQL standard collation, available for encoding UTF8. It is equivalent to C and sorts by Unicode code point.
zh_CN libc Chinese (Simplified, China)
zh_CN.utf8 libc Chinese (Simplified, China), available for encoding UTF8.
zh_CN.gb2312 libc Chinese (Simplified, China), available for encoding GB2312.
zh_HK libc Chinese (Traditional, Hong Kong SAR, China)
zh_HK.utf8 libc Chinese (Traditional, Hong Kong SAR, China), available for encoding UTF8.
zh_TW libc Chinese (Traditional, Taiwan)
zh_TW.utf8 libc Chinese (Traditional, Taiwan), available for encoding UTF8.
zh_TW.euctw libc Chinese (Traditional, Taiwan), available for encoding EUCTW.
zh_SG libc Chinese (Simplified, Singapore)
zh_SG.utf8 libc Chinese (Simplified, Singapore), available for encoding UTF8.
zh_SG.gb2312 libc Chinese (Simplified, Singapore), available for encoding GB2312.
zh-x-icu ICU Chinese. Sort by Chinese phonetic order
zh-Hans-x-icu ICU Chinese (Simplified). Sort by Chinese phonetic order
zh-Hans-CN-x-icu ICU Chinese (Simplified, China). Sort by Chinese phonetic order
zh-Hans-HK-x-icu ICU Chinese (Simplified, Hong Kong SAR, China). Sort by Chinese phonetic order
zh-Hans-MO-x-icu ICU Chinese (Simplified, Macau SAR, China). Sort by Chinese phonetic order
zh-Hans-SG-x-icu ICU Chinese (Simplified, Singapore). Sort by Chinese phonetic order
zh-Hant-x-icu ICU Chinese (Traditional). Sort by Chinese radical/stroke order
zh-Hant-HK-x-icu ICU Chinese (Traditional, Hong Kong SAR, China). Sort by Chinese radical/stroke order
zh-Hant-MO-x-icu ICU Chinese (Traditional, Macau SAR, China). Sort by Chinese radical/stroke order
zh-Hant-TW-x-icu ICU Chinese (Traditional, Taiwan). Sort by Chinese radical/stroke order