If you use psql’s \l command to list all the databases in your PostgreSQL cluster, you’ll notice a column in the output labelled “Collate”; on my system, that column has the value “en_US.UTF-8″. This means that when, for example, you sort strings, you’ll use the “English” sort order rather than the traditional “C” sort order. You might ask: what’s the difference?
In the “C” sort order, all capital letters come before all lower-case letters, whereas in en_US.UTF8, a comes before A which comes before b which comes before B, and so on. In other words, every collation can have its own rules for sorting strings, consistent with the way that the people who speak that language like to alphabetize things; or at least with the way that the people who wrote the locale definitions for that language think that they like to alphabetize things. Collations are OS-dependent: some operating systems don’t support them at all, while Windows has a completely different naming convention from every other operating system, and probably different behaviors as well.
Read more »
If you're new here, you may want to subscribe to our RSS feed or follow us on Twitter for product announcements, event information, and industry news.
The Perils of Collation-Aware Comparisons






