A union of curiosity and data science

Knowledgebase and brain dump of a database engineer

Collation Basics 浚


Collation is used by SQL Server to dictate how language is interpreted.


Collation settings control the following:
Case sensitivity ('A'='a'? or 'A' = 'A'?)
Accent sensitivity
('AIMÉE' = 'AIMEE'?)
Kana sensitivity
Width sensitivity


Example :


Latin1_General_CI_AI = Latin code page, Case Insensitive, Accent Insensitive


Collation can be set at the server, database, and column level.

Collation can also be explicitly forced in joins and sorting.



      select first_name from [users]

      order by first_name COLLATE SQL_Latin1_General_CP1_CS_AS

 Let's go through some collation examples below.
No collation, No Unicode Datatype

--No Collation or Unicode Datatype.

create table #no_collo (value varchar(64))

insert #no_collo (value) select N''

select * from #no_collo

 SQL has no idea what code page to associate your inserted data to and questions you.
No Collation, Includes Unicode Datatype

--No collation specified with unicode datatype
table #unicode (value Nvarchar(64))
      insert #unicode (value) select N'
* from #unicode

 The unicode data type is a double byte standard which SQL knows how to display.
No Unicode Datatype, Includes Column Level Collation

-- No unicode datatype with collation used
-- add the collation to the column definition

table #collation (value varchar(64) COLLATE Chinese_Taiwan_Stroke_BIN)
      insert #collation (value)
      select N'
* from #collation

  SQL has an idea of what code page to associate your inserted data to and no longer questions you.
Joining Unicode and Collocated Column 

---Selecting data from two tables created earlier...
* from #unicode u
 #collation c on c.value = u.value

   Msg 468, Level 16, State 9, Line 2

      Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS"
      and "Chinese_Taiwan_Stroke_BIN" in the equal to operation.

Select with Collation in statement fixes the above error 

---Selecting data based on collation.
* from #unicode u
 #collation c on c.value = u.value COLLATE Chinese_Taiwan_Stroke_BIN

 You would assume joining a unicode datatype against a specific collation would work,but it doesn't.
You'll need to specify the collation used in the join because the collation for the two sets of data are different.

Add comment