A union of curiosity and data science

Knowledgebase and brain dump of a database engineer


Collation Basics 浚

 
 
 

Collation is used by SQL Server to dictate how language is interpreted.

 
 

Collation settings control the following:
Case sensitivity ('A'='a'? or 'A' = 'A'?)
Accent sensitivity
('AIMÉE' = 'AIMEE'?)
Kana sensitivity
Width sensitivity

 

Example :

 

Latin1_General_CI_AI = Latin code page, Case Insensitive, Accent Insensitive

 

Collation can be set at the server, database, and column level.

Collation can also be explicitly forced in joins and sorting.

 
 
 

example:

      select first_name from [users]

      order by first_name COLLATE SQL_Latin1_General_CP1_CS_AS

 
 
 
 
 Let's go through some collation examples below.
 
 
No collation, No Unicode Datatype

--No Collation or Unicode Datatype.

create table #no_collo (value varchar(64))

insert #no_collo (value) select N''

select * from #no_collo

 Result:
 
 SQL has no idea what code page to associate your inserted data to and questions you.
 
 
 
 
No Collation, Includes Unicode Datatype

--No collation specified with unicode datatype
create
table #unicode (value Nvarchar(64))
      insert #unicode (value) select N'
'
select
* from #unicode

 
 Result:
 
 The unicode data type is a double byte standard which SQL knows how to display.
 
 
 
 
 
No Unicode Datatype, Includes Column Level Collation

-- No unicode datatype with collation used
-- add the collation to the column definition

create
table #collation (value varchar(64) COLLATE Chinese_Taiwan_Stroke_BIN)
      insert #collation (value)
      select N'
'
select
* from #collation

  Result:
 
  SQL has an idea of what code page to associate your inserted data to and no longer questions you.
 
 
 
 
 
Joining Unicode and Collocated Column 

---Selecting data from two tables created earlier...
select
* from #unicode u
join
 #collation c on c.value = u.value

 
  Result:    
   Msg 468, Level 16, State 9, Line 2

      Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS"
      and "Chinese_Taiwan_Stroke_BIN" in the equal to operation.

 
 
 
 
 
Select with Collation in statement fixes the above error 

---Selecting data based on collation.
select
* from #unicode u
join
 #collation c on c.value = u.value COLLATE Chinese_Taiwan_Stroke_BIN

 
 Result:
 
 You would assume joining a unicode datatype against a specific collation would work,but it doesn't.
You'll need to specify the collation used in the join because the collation for the two sets of data are different.