A union of curiosity and data science

Collation is used by SQL Server to dictate how language is interpreted.

Collation settings control the following:
Case sensitivity ('A'='a'? or 'A' = 'A'?)
Accent sensitivity ('AIMÉE' = 'AIMEE'?)
Kana sensitivity
Width sensitivity

Example :

Latin1_General_CI_AI = Latin code page, Case Insensitive, Accent Insensitive

Collation can be set at the server, database, and column level.

Collation can also be explicitly forced in joins and sorting.

example:

select first_name from [users]

order by first_name COLLATE SQL_Latin1_General_CP1_CS_AS

Let's go through some collation examples below.

No collation, No Unicode Datatype

--No Collation or Unicode Datatype.

create table #no_collo (value varchar(64))

insert #no_collo (value) select N'浚'

select * from #no_collo

Result:

SQL has no idea what code page to associate your inserted data to and questions you.

No Collation, Includes Unicode Datatype

--No collation specified with unicode datatype
create table #unicode (value Nvarchar(64))
insert #unicode (value) select N'浚'
select * from #unicode

Result:

The unicode data type is a double byte standard which SQL knows how to display.

No Unicode Datatype, Includes Column Level Collation

-- No unicode datatype with collation used
-- add the collation to the column definition
create table #collation (value varchar(64) COLLATE Chinese_Taiwan_Stroke_BIN)
insert #collation (value)
select N'浚'
select * from #collation

Result:

SQL has an idea of what code page to associate your inserted data to and no longer questions you.

Joining Unicode and Collocated Column

---Selecting data from two tables created earlier...
select * from #unicode u
join #collation c on c.value = u.value

Result:

Msg 468, Level 16, State 9, Line 2

Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS"
and "Chinese_Taiwan_Stroke_BIN" in the equal to operation.

Select with Collation in statement fixes the above error

---Selecting data based on collation.
select * from #unicode u
join #collation c on c.value = u.value COLLATE Chinese_Taiwan_Stroke_BIN

Result:

You would assume joining a unicode datatype against a specific collation would work,but it doesn't.

You'll need to specify the collation used in the join because the collation for the two sets of data are different.

Collation Basics 浚

Prometheus Kubernetes Setup - Error : lens metrics are not available due to missing or invalid prometheus configuration.

Ubuntu Shell - get your IP Address , Subnet Mask and Gateway

Raspberry Pi - Pyspark Hadoop Cluster - 6 Nodes

Presto Cheat Sheet

Network Settings in Ubuntu 19.10

Python Environments Setup.

Setup and Install Apache Airflow on a Ubuntu 18 GCP (Google Cloud) VM

Magento 2 API Product Get

Google Translate API - Wav file speech recognition with python3

Connect with SSH to MySQL server with Python