A union of curiosity and data science

Knowledgebase and brain dump of a database engineer


Clickhouse - Notes

ClickHouse Administration Guide: Essential Commands and Best Practices

ClickHouse is a powerful open-source columnar database management system designed for real-time analytics. This guide covers essential administrative tasks, best practices, and useful resources for managing ClickHouse deployments.

Table Engine Selection

Table engines are a fundamental concept in ClickHouse that determine how your data is stored and processed. Each engine type offers different capabilities and trade-offs. For a comprehensive guide on selecting the appropriate table engine for your use case, refer to the ClickHouse Table Engine Selection Guide.

Backup and Restore

When it comes to backing up ClickHouse data, S3 is the recommended approach for reliability and ease of use. Other methods can be challenging and less reliable.

Backup Command

BACKUP DATABASE `database_name` TO S3(
    'https://s3.<aws region>.amazonaws.com/<bucket location>/filename.bak',
    '<access key id>',
    '<secret access key>'
)

Restore Command

RESTORE DATABASE `database_name` AS `database_name` 
ON CLUSTER '<cluster name>'
FROM S3(
    'https://s3.<aws region>.amazonaws.com/<bucket location>/filename.bak',
    '<access key id>',
    '<secret access key>'
)

Essential System Queries

Cluster Information

Query cluster configuration and status:

SELECT * FROM system.clusters;

ZooKeeper/ClickHouse Keeper Status

Monitor distributed coordination service:

SELECT * FROM system.zookeeper 
WHERE path IN ('/', '/clickhouse');

Query Logging

Review query performance and debugging:

SELECT * FROM system.query_log LIMIT 10;

Cluster-Wide Queries

Execute queries across all nodes:

SELECT * FROM cluster('default', default.properties) 
WHERE address_full LIKE '%3303%';

Educational Resources

For deeper understanding of ClickHouse capabilities and best practices, the following video resources are recommended:

  • Tips and Tricks for ClickHouse
    Watch Video
    • Optimization techniques
    • Performance tuning
    • Best practices
  • Sharding and Replication
    Watch Video
    • Distributed architecture
    • Data redundancy
    • High availability setup
  • ClickHouse at Scale
    Watch Video
    • Large-scale deployments
    • Performance considerations
    • Architecture patterns
  • ClickHouse on Kubernetes
    Watch Video
    • Container orchestration
    • Deployment strategies
    • Resource management
  • ClickHouse Kubernetes Operator
    Watch Video
    • Automated operations
    • Configuration management
    • Lifecycle management

 

Table Engine Types: 

https://www.alibabacloud.com/blog/selecting-a-clickhouse-table-engine_597726

 

Backups and Restore- use S3, everything else is a nightmare. 

Backups BACKUP DATABASE `turfdox` TO S3('https://s3.<aws region>.amazonaws.com/<bucket location>/filename.bak', '<access key id>', '<secret access key'>)

Restore:  RESTORE DATABASE `turfdox` as `turfdox` on cluster '<cluster name>' FROM S3('https://s3.<aws region>.amazonaws.com/<bucket location>/filename.bak', '<access key id>', '<secret access key'>)

 

-- query the cluster info :
select * from `system`.clusters


-- query the zookeeper / clickhouse keeper info
select * from `system`.zookeeper
where path in ('/', '/clickhouse')


-- query log
select * From system.query_log limit 10

-- select from entire cluster
select * from cluster('default', default.properties)
where address_full like '%3303%'

Great Videos:  

Tips and Tricks: https://www.youtube.com/watch?v=FsVrFbcyb84&t=607s

Sharding and Replication: https://www.youtube.com/watch?v=Vuh6NOluIxo

Clickhouse at Scale: https://www.youtube.com/watch?v=vBjCJtw_Ei0

Clickhouse on Kubernetes: https://www.youtube.com/watch?v=H2iGgO9nG6I

Clickhouse Kubernets Operator: https://www.youtube.com/watch?v=Y6GEiDzjHE4