Rewriting PostgreSQL queries in ClickHouse
This is Part 4 of a guide on migrating from PostgreSQL to ClickHouse. This content can be considered introductory, with the aim of helping users deploy an initial functional system that adheres to ClickHouse best practices. It avoids complex topics and will not result in a fully optimized schema; rather, it provides a solid foundation for users to build a production system and base their learning.
The following provides example queries comparing PostgreSQL to ClickHouse. This list aims to demonstrate how to exploit ClickHouse features to significantly simplify queries. These queries will, in most cases, also execute faster in ClickHouse. The examples here use the full Stack Overflow dataset (up to April 2024) on equivalent resources in PostgreSQL and ClickHouse (8 cores, 32GiB RAM).
Counts here will slightly differ as the Postgres data only contains rows which satisfy the referential integrity of the foreign keys. ClickHouse imposes no such constraints and thus has the full dataset e.g. inc. anon users.
Users (with more than 10 questions) which receive the most views:
Which tags
receive the most views
:
Aggregate functions
Where possible, users should exploit ClickHouse aggregate functions. Below we show the use of the argMax function to compute the most viewed question of each year.
This is significantly simpler (and faster) than the equivalent Postgres query:
Conditionals and Arrays
Conditional and array functions make queries significantly simpler. The following query computes the tags (with more than 10000 occurrences) with the largest percentage increase from 2022 to 2023. Note how the following ClickHouse query is succinct thanks to the conditionals, array functions, and ability to reuse aliases in the HAVING and SELECT clauses.
This concludes our basic guide for users migrating from Postgres to ClickHouse. We recommend users migrating from Postgres read the guide for modeling data in ClickHouse to learn more about advanced ClickHouse features.