sqlcheck
automatically detects common SQL anti-patterns. Such anti-patterns often slow down queries. Addressing them will, therefore, help accelerate queries.
sqlcheck
targets all major SQL dialects.
For development updates on sqlcheck
and general news on next generation database systems, follow me at @joy_arulraj.
Here's a short demonstration of how to use sqlcheck
on a SQL file named top_mutexes.sql
. It highlights the verbose (-v
) and color (-c
) modes of the tool, which are disabled by default.
Right now SQLCheck can detect the following anti-patterns:
- Multi-Valued Attribute
- Recursive Dependency
- Primary Key Does Not Exist
- Generic Primary Key
- Foreign Key Does Not Exist
- Entity-Attribute-Value Pattern
- Metadata Tribbles
- Imprecise Data Type
- Values In Definition
- Files Are Not SQL Data Types
- Too Many Indexes
- Index Attribute Order
- SELECT *
- NULL Usage
- NOT NULL Usage
- String Concatenation
- GROUP BY Usage
- ORDER BY RAND Usage
- Pattern Matching Usage
- Spaghetti Query Alert
- Reduce Number of JOINs
- Eliminate Unnecessary DISTINCT Conditions
- Implicit Column Usage
- HAVING Clause Usage
- Nested sub queries
- OR Usage
- UNION Usage
- DISTINCT & JOIN Usage
- Download the debian package from the latest release.
wget https://github.com/jarulraj/sqlcheck/releases/download/v1.2/sqlcheck-x86_64.deb
- Install it by running the following command.
dpkg -i sqlcheck-x86_64.deb
- Download the rpm package from the latest release.
wget https://github.com/jarulraj/sqlcheck/releases/download/v1.2/sqlcheck-x86_64.rpm
- Install it by running the following command.
yum --nogpgcheck localinstall sqlcheck-x86_64.rpm
- Download the dmg package from the latest release.
wget https://github.com/jarulraj/sqlcheck/releases/download/v1.2/sqlcheck-x86_64.dmg
-
Click the dmg to mount the package. This will mount it in the
Volumes
directory. -
Open the
Terminal
app. This page contains more details on finding the app. -
Copy over the SQLCheck binary file to desired local directory.
cp /Volumes/sqlcheck-x86_64/bin/sqlcheck /usr/local/bin/
Another alternative is to use Homebrew. Here's a tap written by @gwerbin :
brew tap gwerbin/tap
brew cask install gwerbin/tap/sqlcheck
- Download the exe file from the latest release.
wget https://github.com/jarulraj/sqlcheck/releases/download/v1.2/sqlcheck.exe
- Open a terminal (like
cmd.exe
) and run the executable.
- Download the zip package from the latest release.
wget https://github.com/jarulraj/sqlcheck/releases/download/v1.2/sqlcheck-x86_64.zip
- Unzip it and find the SQLCheck binary (
bin/sqlcheck
).
unzip sqlcheck-x86_64.zip
SQLCheck has the following software dependencies:
- g++ 4.9+
- cmake (Cmake installation guide)
First, clone the repository (with --recursive option).
git clone --recursive https://github.com/jarulraj/sqlcheck.git
Next, run the following commands to build and install SQLCheck:
./bootstrap
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE ..
make
make install
$ sqlcheck -h
Command line options : sqlcheck <options>
-f --file_name : file name
-r --risk_level : set of anti-patterns to check
: 1 (all anti-patterns, default)
: 2 (only medium and high risk anti-patterns)
: 3 (only high risk anti-patterns)
-c --color_mode : color mode
-v --verbose_mode : verbose mode
$ sqlcheck -f examples/top_mutexes.sql -v
-------------------------------------------------
> RISK LEVEL :: ALL ANTI-PATTERNS
> SQL FILE NAME :: examples/top_mutexes.sql
-------------------------------------------------
==================== Results ===================
-------------------------------------------------
SQL Statement: with top_mutexes as ( select--+ leading(t1 s1 v1 v2 t2 s2) use_hash(s1)
use_nl(v1) use_hash(s2) materialize t1.hsecs ,s1.* ,s2.sleeps as end_sleeps
,s2.wait_time as end_wait_time ,s2.sleeps-s1.sleeps as delta_sleeps ,t2.hsecs -
t1.hsecs as delta_hsecs --,s2.* from v$timer t1 ,v$mutex_sleep s1 ,(select/*+
no_merge */ sum(level) a from dual connect by level<=1e6) v1 ,v$timer t2
,v$mutex_sleep s2 where s1.mutex_type=s2.mutex_type and s1.location=s2.location
) select * from top_mutexes order by delta_sleeps desc;
[examples/top_mutexes.sql]: (HIGH RISK) (QUERY ANTI-PATTERN) SELECT *
● Inefficiency in moving data to the consumer:
When you SELECT *, you're often retrieving more columns from the database than
your application really needs to function. This causes more data to move from
the database server to the client, slowing access and increasing load on your
machines, as well as taking more time to travel across the network. This is
especially true when someone adds new columns to underlying tables that didn't
exist and weren't needed when the original consumers coded their data access.
● Indexing issues:
Consider a scenario where you want to tune a query to a high level of
performance. If you were to use *, and it returned more columns than you
actually needed, the server would often have to perform more expensive methods
to retrieve your data than it otherwise might. For example, you wouldn't be able
to create an index which simply covered the columns in your SELECT list, and
even if you did (including all columns [shudder]), the next developer who came around
and added a column to the underlying table would cause the optimizer to ignore
your optimized covering index, and you'd likely find that the performance of
your query would drop substantially for no readily apparent reason.
[Matching Expression: select *]
[examples/top_mutexes.sql]: (LOW RISK) (QUERY ANTI-PATTERN) Spaghetti Query Alert
● Split up a complex spaghetti query into several simpler queries:
SQL is a very expressive language—you can accomplish a lot in a single query
or statement. But that doesn't mean it's mandatory or even a good idea to
approach every task with the assumption it has to be done in one line of code.
One common unintended consequence of producing all your results in one query is
a Cartesian product. This happens when two of the tables in the query have no
condition restricting their relationship. Without such a restriction, the join
of two tables pairs each row in the first table to every row in the other table.
Each such pairing becomes a row of the result set, and you end up with many more
rows than you expect.
It's important to consider that these queries are simply
hard to write, hard to modify, and hard to debug. You should expect to get
regular requests for incremental enhancements to your database applications.
Managers want more complex reports and more fields in a user interface. If you
design intricate, monolithic SQL queries, it's more costly and time-consuming to
make enhancements to them. Your time is worth something, both to you and to your
project. Split up a complex spaghetti query into several simpler queries. When
you split up a complex SQL query, the result may be many similar queries,
perhaps varying slightly depending on data values. Writing these queries is a
chore, so it's a good application of SQL code generation. Although SQL makes it
seem possible to solve a complex problem in a single line of code, don't be
tempted to build a house of cards.
==================== Summary ===================
All Anti-Patterns :: 2
> High Risk :: 1
> Medium Risk :: 0
> Low Risk :: 1
(1) SQL Anti-patterns: Avoiding the Pitfalls of Database Programming, Bill Karwin
(2) Common SQL Anti-patterns, StackOverflow
Contributions to SQLCheck are always welcome. You can contribute in different ways:
- Open an issue with suggestions for improvements and errors you're facing;
- Fork this repository and submit a pull request;
- Improve the documentation.
Licensed under the Apache License.