July 22, 2022: Help us out by filling out a short user survey!
Blog

SQLite has pretty limited builtin functions

Published on by

SQLite is the core of DataStation and dsq.

So a while ago I tried to use DataStation/dsq to analyze some benchmarks I had done that generated CSVs. But I quickly realized how limited SQLite's standard library is.

SQLite only comes with these few core functions. A few JSON functions have recently become builtin too.

Aggregate functions

I wanted to get average and standard deviation values from my benchmark result CSVs.

But there are only 7 builtin aggregate functions: avg, count, group_concat, min, total, max, and sum.

There's no builtin standard deviation function. I'd need to write an extension.

In contrast, DuckDB (an analytics database otherwise very similar to SQLite) comes with a suite of statistics aggregation functions. This isn't just an "OLAP" thing. PostgreSQL has a wealth of statistical aggregation functions. MySQL, Oracle, and SQL Server too.

If I had to choose between SQLite and DuckDB for ad-hoc analysis of datasets on the command line I'd probably pick DuckDB for its more convenient standard library.

However, SQLite is still the most convenient database to embed and for that reason DataStation/dsq use SQLite. What can be done?

A new standard library

So to bridge the gap I published a standard library for SQLite, called go-sqlite3-stdlib. The standard library is written in Go and mostly wraps existing statistical (and other) libraries, providing a useful set of functions for Go users embedding SQLite.

Why not sqlean you may ask? It's a great project. But setting up extensions like this in Go bindings to C projects is inconvenient. And sqlean is written in C. So if it's ever missing a function I'd have to write it in C. I don't really want to write anything in C if I can avoid it. So go-sqlite3-stdlib is written in Go and has 95% test coverage.

It adds statistical aggregation functions like discrete and continuous percentiles, standard deviation, and so on.

Aggregation functions

Since DataStation/dsq now registers go-sqlite3-stdlib, you can do much more statistical analysis in DataStation/dsq.

Furthermore, not just DataStation/dsq but anyone who uses SQLite in Go can register this standard library. The registration functions are specific to mattn/go-sqlite3 today but could adapters could be made for other Go SQLite bindings in the future.

Beyond aggregation

This standard library I published doesn't just include additional aggregation functions.

Date parsing

The library includes best-effort date parsing/retrieval functions.

Date parsing functions

URL parsing

It includes URL parsing/retrieval functions.

URL parsing functions

String functions

It includes string functions.

String parsing functions

Hashing/encoding functions

It includes hashing and encoding/decoding functions.

String parsing functions

Math functions

And it includes math functions that are largely ports of SQLite's "builtin" math functions. The thing is that these functions are disabled by default. Easier to just provide them in Go in this library.

Math functions

Conclusion

With this standard library, SQL exploration of ad-hoc datasets in DataStation and dsq are significantly more convenient.

Check out the library README for full docs.

SQLite is amazing software. But it's probably worth adding this limited list of builtin functions to your "things to consider" when deciding whether SQLite is right for you. Its much smaller than I expected, personally. So you may need to do some work to fill in the gaps (if you aren't embedding SQLite in Go).

Share

With questions, criticism or ideas, email or Tweet me.

Also, check out DataStation and dsq.