Blog

SQLite in Go, with and without cgo

Published on by

Most people use the mattn/go-sqlite3 package to interact with SQLite in Go. This package uses cgo and bundles the SQLite C amalgamation files with the Go source.

But Go developers often prefer not to use cgo (see for example cgo is not go). I mention this because there happens to be an underrated translation of SQLite's C source code to Go. Since it is a translation of C to Go, you don't need to use cgo to call into it. Some developers find this idea compelling. And this Go translation is an impressive work in and of itself.

But for real-world usage there are at least two major concerns I had: compatibility and performance. According to their documentation they pass most or all SQLite3 tests, so it seems pretty compatible. But I didn't see anything on their site or documentation (which, to their detriment, is pretty scant) that talked about performance. So I took a look.

This post summarizes some basic ingestion and query benchmarks using mattn/go-sqlite3 and modernc.org/sqlite. tldr; the Go translation is between twice as slow and 10% slower for both small datasets and large, for INSERTs and SELECTs.

Benchmarks

There are two main benchmarks, one ingest benchmark and one query benchmark. They are both run 10 times and both run against a growing number of rows.

Ingest

This benchmark inserts 10_000, 479_827, and 4_798_270 rows 10 times each. There are ten columns and the contents are a mix of randomly generated strings and integers.

Query

This benchmark runs a single GROUP BY query: SELECT COUNT(1), age FROM people GROUP BY age ORDER BY COUNT(1) DESC. It runs 10 times against each of the sizes of rows ingested.

Code

Here is the mattn/go-sqlite3 version. And here is the SQLite Go translated version.

Machine Specs

I am running these benchmarks on a dedicated bare metal instance, OVH Rise-1.

Results

Below are the averages across all 10 runs for each query category, number of rows acting on, and the library (cgo or no cgo) used.

Category Average Time (Seconds) Standard Deviation # Rows Library
insert 0.0446901 0.000986 10000 mattn
insert 0.0954987 0.0018 10000 modernc
group_by 0.0028469 0.000132 10000 mattn
group_by 0.0045523 .000085 10000 modernc
insert 2.317502 0.017499 479827 mattn
insert 4.6066586 0.027994 479827 modernc
group_by 0.227473 0.005689 479827 mattn
group_by 0.2786216 0.00188 479827 modernc
insert 23.0529852 0.077669 4798270 mattn
insert 46.2240191 0.370401 4798270 modernc
group_by 3.5935438 0.052308 4798270 mattn
group_by 4.0170164 0.066636 4798270 modernc

Summary

modernc.org/sqlite is a very impressive project. But the Go translation is twice as slow in INSERTs. It does quite well with SELECTs for a translation; being between 10% slower and at worst twice as slow.

Based on these results, if your workload has solely small datasets (i.e. small business apps) the tradeoff allowing you to avoid cgo could be worth it. Otherwise if you care strongly about performance you'll be better off with the real SQLite and mattn/go-sqlite3.

Errata: bug in SELECT

Thanks benhoyt and oefrha for finding a bug in the SELECT query that was causing the mattn version not to finish executing the SELECT query.

Summary before errata correction

modernc.org/sqlite is a very impressive project. But it is at least twice as slow in every variation even with smaller datasets. And SELECT performance gets way worse as the dataset size increases. The Go translation also has a greater standard deviation which might be attributable to the garbage collector. My guess is that mattn/go-sqlite3 is so constant in SELECT performance even as the dataset increases because even 4 million rows is peanuts for SQLite. Or maybe my benchmark code is wrong!

Based on these results, if your workload has solely small datasets (i.e. small business apps) the tradeoff allowing you to avoid cgo could be worth it. Otherwise if you care strongly about performance you'll be better off with the real SQLite and mattn/go-sqlite3.

Share

With questions, criticism or ideas, email or Tweet me.

Also, check out DataStation and dsq.