dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
dsq is a commandline tool for running SQL queries against all the data file types that DataStation supports (JSON, CSV, Parquet, Excel, etc.). Here is a simple example:
$ cat users.csv
name,age
Jorge,30
Nina,32
Cal,28
Mona,27
$ dsq users.csv "SELECT * FROM {} ORDER BY CAST(age AS INT) DESC" | jq
[
{
"age": "32",
"name": "Nina"
},
{
"age": "30",
"name": "Jorge"
},
{
"age": "28",
"name": "Cal"
},
{
"age": "27",
"name": "Mona"
}
]
dsq uses DataStation libraries under-the-hood and DataStation in turn uses SQLite for these kinds of queries.
The dsq source code is available on Github.
Install
dsq originated as a subproject in the DataStation repo but now exists on its own. You can install it with Go 1.17+:
$ go install github.com/multiprocessio/dsq@latest
Multiple files and joins
If you pass multiple files to dsq, they become available under the
alias {N}
where N
is the 0-based index of
the list of files passed on the commandline. {}
is an
alias for {0}
. For example:
$ cat ids.json
[
{"id": 1, "name": "Jorge"},
{"id": 2, "name": "Mona"},
{"id": 3, "name": "Cal"},
{"id": 4, "name": "Nina"}
]
$ dsq users.csv ids.json "SELECT id, {0}.name FROM {0} JOIN {1} ON {0}.name = {1}.name" | jq
[
{
"id": 1,
"name": "Jorge"
},
{
"id": 4,
"name": "Nina"
},
{
"id": 3,
"name": "Cal"
},
{
"id": 2,
"name": "Mona"
}
]
File type support
As of today dsq supports: JSON, CSV, TSV, Excel, Parquet, OpenOffice Sheets, and Apache2 and nginx logs. This list will continue to grow over time.
The complete list of file extensions and mimetypes is in the README here.
Comparison to other tools
There are a number of existing tools like q, textql and octosql. These are mature tools and there wouldn't be a need for dsq. But DataStation already supports the core logic so exposing this CLI is very simple. Moreover DataStation supports more file types than these existing tools.
A detailed comparison table is available in the repo README here.
Try it out!
Give it a go, share any bugs or feedback! The code is available on Github.
Share
dsq now supports joining against multiple files (of any supported type!). It also now supports .tsv and .ods (OpenOffice Sheets) files.
— Multiprocess Labs (@multiprocessio) January 11, 2022
And it has a brand new repo so that it's easier to find. Read more in the blog post here:https://t.co/MVjuWlAMwy pic.twitter.com/FOXTx5BfrL
With questions, criticism or ideas, email or Tweet me.
Also, check out DataStation and dsq.