Blog

dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Published on by

dsq is a commandline tool for running SQL queries against all the data file types that DataStation supports (JSON, CSV, Parquet, Excel, etc.). Here is a simple example:

$ cat users.csv
name,age
Jorge,30
Nina,32
Cal,28
Mona,27
$ dsq users.csv "SELECT * FROM {} ORDER BY CAST(age AS INT) DESC" | jq
[
  {
    "age": "32",
    "name": "Nina"
  },
  {
    "age": "30",
    "name": "Jorge"
  },
  {
    "age": "28",
    "name": "Cal"
  },
  {
    "age": "27",
    "name": "Mona"
  }
]

dsq uses DataStation libraries under-the-hood and DataStation in turn uses SQLite for these kinds of queries.

The dsq source code is available on Github.

Install

dsq originated as a subproject in the DataStation repo but now exists on its own. You can install it with Go 1.17+:

$ go install github.com/multiprocessio/dsq@latest

Multiple files and joins

If you pass multiple files to dsq, they become available under the alias {N} where N is the 0-based index of the list of files passed on the commandline. {} is an alias for {0}. For example:

$ cat ids.json
[
  {"id": 1, "name": "Jorge"},
  {"id": 2, "name": "Mona"},
  {"id": 3, "name": "Cal"},
  {"id": 4, "name": "Nina"}
]
$ dsq users.csv ids.json "SELECT id, {0}.name FROM {0} JOIN {1} ON {0}.name = {1}.name" | jq
[
  {
    "id": 1,
    "name": "Jorge"
  },
  {
    "id": 4,
    "name": "Nina"
  },
  {
    "id": 3,
    "name": "Cal"
  },
  {
    "id": 2,
    "name": "Mona"
  }
]

File type support

As of today dsq supports: JSON, CSV, TSV, Excel, Parquet, OpenOffice Sheets, and Apache2 and nginx logs. This list will continue to grow over time.

The complete list of file extensions and mimetypes is in the README here.

Comparison to other tools

There are a number of existing tools like q, textql and octosql. These are mature tools and there wouldn't be a need for dsq. But DataStation already supports the core logic so exposing this CLI is very simple. Moreover DataStation supports more file types than these existing tools.

A detailed comparison table is available in the repo README here.

Try it out!

Give it a go, share any bugs or feedback! The code is available on Github.

Share

With questions, criticism or ideas, email or Tweet me.

Also, check out DataStation and dsq.