DataStation is an open-source data IDE for developers. It allows you to easily build graphs and tables with data pulled from SQL databases, logging databases, metrics databases, HTTP servers, and all kinds of text and binary files. Need to join or munge data? Write embedded scripts as needed in Python, JavaScript, Ruby, R, or Julia. All in one application.

DataStation 0.5.0 Release Notes

Published on

This release brings massive performance improvements and much better handling of large amounts of data, primarily through a port of key code from Node.js to Go. If you tried DataStation before and were frustrated by slow run times, consider trying it again with this release.

DataStation

Fast panel evaluation via Go port

The primary change in this release is that the panel evaluation code has been mostly ported to Go. This eliminates the 8-12 second overhead (especially on Windows) when running even simple program and literal panels. The panel evaluation overhead (initialization of libraries, etc.) is now less than 1 second.

Since the existing integration tests, now applied to the Go code, cover many cases, there is reason to believe that even a rewrite this drastic and this fast is generally OK. However, it is a rewrite, so there is a greater than usual risk of bugs. If you prefer stability, stick with 0.4.0 for now. If you want performance, try this release out.

Up to 15x faster remote file reads

Remote files are now compressed with gzip before copying/reading (if gzip is available on the remote machine). For a 100 MB file this brought total ingest time down from 60 seconds to 4 seconds.
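To illustrate why compressing before transfer helps, here is a minimal Python sketch of the compress-then-decompress round trip. It is not DataStation's actual code (which is written in Go and runs gzip on the remote machine); the function names and the sample log line are made up for illustration.

```python
import gzip


def compress_for_transfer(data: bytes) -> bytes:
    """Compress the payload before it goes over the wire (hypothetical helper)."""
    return gzip.compress(data)


def read_transferred(payload: bytes) -> bytes:
    """Decompress the payload on the receiving side (hypothetical helper)."""
    return gzip.decompress(payload)


# Log-like data is highly repetitive, so it tends to compress very well.
sample = b'127.0.0.1 - - "GET /index.html HTTP/1.1" 200 512\n' * 10_000
compressed = compress_for_transfer(sample)

print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

The actual speedup depends on how compressible the file is and on the network link, so the 60 second to 4 second figure above should not be expected for every file.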

SQLite for SQL program panel engine

Before this release, DataStation used AlaSQL as the SQL engine for the SQL program panel. This implementation (and/or the DataStation wrapper code around it) could not handle ingesting more than 4MB of data. This case arises when you have a large file and you want to run SQL over it: SELECT COUNT(1) FROM DM_getPanel('some large panel').

But in this release the engine has been switched to SQLite, which has been tested to handle ingesting at least 500MB of data. This is a breaking change since AlaSQL supports some PostgreSQL features, like the col::TEXT cast syntax, that are not ANSI SQL. Queries that relied on them may need to be rewritten.
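As a rough illustration of the breaking change, the sketch below uses Python's built-in sqlite3 module (not DataStation itself) to show the standard CAST(... AS TEXT) form, and to probe whether the PostgreSQL-style ::TEXT cast is accepted. The helper names are hypothetical.

```python
import sqlite3


def cast_to_text(conn, value):
    """Cast a value to TEXT using the standard CAST(... AS ...) form."""
    (result,) = conn.execute("SELECT CAST(? AS TEXT)", (value,)).fetchone()
    return result


def accepts_double_colon_cast(conn):
    """Return True if the engine accepts the PostgreSQL-style ::TEXT cast."""
    try:
        conn.execute("SELECT 1::TEXT").fetchone()
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
print(cast_to_text(conn, 42))
print(accepts_double_colon_cast(conn))
```

If an existing SQL program panel uses col::TEXT, rewriting it as CAST(col AS TEXT) is the portable equivalent.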

Large file shape analysis

Many panels would crash after ingesting more than 4MB of data just while trying to get shape and preview information. This release introduces a partial JSON parser (dedicated blog post on this to come) that reads only a small amount of initial data (for example at most 100KB of data) and does shape and preview analysis based on that partial JSON data.
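The idea of analyzing only a prefix of the data can be sketched as follows. This is not DataStation's partial JSON parser (that is the subject of the upcoming blog post); it is a simplified Python illustration that assumes the data is a top-level JSON array of objects and that the prefix is already decoded text. The function names are invented for this example.

```python
import json


def preview_json_array(text_prefix, max_items=10):
    """Decode as many complete array elements as the (possibly truncated) prefix holds.

    Assumes a top-level array of objects; an object cut off by the prefix
    boundary fails to decode and is dropped.
    """
    decoder = json.JSONDecoder()
    items = []
    pos = text_prefix.find("[")
    if pos == -1:
        return items
    pos += 1
    while len(items) < max_items:
        # Skip whitespace and the commas that separate elements.
        while pos < len(text_prefix) and text_prefix[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text_prefix) or text_prefix[pos] == "]":
            break
        try:
            value, pos = decoder.raw_decode(text_prefix, pos)
        except json.JSONDecodeError:
            break  # the element was cut off by the prefix boundary
        items.append(value)
    return items


def infer_shape(items):
    """Map each field name to the sorted list of Python type names seen for it."""
    shape = {}
    for item in items:
        for key, value in item.items():
            shape.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in shape.items()}


# The prefix ends in the middle of the third object.
prefix = '[{"name": "a", "age": 1}, {"name": "b", "age": 2}, {"name": "c", "a'
rows = preview_json_array(prefix)
print(rows)
print(infer_shape(rows))
```

The trade-off is that the inferred shape only reflects the rows in the prefix, so fields that first appear later in a large file would not show up in the preview.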

Standalone CLI for SQL queries on JSON, CSV, Parquet, etc.

The port to Go also made it easier to build a small CLI on top of existing DataStation behavior. You can install dsq on your laptop or server and use it to run SQL queries on every kind of file that DataStation supports. It does not require the full DataStation app to run.

For example, if you have a CSV file with user information, you can run dsq users.csv "SELECT COUNT(DISTINCT name) FROM {}" to count all the distinct names in your users.csv file. Under the hood, dsq uses SQLite for queries just like the rest of DataStation.
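To give a feel for what running SQL over a CSV file through SQLite involves, here is a small Python sketch that loads CSV rows into an in-memory SQLite table and runs the same COUNT(DISTINCT name) query. It is only an analogy, not how dsq is implemented; the users table, the function name, and the sample data are assumptions made for the example.

```python
import csv
import io
import sqlite3


def count_distinct_names(csv_text):
    """Load the CSV's name column into SQLite and count the distinct values."""
    rows = csv.DictReader(io.StringIO(csv_text))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.executemany(
            "INSERT INTO users (name) VALUES (?)",
            [(row["name"],) for row in rows],
        )
        (count,) = conn.execute(
            "SELECT COUNT(DISTINCT name) FROM users"
        ).fetchone()
        return count
    finally:
        conn.close()


sample_csv = "name,age\nann,31\nbob,45\nann,27\n"
print(count_distinct_names(sample_csv))
```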

You can read more about dsq in its README.

Improvements to automated testing

This release adds automated integration tests for SQLite, SQL Server, and Oracle database panels; file panels over SSH; HTTP panels over SSH; and file panels reading log data (like Apache access logs). The database integration tests work by running entire databases (including Oracle and SQL Server) in containers in GitHub Actions so that real queries can be run against the real databases.

Additionally, this release brings JavaScript (frontend and backend) line coverage to around 68% and Go line coverage to around 72%.

Install or upgrade

Get the 0.5.0 release now!

How you can help

If you are a developer or engineering manager, install DataStation and start using it at work! Report bugs and usability issues (there are surely many). Join the Discord and subscribe to updates.

If you are an investor, get in touch and subscribe to updates.
