0.0.4-alpha Release Notes

Features and improvements

There is so much in this release it is difficult to summarize! Here are a few cool parts: you can now use DataStation like a rudimentary ELK or Splunk instance with its new log parsers. You can load larger data into DataStation due to a number of efficiency improvements. You can run SQL over much larger datasets due to a more efficient SQL implementation. And much more!

Read on for details and screenshots.

File, HTTP and Literal panels

New log parsers

You can now override the content type of content panels and select from a number of useful built-in parsers. In addition to Excel, CSV, and JSON, these are: newline-delimited JSON, Apache2 error and access logs, Nginx access logs, and Syslog logs. You can also specify a custom regex to apply to each line.

Upshot? In addition to just being able to import logs for joining with other datasets, you can now use DataStation as a primitive alternative to a Splunk or ELK setup for working with your logs.

[Screenshot: New log parsers]
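To give a feel for what "a custom regex applied to each line" means, here is a minimal Python sketch. It is not DataStation's implementation: the pattern, the `parse_lines` helper, and the sample lines are all made up for illustration, and it assumes that named groups become fields and that non-matching lines are skipped.

```python
import re

# Hypothetical pattern for a common-log-style access line.
# Each named group becomes a field in the resulting row.
LINE_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_lines(text, pattern=LINE_PATTERN):
    """Apply the pattern to each line and keep one dict per matching line."""
    rows = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:  # assumption: lines that do not match are skipped
            rows.append(match.groupdict())
    return rows


sample = (
    '127.0.0.1 - - [29/Jul/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512\n'
    'not a log line\n'
    '10.0.0.5 - - [29/Jul/2021:10:00:01 +0000] "POST /api HTTP/1.1" 500 -\n'
)
rows = parse_lines(sample)
```

Once lines are turned into rows like this, they can be filtered, grouped, and joined like any other tabular result.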

Importing Parquet content

DataStation can now read Parquet files from File and HTTP panels. (Since Parquet is a binary format, it won't work in the Literal panel, which is plain text.)

Upshot? By importing a Parquet file or hitting an HTTP endpoint returning Parquet, you can use DataStation to run SQL queries against your Parquet data by adding a SQL Code panel reading from the File or HTTP panel.

Improvements to Excel importing

Excel files now only import the actual sheets rather than all metadata about the Excel file. Additionally, when there is only one sheet in the Excel file, the result of the panel is flattened down as if you imported a CSV of that single sheet (rather than the result being a dictionary mapping sheet name to CSV-like result). Finally, DataStation will automatically trim whitespace in Excel column headers.
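The flattening and header-trimming behavior can be sketched in a few lines of Python. This is only an illustration of the rules described above, not DataStation's code: the `normalize_workbook` name and the "dictionary of sheet name to list of row dictionaries" input shape are assumptions.

```python
def normalize_workbook(sheets):
    """Trim header whitespace and flatten single-sheet workbooks.

    `sheets` is assumed to map sheet name -> list of row dicts.
    """
    cleaned = {
        name: [
            {str(header).strip(): value for header, value in row.items()}
            for row in rows
        ]
        for name, rows in sheets.items()
    }
    # With exactly one sheet, return its rows directly, as if it were a CSV.
    if len(cleaned) == 1:
        return next(iter(cleaned.values()))
    return cleaned


single = normalize_workbook({"Sheet1": [{" name ": "a", "count ": 1}]})
multi = normalize_workbook({"A": [{"x": 1}], "B": [{" y": 2}]})
```

Here `single` is a plain list of rows, while `multi` keeps the mapping from sheet name to rows.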

Program and SQL panels

Cancel running processes

Before this release you could not cancel a running process. This meant it was impossible to debug an infinite loop in a Program panel. In this release you can now hit Pause after running a program to kill the running process.

[Screenshot: Cancel a running Python process]
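The general idea of cancelling a runaway program can be sketched with Python's standard `subprocess` module. This is a stand-in for the concept, not how DataStation does it internally: the child process here is just a Python one-liner that loops forever.

```python
import subprocess
import sys

# Start a child process that loops forever, standing in for a runaway program.
proc = subprocess.Popen([sys.executable, "-c", "while True: pass"])

# While the child is alive, poll() returns None (no exit code yet).
still_running = proc.poll() is None

# "Cancel": kill the child, then wait so the process is fully cleaned up.
proc.kill()
proc.wait(timeout=10)
exit_code = proc.returncode
```

After the kill, `exit_code` is non-zero, which a caller can use to tell a cancelled run apart from a successful one.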

SQLite

You can now query SQLite databases, local or remote.
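For readers who have not worked with SQLite before, here is a small Python sketch of the kind of query you might point at a local SQLite file. The database path, the `requests` table, and its contents are all invented for the example; it uses Python's built-in `sqlite3` module rather than anything DataStation-specific.

```python
import os
import sqlite3
import tempfile

# Create a throwaway SQLite file with a hypothetical "requests" table.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE requests (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("/", 200), ("/api", 500), ("/", 200)],
)
conn.commit()
conn.close()

# Reopen the file and run an aggregate query, returning rows as dicts.
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
query = "SELECT path, COUNT(*) AS hits FROM requests GROUP BY path ORDER BY hits DESC"
rows = [dict(row) for row in conn.execute(query)]
conn.close()
```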

In-memory SQL

By switching from sql.js (SQLite compiled to WebAssembly) to alasql, DataStation can now run queries over much larger results. WebAssembly, in contrast, seems to have very tight restrictions on how much memory it is able to use.

Additionally, the in-memory SQL panel has moved from the SQL panel into the Program panel since it's more like a script than a connection to a real database.

Upshot? You can now run SQL queries against at least tens of megabytes of data.

CockroachDB

This release confirms support for querying CockroachDB databases by setting up a PostgreSQL connection pointed at your CockroachDB database.

More useful in-memory Python for online environment

Brython was the original in-memory Python implementation used in the online environment. It is small and efficient but wraps all native Python objects in a way that makes it very inconvenient to use. This release swaps out Brython for Pyodide to make the in-memory Python experience as easy as you'd expect.

General panel improvements

Panel success/failure

When a panel is running, its background will oscillate between a purple tinge and white. Additionally, "Success" or "Failure" is shown at the top of the panel near the play button in green or red.

Inferred shape of results

DataStation will now show you an inferred schema for the results after you run a panel. As it happens, the library powering this is also available on the Multiprocess GitHub for standalone use in JavaScript/Node.js programs.
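To illustrate what "inferring a schema" from results can look like, here is a deliberately simple Python sketch. It is not the algorithm used by the library mentioned above: `type_name` and `infer_shape` are hypothetical helpers, and the output format (field name mapped to a sorted list of observed type names) is an assumption chosen for readability.

```python
def type_name(value):
    """Map a Python value to a coarse JSON-style type name."""
    if value is None:
        return "null"
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_shape(rows):
    """Collect, per field, every type name seen across a list of row dicts."""
    seen = {}
    for row in rows:
        for field, value in row.items():
            seen.setdefault(field, set()).add(type_name(value))
    return {field: sorted(names) for field, names in seen.items()}


rows = [
    {"path": "/", "status": 200, "bytes": 512},
    {"path": "/api", "status": 500, "bytes": None},
]
shape = infer_shape(rows)
```

A field that shows up with more than one type, like `bytes` here, is reported with all of them.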

Upshot? In addition to making it easier to explore your own data, DataStation uses the shape of panels to make it easier to fill out other panels. For example, the Graph panel can now default to loading a Y-axis field that is a number and an X-axis field that is a string.

[Screenshot: Inferred shape of Nginx logs]
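The Graph panel default described above amounts to a small selection rule. A rough Python sketch, assuming a hypothetical shape given as a mapping from field name to a single type name, and assuming the first matching field wins:

```python
def default_axes(shape):
    """Pick the first string field for X and the first number field for Y.

    `shape` is assumed to map field name -> type name; None means no match.
    """
    x_field = next((f for f, t in shape.items() if t == "string"), None)
    y_field = next((f for f, t in shape.items() if t == "number"), None)
    return x_field, y_field


axes = default_axes({"time": "string", "status": "number", "bytes": "number"})
no_x = default_axes({"count": "number"})
```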

Additional results metadata

After a panel is run, DataStation will show you the assumed content-type used while parsing the results and will also show you the size in bytes, kilobytes, or megabytes of the result stored on disk.

[Screenshot: Size and assumed content-type of results]
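Choosing between bytes, kilobytes, and megabytes is a small formatting rule. Here is one way it could be written in Python; the 1024-based thresholds, the one-decimal rounding, and the `format_size` name are assumptions, not necessarily what DataStation displays.

```python
def format_size(num_bytes):
    """Render a byte count as B, KB, or MB (assuming 1 KB = 1024 bytes)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    if num_bytes < 1024 * 1024:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"


small = format_size(512)
medium = format_size(2048)
large = format_size(5 * 1024 * 1024)
```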

Misc improvements

Efficient results storage

Prior to this release, panel results were stored both on disk and in browser memory. In this release, results are no longer stored in browser memory. This keeps the UI fast as you load larger data into DataStation.

Run all panels on page

Each page now has a global run button that will trigger all panels to run in sequence, top to bottom. This only works if each panel depends only on panels above it.
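The top-to-bottom constraint is easier to see in code. Below is a minimal Python sketch of running panels in order, where a panel may only read results from panels above it. The panel structure (`name`, `depends_on`, `run`) and the `run_page` function are hypothetical and only illustrate the ordering rule.

```python
def run_page(panels):
    """Run panels in order; each may only depend on panels above it."""
    results = []
    for index, panel in enumerate(panels):
        for dep in panel["depends_on"]:
            if dep >= index:
                # A panel below (or the panel itself) has not run yet.
                raise ValueError(
                    f"panel {panel['name']!r} depends on panel {dep}, "
                    "which is not above it"
                )
        inputs = [results[dep] for dep in panel["depends_on"]]
        results.append(panel["run"](inputs))
    return results


panels = [
    {"name": "load", "depends_on": [], "run": lambda inputs: [1, 2, 3]},
    {"name": "double", "depends_on": [0], "run": lambda inputs: [n * 2 for n in inputs[0]]},
    {"name": "total", "depends_on": [1], "run": lambda inputs: sum(inputs[0])},
]
results = run_page(panels)
```

If the first panel depended on the second, `run_page` would raise instead of running, which mirrors why a page with upward-pointing dependencies cannot be run in a single pass.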

Global undo

You can now use Ctrl-Z or Cmd-Z to undo changes like deleting a page or panel.

Project improvements

You can now open multiple projects at the same time (in different windows). You can also create a new project when one already exists (a silly limitation before).

Test and release improvements

End-to-end tests

End-to-end tests are now run automatically in GitHub Actions for Windows, Linux and macOS on amd64/x86_64.

Linux binaries

Pre-built Linux binaries are now provided in the release artifacts page on GitHub!

Automated release builds

Before this release, release artifacts were hand-built on different laptops. Now they are built and uploaded by GitHub Actions for Linux, Windows, and macOS.

Get the 0.0.4-alpha now!

This new release of DataStation is just jam-picked with great stuff: inferred shape of results, massive performance improvements for large datasets, support for @CockroachDB and #SQLite, support for parsing many different log formats, and more! https://t.co/1cTjmbkwIT pic.twitter.com/JLJky39JWy
— DataStation (@multiprocessio) July 29, 2021