Features and improvements
There is so much in this release that it is difficult to summarize! Here are a few highlights: you can now use DataStation like a rudimentary ELK or Splunk instance with its new log parsers. You can load larger datasets into DataStation thanks to a number of efficiency improvements. You can run SQL over much larger datasets thanks to a more efficient in-memory SQL implementation. And much more!
Read on for details and screenshots.
File, HTTP and Literal panels
New log parsers
You can now override the content type of File, HTTP and Literal panels and select from a number of useful built-in parsers (in addition to Excel, CSV, and JSON): JSON newline, Apache2 error and access logs, Nginx access logs, and Syslog logs. You can also specify a custom regex to apply to each line.
Upshot? In addition to being able to import logs for joining with other datasets, you can now use DataStation as a primitive alternative to a Splunk or ELK setup for working with your logs.
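As a rough illustration of what a custom per-line regex parser does, here is a minimal Python sketch. It assumes named capture groups become field names and that non-matching lines are skipped; the pattern and field names below are hypothetical and are not the ones DataStation uses for its built-in Nginx or Apache2 parsers.

```python
import re
from typing import Iterable

# Hypothetical pattern for a common-log-style access line. This is an
# illustration only, not the pattern DataStation ships with.
ACCESS_LOG = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes_sent>\d+|-)'
)


def parse_lines(lines: Iterable[str], pattern: re.Pattern) -> list[dict]:
    """Apply `pattern` to each line; named groups become the row's fields.

    Lines that don't match are skipped rather than raising an error.
    """
    rows = []
    for line in lines:
        match = pattern.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


logs = [
    '127.0.0.1 - - [29/Jul/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 612',
    'not a log line',
]
rows = parse_lines(logs, ACCESS_LOG)  # one row; the second line is skipped
```

Each parsed row is a flat dictionary, which is what makes log lines easy to join with other tabular data afterwards.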
Importing Parquet content
DataStation can now read Parquet files from File and HTTP panels. (Since Parquet is a binary format, it won't work in the Literal panel, which is plain text.)
Upshot? By importing a Parquet file or hitting an HTTP endpoint returning Parquet, you can use DataStation to run SQL queries against your Parquet data by adding a SQL Code panel reading from the File or HTTP panel.
Improvements to Excel importing
Excel files now only import the actual sheets rather than all metadata about the Excel file. Additionally, when there is only one sheet in the Excel file, the result of the panel is flattened down as if you imported a CSV of that single sheet (rather than the result being a dictionary mapping sheet name to CSV-like result). Finally, DataStation will automatically trim whitespace in Excel column headers.
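The single-sheet flattening and header trimming described above can be sketched in Python like this. It assumes the workbook has already been read into a hypothetical `{sheet name: rows of cells}` mapping with the first row as the header; actually reading `.xlsx` files is out of scope here, and this is not DataStation's own code.

```python
from typing import Any, Union

Row = dict[str, Any]


def sheet_to_rows(grid: list[list[Any]]) -> list[Row]:
    """Treat the first row as the header and trim whitespace around header names."""
    if not grid:
        return []
    header = [str(cell).strip() for cell in grid[0]]
    return [dict(zip(header, values)) for values in grid[1:]]


def import_workbook(sheets: dict[str, list[list[Any]]]) -> Union[list[Row], dict[str, list[Row]]]:
    """Sketch of the import behavior described above (hypothetical helper).

    One sheet: a flat list of rows, as if it were a single CSV.
    Several sheets: a dict mapping each sheet name to its rows.
    """
    converted = {name: sheet_to_rows(grid) for name, grid in sheets.items()}
    if len(converted) == 1:
        return next(iter(converted.values()))
    return converted


single = import_workbook({"Sheet1": [[" name ", "age "], ["Ada", 36]]})
multi = import_workbook({"A": [["x"], [1]], "B": [["y"], [2]]})
```

Here `single` comes back as a plain list of rows with the header names trimmed, while `multi` keeps the sheet-name mapping.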
Program and SQL panels
Cancel running processes
Before this release you could not cancel a running process. This meant it was impossible to debug an infinite loop in a Program panel. In this release you can hit Pause after running a program to kill the running process.
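DataStation's cancellation code isn't shown here, but the general technique for stopping a runaway child process looks roughly like this Python sketch: ask it to terminate, wait briefly, then force-kill it if it's still alive. The `cancel` helper and its `grace_seconds` parameter are hypothetical.

```python
import subprocess
import sys
import time


def cancel(proc: subprocess.Popen, grace_seconds: float = 2.0) -> int:
    """Stop a running child process: ask politely first, then force-kill."""
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()  # reap the child so it doesn't linger


# A child that never exits, like a Program panel stuck in an infinite loop.
proc = subprocess.Popen([sys.executable, "-c", "while True: pass"])
time.sleep(0.2)  # let it run briefly before "hitting Pause"
exit_code = cancel(proc)
```

Waiting on the process after signalling it matters: it confirms the process is actually gone before the UI reports it as stopped.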
SQLite
You can now query SQLite databases, local or remote.
In-memory SQL
By switching from sql.js (SQLite compiled to WebAssembly) to alasql, DataStation can now run queries over much larger results. WebAssembly, in contrast, seems to have very tight restrictions on how much memory it can use.
Additionally, in-memory SQL has moved from the SQL panel into the Program panel, since it behaves more like a script than a connection to a real database.
Upshot? You can now run SQL queries against at least tens of megabytes of data.
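To give a feel for what running SQL over a panel's results involves, here is a Python sketch that loads a list of row dictionaries into a table and queries it. Note that it swaps in Python's built-in `sqlite3` with an in-memory database rather than alasql, purely to keep the example self-contained; the `query_rows` helper and the default table name `t` are hypothetical, and column names are assumed not to contain double quotes.

```python
import sqlite3
from typing import Any


def query_rows(rows: list[dict[str, Any]], sql: str, table: str = "t") -> list[dict[str, Any]]:
    """Load panel-like results (a list of dicts) into a table and run `sql` over it."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets us turn result rows back into dicts
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_list})')  # untyped columns
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [tuple(row.get(c) for c in columns) for row in rows],
        )
        return [dict(r) for r in conn.execute(sql)]
    finally:
        conn.close()


rows = [
    {"status": "200", "bytes": 612},
    {"status": "404", "bytes": 0},
    {"status": "200", "bytes": 100},
]
counts = query_rows(rows, "SELECT status, COUNT(*) AS n FROM t GROUP BY status ORDER BY status")
```

With the sample rows, `counts` holds one row per status code with its count.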
CockroachDB
This release confirms support for querying CockroachDB databases by setting up a PostgreSQL connection pointed at your CockroachDB database.
More useful in-memory Python for online environment
Brython was the original in-memory Python implementation used in the online environment. It is small and efficient, but it wraps all native Python objects in a way that makes them very inconvenient to use. This release swaps out Brython for Pyodide to make the in-memory Python experience as easy as you'd expect.
General panel improvements
Panel success/failure
When a panel is running, its background will oscillate between a purple tinge and white. Additionally, "Success" or "Failure" is shown at the top of the panel near the play button in green or red.
Inferred shape of results
DataStation will now show you an inferred schema for the results after you run a panel. As it happens, the library powering this is also available on the MultiProcess GitHub for standalone use in JavaScript/Node.js programs.
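The shape library mentioned above is JavaScript, and its API and output format are not reproduced here. As an independent Python sketch of the idea, shape inference walks a value recursively and merges the shapes of array elements; the `{"array": ...}`, `{"object": ...}` and `{"varied": [...]}` notation is made up for this example.

```python
from typing import Any


def merge(a: Any, b: Any) -> Any:
    """Combine two shapes into one that describes both (None means 'no shape yet')."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        if "object" in a and "object" in b:
            keys = a["object"].keys() | b["object"].keys()
            # A field missing from one side is treated as "null" on that side.
            return {"object": {k: merge(a["object"].get(k, "null"), b["object"].get(k, "null"))
                               for k in sorted(keys)}}
        if "array" in a and "array" in b:
            return {"array": merge(a["array"], b["array"])}
    # Otherwise keep every distinct alternative seen so far.
    options: list[Any] = []
    for shape in (a, b):
        alternatives = shape["varied"] if isinstance(shape, dict) and "varied" in shape else [shape]
        for alt in alternatives:
            if alt not in options:
                options.append(alt)
    return {"varied": options}


def infer_shape(value: Any) -> Any:
    """Return a small, JSON-like description of a value's shape."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        element = None
        for item in value:
            element = merge(element, infer_shape(item))
        return {"array": element}
    if isinstance(value, dict):
        return {"object": {k: infer_shape(v) for k, v in value.items()}}
    return "unknown"


shape = infer_shape([{"name": "Ada", "age": 36}, {"name": "Alan", "age": None}])
```

For the sample rows, `name` is always a string while `age` is reported as varying between a number and null.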
Upshot? In addition to making it easier to explore your own data, DataStation uses the shape of panel results to make it easier to fill out other panels. For example, the Graph panel can now default to a Y-axis field that is a number and an X-axis field that is a string.
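A default like the Graph panel's could be picked roughly as follows: classify each field across all rows, then take the first string field for the X-axis and the first number field for the Y-axis. This is a hypothetical Python sketch (`field_kinds` and `default_axes` are illustrative names), not DataStation's actual selection logic.

```python
from typing import Any, Optional


def field_kinds(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Label each field 'number', 'string' or 'other' if all non-null values agree, else 'mixed'."""
    kinds: dict[str, str] = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                continue  # nulls don't change a field's kind
            if isinstance(value, bool):
                kind = "other"
            elif isinstance(value, (int, float)):
                kind = "number"
            elif isinstance(value, str):
                kind = "string"
            else:
                kind = "other"
            previous = kinds.get(key)
            kinds[key] = kind if previous in (None, kind) else "mixed"
    return kinds


def default_axes(rows: list[dict[str, Any]]) -> tuple[Optional[str], Optional[str]]:
    """Return (x_field, y_field): the first string field and the first number field."""
    kinds = field_kinds(rows)
    x = next((k for k, kind in kinds.items() if kind == "string"), None)
    y = next((k for k, kind in kinds.items() if kind == "number"), None)
    return x, y


axes = default_axes([{"region": "EU", "sales": 10}, {"region": "US", "sales": 12.5}])
```

If no field fits, the corresponding axis is left as `None` so the user can choose it by hand.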
Additional results metadata
After a panel is run, DataStation will show you the assumed content type used while parsing the results, as well as the size of the result stored on disk in bytes, kilobytes, or megabytes.
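Choosing between bytes, kilobytes, and megabytes is a small threshold calculation. This Python sketch assumes 1 KB = 1024 bytes and one decimal place; DataStation's own units and rounding may differ, and `format_size` is a hypothetical name.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count as bytes, kilobytes or megabytes (assuming 1 KB = 1024 bytes)."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    if num_bytes < 1024 ** 2:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes / 1024 ** 2:.1f} MB"


labels = [format_size(n) for n in (512, 1536, 5 * 1024 ** 2)]
```

For a result file on disk, something like `format_size(os.path.getsize(path))` would produce the label.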
Misc improvements
Efficient results storage
Prior to this release, panel results were stored both on disk and in browser memory. In this release, results are no longer stored in browser memory. This keeps the UI fast as you load larger data into DataStation.
Run all panels on page
Each page now has a global run button that runs all panels in sequence from top to bottom. This only works if every panel depends only on panels above it.
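The top-to-bottom rule means a page can be run in a single pass as long as each panel only reads results from panels above it. A minimal Python sketch of that check, with a hypothetical `Panel` type and `run_all` function:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Panel:
    name: str
    depends_on: list[str] = field(default_factory=list)
    # Receives the results of panels that have already run, keyed by name.
    run: Callable[[dict], object] = lambda results: None


def run_all(panels: list[Panel]) -> dict[str, object]:
    """Run panels in page order, failing fast if one needs a panel below it."""
    results: dict[str, object] = {}
    for panel in panels:
        missing = [dep for dep in panel.depends_on if dep not in results]
        if missing:
            raise ValueError(f"{panel.name} depends on {missing}, which have not run yet")
        results[panel.name] = panel.run(results)
    return results


page = [
    Panel("orders", run=lambda r: [3, 4, 5]),
    Panel("total", depends_on=["orders"], run=lambda r: sum(r["orders"])),
]
results = run_all(page)
```

Running the same panels in reverse order raises an error, because `total` would run before `orders` has produced anything.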
Global undo
You can now use Ctrl-z or Cmd-z to undo changes like deleting a page or panel.
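One common way to implement undo, sketched here in Python, is to record a function that reverses each change and pop the most recent one on Ctrl-z or Cmd-z. This is an assumption about the general approach, not a description of how DataStation stores its undo history; `UndoStack`, `delete_page`, and the `pages` list are hypothetical.

```python
from typing import Callable


class UndoStack:
    """Record how to reverse each change so they can be undone newest-first."""

    def __init__(self) -> None:
        self._undos: list[Callable[[], None]] = []

    def record(self, undo: Callable[[], None]) -> None:
        self._undos.append(undo)

    def undo(self) -> bool:
        """Reverse the most recent change; return False if there is nothing to undo."""
        if not self._undos:
            return False
        self._undos.pop()()
        return True


pages = ["Overview", "Logs"]
history = UndoStack()


def delete_page(index: int) -> None:
    """Delete a page and record how to put it back in the same position."""
    removed = pages.pop(index)
    history.record(lambda: pages.insert(index, removed))


delete_page(1)   # pages is now ["Overview"]
history.undo()   # pages is back to ["Overview", "Logs"]
```

Storing a reversal per change keeps memory proportional to what changed, rather than snapshotting the whole project on every edit.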
Project improvements
You can now open multiple projects at the same time (in different windows). You can also create a new project when one already exists (previously a silly limitation).
Test and release improvements
End-to-end tests
End-to-end tests now run automatically in GitHub Actions on Windows, Linux and macOS (amd64/x86_64).
Linux binaries
Pre-built Linux binaries are now provided on the release artifacts page on GitHub!
Automated release builds
Before this release, release artifacts were built by hand on different laptops. Now they are built and uploaded by GitHub Actions for Linux, Windows, and macOS.
This new release of DataStation is just jam-picked with great stuff: inferred shape of results, massive performance improvements for large datasets, support for @CockroachDB and #SQLite, support for parsing many different log formats, and more!https://t.co/1cTjmbkwIT pic.twitter.com/JLJky39JWy
— DataStation (@multiprocessio) July 29, 2021