Querying AWS Athena with DataStation
Requires DataStation 0.8.0+.
Database initialization [Optional]
If you want to follow along with this tutorial verbatim, create two S3 buckets. One is for storing data we will query. The other is for storing the query result. Athena requires you to give it a bucket for storing results.
My two buckets are called s3://datastation-tests
and
s3://datastation-test-results
respectively. You'll have to pick your
own names.
Data
Create a csv and upload it to the bucket for storing data we will query:
$ cat users.csv
name,age
Garry,43
Mina,39
Karl,50
Nile,41
Emma,52
$ aws s3 cp users.csv s3://datastation-tests
Swap out datastation-tests
with the name of your bucket.
Athena definition
Now go to Athena and create a new table and database.
The create table statement for this file is:
CREATE EXTERNAL TABLE `basic_users`(
`name` string,
`age` int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://datastation-tests'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1646067804')
API access
Finally, make sure you have an AWS_ACCESS_KEY_ID
and
AWS_SECRET_ACCESS_KEY
with all relevant permissions.
If you are creating an account solely for read-only access to Athena, you can copy these permissions and modify them for own buckets:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"s3:GetObject",
"athena:GetWorkGroup",
"athena:StartQueryExecution",
"glue:GetTables",
"athena:StopQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults",
"s3:ListBucket",
"glue:GetTable"
],
"Resource": [
"arn:aws:glue:us-east-1:693079658475:table/testdata/*",
"arn:aws:glue:us-east-1:693079658475:database/testdata",
"arn:aws:glue:us-east-1:693079658475:catalog",
"arn:aws:s3:::datastation-tests",
"arn:aws:s3:::datastation-test-results",
"arn:aws:s3:::datastation-test-results/*",
"arn:aws:s3:::datastation-tests/*",
"arn:aws:athena:*:693079658475:workgroup/primary"
]
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucketMultipartUploads",
"s3:AbortMultipartUpload",
"s3:ListBucketVersions",
"s3:CreateBucket",
"s3:ListBucket",
"s3:DeleteObject",
"s3:GetBucketLocation",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::datastation-test-results",
"arn:aws:s3:::datastation-test-results/*"
]
}
]
}
Data source setup
Now in DataStation, create a new data source in the left sidebar.
Give it a nice name so you easily can find it later. And select Athena in the Vendor dropdown.
Fill in database as testdata
, output bucket as the bucket you
created for query results, and region, access key id and secret access
key according to your region and user.
Panel setup
Now create a new panel and select the Database type.
Run a query
Enter the following query and hit play!
SELECT * FROM basic_users;
Graph the results
There is a graph button below the query panel, beside the New Panel button. Click it to generate a graph panel pointed at the query panel.
About this page
See an error or want to add a clarification? This page is
generated from this file on Github.
Last edited Mar 21, 2022.