Accessing Ethereum token transaction data with pandas-gbq

```python
import altair as alt
import pandas_gbq
```
One way to access public blockchain data in Google BigQuery is to use the pandas-gbq library.
It makes it easy to run a BigQuery SQL query and download the results as a pandas DataFrame.
BigQuery authentication
In order to use this library, you need to authenticate with BigQuery. I found creating a 'service account' to be the most convenient option. You can follow the steps here.
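For reference, here is a minimal sketch of wiring a service-account key into pandas-gbq; the key file name and project id are placeholders for your own values:

```python
from google.oauth2 import service_account
import pandas_gbq

# Placeholder path to the JSON key downloaded from the Google Cloud console.
credentials = service_account.Credentials.from_service_account_file(
    "credentials.json"
)

# Make pandas-gbq use these credentials and this project by default.
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = "your-project-id"
```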
Installation
Installation with conda is straightforward.
```bash
$ conda install pandas-gbq --channel conda-forge
```
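The library is also available from PyPI if you prefer pip:

```bash
$ pip install pandas-gbq
```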
Basic usage
We can download a table by defining an SQL query and passing it to the read_gbq function.
As the Ethereum dataset is large, we constrain our query to a single hour.
sql = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE
EXTRACT(YEAR FROM block_timestamp) = 2019 AND
EXTRACT(MONTH FROM block_timestamp) = 09 AND
EXTRACT(DAY FROM block_timestamp) = 24 AND
EXTRACT(HOUR FROM block_timestamp) = 12
"""token = pandas_gbq.read_gbq(sql, project_id="pandas-gbq-test-290508")Downloading: 100%|██████████| 15392/15392 [00:03<00:00, 4846.48rows/s]
```python
token
```

|   | block_timestamp | value |
|---|---|---|
| 0 | 2019-09-24 12:47:43+00:00 | 50058584428 |
| 1 | 2019-09-24 12:47:43+00:00 | 30000000 |
| 2 | 2019-09-24 12:47:43+00:00 | 14069000000 |
| 3 | 2019-09-24 12:47:43+00:00 | 170000000 |
| 4 | 2019-09-24 12:47:43+00:00 | 59478199 |
| ... | ... | ... |
| 15387 | 2019-09-24 12:40:54+00:00 | 3577720000000000000000 |
| 15388 | 2019-09-24 12:40:54+00:00 | 2314760000000000000000 |
| 15389 | 2019-09-24 12:40:54+00:00 | 2399000000000000000000 |
| 15390 | 2019-09-24 12:40:54+00:00 | 5701000000000000000000 |
| 15391 | 2019-09-24 12:40:54+00:00 | 3608000000000000000000 |
15392 rows × 2 columns
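Since BigQuery bills by the amount of data scanned, it can be cheaper to filter on a plain timestamp range instead of per-row EXTRACT() calls: assuming the table is date-partitioned on block_timestamp (which the public Ethereum datasets generally are), BigQuery can then prune partitions. A sketch of the equivalent query:

```python
# Same hour expressed as a timestamp range; if the table is partitioned on
# block_timestamp (an assumption here), BigQuery can skip other partitions
# instead of evaluating EXTRACT() on every row.
sql_range = """
SELECT block_timestamp, value
FROM `bigquery-public-data.ethereum_blockchain.token_transfers`
WHERE block_timestamp >= TIMESTAMP '2019-09-24 12:00:00'
  AND block_timestamp < TIMESTAMP '2019-09-24 13:00:00'
"""

token = pandas_gbq.read_gbq(sql_range, project_id="pandas-gbq-test-290508")
```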
Data transformation
The dataset requires a couple of transformations after loading.
First, we want to sort it by timestamp.
```python
token.dtypes
```

```
block_timestamp    datetime64[ns, UTC]
value                           object
dtype: object
```
```python
token = token.sort_values('block_timestamp')
```

Then we want to convert the value column to float; it arrives as strings because token amounts can exceed the int64 range.
```python
token['value'] = token['value'].astype(float)
```

Visualization
By default, Altair refuses to plot a dataset with more than 5000 rows, so we disable this limit.
```python
alt.data_transformers.disable_max_rows()
```

```
DataTransformerRegistry.enable('default')
```
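Disabling the limit embeds every row in the chart specification, which can make notebooks heavy. An alternative, assuming you work in an environment that can serve local files, is Altair's built-in JSON data transformer:

```python
# Write the data to a JSON file and reference it from the chart spec,
# instead of embedding all rows inline.
alt.data_transformers.enable('json')
```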
We plot the average transaction value by minute. As there are a few very high-value transactions, we use a log scale.
```python
alt.Chart(token).mark_line().encode(
    alt.X('utchoursminutes(block_timestamp):T'),
    alt.Y('average(value):Q', scale=alt.Scale(type='log'))
).properties(width=800, title='Average token transaction values (log scale)')
```
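The same per-minute aggregation can also be done in pandas before plotting, which keeps the chart specification small; a sketch using resample (the per_minute name is illustrative):

```python
# Average the value column per minute in pandas, then plot the small result.
per_minute = (
    token.set_index('block_timestamp')['value']
    .resample('1min')
    .mean()
    .reset_index()
)

alt.Chart(per_minute).mark_line().encode(
    alt.X('block_timestamp:T'),
    alt.Y('value:Q', scale=alt.Scale(type='log'))
).properties(width=800, title='Average token transaction values (log scale)')
```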