[DRAFT] Build Enriched Traces & Transactions + Aggregation Tables #1161

Draft: wants to merge 47 commits into base: main
Changes from 29 commits
Commits
c4353f0
draft
MSilb7 Dec 12, 2024
5ff10a9
adds
MSilb7 Dec 12, 2024
a2eade6
fix func names
MSilb7 Dec 12, 2024
ee2d62d
init push
MSilb7 Dec 12, 2024
ebeb470
push example aggregation
MSilb7 Dec 12, 2024
cc12c72
mod
MSilb7 Dec 13, 2024
443e6c4
mods
MSilb7 Dec 13, 2024
b33ee03
juice it up
MSilb7 Dec 13, 2024
dd03a2a
resolve sources
MSilb7 Dec 13, 2024
ccc8ed9
migrate to existing transaction_fees and rename
MSilb7 Dec 13, 2024
563ad68
update names
MSilb7 Dec 13, 2024
365ee57
checkpoint
MSilb7 Dec 13, 2024
ae2f95c
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 13, 2024
6503ab7
check
MSilb7 Dec 13, 2024
e4a2c77
check
MSilb7 Dec 13, 2024
a785060
undo
MSilb7 Dec 13, 2024
7905ab0
fix eet
MSilb7 Dec 13, 2024
6592682
simplify views
MSilb7 Dec 13, 2024
1617cba
mod nb
MSilb7 Dec 13, 2024
09cfe79
push latest - through refined transactions
MSilb7 Dec 13, 2024
012bed3
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 13, 2024
379100c
clean
MSilb7 Dec 13, 2024
9d09853
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 13, 2024
a185a85
push updates to models
MSilb7 Dec 13, 2024
0fc755f
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 13, 2024
e705cac
checkpoint, base models DONE
MSilb7 Dec 13, 2024
28d7b5f
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 14, 2024
61cab69
separate l1 fee scalar versions
MSilb7 Dec 14, 2024
ae7b7db
fixes and create aggregate trace models
MSilb7 Dec 14, 2024
dd00b26
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 14, 2024
b11df41
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 15, 2024
7112e82
split txs and traces - push tx aggregates
MSilb7 Dec 15, 2024
8164f9a
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 15, 2024
6e03596
Merge branch 'main' into traces-txs-models-v0
MSilb7 Dec 16, 2024
45ed343
rename traces & include all types
MSilb7 Dec 16, 2024
9f147e2
push dev notebook
MSilb7 Dec 16, 2024
afeb39d
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 16, 2024
f6da4d4
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 16, 2024
0633750
Undo changes to example_daily_data.ipynb
lithium323 Dec 16, 2024
e2e57ae
Apply sqlfluff to refined_transactions_fees.sql.j2
lithium323 Dec 16, 2024
d634da1
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 16, 2024
70b34f5
Make diff wih main smaller
lithium323 Dec 16, 2024
2190036
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 17, 2024
a49845e
Make diff wih main smaller
lithium323 Dec 17, 2024
0489465
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 17, 2024
9226dde
Format sql
lithium323 Dec 18, 2024
cbea965
Merge branch 'main' into traces-txs-models-v0
lithium323 Dec 18, 2024
5 changes: 4 additions & 1 deletion .vscode/settings.json
@@ -22,5 +22,8 @@
"sqlfluff.executablePath": ".venv/bin/sqlfluff",
"files.associations": {
"*.sql.j2": "sql"
- }
+ },
+ "sqlfluff.excludeRules": [
+ "LT04"
+ ]
}
148 changes: 26 additions & 122 deletions notebooks/adhoc/example_daily_data.ipynb

Large diffs are not rendered by default.

502 changes: 502 additions & 0 deletions notebooks/adhoc/refined_transactions_traces_address_models_dev.ipynb

Large diffs are not rendered by default.

25 changes: 0 additions & 25 deletions src/op_analytics/datapipeline/models/code/daily_address_summary.py

This file was deleted.

@@ -0,0 +1,21 @@
# import duckdb

# from op_analytics.datapipeline.models.compute.querybuilder import TemplatedSQLQuery
# from op_analytics.datapipeline.models.compute.registry import register_model
# from op_analytics.datapipeline.models.compute.types import NamedRelations


# @register_model(
# input_datasets=["intermediate/refined_transactions_fees_v1"],
# expected_outputs=["summary_v1"],
# auxiliary_views=[
# TemplatedSQLQuery(
# template_name="daily_address_summary",
# context={},
# ),
# ],
# )
# def daily_address_summary_old(duckdb_client: duckdb.DuckDBPyConnection) -> NamedRelations:
# return {
# "summary_v1": duckdb_client.view("daily_address_summary"),
# }
@@ -0,0 +1,26 @@
import duckdb

from op_analytics.datapipeline.models.compute.querybuilder import TemplatedSQLQuery
from op_analytics.datapipeline.models.compute.registry import register_model
from op_analytics.datapipeline.models.compute.types import NamedRelations


@register_model(
input_datasets=["intermediate/enriched_transactions_v1"],
expected_outputs=["daily_transactions_fees_by_to_v1"],
# TODO: Uncomment if we do this as a view (or some element as a view)
# auxiliary_views=[
# TemplatedSQLQuery(
# template_name="daily_transactions_fees_by_to",
# context={},
# ),
# ],
)
def daily_transactions_fees_by_to(duckdb_client: duckdb.DuckDBPyConnection) -> NamedRelations:
return {
"daily_transactions_fees_by_to_v1": duckdb_client.view(
"""
TODO: AGGREGATION CODE
"""
),
}
@@ -0,0 +1,31 @@
# # TO DEPRECATE?

# import duckdb

# from op_analytics.datapipeline.models.compute.querybuilder import TemplatedSQLQuery
# from op_analytics.datapipeline.models.compute.registry import register_model
# from op_analytics.datapipeline.models.compute.types import NamedRelations


# @register_model(
# input_datasets=["ingestion/logs_v1", "ingestion/transactions_v1", "ingestion/blocks_v1"],
# expected_outputs=["event_emitting_transactions_v1"],
# auxiliary_views=[
# TemplatedSQLQuery(
# template_name="refined_transactions_fees",
# context={},
# ),
# TemplatedSQLQuery(
# template_name="logs_topic0_filters",
# context={},
# ),
# TemplatedSQLQuery(
# template_name="event_emitting_transactions",
# context={},
# ),
# ],
# )
# def event_emitting_transactions(duckdb_client: duckdb.DuckDBPyConnection) -> NamedRelations:
# return {
# "event_emitting_transactions_v1": duckdb_client.view("event_emitting_transactions"),
# }
21 changes: 21 additions & 0 deletions src/op_analytics/datapipeline/models/code/refined_trace_calls.py
@@ -0,0 +1,21 @@
# import duckdb

# from op_analytics.datapipeline.models.compute.querybuilder import TemplatedSQLQuery
# from op_analytics.datapipeline.models.compute.registry import register_model
# from op_analytics.datapipeline.models.compute.types import NamedRelations


# @register_model(
# input_datasets=["ingestion/traces_v1", "refined_transactions_fees_v1"],
# expected_outputs=["refined_trace_calls_v1"],
# auxiliary_views=[
# TemplatedSQLQuery(
# template_name="refined_trace_calls",
# context={},
# ),
# ],
# )
# def refined_trace_calls(duckdb_client: duckdb.DuckDBPyConnection) -> NamedRelations:
# return {
# "refined_trace_calls_v1": duckdb_client.view("refined_trace_calls"),
# }
@@ -0,0 +1,69 @@
import duckdb

from op_analytics.datapipeline.models.compute.querybuilder import TemplatedSQLQuery
from op_analytics.datapipeline.models.compute.registry import register_model
from op_analytics.datapipeline.models.compute.types import NamedRelations


@register_model(
input_datasets=[
"ingestion/transactions_v1",
"ingestion/blocks_v1",
"ingestion/logs_v1",
"ingestion/traces_v1",
],
expected_outputs=[
"refined_transactions_fees_v1",
"refined_trace_calls_v1",
"event_emitting_transactions_v1",
"summary_v1",
],
auxiliary_views=[
TemplatedSQLQuery(
template_name="refined_transactions_fees",
context={},
),
TemplatedSQLQuery(
template_name="refined_trace_calls",
context={},
),
TemplatedSQLQuery(
template_name="logs_topic0_filters",
context={},
),
TemplatedSQLQuery(
template_name="event_emitting_transactions",
context={},
),
TemplatedSQLQuery(
template_name="daily_address_summary",
context={},
),
TemplatedSQLQuery(
template_name="refined_trace_calls_agg_from_to_hash",
context={},
),
TemplatedSQLQuery(
template_name="refined_trace_calls_agg_to_hash",
context={},
),
TemplatedSQLQuery(
template_name="daily_trace_calls_agg_to",
context={},
),
],
)
def refined_transactions_traces_address_models(
duckdb_client: duckdb.DuckDBPyConnection,
) -> NamedRelations:
return {
"refined_transactions_fees_v1": duckdb_client.view("refined_transactions_fees"),
"refined_trace_calls_v1": duckdb_client.view("refined_trace_calls"),
"event_emitting_transactions_v1": duckdb_client.view("event_emitting_transactions"),
"summary_v1": duckdb_client.view("daily_address_summary"),
Review comment (Contributor Author): Should we rename this to daily_address_summary_v1? Maybe in a future PR, to keep this one clean-ish.

"refined_trace_calls_agg_from_to_hash_v1": duckdb_client.view(
"refined_trace_calls_agg_from_to_hash"
),
"refined_trace_calls_agg_to_hash_v1": duckdb_client.view("refined_trace_calls_agg_to_hash"),
"daily_trace_calls_agg_to_v1": duckdb_client.view("daily_trace_calls_agg_to"),
}
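
The model above declares its outputs twice: once in expected_outputs and once as keys of the returned dictionary. A minimal sketch of how such a contract could be checked is given below. It is not the op_analytics implementation: register_model_sketch and REGISTRY are hypothetical names, and plain strings stand in for the relations returned by duckdb_client.view(...).

```python
from typing import Callable, Dict, List

# Hypothetical registry; the real op_analytics registry may work differently.
REGISTRY: Dict[str, Callable[[], Dict[str, str]]] = {}


def register_model_sketch(expected_outputs: List[str]):
    """Register a model and verify that it returns exactly the declared outputs."""

    def decorator(func: Callable[[], Dict[str, str]]):
        def wrapper() -> Dict[str, str]:
            result = func()
            missing = set(expected_outputs) - set(result)
            extra = set(result) - set(expected_outputs)
            if missing or extra:
                raise ValueError(
                    f"{func.__name__}: missing={sorted(missing)} extra={sorted(extra)}"
                )
            return result

        REGISTRY[func.__name__] = wrapper
        return wrapper

    return decorator


@register_model_sketch(expected_outputs=["summary_v1", "daily_trace_calls_agg_to_v1"])
def example_model() -> Dict[str, str]:
    # Strings stand in for DuckDB relations in this sketch.
    return {
        "summary_v1": "daily_address_summary",
        "daily_trace_calls_agg_to_v1": "daily_trace_calls_agg_to",
    }


@register_model_sketch(expected_outputs=["summary_v1"])
def broken_model() -> Dict[str, str]:
    # The returned key was renamed but the declaration was not, so the check fails.
    return {"daily_address_summary_v1": "daily_address_summary"}
```

Under these assumptions, renaming summary_v1 to daily_address_summary_v1 would have to touch both the declaration and the returned key, which is the kind of mismatch a check like this would surface.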
25 changes: 24 additions & 1 deletion src/op_analytics/datapipeline/models/compute/udfs.py
@@ -15,6 +15,9 @@ def create_duckdb_macros(duckdb_client: duckdb.DuckDBPyConnection):

CREATE OR REPLACE MACRO wei_to_gwei(a)
AS a::DECIMAL(28, 0) * 0.000000001::DECIMAL(10, 10);

CREATE OR REPLACE MACRO gwei_to_eth(a)
AS a::DECIMAL(28, 10) * 0.000000001::DECIMAL(10, 10);

Review comment (Contributor Author, on gwei_to_eth): modified the decimal precision here; it wasn't allowing a to be a decimal before.

CREATE OR REPLACE MACRO safe_div(a, b) AS
IF(b = 0, NULL, a / b);
@@ -27,14 +30,34 @@ def create_duckdb_macros(duckdb_client: duckdb.DuckDBPyConnection):
-- Truncate a timestamp to hour.
CREATE OR REPLACE MACRO epoch_to_hour(a) AS
date_trunc('hour', make_timestamp(a * 1000000::BIGINT));

-- Truncate a timestamp to day.
CREATE OR REPLACE MACRO epoch_to_day(a) AS
date_trunc('day', make_timestamp(a * 1000000::BIGINT));

-- Division by 16 for DECIMAL types.
CREATE OR REPLACE MACRO div16(a)
AS a * 0.0625::DECIMAL(5, 5);

--Get the length in bytes for binary data that is encoded as a hex string
CREATE OR REPLACE MACRO hexstr_bytelen(x)
- AS (length(x) - 2) / 2
+ AS (length(x) - 2) / 2;

--Count non-zero bytes for binary data that is encoded as a hex string. We don't use hexstr_bytelen because we need to substring the input data.
CREATE OR REPLACE MACRO hexstr_nonzero_bytes(x)
AS length(replace(hex(unhex(substr(x, 3))), '00', '')) / 2;

--Count zero bytes for binary data that is encoded as a hex string
CREATE OR REPLACE MACRO hexstr_zero_bytes(x)
AS hexstr_bytelen(x) - hexstr_nonzero_bytes(x);

--Calculate calldata gas used for binary data that is encoded as a hex string (can be updated by an EIP)
CREATE OR REPLACE MACRO hexstr_calldata_gas(x)
AS 16*hexstr_nonzero_bytes(x) + 4*hexstr_zero_bytes(x);

--Get the method id for input data. This is the first 4 bytes, or first 10 string characters for binary data that is encoded as a hex string.
CREATE OR REPLACE MACRO hexstr_method_id(x)
AS substring(x,1,10)
""")

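The hex-string macros above can be mirrored in plain Python to sanity-check expected values. This is a sketch, not part of the PR: the function names simply reuse the macro names, inputs are assumed to be 0x-prefixed with an even number of hex digits, and nonzero_bytes_via_replace is a hypothetical helper that imitates the string-replace approach for comparison only. One thing it suggests is that replacing '00' in the hex text can also match across a byte boundary (for example in 0x1001), so the macro may undercount non-zero bytes for such inputs; that is worth confirming against DuckDB directly.

```python
def hexstr_bytelen(x: str) -> int:
    """Byte length of a 0x-prefixed hex string."""
    return (len(x) - 2) // 2


def hexstr_nonzero_bytes(x: str) -> int:
    """Count non-zero bytes by decoding and comparing byte by byte."""
    return sum(1 for b in bytes.fromhex(x[2:]) if b != 0)


def hexstr_zero_bytes(x: str) -> int:
    """Count zero bytes as total bytes minus non-zero bytes."""
    return hexstr_bytelen(x) - hexstr_nonzero_bytes(x)


def hexstr_calldata_gas(x: str) -> int:
    """16 gas per non-zero byte and 4 gas per zero byte, as in the macro."""
    return 16 * hexstr_nonzero_bytes(x) + 4 * hexstr_zero_bytes(x)


def hexstr_method_id(x: str) -> str:
    """First 4 bytes of the input: '0x' plus 8 hex characters."""
    return x[:10]


def nonzero_bytes_via_replace(x: str) -> int:
    """Hypothetical mirror of the replace-based count, for comparison only."""
    return len(x[2:].upper().replace("00", "")) // 2


calldata = "0xa9059cbb0000ff"  # 4-byte method id, two zero bytes, one non-zero byte
print(hexstr_method_id(calldata), hexstr_calldata_gas(calldata))

# Bytes 0x10 and 0x01 are both non-zero, but the text "1001" contains "00".
print(hexstr_nonzero_bytes("0x1001"), nonzero_bytes_via_replace("0x1001"))
```

Decoding to bytes first keeps the comparison aligned to byte boundaries, which is the property the replace-based version does not guarantee.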

@@ -2,6 +2,7 @@ SELECT
dt,
chain,
chain_id,
network,
from_address AS address,
-- Aggregates

@@ -43,9 +44,9 @@ SELECT

sum(if(success, receipt_gas_used, 0)) AS success_l2_gas_used_sum,

- sum(l1_gas_used) AS l1_gas_used_sum,
+ sum(l1_gas_used_unified) AS l1_gas_used_unified_sum,

- sum(if(success, l1_gas_used, 0)) AS success_l1_gas_used_sum,
+ sum(if(success, l1_gas_used_unified, 0)) AS success_l1_gas_used_unified_sum,

wei_to_eth(sum(tx_fee)) AS tx_fee_sum_eth,

@@ -58,7 +59,7 @@ SELECT

wei_to_eth(sum(l2_priority_fee)) AS l2_priority_fee_sum_eth,

- wei_to_eth(sum(l2_base_legacy)) AS l2_base_legacy_fee_sum_eth,
+ wei_to_eth(sum(l2_legacy_extra_fee)) AS l2_base_legacy_fee_sum_eth,

-- L1 Fee and breakdown into BASE + BLOB
wei_to_eth(sum(l1_fee)) AS l1_fee_sum_eth,
@@ -82,11 +83,29 @@ SELECT
) AS l1_base_price_avg_gwei,

wei_to_gwei(safe_div(sum(l1_blob_fee), sum(l1_blob_scaled_size)))
- AS l1_blob_fee_avg_gwei
+ AS l1_blob_fee_avg_gwei,

-- Data Processed
sum(input_zero_bytes) AS input_zero_bytes_sum,
sum(if(success, input_zero_bytes, 0)) AS success_input_zero_bytes_sum,

sum(input_nonzero_bytes) AS input_nonzero_bytes_sum,
sum(if(success, input_nonzero_bytes, 0)) AS success_input_nonzero_bytes_sum,

sum(input_byte_length) AS input_byte_length_sum,
sum(if(success, input_byte_length, 0)) AS success_input_byte_length_sum,

sum(estimated_size) AS estimated_size_sum,
sum(if(success, estimated_size, 0)) AS success_estimated_size_sum

FROM
- transaction_fees
+ refined_transactions_fees
WHERE
NOT is_system_transaction
AND gas_price > 0
Review comment (MSilb7, Contributor Author, Dec 14, 2024): This filter used to be applied at the transactions_fees level, but I removed it there (so those transactions are preserved) and moved it here.

GROUP BY
1,
2,
3,
- 4
+ 4,
+ 5
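
The query above keeps every row in the refined table and applies the system-transaction and gas-price filter only at aggregation time, alongside conditional sums such as sum(if(success, ...)). The sketch below illustrates that pattern under several assumptions: sqlite3 from the standard library stands in for DuckDB so the example runs without extra dependencies, CASE WHEN replaces DuckDB's if(), the schema is cut down to a handful of columns, and tx_cnt and l2_gas_used_sum are illustrative column names not taken from the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE refined_transactions_fees (
        dt TEXT, chain TEXT, from_address TEXT,
        success INTEGER, is_system_transaction INTEGER,
        gas_price INTEGER, receipt_gas_used INTEGER
    )
    """
)
rows = [
    ("2024-12-14", "op", "0xaaa", 1, 0, 10, 21000),
    ("2024-12-14", "op", "0xaaa", 0, 0, 10, 50000),  # failed transaction
    ("2024-12-14", "op", "0xaaa", 1, 0, 0, 30000),  # zero gas price, excluded below
    ("2024-12-14", "op", "0xsys", 1, 1, 0, 40000),  # system transaction, excluded below
]
conn.executemany(
    "INSERT INTO refined_transactions_fees VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# The filter lives in the aggregation query, so the base table stays unfiltered.
summary = conn.execute(
    """
    SELECT
        dt,
        chain,
        from_address AS address,
        count(*) AS tx_cnt,
        sum(receipt_gas_used) AS l2_gas_used_sum,
        sum(CASE WHEN success THEN receipt_gas_used ELSE 0 END) AS success_l2_gas_used_sum
    FROM refined_transactions_fees
    WHERE NOT is_system_transaction
      AND gas_price > 0
    GROUP BY dt, chain, from_address
    """
).fetchall()

base_row_count = conn.execute(
    "SELECT count(*) FROM refined_transactions_fees"
).fetchone()[0]

print(summary)
print(base_row_count)
```

Filtering in the summary rather than upstream means other models built on the refined table can still see system and zero-gas-price transactions if they need them.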