Docpoc - Add backfill docs for Hubble #1362
base: main
Conversation
Something went wrong with the PR preview build, please check.
Preview is available here:
Note that I think we might have to reorganize/rename
| [Data catalog](/docs/data/hubble/data-catalog) | View all Hubble data catalog information. | Learn |
| [Admin guide](/docs/data/hubble/admin-guide) | A comprehensive guide that will teach you how to run your own Hubble analytics platform. | Tutorial |
| [Developer guide](/docs/data/hubble/developer-guide) | A comprehensive guide that will teach you how to run your own Hubble analytics platform. | Tutorial |
The developer guide encompasses more than just running stellar-etl. It may be useful to add something about building custom data pipelines using Hubble as a source.
This describes the scenario in which someone already has a data warehouse set up with full history loaded, but it does not cover cases where a developer wants to perform an initial backfill. I don't think our recommendation there would be to use the JS UDF. It would be more efficient to either export the BigQuery table into a format they can ingest, or connect via an SDK and pull in the data they need with a query.
Can you include this option on the page as well?
Hmm, do you think this needs to go in this PR?
I was thinking the larger backfill would be a different body of work. As in this comment, "backfill"
is going to be super overloaded. The initial backfill should be its own section, with options of using Galexie, RPC, Hubble, and a third-party hosted data lake.
Rephrasing: I think we should save the initial backfill
doc for a separate doc and rename this one to "using UDF to <blank>".
- **Bug Fix:** You resolve a bug and need to re-ingest a specific data column.
- **New Feature:** You add a new data column as part of a feature request and need to backfill data for the newly added column.
- **Raw Data Extraction:** You want to use Hubble as a source for raw data (XDR columns) and extract only the required data columns.

For scenarios 1 and 2, you can perform a backfill using Airflow and trigger a Directed Acyclic Graph (DAG) for past dates.
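For the Airflow path mentioned in the diff, the mechanics reduce to enumerating the past dates and triggering one DAG run per date. A minimal sketch, where the DAG id and the CLI invocation in the printed string are illustrative placeholders, not details taken from this PR:

```python
from datetime import date, timedelta


def backfill_dates(start: date, end: date) -> list[date]:
    """Enumerate the past execution dates to re-run (inclusive range)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Example: re-ingest one week of history after a bug fix.
for d in backfill_dates(date(2024, 1, 1), date(2024, 1, 7)):
    # Hypothetical DAG id; each date becomes one triggered run.
    # A range can equivalently be handed to `airflow dags backfill -s START -e END`.
    print(f"airflow dags trigger --exec-date {d.isoformat()} my_hubble_export_dag")
```

The inclusive date range matters: an off-by-one here silently skips the last day of the backfill window.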
I like how you've outlined the different scenarios for when you need to perform a backfill; that's very clear. I think you should have a similar section that outlines the different options for how to backfill, i.e.:
Options:
- Run stellar-etl and trigger a DAG for past dates
- Export data from BigQuery (via an SDK connection + SQL, or by exporting the data into files)
- JS UDF

This would allow you to link out to subpages that give more detail on these options as necessary.
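The second option above can be sketched directly in BigQuery SQL. This is a hedged illustration rather than text from the PR: the bucket path, date range, and filter column are placeholders, and the table is assumed to be SDF's public `crypto-stellar.crypto_stellar.history_transactions` dataset.

```sql
-- Illustrative only: bucket, dates, and filter column are placeholders.
-- EXPORT DATA writes query results to GCS files that a downstream
-- warehouse can ingest.
EXPORT DATA OPTIONS (
  uri = 'gs://my-backfill-bucket/transactions/*.parquet',
  format = 'PARQUET',
  overwrite = true
) AS
SELECT *
FROM `crypto-stellar.crypto_stellar.history_transactions`
WHERE batch_run_date BETWEEN '2024-01-01' AND '2024-01-07';
```

Parquet keeps column types intact across warehouses; CSV or JSON are also accepted formats if the target system needs them.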
sidebar_position: 0
---
This document outlines methods to extract required fields from the XDR of raw data. We'll take the example of extracting the `fee_account_muxed` field from a transaction envelope (`tx_meta` XDR), but this method can be adapted to other fields as well. It is worth noting that most users will not need to stand up and run their own Hubble. The Stellar Development Foundation provides public access to the data through public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](../../developer-guide/connecting-to-bigquery/README.mdx) section.
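As a rough illustration of the UDF approach this page documents, a BigQuery JS UDF can decode the base64 XDR inline. Everything below is a sketch: the GCS library path and source column are placeholders, and the decoding calls assume the `js-stellar-base` API rather than reproducing this PR's code.

```sql
-- Sketch only: library path and column names are assumptions, and the
-- JS body assumes the js-stellar-base API bundled as a UDF library.
CREATE TEMP FUNCTION extract_fee_account_muxed(envelope_xdr STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (library = ['gs://my-udf-bucket/stellar-base.min.js'])
AS r"""
  var env = StellarBase.xdr.TransactionEnvelope.fromXDR(envelope_xdr, 'base64');
  // Only fee-bump envelopes carry a separate fee account.
  if (env.switch().name !== 'envelopeTypeTxFeeBump') return null;
  return StellarBase.encodeMuxedAccountToAddress(env.feeBump().tx().feeSource());
""";

SELECT extract_fee_account_muxed(tx_envelope) AS fee_account_muxed
FROM `crypto-stellar.crypto_stellar.history_transactions`
LIMIT 10;
```

Because the UDF runs per row inside BigQuery, this trades query cost and speed for convenience, which is why the review thread above suggests exporting or querying via an SDK for large initial backfills.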
I think this is missing a line that gives context as to why you would need to do this. Something simple like: "Hubble does not parse every single field available in raw XDR, but it does save the raw transaction meta in case you need to extract a field directly from the XDR."
Closes #1339
This PR: