Docpoc - Add backfill docs for Hubble #1362

Open: wants to merge 7 commits into base: main


@amishas157 amishas157 commented Mar 6, 2025

Closes #1339

This PR:

  • Adds documentation for backfilling using Hubble
  • Pulls the "Connecting to BigQuery" section out of the analyst guide
  • Renames the analyst guide to developer guide (and fixes the associated URLs)

@stellar-jenkins

Something went wrong with PR preview build please check

@amishas157 amishas157 force-pushed the patch/improve-hubble-docs branch 2 times, most recently from ee2a717 to 34f6caa Compare March 6, 2025 21:29
@amishas157 amishas157 changed the title from "Add docs for backfilling" to "Docpoc - Add backfill docs for Hubble" Mar 6, 2025
@amishas157 amishas157 marked this pull request as ready for review March 6, 2025 22:19
@amishas157 amishas157 requested a review from a team March 6, 2025 22:20
@amishas157 amishas157 force-pushed the patch/improve-hubble-docs branch from 9e58e0b to ad159e4 Compare March 6, 2025 22:25

chowbao commented Mar 7, 2025

Note that I think we might have to reorganize/rename backfill. I think it's gonna become an overloaded term in the near future with galexie backfilling, stellar-etl backfilling, rpc backfilling, etc...

| [Data catalog](/docs/data/hubble/data-catalog) | View all Hubble data catalog information. | Learn |
| [Admin guide](/docs/data/hubble/admin-guide) | A comprehensive guide that will teach you how to run your own Hubble analytics platform | Tutorial |
| [Developer guide](/docs/data/hubble/developer-guide) | A comprehensive guide that will teach you how to run your own Hubble analytics platform | Tutorial |

The developer guide encompasses more than just running stellar-etl. It may be useful to add something about building custom data pipelines using Hubble as a source.


This describes the scenario in which someone already has a data warehouse set up with full history loaded, but it does not cover cases where a developer wants to perform an initial backfill. I don't think our recommendation would be to use the JS UDF. It would be more efficient for someone to either export the BigQuery table into a format they could ingest, or connect via the SDK and pull in the data they need with a query.

Can you include this option on the page as well?
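
For reference, a minimal sketch of the "connect via SDK and query" option, assuming the `google-cloud-bigquery` Python client and application default credentials. The table name, the `closed_at` filter column, and the selected columns are assumptions to check against the data catalog; `build_backfill_query` and `fetch_rows` are hypothetical helper names, not part of Hubble or stellar-etl.

```python
from datetime import datetime, timezone

# Assumptions: the public Hubble table and a TIMESTAMP column to filter on.
# Verify both names in the data catalog before relying on them.
DEFAULT_TABLE = "crypto-stellar.crypto_stellar.history_transactions"
FILTER_COLUMN = "closed_at"


def build_backfill_query(columns, table=DEFAULT_TABLE, filter_column=FILTER_COLUMN):
    """Return a parameterized query that selects only the needed columns."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in [*columns, filter_column]:
        # Identifiers cannot be bound as query parameters, so validate them.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM `{table}` "
        f"WHERE {filter_column} >= @start_ts AND {filter_column} < @end_ts"
    )


def fetch_rows(sql, start_ts, end_ts):
    """Run the query and return the rows as a list of dicts."""
    # Imported lazily so the query builder works without the SDK installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_ts", "TIMESTAMP", start_ts),
            bigquery.ScalarQueryParameter("end_ts", "TIMESTAMP", end_ts),
        ]
    )
    return [dict(row) for row in client.query(sql, job_config=job_config).result()]


if __name__ == "__main__":
    # Column names here are illustrative assumptions.
    query = build_backfill_query(["transaction_hash", "tx_envelope"])
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    end = datetime(2025, 1, 2, tzinfo=timezone.utc)
    for row in fetch_rows(query, start, end):
        print(row)
```

Selecting only the required columns and bounding the time range keeps the scanned bytes (and therefore the query cost) down, which is the main reason to prefer this over a full-table export when only a few fields are needed.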

@chowbao chowbao Mar 7, 2025

Hmm do you think this needs to go in this PR?

I was thinking the larger backfill would be a different body of work. As I said in my earlier comment, "backfill" is going to be super overloaded. The initial backfill should be its own section, with options for using galexie, RPC, Hubble, and a third-party hosted data lake.

Rephrasing: I think we should save the initial backfill doc for a separate doc and rename this to using UDF to <blank>


- **Bug Fix:** You resolve a bug and need to re-ingest a specific data column.
- **New Feature:** You add a new data column as part of a feature request and need to backfill data for the newly added column.
- **Raw Data Extraction:** You want to use Hubble as a source for raw data (XDR columns) and extract only the required data columns.

For scenarios 1 and 2, you can perform a backfill using Airflow and trigger a Directed Acyclic Graph (DAG) for past dates.

I like how you've outlined the different scenarios for when you need to perform a backfill; that's very clear. I think you should have a similar section that outlines the different options for how to backfill, i.e.:
Options:

  1. Run stellar-etl and trigger a DAG for past dates
  2. Export data from BigQuery (via SDK connection + SQL, or by exporting the data into files)
  3. JS UDF

This would allow you to link out to subpages that give more detail as necessary on these options
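
As a rough illustration of option 1, the sketch below splits a past date range into windows and prints one Airflow backfill command per window. It assumes the Airflow 2.x CLI (`airflow dags backfill`, which newer Airflow releases replace); the DAG id `my_hubble_dag` and the helper names are hypothetical.

```python
import shlex
from datetime import date, timedelta


def backfill_windows(start, end, days_per_window=7):
    """Split the inclusive range [start, end] into consecutive windows."""
    if end < start:
        raise ValueError("end must not be before start")
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    windows = []
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(cursor + timedelta(days=days_per_window - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def backfill_command(dag_id, window):
    """Build the Airflow 2.x CLI command for one window (assumed CLI version)."""
    start, end = window
    return shlex.join(
        [
            "airflow", "dags", "backfill",
            "--start-date", start.isoformat(),
            "--end-date", end.isoformat(),
            dag_id,
        ]
    )


if __name__ == "__main__":
    # "my_hubble_dag" is a placeholder DAG id, not a real stellar-etl DAG name.
    for window in backfill_windows(date(2025, 1, 1), date(2025, 1, 10)):
        print(backfill_command("my_hubble_dag", window))
```

Running smaller windows one at a time makes it easier to retry a failed range without re-ingesting dates that already succeeded.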

sidebar_position: 0
---

This document outlines methods to extract required fields from the XDR of raw data. We'll take the example of extracting the `fee_account_muxed` field from a transaction envelope (`tx_meta` XDR). However, this method can be adapted to other fields as well. It is worth noting that most users will not need to stand up and run their own Hubble. The Stellar Development Foundation provides public access to the data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the [Connecting](../../developer-guide/connecting-to-bigquery/README.mdx) section.

I think this is missing a line that gives context as to why you would need to do this. Something simple like: "Hubble does not parse every field available in the raw XDR, but it does save the raw transaction meta in case you need to extract a field directly from the XDR."


Successfully merging this pull request may close these issues.

Add User Journey for Engineer <> Hubble
4 participants