grammar n stuff in Ingestion
briwylde08 committed Jan 17, 2024
1 parent 9883e9e commit 5cf066f
Showing 1 changed file with 18 additions and 18 deletions.
36 changes: 18 additions & 18 deletions docs/run-platform-server/ingestion.mdx
@@ -9,31 +9,31 @@ Horizon API provides most of its utility through ingested data, and your Horizon

## Ingestion Types

There are two primary ingestion use cases for Horizon operations:

- Ingesting **live** data to stay up to date with the latest ledgers from the network, accumulating a sliding window of aged ledgers;
- Ingesting **historical** data to retroactively add network data from a time range in the past to the database.

## Determine Storage Space

You should think carefully about the historical timeframe of ingested data you'd like to retain in Horizon's database. The storage requirements for transactions on the Stellar network are substantial and grow without bound over time, so this is something you may need to continually monitor and reevaluate as the network grows. We have found that most organizations need only a small fraction of recent historical data to satisfy their use cases. Analyzing traffic patterns on SDF's Horizon instance, we see that most requests are for very recent data.

To keep your storage footprint small, we recommend the following:

- Use **live** ingestion; use **historical** ingestion only in limited, exceptional cases.
- If your application requires access to all network data and no filtering can be done, we recommend limiting historical retention of ingested data to a sliding window of 1 month (`HISTORY_RETENTION_COUNT=518400`), which is Horizon's default.
- If your application can work on a [filtered network dataset](./ingestion-filtering.mdx) based on specific accounts and assets, then we recommend applying ingestion filter rules. Filtering reduces the overall database size to such a degree that historical retention (`HISTORY_RETENTION_COUNT`) can be set in terms of years rather than months, or even disabled (`HISTORY_RETENTION_COUNT=0`); see the configuration sketch after this list.
- If you cannot limit your history retention window to 30 days and cannot use filter rules, we recommend considering [Stellar Hubble Data Warehouse](https://developers.stellar.org/docs/accessing-data/overview) for any historical data.
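
To make the retention recommendations concrete, here is a minimal sketch of the relevant environment configuration for a live-ingesting instance. The values are illustrative, and everything beyond the two variables named on this page (`INGEST`, `HISTORY_RETENTION_COUNT`) is assumed to be configured as described in [Configuration](./configuring.mdx).

```
# Minimal sketch: live ingestion with a ~30-day sliding window of history
# (518400 ledgers at roughly 5 seconds per ledger). Values are illustrative.
export INGEST=true
export HISTORY_RETENTION_COUNT=518400

# With aggressive filter rules, retention can be extended to years, or
# trimming can be disabled entirely:
# export HISTORY_RETENTION_COUNT=0
```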

### Ingesting Live Data

This option is enabled by default and is the recommended mode of ingestion to run. It is controlled with the environment configuration flag `INGEST`. Refer to [Configuration](./configuring.mdx) for how an instance of Horizon performs the ingestion role.

For high availability requirements, **we recommend deploying more than one live ingesting instance**, as this makes it easier to avoid downtime during upgrades and adds resilience, ensuring you always have the latest network data (refer to [Ingestion Role Instance](./configuring.mdx#multiple-instance-deployment)).
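
For illustration only, the sketch below shows the shape of such a deployment: two hosts, each running a live-ingesting Horizon against the same database. The hostnames and database URL are placeholders, and the remaining required settings are assumed to be configured per [Configuration](./configuring.mdx).

```
# Hypothetical high-availability sketch: run the same live-ingesting
# configuration on two hosts against one shared Horizon database.
# (Placeholders throughout; all other required settings omitted.)

# on host-a and host-b alike:
export DATABASE_URL="postgres://horizon@db.internal:5432/horizon"
export INGEST=true
stellar-horizon
```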

### Ingesting Historical Data

Import network data from a past date range into the database:

<CodeExample>

@@ -49,9 +49,9 @@ Typically the only time you need to run historical ingestion is once when boot-s

You can run historical ingestion in parallel, in the background, while your main Horizon server separately performs **live** ingestion. If the specified range overlaps with data already in the database, that is fine: the overlapping data is simply overwritten, making the operation effectively idempotent.
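
For example, a backfill of an older ledger range might look like the following sketch; the range bounds are illustrative, and the instance is assumed to share the live deployment's database configuration.

```
# Hedged sketch: backfill a past ledger range while live ingestion continues
# elsewhere. The range bounds below are illustrative only.
stellar-horizon db reingest range 1 16999
```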

#### Parallel Ingestion Workers

You can parallelize ingestion of the target historical ledger range by dividing it into sequential slices of smaller ranges and running the `db reingest range` command for each sub-range in parallel, as a separate process on the same or a different machine. The shorthand rule for best performance is to identify the number of CPU cores available on each target machine; if multi-core, add `--parallel-workers <num of cores>` to the command, which lets the command further parallelize internally within a single process using multiple threads and sub-divided smaller ranges.

<CodeExample>

@@ -73,15 +73,15 @@ horizon2> stellar-horizon db reingest range 15001 30000 --parallel-workers 2

#### Some endpoints may report as unavailable during **live** ingestion

- Endpoints that display current state information from **live** ingestion may return a `503 Service Unavailable`/`Still Ingesting` error. An example is the `/paths` endpoint (built using offers). Such endpoints will become available after **live** ingestion has finished network synchronization and catch-up (usually within a couple of minutes).

#### If more than five minutes has elapsed with no new ingested data:

- Verify the host machine meets recommended [Prerequisites](./prerequisites.mdx).

- Check Horizon log output.
- If there are many `level=error` messages, it may point to an environmental issue, such as an inability to access the database.
- **Live** ingestion will emit two key log lines about once every 5 seconds, based on the latest ledger emitted from the network. Tail the Horizon log output and grep for the presence of these lines with a filter:
```
tail -f horizon.log | grep -E 'Processed ledger|Closed ledger'
```
@@ -92,6 +92,6 @@
```
sudo dd if=/dev/zero of=/tmp/test_speed.img bs=1G count=1
```

#### Monitoring Ingestion Process

For high-availability deployments, it is recommended to implement monitoring of the ingestion process for visibility into performance and health. Refer to [Monitoring](./monitoring.mdx) for accessing logs and metrics from Horizon. Stellar publishes the example [Horizon Grafana Dashboard](https://grafana.com/grafana/dashboards/13793-stellar-horizon/), which demonstrates queries against key Horizon ingestion metrics; in particular, look at `Local Ingestion Delay [Ledgers]` and `Last ledger age` in the `Health Summary` panel.
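
For a quick look at the raw metrics outside Grafana, one hedged approach is to query the Prometheus endpoint directly. The sketch below assumes the admin endpoint has been enabled (for example, `ADMIN_PORT=6060`); exact metric names vary by Horizon version, so the filter is deliberately loose.

```
# Hypothetical spot check of ingestion-related metrics; assumes ADMIN_PORT=6060.
curl -s http://localhost:6060/metrics | grep -i ingest
```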
