update url for the new opensearch otel demo repo #9

Merged · 7 commits · May 31, 2023
1 change: 1 addition & 0 deletions .env
@@ -10,6 +10,7 @@ ENV_PLATFORM=local
# OpenTelemetry Collector
OTEL_COLLECTOR_HOST=otelcol
OTEL_COLLECTOR_PORT=4317
OTEL_COLLECTOR_HEALTH_CHECK_PORT=1313
OTEL_EXPORTER_OTLP_ENDPOINT=http://${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT}
PUBLIC_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces

15 changes: 13 additions & 2 deletions .github/README.md
@@ -17,6 +17,14 @@ git clone https://github.com/opensearch/opentelemetry-demo.git
cd opentelemetry-demo
docker compose up -d
```
#### Known Issues
In some cases a few services might not start correctly; if that happens, re-run the docker compose command:
```
docker compose up -d
```
**_This is a known issue we are currently working to resolve._**

---

### Services

@@ -35,9 +43,12 @@ The next instructions are similar and use the same docker compose file.
```
docker compose up -d
```
- **Note:** The docker compose `--no-build` flag is used to fetch released docker images from [ghcr](http://ghcr.io/open-telemetry/demo) instead of building from source.
- Removing the `--no-build` command line option will rebuild all images from source. It may take more than 20 minutes to build if the flag is omitted.
+ **Note:**
+
+ _The docker compose `--no-build` flag is used to fetch released docker images from [ghcr](http://ghcr.io/open-telemetry/demo) instead of building from source.
+ Removing the `--no-build` command line option will rebuild all images from source. It may take more than 20 minutes to build if the flag is omitted._

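Not part of this diff: for illustration, fetching the prebuilt images instead of building locally uses the flag described in the note above:

```
docker compose up --no-build -d
```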
+ ---
### Explore and analyze the data With OpenSearch Observability
Review revised OpenSearch [Observability Architecture](architecture.md)

8 changes: 4 additions & 4 deletions .github/architecture.md
@@ -37,15 +37,15 @@ Backend supportive services
- [Frontend Nginx Proxy](http://nginx:90) *(replacement for _Frontend-Proxy_)*
- See [description](../src/nginx-otel/README.md)
- [OpenSearch](https://opensearch-node1:9200)
- - See [description](https://github.com/YANG-DB/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L697)
+ - See [description](https://github.com/opensearch-project/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L697)
- [Dashboards](http://opensearch-dashboards:5601)
- - See [description](https://github.com/YANG-DB/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L747)
+ - See [description](https://github.com/opensearch-project/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L747)
- [Prometheus](http://prometheus:9090)
- - See [description](https://github.com/YANG-DB/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L674)
+ - See [description](https://github.com/opensearch-project/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L674)
- [Feature-Flag](http://feature-flag-service:8881)
- See [description](../src/featureflagservice/README.md)
- [Grafana](http://grafana:3000)
- - See [description](https://github.com/YANG-DB/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L637)
+ - See [description](https://github.com/opensearch-project/opentelemetry-demo/blob/12d52cbb23bbf4226f6de2dfec840482a0a7d054/docker-compose.yml#L637)

### Services Topology
The next diagram shows the docker compose services dependencies
9 changes: 5 additions & 4 deletions docker-compose.yml
@@ -678,10 +678,11 @@ services:
- ./src/otelcollector/otelcol-config.yml:/etc/otelcol-config.yml
- ./src/otelcollector/otelcol-config-extras.yml:/etc/otelcol-config-extras.yml
ports:
- "4317" # OTLP over gRPC receiver
- "4318:4318" # OTLP over HTTP receiver
- "9464" # Prometheus exporter
- "8888" # metrics endpoint
- "4317" # OTLP over gRPC receiver
- "4318:4318" # OTLP over HTTP receiver
- "13133:13133" # health check port
- "9464" # Prometheus exporter
- "8888" # metrics endpoint
depends_on:
- jaeger-agent
- data-prepper
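For context (not part of this diff): the new `13133:13133` mapping exposes the collector's health check endpoint. A minimal sketch of the collector-side setting, assuming the standard `health_check` extension:

```yaml
# otelcol-config.yml (sketch): enable the health check endpoint
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # matches the "13133:13133" port mapping above

service:
  extensions: [health_check]
```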
3 changes: 3 additions & 0 deletions src/dataprepper/README.md
@@ -50,3 +50,6 @@ There are some additional `trace.group` related fields which are not part of the
- traceGroupFields.durationInNanos - A derived field, the durationInNanos of the trace's root span.

```
### Metrics from Traces Processors

New processors are being added that create metrics from the logs and traces passing through [Data Prepper](https://opensearch.org/blog/Announcing-Data-Prepper-2.1.0/).
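Not part of this diff: a minimal sketch of the kind of Data Prepper metrics pipeline this refers to. The plugin names follow Data Prepper's documented `otel_metrics_source` and `otel_metrics_raw_processor`; the sink settings are assumptions, not this repository's actual configuration:

```yaml
# pipelines.yaml (sketch): ingest OTLP metrics and write them to OpenSearch
metrics-pipeline:
  source:
    otel_metrics_source:
      ssl: false
  processor:
    - otel_metrics_raw_processor:
  sink:
    - opensearch:
        hosts: ["https://opensearch-node1:9200"]
        username: admin        # demo credentials; do not use in production
        password: admin
        insecure: true
        index: otel-metrics-%{yyyy.MM.dd}   # assumed index pattern
```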
53 changes: 38 additions & 15 deletions src/integrations/src/install.py
@@ -13,8 +13,13 @@
# For testing only. Don't store credentials in code.
auth = ('admin', 'admin')

- # Configure logging
- logging.basicConfig(format='%(asctime)s [%(levelname)s] %(message)s', level=logging.INFO)
+ # Configure logging to file
+ logging.basicConfig(
+     filename='application.log',  # The file where the logs should be written to
+     filemode='a',  # Append mode, use 'w' for overwriting each time the script is run
+     format='%(asctime)s [%(levelname)s] %(message)s',
+     level=logging.INFO
+ )
logger = logging.getLogger(__name__)

# Create the client with SSL/TLS enabled, but hostname verification disabled.
@@ -25,13 +30,15 @@
use_ssl = True,
verify_certs = False,
ssl_assert_hostname = False,
- ssl_show_warn = False
+ ssl_show_warn = False,
+ timeout=20,
+ max_retries=50,  # Increasing max_retries from 3 (default) to 50
+ retry_on_timeout=True
)
# verify connection to opensearch
def test_connection():
- max_retries = 30  # Maximum number of retries
- retry_interval = 10  # Wait for 10 seconds between retries
+ max_retries = 100  # Maximum number of retries
+ retry_interval = 20  # Wait for 20 seconds between retries

for i in range(max_retries):
try:
@@ -42,15 +49,15 @@ def test_connection():
verify=False # Disable SSL verification
)
response.raise_for_status() # Raise an exception if the request failed
- print('Successfully connected to OpenSearch')
+ logger.info('Successfully connected to OpenSearch')
return # Exit the function if connection is successful
except requests.HTTPError as e:
logging.error(f'Failed to connect to OpenSearch, error: {str(e)}')

- print(f'Attempt {i + 1} failed, waiting for {retry_interval} seconds before retrying...')
+ logger.info(f'Attempt {i + 1} failed, waiting for {retry_interval} seconds before retrying...')
time.sleep(retry_interval)

- print(f'Failed to connect to OpenSearch after {max_retries} attempts')
+ logger.error(f'Failed to connect to OpenSearch after {max_retries} attempts')
exit(1) # Exit the program with an error code

# create mapping components to compose the different observability categories
@@ -62,7 +69,7 @@ def create_mapping_components(client):
mapping = json.load(f)

template_name = os.path.splitext(filename)[0] # Remove the .mapping extension
- print(f'About to load template: {template_name}')
+ logger.info(f'About to load template: {template_name}')
# Create the component template
try:
response = requests.put(
Expand All @@ -74,7 +81,6 @@ def create_mapping_components(client):
)
response.raise_for_status() # Raise an exception if the request failed
logger.info(f'Successfully created component template: {template_name}')
- print(f'Successfully created component template: {template_name}')
except requests.HTTPError as e:
logger.error(f'Failed to create component template: {template_name}, error: {str(e)}')

@@ -87,12 +93,12 @@ def create_mapping_templates(client):
mapping = json.load(f)

template_name = os.path.splitext(filename)[0] # Remove the .mapping extension
- print(f'About to created index template: {template_name}')
+ logger.info(f'About to create index template: {template_name}')

# Create the template
try:
client.indices.put_index_template(name=template_name, body=mapping)
- print(f'Successfully created index template: {template_name}')
+ logger.info(f'Successfully created index template: {template_name}')
except OpenSearchException as e:
logger.error(f'Failed to create index template: {template_name}, error: {str(e)}')

@@ -143,10 +149,28 @@ def load_dashboards():
# create the data_streams based on the list given in the data-stream.json file
def create_datasources():
datasource_dir = '../datasource/'
+ # get current list of data sources
+ try:
+     response = requests.get(
+         url=f'https://{opensearch_host}:9200/_plugins/_query/_datasources',
+         auth=HTTPBasicAuth('admin', 'admin'),
+         verify=False,  # Disable SSL verification
+         headers={'Content-Type': 'application/json'}
+     )
+     response.raise_for_status()  # Raise an exception if the request failed
+     current_datasources = response.json()  # convert response to json
+ except requests.HTTPError as e:
+     logger.error(f'Failed to fetch datasources, error: {str(e)}')

for filename in os.listdir(datasource_dir):
if filename.endswith('.json'):
with open(os.path.join(datasource_dir, filename), 'r') as f:
datasource = json.load(f)
+ # check if datasource already exists
+ if any(d['name'] == datasource['name'] for d in current_datasources):
+     logger.info(f'Datasource already exists: {filename}')
+     continue  # Skip to the next datasource if this one already exists

try:
response = requests.post(
url=f'https://{opensearch_host}:9200/_plugins/_query/_datasources',
Expand All @@ -157,12 +181,11 @@ def create_datasources():
)
response.raise_for_status() # Raise an exception if the request failed
logger.info(f'Successfully created datasource: {filename}')
- print(f'Successfully created datasource: {filename}')
except requests.HTTPError as e:
- print(f'Failed to create datasource: {filename}, error: {str(e)}')
+ logger.error(f'Failed to create datasource: {filename}, error: {str(e)}')



if __name__ == '__main__':
# import all assets
test_connection()
129 changes: 129 additions & 0 deletions src/otelcollector/README.md
@@ -0,0 +1,129 @@
# OTEL Collector Pipeline

The following diagram describes the observability signals pipeline defined inside the OTEL Collector:

```mermaid
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#BB2528',
'primaryTextColor': '#fff',
'primaryBorderColor': '#7C0000',
'lineColor': '#F8B229',
'secondaryColor': '#006100',
'tertiaryColor': '#fff'
}
}
}%%

graph LR
otlp-->otlpReceiverTraces["otlp Receiver"]
style otlpReceiverTraces fill:#f9d,stroke:#333,stroke-width:4px
otlp-->otlpReceiverMetrics["otlp Receiver"]
style otlpReceiverMetrics fill:#f9d,stroke:#333,stroke-width:4px
otlp-->otlpReceiverLogs["otlp Receiver"]
style otlpReceiverLogs fill:#f9d,stroke:#333,stroke-width:4px
otlpServiceGraph-->otlpServiceGraphReceiver["otlp/servicegraph Receiver"]
spanmetrics-->spanmetricsReceiver["spanmetrics Receiver"]

subgraph Traces Pipeline
otlpReceiverTraces-->tracesProcessor["Traces Processor"]
style tracesProcessor fill:#9cf,stroke:#333,stroke-width:4px
tracesProcessor-->otlpExporter["otlp Exporter"]
tracesProcessor-->loggingExporterTraces["Logging Exporter"]

tracesProcessor-->spanmetricsExporter["Spanmetrics Exporter"]
tracesProcessor-->otlp2Exporter["otlp/2 Exporter"]
end

subgraph Metrics/Servicegraph Pipeline
otlpServiceGraphReceiver-->metricsServiceGraphProcessor["Metrics/ServiceGraph Processor"]
style otlpServiceGraphReceiver fill:#f9d,stroke:#333,stroke-width:4px
metricsServiceGraphProcessor-->prometheusServiceGraphExporter["Prometheus/ServiceGraph Exporter"]
style metricsServiceGraphProcessor fill:#9cf,stroke:#333,stroke-width:4px
end

subgraph Metrics Pipeline
otlpReceiverMetrics-->metricsProcessor["Metrics Processor"]
style metricsProcessor fill:#9cf,stroke:#333,stroke-width:4px

spanmetricsReceiver-->metricsProcessor
style spanmetricsReceiver fill:#f9d,stroke:#333,stroke-width:4px
metricsProcessor-->prometheusExporter["Prometheus Exporter"]
metricsProcessor-->loggingExporterMetrics["Logging Exporter"]
end

subgraph Logs Pipeline
otlpReceiverLogs-->logsProcessor["Logs Processor"]
style logsProcessor fill:#9cf,stroke:#333,stroke-width:4px
logsProcessor-->loggingExporterLogs["Logging Exporter"]
end


```

### Traces
The traces pipeline consists of a receiver, multiple processors, and multiple exporters.

**Receiver (otlp):**
This is where the data comes in. In this configuration, the traces pipeline uses the `otlp` receiver (OTLP stands for OpenTelemetry Protocol). The receiver is configured to accept data over both gRPC and HTTP, and the HTTP protocol is configured to allow CORS from any origin.
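A receiver block consistent with this description might look like the following sketch (not a verbatim copy of the demo's configuration):

```yaml
receivers:
  otlp:
    protocols:
      grpc:        # OTLP over gRPC, port 4317 by default
      http:        # OTLP over HTTP, port 4318 by default
        cors:
          allowed_origins:
            - "*"  # allow CORS from any origin
```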

**Processors (memory_limiter, batch, servicegraph):**
Once the data is received, it is processed before being exported. The processors in the traces pipeline are listed below, with a configuration sketch after the list:

1. **memory_limiter:** This processor checks memory usage every second (check_interval: 1s) and ensures it does not exceed 4000 MiB (limit_mib: 4000). It also allows for a spike limit of 800 MiB (spike_limit_mib: 800).

2. **batch:** This processor batches together traces before sending them on to the exporters, improving efficiency.

3. **servicegraph:** This processor is specifically designed for creating a service graph from the traces. It is configured with certain parameters for handling latency histogram buckets, dimensions, store configurations, and so on.
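A sketch of these three processors, using the limits quoted above; the `servicegraph` parameters shown are illustrative assumptions rather than the demo's actual values:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 800
  batch:
  servicegraph:
    metrics_exporter: prometheus/servicegraph          # assumed exporter name
    latency_histogram_buckets: [100ms, 250ms, 1s, 10s] # illustrative buckets
    store:                                             # illustrative store settings
      ttl: 2s
      max_items: 200
```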

**Exporters (otlp, logging, spanmetrics, otlp/2):**
After processing, the data is sent to the configured exporters; a wiring sketch follows the list:

1. **otlp:** This exporter sends data to the endpoint `jaeger:4317` over OTLP, with the TLS setting in insecure mode (no transport encryption).

2. **logging:** This exporter logs the traces.

3. **spanmetrics:** This exporter is defined as a connector in the configuration; it derives metrics from spans and feeds them into the metrics pipeline, where it appears as the `spanmetrics` receiver.

4. **otlp/2:** This exporter sends data to the endpoint `data-prepper:21890` over OTLP, with the TLS setting in insecure mode (no transport encryption).
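Putting the traces pipeline together, a sketch of the exporters and the service wiring (endpoints are taken from the descriptions above; the `spanmetrics` connector declaration and the exact processor order are assumptions):

```yaml
exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  otlp/2:
    endpoint: data-prepper:21890
    tls:
      insecure: true
  logging:

connectors:
  spanmetrics:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, servicegraph]
      exporters: [otlp, logging, spanmetrics, otlp/2]
```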

### Metrics
**Metrics Pipeline**

This pipeline handles metric data.

- **Receivers (otlp, spanmetrics):**

Metric data comes in from the `otlp` receiver and the `spanmetrics` receiver.
- **Processors (filter, memory_limiter, batch):**
The data is then processed:
1. **filter:** This processor excludes specific metrics. In this configuration, it is set to strictly exclude the `queueSize` metric.
2. **memory_limiter:** Similar to the traces pipeline, this processor ensures memory usage doesn't exceed a certain limit.
3. **batch:** This processor batches together metrics before sending them to the exporters, enhancing efficiency.

- **Exporters (prometheus, logging):**
The processed data is then exported (see the sketch below):
1. **prometheus:** This exporter sends metrics to the endpoint `otelcol:9464`. It also converts resource information to Prometheus labels and enables OpenMetrics.
2. **logging:** This exporter logs the metrics.
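A sketch of the metrics pipeline combining the filter rule and the exporter settings described above (field names follow the collector's `filter` and `prometheus` components; values are the ones quoted in this section):

```yaml
processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - queueSize

exporters:
  prometheus:
    endpoint: otelcol:9464
    resource_to_telemetry_conversion:
      enabled: true            # resource attributes become Prometheus labels
    enable_open_metrics: true

service:
  pipelines:
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [filter, memory_limiter, batch]
      exporters: [prometheus, logging]
```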

### Logs

**Logs Pipeline**

This pipeline handles log data.

- **Receiver (otlp):**

Log data comes in from the otlp receiver.
- **Processors (memory_limiter, batch):**
The data is then processed:
1. **memory_limiter:** Similar to the traces and metrics pipelines, this processor ensures memory usage doesn't exceed a certain limit.
2. **batch:** This processor batches together logs before sending them to the exporter, enhancing efficiency.

- **Exporter (logging):**

The processed data is then exported:
1. **logging:** This exporter logs the log records.
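A sketch of the logs pipeline wiring implied by this section:

```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
```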