Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: spans are unconditionally set to a status of OK even after setting to ERROR after invoking with trace_call and that fails due to unconditional overriding because OpenTelemetry-Python can change the status after ERROR to OK #1246

Closed
odeke-em opened this issue Dec 3, 2024 · 1 comment · Fixed by #1248 · May be fixed by #1258
Assignees
Labels
api: spanner Issues related to the googleapis/python-spanner API.

Comments

@odeke-em
Copy link
Contributor

odeke-em commented Dec 3, 2024

Noticed while writing extensive tests for modernized tracing functionality that we've got this code

with tracer.start_as_current_span(
name, kind=trace.SpanKind.CLIENT, attributes=attributes
) as span:
try:
yield span
except Exception as error:
span.set_status(Status(StatusCode.ERROR, str(error)))
span.record_exception(error)
raise
else:
span.set_status(Status(StatusCode.OK))

which doesn't do exactly what you expect with OpenTelemetry

Once yield span is invoked, if no exception was encountered, we unconditionally set the span's value to OK even if it previously were set to ERROR and here is a test

Reproduction

from tests import _helpers as ot_helpers
        
@pytest.mark.skipif(
    not ot_helpers.HAS_OPENTELEMETRY_INSTALLED,
    reason="Tracing requires OpenTelemetry",
)               
def test_trace_call_status():
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor
    from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
        InMemorySpanExporter,
    )
    from google.cloud.spanner_v1._opentelemetry_tracing import trace_call
    from opentelemetry.trace.status import Status, StatusCode
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.sampling import ALWAYS_ON
    from opentelemetry import trace
        
    tracer_provider = TracerProvider(sampler=ALWAYS_ON)
    trace_exporter = InMemorySpanExporter()
    tracer_provider.add_span_processor(SimpleSpanProcessor(trace_exporter))
    observability_options = dict(tracer_provider=tracer_provider)
        
    with trace_call(
        "VerifyBehavior", observability_options=observability_options
    ) as span:
        span.set_status(Status(StatusCode.ERROR, "Our error exhibit"))
        
    span_list = trace_exporter.get_finished_spans()
    got_statuses = []
        
    for span in span_list:
        got_statuses.append(
            (span.name, span.status.status_code, span.status.description)
        )
        
    want_statuses = [
        ("VerifyBehavior", StatusCode.ERROR, "Our error exhibit"),
    ]
    assert got_statuses == want_statuses

which would fail with

$ nox -s system-3.8 -- -k test_trace_call_status
>       assert got_statuses == want_statuses
E       AssertionError: assert [('VerifyBeha...OK: 1>, None)] == [('VerifyBeha...ror exhibit')]
E         
E         At index 0 diff: ('VerifyBehavior', <StatusCode.OK: 1>, None) != ('VerifyBehavior', <StatusCode.ERROR: 2>, 'Our error exhibit')
E         Use -v to get more diff

tests/system/test_observability_options.py:180: AssertionError

Fix

diff --git a/google/cloud/spanner_v1/_opentelemetry_tracing.py b/google/cloud/spanner_v1/_opentelemetry_tracing.py
index 9472251..65b3441 100644
--- a/google/cloud/spanner_v1/_opentelemetry_tracing.py
+++ b/google/cloud/spanner_v1/_opentelemetry_tracing.py
@@ -142,6 +142,9 @@ def trace_call_end_lazily(
     ctx_manager.__enter__()
 
     def discard(exc_type=None, exc_value=None, exc_traceback=None):
+        if not exc_type:
+            span.set_status(Status(StatusCode.OK))
+
         ctx_manager.__exit__(exc_type, exc_value, exc_traceback)
 
     return discard
@@ -175,8 +178,12 @@ def trace_call(name, session=None, extra_attributes=None, observability_options=
             span.record_exception(error)
             raise
         else:
-            span.set_status(Status(StatusCode.OK))
-
+            if span._status.status_code == StatusCode.UNSET:
+                # OpenTelemetry-Python only allows a status change
+                # if the current code is UNSET or ERROR. At the end
+                # of the generator's consumption, only set it to OK
+                # it wasn't previously set otherwise
+                span.set_status(Status(StatusCode.OK))

/cc @harshachinta

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner API. label Dec 3, 2024
@odeke-em
Copy link
Contributor Author

odeke-em commented Dec 3, 2024

I have a fix for it.

odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 3, 2024
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 3, 2024
In modernized OpenTelemetry-Python, if the SpanStatus
was not already set to OK, it can be changed and
the code for trace_call was accidentally unconditionally
setting the status to OK if there was no exception.
This change fixes that and adds tests to lock this behavior in.

Fixes googleapis#1246
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 3, 2024
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 4, 2024
This change adds more descriptive and value adding spans to
replace the generic CloudSpanner.ReadWriteTransaction.
With this change, we add new spans:
* CloudSpanner.Database.run_in_transaction
* CloudSpanner.execute_pdml
* CloudSpanner.execute_sql
* CloudSpanner.execute_update

Reduce edit changes

Propagate db_name when session is None

Update system tests + test_session

Database.execute_pdml: add events for before and after BeginTransaction

Database.run_in_transaction: add span_events for using Transaction and aborted

fix tests

Trace (FixedSizePool, PingingPool).BatchCreateSessions in bind

Add another form of tracing with explicit .close() invocation

General touch-ups for every better fine-grained spans and events

Use add_event_on_current_span helper more

Make updates and fix up tests

Wring up passthrough context manager

Address updates from running transaction aborting

All add much deeper tests to check for span statuses plus retries+abort

Also while here, fixed googleapis#1246

More upates for test span results

Updates from self review
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 4, 2024
In modernized OpenTelemetry-Python, if the SpanStatus
was not already set to OK, it can be changed and
the code for trace_call was accidentally unconditionally
setting the status to OK if there was no exception.
This change fixes that and adds tests to lock this behavior in.

Fixes googleapis#1246
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 5, 2024
In modernized OpenTelemetry-Python, if the SpanStatus
was not already set to OK, it can be changed and
the code for trace_call was accidentally unconditionally
setting the status to OK if there was no exception.
This change fixes that and adds tests to lock this behavior in.

Fixes googleapis#1246
odeke-em added a commit to odeke-em/python-spanner that referenced this issue Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API.
Projects
None yet
2 participants