Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured logging for DSR requests #4594

Merged
merged 12 commits into from
Feb 14, 2024

Conversation

galvana
Copy link
Contributor

@galvana galvana commented Feb 6, 2024

Closes PROD-1602

Description Of Changes

Uses a combination of a Contextualizable mixin and a log_context decorator. The decorator iterates through the arguments of the decorated function and calls get_log_context on every argument that implements Contextualizable. The log_context decorator accepts optional keyword arguments to add additional context outside of the contextualizable arguments.

Adding four user-friendly error groups for SaaS connector logs.

  • Authentication error
  • Client-side error
  • Server-side error
  • Network/protocol

All HTTP request logs for SaaS connectors will also include the following metadata

  • system_key
  • connection_key
  • action_type
  • collection_name
  • privacy_request_id
  • method
  • url
  • body (if dev mode is enabled)
  • response (if dev mode is enabled)

This logging will apply to access, erasure, and consent requests. We can expand this contextualized logging to database connectors or other parts of the app after we receive initial feedback.

Code Changes

  • A new logger_context_utils.py with a Contextualizable base class and log_context decorator
  • Addition of logger.contextualize in key parts of the access, erasure, and consent request functions

Steps to Confirm

  • Run the test_privacy_request_logging.py test or any of the SaaS connector tests

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Issue Requirements are Met
  • Update CHANGELOG.md

Copy link

vercel bot commented Feb 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Feb 14, 2024 7:33pm

Copy link

cypress bot commented Feb 6, 2024

Passing run #6270 ↗︎

0 4 0 0 Flakiness 0
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Details:

Merge 4a67a4f into cae578a...
Project: fides Commit: 6ed97e5b79 ℹ️
Status: Passed Duration: 00:35 💡
Started: Feb 13, 2024 9:17 PM Ended: Feb 13, 2024 9:18 PM

Review all test suite changes for PR #4594 ↗︎

Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (cae578a) 86.82% compared to head (4a67a4f) 86.83%.

Files Patch % Lines
src/fides/api/util/logger_context_utils.py 86.48% 5 Missing and 5 partials ⚠️
...pi/service/connectors/saas/authenticated_client.py 76.92% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4594   +/-   ##
=======================================
  Coverage   86.82%   86.83%           
=======================================
  Files         331      332    +1     
  Lines       19885    19975   +90     
  Branches     2553     2569   +16     
=======================================
+ Hits        17265    17345   +80     
- Misses       2152     2157    +5     
- Partials      468      473    +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@galvana galvana requested a review from adamsachs February 6, 2024 19:36
Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galvana i haven't done UAT testing yet since this is still in draft, but i think the approach here looks great! a couple minor comments for now but, assuming you've confirmed this is giving us good output that's in line with the expectations in the ticket, i think this is a good path to continue down and i'd be happy to do some proper UAT testing if/when you feel that's ready.

have we confirmed SaaS request execution logging as a sufficient scope for this increment? if so, then this looks pretty much feature-complete, i think. that being said, my comments are mainly around future-proofing and how to ensure consistency as we'll look to build on this. it may not be time to act on those concerns quite yet, but at least wanted to share some of my thoughts as i was going through!

"""Log SaaS request and response in dev mode only"""
if CONFIG.dev_mode:
logger.info(
"\n\n-----------SAAS REQUEST-----------"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is nice to get rid of :)

def exception_details(exception: Exception, url: str) -> Dict[str, Any]:
"""Maps select connection exceptions to user-friendly error details."""

details = {"error_group": ErrorGroup.network_error.value}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this confuses me a bit, when considered alongside the function signature. the function seems to apply to any Exception generally, but here we effectively assume that it's a network_error. that may be the only context in which this helper function is called currently, but seems like something that one could easily be tripped up on when trying to extend these utilities to other use cases.

so, either i'm misunderstanding something, or i'd recommend adjusting the function name or logic (or at least docstring)

if connector.current_collection_name:
details["collection"] = connector.current_collection_name
if connector.current_privacy_request:
details["privacy_request_id"] = connector.current_privacy_request.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looking ahead to a broader logging capability, can you think of any (simple) ways to make more generic/extensible the pieces that add the generic privacy request details to the log context/dictionary?

i.e., we'll probably want the privacy_request_id put on the logger context even when we don't have a SaaSconnector available to us (on non-SaaS DSRs). it'd be nice to ensure consistency/DRY code for adding that piece of context in all use cases.

all that being said, that may be an over-abstraction for now, if we're just looking to get the SaaS connector support, so totally fine deferring on this till later. just wanted to mention and keep eyes on what's ahead!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one very rudimentary thing i can think of, at least for now, is just defining the privacy_request_id key as a module-level constant rather than a string defined inline. that's at least one measure we can take to help ensure consistency across contexts. we also could do the same for other keys that we anticipate will be reused across different context. 👍

the reason i key (pun intended) in on this is because consistency across logs is super important for being able to sift through the noise! so the closer we can get to a formal "schema" for the context we're adding (across the entire app), the better!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to pull these utils out into their own module 👍

could we go a step further and abstract out the (small amount of) boiler plate that's still required to get the dict that's output by these functions into the logger context? maybe not, but something to consider. basically - how can we make it as simple as possible for someone who wants to come in and add a similar type of logging context from a totally different point in the app, with as consistent a schema as possible to what we've started here. having the functions defined in a single place here helps with that goal, for sure - but i wonder if there's anything else we can do to help guide consistency.

take that with a grain of salt though - i'm aware that we can't over-optimize, so it may be nothing to act on for now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an alternate approach #4609, let me know your thoughts!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooooh, this looks really nice!

@galvana galvana marked this pull request as ready for review February 9, 2024 23:54
@galvana galvana requested a review from adamsachs February 10, 2024 00:01
Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest changes look good to me, and i really like the way the changes in #4609 are looking - that approach feels very extensible while remaining relatively simple!

i think you'll wanna merge that into this PR and finalize things, right? so i'll hold off on a final approval until then, just to see how things shape up here. but everything else looks good! nice work on all of this 👍

Comment on lines 20 to 24
class LoggerContextKeys(Enum):
privacy_request_id = "privacy_request_id"
error_group = "error_group"
error_details = "error_details"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to declare this as an enum to ensure consistency! 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooooh, this looks really nice!

@galvana galvana requested a review from adamsachs February 13, 2024 21:06
Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galvana awesome work here, i really like how this has turned out! a really elegant paradigm you've set up.

i've got one nit that you could look address, but probably not necessary. i haven't done any testing yet, but i'll approve preemptively anyway, and i'll see if i can follow up with some manual testing.

@wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
context = dict(additional_context)
for arg in args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may we also want to look at the kwargs here for Contextualizable instances? could just make the decorator a bit more robust i think? admittedly i'm a bit fuzzy on how args/kwargs come through to the decorator wrapper here so i could be wrong about this! in any case, not a blocker, but may be something quick to make this a bit more defensive, if my understanding is correct!

@@ -161,6 +162,9 @@ def debug(self) -> Dict[str, Any]:
"to": {k: set(v) for k, v in to.items()},
}

def get_log_context(self) -> Dict[LoggerContextKeys, Any]:
return {LoggerContextKeys.collection: self.node.collection.name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, i think this ends up being quite elegant since these additional pieces of context are composed when passed to a function that's decorated. i really like it! 👏

status_code=response.status_code, errors_to_ignore=ignore_errors
)
context_logger = logger.bind(
**request_details(prepared_request, response, ignore_error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point, it'd be nice if we could figure out a not-too-crazy way to add/override third party classes (e.g. PreparedRequest and Response) and "annotate" them with get_log_context methods, i.e. make them extend Contextualizable... and then some of this could be further abstracted out into those methods, rather than the standalone utility functions, which stick out a bit now.

anyway, not something to worry about in this initial iteration, just thought i'd note it!

@galvana galvana merged commit 1f807d5 into main Feb 14, 2024
14 checks passed
@galvana galvana deleted the PROD-1602-server-logging-searchability branch February 14, 2024 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants