Refactor audit trail code #516

kelvin-chappell · 2025-02-26T08:45:39Z

This change optimises and simplifies some of the code that manages interactions with the audit trail DB table.

Changes

Removal of redundant calls to describe table.
The name of the table and its key schema are already hardcoded throughout the code so it seems redundant to occasionally look these details up as well.
Removal of redundant table name parameter from various methods.
As mentioned above, the table name is already hardcoded so need to pass it around as a parameter.
Replaced an awkward tuple with a case class, as suggested in this comment.
Replaced repeated use of string literal table attribute names with shared constants.
To make it clear where the same field is being used, what it is and avoid typos etc.
Some breakdown of functions to make them more readable.
Separation of concerns between AuditTrail and AuditTrailDB now clearer as the former does all marshalling of data between database and internal formats while AuditTableDB makes the requests.

To test

These tests have been done locally and need to be repeated in production:

Go to audit table and find a known access record => should appear correctly and correspond with what's in Dynamo DB table
Sign into AWS console from Janus, go back to Janus and look at audit table => should see a new access record with fields as expected
Request credentials from Janus and then look at audit table => should see a new access record with fields as expected

adamnfish

Great stuff, thank you for doing the thinking to simplify this part of the codebase 👍

I've added a few comments but nothing blocking, up to you if you're interested to tweak things a little more.

adamnfish · 2025-02-26T16:09:36Z

app/logic/AuditTrail.scala

+  val accessTypeAttrName = "j_accessType"
+  val isExternalAttrName = "j_external"
+
+  private type DbAttr = (String, AttributeValue)


Would it simplify / clarify things to make this a case class instead of a type alias? Alternatively, is this bringing much value?

In the latter case, this type alias isn't used anywhere except right here (the next few lines), so maybe inling the real type would simplify the AuditLogDbEntryAttrs definition.

Considering the approach to instead use a case class, I notice that the tests for this code all look something like this: attrs.partitionKey._2.s(). Making this a case class with named fields would let us do something more descriptive than _2, something which I had to dig into while reviewing that part of this change.

I think either approahc to changing this would be fine. If you want to keep it as-is that's ok too, this isn't a big deal. It jumped out to me because I generally try to keep quite a high bar for things to qualify for type aliases. Type aliases bring the abstraction cost of having a new concept for a thing without the extra safety and ergonomics of a real typed wrapper with named fields.

adamnfish · 2025-02-26T16:13:04Z

app/aws/AuditTrailDB.scala

@@ -108,11 +59,21 @@ object AuditTrailDB {
    queryResult(dynamoDB, request)
  }

+  private def attrCondition(


I'm not 100% convinced this helper is worthwhile but if it is perhaps we could call it something like attrEqualCondition, to better convey that this is hard-coded to do an EQ check?

There are only two uses of this helper, and its definition isn't loads shorter than its invocation. Would it be simpler to write the builder out each time, especially since this would be consistent with the "BETWEEN" condition used just below for date range?

If IntelliJ did that underlined "Duplicate code fragment" thing then no worries, that's a clear sign wiser minds than I say this is definitely worthwhile 😁

adamnfish · 2025-02-26T16:23:54Z

app/controllers/Janus.scala

@@ -239,7 +238,7 @@ class Janus(
        janusData.access
      )
    } yield {
-      AuditTrailDB.insert(table, auditLog)
+      AuditTrailDB.insert(auditLog)


I realise this isn't part of your change, but while we're here anyway and tidying up the DB usage can we pull this up into the for comp steps? I don't like that we have an important side-effecting step in the yield block here (my apologies for the current state of things).

Something like _ = AuditTrailDB.insert(auditLog) as the last step would highlight that this is an important operation, separate from building our return value.

I'd do the same for the log line, but I don't feel as strongly we should do that now.

Equally, I'm very happy to tackle this along with a wider review of the codebase in a separate PR - I only mention it here because our diff has hit it.

adamnfish · 2025-02-26T16:29:56Z

app/aws/AuditTrailDB.scala

    dynamoDB.putItem(request)
  }

-  private def toAttribValue(value: Any): AttributeValue = {


Delighted to see this go away, great change 👍

adamnfish · 2025-02-26T16:39:46Z

app/logic/AuditTrail.scala

+  val secondaryIndexName = "AuditTrailByUser"
+
+  val partitionKeyName = "j_account"
+  val sortKeyName = "j_timestamp"


There are a few places in the codebase after this change where we lose a bit of the connection to what's really going on, when we use this name.

From a DynamoDB implementation point of view, the DB's sort key is what we're using to implement date sorting/filtering. It makes sense to call this sortKeyName to highlight this important connection to the implementation detail. An example of this is https://github.com/guardian/janus-app/pull/516/files#diff-3fcccabe5ec8afa651c9696d905657fabb5b0ba56779b459c6150def64b600fbR76, makes great sense as-is.

However, it's also the audit log timestamp, and there are other use cases where it would sense for this name to be connected to our business domain. An example is the date range condition's implementation above. In this case it's probably more helpful to know that we're using the timestamp here.

I don't have an answer for this, I think we're being pulled in both directions! One observation is that we refer to both domains in this block, which I think helps with comprehension. The code right here tells us that this is the DB sort field and that it relates to the access time.

Maybe there's a name for this member that could convey that this is the timestamp and that it is the timestamp?

adamnfish · 2025-02-26T16:42:21Z

app/logic/AuditTrail.scala

+  val tableName = "AuditTrail"
+  val secondaryIndexName = "AuditTrailByUser"
+
+  val partitionKeyName = "j_account"


This consolidation might also give us the opportunity to add a comment explaining the j_ prefix, which is to avoid clashing with DynamoDB reserved keywords. Up to you if you think this is relevant / useful?

Refactor audit trail code

0f6883d

kelvin-chappell marked this pull request as ready for review February 26, 2025 11:37

kelvin-chappell requested review from a team February 26, 2025 11:52

adamnfish approved these changes Feb 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor audit trail code #516

Refactor audit trail code #516

kelvin-chappell commented Feb 26, 2025 •

edited

Loading

adamnfish left a comment

adamnfish Feb 26, 2025

adamnfish Feb 26, 2025

adamnfish Feb 26, 2025

adamnfish Feb 26, 2025

adamnfish Feb 26, 2025

adamnfish Feb 26, 2025

Refactor audit trail code #516

Are you sure you want to change the base?

Refactor audit trail code #516

Conversation

kelvin-chappell commented Feb 26, 2025 • edited Loading

Changes

To test

adamnfish left a comment

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

adamnfish Feb 26, 2025

Choose a reason for hiding this comment

kelvin-chappell commented Feb 26, 2025 •

edited

Loading