Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor audit trail code #516

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

kelvin-chappell
Copy link
Member

@kelvin-chappell kelvin-chappell commented Feb 26, 2025

This change optimises and simplifies some of the code that manages interactions with the audit trail DB table.

Changes

  1. Removal of redundant calls to describe table.
    The name of the table and its key schema are already hardcoded throughout the code so it seems redundant to occasionally look these details up as well.
  2. Removal of redundant table name parameter from various methods.
    As mentioned above, the table name is already hardcoded so need to pass it around as a parameter.
  3. Replaced an awkward tuple with a case class, as suggested in this comment.
  4. Replaced repeated use of string literal table attribute names with shared constants.
    To make it clear where the same field is being used, what it is and avoid typos etc.
  5. Some breakdown of functions to make them more readable.
  6. Separation of concerns between AuditTrail and AuditTrailDB now clearer as the former does all marshalling of data between database and internal formats while AuditTableDB makes the requests.

To test

These tests have been done locally and need to be repeated in production:

  • Go to audit table and find a known access record => should appear correctly and correspond with what's in Dynamo DB table
  • Sign into AWS console from Janus, go back to Janus and look at audit table => should see a new access record with fields as expected
  • Request credentials from Janus and then look at audit table => should see a new access record with fields as expected

@kelvin-chappell kelvin-chappell marked this pull request as ready for review February 26, 2025 11:37
@kelvin-chappell kelvin-chappell requested review from a team February 26, 2025 11:52
Copy link
Contributor

@adamnfish adamnfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, thank you for doing the thinking to simplify this part of the codebase 👍

I've added a few comments but nothing blocking, up to you if you're interested to tweak things a little more.

val accessTypeAttrName = "j_accessType"
val isExternalAttrName = "j_external"

private type DbAttr = (String, AttributeValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it simplify / clarify things to make this a case class instead of a type alias? Alternatively, is this bringing much value?

In the latter case, this type alias isn't used anywhere except right here (the next few lines), so maybe inling the real type would simplify the AuditLogDbEntryAttrs definition.

Considering the approach to instead use a case class, I notice that the tests for this code all look something like this: attrs.partitionKey._2.s(). Making this a case class with named fields would let us do something more descriptive than _2, something which I had to dig into while reviewing that part of this change.

I think either approahc to changing this would be fine. If you want to keep it as-is that's ok too, this isn't a big deal. It jumped out to me because I generally try to keep quite a high bar for things to qualify for type aliases. Type aliases bring the abstraction cost of having a new concept for a thing without the extra safety and ergonomics of a real typed wrapper with named fields.

@@ -108,11 +59,21 @@ object AuditTrailDB {
queryResult(dynamoDB, request)
}

private def attrCondition(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% convinced this helper is worthwhile but if it is perhaps we could call it something like attrEqualCondition, to better convey that this is hard-coded to do an EQ check?

There are only two uses of this helper, and its definition isn't loads shorter than its invocation. Would it be simpler to write the builder out each time, especially since this would be consistent with the "BETWEEN" condition used just below for date range?

If IntelliJ did that underlined "Duplicate code fragment" thing then no worries, that's a clear sign wiser minds than I say this is definitely worthwhile 😁

@@ -239,7 +238,7 @@ class Janus(
janusData.access
)
} yield {
AuditTrailDB.insert(table, auditLog)
AuditTrailDB.insert(auditLog)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this isn't part of your change, but while we're here anyway and tidying up the DB usage can we pull this up into the for comp steps? I don't like that we have an important side-effecting step in the yield block here (my apologies for the current state of things).

Something like _ = AuditTrailDB.insert(auditLog) as the last step would highlight that this is an important operation, separate from building our return value.

I'd do the same for the log line, but I don't feel as strongly we should do that now.

Equally, I'm very happy to tackle this along with a wider review of the codebase in a separate PR - I only mention it here because our diff has hit it.

dynamoDB.putItem(request)
}

private def toAttribValue(value: Any): AttributeValue = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delighted to see this go away, great change 👍

val secondaryIndexName = "AuditTrailByUser"

val partitionKeyName = "j_account"
val sortKeyName = "j_timestamp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places in the codebase after this change where we lose a bit of the connection to what's really going on, when we use this name.

From a DynamoDB implementation point of view, the DB's sort key is what we're using to implement date sorting/filtering. It makes sense to call this sortKeyName to highlight this important connection to the implementation detail. An example of this is https://github.com/guardian/janus-app/pull/516/files#diff-3fcccabe5ec8afa651c9696d905657fabb5b0ba56779b459c6150def64b600fbR76, makes great sense as-is.

However, it's also the audit log timestamp, and there are other use cases where it would sense for this name to be connected to our business domain. An example is the date range condition's implementation above. In this case it's probably more helpful to know that we're using the timestamp here.

I don't have an answer for this, I think we're being pulled in both directions! One observation is that we refer to both domains in this block, which I think helps with comprehension. The code right here tells us that this is the DB sort field and that it relates to the access time.

Maybe there's a name for this member that could convey that this is the timestamp and that it is the timestamp?

val tableName = "AuditTrail"
val secondaryIndexName = "AuditTrailByUser"

val partitionKeyName = "j_account"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This consolidation might also give us the opportunity to add a comment explaining the j_ prefix, which is to avoid clashing with DynamoDB reserved keywords. Up to you if you think this is relevant / useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants