Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PXP-7805 Push audit logs to an AWS SQS #923

Merged
merged 26 commits into from
Jun 11, 2021
Merged

Conversation

paulineribeyre
Copy link
Contributor

@paulineribeyre paulineribeyre commented May 26, 2021

Jira Ticket: PXP-7805

goes with uc-cdis/audit-service#2 and uc-cdis/cloud-automation#1603

New Features

  • Fence can be configured to push audit logs to an AWS SQS
  • Record all requests instead of only successful requests by creating audit logs right before returning the response

Improvements

  • Audit logs for login events now have the "code" and "state" query parameters redacted
  • Fence unit tests for login events now test all available IDPs
  • Change deprecated "logger.warn" to new "logger.warning"

Deployment changes

  • For existing deployments with "ENABLE_AUDIT_LOGS" turned ON, to DISABLE pushing audit logs to an AWS SQS, the configuration "PUSH_AUDIT_LOGS_CONFIG.type" must be changed to "api" (default: "aws_sqs"). If audit-service has never been deployed, no action required.
  • To ENABLE pushing audit logs to an AWS SQS (REQUIRES audit-service 1.0.0 or more recent), create/update the configuration file's "PUSH_AUDIT_LOGS_CONFIG" field ("type: aws_sqs" (default), and the SQS URL and region can be copied from the audit-service configuration - or run gen3 sqs info $(gen3 api safe-name audit-sqs)), see
    PUSH_AUDIT_LOGS_CONFIG:
    type: aws_sqs
    aws_sqs_config:
    sqs_url:
    region:
    . Then run kubectl delete secret fence-config and gen3 kube-setup-fence

@github-actions
Copy link

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

@coveralls
Copy link

coveralls commented May 26, 2021

Pull Request Test Coverage Report for Build 11210

  • 126 of 158 (79.75%) changed or added relevant lines in 12 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 71.124%

Changes Missing Coverage Covered Lines Changed/Added Lines %
fence/blueprints/login/base.py 3 4 75.0%
fence/models.py 0 1 0.0%
fence/resources/admin/admin_users.py 0 1 0.0%
fence/resources/openid/ras_oauth2.py 0 1 0.0%
fence/sync/sync_users.py 0 1 0.0%
fence/resources/audit/utils.py 45 48 93.75%
fence/blueprints/login/init.py 2 6 33.33%
fence/resources/audit/client.py 56 76 73.68%
Files with Coverage Reduction New Missed Lines %
fence/blueprints/data/indexd.py 1 74.25%
Totals Coverage Status
Change from base Build 11200: 0.4%
Covered Lines: 6133
Relevant Lines: 8623

💛 - Coveralls

# `region` are required. Fields `aws_access_key_id` and
# `aws_secret_access_key` are optional.
PUSH_AUDIT_LOGS_CONFIG:
type: aws_sqs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaulting to this means that existing deployments with ENABLE_AUDIT_LOGS on will now error until they flip this to api, right? We may want to note that in DEPLOYMENT CHANGES. I know you note something about enabling them, but may want to explicitly mentioned that even for deployments that DON'T want this, they have to update the cfg (unless you change this default)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

resource_paths = indexed_file.index_document.get("authz", [])
if not resource_paths:
# fall back on ACL
resource_paths = indexed_file.index_document.get("acl", [])
if not protocol and indexed_file.indexed_file_locations:
if not protocol and len(indexed_file.indexed_file_locations) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the truthy check above already handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which check? if the indexd record urls field is empty, we need to check this here - we record no protocol, and then the presigned URL generation function errors 'there are no locations'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I meant the previous code. like why are we switching from indexed_file.indexed_file_locations to len(indexed_file.indexed_file_locations) > 0. For situations where urls are empty the previous code should have been False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, yeah i'm not sure what happened there 😆

return request_url


@app.after_request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this called after every request? 😬 I'm a little nervous of the overhead of this. I like the idea of consolidating and making it cleaner, but could we isolate this cleaning logic to only functions where we know we're capturing audit logs?

Maybe make a custom decorator and only decorate functions we know we want to audit log? Also only do the decorator logic if audit logs are enabled.

I also want to make sure that if someone doesn't want ANY logging, it shouldn't impact performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i'll look into that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this conditional decorator based on if audit logs are enabled. And then this function basically becomes a decorator only used when it's enabled and only on functions where we know we want to audit log.

"""
Attempt to parse the request for token to authenticate the user. fallback to
populated information about an anonymous user.
By default, cast `sub` to str. Use `sub_type` to override this behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just always cast to string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too much into it, but authutils.current_token["sub"] is a string and flask.g.user.id is an int 🤔
I'm casting authutils.current_token["sub"] to int here because audit-service expects an int, to match the type of sub in the fence DB.

raise InternalError("Unable to create audit log")

def create_audit_log(self, category, data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have another function called create_audit_log in a different module. Probably fine but might be confusing. Pending if/how we do the decorator/performance improvements

@@ -479,6 +479,8 @@ AUDIT_SERVICE: 'http://audit-service'
ENABLE_AUDIT_LOGS:
presigned_url: false
login: false
PUSH_AUDIT_LOGS_CONFIG:
type: api
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also write some tests that monkey patch this and add others for aws_sqs and mock out the AWS call for better coverage of the new feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the current tests are testing "type: api", so the audit_service_client module without the SQS part. Tbh I wasn't sure how to test it since everything would be mocked. But i'll spend some more time on that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well just the boto call out to sqs would need to be mocked, you could check that it actually gets called with expected data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see you added some 🎊

Copy link
Contributor

@Avantol13 Avantol13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. tl;dr I think we should isolate this more to reduce potential impact to performance

@paulineribeyre paulineribeyre requested a review from Avantol13 May 27, 2021 20:26
)

def ping(self):
max_tries = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use backoff lib for this to remain consistent? we're using it elsewhere

@backoff.on_exception(backoff.expo, Exception, **DEFAULT_BACKOFF_SETTINGS)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can override the exception/handling of default if you need to

self.ping()
self.validate_config()
else:
logger.warn("NOT enabling audit logs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.warn is deprecated I believe. you should use .warning

f"Audit logs are enabled but audit-service is unreachable at {status_url}: {r.text}"
)

def validate_config(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hate me. I'm tempted to ask if we get some docstrings for all these public functions. Granted I think most are self explanatory. But for consistency, eh 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants