Better logging #1922

Closed
10 tasks done
jemrobinson opened this issue May 30, 2024 · 15 comments

@jemrobinson
Member

jemrobinson commented May 30, 2024

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a request for a new feature in the Data Safe Haven or an upgrade to an existing feature.
  • The feature is still missing in the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

🍓 Suggested change

We should ensure that our logging is improved by:

  • logging more things
  • alerting when things are not as expected

🚂 How could this be done?

We could consolidate logs in a Log Analytics workspace and forward them to e.g. Graylog. Not sure of the best way to alert.

Task list

@jemrobinson jemrobinson added the enhancement New functionality that should be added to the Safe Haven label May 30, 2024
@jemrobinson jemrobinson added this to the Release 5.1.0 milestone May 30, 2024
@JimMadge
Member

JimMadge commented Aug 6, 2024

Related to #1931

@JimMadge
Member

JimMadge commented Nov 15, 2024

Azure monitor data source

Log sources are:

  • Azure resources (most Azure resources; I think this is basic data like SKU, state, ...)
  • Entra ID (audit data, authentication events etc.)
  • App insights (data from app services like webapps)
  • Virtual machines (e.g. syslog)
  • Kubernetes Clusters
  • Custom sources, using the REST API

I think our working assumption was that using the Azure logging resources would let us ingest logs from Azure resources easily, whereas using an alternative log manager would involve a lot of work to ingest logs.

The number of things which Azure Monitor can ingest natively is fairly limited, though, so there might not be such a big advantage to using it.

@craddm
Contributor

craddm commented Nov 15, 2024

Additional logs to investigate:

  • Network events
    • Firewall logs
    • NSG logs
  • Service logs
    • Containers
      • apricot
      • guac user sync
      • Nexus allowlist
      • Gitea
      • Hedgedoc
  • Event logs
    • Data ingress
    • Data egress
  • Authentication logs
    • From VM (PAM, LDAP)
    • Entra (would need to work cross tenant)

@craddm
Contributor

craddm commented Nov 15, 2024

Logging from container instances https://learn.microsoft.com/en-us/azure/container-instances/monitor-azure-container-instances

Log Analytics workspaces provide a centralized location for storing and querying log data not only from Azure resources, but also on-premises resources and resources in other clouds. Azure Container Instances includes built-in support for sending logs and event data to Azure Monitor logs.

To send container group log and event data to Azure Monitor logs, specify an existing Log Analytics workspace ID and workspace key when configuring a container group.

Seems like this must be configured when deploying a container group.

In pulumi-azure-native https://www.pulumi.com/registry/packages/azure-native/api-docs/containerinstance/containergroup/#loganalytics
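
As a rough sketch of what that configuration carries, the snippet below builds the `diagnostics` block in the shape the container group resource uses in ARM templates (`logAnalytics` with `workspaceId` and `workspaceKey`). The function name and the placeholder values are hypothetical, and the Pulumi resource would take the equivalent typed arguments rather than a raw dictionary.

```python
import json


def container_group_diagnostics(workspace_id: str, workspace_key: str) -> dict:
    """Build the diagnostics block for a container group (ARM property shape).

    Hypothetical helper for illustration only.
    """
    if not workspace_id or not workspace_key:
        # Both values have to be supplied when the container group is configured
        raise ValueError("workspace ID and workspace key are both required")
    return {
        "logAnalytics": {
            "workspaceId": workspace_id,
            "workspaceKey": workspace_key,
        }
    }


if __name__ == "__main__":
    # Placeholder values, not real credentials
    block = container_group_diagnostics(
        "00000000-0000-0000-0000-000000000000", "<workspace-key>"
    )
    print(json.dumps({"diagnostics": block}, indent=2))
```

Since the workspace key is a secret, it would presumably need to be handled as one in the deployment rather than written into the program.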

@craddm
Contributor

craddm commented Nov 15, 2024

  • Can't enable Log Analytics/Diagnostic settings directly on the Entra tenants we use, since these features require a subscription on the tenant
  • Simplest solution is probably just to direct people to the right place, rather than try to find a way to ingest these logs into a central Log Analytics workspace
  • Logs available on Entra
    • User sign-in
    • Audit
    • Provisioning

@JimMadge
Member

firewall monitoring

@craddm
Contributor

craddm commented Nov 18, 2024

One complexity is that network flow logs require a Network Watcher.
There is only one Network Watcher per region per subscription, and by default one is created automatically when any virtual network is created within the subscription.

https://learn.microsoft.com/en-us/azure/network-watcher/network-watcher-overview

This means we need to import the existing Network Watcher resource rather than create and manage it ourselves (and assume, if there is not one already, that it'll be created during our deployment). This may be an issue if users have already created their own Network Watcher with a non-default name, etc.
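
The "import if it exists, otherwise create" decision could be sketched roughly as below. This is only the decision logic, not Pulumi code: `WatcherPlan` and `plan_network_watcher` are hypothetical names, the lookup of existing watchers is assumed to have been done elsewhere, and the `NetworkWatcher_<region>` / `NetworkWatcherRG` defaults are what I understand Azure to use when it creates one automatically.

```python
from typing import NamedTuple


class WatcherPlan(NamedTuple):
    action: str  # "import" or "create"
    name: str
    resource_group: str


def plan_network_watcher(
    existing: dict[str, tuple[str, str]], region: str
) -> WatcherPlan:
    """Decide whether to import or create the Network Watcher for a region.

    `existing` maps region -> (watcher name, resource group), as found in the
    subscription by some earlier lookup (not shown here).
    """
    # Normalise region names so "UKSouth" and "uksouth" are treated alike
    known = {key.lower(): value for key, value in existing.items()}
    key = region.lower()
    if key in known:
        # A watcher already exists, possibly with a non-default name
        name, resource_group = known[key]
        return WatcherPlan("import", name, resource_group)
    # Assumed Azure defaults for an automatically created watcher
    return WatcherPlan("create", f"NetworkWatcher_{key}", "NetworkWatcherRG")


if __name__ == "__main__":
    print(plan_network_watcher({"uksouth": ("my-watcher", "rg-network")}, "UKSouth"))
    print(plan_network_watcher({}, "uksouth"))
```

Importing by whatever name is found, rather than assuming the default, is what would cover the non-default-name case.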

@jemrobinson
Member Author

It's fine to postpone network flow logs for a future release if it's not a quick fix. Better to have some things working (and know which ones need to be added later) than have nothing ready until it all is.

@JimMadge JimMadge removed the status in Data Safe Haven Nov 21, 2024
@JimMadge JimMadge removed the status in Data Safe Haven Nov 25, 2024
@JimMadge JimMadge mentioned this issue Nov 25, 2024
3 tasks
@JimMadge
Member

Looks like there isn't anything useful we can get out of the virtual networks/NSGs.

Metrics are not supported by log analytics, and @craddm noted an issue with the flow logs (not actually sure what those are).

@jemrobinson
Member Author

Looks like virtual network flow logs are the new thing (NSG flow logs deprecated from 2027). Not sure if they are any easier to use though.

@JimMadge
Member

It seems like most 'Azure native' resources emit logs and metrics, which you can collect in a Log Analytics workspace using a Diagnostic setting (in Pulumi).
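
For reference, a minimal sketch of the properties such a Diagnostic setting carries, in the ARM shape (`workspaceId`, `logs`, `metrics`). The function name and the placeholder workspace ID are hypothetical, and not every resource type supports the `allLogs` category group or the `AllMetrics` category, so the available categories would need checking per resource.

```python
import json


def diagnostic_setting_properties(
    workspace_resource_id: str, include_metrics: bool = True
) -> dict:
    """Build the properties of a Diagnostic setting (ARM property shape).

    Hypothetical helper for illustration only.
    """
    properties = {
        # Full resource ID of the target Log Analytics workspace
        "workspaceId": workspace_resource_id,
        # Assumes the resource type supports the "allLogs" category group
        "logs": [{"categoryGroup": "allLogs", "enabled": True}],
    }
    if include_metrics:
        # Assumes the resource type exposes an "AllMetrics" category
        properties["metrics"] = [{"category": "AllMetrics", "enabled": True}]
    return properties


if __name__ == "__main__":
    workspace = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(diagnostic_setting_properties(workspace), indent=2))
```

The `include_metrics` switch is there for resources where only logs can usefully be sent to the workspace.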

This was referenced Nov 26, 2024
@craddm
Contributor

craddm commented Nov 26, 2024

> Looks like virtual network flow logs are the new thing (NSG flow logs deprecated from 2027). Not sure if they are any easier to use though.

Also, we won't be able to create new NSG flow logs after June 30, 2025.

I assume the Pulumi flow logs are virtual network flow logs rather than NSG flow logs, but it's not explicitly stated: https://www.pulumi.com/registry/packages/azure-native/api-docs/network/flowlog/

@craddm
Contributor

craddm commented Nov 26, 2024

> Looks like there isn't anything useful we can get out of the virtual networks/NSGs.
>
> Metrics are not supported by log analytics, and @craddm noted an issue with the flow logs (not actually sure what those are).

flow logs should keep track of things like network traffic (source and destinations) and how the NSG rules are being applied

@JimMadge
Member

JimMadge commented Nov 26, 2024

> flow logs should keep track of things like network traffic (source and destinations) and how the NSG rules are being applied

I think that is what we would want. But it sounds difficult because it has to use the subscription wide network watcher?

If so, I think it would be OK to leave it for now.

@craddm
Contributor

craddm commented Nov 26, 2024

> > flow logs should keep track of things like network traffic (source and destinations) and how the NSG rules are being applied
>
> I think that is what we would want. But it sounds difficult because it has to use the subscription wide network watcher?
>
> If so, I think it would be OK to leave it for now.

Yes, I'm not sure if there's a way to get Pulumi to import it if it exists or create it if it doesn't - I think it's a problem like this Pulumi issue

@craddm craddm mentioned this issue Nov 26, 2024
3 tasks
This was referenced Nov 28, 2024
Projects
Status: Done

3 participants