Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Microsoft.Azure.EventHubs.Processor with Azure.Messaging.EventHubs.Processor #310

Closed
cgillum opened this issue Sep 28, 2023 · 5 comments · Fixed by #385
Closed
Labels
P1 Priority 1

Comments

@cgillum
Copy link
Member

cgillum commented Sep 28, 2023

This is required for internal Microsoft component governance (internal Microsoft link).

The problem is that Microsoft.Azure.EventHubs.Processor (which has been deprecated) has a dependency on old ADAL libraries, and there is a high-level mandate to get rid of all ADAL usage across the company for security and compliance reasons.

The compliant version of the event hub processor can be found here: https://www.nuget.org/packages/Azure.Messaging.EventHubs.Processor/

Importantly, the scale controller will also need to be updated with a new build that uses the updated event hubs processor. Alternatively, if the scale controller doesn't require the Event Hubs processor (I assume it doesn't), we can create a version of the Netherite client library that doesn't include this dependency.

@sebastianburckhardt
Copy link
Member

FYI, some Netherite users have also asked about this since this deprecation is causing them similar issues.

We will need to provide some guidance/timeline as to when we expect to be able to remove this dependency.

@NickCraver
Copy link
Member

NickCraver commented Sep 28, 2023

Adding info here: I tried to do a spike migration to see how much effort would be involved here. Here's a quick link to the guide as a ref as well: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/MigrationGuide.md

Overall:

  • The way clients work has fundamentally changed - they aren't per-partition anymore but are send/receive with control of partition within as well as some for processing and such (our main case, I think). This affects everything from connection strings down.
    • Subset: Partition shutdowns are different so we'd have to handle shutdown differently too - this may get into lifetime mechanics and become very fun
  • Event processors are gone in Azure.Messaging.EventHubs (things are not event based as hookups instead of implementing IEventProcessor
  • Offsets/SequenceNumber/etc. have moved but are still there. In general though I think the iteration mechanics are meant to be IEnumerable/handled for you in most cases where a lot of it's manual in code today. This should be net cleaner, but isn't a quick port
  • PartitionEvents are not structs, not subclassed
  • General close mechanics differ overall (unsure of the extent, but reasons/handling differ for sure)
  • Besides those, lots of misc metadata changes.

My read is a split of functionality between 2 libs is likely the most practical option here, e.g. DurableTask.Netherite.EventHubs or some such given needs and timelines.

I'm not Event Hubs expert by any means, but happy to help on package/dependencies and such if able, please ping away. This will help a lot of other teams get off those older dependencies getting flagged.

@NickCraver
Copy link
Member

I figured the easiest way to figure what blockers if any we have would be to migrate this real quick into a split project to see what that looks like. I did that this morning here: #311.

There are 2 reference points we could do a few different things to avoid, but I'm not sure how they're used - need some team expertise when y'all have time for eyes and thoughts please.

@bachuv bachuv added P1 Priority 1 and removed Needs: Triage 🔍 labels Oct 3, 2023
@davidmrdavid
Copy link
Member

davidmrdavid commented Oct 10, 2023

This will probably be useful material for the migration: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/eventhub/Azure.Messaging.EventHubs/MigrationGuide.md

Update - (nvm, I see nick had already posted this :-) )

@sebastianburckhardt
Copy link
Member

I took a closer look at what it takes to migrate. From what I can see, there are two kinds of issues:

  1. significant superficial changes. This includes renaming of many classes, and differences in how events are delivered.
  2. differences in supported functionality. Some previously supported features of the event processors were entirely removed, such as the availability to customize the checkpointing process, or the availability to receive batches of events.

The superficial changes 1) mostly just require diligent reading of the docs and figuring out how to adapt things. It is more complicated than simple renaming, but not terrible. With the removed features (2) it is more difficult to figure out the impact and exactly how much work is required.

Currently I think processing events without batching is not ideal but acceptable because I recently added a blob-batching optimization that should balance this out somewhat.

Not sure yet about the consequences of the lack of checkpoint customization. Since we can no longer control where those checkpoints are stored, we can no longer have the clean storage organization we had before, which creates issues when users create and delete task hubs (previously we had all persistent data for a task hub stored in one place so it was easy to delete). It also creates a problem of migrating checkpoints, which EH does not do (it just gives us sample code on how to do it which is of course quite a bit of work for since that migration would have to happen automatically for our customers).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 Priority 1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants