
SQL Server implementation #5

Merged
merged 11 commits into from
Apr 26, 2024
Conversation

RobThree
Contributor

@RobThree RobThree commented Apr 22, 2024

This PR adds a Microsoft SQL Server implementation.

I have 'ported' the Postgres SQL statements to T-SQL as closely as I could. One thing I ran into is that the rowversion mechanism differs quite a bit between the two. As far as I can see, the Outbox doesn't seem to use the rowversion anywhere(?), and I tried to keep the ProcessManager as close to the Postgres version as possible. One of the issues I had is that a rowversion (or timestamp) in SQL Server is an 8-byte (64-bit) value, whereas VersionedState requires an int for the version, and since there's no (safe) way to convert a 64-bit value to a 32-bit value I opted to 'DIY' the rowversion; the MERGE statement should be atomic.
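As a sketch of what I mean by 'DIY' (the column and parameter names here are illustrative, not necessarily the ones in this PR): a single MERGE can check an expected version and bump an int version column atomically:

```sql
-- Hypothetical sketch of the 'DIY rowversion' idea; names are illustrative.
-- A single MERGE statement is atomic, so the version check, the state update,
-- and the version bump happen together. HOLDLOCK guards against the classic
-- MERGE upsert race under concurrent inserts.
MERGE INTO [processes] WITH (HOLDLOCK) AS target
USING (SELECT @process_type AS process_type, @process_id AS process_id) AS source
    ON target.process_type = source.process_type
   AND target.process_id = source.process_id
WHEN MATCHED AND target.[version] = @expected_version THEN
    UPDATE SET process_state    = @process_state,
               [version]        = target.[version] + 1,
               date_updated_utc = SYSUTCDATETIME()
WHEN NOT MATCHED THEN
    INSERT (process_type, process_id, process_state, [version], date_added_utc, date_updated_utc)
    VALUES (@process_type, @process_id, @process_state, 1, SYSUTCDATETIME(), SYSUTCDATETIME());
```

Checking @@ROWCOUNT afterwards tells the caller whether the expected version matched (optimistic concurrency), instead of relying on an 8-byte rowversion that doesn't fit the int-typed VersionedState.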

It's also not very clear to me what use the date_added_utc and date_updated_utc fields have, but maybe I'm missing some implementation detail of the Postgres version that I'm not aware of. I did keep (and populate) them in the SQL Server variant where applicable.

Also, there weren't (and aren't) any integration tests for the Outbox as far as I can see. Again, I tried to keep this as close as possible to the Postgres variant, but I haven't tested it (yet).

In the integration tests project I renamed the KafkaFlowFixture to a more specific PostgreKafkaFlowFixture, then copied it to a SqlServerKafkaFlowFixture and changed the copy to work with SQL Server. The same goes for the LifecycleProcessManagerTests, which I renamed to PostgresLifecycleProcessManagerTests and copied to SqlServerLifecycleProcessManagerTests for SQL Server. This may need some work to de-duplicate (DRY) the integration tests.

So, things that need attention:

  • Rowversion
  • Outbox isn't tested
  • Code duplication in the integration tests

Since SQL Server (or rather: the SqlClient) handles connection pooling internally, we don't need to set up a pool like in the Postgres variant. I did, however, have to create a small shared, common library called Contrib.KafkaFlow.SqlServer, which is used by both KafkaFlow.Outbox.SqlServer and KafkaFlow.ProcessManagers.SqlServer. This library only contains a few extension methods to configure SQL Server usage and a SqlServerOptions record, so we don't have to pass a "bare" connection string to the SqlServerOutboxBackend and SqlServerProcessManagersStore classes.

Other than that, I think this should at least be a good start for a SQL Server implementation.

⚠️ I couldn't get the initdb.sh script to trigger under the mssql container. I have a few solutions/workarounds but am still deciding which one to use. Your opinion / help would be very much appreciated. For now, you'll need to run the /docker-entrypoint-initdb.d/initdb.sh script manually before running the integration tests. Fixed in 7385102

Oh, one more thing: It isn't very clear to me how you'd use this library in an existing project. I assume the developer is expected to create the outbox and processes tables "manually"? I would expect the table(s) creation to be handled either on demand by invoking some method or automatically the first time the database is approached or something?

Finally: I have some other ideas / plans:

  • I'd like to get rid of a lot of compiler warnings (4 currently) and messages (27 currently) by 'modernizing' the code a little (using primary constructors, collection initializers, etc.)
    EDIT: Most of this is done (see this branch). I'll create a new PR for this once this one has been merged.
  • I'd like to try lowering the .NET requirement to, say, .NET 7, 6, or even better .NET Standard 2.0/2.1, without too much effort or breaking things, for broader platform support.
    EDIT: Done; I managed to lower the requirement to .NET Standard 2.0 (see this branch or this commit). This means that even .NET Framework developers can use this library. I'll create a new PR for this once this one and the 'modernize' one (above) have been merged.
  • I referred to code duplication in the integration tests earlier, but the actual Postgres and SQL Server implementations are also very close implementation-wise; I'd like to see if that could be relieved or solved by introducing some interfaces and making the code a little more generic.
    EDIT: Done. See this branch
  • Use a schema as per @kchenery remarks: Done, see this branch
  • This project should have a LICENSE. Not sure what you prefer, but KafkaFlow is MIT and I usually also use MIT; but that's of course 100% up to you.

@RobThree
Contributor Author

Sorry, just a tiny nudge to let you know I have two more PRs ready (see the checkboxes above) and have updated the PR description above. Take your time. Thanks!

@AlexeyRaga
Owner

Thanks @RobThree, awesome work! Really appreciate it!
I'll see if I can get someone with much better SQL Server knowledge than my own to look at the PR and maybe to make some suggestions around your questions regarding VersionedState.

I know that in PG xmin is 32 bits and I see the complication here...
One thing that we could do is to make VersionedState 64-bit, and unsafe-cast it to 32 bits for the PG implementation... Under the premise that in practice PG will never exceed 32 bits...

But perhaps your "custom" solution would be sufficient, too! :)

Thanks again, and I hope to get back to you soon!

@AlexeyRaga
Owner

As for "how to provision schemas": we typically do it from CI/CD pipelines. DbUp is one way of doing it, but, really, it depends.
What I did not want to do is to auto-provision these schemas from code.

Because giving DDL permissions to the service feels just wrong, and the migration story gets too complicated too quickly...

Of course, any suggestion to how to make it better is appreciated!

@RobThree
Contributor Author

RobThree commented Apr 24, 2024

What I did not want to do is to auto-provision these schemas from code.

Understand completely.

Of course, any suggestion to how to make it better is appreciated!

I'm fine with leaving it as-is, no problem. However, a compromise could be to have developers opt-in and call some InitDb() method for example which would create (or migrate) the required tables. But either way, I'm fine with whatever you prefer. Another suggestion would be to at least document the required tables and provide (links to) scripts in the documentation.

One thing that we could do is to make VersionedState 64-bit, and unsafe-cast it to 32 bits for the PG implementation... Under the premise that in practice PG will never exceed 32 bits...

I haven't thought this through (yet), nor looked into the feasibility, but maybe we can make it an abstract base class and derive a "PgVersionedState" and "SqlServerVersionedState" from them...
Nope, that won't work. And, like you, I'm not very comfortable unsafe-casting the version. I'll think about it some more.

Contributor

@kchenery kchenery left a comment

@RobThree I think it would be good to tidy up the schema files. We shouldn't create the DB here - that should be up to the test fixture (I recommend Testcontainers to simplify it too).

@AlexeyRaga I have a question about ordering for the outbox which might require a bit of a think/change for the rowversion column.

@@ -0,0 +1,5 @@
IF NOT EXISTS(SELECT * FROM sys.databases WHERE name = 'kafkaflowtest')
BEGIN
CREATE DATABASE [kafkaflowtest]
Contributor

@kchenery kchenery Apr 24, 2024

I think you should be creating the schema here, not the database. This might work for tests... but won't work for production environments. We can't assume the name of the DB there.

For tests - I'd be looking at using MSSQL Testcontainers. @AlexeyRaga would you consider that + DbUp, possibly embedded in the code startup (or similar)?

Back to here - I think this should be:

IF NOT EXISTS(SELECT * FROM sys.schemas WHERE name = 'outbox')
BEGIN
	CREATE SCHEMA [outbox]
END

There'll be similar changes below too.
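One wrinkle to watch for: in T-SQL, CREATE SCHEMA must be the first statement in its batch, so inside an IF it typically needs to be wrapped in dynamic SQL (a sketch):

```sql
-- CREATE SCHEMA must be the first statement in a batch, so wrap it
-- in EXEC to keep the existence check in the same script.
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'outbox')
BEGIN
    EXEC (N'CREATE SCHEMA [outbox]');
END
```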

Owner

I am happy to take the test containers route.
I have never done it myself though, so can't speak from experience :)

/* Table */
IF OBJECT_ID(N'[kafkaflowtest].[dbo].[outbox]', N'U') IS NULL
BEGIN
CREATE TABLE [kafkaflowtest].[dbo].[outbox](
Contributor

We shouldn't specify the DB here. And if we change to outbox as the schema (to keep in line with the PG implementation) that would change to [outbox].[outbox]

e.g:

IF OBJECT_ID(N'[outbox].[outbox]', N'U') IS NULL
BEGIN
    CREATE TABLE [outbox].[outbox](

/* Table */
IF OBJECT_ID(N'[kafkaflowtest].[dbo].[processes]', N'U') IS NULL
BEGIN
CREATE TABLE [kafkaflowtest].[dbo].[processes](
Contributor

Same comment as above with the outbox table regarding the database name and schema name.

@@ -0,0 +1,5 @@
IF NOT EXISTS(SELECT * FROM sys.databases WHERE name = 'kafkaflowtest')
BEGIN
CREATE DATABASE [kafkaflowtest]
Contributor

I think this should be setting the schema and not the DB name. As per my comment for the outbox table

@RobThree
Contributor Author

RobThree commented Apr 24, 2024

@kchenery As per your database/schema remarks: I agree, will fix that. I am currently working on separating the actual database implementation from the rest of the logic and I'm close to done with that.

As for the rowversion: while refactoring I found another issue with it: it is returned by Npgsql as uint but can't be set as uint? I may be mistaken though; looking into it as we speak.

I'll update here when I'm done with the code de-duplication / separating the RDBMS-specific stuff.

@RobThree
Contributor Author

RobThree commented Apr 24, 2024

Would anyone have an objection to changing the sequence_id from SERIAL to BIGSERIAL? As an alternative I can use int for SQL Server instead of bigint. But I think using a bigint / BIGSERIAL is better?
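For reference, the two options compared side by side (a sketch; the actual table definitions in this PR differ):

```sql
-- Postgres: BIGSERIAL is shorthand for a BIGINT column backed by a sequence.
CREATE TABLE outbox.outbox (
    sequence_id BIGSERIAL PRIMARY KEY
    -- ... remaining columns ...
);

-- SQL Server equivalent (sketch): BIGINT with an IDENTITY property.
CREATE TABLE [outbox].[outbox] (
    [sequence_id] BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
    -- ... remaining columns ...
);
```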

@RobThree
Contributor Author

RobThree commented Apr 24, 2024

I am currently working on separating the actual database implementation from the rest of the logic and I'm close to done with that.

Done.

@kchenery As per your database/schema remarks: I agree, will fix that.

Schema is fixed here, see 45abefa

So, summarizing: I updated the PR and have 4 branches ready to PR after this one (I could make it one huge PR, but I don't think anyone would enjoy that 😜 ). In order, they are: modernize (no functional changes), lower-framework-support (changes the required .NET version to .NET Standard 2.0), code-deduplication (refactors RDBMS-specific stuff into "repositories"; better name suggestions welcome) and mssql-use-schema (uses schemas in the integration tests for MSSQL).

What's left is:

  • Can we switch to BIGSERIAL (which will make things easier and level the playing field)? Alternatively I can switch to using integers for the SQL Server implementation
  • @kchenery's question about rowversion. I'm happy to work on that once we make a decision
  • See PR description: look into adding a LICENSE?

@AlexeyRaga
Owner

But I think using a bigint / BIGSERIAL is better?

I think it'd be a good change. AFAIK SERIAL is not reset when data is deleted from the table, so reaching the ~2 billion INT limit and then getting an overflow error seems possible with time.

I think we should do it (and write a migration script).

What do you think @kchenery ?

@kchenery
Contributor

kchenery commented Apr 25, 2024

DB change is trivial... it's basically just this migration script:

ALTER TABLE outbox.outbox ALTER COLUMN sequence_id TYPE BIGINT

I'm unsure how long that'll take on a large table... but people's outboxes should be mostly empty.

I'll submit a PR for that. I haven't looked at what code you'd have to change, and you'll want to do that first.
Update: See #7

@RobThree
Contributor Author

RobThree commented Apr 25, 2024

I have already changed the existing scripts and code. I propose we (work on) merging the branches so far and then see where we stand? (Because I'm kind of losing oversight otherwise 😓 ). It should be alright - as long as you agree with the other changes I made - and then we can add a migration and make final changes (if any) before a new release. Or am I getting ahead of myself?

@AlexeyRaga
Owner

I'm unsure how long that'll take on a large table... but people's outboxes should be mostly empty.

It should mostly be empty... But to my knowledge sequences don't get reset even if the rows are deleted from the database... Am I wrong?

Also, as @RobThree mentioned, shouldn't it be BIGSERIAL and not BIGINT?

@kchenery
Contributor

kchenery commented Apr 25, 2024

Deletes do not reset serial values. Truncates don't either (unlike SQL Server, which does reset the identity values).

And no, it should be BIGINT. I know because I had to test it 😄 When you do the change, GUIs will still show it as BIGSERIAL though. Guessing a bit: you're changing the underlying data type and not so much its behaviour.
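A quick way to confirm the underlying type after the change (a sketch, assuming the outbox.outbox table from this thread):

```sql
-- information_schema reports the underlying data type, not the SERIAL shorthand.
SELECT data_type
FROM information_schema.columns
WHERE table_schema = 'outbox'
  AND table_name   = 'outbox'
  AND column_name  = 'sequence_id';
-- expected: bigint (after the migration)
```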

@AlexeyRaga
Owner

OK, we've merged the BIGINT/BIGSERIAL thing in.

@RobThree I guess that we can merge this one too once Kent's suggestions about database/schema are addressed.

@RobThree
Contributor Author

Should be good. Though this still doesn't use a schema (that's in another, later branch); have you had a chance to look at the other changes I made?

@AlexeyRaga
Owner

Not really, I've only looked at what is in this PR...

Can we have the database/schema changes as part of this PR? Looks like you already have them, so it shouldn't be too much trouble?

@RobThree
Contributor Author

No, shouldn't be a problem, I just need to rebase some stuff. Let me see what I can do.

@RobThree
Contributor Author

Done. Should be good to go.

@AlexeyRaga AlexeyRaga merged commit f87a685 into AlexeyRaga:main Apr 26, 2024
@AlexeyRaga
Owner

Thanks @RobThree, AWESOME contribution! Really appreciated!

@AlexeyRaga
Owner

Unfortunately, integration tests are failing...

@AlexeyRaga
Owner

Fixed! The new set of packages is released!

@RobThree RobThree deleted the mssql branch April 26, 2024 08:03