Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MoveTables: adjust datetimes when importing from non-UTC sources into UTC targets #10102

Merged
merged 15 commits into from
Apr 27, 2022

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Apr 15, 2022

Description

MoveTables is a common technique used to import data into Vitess. Many users run their MySQL servers using the SYSTEM timezone and/or storing datetime fields in local time. Vitess recommends running clusters in UTC, expecting apps to do the conversion to local time using, for example, convert_tz() in queries, :application_timezone in the mysql2 driver, or setting @@session.time_zone if using only timestamp data types.

Datetime values are usually stored using datetime or timestamp columns. MySQL stores timestamp s internally using UTC, but datetime is effectively a string value which is timezone-unaware.

If you import data from a server running in a different time zone into a server using UTC. timestamps will be correctly reproduced but datetimes can result in bugs which are difficult to track and fix. One such scenario is:

  • User is running a cluster in Berlin, using SYSTEM time zone with the server time zone being local.
  • The table contains a datetime column with current_timestamp as the default.
  • On the source any inserts into the table will contain the current Berlin local time.
  • The table is moved into a Vitess cluster, also using SYSTEM but with the UTC time zone.
  • Today the datetimes will be copied as-is.
  • The next insert will store the current time in UTC.
  • So some values will be in the Berlin time zone and others in UTC. Since the datetime column has no time zone information the app has no way to differentiate between these.

This PR adds an option to MoveTables to specify a -source_time_zone. If specified, VReplication will convert datetime values into UTC before storing it.

Changes Made

User Facing

  • a new option -source_time_zone added to MoveTables.
  • Workflow Show now displays SourceTimeZone

Internal

  • a new attribute SourceTimeZone has been added to BinlogSource (whose proto is stored in the source column of _vt.vreplication).
  • VDiff has been modified to use BinlogSource.SourceTimeZone to correctly compare datetimes.
  • an e2e test TestMoveTablesTZ reflects the supported use case

Not supported

  • non-datetime fields, such as textual fields which store datetimes: conversion is only for datetime data types
  • tables where datetime columns are stored in varying time zones: you can either convert all datetimes or none
  • automatically doing the conversion to local time in Vitess queries: the application has to be locale-aware and has to itself handle the conversion from UTC timestamps
  • automatically detecting the source timezone: while this is possible we don't want to always convert datetime fields because we don't know what time zone (local or utc) the application has stored the values in.

Todos

  • Do reverse adjustments in the reverse vreplication streams

Related Issue(s)

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required - TBD

@rohit-nayak-ps rohit-nayak-ps added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) Skip Upgrade Downgrade labels Apr 15, 2022
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The actual code change is some 10 lines, with a gazillion changes on top just to support the new information and to validate vdiff.

@rohit-nayak-ps
Copy link
Contributor Author

@shlomi-noach, I just realized that we are only supporting the forward migration of data at the moment: reverse replication streams are not setup correctly. Many users do like to run reverse streams into existing setups for a while before switching over entirely to Vitess. So we will need to add a TargetTimeZone in addition to the current SourceTimeZone in the BinlogSource proto for the reverse vreplication streams to work correctly.

It is a simple change and I had instinctively started with that design even for forward workflows, but thought I was over-designing since we wanted to fix the target to UTC.

Once this is implemented it means we can intrinsically support any time zone on source and target sides, but for now MoveTables will only allow the source time zone to be specified and default target time zone to UTC.

@shlomi-noach
Copy link
Contributor

I just realized that we are only supporting the forward migration of data at the moment: reverse replication streams are not setup correctly.

Oh that's right! Sounds good.

@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review April 21, 2022 14:01
@rohit-nayak-ps
Copy link
Contributor Author

@shlomi-noach, I just realized that we are only supporting the forward migration of data at the moment: reverse replication streams are not setup correctly. Many users do like to run reverse streams into existing setups for a while before switching over

Added support for reverse workflows.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it LGTM. Nice work on this!

I do think that we should add a test for when an invalid named time zone is specified for the flag, and I think that we should check for that and handle it in the command too.

Let's chat about it on Slack if I missed or misunderstood anything.

Thanks!

go/test/endtoend/vreplication/tz.sql Show resolved Hide resolved
go/test/endtoend/vreplication/time_zone_test.go Outdated Show resolved Hide resolved
go/vt/vtctl/vtctl.go Show resolved Hide resolved
go/test/endtoend/vreplication/time_zone_test.go Outdated Show resolved Hide resolved
go/vt/wrangler/vdiff.go Outdated Show resolved Hide resolved
go/vt/wrangler/vdiff.go Outdated Show resolved Hide resolved
go/vt/wrangler/vdiff.go Show resolved Hide resolved
go/vt/wrangler/vdiff.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @rohit-nayak-ps! I had one specific request, we can clear that one up quickly and then it's good!

go/vt/wrangler/materializer.go Outdated Show resolved Hide resolved
go/vt/wrangler/materializer.go Outdated Show resolved Hide resolved
@mattlord mattlord self-requested a review April 26, 2022 19:06
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for hanging in there with me. ❤️

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…ble plan builder and convert datetime columns if option is provided. e2e test demonstrates this works. VDiff needs to be modified

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VReplication release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants