
Input crashes on TZInfo::AmbiguousTime Error when setting jdbc_default_timezone #121

Closed
proligde opened this issue Mar 11, 2016 · 26 comments


@proligde

Hi there! First of all - thanks a lot for this plugin - it's great!

However, when using my MySQL DB's timestamp in elasticsearch/kibana I noticed it was off by 1h since the MySQL timestamp is interpreted as UTC but in fact is CET.

So I added:
jdbc_default_timezone => "Europe/Berlin"

But now my Logstash pipeline crashes fatally, resulting in an endless loop of trying to read data from MySQL with this error:

A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::Jdbc jdbc_connection_string=>"jdbc:mysql://localhost:3306/MYDBNAME?zeroDateTimeBehavior=convertToNull", jdbc_user=>"MYDBUSER", jdbc_password=><password>, jdbc_driver_library=>"/driver/mysql-connector-java-5.1.35-bin.jar", jdbc_driver_class=>"com.mysql.jdbc.Driver", jdbc_default_timezone=>"Europe/Berlin", statement_filepath=>"/config/queries/shop_order_item.sql", type=>"shop_order_item", codec=><LogStash::Codecs::Plain charset=>"UTF-8">, jdbc_paging_enabled=>false, jdbc_page_size=>100000, jdbc_validate_connection=>false, jdbc_validation_timeout=>3600, jdbc_pool_timeout=>5, sql_log_level=>"info", parameters=>{"sql_last_value"=>1970-01-01 00:00:00 UTC}, last_run_metadata_path=>"/var/lib/logstash/.logstash_jdbc_last_run", use_column_value=>false, clean_run=>false, record_last_run=>true, lowercase_column_names=>true>
  Error: TZInfo::AmbiguousTime: 2015-10-25T02:49:45+00:00 is an ambiguous local time. {:level=>:error}
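
For readability, here is the input configuration embedded in that plugin dump, rewritten as a pipeline snippet (same values as in the log; the query itself lives in the referenced statement_filepath):

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/MYDBNAME?zeroDateTimeBehavior=convertToNull"
    jdbc_user => "MYDBUSER"
    jdbc_driver_library => "/driver/mysql-connector-java-5.1.35-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_default_timezone => "Europe/Berlin"
    statement_filepath => "/config/queries/shop_order_item.sql"
    type => "shop_order_item"
  }
}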

I'm aware that this is probably more of a tzinfo-related problem (tzinfo/tzinfo#32), but I don't see any other way to make it work correctly.

Exporting a UNIX timestamp from MySQL and using a date input filter combined with the "unix" timestamp parser is (for whatever reason) painfully slow, resulting in jammed Elasticsearch input queues.

Best - Max

@proligde proligde changed the title Input crashes on TZIngo::AmbiguousTime Error when setting jdbc_default_timezone Input crashes on TZInfo::AmbiguousTime Error when setting jdbc_default_timezonf Mar 11, 2016
@proligde proligde changed the title Input crashes on TZInfo::AmbiguousTime Error when setting jdbc_default_timezonf Input crashes on TZInfo::AmbiguousTime Error when setting jdbc_default_timezone Mar 11, 2016
@razafinr

Hi!

I am having the same error:

Error: TZInfo::AmbiguousTime: 2015-10-25T02:10:03+00:00 is an ambiguous local time

It is quite a problem that this is causing logstash to restart endlessly.

@proligde
Author

As far as I understand the issue, logstash-input-jdbc would have to let us configure (like tzinfo does) which fallback to use (pick the first or the second point in time) in case of an ambiguous timestamp. As long as it doesn't do this, it's somewhat "correct" to fail fatally.

Here is my understanding of the whole thing - please correct me if I'm wrong:

  • We have source timestamps in, let's say, the CE(S)T timezone, which observes daylight saving time.
  • Events with timestamps recorded during the summer/winter-time shift (twice a year) can't be mapped precisely, because at 3am the clocks are set back to 2am, so every timestamp between 2am and 3am exists twice.
  • Thus events within this time range can't be mapped correctly and throw a fatal error.
  • As far as I understand, tzinfo has an option to force such timestamps to either the earlier or the later UTC equivalent.
  • If we could configure that in logstash-input-jdbc as well and pass it on to tzinfo, we could get rid of the problem at the cost of losing the exact time during the summer/winter-time shift, which can't be avoided for the logical reasons pointed out before.

I know - that's all evil stuff and everyone should only write UTC timestamps to avoid such awkwardness, but in reality there are a lot of databases out there logging things in local timezones that observe daylight saving time shifts - so support for that would be great :-)
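
To make the ambiguity concrete, here is a small tzinfo sketch (assuming the tzinfo gem; the timestamp is taken from the error above):

require 'tzinfo'

tz = TZInfo::Timezone.get('Europe/Berlin')

# On 2015-10-25 the clocks roll back from 03:00 CEST to 02:00 CET, so the
# local wall-clock time 02:49:45 occurs twice and tzinfo refuses to guess:
local = Time.utc(2015, 10, 25, 2, 49, 45)
begin
  tz.local_to_utc(local)
rescue TZInfo::AmbiguousTime => e
  puts e.class # => TZInfo::AmbiguousTime
end

# With an explicit preference the conversion succeeds:
tz.local_to_utc(local) { |periods| periods.first } # => 2015-10-25 00:49:45 UTC (was CEST, +02:00)
tz.local_to_utc(local) { |periods| periods.last }  # => 2015-10-25 01:49:45 UTC (was CET,  +01:00)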

@razafinr

I totally agree with you (timezones are just so annoying to handle).

Is the local_to_utc conversion perhaps global to all Logstash plugins rather than specific to the jdbc input? I don't know whether this needs a fix in this plugin or in Logstash itself.

@proligde
Author

Good point - I don't know either. If one of the contributors of this plugin could confirm or deny that, we could either keep this ticket here or move it to the proper Elastic project instead.

I guess in the end there should be a global or plugin-specific setting, leading to these calls:

tz.local_to_utc(d) { |periods| periods.first }
or
tz.local_to_utc(d) { |periods| periods.last }

as mentioned in the linked tzinfo issue.

@mbertani

mbertani commented Jun 3, 2016

I have the same issue here. I am importing data from a database with local timestamps. Even handling the exception and returning a _dateparsefailure in logstash would help. Then we could redirect the output to a file and parse accordingly.

@razafinr

Coming back to this issue, I think that the problem comes from the named_timezones extension used by Sequel.

As you can see here:

https://github.com/jeremyevans/sequel/blob/ac925ce9556f33d56f49b84d905d307c6a621716/lib/sequel/extensions/named_timezones.rb

There is an attribute tzinfo_disambiguator which could be used to remove the ambiguity by choosing which period to use (periods.first or periods.last) in the application.

If I could send a PR I would, but I am really unfamiliar with the Ruby language, so I hope this helps to fix the issue.
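
For reference, setting that attribute is process-global; a rough sketch of what it would look like, assuming Sequel's named_timezones extension linked above:

require 'sequel'
require 'tzinfo'

# Load the extension that lets Sequel work with named (tzinfo) timezones.
Sequel.extension :named_timezones

# Global disambiguator: whenever a local time is ambiguous, prefer the earlier
# (DST) period. Note that this applies to every Sequel conversion in the
# process, not just to a single jdbc input.
Sequel.tzinfo_disambiguator = proc { |datetime, periods| periods.first }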

@hameno

hameno commented May 25, 2017

Any update or workaround for this? My import job is now stuck with this error...

@passing

passing commented Aug 3, 2017

we are using the following workaround to convert CET dates without timezone information to UTC in the SQL statement (running against an Oracle database):

select from_tz(cast(<column> as timestamp), 'CET') at time zone ('UTC') "@timestamp"
from <table>
where ...

then ensure that the timezone setting of the local machine, or "jdbc_default_timezone", is UTC.

maybe something similar is possible with MySQL?

@yaauie
Contributor

yaauie commented May 23, 2018

Unfortunately, the setting linked for supplying a tzinfo_disambiguator is a global setting for Sequel's named timezone extension, which means we don't have a way of configuring a particular input's handling.

In a brief spike, I have created a timezone proxy object using SimpleDelegator, whose period_for_local has an implicit preference for DST being either enabled or disabled, which seems to work:

# Assumed requires so the spike is self-contained:
require 'delegate' # SimpleDelegator
require 'tzinfo'

# Quacks like a Timezone
class TimezoneWrapper < SimpleDelegator
  ##
  # Wraps a ruby timezone object in an object that has an explicit preference in time conversions
  # either for or against having DST enabled.
  #
  # @param timezone [String,TZInfo::Timezone]
  # @param dst_enabled_on_overlap [Boolean] (default: nil) when encountering an ambiguous time,
  # declare a preference for selecting the option with DST either enabled or disabled.
  def self.wrap(timezone, dst_enabled_on_overlap)
    timezone = TZInfo::Timezone.get(timezone) if timezone.kind_of?(String)
    new(timezone, dst_enabled_on_overlap)
  end

  ##
  # @api private
  def initialize(timezone, dst_enabled_on_overlap)
    super(timezone) # SimpleDelegator
    @dst_enabled_on_overlap = dst_enabled_on_overlap
  end

  ##
  # @override `Timezone#period_for_local`
  # inject an implicit preference for DST being either enabled or disabled if called
  # without an explicit preference
  def period_for_local(value, dst_enabled_on_overlap=nil, &global_disambiguator)
    dst_enabled_on_overlap = @dst_enabled_on_overlap if dst_enabled_on_overlap.nil?
    __getobj__.period_for_local(value, dst_enabled_on_overlap, &global_disambiguator)
  end
end

In this way, we could assign our proxy object via Database#timezone=, which would resolve ambiguous timestamps as configured:

if @jdbc_default_timezone
  @database.timezone = TimezoneWrapper.wrap(@jdbc_default_timezone, @jdbc_default_timezone_dst_enabled_on_overlap)
end

A couple open questions:

  • what should this has-opinions-on-DST timezone proxy be called?
  • how should we expose the option? jdbc_default_timezone_dst_enabled_on_overlap is a bit cumbersome.

(cc: @guyboertje @jakelandis)

@marbin

marbin commented Oct 30, 2018

I agree with @mbertani: Would help a lot just to get a proper exception and a _dateparsefailure or something similar. Now, logstash silently crashes and drops events. Took me a long time to figure out what was going on.

@marbin

marbin commented Oct 30, 2018

I don't really understand why this is not labeled as a bug. If I understand this correctly, the jdbc_default_timezone cannot be used in countries using daylight saving time.

@guyboertje
Contributor

@marbin
The bug only surfaces once a year: when jdbc_default_timezone is set, the input is processing database records whose timestamps were recorded during the change out of daylight saving time, and the timezone in use is one that produces ambiguous results, i.e. one that observes daylight saving.

As @yaauie points out, the setting that Sequel (the third-party ORM we use) makes available to us is a global one, meaning that all jdbc inputs and all jdbc_streaming and jdbc_static filters in a Logstash instance (across all pipelines) would share the same disambiguation. Due to this global effect, setting it for one plugin affects all plugins.

Unfortunately, whichever period the disambiguation proc chooses (first or last), some converted timestamps will collide - for example, with "first" both 02:30 CEST and the later 02:30 CET end up mapped to 00:30 UTC - and there is no way to avoid this.

@marbin

marbin commented Nov 14, 2018

@guyboertje You mean that just because the bug only occurs in timezones with daylight saving - i.e. all of Europe, the US, Australia, etc. - and just once a year, it is not a bug? ;-)
The millennium bugs were no bugs because they only occurred once in a millennium? ;-)

Joking aside, in my case we had several pipelines that just crashed, and it was pretty tricky to figure out what was going on, since there were no error messages in the logs. I had to turn on debugging to find it.
Then I had to hack the last_run_metadata to skip the events that occurred during the hour of the daylight saving time change, losing all of those events. And all of these pipelines were down until we had found and fixed the issue.

I understand this is a hard one to fix, but I think it would make sense to track it as a bug and maybe add a warning to the docs for the jdbc_default_timezone feature, since using it with a timezone that observes daylight saving WILL crash your pipeline.

@guyboertje
Contributor

@marbin
There are a bug or three flying about somewhere: the 'enhancement' here, plus the global nature of the Sequel library, plus the scheduler also having problems with TZ bugs (some other plugins have pinned rufus-scheduler at a very old version, and we need a big test run to see what will fail if we remove the pinning). I'm not saying it's not a bug.

We do have plans to switch all our jdbc plugins to a common reusable Java framework based around HikariCP, vanilla JDBC and Quartz. This is a lot of work, but we feel there are many niggling faults with the current code across all jdbc plugins that warrant the effort.

Therefore, fixing these issues while keeping the current set of tests passing is the goal of the move to Java.

@science695

This "bug" doesn't only happen if you have data with a date in this 1 hour interval, it also does it you have a job that runs in that hour using a datefield as the index column.

If this happens, then an ambiguous date is stored in the meta file and that job will fail until you manually change / clear that meta file.
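
For anyone stuck in that state: the offending value lives in the file configured by last_run_metadata_path (/var/lib/logstash/.logstash_jdbc_last_run in the original report). A rough sketch of the manual fix, assuming the plugin's default YAML serialization of sql_last_value:

# Stop the pipeline first, then inspect the stored sql_last_value:
cat /var/lib/logstash/.logstash_jdbc_last_run
# --- 2015-10-25 02:49:45.000000000 +02:00

# Either delete the file (the next run restarts from the initial 1970-01-01 value)
# or edit the timestamp to one outside the ambiguous hour, then restart Logstash.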

@marcoacm

marcoacm commented Apr 9, 2020

I am having this issue now. Any changes? Any other workarounds?

@spawnrider

Same for me. Any update on this?

@bjolivot

bjolivot commented Dec 8, 2021

Hello, any news?

@jehutywong

Same here. Any workaround example for MySQL 5.5?

@bjolivot

Hello, another hit on this bug tonight - no progress on the AmbiguousTime problem?

@Rick25-dev

The problem occurs here as well; no logs have been loaded since the hour change. What's the progress on this issue?

@s137

s137 commented Jan 11, 2023

I have encountered the same issue - any news on when this is going to be resolved?
There is already a PR waiting to be merged in the repo of the integration plugin: logstash-plugins/logstash-integration-jdbc#70

@flyingman15

As long as the jdbc input crashes on this, for MySQL you can try to avoid the issue like this:
statement => "SELECT CONVERT_TZ(date_created, 'Europe/Paris', 'UTC') AS date_created FROM your_table"
jdbc_default_timezone => "GMT"

(You'll need to populate the timezone tables of your MySQL DB first, if that's not already done, with:
mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql)

CONVERT_TZ should properly handle this annoying AmbiguousTime error.

@s137

s137 commented Nov 7, 2023

Any updates on this? Unfortunately it's currently not really feasible for us to convert everything to UTC.

@yaauie
Contributor

yaauie commented Nov 7, 2023

Since v5.4.0 of this plugin (which is now distributed as a part of the JDBC Integration plugin for Logstash), you can specify the behaviour when encountering an ambiguous timestamp:

While it is common to store local times in SQL’s timestamp column type, many timezones change their offset during the course of a calendar year and therefore cannot be used with SQL’s timestamp type to represent an ordered, continuous timeline. For example in the America/Chicago zone when daylight saving time (DST) ends in the autumn, the clock rolls from 01:59:59 back to 01:00:00, making any timestamp in the 2-hour period between 01:00:00CDT and 02:00:00CST on that day ambiguous.

When encountering an ambiguous timestamp caused by a DST transition, the query will fail unless the timezone specified here includes a square-bracketed instruction for how to handle overlapping periods (such as: America/Chicago[dst_enabled_on_overlap:true] or Australia/Melbourne[dst_enabled_on_overlap:false]).

-- Logstash JDBC Input Plugin -- jdbc_default_timezone -- Ambiguous Timestamps
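
Applied to the original report, that looks something like the following minimal sketch (only the timezone line changes; the square-bracketed instruction tells the plugin to prefer the DST period when a local time is ambiguous):

input {
  jdbc {
    # ... connection, driver and statement settings as before ...
    jdbc_default_timezone => "Europe/Berlin[dst_enabled_on_overlap:true]"
  }
}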


Versions of this plugin with this feature were first shipped with Logstash v8.6.0.

Logstash 7.6+ and 8.0+ shipped with the JDBC Integration plugin, so it can be updated with the normal method:

bin/logstash-plugin update logstash-integration-jdbc

Logstash < 7.5 will need to remove the stand-alone JDBC plugins prior to installing the integration plugin, which will look something like:

bin/logstash-plugin remove logstash-input-jdbc
bin/logstash-plugin remove logstash-filter-jdbc_static
bin/logstash-plugin remove logstash-filter-jdbc_streaming

bin/logstash-plugin install logstash-integration-jdbc

@s137

s137 commented Nov 9, 2023

@yaauie Thank you so much. That helps a lot. Unfortunately, because of another issue in Elasticsearch itself, we cannot upgrade to Logstash 8.6.0 just now, but as you pointed out, I can just upgrade the integration plugin in Logstash 8.3.3.

@yaauie yaauie closed this as completed Jun 18, 2024