Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Performance Issues reading LocalDate,LocalDateTime, and LocalTime #1070

Closed
magicfoodhand opened this issue May 30, 2019 · 31 comments
Closed
Assignees
Labels
Enhancement An enhancement to the driver. Lower priority than bugs.

Comments

@magicfoodhand
Copy link

Driver version

7.2.1.jre8

SQL Server version

Microsoft SQL Server 2014 (SP3) (KB4022619) - 12.0.6024.0 (X64)
Sep 7 2018 01:37:51
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.3 (Build 9600: )

Client Operating System

Ubuntu 16.04

JAVA/JVM version

openjdk version "12" 2019-03-19
OpenJDK Runtime Environment (build 12+33)
OpenJDK 64-Bit Server VM (build 12+33, mixed mode, sharing)

Table schema

CREATE TABLE test.DatePerformance (
	someDate DATE
)

-- Populate Table with Random Dates
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND(12313456) * 1000, GETDATE())

WHILE((SELECT COUNT(*) FROM test.DatePerformance) < 2500)
INSERT INTO test.DatePerformance
SELECT DATEADD(DAY, -RAND() * 1000, GETDATE())

Problem description

  1. Expected behaviour:

    • Reading Date objects from the database should have low overhead.
  2. Actual behaviour:

    • There is substantial overhead creating GregorianCalendars and converting them to the LocalDate, LocalDateTime, LocalTime classes
    • Calling Calendar.getInstance(TimeZone.getTimeZone("UTC")) is 30% of the getObject call for LocalDate, LocalDateTime, LocalTime classes
  3. Error message/stack trace:

  4. Any other details that can be helpful:

Sample Profiling Snapshots

	com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 15.094s
>	com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 9.206s
>>	java.util.GregorianCalendar.<init> () 6.01s
>>	java.util.Calendar.getTimeInMillis () 3.005s
>	java.util.Calendar.getInstance () 5.088s

	com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject () 7.601s
>	com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject () 3.906s
>>	java.util.GregorianCalendar.<init> () 2.498s
>>	java.util.Calendar.getTimeInMillis () 1.407s
>	java.util.Calendar.getInstance () 3.302s

Reproduction code

    /**
    Using JMH (LocalDate)

    Result: 338.352 ±(99.9%) 27.177 ops/s [Average]
    Statistics: (min, avg, max) = (5.716, 338.352, 435.926), stdev = 115.069
    Confidence interval (99.9%): [311.175, 365.529]


    # Run complete. Total time: 00:08:10

    Benchmark                        Mode  Samples    Score  Score error  Units
    c.c.m.MyBenchmark.testMethod    thrpt      200  338.352       27.177  ops/s
    */

    @Benchmark
    public void testMethod()
    {
        List<LocalDate> dates = new ArrayList<>();

        try (
                Connection conn = dataSource.getConnection();
                PreparedStatement ps = conn.prepareStatement("SELECT someDate from test.DatePerformance");
                ResultSet rs = ps.executeQuery()
        )
        {
            while(rs.next())
                dates.add(rs.getObject("someDate", LocalDate.class));

        } catch (SQLException e)
        {
            e.printStackTrace();
        }
    }
@magicfoodhand magicfoodhand added the Bug A bug in the driver. A high priority item that one can expect to be addressed quickly. label May 30, 2019
@rene-ye
Copy link
Member

rene-ye commented May 30, 2019

Hi @LavaBear, the team will look into this.

@gordthompson
Copy link
Contributor

@ulvii (et al.) - While investigating this you may also want to consider #1088

@gordthompson
Copy link
Contributor

gordthompson commented Jun 17, 2019

If creating Calendar objects is particularly expensive then perhaps a SQLServerResultSet object should have a single instance of a proleptic GregorianCalendar for UTC that can be used for conversions.

(BTW, this issue is not a bug. It is an enhancement request.)

@cogman
Copy link
Contributor

cogman commented Jun 19, 2019

@gordthompson Care should be taken in the fact that the Calendar object is not thread safe.

Ideally, the driver would switch away from using the calendar object and instead look at using ZonedDateTime or straight local dates. If my reading of the TDS standard is correct, than a SQL "Date" ends up on the wire as the integer "number of days since 1-1-1" That would be a trivial conversion to localDates because it would simply be "LocalDate.of(1,1,1).addDays(tdsIntValue);" (or, whatever that API is.

The new system is much faster and thread safe to boot. Converting to java.sql.Date is also pretty simple because it is just a "Date.from(ZonedDateTime.toInstant())" away.

@harawata
Copy link
Contributor

harawata commented Jun 20, 2019

EDIT: sorry, the conversion from DATETIME2 is not supported :P

There is a handy method microsoft.sql.DateTimeOffset#getOffsetDateTime() added via #830 .
https://github.com/microsoft/mssql-jdbc/blob/v7.3.1/src/main/java/microsoft/sql/DateTimeOffset.java#L222-L227

It would be better to use it for LocalDate/Time. Something like...

} else if (type == java.time.LocalDateTime.class) {
  microsoft.sql.DateTimeOffset dateTimeOffset = getDateTimeOffset(columnIndex);
  if (dateTimeOffset == null) {
    returnValue = null;
  } else {
    returnValue = dateTimeOffset.getOffsetDateTime().atZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
  }

@gordthompson
Copy link
Contributor

gordthompson commented Jun 20, 2019

@cogman - Good points. If the driver now requires Java_8+ then perhaps datetime2 values should really be unpacked from the TDS payload as LocalDateTime objects (because that's what they really are) and then converted to java.sql.Timestamp values if needed.

@ulvii
Copy link
Contributor

ulvii commented Jun 24, 2019

Hi @LavaBear, @gordthompson , @cogman , @harawata ,
Thank you for the suggestions and sorry for the delay, the team is currently busy with the release preparations.

I went through the discussion briefly and wanted to clarify a few things.

  1. Please note that most of the conversions between SQL Server temporal types and JAVA object types are currently handled using GregorianCalendar, both when reading and writing. Are you proposing to completely migrate away from GregorianCalendar to ZonedDateTime/ LocalDateTime or only for LocalDate, LocalDateTime, and LocalTime classes when calling getObject()?
  2. Do the new ZonedDateTime/LocalDateTime APIs provide all the functionality that GregorianCalendar has? Are you aware of any limitations?
  3. Are there benchmarks? Do the new APIs always perform better or there are scenarios where we might have to introduce performance regressions?

To speed up the process, please feel free to create an initial PR or post any other recommendations that we should keep in mind when investigating the proposal.

@ulvii ulvii added Enhancement An enhancement to the driver. Lower priority than bugs. and removed Bug A bug in the driver. A high priority item that one can expect to be addressed quickly. labels Jun 24, 2019
@gordthompson
Copy link
Contributor

re: performance

Here's a simple benchmark to compare the results of unpacking a datetime2 value.

// Test Data - TDS protocol delivers datetime2 values as two integers:
//   1) the number of days since 0001-01-01, and
//   2) the number of 100-nanosecond "ticks" since the beginning of the current day
Long dateValue = 737240L;        // 2019-07-01
Long timeValue = 432000000000L;  // 12:00:00
//
int loopLimit = 1000;
Long t0 = null;  // loop timer start time

// the old way
t0 = System.nanoTime();
GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
ZoneId utcZone = ZoneId.of("UTC");
for (int i = 0; i < loopLimit; i++) {
    cal.clear();
    cal.setGregorianChange(foreverAgo);  // proleptic
    cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
    cal.add(GregorianCalendar.DAY_OF_YEAR, dateValue.intValue());
    cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
    LocalDateTime theOldWay = LocalDateTime.ofInstant(cal.toInstant(), utcZone);
}
System.out.printf("(old) java.util: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (old) java.util: 279 ms for 1000 iterations

// the new way
t0 = System.nanoTime();
for (int i = 0; i < loopLimit; i++) {
    LocalDateTime theNewWay = java.time.temporal.ChronoUnit.DAYS.addTo(
            java.time.temporal.ChronoUnit.NANOS.addTo(
                    dt2Base, 
                    timeValue * 100L), 
            dateValue);
}
System.out.printf("(new) java.time: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (new) java.time: 23 ms for 1000 iterations

I ran the test several times and the comments showing the console output are representative values from my (very old) test machine.

@cogman
Copy link
Contributor

cogman commented Jun 25, 2019

@ulvii

Here are some answers to your question

  1. Yes. The Gregorian calendar has a lot of design flaws. One of the largest being that it is not thread safe (completely mutable) and is fairly slow because it does a ton of excess work on initialization. That could be a large change, so I'm ok with it waiting.

  2. Yes they provide all the same functionality while also being much closer aligned with how you want to think about dates. The old API tried to hide away when it was doing things like time zone conversions for you. The new API is much better at keeping that in plain site. Further, the old API was highly mutable in often surprising ways. The new API is completely immutable. Finally, converting between old and new is relatively simple.

  3. Here are some JMH results based on @gordthompson 's benchmark using Java 11.

MyBenchmark.test_read_local_date_new thrpt 25 19109523.054 ± 136645.875 ops/s
MyBenchmark.test_read_local_date_new_calendar_current_implementation thrpt 25 3235636.892 ± 30089.733 ops/s
MyBenchmark.test_read_local_date_reuse_calendar thrpt 25 3254206.917 ± 51282.245 ops/s

Validates his claim of being around 5->6x faster.

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
	private ZoneId utcZone = ZoneId.of("UTC");

	@Benchmark
	public void test_read_local_date_new_calendar_current_implementation(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private LocalDateTime makeNewCalendar() {
		GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
		java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
		ZoneId utcZone = ZoneId.of("UTC");

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return LocalDateTime.ofInstant(cal.toInstant(), utcZone);
	}


	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private LocalDateTime makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return LocalDateTime.ofInstant(cal.toInstant(), utcZone);
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(EPOCH.plusDays(dateValue).plusNanos(timeValue * 100));
	}
}

@gordthompson
Copy link
Contributor

gordthompson commented Jun 25, 2019

Thanks to @cogman for confirming that unpacking the datetime2 TDS payload directly to LocalDateTime is a pretty big win ...

... if we ultimately want a LocalDateTime value.

But if we ultimately want a java.sql.Timestamp value then unpacking to LocalDateTime and converting to Timestamp appears to take about 40% longer than unpacking to a GregorianCalendar and converting to Timestamp, even if we use the shortcut of arbitrarily using UTC for the Timestamp time zone:

// Test Data - TDS protocol delivers datetime2 values as two integers:
//   1) the number of days since 0001-01-01, and
//   2) the number of 100-nanosecond "ticks" since the beginning of the current day
Long dateValue = 737240L;        // 2019-07-01
Long timeValue = 432000000000L;  // 12:00:00
//
int loopLimit = 1000;
Long t0 = null;  // loop timer start time

// the old way
t0 = System.nanoTime();
GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
ZoneId utcZone = ZoneId.of("UTC");
for (int i = 0; i < loopLimit; i++) {
    cal.clear();
    cal.setGregorianChange(foreverAgo);  // proleptic
    cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
    cal.add(GregorianCalendar.DAY_OF_YEAR, dateValue.intValue());
    cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
    java.sql.Timestamp tsOld = new java.sql.Timestamp(cal.getTimeInMillis());
}
System.out.printf("(old) java.util: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (old) java.util: 251 ms for 1000 iterations

// the new way
t0 = System.nanoTime();
LocalDateTime dt2Base = LocalDateTime.of(1, 1, 1, 0, 0);
for (int i = 0; i < loopLimit; i++) {
    LocalDateTime theNewWay = java.time.temporal.ChronoUnit.DAYS.addTo(
            java.time.temporal.ChronoUnit.NANOS.addTo(
                    dt2Base, 
                    timeValue * 100L), 
            dateValue);
    java.sql.Timestamp tsNew = java.sql.Timestamp.from(theNewWay.toInstant(ZoneOffset.UTC));
}
System.out.printf("(new) java.time: %d ms for %d iterations%n", 
        (System.nanoTime() - t0) / 1000000, 
        loopLimit);
// (new) java.time: 360 ms for 1000 iterations

If we wanted our Timestamp to be in the default time zone for the JVM then presumably we'd have to go from LocalDateTime through java.time.ZonedDateTime (to account for the changing offsets in Standard vs. Daylight/Summer time), and that might be even slower.

@cogman
Copy link
Contributor

cogman commented Jun 26, 2019

@gordthompson I believe your benchmark might be falling prey to some common java microbenchmarking problems. (No warmup, not consuming the result, etc). If I were to guess, your method isn't getting fully optimized since it isn't "hot" enough.

I've ran your scenario through JMH tests and I'm getting different results (but not really surprising).

MyBenchmark.test_read_local_date_new thrpt 25 14836448.410 ± 181472.322 ops/s
MyBenchmark.test_read_local_date_reuse_calendar thrpt 25 3322991.970 ± 97922.281 ops/s

It isn't surprising because the 2 extra steps to create a Timestamp boil down to a couple of additions and multiplications. The calendar code has a ton of extra checks. It is pretty intense when you start reading everything that goes into "getTimeInMillis" on the cal object.

Here is the code I benchmarked with

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
	private ZoneId utcZone = ZoneId.of("UTC");

	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private Timestamp makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return new Timestamp(cal.getTimeInMillis());
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(Timestamp.from(EPOCH.plusDays(dateValue)
				.plusNanos(timeValue * 100)
				.toInstant(ZoneOffset.UTC)));
	}
}

Also, here is an excellent video by Aleksey Shipilev (Author of JMH and a major player in Java performance) about microbenchmark pitfalls.
https://www.youtube.com/watch?v=VaWgOCDBxYw

@gordthompson
Copy link
Contributor

@cogman - Thanks for the tips. Clearly the Java compiler and JRE are a lot more clever than I gave them credit for.

Hoping to provide more meaningful input to this discussion, I tried to set up a JMH project in Eclipse using jmh-java-benchmark-archetype version 1.21. Unfortunately, while the project itself looks convincing it just doesn't work. If I do (the Eclipse equivalent of) mvn clean package I get a jar file named "datetime2-unpack-jmh-0.0.1-SNAPSHOT-shaded.jar" but when I try to run it I get

no main manifest attribute, in datetime2-unpack-jmh-0.0.1-SNAPSHOT.jar

Based on this Stack Overflow answer if I hack pom.xml to change ...

<plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.4</version>
</plugin>

... to ...

<plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <archive>
            <manifest>
                <addClasspath>true</addClasspath>
                <classpathPrefix>lib/</classpathPrefix>
                <mainClass>org.sample.MyBenchmark</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>

(or variations on that theme) I consistently get

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/openjdk/jmh/infra/Blackhole
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
        at java.lang.Class.privateGetMethodRecursive(Unknown Source)
        at java.lang.Class.getMethod0(Unknown Source)
        at java.lang.Class.getMethod(Unknown Source)
        at sun.launcher.LauncherHelper.validateMainClass(Unknown Source)
        at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.openjdk.jmh.infra.Blackhole
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 7 more

Oh well. I tried.

@cogman
Copy link
Contributor

cogman commented Jun 27, 2019

@gordthompson This is maybe getting off topic. But the problem is that "org.sample.MyBenchmark" doesn't contain an actual main static method. In order to make this run, you need to hit the JMH main class. ( org.openjdk.jmh.Main ) It will do the scanning to find the benchmark classes for you. You can either do that by adding a manifest or by directing java to run that class directly (

It has been a while since I've set this up, but attached is the entirety of my pom file for running jmh.

Few key points, it pulls out jar signing, because it is making an uberjar. Instead of using the jar plugin to write the manifest it is doing that as part of the shading plugin.
pom.xml.gz

https://stackoverflow.com/questions/5474666/how-to-run-a-class-from-jar-which-is-not-the-main-class-in-its-manifest-file

@gordthompson
Copy link
Contributor

@cogman - Thanks again. The build was producing two separate jar files and I was simply trying to run the wrong one. (D'oh!)

Running your latest tests on my very-old-and-slow test box yielded

Benchmark                                         Mode  Cnt       Score       Error  Units
MyBenchmark.test_read_local_date_new             thrpt   25  210629.896 ±   726.327  ops/s
MyBenchmark.test_read_local_date_reuse_calendar  thrpt   25   95127.109 ± 12725.871  ops/s

If I'm reading that right, UTC_Timestamp via LocalDateTime is actually more like twice as fast as UTC_Timestamp via GregorianCalendar on my test box. Very different from my previous naïve attempt at benchmarking.

Still, it's not really a fair fight. The GregorianCalendar approach has to deal with the baggage of a Calendar object while the LocalDateTime approach just uses a fixed offset (UTC+0) and avoids the whole "messy rest-of-the-world time zone" thing. Since the code needs to be able to produce a Timestamp in any time zone (not just UTC) a "better" comparison would be to create the Timestamp in the JVM default time zone (which for me happens to be America/Denver) ...

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.*;

@State(Scope.Thread)
public class MyBenchmark {
	private static long dateValue = 737240L;
	private static long timeValue = 432000000000L;
	private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);

	private GregorianCalendar cal = new GregorianCalendar();  // default JVM time zone
	private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);

	@Benchmark
	public void test_read_local_date_reuse_calendar(Blackhole bh) {
		bh.consume(makeReusingCalendar());
	}

	private Timestamp makeReusingCalendar() {

		cal.clear();
		cal.setGregorianChange(foreverAgo);  // proleptic
		cal.set(1, 0, 1, 0, 0, 0);  // 0001-01-01
		cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
		cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
		return new Timestamp(cal.getTimeInMillis());
	}

	@Benchmark
	public void test_read_local_date_new(Blackhole bh) {
		bh.consume(Timestamp.valueOf(EPOCH.plusDays(dateValue)
				.plusNanos(timeValue * 100)));
	}
}

... and the results I see are ...

Benchmark                                         Mode  Cnt       Score      Error  Units
MyBenchmark.test_read_local_date_new             thrpt   25  143919.857 ± 1135.624  ops/s
MyBenchmark.test_read_local_date_reuse_calendar  thrpt   25   74379.197 ±  375.586  ops/s

Again, if I'm reading things correctly, the LocalDateTime approach took a hit (from 211k ops/s to 144k ops/s) and so did the GregorianCalendar approach (from 95k ops/s to 74k ops/s) but the LocalDateTime approach still seems to be roughly twice as fast.

@cogman
Copy link
Contributor

cogman commented Jul 1, 2019

@ulvii I think us customers at least are all in agreement. Switching over would generally be a good improvement for the driver both from a performance standpoint and from a ergonomic standpoint in the codebase.

If I get some time today I might dive in a bit and see if I can bang something out (or at least how big of an impact it would be). Do you have any more questions for us?

@harawata
Copy link
Contributor

harawata commented Jul 1, 2019

Hi all,

I wrote a quick and dirty patch for benchmarking.
https://github.com/microsoft/mssql-jdbc/compare/dev...harawata:gh-1070?expand=1

I also created a JMH project, but am not sure if I did it right (I haven't gotten any consistent result probably because there are many other processes running on the same machine).

@ulvii
Copy link
Contributor

ulvii commented Jul 24, 2019

Hi everyone,
Thanks for all the effort. We will investigate the impact of the change for different JDBC/SQL types. In the meantime, please feel free to provide any other details that you think would speed up the process. 👍

@peterbae
Copy link
Contributor

Hi @gordthompson, @cogman, thanks for your input. I've recently starting looking into this issue, and I've set up my own JMH environment to measure the performance difference between relying on GregorianCalendar vs LocalDateTime. The results reflect the same findings we've discussed in this thread (ultimately, utilizing GregorianCalendar can be anywhere between 2~5 times slower than the LocalDateTime alternative, depending on the context). Therefore, I agree that we should move away from relying on GregorianCalendar in the driver, and I'm considering applying this change to the entire driver, wherever applicable.

However, I wanted to clarify one thing - it's the re-construction of the GregorianCalendar that is slow (setting the time values after calling .clear()), and simply retrieving data from the GregorianCalendar instance is fast (i.g. retrieve the TimeZone information). If the client has already provided a GregorianCalendar instance (along with the timezone information) to the driver, then I believe there shouldn't be a problem with using that instance to insert/select temporal objects.

This means that the driver only needs to replace instances in the driver where GregorianCalendar instances are being created (or being re-constructed), without having to modify most of the public/internal APIs. Is my understanding of the scope of the problem correct, or does anyone think that we should try to remove reliance on Calendar objects altogether (as much as we can) in the driver? (for example, perhaps try to replace the Calendar parameter in DDC::convertTemporalToObject with a TimeZone object, since it really only needs the TimeZone information in that method?) Thanks.

@cogman
Copy link
Contributor

cogman commented Aug 30, 2019

@peterbae Creation is definitely the slowest part of the Calendar classes.

The big problem with the GregorianCalendar class is that it is HIGHLY thread-unsafe. So any usage of it could potentially trigger thread safety/race condition problems that would be hard to track down. That isn't to say that you couldn't use it in a thread safe manor, but that you'll need to inspect every method called on GregorianCalendar to evaluate whether or not it is mutating internal state (And I mean every method, for example, getTimeZone can mutate internal state).

You could add synchronization around it, but that will add overhead that will be hard to capture in a JMH benchmark (because the JVM does neat and terrible things with locks that will hide actual costs).

IMO, the safest thing to do is replace it outright rather than try to share instances of it across threads. The new localdate stuff is all thread safe (it is all immutable objects). Next safest and probably fastest (but potentially costly memory wise) would be thread local storage for the calendar object.

@peterbae
Copy link
Contributor

Hi @cogman, thanks for the input. I agree that we should use as little of GregorianCalendar as possible. I'd like to replace all instances of Calendar in the driver, but the JDBC specs state that the driver needs to provide public APIs that accept Calendar objects (mostly from getters/setters for temporal objects in prepared/callable statements).

Looking at how the driver actually uses the Calendar objects passed from the client, most of the time they're used for the timezone information. I'll look into extracting the necessary information from the Calendar object as soon as it's passed from the public API, while replacing as many instances of Calendar parameters with TimeZone parameters in internal methods. I'll also make sure that the driver never initializes any GregorianCalendar objects, either.

It'll take a while, but I'll update everyone once the initial PR/design has been made. Please let me know if you have any other input in the meantime.

@peterbae
Copy link
Contributor

peterbae commented Nov 6, 2019

Hi @gordthompson, @cogman , I've been working on converting all instances of Calendar to ZonedDateTime in the driver for a while. I have an initial implementation (https://github.com/peterbae/mssql-jdbc/tree/CalToZDT) that's still buggy when it comes to passing all of our internal testing. However, more importantly, I realized that this isn't feasible within the current implementation context of the driver.

There's a couple of problems with completely replacing Calendar with ZonedDateTime:

  1. ZonedDateTime is the equivalent of a continuous Calendar object, but the driver was relying on a non-continuous calendar most of the time for the conversion. This means that for the dates before the Gregorian cutoff, the driver will need additional logic to replicate the original behavior of the non-continuous Calendar object if the driver were to completely migrate away from using Calendar objects for temporal data type conversion (I haven't been able to figure out how to do this as it seems the offset dates fluctuate depending on the actual year, but presumably this will be buggy and performance degrading).

  2. From my testing, it seems like the Calendar object's timezone offset always follows the International Standard Time (IST) rules, whereas ZonedDateTime (and Instant) doesn't necessarily follow IST.
    For example (my CalendarToZonedDateTime.testConvertFromTimestamp() method can demonstrate this), putting 1884-01-01 12:08:12.081 as the timestamp value won't fail the test, but changing that value to anytime before 1884 (the year when IST was adopted) fails, because the offset value goes from -8:00 to -8:12:28.

I think it's possible to solve the offset value problem by finding the expected offset value from Calendar, compare that to the offset value given by ZonedDateTime, then padding the difference to ZonedDateTime object (example: dtv.java#677), but this introduces major performance degradation. Here's the performance data I've measured before/after my changes, using my JMH framework:

Before my changes:

Benchmark                                     Mode  Cnt         Score        Error  Units
MyBenchmark.testExecuteCastAfterGregorian    thrpt   25      3158.933 ±     75.471  ops/s
MyBenchmark.testExecuteCastBeforeGregorian   thrpt   25      3157.668 ±     46.166  ops/s
MyBenchmark.testExecuteTableAfterGregorian   thrpt   25      1040.168 ±     29.996  ops/s
MyBenchmark.testExecuteTableBeforeGregorian  thrpt   25      1039.311 ±     34.864  ops/s
MyBenchmark.testGetDTOAfterGregorian         thrpt   25   2466225.354 ±  46751.922  ops/s
MyBenchmark.testGetDTOBeforeGregorian        thrpt   25   1134882.866 ±   7121.619  ops/s
MyBenchmark.testGetTimestampAfterGregorian   thrpt   25   2496906.544 ±  20329.848  ops/s
MyBenchmark.testGetTimestampBeforeGregorian  thrpt   25   1134950.349 ±   7908.037  ops/s
MyBenchmark.testSetDTO                       thrpt   25  24578113.784 ± 205272.912  ops/s
MyBenchmark.testSetTimestamp                 thrpt   25  24678204.585 ± 117714.068  ops/s

After my changes:

Benchmark                                     Mode  Cnt         Score        Error  Units
MyBenchmark.testExecuteCastAfterGregorian    thrpt   25      3110.902 ±     82.482  ops/s
MyBenchmark.testExecuteCastBeforeGregorian   thrpt   25      3159.163 ±     34.234  ops/s
MyBenchmark.testExecuteTableAfterGregorian   thrpt   25      1009.815 ±     43.995  ops/s
MyBenchmark.testExecuteTableBeforeGregorian  thrpt   25       985.471 ±     51.050  ops/s
MyBenchmark.testGetDTOAfterGregorian         thrpt   25    448032.106 ±   7980.397  ops/s
MyBenchmark.testGetDTOBeforeGregorian        thrpt   25    438564.948 ±  17780.253  ops/s
MyBenchmark.testGetTimestampAfterGregorian   thrpt   25    449824.718 ±   6852.955  ops/s
MyBenchmark.testGetTimestampBeforeGregorian  thrpt   25    453920.562 ±  22323.679  ops/s
MyBenchmark.testSetDTO                       thrpt   25  24503831.627 ± 214236.776  ops/s
MyBenchmark.testSetTimestamp                 thrpt   25  24520602.552 ± 174342.838  ops/s

The meaningful results are from the testGet... methods. They're slower after my changes because of the additional logic to compensate for the difference in offset values between Calendar's offset values and ZonedDateTime's offset values, even though we know that ZonedDateTime should be faster than Calendar normally.

Also, having to do this conversion from the offset data from Calendar means the driver isn't completely moving away from relying on Calendar objects anymore, so there's less reason to move forward with this change.

I'm not 100% certain if I'm doing everything correctly, but so far my findings show that replacing Calendar to ZonedDateTime in the driver is not reasonable. However, going back to the original
issue that @LavaBear has created (and also the other issue #1088), I think we should be able to fix SQLServerResultSet.getObject(int, LocalDateTime) case so that we improve the performance / fix date cases before Gregorian cutoff without having to deal with the problems I mentioned above (to clarify, that means I will be fixing issue #1070 and #1088 without replacing Calendar with ZonedDateTime in most parts of the driver). Does that sound reasonable, or does anyone have any other suggestions? Thanks a lot for everyone's input on this issue.

@marschall
Copy link
Contributor

1. ZonedDateTime is the equivalent of a continuous Calendar object, but the driver was relying on a non-continuous calendar most of the time for the conversion. This means that for the dates before the Gregorian cutoff, the driver will need additional logic to replicate the original behavior of the non-continuous Calendar object if the driver were to completely migrate away from using Calendar objects for temporal data type conversion (I haven't been able to figure out how to do this as it seems the offset dates fluctuate depending on the actual year, but presumably this will be buggy and performance degrading).

Are you sure the old behavior is required and intended rather than a bug? My understanding is SQL Server is proleptic and java.time is proleptic so there shouldn't be a problem. We had a similar issue in pgjdbc pgjdbc/pgjdbc#1534, turned out the code was buggy and never worked. Now the driver simply returns the database values.

@gordthompson
Copy link
Contributor

gordthompson commented Nov 6, 2019

I agree with @marschall . When retrieving a datetime2 column value directly to LocalDateTime the LocalDateTime value should simply be the exact same value as stored in the table, even if it is for a "non-existent" date like 1582-10-10. The current implementation does not handle this correctly because it used a (default) non-continuous Gregorian Calendar to interpret the Timestamp value as UTC. (My bad.)

@peterbae
Copy link
Contributor

peterbae commented Nov 6, 2019

Thanks @marschall @gordthompson, I agree with this assessment as well. I think that means we should be able to solve the first problem easily. Do we have a way to deal with the second problem? In the meantime, I'll be looking into fixing the original issues from #1070 and #1088.

@gordthompson
Copy link
Contributor

From my testing, it seems like the Calendar object's timezone offset always follows the International Standard Time (IST) rules, whereas ZonedDateTime (and Instant) doesn't necessarily follow IST.

...

I think it's possible to solve the offset value problem by finding the expected offset value from Calendar, compare that to the offset value given by ZonedDateTime, then padding the difference to ZonedDateTime object

I would argue that it is not necessary to ensure that values unpacked to a Calendar object and values unpacked to a ZonedDateTime object should yield the same results because they are different things. As long as the behaviour for legacy date/time objects (Calendar, java.sql.Timestamp) objects does not change — for backward compatibility — the ZonedDateTime value can be whatever java.time thinks it should be.

In other words, if datetime2 values are always initially unpacked to LocalDateTime and then, if desired, converted to java.sql.Timestamp (via a user-supplied Calendar) or to java.time.Instant (via a user-specified time zone) then mssql-jdbc would be "playing by the rules" and any discrepancies between Calendar and ZonedDateTime values would be something for the Java team to explain.

@cogman
Copy link
Contributor

cogman commented Nov 6, 2019

I'm with @gordthompson on this one.

I think as far as bugs go, while there may be some cases where this causes issues, I can't imagine there are very many applications that care about the exact timestamp of a date from before 1884. Doesn't really seem like something you'd run into all that often.

I know I'll probably eat these words though as I'm sure some legal document engine is critically dependent on the exact minute some 14th century document was signed.

@peterbae
Copy link
Contributor

peterbae commented Nov 7, 2019

Thanks everyone for your ideas. I agree that we don't necessarily have to make sure that Calendar and ZonedDateTime objects have to yield the same results. What I meant to convey was that ultimately there's little point in modifying the code inside DDC.convertTemporalToObject to replace all Calendar instances with ZonedDateTime instances, because due to the difference in how offsets are calculated before 1884, we still need to rely on the existing Calendar object. Also, having to re-adjust the ZonedDateTime object with the difference in offset value is costly (as shown in the JMH benchmark data I posted above) since it seems like we will need to return java.sql.Timestamp value pretty often from the method. As for @cogman's point regarding the dates before 1884 not being used often, our test suites contain data that goes before 1884 (also the other issue #1088 specifically deals with data before 1582), so we don't want the driver to retrieve incorrect data for the user.

To summarize, I'm suggesting that I should fix the original issue that was brought up by @LavaBear instead of doing a complete overhaul of how we send/retrieve temporal data. If we really want to replace all instances of Calendar instances with ZonedDateTime in the driver (even though it seems like we will need to rely on Calendar API to some degree), then we should track it in a separate issue. As for how we will fix this issue regarding performance when calling rs.getObject(int, LocalDateTime.class), we could simply go with the original suggestion from @gordthompson because it looks like the Calendar object that gets created is only queried for its timezone value without any modifications, and it shouldn't pose any concurrency issues. Please let me know if there's a better solution that doesn't involve a global GregorianCalendar instance or making major changes to the driver's logic.

@gordthompson
Copy link
Contributor

gordthompson commented Nov 8, 2019

I don't think you should be looking to simply replace Calendar with ZonedDateTime in the existing code. I think you should be looking to eliminate the whole notion of time zones wherever possible. That would mean unpacking datetime2 columns directly to LocalDateTime, and unpacking datetimeoffset columns directly to OffsetDateTime. Calendar may still have to be used for things like getTimestamp("colname", Calendar), but in those cases the user would be supplying their own Calendar anyway.

The following POC code illustrates what I mean. It shows that with java.time we can directly convert TDS bytes to LocalDateTime for the different datetime2(precision) values, at least for a reasonably current date:

package com.example.sandbox;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDateTime;

public class SandboxMain {

    public static void main(String[] args) throws Throwable {
        for (int precision = 0; precision <= 7; ++precision) {
            decodeDatetime2ForPrecision(precision);
        }
    }
    
    private static void decodeDatetime2ForPrecision(int precision) {
        byte[] tdsByteArray = getDatetime2BytesFromTdsPayload(precision);
        
        // decode datetime2 bytes as per TDS specification
        //
        // https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/786f5b8a-f87d-4980-9070-b9b7274c681d

        Integer multiplier = null;  // for conversion to nanoseconds 
        switch (precision) {
            case 0: multiplier = 1_000_000_000; break;
            case 1: multiplier = 100_000_000; break;
            case 2: multiplier = 10_000_000; break;
            case 3: multiplier = 1_000_000; break;
            case 4: multiplier = 100_000; break;
            case 5: multiplier = 10_000; break;
            case 6: multiplier = 1_000; break;
            case 7: multiplier = 100; break;
        }
        
        System.out.printf("Precision = %d%n", precision);
                
        int timeLength = tdsByteArray.length - 3;  // DATE part is always 3 bytes
        ByteBuffer bb = ByteBuffer.allocate(12);
        bb.put(tdsByteArray, 0, timeLength);  // TIME part
        bb.position(8);
        bb.put(tdsByteArray, timeLength, 3);  // DATE part
        bb.position(0);
        bb.order(ByteOrder.nativeOrder());
        
        long fractionalSeconds = bb.getLong();
        System.out.printf("  TIME part: %d%n", fractionalSeconds);
        
        int days = bb.getInt();
        System.out.printf("  DATE part: %d%n", days);
        
        LocalDateTime ldt = LocalDateTime.of(1, 1, 1, 0, 0)
                .plusDays(days)
                .plusNanos(fractionalSeconds * multiplier);
        System.out.printf("  LocalDateTime: %s%n", ldt.toString());
    }
    
    private static byte[] getDatetime2BytesFromTdsPayload(int precision) {
        // --- Test Data ---
        //
        // test case: SELECT CAST('2017-01-01 01:23:45.1' AS DATETIME2(precision))
        //
        byte[] tdsBytes = null;  // actual TDS bytes from server (via Wireshark)
        switch (precision) {
            case 0:
                tdsBytes = createByteArray("a1 13 00 49 3c 0b");
                break;
            case 1:
                tdsBytes = createByteArray("4b c4 00 49 3c 0b");
                break;
            case 2:
                tdsBytes = createByteArray("ee aa 07 49 3c 0b");
                break;
            case 3:
                tdsBytes = createByteArray("4c ad 4c 00 49 3c 0b");
                break;
            case 4:
                tdsBytes = createByteArray("f8 c4 fe 02 49 3c 0b");
                break;
            case 5:
                tdsBytes = createByteArray("b0 b1 f3 1d 00 49 3c 0b");
                break;
            case 6:
                tdsBytes = createByteArray("e0 f0 84 2b 01 49 3c 0b");
                break;
            case 7:
                tdsBytes = createByteArray("c0 68 31 b3 0b 49 3c 0b");
                break;
        }
        return tdsBytes;
    }

    private static byte[] createByteArray(String s) {
        String[] stringArray = s.split(" ");
        byte[] byteArray = new byte[stringArray.length];
        for (int i = 0; i < stringArray.length; ++i) {
            byteArray[i] = (byte) Integer.parseInt(stringArray[i], 16);
        }
        return byteArray;
    }   

}

Console output:

Precision = 0
  TIME part: 5025
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45
Precision = 1
  TIME part: 50251
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 2
  TIME part: 502510
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 3
  TIME part: 5025100
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 4
  TIME part: 50251000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 5
  TIME part: 502510000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 6
  TIME part: 5025100000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100
Precision = 7
  TIME part: 50251000000
  DATE part: 736329
  LocalDateTime: 2017-01-01T01:23:45.100

@gordthompson
Copy link
Contributor

BTW, I just did another test using a datetime2 value from the year 1066 and that unpacked correctly as well.

@peterbae
Copy link
Contributor

peterbae commented Dec 3, 2019

Hi all,

I've made PR #1200 that addresses this issue. I was able to eliminate the usage of Calendar when it comes to most getter cases, except for the cases where the user provides their own Calendar instance (this automatically implies the new code doesn't affect DateTimeOffset scenarios, since DateTimeOffset is always associated with a UTC timezone Calendar). I've also optimized the getter scenarios for LocalDateTime/LocalDate/LocalTime.

I've updated my JMH test suite to cover all of the data conversion scenarios. The results can be found here (dev) and here (with current change). The data shows that we see a significant performance improvement whenever there is no Calendar object involved.

The reason why I decided to use the old code path when a custom Calendar object is involved is because I couldn't quite get all of the edge cases correct without using the Calendar that was provided by the user (whenever it is provided), and my implementation wasn't very optimized to a point where it was performing worse compared to simply relying on the old code path. Please let me know if there's a way to completely eliminate the use of Calendar object even when the user has provided their own Calendar instance.

I'm looking to see if I can apply the same changes to the setter scenarios so that we can optimize the performance to both getter/setter scenarios (and ideally make it into the next preview). In the meantime, if anyone has any suggestions/comments, please let me know.

@peterbae
Copy link
Contributor

8.1.1 Preview Release is out, which contains this fix. Please give the preview a try, and let me know of any further possible improvements that could be made. Otherwise, I will close this issue in due time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement An enhancement to the driver. Lower priority than bugs.
Projects
None yet
Development

No branches or pull requests

8 participants