-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Performance Issues reading LocalDate,LocalDateTime, and LocalTime #1070
Comments
Hi @LavaBear, the team will look into this. |
If creating Calendar objects is particularly expensive then perhaps a SQLServerResultSet object should have a single instance of a proleptic (BTW, this issue is not a bug. It is an enhancement request.) |
@gordthompson Care should be taken in the fact that the Calendar object is not thread safe. Ideally, the driver would switch away from using the calendar object and instead look at using ZonedDateTime or straight local dates. If my reading of the TDS standard is correct, than a SQL "Date" ends up on the wire as the integer "number of days since 1-1-1" That would be a trivial conversion to localDates because it would simply be "LocalDate.of(1,1,1).addDays(tdsIntValue);" (or, whatever that API is. The new system is much faster and thread safe to boot. Converting to java.sql.Date is also pretty simple because it is just a "Date.from(ZonedDateTime.toInstant())" away. |
EDIT: sorry, the conversion from DATETIME2 is not supported :P There is a handy method It would be better to use it for LocalDate/Time. Something like... } else if (type == java.time.LocalDateTime.class) {
microsoft.sql.DateTimeOffset dateTimeOffset = getDateTimeOffset(columnIndex);
if (dateTimeOffset == null) {
returnValue = null;
} else {
returnValue = dateTimeOffset.getOffsetDateTime().atZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
} |
@cogman - Good points. If the driver now requires Java_8+ then perhaps datetime2 values should really be unpacked from the TDS payload as LocalDateTime objects (because that's what they really are) and then converted to java.sql.Timestamp values if needed. |
Hi @LavaBear, @gordthompson , @cogman , @harawata , I went through the discussion briefly and wanted to clarify a few things.
To speed up the process, please feel free to create an initial PR or post any other recommendations that we should keep in mind when investigating the proposal. |
re: performance Here's a simple benchmark to compare the results of unpacking a datetime2 value. // Test Data - TDS protocol delivers datetime2 values as two integers:
// 1) the number of days since 0001-01-01, and
// 2) the number of 100-nanosecond "ticks" since the beginning of the current day
Long dateValue = 737240L; // 2019-07-01
Long timeValue = 432000000000L; // 12:00:00
//
int loopLimit = 1000;
Long t0 = null; // loop timer start time
// the old way
t0 = System.nanoTime();
GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
ZoneId utcZone = ZoneId.of("UTC");
for (int i = 0; i < loopLimit; i++) {
cal.clear();
cal.setGregorianChange(foreverAgo); // proleptic
cal.set(1, 0, 1, 0, 0, 0); // 0001-01-01
cal.add(GregorianCalendar.DAY_OF_YEAR, dateValue.intValue());
cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
LocalDateTime theOldWay = LocalDateTime.ofInstant(cal.toInstant(), utcZone);
}
System.out.printf("(old) java.util: %d ms for %d iterations%n",
(System.nanoTime() - t0) / 1000000,
loopLimit);
// (old) java.util: 279 ms for 1000 iterations
// the new way
t0 = System.nanoTime();
for (int i = 0; i < loopLimit; i++) {
LocalDateTime theNewWay = java.time.temporal.ChronoUnit.DAYS.addTo(
java.time.temporal.ChronoUnit.NANOS.addTo(
dt2Base,
timeValue * 100L),
dateValue);
}
System.out.printf("(new) java.time: %d ms for %d iterations%n",
(System.nanoTime() - t0) / 1000000,
loopLimit);
// (new) java.time: 23 ms for 1000 iterations I ran the test several times and the comments showing the console output are representative values from my (very old) test machine. |
Here are some answers to your question
Validates his claim of being around 5->6x faster.
|
Thanks to @cogman for confirming that unpacking the ... if we ultimately want a But if we ultimately want a // Test Data - TDS protocol delivers datetime2 values as two integers:
// 1) the number of days since 0001-01-01, and
// 2) the number of 100-nanosecond "ticks" since the beginning of the current day
Long dateValue = 737240L; // 2019-07-01
Long timeValue = 432000000000L; // 12:00:00
//
int loopLimit = 1000;
Long t0 = null; // loop timer start time
// the old way
t0 = System.nanoTime();
GregorianCalendar cal = new GregorianCalendar(new SimpleTimeZone(0, "UTC"), Locale.US);
java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
ZoneId utcZone = ZoneId.of("UTC");
for (int i = 0; i < loopLimit; i++) {
cal.clear();
cal.setGregorianChange(foreverAgo); // proleptic
cal.set(1, 0, 1, 0, 0, 0); // 0001-01-01
cal.add(GregorianCalendar.DAY_OF_YEAR, dateValue.intValue());
cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
java.sql.Timestamp tsOld = new java.sql.Timestamp(cal.getTimeInMillis());
}
System.out.printf("(old) java.util: %d ms for %d iterations%n",
(System.nanoTime() - t0) / 1000000,
loopLimit);
// (old) java.util: 251 ms for 1000 iterations
// the new way
t0 = System.nanoTime();
LocalDateTime dt2Base = LocalDateTime.of(1, 1, 1, 0, 0);
for (int i = 0; i < loopLimit; i++) {
LocalDateTime theNewWay = java.time.temporal.ChronoUnit.DAYS.addTo(
java.time.temporal.ChronoUnit.NANOS.addTo(
dt2Base,
timeValue * 100L),
dateValue);
java.sql.Timestamp tsNew = java.sql.Timestamp.from(theNewWay.toInstant(ZoneOffset.UTC));
}
System.out.printf("(new) java.time: %d ms for %d iterations%n",
(System.nanoTime() - t0) / 1000000,
loopLimit);
// (new) java.time: 360 ms for 1000 iterations If we wanted our |
@gordthompson I believe your benchmark might be falling prey to some common java microbenchmarking problems. (No warmup, not consuming the result, etc). If I were to guess, your method isn't getting fully optimized since it isn't "hot" enough. I've ran your scenario through JMH tests and I'm getting different results (but not really surprising).
It isn't surprising because the 2 extra steps to create a Timestamp boil down to a couple of additions and multiplications. The calendar code has a ton of extra checks. It is pretty intense when you start reading everything that goes into "getTimeInMillis" on the cal object. Here is the code I benchmarked with
Also, here is an excellent video by Aleksey Shipilev (Author of JMH and a major player in Java performance) about microbenchmark pitfalls. |
@cogman - Thanks for the tips. Clearly the Java compiler and JRE are a lot more clever than I gave them credit for. Hoping to provide more meaningful input to this discussion, I tried to set up a JMH project in Eclipse using
Based on this Stack Overflow answer if I hack pom.xml to change ... <plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>2.4</version>
</plugin> ... to ... <plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>2.4</version>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>lib/</classpathPrefix>
<mainClass>org.sample.MyBenchmark</mainClass>
</manifest>
</archive>
</configuration>
</plugin> (or variations on that theme) I consistently get
Oh well. I tried. |
@gordthompson This is maybe getting off topic. But the problem is that "org.sample.MyBenchmark" doesn't contain an actual main static method. In order to make this run, you need to hit the JMH main class. ( org.openjdk.jmh.Main ) It will do the scanning to find the benchmark classes for you. You can either do that by adding a manifest or by directing java to run that class directly ( It has been a while since I've set this up, but attached is the entirety of my pom file for running jmh. Few key points, it pulls out jar signing, because it is making an uberjar. Instead of using the jar plugin to write the manifest it is doing that as part of the shading plugin. |
@cogman - Thanks again. The build was producing two separate jar files and I was simply trying to run the wrong one. (D'oh!) Running your latest tests on my very-old-and-slow test box yielded
If I'm reading that right, UTC_Timestamp via LocalDateTime is actually more like twice as fast as UTC_Timestamp via GregorianCalendar on my test box. Very different from my previous naïve attempt at benchmarking. Still, it's not really a fair fight. The GregorianCalendar approach has to deal with the baggage of a Calendar object while the LocalDateTime approach just uses a fixed offset (UTC+0) and avoids the whole "messy rest-of-the-world time zone" thing. Since the code needs to be able to produce a Timestamp in any time zone (not just UTC) a "better" comparison would be to create the Timestamp in the JVM default time zone (which for me happens to be America/Denver) ... package org.sample;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.*;
@State(Scope.Thread)
public class MyBenchmark {
private static long dateValue = 737240L;
private static long timeValue = 432000000000L;
private static LocalDateTime EPOCH = LocalDateTime.of(1, 1, 1, 12, 0);
private GregorianCalendar cal = new GregorianCalendar(); // default JVM time zone
private java.util.Date foreverAgo = new java.util.Date(Long.MIN_VALUE);
@Benchmark
public void test_read_local_date_reuse_calendar(Blackhole bh) {
bh.consume(makeReusingCalendar());
}
private Timestamp makeReusingCalendar() {
cal.clear();
cal.setGregorianChange(foreverAgo); // proleptic
cal.set(1, 0, 1, 0, 0, 0); // 0001-01-01
cal.add(GregorianCalendar.DAY_OF_YEAR, (int)dateValue);
cal.add(GregorianCalendar.MILLISECOND, (int) (timeValue / 10000L));
return new Timestamp(cal.getTimeInMillis());
}
@Benchmark
public void test_read_local_date_new(Blackhole bh) {
bh.consume(Timestamp.valueOf(EPOCH.plusDays(dateValue)
.plusNanos(timeValue * 100)));
}
} ... and the results I see are ...
Again, if I'm reading things correctly, the LocalDateTime approach took a hit (from 211k ops/s to 144k ops/s) and so did the GregorianCalendar approach (from 95k ops/s to 74k ops/s) but the LocalDateTime approach still seems to be roughly twice as fast. |
@ulvii I think us customers at least are all in agreement. Switching over would generally be a good improvement for the driver both from a performance standpoint and from a ergonomic standpoint in the codebase. If I get some time today I might dive in a bit and see if I can bang something out (or at least how big of an impact it would be). Do you have any more questions for us? |
Hi all, I wrote a quick and dirty patch for benchmarking. I also created a JMH project, but am not sure if I did it right (I haven't gotten any consistent result probably because there are many other processes running on the same machine). |
Hi everyone, |
Hi @gordthompson, @cogman, thanks for your input. I've recently starting looking into this issue, and I've set up my own JMH environment to measure the performance difference between relying on GregorianCalendar vs LocalDateTime. The results reflect the same findings we've discussed in this thread (ultimately, utilizing GregorianCalendar can be anywhere between 2~5 times slower than the LocalDateTime alternative, depending on the context). Therefore, I agree that we should move away from relying on GregorianCalendar in the driver, and I'm considering applying this change to the entire driver, wherever applicable. However, I wanted to clarify one thing - it's the re-construction of the GregorianCalendar that is slow (setting the time values after calling .clear()), and simply retrieving data from the GregorianCalendar instance is fast (i.g. retrieve the TimeZone information). If the client has already provided a GregorianCalendar instance (along with the timezone information) to the driver, then I believe there shouldn't be a problem with using that instance to insert/select temporal objects. This means that the driver only needs to replace instances in the driver where GregorianCalendar instances are being created (or being re-constructed), without having to modify most of the public/internal APIs. Is my understanding of the scope of the problem correct, or does anyone think that we should try to remove reliance on Calendar objects altogether (as much as we can) in the driver? (for example, perhaps try to replace the Calendar parameter in |
@peterbae Creation is definitely the slowest part of the Calendar classes. The big problem with the GregorianCalendar class is that it is HIGHLY thread-unsafe. So any usage of it could potentially trigger thread safety/race condition problems that would be hard to track down. That isn't to say that you couldn't use it in a thread safe manor, but that you'll need to inspect every method called on GregorianCalendar to evaluate whether or not it is mutating internal state (And I mean every method, for example, getTimeZone can mutate internal state). You could add synchronization around it, but that will add overhead that will be hard to capture in a JMH benchmark (because the JVM does neat and terrible things with locks that will hide actual costs). IMO, the safest thing to do is replace it outright rather than try to share instances of it across threads. The new localdate stuff is all thread safe (it is all immutable objects). Next safest and probably fastest (but potentially costly memory wise) would be thread local storage for the calendar object. |
Hi @cogman, thanks for the input. I agree that we should use as little of GregorianCalendar as possible. I'd like to replace all instances of Calendar in the driver, but the JDBC specs state that the driver needs to provide public APIs that accept Calendar objects (mostly from getters/setters for temporal objects in prepared/callable statements). Looking at how the driver actually uses the Calendar objects passed from the client, most of the time they're used for the timezone information. I'll look into extracting the necessary information from the Calendar object as soon as it's passed from the public API, while replacing as many instances of Calendar parameters with TimeZone parameters in internal methods. I'll also make sure that the driver never initializes any GregorianCalendar objects, either. It'll take a while, but I'll update everyone once the initial PR/design has been made. Please let me know if you have any other input in the meantime. |
Hi @gordthompson, @cogman , I've been working on converting all instances of Calendar to ZonedDateTime in the driver for a while. I have an initial implementation (https://github.com/peterbae/mssql-jdbc/tree/CalToZDT) that's still buggy when it comes to passing all of our internal testing. However, more importantly, I realized that this isn't feasible within the current implementation context of the driver. There's a couple of problems with completely replacing Calendar with ZonedDateTime:
I think it's possible to solve the offset value problem by finding the expected offset value from Calendar, compare that to the offset value given by ZonedDateTime, then padding the difference to ZonedDateTime object (example: dtv.java#677), but this introduces major performance degradation. Here's the performance data I've measured before/after my changes, using my JMH framework: Before my changes:
After my changes:
The meaningful results are from the Also, having to do this conversion from the offset data from Calendar means the driver isn't completely moving away from relying on Calendar objects anymore, so there's less reason to move forward with this change. I'm not 100% certain if I'm doing everything correctly, but so far my findings show that replacing Calendar to ZonedDateTime in the driver is not reasonable. However, going back to the original |
Are you sure the old behavior is required and intended rather than a bug? My understanding is SQL Server is proleptic and java.time is proleptic so there shouldn't be a problem. We had a similar issue in pgjdbc pgjdbc/pgjdbc#1534, turned out the code was buggy and never worked. Now the driver simply returns the database values. |
I agree with @marschall . When retrieving a |
Thanks @marschall @gordthompson, I agree with this assessment as well. I think that means we should be able to solve the first problem easily. Do we have a way to deal with the second problem? In the meantime, I'll be looking into fixing the original issues from #1070 and #1088. |
I would argue that it is not necessary to ensure that values unpacked to a In other words, if |
I'm with @gordthompson on this one. I think as far as bugs go, while there may be some cases where this causes issues, I can't imagine there are very many applications that care about the exact timestamp of a date from before 1884. Doesn't really seem like something you'd run into all that often. I know I'll probably eat these words though as I'm sure some legal document engine is critically dependent on the exact minute some 14th century document was signed. |
Thanks everyone for your ideas. I agree that we don't necessarily have to make sure that To summarize, I'm suggesting that I should fix the original issue that was brought up by @LavaBear instead of doing a complete overhaul of how we send/retrieve temporal data. If we really want to replace all instances of |
I don't think you should be looking to simply replace The following POC code illustrates what I mean. It shows that with package com.example.sandbox;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDateTime;
public class SandboxMain {
public static void main(String[] args) throws Throwable {
for (int precision = 0; precision <= 7; ++precision) {
decodeDatetime2ForPrecision(precision);
}
}
private static void decodeDatetime2ForPrecision(int precision) {
byte[] tdsByteArray = getDatetime2BytesFromTdsPayload(precision);
// decode datetime2 bytes as per TDS specification
//
// https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/786f5b8a-f87d-4980-9070-b9b7274c681d
Integer multiplier = null; // for conversion to nanoseconds
switch (precision) {
case 0: multiplier = 1_000_000_000; break;
case 1: multiplier = 100_000_000; break;
case 2: multiplier = 10_000_000; break;
case 3: multiplier = 1_000_000; break;
case 4: multiplier = 100_000; break;
case 5: multiplier = 10_000; break;
case 6: multiplier = 1_000; break;
case 7: multiplier = 100; break;
}
System.out.printf("Precision = %d%n", precision);
int timeLength = tdsByteArray.length - 3; // DATE part is always 3 bytes
ByteBuffer bb = ByteBuffer.allocate(12);
bb.put(tdsByteArray, 0, timeLength); // TIME part
bb.position(8);
bb.put(tdsByteArray, timeLength, 3); // DATE part
bb.position(0);
bb.order(ByteOrder.nativeOrder());
long fractionalSeconds = bb.getLong();
System.out.printf(" TIME part: %d%n", fractionalSeconds);
int days = bb.getInt();
System.out.printf(" DATE part: %d%n", days);
LocalDateTime ldt = LocalDateTime.of(1, 1, 1, 0, 0)
.plusDays(days)
.plusNanos(fractionalSeconds * multiplier);
System.out.printf(" LocalDateTime: %s%n", ldt.toString());
}
private static byte[] getDatetime2BytesFromTdsPayload(int precision) {
// --- Test Data ---
//
// test case: SELECT CAST('2017-01-01 01:23:45.1' AS DATETIME2(precision))
//
byte[] tdsBytes = null; // actual TDS bytes from server (via Wireshark)
switch (precision) {
case 0:
tdsBytes = createByteArray("a1 13 00 49 3c 0b");
break;
case 1:
tdsBytes = createByteArray("4b c4 00 49 3c 0b");
break;
case 2:
tdsBytes = createByteArray("ee aa 07 49 3c 0b");
break;
case 3:
tdsBytes = createByteArray("4c ad 4c 00 49 3c 0b");
break;
case 4:
tdsBytes = createByteArray("f8 c4 fe 02 49 3c 0b");
break;
case 5:
tdsBytes = createByteArray("b0 b1 f3 1d 00 49 3c 0b");
break;
case 6:
tdsBytes = createByteArray("e0 f0 84 2b 01 49 3c 0b");
break;
case 7:
tdsBytes = createByteArray("c0 68 31 b3 0b 49 3c 0b");
break;
}
return tdsBytes;
}
private static byte[] createByteArray(String s) {
String[] stringArray = s.split(" ");
byte[] byteArray = new byte[stringArray.length];
for (int i = 0; i < stringArray.length; ++i) {
byteArray[i] = (byte) Integer.parseInt(stringArray[i], 16);
}
return byteArray;
}
} Console output:
|
BTW, I just did another test using a datetime2 value from the year 1066 and that unpacked correctly as well. |
Hi all, I've made PR #1200 that addresses this issue. I was able to eliminate the usage of Calendar when it comes to most getter cases, except for the cases where the user provides their own Calendar instance (this automatically implies the new code doesn't affect DateTimeOffset scenarios, since DateTimeOffset is always associated with a UTC timezone Calendar). I've also optimized the getter scenarios for LocalDateTime/LocalDate/LocalTime. I've updated my JMH test suite to cover all of the data conversion scenarios. The results can be found here (dev) and here (with current change). The data shows that we see a significant performance improvement whenever there is no Calendar object involved. The reason why I decided to use the old code path when a custom Calendar object is involved is because I couldn't quite get all of the edge cases correct without using the Calendar that was provided by the user (whenever it is provided), and my implementation wasn't very optimized to a point where it was performing worse compared to simply relying on the old code path. Please let me know if there's a way to completely eliminate the use of Calendar object even when the user has provided their own Calendar instance. I'm looking to see if I can apply the same changes to the setter scenarios so that we can optimize the performance to both getter/setter scenarios (and ideally make it into the next preview). In the meantime, if anyone has any suggestions/comments, please let me know. |
8.1.1 Preview Release is out, which contains this fix. Please give the preview a try, and let me know of any further possible improvements that could be made. Otherwise, I will close this issue in due time. |
Driver version
7.2.1.jre8
SQL Server version
Microsoft SQL Server 2014 (SP3) (KB4022619) - 12.0.6024.0 (X64)
Sep 7 2018 01:37:51
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.3 (Build 9600: )
Client Operating System
Ubuntu 16.04
JAVA/JVM version
openjdk version "12" 2019-03-19
OpenJDK Runtime Environment (build 12+33)
OpenJDK 64-Bit Server VM (build 12+33, mixed mode, sharing)
Table schema
Problem description
Expected behaviour:
Actual behaviour:
Error message/stack trace:
Any other details that can be helpful:
Reproduction code
The text was updated successfully, but these errors were encountered: