SQLServerResultSet.findColumn(String columnName) should cache result #1055

stulscott · 2019-05-13T06:51:29Z

The performance of retrieving a value from SQLServerResultSet by column name could be vastly improved by caching the result of the findColumn(columnName) method. Since this is such a simple to implement but hugely beneficial change I am curious if there is a reason why this has not yet been implemented. I am happy to create a patch if it is agreed by the community here that it is desirable to add this feature to the driver.

rene-ye · 2019-05-13T16:00:29Z

Hi @stulscott, please feel free to create a PR against the repo and the team will review it in the near future.

rene-ye · 2019-05-22T20:28:05Z

Hi @stulscott, #1066 attempts to address this issue, please take a look to see if it's within expectations or even feel free to review it.

stulscott · 2019-05-24T02:46:33Z

Hi @stulscott, #1066 attempts to address this issue, please take a look to see if it's within expectations or even feel free to review it.

#1066 appears to only affect SQLServerCallableStatement. SQLServerResultSet.findColumn() still iterates through all columns on every call.

rene-ye · 2019-05-27T20:10:07Z

Hi @stulscott, I took a look at the implementation for findColumn() in SQLServerResultSet, and came to the conclusion that implementing caching would not always improve performance, and will degrade performance in some cases. The implementation as of now just iterates through the list of columns and does a string compare via a getter. If I understand what you want, you'd like the driver to cache column names (either as it iterates or all at once), and perform some kind of quick lookup (HashMap) instead of iterating through the column list every time. This approach might sometimes be faster but it leads to some issues.

Requires HashMap as well as a TreeMap. HashMap is incapable of doing both a case sensitive AND a case insensitive lookup.
Requires more memory. If we were to implement both the maps, in worst case scenarios the driver would require 3x the memory it's currently using for column name storage.
The time it takes to hash/construct these maps may outweigh the current approach if findColumn() is not called many times.

There are also other open ended implementation decisions that impact the performance such as:

When do we populate the maps? Do we do it immediately after we receive all columns or do we do it on the first findColumns() call?
Do we populate the maps all at once or only the columns we've already iterated through? If not all at once, are cache misses going to significantly impact performance?

I'd like your opinion on the points above and on what solution you envisioned when creating the issue.

stulscott · 2019-05-29T05:27:59Z

Hi @rene-ye. I've attached a patch. Hopefully that answers your questions. Essentially you just cache the column name provided. Yes if the caller passes the column name with different cases it will be cached multiple times but that would seem a very unlikely scenario. There is a small cost to creating the hash map and caching the discovered indices but that will only be a problem if there is only one row in the result set. As soon as there are multiple rows and multiple columns being referenced by name, caching the indices drastically improves performance. In fact this is the main reason why I use JTDS instead of this driver.

column_to_index_cache.txt

rene-ye · 2019-05-30T19:15:27Z

Hi @stulscott, I've applied the changes you've requested to the PR, and the team will review whether it'll be merged. Please feel free to make comments/review the PR if something stands out to you.

ulvii · 2019-09-24T23:06:37Z

#1066 merged.

stulscott added the Question Used when a question is asked, as opposed to an issue being raised label May 13, 2019

rene-ye mentioned this issue May 22, 2019

Performance | Improved performance of column name string lookups #1066

Merged

rene-ye self-assigned this May 24, 2019

ulvii closed this as completed Sep 24, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQLServerResultSet.findColumn(String columnName) should cache result #1055

SQLServerResultSet.findColumn(String columnName) should cache result #1055

stulscott commented May 13, 2019

rene-ye commented May 13, 2019

rene-ye commented May 22, 2019

stulscott commented May 24, 2019

rene-ye commented May 27, 2019 •

edited

Loading

stulscott commented May 29, 2019

rene-ye commented May 30, 2019 •

edited

Loading

ulvii commented Sep 24, 2019

SQLServerResultSet.findColumn(String columnName) should cache result #1055

SQLServerResultSet.findColumn(String columnName) should cache result #1055

Comments

stulscott commented May 13, 2019

rene-ye commented May 13, 2019

rene-ye commented May 22, 2019

stulscott commented May 24, 2019

rene-ye commented May 27, 2019 • edited Loading

stulscott commented May 29, 2019

rene-ye commented May 30, 2019 • edited Loading

ulvii commented Sep 24, 2019

rene-ye commented May 27, 2019 •

edited

Loading

rene-ye commented May 30, 2019 •

edited

Loading