Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLServerResultSet.findColumn(String columnName) should cache result #1055

Closed
stulscott opened this issue May 13, 2019 · 7 comments
Closed
Assignees
Labels
Question Used when a question is asked, as opposed to an issue being raised

Comments

@stulscott
Copy link

The performance of retrieving a value from SQLServerResultSet by column name could be vastly improved by caching the result of the findColumn(columnName) method. Since this is such a simple to implement but hugely beneficial change I am curious if there is a reason why this has not yet been implemented. I am happy to create a patch if it is agreed by the community here that it is desirable to add this feature to the driver.

@stulscott stulscott added the Question Used when a question is asked, as opposed to an issue being raised label May 13, 2019
@rene-ye
Copy link
Member

rene-ye commented May 13, 2019

Hi @stulscott, please feel free to create a PR against the repo and the team will review it in the near future.

@rene-ye
Copy link
Member

rene-ye commented May 22, 2019

Hi @stulscott, #1066 attempts to address this issue, please take a look to see if it's within expectations or even feel free to review it.

@stulscott
Copy link
Author

Hi @stulscott, #1066 attempts to address this issue, please take a look to see if it's within expectations or even feel free to review it.

#1066 appears to only affect SQLServerCallableStatement. SQLServerResultSet.findColumn() still iterates through all columns on every call.

@rene-ye rene-ye self-assigned this May 24, 2019
@rene-ye
Copy link
Member

rene-ye commented May 27, 2019

Hi @stulscott, I took a look at the implementation for findColumn() in SQLServerResultSet, and came to the conclusion that implementing caching would not always improve performance, and will degrade performance in some cases. The implementation as of now just iterates through the list of columns and does a string compare via a getter. If I understand what you want, you'd like the driver to cache column names (either as it iterates or all at once), and perform some kind of quick lookup (HashMap) instead of iterating through the column list every time. This approach might sometimes be faster but it leads to some issues.

  1. Requires HashMap as well as a TreeMap. HashMap is incapable of doing both a case sensitive AND a case insensitive lookup.
  2. Requires more memory. If we were to implement both the maps, in worst case scenarios the driver would require 3x the memory it's currently using for column name storage.
  3. The time it takes to hash/construct these maps may outweigh the current approach if findColumn() is not called many times.

There are also other open ended implementation decisions that impact the performance such as:

  • When do we populate the maps? Do we do it immediately after we receive all columns or do we do it on the first findColumns() call?
  • Do we populate the maps all at once or only the columns we've already iterated through? If not all at once, are cache misses going to significantly impact performance?

I'd like your opinion on the points above and on what solution you envisioned when creating the issue.

@stulscott
Copy link
Author

Hi @rene-ye. I've attached a patch. Hopefully that answers your questions. Essentially you just cache the column name provided. Yes if the caller passes the column name with different cases it will be cached multiple times but that would seem a very unlikely scenario. There is a small cost to creating the hash map and caching the discovered indices but that will only be a problem if there is only one row in the result set. As soon as there are multiple rows and multiple columns being referenced by name, caching the indices drastically improves performance. In fact this is the main reason why I use JTDS instead of this driver.

column_to_index_cache.txt

@rene-ye
Copy link
Member

rene-ye commented May 30, 2019

Hi @stulscott, I've applied the changes you've requested to the PR, and the team will review whether it'll be merged. Please feel free to make comments/review the PR if something stands out to you.

@ulvii
Copy link
Contributor

ulvii commented Sep 24, 2019

#1066 merged.

@ulvii ulvii closed this as completed Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question Used when a question is asked, as opposed to an issue being raised
Projects
None yet
Development

No branches or pull requests

3 participants