
Improve loadRecords performance in SQLiteNormalizedCache #1519

Conversation

@bharath2020 (Contributor) commented Nov 15, 2020

Motivation:

We noticed that the performance of loadRecords on SQLiteNormalizedCache degrades, especially when a query field has arguments, which makes the CacheKey strings longer. In our measurements, loading a nominal 200 records from the SQLite store takes around 4.2 seconds, and the fix in this PR decreases the time consumed by loadRecords to about 0.8 seconds, more than a 4x improvement.

Here is a screenshot of the Time Profiler trace indicating the time spent in the == infix operator:

[Screenshot: Time Profiler trace, Nov 13 2020]

What is the root cause of the performance issue?

To answer that, let us take a look at the existing code in the loadRecords method:

let recordsOrNil: [Record?] = keys.map { key in                        // Loop over the N requested keys
    if let recordIndex = records.firstIndex(where: { $0.key == key }) { // Linear scan over N records for each key
        return records[recordIndex]
    }
    return nil
}

Here are a few problems with the above code:

  1. The first two lines of the above code have O(N^2) complexity, and that is before accounting for the cost of each key comparison using ==.
  2. The stack trace from the Time Profiler in the attached screenshot shows that the key comparison does not use a hash; each pair of strings is compared for equality character by character. This multiplies the O(N^2) lookups by another factor proportional to the length of the CacheKey.

The performance degradation is most evident in queries whose fields take arguments, as in the example below:

type Query {
  searchEmployees(organizationId: String!): [Employee]
}

For the query searchEmployees(organizationId: "<16 char UUID>"), a sample CacheKey for a record would be QUERY_ROOT.searchEmployees(organizationId: "<16 char UUID>").<EmployeeField>. Comparing such keys becomes even more time-consuming as the CacheKey length grows, which is exactly what happened in our case.
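
To make the cost concrete, here is a minimal standalone sketch (not Apollo code; the record type, key format, and record count are illustrative stand-ins, scaled to 5,000 records so the difference shows up in a quick run) comparing the linear scan above against a hash-based index for long, argument-bearing keys:

import Foundation

// Illustrative stand-in; Apollo's Record/CacheKey types are not used here.
struct FakeRecord { let key: String }

let argument = UUID().uuidString
let records: [FakeRecord] = (0..<5_000).map { index in
    FakeRecord(key: "QUERY_ROOT.searchEmployees(organizationId: \"\(argument)\").employee\(index)")
}
let keys = records.map { $0.key }.shuffled()

// O(N^2): a linear scan per requested key, each comparing long strings for equality.
var start = Date()
let viaScan: [FakeRecord?] = keys.map { key in
    records.first(where: { $0.key == key })
}
print("linear scan:", Date().timeIntervalSince(start), "found:", viaScan.compactMap { $0 }.count)

// O(N): build a key-to-index dictionary once, then do hash-based lookups.
start = Date()
let indexByKey = Dictionary(uniqueKeysWithValues: records.enumerated().map { ($1.key, $0) })
let viaIndex: [FakeRecord?] = keys.map { key in
    indexByKey[key].map { records[$0] }
}
print("index lookup:", Date().timeIntervalSince(start), "found:", viaIndex.compactMap { $0 }.count)

The absolute numbers will vary by machine; the point is only that the scan cost grows quadratically with the number of records while the index approach grows linearly.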

Solution:

// Map each CacheKey to the index of its first matching record;
// iterating the indices in reverse means the earliest index wins for duplicate keys.
let recordsIndexMap = records.indices.reversed().reduce(into: [:]) { resultMap, index in
    resultMap[records[index].key, default: index] = index
}

// One hash-based lookup per requested key; nil where no record matches.
let recordsOrNil: [Record?] = keys.map { key in
    recordsIndexMap[key].flatMap { records[$0] }
}

This reduces the complexity of the snippet to O(N) in two steps:

  • Convert the list of records [Record] returned by the selectRecords method into a dictionary recordsIndexMap whose keys are CacheKeys and whose values are the indices of the corresponding records in [Record]. This gives an O(1) lookup of the matching Record for a given CacheKey.
  • Construct recordsOrNil by looking up each requested key in recordsIndexMap, making the entire snippet O(N) and reducing the overall complexity from O(N^2) to O(N) (see the sketch below).
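
For completeness, here is a small self-contained sketch of the same index-map technique (using a stand-in record type rather than the real Apollo Record), showing that the output preserves the order of the requested keys, returns nil for misses, and prefers the first occurrence of a duplicate key:

typealias CacheKey = String

// Stand-in for Apollo's Record; only the key matters for ordering.
struct StubRecord { let key: CacheKey; let name: String }

func order(_ records: [StubRecord], by keys: [CacheKey]) -> [StubRecord?] {
    // Map each CacheKey to the index of its first occurrence in `records`.
    // Iterating the indices in reverse and overwriting means the earliest index wins.
    let recordsIndexMap = records.indices.reversed().reduce(into: [CacheKey: Int]()) { resultMap, index in
        resultMap[records[index].key] = index
    }
    // One O(1) hash lookup per requested key; nil where no record matches.
    return keys.map { key in
        recordsIndexMap[key].map { records[$0] }
    }
}

let fetched = [
    StubRecord(key: "QUERY_ROOT", name: "root"),
    StubRecord(key: "Employee:1", name: "first"),
    StubRecord(key: "Employee:1", name: "duplicate"),
]
let ordered = order(fetched, by: ["Employee:1", "Employee:2", "QUERY_ROOT"])
print(ordered.map { $0?.name ?? "nil" })  // ["first", "nil", "root"]

The dictionary build is O(N) and each lookup is O(1) on average, so each key is hashed once rather than compared against up to N other keys.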

@bharath2020 (Contributor, Author)

Looks like the tests are a bit flaky. Also, I do not have permission to re-run them. Please advise.

@designatednerd (Contributor)

Yes, the tests are extremely flaky - I'll kick them.

That's awesome that it seems like it helps performance a bunch - can you explain a bit about why this improves performance? Would also love to hear thoughts from @martijnwalraven, who's spent the last several weeks poking around in the cache.

@bharath2020 (Contributor, Author)

@designatednerd I have updated the original PR description (so future readers find it in one place) to explain the root cause of the problem and how I solved it. Hope it explains things well.

Co-authored-by: TizianoCoroneo <tizianocoroneo@me.com>
@bharath2020 force-pushed the fix/load-records-performance-issue branch from fd31996 to 293dd3b on November 16, 2020 at 14:07
@designatednerd (Contributor)

Awesome, thanks so much for adding the detailed explanation! Again, going to defer to @martijnwalraven for final approval but the explanation helps a ton.

@martijnwalraven (Contributor)

@bharath2020 Good catch! This looks all right to me, so I think we should merge it.

I'll be working on larger changes to query execution and caching though, and I think we may be able to eliminate this step completely. The only reason the returned records have to be ordered by the queried keys is because that is currently a requirement of the way DataLoader batch loads records. And since I'll be overhauling data loading, this seems like a good time to change that.

There are also a number of other performance issues that I hope to fix by these changes. If you have any time to work on this, it would be great to have more performance tests in place to test the changes against. I recently added a first test case for parsing, but we have no realistic tests of data loading. And to be honest, 0.8 seconds still seems pretty high to load 200 records from a local database, so I suspect there is more we can do.
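
As a rough illustration of the shape such a test could take (this is not Apollo test code; the record type, key format, and sizes are stand-ins, and a real test would seed a SQLiteNormalizedCache fixture and call loadRecords instead), an XCTest measure block over the key-ordering step might look like:

import Foundation
import XCTest

final class LoadRecordsOrderingPerformanceTests: XCTestCase {
    // Stand-in for Apollo's Record; only the key matters here.
    struct StubRecord { let key: String }

    func testOrderingManyRecordsByLongKeys() {
        let argument = UUID().uuidString
        let records = (0..<200).map { index in
            StubRecord(key: "QUERY_ROOT.searchEmployees(organizationId: \"\(argument)\").employee\(index)")
        }
        let keys = records.map { $0.key }.shuffled()

        // measure reports the average time; a baseline can be recorded in Xcode
        // so that future regressions fail the test.
        measure {
            let indexByKey = Dictionary(uniqueKeysWithValues: records.enumerated().map { ($1.key, $0) })
            let ordered: [StubRecord?] = keys.map { key in indexByKey[key].map { records[$0] } }
            XCTAssertEqual(ordered.count, keys.count)
        }
    }
}

A more realistic version would seed an on-disk store with a few thousand records and measure loadRecords end to end, which would also capture the SQLite and record-deserialization costs that this microbenchmark leaves out.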

@bharath2020 (Contributor, Author) commented Nov 17, 2020

> @bharath2020 Good catch! This looks all right to me, so I think we should merge it.

Thanks.

> There are also a number of other performance issues that I hope to fix by these changes. If you have any time to work on this, it would be great to have more performance tests in place to test the changes against.

I will see what I can do.

> And to be honest, 0.8 seconds still seems pretty high to load 200 records from a local database, so I suspect there is more we can do.

Definitely, 0.8 seconds is still too high. I did follow up after the fix to see where the time is being spent, but at this point it is still not clear. I will keep looking into it actively, as it is something of a blocker for us.

@bharath2020 (Contributor, Author)

@designatednerd Looks like this PR is good to merge. Let me know if there is anything else I need to take care of. Thanks.

@designatednerd (Contributor)

Waiting on last looks from @martijnwalraven - once I merge this I'll work on cutting a new version; I think it's time.
