Memory leak using query_method with large data sets #122
@bmclean Thanks for reporting this. Are you setting `limit_default`? Can you post some sample code and a sample query / payload?
Yes, we have limit_default at 100. Here is an example of our call:

```python
@Transaction.query_method(
    query_fields=('accountId', 'belowVersion', 'limit',),
    path='TransactionGetOlder/{accountId}/{belowVersion}/{limit}',
    user_required=True, name='getOlderTransactions',
    limit_default=100, limit_max=100,
    produce_cursors=True)
def TransactionGetOlder(self, query):
    return query.order(-Transaction.entityVersion)
```

(@dhermes Updated to eliminate overflow)

Here is the query:

Here is an example payload:
@bmclean Thanks for this. I was curious if you were using a reverse query; by using `order(-Transaction.entityVersion)`, you are. I'm not 100% sure about the reason for the large memory use. For large datasets, paging with limits and offsets really isn't workable. Instead, cursors are the way to enable paging.
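For context, cursor-based paging with ndb looks roughly like this (a sketch; the generator and its names are illustrative, not from the thread):

```python
from google.appengine.ext import ndb

def iterate_pages(query, page_size=100):
    """Walk a large result set page by page, resuming from a cursor."""
    cursor = None
    more = True
    while more:
        # Each page hands back a cursor that the next call resumes from,
        # instead of re-skipping everything before it with an offset.
        results, cursor, more = query.fetch_page(page_size, start_cursor=cursor)
        for entity in results:
            yield entity
```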
OK I tested (locally) the cache and I still don't understand why 5MB is used.

```python
from google.appengine.ext import ndb

class A(ndb.Model):
    a = ndb.StringProperty()

# 100+ entities already stored for the A model
ctx = ndb.get_context()
print 'Cache 1:', ctx._cache
A(a='zap').put()
print 'Cache 2:', ctx._cache
results, cursor, more = A.query().fetch_page(2)
print 'Cache 3:', ctx._cache
results, cursor, more = A.query().order(-A.a).fetch_page(2)
print 'Cache 4:', ctx._cache
```

So the above is printing the in-memory cache after a few operations. The results are as follows (I edited the datastore IDs for viewing):
It took me a while to find, but here is a reference that might help: http://stackoverflow.com/a/3566878/1068170
We just tried removing the reverse query and it didn't make a difference. One other thing we did try was to turn off both the in-memory cache and the memcache in our Transaction class:
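A minimal sketch of what disabling both caches on an ndb model looks like, assuming the standard `_use_cache` / `_use_memcache` class flags (the exact snippet is assumed):

```python
from google.appengine.ext import ndb

class Transaction(ndb.Model):
    # Skip the in-context (in-memory) cache and memcache for this model;
    # reads and writes go straight to the datastore.
    _use_cache = False
    _use_memcache = False
```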
and we still see the memory jump up with each query.
@bmclean Thanks for digging. If you do the query outside of `endpoints-proto-datastore`, do you still see the memory grow?
I executed the following 20 times using the remote_api_stub:
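Presumably this was the direct `fetch_page` call quoted later in the thread, along these lines (assumed; the snippet itself isn't shown here):

```python
from google.appengine.ext import ndb

# Assumed test snippet: the same ancestor query, driven via remote_api_stub.
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result, next_cursor, more = q.fetch_page(100, start_cursor=None)
```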
The memory of the Python process started at 104.8 MB and ended at 108.1 MB. When I make the request 20 times using the Google API Explorer, the memory of the Python process starts at 106.5 MB and ends at 199.8 MB.
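For reference, one way to take comparable measurements from inside the process (a sketch using the standard `resource` module, not the tool used in the thread):

```python
import resource

# Peak resident set size of the current process. On Linux, ru_maxrss is
# reported in kilobytes.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print 'Peak RSS: %.1f MB' % (peak_kb / 1024.0)
```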
Wow, thanks for this! I am totally fine to accept your fix. Do you want to send a pull request?
```python
# Works just fine
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result, next_cursor, more = q.fetch_page(100, start_cursor=None)
```

```python
# Happens in endpoints-proto-datastore
# query_info.cursor is None if not set from the request
items, next_cursor, more_results = query.fetch_page(
    request_limit, start_cursor=query_info.cursor,
    projection=['field1', 'field2'])
```

A possible culprit may be

```python
query_info = request_entity._endpoints_query_info
query_info.SetQuery()
```

Can you add

```python
import gc
potential = [obj for obj in gc.get_objects()
             if 'endpoints_proto_datastore' in type(obj).__module__]
```

into your code to see where the 5MB bump comes from?
Some more potentially helpful links:
We realized that running the query from the remote_api_stub really isn't the same thing. So we added the test query to one of our webapp2 controllers:

```python
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result, next_cursor, more = q.fetch_page(100, start_cursor=None)
```

and refreshed the page 20 times. This resulted in the memory climbing to 185 MB. However, if we change the query to a fetch (instead of fetch_page):

```python
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result = q.fetch(100)
```

the memory doesn't climb at all.
I read through all of https://code.google.com/p/googleappengine/issues/detail?id=9610 and I'm pretty sure this is what we have encountered. ISTM the bug is clearly in ndb.
We are defaulting produce_cursors to True in model.py so the default paging behaviour is unchanged. We then set produce_cursors=False in our API method decorators to explicitly disable it. Obviously this is a workaround for an ndb bug, not an issue in endpoints-proto-datastore. Based on this behaviour, we're going to avoid fetch_page calls with cursors in our web application as well.
How do you mean "defaulting produce_cursors to True in model.py"? Like as a module global, or defined on the classes themselves? Also, if you don't call `fetch_page` with cursors, how are you paging?
We added an argument to the query decorator called produce_cursors which has a default value of True. Line 1435 of model.py:

```python
def query_method(cls,
                 query_fields=(),
                 collection_fields=None,
                 limit_default=QUERY_LIMIT_DEFAULT,
                 limit_max=QUERY_LIMIT_MAX,
                 user_required=False,
                 use_projection=False,
                 produce_cursors=True,
                 **kwargs):
```

We then use the decorator like this:

```python
@Transaction.query_method(
    query_fields=('accountId', 'belowVersion', 'limit',),
    path='TransactionGetOlder/{accountId}/{belowVersion}/{limit}',
    user_required=True, name='getOlderTransactions',
    limit_default=100, limit_max=100,
    produce_cursors=False)
def TransactionGetOlder(self, query):
    return query.order(-Transaction.entityVersion)
```

We don't use a cursor for this call. We sort by an attribute called entityVersion, which we pass in to the request, and then use an inequality filter to return a subset of the results. This is likely not as efficient as a cursor, but it works well.
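A rough sketch of that inequality-filter paging pattern (the function and variable names are illustrative, not from the thread):

```python
from google.appengine.ext import ndb

def get_older_transactions(account_id, below_version, limit=100):
    # Page by entityVersion instead of a cursor: the client passes the
    # smallest version it has already seen, and we fetch the next batch
    # strictly below it.
    q = Transaction.query(ancestor=ndb.Key('Account', account_id))
    q = q.filter(Transaction.entityVersion < below_version)
    q = q.order(-Transaction.entityVersion)
    return q.fetch(limit)
```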
I see.
If you think our change is useful to the project or you want to see the changes, I can generate a pull request. Just let me know. We certainly appreciate all of the help today @dhermes!
@bmclean I think I understand your change; I just don't understand why it fixes the leak. I'm closing out. /cc @pcostell @Alfus — https://code.google.com/p/googleappengine/issues/detail?id=9610 is still out in the wild.
@josemontesdeoca FYI: this is another report, and it gives some sample code for the memory leak. In particular, the peaks of 185 MB / 199.8 MB of RAM stuck out in my memory after your comment about a peak of 300 MB.
@dhermes Thanks for the notice! I'll definitely take a closer look.
This issue is regarding the query_method decorator and large data sets (> 1000 items).
We are experiencing high memory usage when performing queries on large data sets and limiting the results. The memory seems to be allocated for the entire set, not just the results returned. For example, a query for the first 100 entities from a dataset of 2000 will cause an extra 5 MB of RAM to be used on each request, and this is never freed. However, if the last 100 entities of the 2000 are requested, no additional memory is used. If we set produce_cursors to False in the query options used in fetch_page, no nextPageToken is returned and the behaviour is identical to requesting the last 100 entities (no additional memory used).

As a fix, we've added the following option to the query_method decorator in model.py (Nov 26, 2014, commit ed91623):
Line 1435:
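This presumably matches the signature quoted in the comments earlier in the thread:

```python
def query_method(cls,
                 query_fields=(),
                 collection_fields=None,
                 limit_default=QUERY_LIMIT_DEFAULT,
                 limit_max=QUERY_LIMIT_MAX,
                 user_required=False,
                 use_projection=False,
                 produce_cursors=True,
                 **kwargs):
```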
Line 1600:
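Presumably this is where the flag gets threaded into the fetch_page call; a sketch of the shape it might take (the option-passing details are assumptions):

```python
# Sketch: forward the decorator's produce_cursors flag as an ndb query
# option so fetch_page doesn't request cursors that won't be used.
items, next_cursor, more_results = query.fetch_page(
    request_limit, start_cursor=query_info.cursor,
    produce_cursors=produce_cursors)
if not produce_cursors:
    next_cursor = None  # no nextPageToken ends up in the response
```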
We then call the query_method with produce_cursors=False. Is there a better solution? Or could this fix be considered for implementation?