-
-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FieldTracker performance #323
Comments
I have created https://github.com/drozdowsky/django-tracking-model to solve this |
Did anyone found a way to disable TrackerField when doing a read operation via api? |
Initially I thought that graphene-django was the cause of my performance issues, but it turns out that FieldTracker was performing tons of deepcopy operations. I saw little difference between graphene-django and DRF but a huge difference in disabling FieldTracker. Details and timing here: graphql-python/graphene#268 (comment) I'll try and see if there is something about how graphene works that causes the FieldTracker to spend so much time in Curious about the context around your API - packages used to implement it? @furious-luke @mgaby25 I have used FieldTracker in the past with DRF and never experienced such a performance hit. Here, using it with both DRF and graphene/graphene-django it is substantial. Didn't have any issues a long time ago with:
Have substantial issues with now:
|
@mgaby25 @kevindice I thought I would share my solution to this which allows you to conditionally disable the _field_tracking_disabled = contextvars.ContextVar("field_tracking_disabled", default=False)
@contextlib.contextmanager
def field_tracking_disabled():
"""
A context manager that can be used to disable any and all field tracking within a Django model. There is a penalty
in initializing the "before" state of all fields for models when they are read and we do not have to pay that penalty
if we are not actually updating anything on the model. This is applied by default to all GET requests within the
ConditionalFieldTrackingMiddleware.
Example usage:
with field_tracking_disabled():
some_model = SomeModel.objects.get(id=...) # no 'before state' values are tracked
do_something(some_model)
Attempting updates to a model read within this context will cause an exception:
with field_tracking_disabled():
some_model = SomeModel.objects.get(id=...)
some_model.name = "abc123"
some_model.save() # raises a ValueError
The update prevention is applied to the lifecycle of the entire model instance and is initialized when the model is
read. Thus, this will also cause an exception:
with field_tracking_disabled():
some_model = SomeModel.objects.get(id=...)
some_model.name = "abc123"
some_model.save() # raises a ValueError
Finally, this also prevents any access of the 'tracker' attribute on a model as it would otherwise cause inconsistent
results as the initial state was never applied:
with field_tracking_disabled():
some_model = SomeModel.objects.get(id=...)
some_model.name = "abc123"
some_model.tracker.changed() # raises a ValueError
some_model.save() # raises a ValueError
This context manager is both threadsafe and async-safe.
"""
token = _field_tracking_disabled.set(True)
try:
yield
finally:
_field_tracking_disabled.reset(token)
def is_field_tracking_disabled() -> bool:
return _field_tracking_disabled.get()
class ConditionalFieldTracker(FieldTracker):
"""An implementation of FieldTracker that allows itself to be disabled"""
field_tracking_disabled_attr = "_field_tracking_disabled"
def __init__(self, fields=None):
super().__init__(fields)
def initialize_tracker(self, sender, instance, **kwargs):
"""Invoked as a post_init signal of the given instance"""
# The post_init signal is registered in finalize_class, which is registered in contribute_to_class,
# a Django model hook point (see http://lazypython.blogspot.com/2008/11/django-models-digging-little-deeper.html)
# The super().initialize_tracker part of this is the problematic part and specifically the
# set_saved_fields() method. We need to avoid that and the deepcopy() within it.
# If field tracking is disabled, this will prevent all access to the 'tracker' (or self.name) property to prevent
# strange results
if is_field_tracking_disabled():
if not isinstance(instance, self.model_class):
# Necessary to ensure that our patching only happens _once_ because the post_init signal is registered for all models
return # Only init instances of given model (including children)
setattr(instance, self.field_tracking_disabled_attr, is_field_tracking_disabled())
else:
super().initialize_tracker(sender, instance, **kwargs)
def __get__(self, instance, owner):
if instance is None:
return self
if getattr(instance, self.field_tracking_disabled_attr, False):
raise ValueError(
f"The {self.name} attribute of {instance.__class__} cannot be accessed when the model is initialized"
" within a field_tracking_disabled context block. Either re-read the model outside of this block or"
" modify settings.MODIFYING_GET_URL_REGEXES")
else:
return super().__get__(instance, owner)
def patch_save(self, model):
original_model_save = model.save
super().patch_save(model)
tracker_patched_save = model.save
def conditional_tracker_save(instance, *args, **kwargs):
is_update = bool(not instance._state.adding)
field_tracking_disabled = getattr(instance, self.field_tracking_disabled_attr, False)
if is_update and field_tracking_disabled:
raise ValueError(
f"An instance of {instance.__class__} was initialized within a field_tracking_disabled"
" context block and updates will not work as expected! If this is intentional, you might need to"
" modify settings.MODIFYING_GET_URL_REGEXES")
elif field_tracking_disabled:
# In this case we did not initialize the _tracker attribute on the model since the initialize_tracker
# was blocked. Therefore it is not safe to call the tracker version of the save, so instead invoke the
# unmodified version
return original_model_save(instance, *args, **kwargs)
else:
return tracker_patched_save(instance, *args, **kwargs)
model.save = conditional_tracker_save Then I added a middleware that disables it for all GET requests that can also look for a class ConditionalFieldTrackingMiddleware:
"""
Disables all field tracking on a GET request and hooks up extra validation to ensure there is never an update during
a GET that might impact what happens in the AuditLog or any other place that we care about FieldTracker results
"""
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
def continue_middleware():
return self.get_response(request)
is_whitelisted = False
for regex in settings.MODIFYING_GET_URL_REGEXES:
is_whitelisted = bool(re.findall(regex, request.get_full_path()))
if is_whitelisted:
break
if request.method == 'GET' and not is_whitelisted:
with field_tracking_disabled():
return continue_middleware()
else:
return continue_middleware() To use this, follow the same docs but use from django.db import models
from model_utils import FieldTracker
class Post(models.Model):
title = models.CharField(max_length=100)
body = models.TextField()
tracker = ConditionalFieldTracker() |
Did the most recent changes in https://github.com/jazzband/django-model-utils/releases/tag/4.2.0 affect this? |
@easherma-truth the fix I referenced above was targeting |
I noticed a bottleneck when using a large quantity of models that inherit from FieldTracker, specifically
I have a monkeypatch that relies on monkeypatching django-model-utils/model_utils/tracker.py Lines 343 to 349 in 6d4112e
my newly defined i would of course love to not monkeypatch this lib, please consider routing requests with a dictionary lookup on a cheap hashable if possible rather than relying on while true exec isinstance 🙇 |
@dylan-shipwell could you share your monkey patch? |
Adding django-model-utils/model_utils/tracker.py Line 346 in 46d3392
I mean models.signals.post_init.connect(self.initialize_tracker, sender=sender) Either ways a solution to read-only model queries is still needed in many cases. 🙇♀️ |
We've found that for our use-cases the field tracker can consume a not insignificant amount of time when attached to models being queried from our APIs. It adds about 150ms to our list requests, which doesn't seem like much, but is currently the single largest time sink in the query.
We'd really like to find a way to switch off the field tracking for these instances where we know the model will be used only for reads. The problem is that we can't come up with a clean way of passing this information into the queryset's model instantiation. Just wondering if you've given any thought to allowing the field tracker to be switched off for certain queries?
The text was updated successfully, but these errors were encountered: