Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix error on sqlite 3.7 #2697

Merged
merged 3 commits into from
Nov 27, 2017
Merged

Fix error on sqlite 3.7 #2697

merged 3 commits into from
Nov 27, 2017

Conversation

richvdh
Copy link
Member

@richvdh richvdh commented Nov 21, 2017

Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.

To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.

Fixes #2572.

Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.

To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.

Fixes #2572.
@richvdh
Copy link
Member Author

richvdh commented Nov 21, 2017

Hum. We're going to end up doing the background update on the main synapse, which means my carefully-laid plans not to do the cleanup until the index has been rebuilt will be for naught. Any better ideas?

(Not that it really matters, since matrix.org only has a thousand rows in this table; still it feels like a shame for anybody else who might be running the media_repo in a worker)

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to remove the delta file for creating the index?

@@ -426,8 +431,7 @@ def _expire_url_cache_data(self):

yield self.store.delete_url_cache_media(removed_media)

if removed_media:
logger.info("Deleted %d media from url cache", len(removed_media))
logger.info("Deleted %d media from url cache", len(removed_media))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we removed the conditional? We added it so that we wouldn't spam the logs with lots of pointless log lines.

Copy link
Member Author

@richvdh richvdh Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it makes a nice closure to

logger.info("Running url preview cache expiry")
.

I don't think that two log lines every ten seconds constitutes a significant amount of spam. It's useful to know it's running; if nothing else it highlights the fact that it was running four times every ten seconds (leading to #2701).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People specifically added a PR as it was annoying, fwiw. Once every ten seconds isn't too bad by itself, but we have quite a few background tasks that should probably be logging to the same extent.

Personally, I'd probably put the first log line after we ask the DB for expired URLs and then log both if any expired media ids were returned or we're at debug logging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People specifically added a PR as it was annoying, fwiw.

wat? people added a PR?

we have quite a few background tasks that should probably be logging to the same extent.

Probably. I'm not going to fix them in this PR though.

Personally, I'd probably put the first log line after we ask the DB for expired URLs and then log both if any expired media ids were returned or we're at debug logging.

I'm really unhappy about background tasks that run silently. Not least because it leads to the sort of fuck-up we had here where it's running much more often than we think it is and it's only good luck which stops the processes racing against each other and doing real damage.

@erikjohnston
Copy link
Member

Hum. We're going to end up doing the background update on the main synapse, which means my carefully-laid plans not to do the cleanup until the index has been rebuilt will be for naught. Any better ideas?

Not really, other than making has_completed_background_updates read the database table. In fact, it would only need to read the database table up until it started returning true, and then it could safely keep returning true?

@erikjohnston erikjohnston assigned richvdh and unassigned erikjohnston Nov 21, 2017
@richvdh
Copy link
Member Author

richvdh commented Nov 21, 2017

Don't we need to remove the delta file for creating the index?

https://github.com/matrix-org/synapse/pull/2697/files#diff-487f7332efcd28fdefeb96ce1ef0dd8eR19 removes the offending update. I don't think we want to remove the whole file.

@richvdh
Copy link
Member Author

richvdh commented Nov 21, 2017

Not really, other than making has_completed_background_updates read the database table. In fact, it would only need to read the database table up until it started returning true, and then it could safely keep returning true?

That sounds sane. I'll look into it.

@erikjohnston
Copy link
Member

https://github.com/matrix-org/synapse/pull/2697/files#diff-487f7332efcd28fdefeb96ce1ef0dd8eR19 removes the offending update. I don't think we want to remove the whole file.

Oh, I'm blind.

so that the right thing happens on workers.
@richvdh richvdh assigned erikjohnston and unassigned richvdh Nov 22, 2017
@richvdh
Copy link
Member Author

richvdh commented Nov 22, 2017

@erikjohnston ptal

@richvdh richvdh merged commit 5a4da5b into develop Nov 27, 2017
@richvdh richvdh deleted the rav/fix_urlcache_index_error branch November 29, 2017 16:03
clokep added a commit that referenced this pull request Aug 30, 2022
…13657)

Media downloaded as part of a URL preview is normally deleted after two days.
However, while a background database migration is running, the process is
stopped. A long-running database migration can therefore cause the media
store to fill up with old preview files.

This logic was added in #2697 to make sure that we didn't try to run the expiry
without an index on `local_media_repository.created_ts`; the original logic that
needs that index was added in #2478 (in `get_url_cache_media_before`, as
amended by 93247a4), and is still present.

Given that the background update was added before Synapse v1.0.0, just drop
this check and assume the index exists.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants