Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Do not try to store invalid data in the stats table #8226

Merged
merged 7 commits into from
Sep 2, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/8226.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fixes a longstanding bug where stats updates could break when unexpected profile data was included in events.
clokep marked this conversation as resolved.
Show resolved Hide resolved
5 changes: 3 additions & 2 deletions synapse/storage/databases/main/stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -231,7 +231,8 @@ async def update_room_state(self, room_id: str, fields: Dict[str, Any]) -> None:
"""

# For whatever reason some of the fields may contain null bytes, which
# postgres isn't a fan of, so we replace those fields with null.
# postgres isn't a fan of, or be an unexpected type. Replace those
# fields with null.
for col in (
"join_rules",
"history_visibility",
Expand All @@ -242,7 +243,7 @@ async def update_room_state(self, room_id: str, fields: Dict[str, Any]) -> None:
"canonical_alias",
):
field = fields.get(col)
if field and "\0" in field:
if not isinstance(field, str) or "\0" in field:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it matters, but this changes the behaviour of fields that do not exist or are NULL.

Before, they would not be part of an upsert — now they will be upserted to NULL.

Is this fine? I get a funny feeling it may not be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this would violate the assumption that seems to be part of

await self.store.update_room_state(room_id, state)
; I believe it will give you a dict only populated with the fields that have changed.

I must apologise for the lack of docstring which makes it clear, though it also looks like I inherited this function..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was assuming that the dictionary would include strings or None for all fields, but you're 100% correct that this does slightly change the behavior!

In the case of invalid data, does upserting with None make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code as-is will never "clear" a value (e.g. if topic goes from "foo" -> None it will be stale and stay at "foo" forever). I'm unsure if this is correct behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code as-is will never "clear" a value (e.g. if topic goes from "foo" -> None it will be stale and stay at "foo" forever). I'm unsure if this is correct behavior.

I guess I didn't think too hard about unsetting topics; can you do that, or would it be empty string? (not that empty string was properly handled to begin with, by the looks)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think upserting with None when you get invalid data is fine, if that corresponds with the interpretation that clients, etc are meant to have.

If you get a history_visibility entry with an invalid value, what happens? Stays the same, or resets to some 'default' value? I suppose that is ultimately what might guide us here.

(Though this isn't used for any important information other than the room directory really, so as long as it always leads to a 'safer' outcome I suppose it's fine; e.g. not revealing private rooms in the public room list even if they have sent nonsense visibility and join rule state events)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some changes which should stop overriding data that wasn't submitted. I also fixed the issue of being unable to submit None values into the method. I'm not 100% sure that's correct or not though.

fields[col] = None

await self.db_pool.simple_upsert(
Expand Down