Track when the pulled event signature fails #13815
Changes from 7 commits
```diff
@@ -0,0 +1 @@
+Keep track when an event pulled over federation fails its signature check so we can intelligently back-off in the future.
```
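The changelog entry treats the recorded failures as input to a future back-off. Here is a minimal sketch of what such a policy could look like, assuming a simple exponential schedule. The one-hour base, the seven-day cap, and the function names are all hypothetical and are not taken from Synapse.

```python
from datetime import datetime, timedelta

# Hypothetical policy values, chosen for illustration only.
BASE_BACKOFF = timedelta(hours=1)
MAX_BACKOFF = timedelta(days=7)
MAX_EXPONENT = 20  # keeps the timedelta multiplication from overflowing


def next_pull_allowed_at(last_attempt: datetime, num_failed_attempts: int) -> datetime:
    """Earliest time to retry pulling an event after `num_failed_attempts` failures.

    The delay doubles with each recorded failure (1h, 2h, 4h, ...) up to MAX_BACKOFF.
    """
    if num_failed_attempts <= 0:
        return last_attempt
    exponent = min(num_failed_attempts - 1, MAX_EXPONENT)
    delay = BASE_BACKOFF * (2**exponent)
    return last_attempt + min(delay, MAX_BACKOFF)


def should_attempt_pull(
    now: datetime, last_attempt: datetime, num_failed_attempts: int
) -> bool:
    # Only retry once the back-off window for this event has elapsed.
    return now >= next_pull_allowed_at(last_attempt, num_failed_attempts)
```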
```diff
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import logging
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Awaitable, Callable, Optional
 
 from synapse.api.constants import MAX_DEPTH, EventContentFields, EventTypes, Membership
 from synapse.api.errors import Codes, SynapseError
@@ -58,7 +58,12 @@ def __init__(self, hs: "HomeServer"):
 
     @trace
     async def _check_sigs_and_hash(
-        self, room_version: RoomVersion, pdu: EventBase
+        self,
+        room_version: RoomVersion,
+        pdu: EventBase,
+        record_failure_callback: Optional[
+            Callable[[EventBase, str], Awaitable[None]]
+        ] = None,
     ) -> EventBase:
         """Checks that event is correctly signed by the sending server.
```
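A minimal, self-contained sketch of the optional-callback pattern that the new `record_failure_callback` parameter introduces. It uses plain event-ID strings instead of `EventBase`, plus a hypothetical `InvalidSignature` exception and a fake `verify_signatures` check. Unlike the diff's `except Exception`, it catches only that one exception type.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Plain event-ID strings stand in for EventBase to keep the sketch self-contained.
FailureCallback = Callable[[str, str], Awaitable[None]]


class InvalidSignature(Exception):
    """Hypothetical stand-in for a signature-check failure."""


async def verify_signatures(event_id: str) -> None:
    # Fake check: pretend any event whose ID starts with "$bad" is badly signed.
    if event_id.startswith("$bad"):
        raise InvalidSignature(f"bad signature on {event_id}")


async def check_sigs_and_hash(
    event_id: str,
    record_failure_callback: Optional[FailureCallback] = None,
) -> str:
    try:
        await verify_signatures(event_id)
    except InvalidSignature as exc:
        # Let the caller record the failure, then re-raise the same exception
        # so existing error handling upstream is unaffected.
        if record_failure_callback:
            await record_failure_callback(event_id, str(exc))
        raise
    return event_id
```

Because the parameter defaults to `None`, callers that do not pass a callback behave exactly as before. The failure still propagates either way; the callback only adds a side effect.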
It would be good to see a description of at least

(done in a later commit; in future please consider collapsing those into the same commit so that commit-by-commit review is possible for mere mortals like me who can't track the full thing ;-))

My PRs are definitely not friendly to commit-by-commit review. I find working that way in the first place impossible, and rebasing and splitting multiple changes in the same file is a real chore. I'd rather split things up by PR, so feel free to call out anything you think should be split out.

If you can make them a bit more commit-by-commit friendly, I would really appreciate it; it makes reviews less daunting. (Maybe I'm the only one, though? Definitely not a requirement, just a preference of mine, especially when I'm not familiar with the area of code.) Since you seem to think it's difficult to do: my technique for things like this is to use

P.S. I find splitting by commits especially useful to avoid conflating 'style' changes (e.g. moving methods, renaming methods) with semantic changes. I think asking you to put these in separate PRs would be too much, though, and it would certainly hurt review latency.

What are the team's thoughts on rebasing? Is anyone annoyed or worried about references changing after they have already pulled the branch? If you're just looking for the low-hanging fruit of pulling out some style changes or merging later changes back into the original chunk, that seems a lot more doable. If we're talking about building things up iteratively commit by commit, I'm much more apprehensive about putting effort into that. I assume you don't want me to rebase together changes from different review cycles, so that you can easily review what has changed. What about merge commits? I find it a lot harder to rebase around merge commits and would prefer to just rebase my branch rather than merge

A note on rebased history: I really like GitLab's diff versioning, because even if you rebase you can still reference the old diff. I used to use this all the time over there.

I have had similar experiences with GL and Review Board.

I believe the rough consensus is: "don't rebase after someone has started to review the changes", because doing so resets the "I have viewed this file" state in GitHub's review UI.
```diff
@@ -70,6 +75,11 @@ async def _check_sigs_and_hash(
         Args:
             room_version: The room version of the PDU
             pdu: the event to be checked
+            record_failure_callback: A callback to run whenever the given event
+                fails signature or hash checks. This includes exceptions
+                that would normally be thrown/raised but also things like
+                checking for event tampering where we just return the redacted
+                event.
 
         Returns:
             * the original event if the checks pass
@@ -80,7 +90,12 @@ async def _check_sigs_and_hash(
             InvalidEventSignatureError if the signature check failed. Nothing
             will be logged in this case.
         """
-        await _check_sigs_on_pdu(self.keyring, room_version, pdu)
+        try:
+            await _check_sigs_on_pdu(self.keyring, room_version, pdu)
+        except Exception as exc:
```
This except clause seems too broad; I'd worry it might swallow errors arising from bugs rather than solely invalid signatures. Can we narrow down the types of exceptions that we care about here?

e.g. this will catch

It looks like we can just worry about

But all of the downstream places in

What we decide here, we should also apply to #13814.

Is there a way we can make the errors we care about opt-out instead of opt-in? It seems like it's going to be painful to keep these error lists up to date whenever the downstream functions change and raise different errors. Maybe it's better to be opt-in and handle just what you know, but as an example, at the following spot we probably need to look at synapse/synapse/handlers/federation_event.py Lines 916 to 920 in 6f0c3e6

(Created a draft PR to track this point, #13969, and will update it with what we decide here.)

I can't imagine opt-out is easier; I don't think

Tracking the set of exceptions that some code can throw is unfortunately difficult in Python. I would suggest trying some of the cases where you want to track a failed pull attempt, perhaps even adding tests that reproduce them. I can think of at least these classes of problems:

I realise this sounds a bit painful, but we ought to figure out what we should do for these cases. Should a network problem actually be tracked against the event at all? Won't the existing backoff behaviour be sufficient there? ....

Perhaps, rather than accepting that the code underneath is always going to be messy, try to improve the downstream situation: can we define superclasses for some of these interesting groups of exceptions a e.g.

Just some thoughts; I realise it may not necessarily be easy.

Fair enough on

Having these tests would be great. We'd want to apply them to

In general, if anything goes wrong with trying to pull an event from another server, we should back off and do it in the background next time (#13623). The existing backoff underlying all of the federation requests (

Seems like we shouldn't be asking homeservers if they aren't on our whitelist in the first place. I guess we just let

Thanks for the thoughtful response on this @reivilibre! It's really helping me form the right thoughts around all of this!
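One way to act on the superclass suggestion, sketched under assumptions: `PulledEventFailure`, `BadSignature` and `check_and_record` are hypothetical names, not existing Synapse code. Only subclasses of the shared base class are recorded against the event, so an `AttributeError` caused by a bug propagates without being counted as a failed pull attempt.

```python
import asyncio
from typing import Awaitable, Callable


class PulledEventFailure(Exception):
    """Hypothetical base class for failures worth recording against an event."""


class BadSignature(PulledEventFailure):
    """Hypothetical: the event's signature did not verify."""


async def check_and_record(
    event_id: str,
    check: Callable[[str], Awaitable[None]],
    record: Callable[[str, str], Awaitable[None]],
) -> None:
    try:
        await check(event_id)
    except PulledEventFailure as exc:
        # Only failures deliberately classified under the base class are
        # recorded; anything else (e.g. an AttributeError caused by a bug)
        # is not caught here and propagates unrecorded.
        await record(event_id, str(exc))
        raise
```

New failure modes then opt in by subclassing the base, rather than every call site maintaining its own list of exception types.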
```diff
+            if record_failure_callback:
+                await record_failure_callback(pdu, str(exc))
+            raise exc
 
         if not check_event_content_hash(pdu):
             # let's try to distinguish between failures because the event was
@@ -116,6 +131,10 @@ async def _check_sigs_and_hash(
                         "event_id": pdu.event_id,
                     }
                 )
+            if record_failure_callback:
+                await record_failure_callback(
+                    pdu, "Event content has been tampered with"
+                )
             return redacted_event
 
         spam_check = await self.spam_checker.check_event_for_spam(pdu)
```
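The tampering branch is the case where a failure is recorded without raising: the caller still gets a usable, redacted event back. Here is a simplified sketch. It assumes events are plain dicts and that redaction just empties `content`; real redaction keeps more fields. `check_content_hash` and its `hash_ok` flag are hypothetical stand-ins for `check_event_content_hash`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

Event = Dict[str, Any]
FailureCallback = Callable[[Event, str], Awaitable[None]]


async def check_content_hash(
    event: Event,
    hash_ok: bool,
    record_failure_callback: Optional[FailureCallback] = None,
) -> Event:
    # `hash_ok` stands in for the result of a real content-hash check.
    if hash_ok:
        return event
    # Simplified "redaction": keep the top-level keys but empty the content.
    redacted = {**event, "content": {}}
    # No exception here: the caller still receives a usable (redacted) event,
    # but the failure is recorded so later pulls can take it into account.
    if record_failure_callback:
        await record_failure_callback(event, "Event content has been tampered with")
    return redacted
```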
In the future, we will potentially record a failed attempt when the spam checker soft-fails an event. Part of #13700
```diff
@@ -278,7 +278,7 @@ async def backfill(
         pdus = [event_from_pdu_json(p, room_version) for p in transaction_data_pdus]
 
         # Check signatures and hash of pdus, removing any from the list that fail checks
-        pdus[:] = await self._check_sigs_and_hash_and_fetch(
+        pdus[:] = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
             dest, pdus, room_version=room_version
         )
 
```
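In `backfill`, the result is assigned with `pdus[:] = ...` rather than `pdus = ...`. Slice assignment replaces the contents of the existing list object, so any other reference to that list sees the filtered events; rebinding the name would not. Whether `backfill` relies on this is not shown in the diff. Here is a small illustration with hypothetical helper names:

```python
def filter_in_place(pdus: list) -> None:
    # Slice assignment: mutates the list object the caller passed in.
    pdus[:] = [p for p in pdus if not p.startswith("$bad")]


def filter_by_rebinding(pdus: list) -> None:
    # Rebinding: only the local name changes; the caller's list is untouched.
    pdus = [p for p in pdus if not p.startswith("$bad")]


events = ["$a", "$bad", "$b"]
alias = events
filter_in_place(events)
print(events, alias is events)  # ['$a', '$b'] True

others = ["$c", "$bad2"]
filter_by_rebinding(others)
print(others)  # ['$c', '$bad2']
```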
```diff
@@ -328,7 +328,17 @@ async def get_pdu_from_destination_raw(
 
             # Check signatures are correct.
             try:
-                signed_pdu = await self._check_sigs_and_hash(room_version, pdu)
+
+                async def _record_failure_callback(
+                    event: EventBase, cause: str
+                ) -> None:
+                    await self.store.record_event_failed_pull_attempt(
+                        event.room_id, event.event_id, cause
+                    )
+
+                signed_pdu = await self._check_sigs_and_hash(
+                    room_version, pdu, _record_failure_callback
+                )
             except InvalidEventSignatureError as e:
                 errmsg = f"event id {pdu.event_id}: {e}"
                 logger.warning("%s", errmsg)
```
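The nested `_record_failure_callback` closes over `self.store`, so `_check_sigs_and_hash` only ever sees an `(event, cause)` callable. The sketch below shows that closure against a hypothetical in-memory store. Its `record_event_failed_pull_attempt(room_id, event_id, cause)` method mirrors the call made in the diff, but the counting logic and the `FailedPullAttempt` record are assumptions, not Synapse's storage schema.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Tuple


@dataclass
class Event:
    room_id: str
    event_id: str


@dataclass
class FailedPullAttempt:
    num_attempts: int
    last_cause: str


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the datastore (not Synapse's real schema)."""

    attempts: Dict[Tuple[str, str], FailedPullAttempt] = field(default_factory=dict)

    async def record_event_failed_pull_attempt(
        self, room_id: str, event_id: str, cause: str
    ) -> None:
        # Count every failure per (room, event) and keep the most recent cause.
        existing = self.attempts.get((room_id, event_id))
        if existing is None:
            self.attempts[(room_id, event_id)] = FailedPullAttempt(1, cause)
        else:
            existing.num_attempts += 1
            existing.last_cause = cause


def make_record_failure_callback(
    store: InMemoryStore,
) -> Callable[[Event, str], Awaitable[None]]:
    # The closure captures `store`; the signature-checking code only ever
    # sees an (event, cause) callable and never needs the store itself.
    async def _record_failure_callback(event: Event, cause: str) -> None:
        await store.record_event_failed_pull_attempt(
            event.room_id, event.event_id, cause
        )

    return _record_failure_callback
```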
```diff
@@ -547,24 +557,28 @@ async def get_room_state(
             len(auth_event_map),
         )
 
-        valid_auth_events = await self._check_sigs_and_hash_and_fetch(
+        valid_auth_events = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
             destination, auth_event_map.values(), room_version
         )
 
-        valid_state_events = await self._check_sigs_and_hash_and_fetch(
-            destination, state_event_map.values(), room_version
+        valid_state_events = (
+            await self._check_sigs_and_hash_for_pulled_events_and_fetch(
+                destination, state_event_map.values(), room_version
+            )
         )
 
         return valid_state_events, valid_auth_events
 
     @trace
-    async def _check_sigs_and_hash_and_fetch(
+    async def _check_sigs_and_hash_for_pulled_events_and_fetch(
```
Renamed this so that it is only used in pulled-event scenarios.
```diff
         self,
         origin: str,
         pdus: Collection[EventBase],
         room_version: RoomVersion,
     ) -> List[EventBase]:
-        """Checks the signatures and hashes of a list of events.
+        """
+        Checks the signatures and hashes of a list of pulled events we got from
+        federation and records any signature failures as failed pull attempts.
 
         If a PDU fails its signature check then we check if we have it in
         the database, and if not then request it from the sender's server (if that
```
```diff
@@ -597,11 +611,17 @@ async def _check_sigs_and_hash_and_fetch(
 
         valid_pdus: List[EventBase] = []
 
+        async def _record_failure_callback(event: EventBase, cause: str) -> None:
+            await self.store.record_event_failed_pull_attempt(
+                event.room_id, event.event_id, cause
+            )
+
         async def _execute(pdu: EventBase) -> None:
             valid_pdu = await self._check_sigs_and_hash_and_fetch_one(
                 pdu=pdu,
                 origin=origin,
                 room_version=room_version,
+                record_failure_callback=_record_failure_callback,
             )
 
             if valid_pdu:
```
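A sketch of how one shared callback serves a whole batch of pulled events. It assumes event IDs stand in for PDUs, and it swaps in `asyncio.gather` for whatever concurrency helper `_check_sigs_and_hash_for_pulled_events_and_fetch` actually uses. `check_one` and `check_pulled_events` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

FailureCallback = Callable[[str, str], Awaitable[None]]


async def check_one(
    event_id: str, record_failure_callback: FailureCallback
) -> Optional[str]:
    # Hypothetical check: IDs starting with "$bad" fail their signature check.
    if event_id.startswith("$bad"):
        await record_failure_callback(event_id, "invalid signature")
        return None
    return event_id


async def check_pulled_events(
    event_ids: List[str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    valid: List[str] = []
    failures: List[Tuple[str, str]] = []

    # One callback is defined per batch and shared by every per-event check.
    async def _record_failure_callback(event_id: str, cause: str) -> None:
        failures.append((event_id, cause))

    async def _execute(event_id: str) -> None:
        result = await check_one(event_id, _record_failure_callback)
        if result:
            valid.append(result)

    # Run the per-event checks concurrently; do not rely on result order.
    await asyncio.gather(*(_execute(e) for e in event_ids))
    return valid, failures
```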
```diff
@@ -618,6 +638,9 @@ async def _check_sigs_and_hash_and_fetch_one(
         pdu: EventBase,
         origin: str,
         room_version: RoomVersion,
+        record_failure_callback: Optional[
+            Callable[[EventBase, str], Awaitable[None]]
+        ] = None,
     ) -> Optional[EventBase]:
         """Takes a PDU and checks its signatures and hashes.
```
```diff
@@ -634,14 +657,21 @@ async def _check_sigs_and_hash_and_fetch_one(
             origin
             pdu
             room_version
+            record_failure_callback: A callback to run whenever the given event
+                fails signature or hash checks. This includes exceptions
+                that would normally be thrown/raised but also things like
+                checking for event tampering where we just return the redacted
+                event.
         Returns:
             The PDU (possibly redacted) if it has valid signatures and hashes.
             None if no valid copy could be found.
         """
 
         try:
-            return await self._check_sigs_and_hash(room_version, pdu)
+            return await self._check_sigs_and_hash(
+                room_version, pdu, record_failure_callback
+            )
         except InvalidEventSignatureError as e:
             logger.warning(
                 "Signature on retrieved event %s was invalid (%s). "
```
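The docstrings describe a fallback when the first copy of a PDU fails its checks: look for the event in the database, then request it from a remote server, and return None if no valid copy can be found. Here is a hedged sketch of that shape. `Pdu`, its `forged` flag, `check`, `check_and_fetch_one` and the fetcher callables are hypothetical simplifications, and failure recording is left out to keep the focus on the fallback order.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Pdu:
    event_id: str
    forged: bool = False  # hypothetical flag standing in for a bad signature


class InvalidSignature(Exception):
    pass


# A fetcher looks up another copy of an event by ID (e.g. from a database or
# a remote server) and returns None if it has no copy.
Fetcher = Callable[[str], Awaitable[Optional[Pdu]]]


async def check(pdu: Pdu) -> Pdu:
    if pdu.forged:
        raise InvalidSignature(pdu.event_id)
    return pdu


async def check_and_fetch_one(pdu: Pdu, fallbacks: List[Fetcher]) -> Optional[Pdu]:
    try:
        return await check(pdu)
    except InvalidSignature:
        pass
    # The copy we were given failed: try each alternative source in order and
    # return the first copy that passes the same check.
    for fetch in fallbacks:
        candidate = await fetch(pdu.event_id)
        if candidate is None:
            continue
        try:
            return await check(candidate)
        except InvalidSignature:
            continue
    return None  # no valid copy could be found
```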
```diff
@@ -694,7 +724,7 @@ async def get_event_auth(
 
         auth_chain = [event_from_pdu_json(p, room_version) for p in res["auth_chain"]]
 
-        signed_auth = await self._check_sigs_and_hash_and_fetch(
+        signed_auth = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
             destination, auth_chain, room_version=room_version
         )
 
```
```diff
@@ -1401,7 +1431,7 @@ async def get_missing_events(
                 event_from_pdu_json(e, room_version) for e in content.get("events", [])
             ]
 
-            signed_events = await self._check_sigs_and_hash_and_fetch(
+            signed_events = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
                 destination, events, room_version=room_version
             )
         except HttpResponseException as e:
```
For context, see #13815 (comment) for why we're using a callback pattern here.