Add a crypto handler #26

Merged: babolivier merged 9 commits into main from babolivier/crypto_handler on Oct 12, 2022

Conversation

babolivier
Contributor

@babolivier babolivier commented Sep 29, 2022

To handle requests with an Olm-encrypted body, see https://github.com/matrix-org/matrix-content-scanner-python/blob/main/docs/api.md#encrypted-post-body
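
For context, here is a minimal sketch of the request shape such a handler has to deal with before any decryption happens. The field names (`encrypted_body`, `ciphertext`, `mac`, `ephemeral`) are my reading of the linked API doc, and `extract_encrypted_parts` is a hypothetical helper, not code from this PR. The real handler would then pass the three values to `olm.pk` for decryption.

```python
import json
from typing import Tuple

# Assumed field names, based on a reading of the linked API doc.
REQUIRED_FIELDS = ("ciphertext", "mac", "ephemeral")


def extract_encrypted_parts(raw_body: bytes) -> Tuple[str, str, str]:
    """Hypothetical helper: validate an encrypted POST body and return
    (ciphertext, mac, ephemeral), ready to hand to a decryption step."""
    try:
        body = json.loads(raw_body)
    except ValueError as e:
        raise ValueError("request body is not valid JSON") from e

    encrypted = body.get("encrypted_body") if isinstance(body, dict) else None
    if not isinstance(encrypted, dict):
        raise ValueError("missing or malformed 'encrypted_body'")

    parts = []
    for field in REQUIRED_FIELDS:
        value = encrypted.get(field)
        if not isinstance(value, str):
            raise ValueError(f"'{field}' must be a string")
        parts.append(value)

    ciphertext, mac, ephemeral = parts
    return ciphertext, mac, ephemeral
```

Rejecting malformed bodies before touching the key material keeps the error handling for "bad request" separate from the error handling for "decryption failed".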

Mypy checks were initially failing; I worked that out at https://gitlab.matrix.org/matrix-org/olm/-/merge_requests/62 and this is fixed now.

The PyPI version of python-olm is too out of date to have type annotations (and the type marker), and updating it is currently blocked behind https://github.com/vector-im/legal-compliance/issues/223 and friends (which is at least a month away from being resolved).

@babolivier babolivier requested a review from a team as a code owner September 29, 2022 16:33
@babolivier babolivier force-pushed the babolivier/crypto_handler branch from 8688e0c to ac7e6e6 Compare September 29, 2022 16:34
Member

@richvdh richvdh left a comment

seems plausible, I guess.

I can't help but feel that using libolm is a bit of a retrograde step nowadays: shouldn't we be using the rust matrix-sdk-crypto these days? Not that I have any real idea how much work it would be to create python bindings for that, but it might be worth running past the crypto folks for an opinion.

@babolivier
Contributor Author

> I can't help but feel that using libolm is a bit of a retrograde step nowadays: shouldn't we be using the rust matrix-sdk-crypto these days? Not that I have any real idea how much work it would be to create python bindings for that, but it might be worth running past the crypto folks for an opinion.

The long-term solution will be to use vodozemac, which will have first-party Python bindings: https://github.com/matrix-org/vodozemac-bindings/tree/main/python. However, these are currently very beta, undocumented, and not published on a PyPI-style package index.

Vodozemac also does not currently provide (as far as I can tell) a replacement for the olm.pk module (which refers to the public-key encryption part of the libolm code), so it can't easily (if at all) be used as a drop-in replacement for libolm in the content scanner.

Another thing that is nice with using olm here is that it can directly use pickle files from https://github.com/matrix-org/matrix-content-scanner, whereas using vodozemac would require a migration step (ref). I think it would be best to implement this migration as a second step after the rewrite is done.

@babolivier babolivier requested a review from richvdh October 3, 2022 15:38
@babolivier babolivier requested a review from richvdh October 4, 2022 12:49
Comment on lines +36 to +42
# The current version of python-olm that's on PyPI does not include a types marker.
# Hopefully that's something we can fix at some point, but in the mean time let's not
# block things on this and instead use the wheels on gitlab.matrix.org's repository (which
# do have a type marker). We use --index-url (and not --extra-index-url) so that pip does
# not try to download the python-olm that's on pypi.org. This is fine because GitLab will
# redirect requests for packages it doesn't know about to pypi.org.
install_command = python -m pip install --index-url=https://gitlab.matrix.org/api/v4/projects/27/packages/pypi/simple {opts} {packages}
Contributor Author

@babolivier babolivier Oct 10, 2022

This means we don't run mypy with the same version of Olm as the one we actually use for running the content scanner and its tests (pypi -> 3.1.3; gitlab -> 3.2.13), which seems to be fine but isn't ideal.
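
One way to at least make this mismatch visible is to print the installed version in each environment. The sketch below uses only `importlib.metadata`; `python-olm` is the distribution name as published on PyPI, and `installed_version` is a hypothetical helper, not something this PR adds.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this in both the mypy environment and the runtime environment,
    # then compare the two outputs.
    print("python-olm:", installed_version("python-olm"))
```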

Member

@richvdh richvdh left a comment

some random nits and comments but nothing significant

lgtm!

README.md Outdated
Comment on lines 31 to 33
```commandline
python -m matrix_content_scanner.mcs -c CONFIG_FILE
python -m matrix_content_scanner.mcs -c CONFIG_FILE --generate-secrets
```
Member

mhmm, I'm not entirely sure about this approach. My experience with synapse (which does similar things with key files) is that requirements like this make it a bit of a pain to run in kubernetes-like environments, and led to matrix-org/synapse#13615.

I realise this comment is in conflict with my earlier comment. I'm just not sure what is the best approach.

I suggest you stick with what you have, to avoid further vacillation. Just warning you that you might need to revisit this.

Contributor Author

Yeah I'm also on the fence. I initially had it automatically generate secrets because that's what the javascript content scanner does, but there are good arguments in both directions tbh.

> I suggest you stick with what you have, to avoid further vacillation. Just warning you that you might need to revisit this.

Actually, the issue you linked above seems to indicate this approach would cause issues with k8s-style deployments. I know there's an intent to deploy the content scanner on EMS (and actually most of the infrastructure is already in place for this afaik) so I'd rather revert to generating it automatically.
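
For illustration, the "generate it automatically" behaviour could look roughly like the sketch below: load the secret if the file exists, otherwise create it on first start. This is not the PR's code. `load_or_create_secret` is a hypothetical name, and random bytes stand in for the Olm pickle that the scanner actually stores.

```python
import os
import secrets
from pathlib import Path


def load_or_create_secret(path: Path, size: int = 32) -> bytes:
    """Hypothetical helper: return the secret stored at `path`, creating
    it with restrictive permissions if it does not exist yet."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        pass

    secret = secrets.token_bytes(size)
    # O_EXCL makes creation fail rather than overwrite if another process
    # created the file in the meantime; 0o600 requests owner-only access.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(secret)
    return secret
```

With this shape there is no separate setup command to run, but the path has to point at storage that survives restarts, otherwise a new secret is generated every time the process starts.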

@babolivier babolivier requested a review from richvdh October 11, 2022 14:48
@babolivier
Contributor Author

@richvdh Asking for another review despite the ✔️ since I've made some changes that seem significant enough to warrant another review (with more context in #26 (comment)).

@babolivier babolivier requested a review from richvdh October 12, 2022 14:44
Member

@richvdh richvdh left a comment

lgtm otherwise

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
@babolivier babolivier linked an issue Oct 12, 2022 that may be closed by this pull request
@babolivier babolivier merged commit fb12651 into main Oct 12, 2022
@babolivier babolivier deleted the babolivier/crypto_handler branch October 12, 2022 16:47
Development

Successfully merging this pull request may close these issues.

Migrate crypto handler
2 participants