Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
The issue was there all the time, but became critical in 9950dd0 with version 0.3.3. `class _S3ConfigurationMap` is not thread safe because both `set/get_configuration()` ask for `self.arguments` and `self.resources` being `None`. However, since the configuration map is a singleton, when using the library in a multithreaded environment, it can happen that half the part of `_initial_setup` has run and `self.arguments` is indeed not `None`, but `self.resources` is still `None` because in another thread, creation of the boto3 resource is still ongoing. Then, our thread would *not* call `_initial_setup()`, but still die on `self.resources` being `None` when trying to access it, leading to the error described in #97. I could not make the bug appear without introducing the IO delay by creating the resource in `_initial_setup()`. Maybe it is a CPython implementation detail that makes `_initial_setup()` atomic or incredibly rare to not be atomic without the creation of the boto resource. Anyways, the setup is now synchronised with a lock, which is required, again, given that `_S3ConfigurationMap` is a singleton. Here is the code that reliably triggers the bug without the change: ```py from multiprocessing.pool import ThreadPool from s3path import S3Path def do(i): dst = S3Path.from_uri('s3://something') / f'hello_{i}.txt' dst.write_bytes(b'hello') ThreadPool(processes=20).map(do, range(100)) ```
- Loading branch information