Testing utilities for downstream code #28

jl-wynen · 2022-08-17T14:54:52Z

As it stands, code built on top of pyscicat can only be tested either against an instance of scicat or by mocking requests. The former is too slow for development. The latter means that you lose the abstraction the client provides because users need to know exactly which requests are made.

For example, let's say we have some code that creates a dataset and datablock. But we don't know (or care) exactly how, in what order, or if it does anything else to scicat. We only care about the fact that at the end, there is a dataset and a datablock in scicat. Currently, we have to test it like this:

client = ScicatClient(...)
dataset = ...
datablock = ....
with requests_mock.Mocker() as mock_request:
    set_up_mock(mock_requests)
    a_function_that_uploads_a_dataset(client, dataset, datablock)
   # assert that mock_request.request_history contains the correct requests

A much better solution would be

client = FakeClient()
dataset = ...
datablock = ....
a_function_that_uploads_a_dataset(client, dataset, datablock)
assert dataset in client.__fake_datasets__
assert datablok in client.__fake_datablocks__

This is better because:

it requires no knowledge of requests and the API
it does not require parsing of URLs and request parameters
it still works if the order of operations in a_function_that_uploads_a_dataset changes or it uses different endpoints for some reason

With the changes in #27, FakeClient can be implemented along these lines:

class FakeClient(ScicatClient):
    def __init__(self, datasets):
        super().__init__(base_url='https://does-not-exist', token='123')
        self._datasets = datasets

    @property
    def __fake_datasets__(self):
        return self._datasets

    def _call_endpoint(
            self,
            cmd: str,
            endpoint: str,
            data: BaseModel = None,
            operation: str = "",
            allow_404=False,
        ) -> Optional[dict]:
        if cmd == 'get':
            base, pid = endpoint.split('/')
            if base == 'Datasets':
                return self.__fake_datasets__[pid].dict(exclude_none=True)
            raise NotImplementedError(f'GET with endpoint {endpoint} not supported')
        else:
            raise NotImplementedError(f'Command not supported: {cmd}')

It is not strictly necessary to support all endpoints in the fake. It would already be useful if it only supports the most common ones.

Downsides:

increased maintainance burden on pyscicat. The fake needs to be kept up to date as well as the actual client
the fake needs to be tested which might be tricky

I think that we need something like this in our code. Otherwise, we cannot really test our usage of pyscicat. It is an option that we develop it entirely on our side. But it would be beneficial to the larger community if the fake was provided by pyscicat itself.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Testing utilities for downstream code #28

Testing utilities for downstream code #28

jl-wynen commented Aug 17, 2022

Testing utilities for downstream code #28

Testing utilities for downstream code #28

Comments

jl-wynen commented Aug 17, 2022