Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raise_hdf5_format_error doesn't work for streams #1938

Closed
jeromekelleher opened this issue Nov 24, 2021 · 2 comments · Fixed by #1942
Closed

raise_hdf5_format_error doesn't work for streams #1938

jeromekelleher opened this issue Nov 24, 2021 · 2 comments · Fixed by #1942
Labels
bug Something isn't working
Milestone

Comments

@jeromekelleher
Copy link
Member

The recently introduced raise_hdf5_format_error can be called on either a path or a file object, but assumes the input is a path.

Since we can't sniff the first few bytes of a file object (it might be an unseekable stream, like a socket) it's pretty questionable whether this functionality is worth it

@jeromekelleher jeromekelleher added the bug Something isn't working label Nov 24, 2021
@jeromekelleher jeromekelleher added this to the Python 0.4.0 milestone Nov 24, 2021
@jeromekelleher jeromekelleher removed the bug Something isn't working label Nov 24, 2021
@jeromekelleher
Copy link
Member Author

No, hold on, I'm wrong, it works all right, it's just that it produces a confusing backtrace:

cls = <class 'tskit.trees.TreeSequence'>
29
file_or_path = <_io.BufferedReader name='/tmp/pytest-of-runner/pytest-0/popen-gw1/test_ts_twofile_stream_fails0/tmp.trees'>
30
skip_tables = False
31

32
    @classmethod
33
    def load(cls, file_or_path, skip_tables=False):
34
        file, local_file = util.convert_file_like_to_open_file(file_or_path, "rb")
35
        try:
36
            ts = _tskit.TreeSequence()
37
>           ts.load(file, skip_tables=bool(skip_tables))
38
E           _tskit.FileFormatError: File not in KAS format
39

40
tskit/trees.py:3609: FileFormatError
41

42
During handling of the above exception, another exception occurred:
43

44
self = <tests.test_file_format.TestSkipTables object at 0x7efdf4d20750>
45
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_ts_twofile_stream_fails0')
46
ts_fixture = <tskit.trees.TreeSequence object at 0x7efdfa0554d0>
47

48
    def test_ts_twofile_stream_fails(self, tmp_path, ts_fixture):
49
        # We can't skip_tables while reading from a stream
50
        save_path = tmp_path / "tmp.trees"
51
        with open(save_path, "wb") as f:
52
            ts_fixture.dump(f)
53
            ts_fixture.dump(f)
54
        with open(save_path, "rb") as f:
55
            tskit.load(f, skip_tables=True)
56
            with pytest.raises(exceptions.FileFormatError):
57
>               tskit.load(f)
58

59
tests/test_file_format.py:1131: 
60
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
61
tskit/trees.py:2953: in load
62
    return TreeSequence.load(file, skip_tables=skip_tables)
63
tskit/trees.py:3613: in load
64
    formats.raise_hdf5_format_error(file_or_path, e)
65
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
66

67
filename = <_io.BufferedReader name='/tmp/pytest-of-runner/pytest-0/popen-gw1/test_ts_twofile_stream_fails0/tmp.trees'>
68
original_exception = FileFormatError('File not in KAS format')
69

70
    def raise_hdf5_format_error(filename, original_exception):
71
        """
72
        Tries to open the specified file as a legacy HDF5 file. If it looks like
73
        an msprime format HDF5 file, raise an error advising to run tskit upgrade.
74
        """
75
        hdf_magic = b"\211HDF\r\n\032\n"
76
        try:
77
>           with open(filename, "rb") as f:
78
E           TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader
79

80
tskit/formats.py:273: TypeError

It looked to me that this as a TypeError, but actually this is just the chaining history I think.

So it's not a bug, but I think having a confusing backtrace here is much worse than giving a slightly less friendly error to users with very old files.

@jeromekelleher
Copy link
Member Author

I think the exception chaining is incorrect here at the least, and that might be causing issues in #1882 so I'm reupping this to a bug for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant