Cannot read HTTPS catalog #215

Closed
hombit opened this issue Feb 15, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@hombit
Contributor

hombit commented Feb 15, 2024

Bug report

I was trying to read a catalog over HTTPS with LSDB:

import lsdb

lsdb.read_hipscat("https://epyc.astro.washington.edu/~lincc-frameworks/half_degree_surveys/gaia")

But something seems to go wrong with the URL handling (the path in the final error has lost its https:// prefix), and I got this error:

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/lsdb/loaders/hipscat/read_hipscat.py:59, in read_hipscat(path, catalog_type, storage_options, columns, margin_cache, **kwargs)
     56 config_args = {field.name: kwd_args[field.name] for field in dataclasses.fields(HipscatLoadingConfig)}
     57 config = HipscatLoadingConfig(**config_args)
---> 59 catalog_type_to_use = _get_dataset_class_from_catalog_info(path, storage_options=storage_options)
     61 if catalog_type is not None:
     62     catalog_type_to_use = catalog_type

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/lsdb/loaders/hipscat/read_hipscat.py:73, in _get_dataset_class_from_catalog_info(base_catalog_path, storage_options)
     71 base_catalog_dir = hc.io.get_file_pointer_from_path(base_catalog_path)
     72 catalog_info_path = hc.io.paths.get_catalog_info_pointer(base_catalog_dir)
---> 73 catalog_info = BaseCatalogInfo.read_from_metadata_file(catalog_info_path, storage_options=storage_options)
     74 catalog_type = catalog_info.catalog_type
     75 if catalog_type not in dataset_class_for_catalog_type:

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/hipscat/catalog/dataset/base_catalog_info.py:58, in BaseCatalogInfo.read_from_metadata_file(cls, catalog_info_file, storage_options)
     45 @classmethod
     46 def read_from_metadata_file(
     47     cls, catalog_info_file: FilePointer, storage_options: Union[Dict[Any, Any], None] = None
     48 ) -> Self:
     49     """Read catalog info from the `catalog_info.json` metadata file
     50
     51     Args:
   (...)
     56         A CatalogInfo object with the data from the `catalog_info.json` file
     57     """
---> 58     metadata_keywords = file_io.load_json_file(catalog_info_file, storage_options=storage_options)
     59     catalog_info_keywords = {}
     60     for field in dataclasses.fields(cls):

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/hipscat/io/file_io/file_io.py:114, in load_json_file(file_pointer, encoding, storage_options)
    112 json_dict = None
    113 file_system, file_pointer = get_fs(file_pointer, storage_options)
--> 114 with file_system.open(file_pointer, "r", encoding=encoding) as json_file:
    115     json_dict = json.load(json_file)
    117 return json_dict

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/spec.py:1297, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1289     mode = mode.replace("t", "") + "b"
   1291     text_kwargs = {
   1292         k: kwargs.pop(k)
   1293         for k in ["encoding", "errors", "newline"]
   1294         if k in kwargs
   1295     }
   1296     return io.TextIOWrapper(
-> 1297         self.open(
   1298             path,
   1299             mode,
   1300             block_size=block_size,
   1301             cache_options=cache_options,
   1302             compression=compression,
   1303             **kwargs,
   1304         ),
   1305         **text_kwargs,
   1306     )
   1307 else:
   1308     ac = kwargs.pop("autocommit", not self._intrans)

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/spec.py:1309, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1307 else:
   1308     ac = kwargs.pop("autocommit", not self._intrans)
-> 1309     f = self._open(
   1310         path,
   1311         mode=mode,
   1312         block_size=block_size,
   1313         autocommit=ac,
   1314         cache_options=cache_options,
   1315         **kwargs,
   1316     )
   1317     if compression is not None:
   1318         from fsspec.compression import compr

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/implementations/http.py:353, in HTTPFileSystem._open(self, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
    351 kw["asynchronous"] = self.asynchronous
    352 kw.update(kwargs)
--> 353 size = size or self.info(path, **kwargs)["size"]
    354 session = sync(self.loop, self.set_session)
    355 if block_size and size:

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File ~/.virtualenvs/lsdb-release/lib/python3.11/site-packages/fsspec/implementations/http.py:427, in HTTPFileSystem._info(self, url, **kwargs)
    424     except Exception as exc:
    425         if policy == "get":
    426             # If get failed, then raise a FileNotFoundError
--> 427             raise FileNotFoundError(url) from exc
    428         logger.debug(str(exc))
    430 return {"name": url, "size": None, **info, "type": "file"}

FileNotFoundError: epyc.astro.washington.edu/~lincc-frameworks/half_degree_surveys/gaia/catalog_info.json
@hombit hombit added the bug Something isn't working label Feb 15, 2024
@dougbrn
Contributor

dougbrn commented Feb 15, 2024

@hombit I put in a fix for this the other day (#212). Make sure you're on the latest version of hipscat (probably the dev version at this point).
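A quick way to confirm which hipscat (and lsdb) version is actually installed in the active environment, before and after updating, is to query the package metadata. This is a generic stdlib sketch using importlib.metadata; the helper name installed_version is hypothetical, and the output only tells you the installed version string, not whether #212 is included in it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Print the versions of the packages involved in the traceback above.
    for name in ("hipscat", "lsdb", "fsspec"):
        print(f"{name}: {installed_version(name)}")
```

Running this inside the same virtualenv that produced the traceback avoids accidentally checking a different Python installation.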

@hombit
Contributor Author

hombit commented Feb 15, 2024

@dougbrn, thank you so much. I updated to main and it works! However, I still have the same issue with HTTP (no S) URLs. Is it supposed to work for those and for other protocols like FTP or S3?

@dougbrn
Contributor

dougbrn commented Feb 15, 2024

Ah, right. My fix only applies to "https". By default, the hipscat code strips the protocol from the path, and all I did was make an exception for the "https" protocol. So it's not surprising that "http" or any other protocol still has the same issue.
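To make the behavior described above concrete, here is a minimal stdlib-only sketch. The helpers strip_protocol_current, strip_protocol_general and catalog_info_path are hypothetical and do not reproduce hipscat's actual code; they only model the logic described in this comment. The first drops the scheme unless it is "https", which is how an http:// URL still ends up as a scheme-less path like the one in the FileNotFoundError above. The second shows one possible generalization, assuming the scheme should be kept for every remote protocol (http, ftp, s3, ...) and dropped only for local file:// URLs.

```python
from urllib.parse import urlsplit


def strip_protocol_current(path: str) -> str:
    """Hypothetical model of the described behavior: drop the scheme,
    except for "https" (the exception mentioned above)."""
    scheme = urlsplit(path).scheme
    if scheme and scheme != "https":
        # Remove "<scheme>://" and keep only the rest of the URL.
        return path.split("://", 1)[1]
    return path


def strip_protocol_general(path: str) -> str:
    """Hypothetical generalization: only local "file://" URLs lose their
    scheme, so remote URLs can still be routed to the right filesystem."""
    scheme = urlsplit(path).scheme
    if scheme == "file":
        return path.split("://", 1)[1]
    return path


def catalog_info_path(base: str) -> str:
    # Append the metadata file name with exactly one "/" separator.
    return base.rstrip("/") + "/catalog_info.json"


if __name__ == "__main__":
    for url in [
        "https://epyc.astro.washington.edu/~lincc-frameworks/half_degree_surveys/gaia",
        "http://epyc.astro.washington.edu/~lincc-frameworks/half_degree_surveys/gaia",
    ]:
        print("current:", catalog_info_path(strip_protocol_current(url)))
        print("general:", catalog_info_path(strip_protocol_general(url)))
```

With the current rule the http:// URL prints a path without any scheme, so a filesystem layer such as fsspec can no longer tell that it should use HTTP; the generalized rule keeps the scheme for both URLs.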

@hombit
Contributor Author

hombit commented Feb 15, 2024

@dougbrn, thank you. I'll open a separate issue for that.

@hombit hombit closed this as completed Feb 15, 2024
@hombit hombit mentioned this issue Jul 8, 2024