
[BUG] NLP Tutorial "'numpy.float64' object cannot be interpreted as an integer" #1006

Closed
Kydlaw opened this issue Jul 27, 2023 · 2 comments
Labels: bug (Something isn't working), needs information (Needs more info from the issuer)

Comments

Kydlaw commented Jul 27, 2023

Describe the bug
When running the NLP tutorial and opening the 'text_embedding' link in the final step, the page loads but crashes after a few seconds when the program attempts to cluster the embeddings.

The problem might be that the data type of the vectors passed to fit_predict is not right.
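One rough way to check that hypothesis (a sketch with synthetic data and illustrative names only, nothing taken from the tutorial or from phoenix itself):

```python
# Rough dtype check on synthetic data standing in for the 2-D UMAP projections
# that phoenix hands to HDBSCAN.fit_predict; names here are illustrative only.
import numpy as np

projections = np.random.rand(200, 2)         # stand-in for the UMAP output
print(projections.dtype, projections.shape)  # float64 (200, 2): a normal float matrix
```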

Error stack

'numpy.float64' object cannot be interpreted as an integer

GraphQL request:18:7
18 |       UMAPPoints(timeRange: $timeRange, minDist: $minDist, nNeighbors: $nNeighbo
   |       ^
   | rs, nSamples: $nSamples, minClusterSize: $minClusterSize, clusterMinSamples: $cl
Traceback (most recent call last):
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\graphql\execution\execute.py", line 521, in execute_field
    result = resolve_fn(source, info, **args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\schema_converter.py", line 597, in _resolver
    return _get_result_with_extensions(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\schema_converter.py", line 583, in extension_resolver
    return reduce(
           ^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\schema_converter.py", line 578, in wrapped_get_result
    return _get_result(
           ^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\schema_converter.py", line 539, in _get_result
    return field.get_result(
           ^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\field.py", line 177, in get_result
    return self.base_resolver(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\types\fields\resolver.py", line 187, in __call__
    return self.wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\phoenix\server\api\types\EmbeddingDimension.py", line 414, in UMAPPoints
    ).generate(data, n_components=n_components)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\phoenix\pointcloud\pointcloud.py", line 67, in generate
    clusters = self.clustersFinder.find_clusters(projections)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\phoenix\pointcloud\clustering.py", line 21, in find_clusters
    cluster_ids: npt.NDArray[np.int_] = HDBSCAN(**asdict(self)).fit_predict(mat)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\hdbscan\hdbscan_.py", line 1243, in fit_predict
    self.fit(X)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\hdbscan\hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\hdbscan\hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\hdbscan\hdbscan_.py", line 80, in _tree_to_labels
    labels, probabilities, stabilities = get_clusters(
                                         ^^^^^^^^^^^^^
  File "hdbscan\\_hdbscan_tree.pyx", line 659, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan\\_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters
TypeError: 'numpy.float64' object cannot be interpreted as an integer
Stack (most recent call last):
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\threading.py", line 995, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\uvicorn\server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\base_events.py", line 640, in run_until_complete
    self.run_forever()
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\base_events.py", line 607, in run_forever
    self._run_once()
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\base_events.py", line 1919, in _run_once
    handle._run()
  File "C:\Users\ju\.pyenv\pyenv-win\versions\3.11.1\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\starlette\middleware\base.py", line 166, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\starlette\middleware\exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\starlette\_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\starlette\routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\starlette\routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\asgi\__init__.py", line 111, in __call__
    return await self.handle_http(scope, receive, send)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\asgi\__init__.py", line 178, in handle_http
    response = await self.run(request)
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\http\async_base_view.py", line 176, in run
    result = await self.execute_operation(
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\http\async_base_view.py", line 115, in execute_operation
    return await self.schema.execute(
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\schema.py", line 248, in execute
    result = await execute(
  File "c:\Users\ju\Code\Pentalog\jiratool\.phoenix-venv\Lib\site-packages\strawberry\schema\execute.py", line 156, in execute
    process_errors(result.errors, execution_context)

To Reproduce
Steps to reproduce the behavior:

  1. Run the NLP tutorial
  2. Click on 'text_embedding'

Expected behavior
The page should display the embedding visualization shown in the tutorial's screenshot.

Environment (please complete the following information):

  • OS: Windows 11
  • Notebook Runtime: Jupyter notebook & VS Code notebooks
  • Browser: Chrome 114.0.5735.199
  • Version: 0.0.30

Kydlaw added the bug label on Jul 27, 2023
mikeldking (Contributor) commented

@Kydlaw Thanks so much for filing a bug report! From looking at your stack trace, I believe you are hitting a bug in HDBSCAN that surfaced when Cython 3 launched. See scikit-learn-contrib/hdbscan#600 (comment)

We worked closely with the authors to get the issue resolved within HDBSCAN and have actually pinned HDBSCAN more aggressively:

"hdbscan>=0.8.33, <1.0.0",

We believe it's fixed in 0.8.33 (see scikit-learn-contrib/hdbscan@7611cfe). Let us know if it's still broken after upgrading to this version. We can work with the authors to try to bottom it out!
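If it helps, one quick way to confirm which hdbscan version is actually installed in the notebook's environment (a sketch using standard Python packaging metadata, nothing phoenix-specific):

```python
# Hypothetical version check; assumes hdbscan is installed in the active environment.
from importlib.metadata import version

print(version("hdbscan"))  # the fix discussed above should be in 0.8.33 or newer
```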

mikeldking added the needs information label on Jul 27, 2023

Kydlaw commented Jul 28, 2023

Thank you for your quick reply. During my investigation I did indeed see you pop up on the HDBSCAN issue tracker as well.

Neither cython==0.29.36 nor cython==3.0.0 works with HDBSCAN==0.8.33 for me.

Since this is definitely an HDBSCAN issue (I reproduced it independently of phoenix) and not a phoenix one, feel free to close this issue.
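For reference, a standalone check of that kind can be as small as the sketch below (synthetic data and an arbitrary min_cluster_size, not values taken from phoenix):

```python
# Minimal standalone exercise of hdbscan, independent of phoenix; the data is synthetic.
import numpy as np
from hdbscan import HDBSCAN

points = np.random.rand(500, 2)  # stand-in for 2-D UMAP projections (float64)
labels = HDBSCAN(min_cluster_size=10).fit_predict(points)
print(np.unique(labels))  # raises the same TypeError on affected hdbscan/Cython builds
```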
