I get This event loop is already running when combining adlfs and --autoreload #6934

Closed
MarcSkovMadsen opened this issue Jun 24, 2024 · 5 comments · Fixed by #6940

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Jun 24, 2024

I create a filesystem using from adlfs import AzureBlobFileSystem and use that filesystem to do pd.read_parquet(..., filesystem=filesystem) and df.to_parquet(..., filesystem=filesystem).

When I serve my app with --autoreload I get

ERROR: This event loop is already running
Exception ignored in atexit callback: <bound method BaseServer._atexit of <panel.io.server.Server object at 0x7f70461b4cd0>>
Traceback (most recent call last):
  File "/home/jovyan/repos/mt-pm-reporting/.venv/lib/python3.11/site-packages/bokeh/server/server.py", line 290, in _atexit
    self.stop(wait=False)
  File "/home/jovyan/repos/mt-pm-reporting/.venv/lib/python3.11/site-packages/panel/io/server.py", line 354, in stop
    self._loop.asyncio_loop.run_until_complete(stop_autoreload())
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 629, in run_until_complete
    self._check_running()
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 588, in _check_running
    raise RuntimeError('This event loop is already running')

I believe the line self._loop.asyncio_loop.run_until_complete(stop_autoreload()) above needs to handle the case where the loop is already running. The loop is already running because, with autoreload, the script is run once before the server actually starts, and that run triggers the adlfs async functionality.
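To illustrate what "handling an already running loop" could mean, here is a minimal sketch using only the standard asyncio API. It is not Panel's actual fix (see #6940 for that): stop_autoreload below is a hypothetical stand-in for the real coroutine, and stop is an illustrative helper name. It also assumes the helper is called from the thread that owns the loop.

```python
import asyncio


async def stop_autoreload():
    # Hypothetical stand-in for the coroutine Panel awaits on shutdown.
    return "stopped"


def stop(loop):
    """Run stop_autoreload() whether or not `loop` is already running."""
    if loop.is_running():
        # run_until_complete() would raise
        # RuntimeError('This event loop is already running') here,
        # so schedule the coroutine on the running loop instead.
        return loop.create_task(stop_autoreload())
    # The loop is idle, so it is safe to drive it until the coroutine finishes.
    return loop.run_until_complete(stop_autoreload())
```

In the running-loop branch the helper returns a task rather than a result, and that task only completes if the loop keeps being driven afterwards, which may not hold inside an atexit callback.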

Minimum, reproducible example

panel==1.4.4, adlfs==2024.4.1

  • Create an Azure Blob Storage and container with the name test
  • Set the environment variable export AZURE_CONNECTIONSTRING_TEST="DefaultEndpointsProtocol=https;AccountName=.."
  • Run the file below with python app.py to create the data
  • Serve the file below with panel serve app.py --autoreload --index app
import os

import pandas as pd
import panel as pn
from adlfs import AzureBlobFileSystem

azure_blob_connection_string = os.environ["AZURE_CONNECTIONSTRING_TEST"]
filesystem = AzureBlobFileSystem(connection_string=azure_blob_connection_string)

path = "test/panel_adlfs.parquet"


def read_data():
    return pd.read_parquet(path=path, engine="pyarrow", filesystem=filesystem)


def write_data():
    if filesystem.exists(path):
        filesystem.rm(path)

    df = pd.DataFrame({"x": [1, 2, 3, 4]})
    df.to_parquet(path, engine="pyarrow", filesystem=filesystem)


if __name__ == "__main__":
    write_data()
    print(read_data())
elif pn.state.served:
    data = read_data()
    pn.panel(data).servable()

Interestingly, if you raise an exception in the script (for example raise ValueError()), the application will serve and show you the error. When you then remove the raise, the application reloads and works fine. In other words, the problem seems to be that something else gets a chance to start the ioloop before Panel does when autoreload is enabled.

Work Around

You can work around the problem by making sure not to use the adlfs filesystem the first time the script is executed.

import os

import pandas as pd
import panel as pn
from adlfs import AzureBlobFileSystem
   
azure_blob_connection_string = os.environ["AZURE_CONNECTIONSTRING_TEST"]
filesystem = AzureBlobFileSystem(connection_string=azure_blob_connection_string)

path = "test/panel_adlfs.parquet"


def read_data():
    return pd.read_parquet(path=path, engine="pyarrow", filesystem=filesystem)


def write_data():
    if filesystem.exists(path):
        filesystem.rm(path)

    df = pd.DataFrame({"x": [1, 2, 3, 4]})
    df.to_parquet(path, engine="pyarrow", filesystem=filesystem)

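def can_load():
    # With autoreload the script is executed once before the server starts.
    # Skip loading on that first execution and allow it on every later one.
    if "can_load" in pn.state.cache:
        return True

    pn.state.cache["can_load"] = True
    return not pn.config.autoreload
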

if __name__ == "__main__":
    write_data()
    print(read_data())
elif pn.state.served:
    if not can_load():
        pn.panel("Please reload ...").servable()
    else:
        data = read_data()
        pn.panel(data).servable()

Additional Context

@MarcSkovMadsen MarcSkovMadsen added this to the next milestone Jun 24, 2024
@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jun 24, 2024

I've tried adding

import nest_asyncio
nest_asyncio.apply()

to the top of the script I'm serving. Then I get

2024-06-24 12:25:17,809 Exception in callback BaseAsyncIOLoop._handle_events(4, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(4, 1)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7f2c558fb240> is already entered
2024-06-24 12:25:18,165 WebSocket connection opened
2024-06-24 12:25:18,166 ServerConnection created

@philippjfr philippjfr modified the milestones: next, v1.5.0 Jun 24, 2024
@hoxbro hoxbro changed the title I can 'This event loop is already running when combining adlfs and --autoreload` I can This event loop is already running when combining adlfs and --autoreload Jun 25, 2024
@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jun 26, 2024

I've added a minimum, reproducible example in the top post

@MarcSkovMadsen
Collaborator Author

You can work around the problem by not using the adlfs filesystem the first time the script is executed with autoreload. I've added the minimum, reproducible example above.

@MarcSkovMadsen MarcSkovMadsen changed the title I can This event loop is already running when combining adlfs and --autoreload I get This event loop is already running when combining adlfs and --autoreload Jun 26, 2024
@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jun 26, 2024

In general, adlfs does not seem to work well with Panel.

Below is the pyinstrument profiling where I do a pandas.read_parquet operation using the adlfs filesystem.

[Image: pyinstrument profile of the pandas.read_parquet call]

Executing the same function with the same arguments in plain Python takes 1 second. I don't know why it takes 9 seconds to acquire a lock, or why the lock is needed.

@philippjfr
Member

> Don't know why it takes 9 seconds to acquire a lock? And why it's needed.

It's likely not acquiring the lock that takes so long but the actual computation. Looking at threaded applications with a profiler is notoriously difficult and the profiler itself may significantly distort the results.
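One way to cross-check a profiler result like this is to measure plain wall-clock time around the call, which adds almost no overhead. The sketch below is only an illustration: wall_clock and timings are hypothetical names, and the time.sleep call is a placeholder for the real read_data() call from the example above.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_clock(label, results):
    """Record the elapsed wall-clock seconds of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with wall_clock("read_data", timings):
    # Placeholder workload; in the app this would be the read_data() call.
    time.sleep(0.05)

print(f"read_data took {timings['read_data']:.2f}s")
```

Comparing this number between python app.py and panel serve app.py would show whether the slowdown is real or mostly a profiler artifact.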
