I tried setting this up in a workspace with UC, running on a cluster in Shared Access Mode (DBR 15.4) so I could pick up coordinates data from UC tables. The osrm-backend.sh script is loaded from a Volume and it's working fine, though we had to run osrm-routed on port 80 (-p 80), else we couldn't get a connection to the server from e.g. a notebook.
Calling the backend using the private IP of the worker directly works fine (step 1: "Verify Server Running on Each Worker"), but as soon as we try to distribute the computation using a DataFrame and a UDF (pyspark.sql.functions.udf or a Unity Catalog UDF defined in SQL), which makes the HTTP call on localhost (127.0.0.1), we get a connection refused error. Using the RDD method of the original notebook didn't work because of some deprecation error (I can try to reproduce if needed).
Does @dbbnicole maybe have any idea why the worker node isn't able to make an HTTP call to itself? Is there some other restriction I'm not aware of? Note that I tested with DBR 13.3 LTS and everything seemed to work fine, but I need to use a more recent DBR to ensure full UC compatibility.
edit: The osrm-backend.sh init script seems to be working (completing in ~1 min 30 s), and the script uses curl to call localhost, so it seems the python code running in the udf isn't able to get through to localhost.
edit: this is the UDF code:
create or replace function mycatalog.myschema.get_osrm_route(coordinates string, parameters string)
returns string
language python
as $$
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(connect=1, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)

parameters = parameters if parameters is not None else "alternatives=true&steps=false&geometries=geojson&overview=simplified&annotations=false"
url = f"http://localhost:80/route/v1/driving/{coordinates}?{parameters}"
with session as s:
    r = s.get(url, verify=False)
return r.text
$$
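For context, the UDF returns the raw OSRM JSON; downstream we extract the route summary along these lines (sketch with a hand-built response; "code", "routes", "distance", and "duration" are the standard fields of the OSRM /route service):

```python
import json

def summarize_route(osrm_json: str) -> dict:
    """Pull total distance (m) and duration (s) from an OSRM /route response."""
    payload = json.loads(osrm_json)
    if payload.get("code") != "Ok":
        raise ValueError(f"OSRM error: {payload.get('code')}")
    best = payload["routes"][0]  # first route is OSRM's recommended one
    return {"distance_m": best["distance"], "duration_s": best["duration"]}

# Hand-built response standing in for what the UDF would return:
sample = json.dumps({"code": "Ok", "routes": [{"distance": 1234.5, "duration": 98.7}]})
print(summarize_route(sample))  # {'distance_m': 1234.5, 'duration_s': 98.7}
```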