Remote Extensions Multiple Endpoints #4834

Merged
awestover merged 4 commits into alek_targz from alek/remote_ext_multiple_eps
Jul 31, 2023

Conversation

awestover
Contributor

Problem

Joonas:

If this download takes a long time, there will be timeouts in production. This means that the downloads will get interrupted. I am unsure if a download is first done to a tempfile and then renamed as the real target file name, but it's something to consider.
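A minimal sketch of the temp-file-then-rename idea, in Python for illustration only. `atomic_write` is a hypothetical helper, not something from compute_ctl, and this says nothing about how the existing download code actually writes files. An interrupted download never leaves a partial file under the real target name.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def atomic_write(target: Path, chunks: Iterable[bytes]) -> None:
    """Write chunks to a temp file next to target, then rename it into place.

    Hypothetical helper: if the chunk source fails midway (for example on a
    timeout), the temp file is removed and target is left untouched.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when source and destination share a filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # Interrupted or failed: drop the partial temp file and re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Readers of the target path then see either the complete file or no file at all, never a truncated one.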

If timeouts are possible, then duplicate calls to download the same thing are possible too.

It might be possible to use the "downloaded extensions" as a map to avoid launching two downloads at the same time (probably requires help of Once). We don't have a good example of this in pageserver.
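To make the "map of downloads" idea concrete, here is a rough Python sketch. `DownloadRegistry` and its `download` callback are made up for illustration, and a `Future` stands in for the Once-style primitive mentioned above. The first caller for a given name runs the download; concurrent callers for the same name wait on its result instead of starting a second one.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Optional


class DownloadRegistry:
    """Hypothetical registry that runs at most one download per name at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def get_or_download(
        self,
        name: str,
        download: Callable[[str], str],
        timeout: Optional[float] = None,
    ) -> str:
        with self._lock:
            fut = self._in_flight.get(name)
            owner = fut is None
            if owner:
                fut = Future()
                self._in_flight[name] = fut
        if owner:
            try:
                fut.set_result(download(name))
            except Exception as exc:
                # Forget the failed attempt so a later caller can retry.
                with self._lock:
                    self._in_flight.pop(name, None)
                fut.set_exception(exc)
        # Waiters block here; the owner returns (or re-raises) immediately.
        return fut.result(timeout=timeout)
```

A successful result stays in the map, so later requests for the same name return at once. Whether that caching is wanted depends on how the real code tracks downloaded extensions.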

Anastasia:

Postgres allows multiple simultaneous connections, so let's say we

  1. create a compute node
  2. CREATE EXTENSION postgis
  3. wait until the compute node shuts down
  4. spin up the compute node again (there are no postgis files available, because compute is stateless)
  5. connect to postgres from a client (psql or any app) over multiple connections and run a query that uses any postgis function. This triggers a library download from compute_ctl
  6. this is where the two-simultaneous-requests situation starts

Before we dive deep into this rabbit hole, can you please write a python test to verify that this works? (Probably separate from the existing one; it's ok to just copy-paste most of the code.)

  1. create a compute node
  2. execute('CREATE EXTENSION address_standardizer_data_us;');
  3. execute a query to ensure that it works:
     SELECT house_num, name, suftype, city, country, state, unit FROM standardize_address('us_lex',
         'us_gaz', 'us_rules', 'One Devonshire Place, PH 301, Boston, MA 02109');
  4. shut down the compute node
  5. remove the postgis files locally
  6. spin up the compute node again (there are no postgis files available, because compute is stateless)
  7. connect to postgres and execute the query from 3. again
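
The steps above could be sketched roughly as follows. This is not the test from this PR: the `Endpoint` protocol (`start`, `stop`, `execute`) and the `local_ext_dir` argument are assumptions standing in for whatever the real test fixtures provide, so treat it as an outline of the scenario only.

```python
import shutil
from pathlib import Path
from typing import Any, List, Protocol

CREATE_SQL = "CREATE EXTENSION address_standardizer_data_us;"
QUERY_SQL = (
    "SELECT house_num, name, suftype, city, country, state, unit "
    "FROM standardize_address('us_lex', 'us_gaz', 'us_rules', "
    "'One Devonshire Place, PH 301, Boston, MA 02109');"
)


class Endpoint(Protocol):
    """Hypothetical stand-in for a compute node fixture."""

    def start(self) -> None: ...
    def stop(self) -> None: ...
    def execute(self, sql: str) -> List[Any]: ...


def check_extension_survives_restart(endpoint: Endpoint, local_ext_dir: Path) -> None:
    # Steps 1-3: create the compute node, create the extension, run the query.
    endpoint.start()
    endpoint.execute(CREATE_SQL)
    before = endpoint.execute(QUERY_SQL)
    # Steps 4-5: shut down and remove the locally downloaded extension files.
    endpoint.stop()
    shutil.rmtree(local_ext_dir, ignore_errors=True)
    # Steps 6-7: restart; the same query must trigger a fresh download and work.
    endpoint.start()
    after = endpoint.execute(QUERY_SQL)
    endpoint.stop()
    assert after == before
```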

Alek:

  • 1st request sets started_download = true
  • it sets download_completed = true if it succeeds
  • subsequent requests hang and repeatedly check download_completed until it gets set or until they time out
  • 3 seconds after started_download = true was set, some thread checks whether download_completed was set. If not, it sets started_download back to false
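
A small Python simulation of the flag scheme above, for illustration only. `DownloadState`, `request`, and the polling interval are invented names and values, not the compute_ctl implementation. One assumption is made loudly here: after the watchdog resets started_download, a request that is still waiting may take over and retry the download itself.

```python
import threading
import time
from typing import Callable


class DownloadState:
    """Hypothetical model of the started_download / download_completed flags."""

    def __init__(self, reset_after: float = 3.0, poll_interval: float = 0.05) -> None:
        self._lock = threading.Lock()
        self.started_download = False
        self.download_completed = False
        self.reset_after = reset_after
        self.poll_interval = poll_interval

    def _watchdog(self) -> None:
        # Runs reset_after seconds after a download started: if it has not
        # completed by then, allow another request to start it again.
        with self._lock:
            if not self.download_completed:
                self.started_download = False

    def request(self, download: Callable[[], None], timeout: float) -> bool:
        """Return True once the download is complete, False on timeout."""
        deadline = time.monotonic() + timeout
        while True:
            with self._lock:
                if self.download_completed:
                    return True
                first = not self.started_download
                if first:
                    self.started_download = True
            if first:
                timer = threading.Timer(self.reset_after, self._watchdog)
                timer.daemon = True
                timer.start()
                download()  # may raise; the watchdog then clears the flag
                with self._lock:
                    self.download_completed = True
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(self.poll_interval)
```

Note that in this model a download that is merely slow (longer than `reset_after`) can still be started a second time, which is the same duplicate-download window discussed earlier in the thread.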

@awestover awestover requested a review from lubennikovaav July 28, 2023 15:03

github-actions bot commented Jul 28, 2023

1264 tests run: 1214 passed, 0 failed, 50 skipped (full report)


@awestover awestover marked this pull request as ready for review July 28, 2023 17:59
@awestover awestover requested a review from a team as a code owner July 28, 2023 17:59
@awestover awestover requested review from tychoish and removed request for a team and tychoish July 28, 2023 17:59
@awestover awestover merged commit 8ac5fc0 into alek_targz Jul 31, 2023
@awestover awestover deleted the alek/remote_ext_multiple_eps branch July 31, 2023 12:00