Add http proxy #13368

Merged
merged 19 commits into from
Aug 26, 2024


samansmink
Contributor

Fixes #3836.

The main goal of this PR is to expose an HTTP proxy setting, plus proxy credentials using basic auth, in DuckDB. Along the way I added a few bits and bobs.

This PR includes:

  • a new http secret type which has
    • http proxy config
    • custom http headers
    • custom bearer token
  • added some tests for the http proxy using squid (similar to what's used to test this in the azure extension)
  • I removed the unused bearer token secret type in favor of the new http secret type
  • MAP type is now supported as a value type in a CREATE SECRET statement
  • Fixed an encoding issue when globbing a Hugging Face URL with slashes in the ref name.

Examples:

Add http proxy through settings:

set http_proxy='duckdb.org:1337';
from 'https://mah-server/some-file.parquet';



Add http proxy through secret:

CREATE SECRET http3 (
    TYPE HTTP, 
    http_proxy '${HTTP_PROXY_PRIVATE}',
    http_proxy_username 'john',
    http_proxy_password 'doe'
);
from 'https://mah-server/some-file.parquet';

Add custom header map:

CREATE SECRET http3 (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP{
        'Authorization': 'Bearer ${my token}',
        'my_own_header': 'my_special_value'
    }
);
from 'https://mah-server/some-file.parquet';
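When the secret is created from application code, the statement text usually has to be assembled from runtime values. The sketch below is one way to do that in Python; `sql_quote`, `render_value` and `build_http_secret` are hypothetical helpers, not part of DuckDB, and the secret name is assumed to be a trusted identifier. Only the option names and the `MAP{...}` syntax come from the examples above.

```python
def sql_quote(value: str) -> str:
    """Render a Python string as a single-quoted SQL literal."""
    return "'" + value.replace("'", "''") + "'"


def render_value(value) -> str:
    """Render a secret option: dicts become MAP{...}, strings become literals."""
    if isinstance(value, dict):
        entries = ", ".join(
            f"{sql_quote(key)}: {sql_quote(val)}" for key, val in value.items()
        )
        return "MAP{" + entries + "}"
    return sql_quote(value)


def build_http_secret(name: str, options: dict) -> str:
    """Build a CREATE SECRET statement of TYPE HTTP from a dict of options."""
    lines = ["    TYPE HTTP"]
    lines += [f"    {key} {render_value(val)}" for key, val in options.items()]
    return f"CREATE SECRET {name} (\n" + ",\n".join(lines) + "\n);"


print(build_http_secret("http3", {
    "http_proxy": "duckdb.org:1337",
    "extra_http_headers": {"my_own_header": "my_special_value"},
}))
```

Doubling single quotes keeps a value such as a password containing `'` from terminating the literal early.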

Ping @dylanspag-lmco from #13361

@samansmink
Contributor Author

Just realized there are 2 things missing from this:

  • Creating an http proxy from the env variables using an automatic provider
  • The http secret should live in main duckdb code, not httpfs. That's nicer because then you can also use the secret to install the httpfs extension
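Until something like the first item exists, the environment variables can be translated into settings by hand. The following is a rough sketch of that idea and not how a DuckDB automatic provider is or would be implemented. `proxy_settings_from_env` and `sql_quote` are hypothetical helpers, and the `host:port` form of the `http_proxy` value is assumed from the examples in this PR.

```python
import os
from urllib.parse import unquote, urlsplit


def sql_quote(value: str) -> str:
    """Render a Python string as a single-quoted SQL literal."""
    return "'" + value.replace("'", "''") + "'"


def proxy_settings_from_env(environ=os.environ) -> list:
    """Turn HTTP_PROXY / http_proxy into SET statements for DuckDB."""
    raw = environ.get("HTTP_PROXY") or environ.get("http_proxy")
    if not raw:
        return []
    # urlsplit only recognises a host when the value has a '//' authority part,
    # so bare 'host:port' values get one prepended.
    parts = urlsplit(raw if "://" in raw else "//" + raw)
    if parts.hostname is None:
        return []
    host = parts.hostname
    if parts.port is not None:
        host += f":{parts.port}"
    statements = [f"SET http_proxy={sql_quote(host)};"]
    if parts.username:
        statements.append(
            f"SET http_proxy_username={sql_quote(unquote(parts.username))};"
        )
    if parts.password:
        statements.append(
            f"SET http_proxy_password={sql_quote(unquote(parts.password))};"
        )
    return statements


for statement in proxy_settings_from_env():
    print(statement)
```

The scheme and any embedded credentials are split out because the examples here pass the proxy as a bare `host:port` string, with the username and password as separate options.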

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 12, 2024 08:54
@samansmink samansmink marked this pull request as ready for review August 13, 2024 09:33
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 15, 2024 10:09
@samansmink samansmink marked this pull request as ready for review August 15, 2024 10:10
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 15, 2024 12:56
@Mytherin Mytherin marked this pull request as ready for review August 23, 2024 10:39
@Mytherin Mytherin merged commit 56619fa into duckdb:main Aug 26, 2024
50 of 68 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 7, 2024
Merge pull request duckdb/duckdb#13368 from samansmink/add-http-proxy
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Sep 8, 2024
* chore: Update vendored sources to duckdb/duckdb@56619fa

Merge pull request duckdb/duckdb#13368 from samansmink/add-http-proxy

* ext fix

---------

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Co-authored-by: Kirill Müller <kirill@cynkra.com>
@Tishj Tishj added the Needs Documentation label Sep 18, 2024
@mghen

mghen commented Sep 26, 2024

@samansmink can you clarify what the format of the http_proxy string must be (either using the set or secrets method)? I have a proxy that looks like www-domain.com.au:8080 and when I attempt the SET method I get the following error:

duckdb.duckdb.ParserException: Parser Error: syntax error at or near "'www-domain.com.au:8080'"

If I attempt to use the secrets method, I don't get this error immediately, but after a subsequent INSTALL call and then a long wait, I get the following instead (presumably after attempts to use the secret):

duckdb.duckdb.IOException: IO Error: Failed to download extension "httpfs" at URL "http://extensions.duckdb.org/v1.1.1/windows_amd64/httpfs.duckdb_extension.gz"

@Mytherin
Collaborator

The parser exception sounds like a syntax error in your query. The correct syntax is as follows:

set http_proxy='www-domain.com.au:8080';

@mghen

mghen commented Sep 26, 2024

You are quite right @Mytherin haha. I'll attempt to atone for my sins with a nice full example for others!

Below is an example in Python that uses the http proxy settings to read parquet files from an S3 bucket when behind a proxy:

import duckdb

# get your proxy credentials
proxy_addr = 'www-proxy.domain.com'
proxy_user = 'yourusername'
proxy_pass = 'yourpassword'

# setup duckdb
con = duckdb.connect(':memory:')
con.execute(f"""
    SET http_proxy='{proxy_addr}';
    SET http_proxy_username='{proxy_user}';
    SET http_proxy_password='{proxy_pass}';

    INSTALL httpfs;
    INSTALL aws;

    LOAD httpfs;
    LOAD aws;

    CREATE SECRET ( TYPE S3, PROVIDER CREDENTIAL_CHAIN, CHAIN 'env' );
""")  # may need to change CHAIN option to suit your own local handling of credentials (link below)

# run queries
sql = "SELECT * FROM read_parquet('s3://your-s3-bucket/*.parquet')"
df = con.execute(sql).df()

# see results
print(df)

More info on options for the CHAIN keyword is in the httpfs S3 docs. If you get a 403 error, it may be because DuckDB is not receiving (and thus not passing on) your credentials, and you may want to try the other options recommended on that docs page.

@deeco

deeco commented Jan 9, 2025

As per @samansmink's comment earlier, is this available globally via CREATE OR REPLACE SECRET for other extensions, or is it specific to httpfs?

DuckDB works fine on my local Mac, but when I run the DuckDB CLI inside a local development Docker container on the same Mac, it gives an SSL error. The certs for the proxy are added under /etc/ssl/certs/ and verification works from pip etc. The env variables are also set to the cert locations, but the DuckDB CLI does not adhere to the trusted certs that are loaded. Where and how does the DuckDB CLI look up SSL certificates?

Using a different extension I get the error below. Is it DuckDB or the extension?

IO Error: Curl Request to 'https://myurl' failed with error: 'Problem with the SSL CA cert (path? access rights?)'

@nite

nite commented Feb 13, 2025

Great stuff. I also see this is documented now: https://duckdbsnippets.com/snippets/184/query-an-authenticated-api-endpoint

Development

Successfully merging this pull request may close these issues.

http_proxy and https_proxy env vars are ignored when installing extensions.
6 participants