Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk streaming request bodies only #157

Merged
merged 10 commits into from
Sep 28, 2023
Merged

Conversation

mgyucht
Copy link
Contributor

@mgyucht mgyucht commented Sep 26, 2023

Changes

#138 changed the CommonsHttpClient implementation to use InputStreamEntity for all request bodies sent using the Java SDK. As a result, requests are sent with a Transfer-Encoding: chunked header, and the request body is chunked accordingly. However, the Databricks REST API only tolerates chunked requests for certain APIs; for others, it ignores the request body if Content-Length is not specified. Auth falls under that category, which caused #154.

To fix this, Request will keep track of string entities and input stream entities separately. Previously, Request had separate fields for the body and debugBody, but they were both always set, so it wasn't possible to distinguish between these two cases. Now, HTTP clients can have different behavior based on whether the request body can be fully materialized as a string or is lazily read as with input streams.

As part of this, I have removed SimpleHttpServer. This was used before we were sure whether it was possible to use the HttpServer built into the JDK. Now that we are confident that we can (see usage in ExternalBrowserCredentialsProvider.java), I've replaced it with HttpServer and a custom HttpHandler in FixtureServer (the only place where it is used now).

One other small change: I've updated the logging configuration and dependency for the cli auth demo app. This ensures that users can see debug output (which I used when debugging this).

Closes #154.

Tests

I've refactored FixtureServer to support more advanced validation on the HTTP requests sent by clients for testing purposes. Now, users can assert that a request has a specific method, path, headers, body, and check for headers that should not be present.

One downside of this change is that users need to call the .with() method once per API call, rather than one time, at the start of each test.

Unit tests cover both cases (request body should be chunked when specified as a String; request body should not be chunked when specified as an InputStream).

@mgyucht mgyucht changed the title Only chunk request bodies when needed Chunk streaming request bodies only Sep 26, 2023
@mgyucht mgyucht requested a review from tanmay-db September 26, 2023 18:05
@@ -303,11 +300,15 @@ private String makeLogRecord(Request in, Response out) {
in.getHeaders()
.forEach((header, value) -> sb.append(String.format("\n * %s: %s", header, value)));
}
String requestBody = in.getDebugBody();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be <InputStream>; I've changed it to (streamed body) in the debug log.


public class CommonsHttpClient implements HttpClient {
private static final Logger LOG = LoggerFactory.getLogger(ClustersExt.class);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ClustersExt, but CommonsHttpClient

private final List<FixtureMapping> fixtures = new ArrayList<>();

public FixtureServer() throws IOException {
HttpHandler handler = new CallbackResponseHandler();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixtureHandler?

request.setEntity(new InputStreamEntity(in.getBodyStream()));
} else {
LOG.warn(
"withEntity called with a request with no body, so no request entity will be set. URI: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, withEntity is only being called in POST / PUT and PATCH right? Is this pathway only for the PUT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withEntity should be called for any HTTP methods that have a request body, i.e. POST/PUT/PATCH. GET/DELETE and other methods we may support in the future like HEAD/OPTIONS don't support request bodies, so we don't use this method.

@mgyucht mgyucht enabled auto-merge September 28, 2023 07:47
@mgyucht mgyucht added this pull request to the merge queue Sep 28, 2023
Merged via the queue into main with commit 6b92a5a Sep 28, 2023
@mgyucht mgyucht deleted the only-chunk-bodies-when-needed branch September 28, 2023 14:08
mgyucht added a commit that referenced this pull request Oct 3, 2023
* Add additional error metadata to API errors ([#153](#153)).
* Bugfix: Chunk streaming request bodies only ([#157](#157)).
* Relicense the Java SDK using the Apache 2.0 license ([#158](#158)).

Breaking API Changes:

 * Changed `list()` method for `accountClient.metastoreAssignments()` service to return `com.databricks.sdk.service.catalog.ListAccountMetastoreAssignmentsResponse` class.
 * Changed `artifactMatchers` field for `com.databricks.sdk.service.catalog.ArtifactAllowlistInfo` to `com.databricks.sdk.service.catalog.ArtifactMatcherList` class.
 * Removed `owner` field for `com.databricks.sdk.service.catalog.CreateConnection`. Use instead the `owner` field of `UpdateConnection`.
 * Changed `artifactMatchers` field for `com.databricks.sdk.service.catalog.SetArtifactAllowlist` to `com.databricks.sdk.service.catalog.ArtifactMatcherList` class.
 * Removed `options` field for `com.databricks.sdk.service.catalog.UpdateCatalog`.
 * Changed `cancelAllRuns()` method for `workspaceClient.jobs()` service with new required argument order.
 * Changed `jobId` field for `com.databricks.sdk.service.jobs.CancelAllRuns` to no longer be required.
 * Changed `jobParameters` field for `com.databricks.sdk.service.jobs.RunNow` to `com.databricks.sdk.service.jobs.ParamPairs` class.
 * Changed `query()` method for `workspaceClient.servingEndpoints()` service. New request type is `com.databricks.sdk.service.serving.QueryEndpointInput` class.
 * Removed `com.databricks.sdk.service.serving.QueryRequest` class.
 * Changed `list()` method for `workspaceClient.cleanRooms()` service to require request of `com.databricks.sdk.service.sharing.ListCleanRoomsRequest` class.
 * Changed `executeStatement()` method for `workspaceClient.statementExecution()` service with new required argument order.
 * Renamed `com.databricks.sdk.service.sql.ChunkInfo` class to `BaseChunkInfo`.
 * Changed `onWaitTimeout` field for `com.databricks.sdk.service.sql.ExecuteStatementRequest` to `com.databricks.sdk.service.sql.ExecuteStatementRequestOnWaitTimeout` class.
 * Changed `statement` field for `com.databricks.sdk.service.sql.ExecuteStatementRequest` to be required.
 * Changed `warehouseId` field for `com.databricks.sdk.service.sql.ExecuteStatementRequest` to be required.
 * Changed `chunks` field for `com.databricks.sdk.service.sql.ResultManifest` to `com.databricks.sdk.service.sql.BaseChunkInfoList` class.
 * Renamed `com.databricks.sdk.service.sql.TimeoutAction` class to `ExecuteStatementRequestOnWaitTimeout`.

API Changes:

 * Added `com.databricks.sdk.service.catalog.ListAccountMetastoreAssignmentsResponse` class.
 * Added `com.databricks.sdk.service.catalog.WorkspaceId` class.
 * Added `allQueuedRuns` field for `com.databricks.sdk.service.jobs.CancelAllRuns`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.CreateJob`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.JobSettings`.
 * Added `jobParameters` field for `com.databricks.sdk.service.jobs.RepairRun`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.RunNow`.
 * Added `jobParameters` field for `com.databricks.sdk.service.jobs.RunParameters`.
 * Added `queueReason` field for `com.databricks.sdk.service.jobs.RunState`.
 * Added `queueDuration` field for `com.databricks.sdk.service.jobs.RunTask`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.SubmitRun`.
 * Added `com.databricks.sdk.service.jobs.QueueSettings` class.
 * Added `accountClient.oAuthPublishedApps()` service.
 * Added `com.databricks.sdk.service.oauth2.GetPublishedAppsOutput` class.
 * Added `com.databricks.sdk.service.oauth2.ListOAuthPublishedAppsRequest` class.
 * Added `com.databricks.sdk.service.oauth2.PublishedAppOutput` class.
 * Added `notifications` field for `com.databricks.sdk.service.pipelines.CreatePipeline`.
 * Added `notifications` field for `com.databricks.sdk.service.pipelines.EditPipeline`.
 * Added `notifications` field for `com.databricks.sdk.service.pipelines.PipelineSpec`.
 * Added `com.databricks.sdk.service.pipelines.Notifications` class.
 * Added `patch()` method for `workspaceClient.servingEndpoints()` service.
 * Added `tags` field for `com.databricks.sdk.service.serving.CreateServingEndpoint`.
 * Added `tags` field for `com.databricks.sdk.service.serving.ServingEndpoint`.
 * Added `tags` field for `com.databricks.sdk.service.serving.ServingEndpointDetailed`.
 * Added `com.databricks.sdk.service.serving.DataframeSplitInput` class.
 * Added `com.databricks.sdk.service.serving.EndpointTag` class.
 * Added `com.databricks.sdk.service.serving.PatchServingEndpointTags` class.
 * Added `com.databricks.sdk.service.serving.QueryEndpointInput` class.
 * Added `workspaceClient.credentialsManager()` service.
 * Added `workspaceClient.settings()` service.
 * Added `com.databricks.sdk.service.settings.DefaultNamespaceSetting` class.
 * Added `com.databricks.sdk.service.settings.DeleteDefaultWorkspaceNamespaceRequest` class.
 * Added `com.databricks.sdk.service.settings.DeleteDefaultWorkspaceNamespaceResponse` class.
 * Added `com.databricks.sdk.service.settings.ExchangeToken` class.
 * Added `com.databricks.sdk.service.settings.ExchangeTokenRequest` class.
 * Added `com.databricks.sdk.service.settings.ExchangeTokenResponse` class.
 * Added `com.databricks.sdk.service.settings.PartitionId` class.
 * Added `com.databricks.sdk.service.settings.ReadDefaultWorkspaceNamespaceRequest` class.
 * Added `com.databricks.sdk.service.settings.StringMessage` class.
 * Added `com.databricks.sdk.service.settings.TokenType` class.
 * Added `com.databricks.sdk.service.settings.UpdateDefaultWorkspaceNamespaceRequest` class.
 * Added `nextPageToken` field for `com.databricks.sdk.service.sharing.ListCleanRoomsResponse`.
 * Added `com.databricks.sdk.service.sharing.ListCleanRoomsRequest` class.
 * Added `emptyResultState` field for `com.databricks.sdk.service.sql.AlertOptions`.
 * Added `truncated` field for `com.databricks.sdk.service.sql.ResultManifest`.
 * Added `com.databricks.sdk.service.sql.AlertOptionsEmptyResultState` class.

OpenAPI SHA: bcbf6e851e3d82fd910940910dd31c10c059746c, Date: 2023-10-02
@mgyucht mgyucht mentioned this pull request Oct 3, 2023
github-merge-queue bot pushed a commit that referenced this pull request Oct 3, 2023
* Add additional error metadata to API errors
([#153](#153)).
* Bugfix: Chunk streaming request bodies only
([#157](#157)).
* Relicense the Java SDK using the Apache 2.0 license
([#158](#158)).

Breaking API Changes:

* Changed `list()` method for `accountClient.metastoreAssignments()`
service to return
`com.databricks.sdk.service.catalog.ListAccountMetastoreAssignmentsResponse`
class.
* Changed `artifactMatchers` field for
`com.databricks.sdk.service.catalog.ArtifactAllowlistInfo` to
`com.databricks.sdk.service.catalog.ArtifactMatcherList` class.
* Removed `owner` field for
`com.databricks.sdk.service.catalog.CreateConnection`. Use instead the
`owner` field of `UpdateConnection`.
* Changed `artifactMatchers` field for
`com.databricks.sdk.service.catalog.SetArtifactAllowlist` to
`com.databricks.sdk.service.catalog.ArtifactMatcherList` class.
* Removed `options` field for
`com.databricks.sdk.service.catalog.UpdateCatalog`.
* Changed `cancelAllRuns()` method for `workspaceClient.jobs()` service
with new required argument order.
* Changed `jobId` field for
`com.databricks.sdk.service.jobs.CancelAllRuns` to no longer be
required.
* Changed `jobParameters` field for
`com.databricks.sdk.service.jobs.RunNow` to
`com.databricks.sdk.service.jobs.ParamPairs` class.
* Changed `query()` method for `workspaceClient.servingEndpoints()`
service. New request type is
`com.databricks.sdk.service.serving.QueryEndpointInput` class.
 * Removed `com.databricks.sdk.service.serving.QueryRequest` class.
* Changed `list()` method for `workspaceClient.cleanRooms()` service to
require request of
`com.databricks.sdk.service.sharing.ListCleanRoomsRequest` class.
* Changed `executeStatement()` method for
`workspaceClient.statementExecution()` service with new required
argument order.
* Renamed `com.databricks.sdk.service.sql.ChunkInfo` class to
`BaseChunkInfo`.
* Changed `onWaitTimeout` field for
`com.databricks.sdk.service.sql.ExecuteStatementRequest` to
`com.databricks.sdk.service.sql.ExecuteStatementRequestOnWaitTimeout`
class.
* Changed `statement` field for
`com.databricks.sdk.service.sql.ExecuteStatementRequest` to be required.
* Changed `warehouseId` field for
`com.databricks.sdk.service.sql.ExecuteStatementRequest` to be required.
* Changed `chunks` field for
`com.databricks.sdk.service.sql.ResultManifest` to
`com.databricks.sdk.service.sql.BaseChunkInfoList` class.
* Renamed `com.databricks.sdk.service.sql.TimeoutAction` class to
`ExecuteStatementRequestOnWaitTimeout`.

API Changes:

* Added
`com.databricks.sdk.service.catalog.ListAccountMetastoreAssignmentsResponse`
class.
 * Added `com.databricks.sdk.service.catalog.WorkspaceId` class.
* Added `allQueuedRuns` field for
`com.databricks.sdk.service.jobs.CancelAllRuns`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.CreateJob`.
* Added `queue` field for `com.databricks.sdk.service.jobs.JobSettings`.
* Added `jobParameters` field for
`com.databricks.sdk.service.jobs.RepairRun`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.RunNow`.
* Added `jobParameters` field for
`com.databricks.sdk.service.jobs.RunParameters`.
* Added `queueReason` field for
`com.databricks.sdk.service.jobs.RunState`.
* Added `queueDuration` field for
`com.databricks.sdk.service.jobs.RunTask`.
 * Added `queue` field for `com.databricks.sdk.service.jobs.SubmitRun`.
 * Added `com.databricks.sdk.service.jobs.QueueSettings` class.
 * Added `accountClient.oAuthPublishedApps()` service.
* Added `com.databricks.sdk.service.oauth2.GetPublishedAppsOutput`
class.
* Added
`com.databricks.sdk.service.oauth2.ListOAuthPublishedAppsRequest` class.
 * Added `com.databricks.sdk.service.oauth2.PublishedAppOutput` class.
* Added `notifications` field for
`com.databricks.sdk.service.pipelines.CreatePipeline`.
* Added `notifications` field for
`com.databricks.sdk.service.pipelines.EditPipeline`.
* Added `notifications` field for
`com.databricks.sdk.service.pipelines.PipelineSpec`.
 * Added `com.databricks.sdk.service.pipelines.Notifications` class.
* Added `patch()` method for `workspaceClient.servingEndpoints()`
service.
* Added `tags` field for
`com.databricks.sdk.service.serving.CreateServingEndpoint`.
* Added `tags` field for
`com.databricks.sdk.service.serving.ServingEndpoint`.
* Added `tags` field for
`com.databricks.sdk.service.serving.ServingEndpointDetailed`.
 * Added `com.databricks.sdk.service.serving.DataframeSplitInput` class.
 * Added `com.databricks.sdk.service.serving.EndpointTag` class.
* Added `com.databricks.sdk.service.serving.PatchServingEndpointTags`
class.
 * Added `com.databricks.sdk.service.serving.QueryEndpointInput` class.
 * Added `workspaceClient.credentialsManager()` service.
 * Added `workspaceClient.settings()` service.
* Added `com.databricks.sdk.service.settings.DefaultNamespaceSetting`
class.
* Added
`com.databricks.sdk.service.settings.DeleteDefaultWorkspaceNamespaceRequest`
class.
* Added
`com.databricks.sdk.service.settings.DeleteDefaultWorkspaceNamespaceResponse`
class.
 * Added `com.databricks.sdk.service.settings.ExchangeToken` class.
* Added `com.databricks.sdk.service.settings.ExchangeTokenRequest`
class.
* Added `com.databricks.sdk.service.settings.ExchangeTokenResponse`
class.
 * Added `com.databricks.sdk.service.settings.PartitionId` class.
* Added
`com.databricks.sdk.service.settings.ReadDefaultWorkspaceNamespaceRequest`
class.
 * Added `com.databricks.sdk.service.settings.StringMessage` class.
 * Added `com.databricks.sdk.service.settings.TokenType` class.
* Added
`com.databricks.sdk.service.settings.UpdateDefaultWorkspaceNamespaceRequest`
class.
* Added `nextPageToken` field for
`com.databricks.sdk.service.sharing.ListCleanRoomsResponse`.
* Added `com.databricks.sdk.service.sharing.ListCleanRoomsRequest`
class.
* Added `emptyResultState` field for
`com.databricks.sdk.service.sql.AlertOptions`.
* Added `truncated` field for
`com.databricks.sdk.service.sql.ResultManifest`.
* Added `com.databricks.sdk.service.sql.AlertOptionsEmptyResultState`
class.

OpenAPI SHA: bcbf6e851e3d82fd910940910dd31c10c059746c, Date: 2023-10-02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ISSUE] oauth-m2m fails to retrieve access token
2 participants