[SPARK-20529][Core]Allow worker and master work with a proxy server #17821
Test build #76351 has finished for PR 17821 at commit
Looks solid, just some minor comments. Thanks!
case class RegisteredWorker(
    master: RpcEndpointRef,
    masterWebUiUrl: String,
    masterAddress: RpcAddress) extends DeployMessage with RegisterWorkerResponse
Can we avoid adding an extra field here? Perhaps just put the `masterAddress` in the `master` field.
I checked the current code. Unfortunately, we cannot remove this extra field: `master.address` and `masterAddress` are different.
Alright, that sounds good.
@@ -266,7 +282,7 @@ private[deploy] class Worker(
     if (registerMasterFutures != null) {
       registerMasterFutures.foreach(_.cancel(true))
     }
-    val masterAddress = masterRef.address
+    val masterAddress = masterAddressToConnect.get
How about we protect this change with a conf (with a default that still uses `masterRef`)? If we can merge `master` and `masterAddress` as I suggested above, we can just add a conf on the master, and the worker code can stay largely unaffected.
Added a new conf
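A conf-protected choice like the one requested above might look roughly like the following sketch. It is self-contained and does not use Spark's classes: `Address`, `ConfGuard`, and the `Map`-based settings are stand-ins invented for illustration, and the key `spark.worker.preferConfiguredMasterAddress` is an assumption inferred from the `preferConfiguredMasterAddress` flag in the diff, not something confirmed in this thread.

```scala
// Stand-in for Spark's RpcAddress; hypothetical, for illustration only.
final case class Address(host: String, port: Int)

object ConfGuard {
  // Assumed key name, inferred from the `preferConfiguredMasterAddress` flag in the diff.
  val PreferConfiguredKey = "spark.worker.preferConfiguredMasterAddress"

  // Defaults to false, so the existing behaviour (use the master's own address) is unchanged.
  def preferConfigured(settings: Map[String, String]): Boolean =
    settings.get(PreferConfiguredKey).exists(_.toBoolean)

  def chooseMasterAddress(
      settings: Map[String, String],
      configured: Address, // address the worker was configured to dial (for example, a proxy)
      reported: Address    // address the master reports for itself
  ): Address =
    if (preferConfigured(settings)) configured else reported
}
```

With an empty settings map the sketch returns the reported address, which mirrors the "default that still uses `masterRef`" requested in the review.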
LGTM, just a small question. Thanks!
@@ -266,7 +289,8 @@ private[deploy] class Worker(
     if (registerMasterFutures != null) {
       registerMasterFutures.foreach(_.cancel(true))
     }
-    val masterAddress = masterRef.address
+    val masterAddress =
+      if (preferConfiguredMasterAddress) masterAddressToConnect.get else masterRef.address
Perhaps it isn't an issue, but do you think we should fall back to `masterRef.address` in case `masterAddressToConnect` isn't set (instead of throwing a generic Scala exception)? Something along the lines of:

    val masterAddress = masterAddressToConnect match {
      case Some(master) if preferConfiguredMasterAddress => master
      case _ => masterRef.address
    }
Right now `masterRef` and `masterAddressToConnect` are set at the same time, so this case is impossible unless a future change breaks something. It is better to fail than to hide such a broken change.
Test build #76392 has finished for PR 17821 at commit
Test build #3684 has finished for PR 17821 at commit
Thanks! Merging to master and 2.2.
## What changes were proposed in this pull request?

In the current code, when a worker connects to the master, the master sends its address to the worker. The worker saves this address and uses it to reconnect in case of failure. However, this address is sometimes wrong: if there is a proxy between the master and the worker, the address sent by the master is not the address of the proxy.

In this PR, the worker sends the master address it used to the master, and the master simply replies with that same address; the worker then uses it to reconnect in case of failure. In other words, the worker uses the master address configured on the worker side, if possible, rather than the master address set on the master side.

There is still one potential issue. When a master is restarted or takes over leadership, the worker uses the address sent from the master to connect. If there is still a proxy between the master and the worker, that address may be wrong. However, the worker alone has no way to detect this.

## How was this patch tested?

The newly added unit test.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17821 from zsxwing/SPARK-20529.

(cherry picked from commit 9150bca)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
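The handshake described in this commit message, and the remaining limitation on master failover, can be modelled with a toy sketch. Every name here (`Endpoint`, `Register`, `Registered`, `ToyMaster`, `ToyWorker`) is a simplified stand-in made up for illustration; none of them are Spark's actual classes or messages.

```scala
// Simplified stand-in for an RPC address; hypothetical.
final case class Endpoint(host: String, port: Int)

// Worker -> master: carries the address the worker dialed (possibly a proxy).
final case class Register(workerId: String, dialedAddress: Endpoint)
// Master -> worker: echoes the dialed address next to the master's own view of itself.
final case class Registered(selfAddress: Endpoint, echoedAddress: Endpoint)

final class ToyMaster(val selfAddress: Endpoint) {
  def handle(msg: Register): Registered = Registered(selfAddress, msg.dialedAddress)
}

final class ToyWorker(id: String, configuredMaster: Endpoint, preferConfigured: Boolean) {
  private var reconnectTo: Option[Endpoint] = None

  // Registration: remember either the echoed (worker-side) or self-reported (master-side) address.
  def register(master: ToyMaster): Unit = {
    val reply = master.handle(Register(id, configuredMaster))
    reconnectTo = Some(if (preferConfigured) reply.echoedAddress else reply.selfAddress)
  }

  // After a restart or leadership change, only the master's self-reported address is available.
  def masterChanged(newMaster: ToyMaster): Unit =
    reconnectTo = Some(newMaster.selfAddress)

  def reconnectAddress: Option[Endpoint] = reconnectTo
}
```

In this model, a worker that dials a proxy and prefers its configured address keeps the proxy address for reconnects after registration. After `masterChanged` it holds the new master's self-reported address, which is the case the commit message notes the worker cannot correct on its own.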