Use the command python http-service/launch_host.py --config http-service/config.json to launch an HTTP server on a single GPU, with a config like:

{
    "nproc_per_node": 1,
    "model": "/mnt/models/FLUX.1-schnell",
    "pipefusion_parallel_degree": 1,
    "ulysses_degree": 1,
    "ring_degree": 1,
    "height": 512,
    "width": 512,
    "save_disk_path": "./results/",
    "use_cfg_parallel": false,
    "max_queue_size": 4
}
When the request does not contain 'save_disk_path', for example:

curl -X POST http://127.0.0.1:6000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "A lovely rabbit",
        "num_inference_steps": 50,
        "seed": 42,
        "cfg": 7.5
    }'

the server produces the following error:
[Rank 0] 2024-12-02 11:00:22 - ERROR - Error processing request 1733108408.2835336: Invalid destination rank: destination rank should not be the same as the rank of the current process.
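As a workaround, including save_disk_path in the request body avoids the failing branch entirely (assuming the per-request field is what drives the save_disk_path check shown below):

curl -X POST http://127.0.0.1:6000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "A lovely rabbit",
        "num_inference_steps": 50,
        "seed": 42,
        "cfg": 7.5,
        "save_disk_path": "./results/"
    }'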
After checking out the code, I found the cause:
if save_disk_path is not None:
    ......
else:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")
The custom GroupCoordinator.send_object method also asserts when there is only one GPU:
class GroupCoordinator:
    def send_object(self, obj: Any, dst: int) -> None:
        """Send the input object list to the destination rank."""
        """NOTE: `dst` is the local rank of the destination rank."""
        assert dst < self.world_size, f"Invalid dst rank ({dst})"
        assert dst != self.rank, (
            "Invalid destination rank. Destination rank is the same "
            "as the current rank."
        )
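The same restriction exists in torch.distributed itself, and that appears to be where the logged message comes from. A minimal standalone sketch (plain PyTorch with the gloo backend, not xDiT code) reproduces it with a single-process group:

import torch
import torch.distributed as dist

def main():
    # A world of size 1: rank 0 is the only process, so any point-to-point
    # send can only target itself, which torch.distributed rejects.
    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )
    try:
        dist.send(torch.tensor([1, 2, 3]), dst=0)
    except ValueError as e:
        # "Invalid destination rank: destination rank should not be the same
        # as the rank of the current process."
        print(f"send failed as expected: {e}")
    finally:
        dist.destroy_process_group()

if __name__ == "__main__":
    main()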
My temporary fix is to add a dist.get_world_size() > 1 check:
if save_disk_path is not None:
    ......
elif dist.get_world_size() > 1:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")
Maybe there is a better way to fix this.
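For reference, one possible alternative is to short-circuit the single-process case inside the output path itself, so callers never need a world-size check. This is only a sketch built around the snippet above: is_dp_last_group, local_rank, and logger come from the surrounding xDiT code, and handle_output_locally is a hypothetical stand-in for whatever rank 0 does with the received bytes.

import pickle
import torch
import torch.distributed as dist

def emit_output(output, local_rank, logger):
    if dist.get_world_size() == 1:
        # Single GPU: rank 0 is both producer and consumer, so skip the
        # point-to-point send entirely and hand the result over directly.
        handle_output_locally(output)  # hypothetical helper, not xDiT API
        return
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info("Output sent to rank 0")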
Thanks for your help. Sorry for not fully testing the case where save_disk_path is not set. Would you mind opening a merge request with the updates above? I believe it would be very helpful.