host.py gets an error #373

Closed
fy1214 opened this issue Dec 2, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

fy1214 (Contributor) commented Dec 2, 2024

I used the command python http-service/launch_host.py --config http-service/config.json to launch an HTTP server on a single GPU, with a config like:

{
    "nproc_per_node": 1,
    "model": "/mnt/models/FLUX.1-schnell",
    "pipefusion_parallel_degree": 1,
    "ulysses_degree": 1,
    "ring_degree": 1,
    "height": 512,
    "width": 512,
    "save_disk_path": "./results/",
    "use_cfg_parallel": false,
    "max_queue_size": 4
}
When the request body does not contain 'save_disk_path', like this:

curl -X POST http://127.0.0.1:6000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "A lovely rabbit",
        "num_inference_steps": 50,
        "seed": 42,
        "cfg": 7.5
    }'
the server produces this error:
[Rank 0] 2024-12-02 11:00:22 - ERROR - Error processing request 1733108408.2835336: Invalid destination rank: destination rank should not be the same as the rank of the current process.

After checking out the code, I found the reason:

if save_disk_path is not None:
    ......
else:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")

and the custom GroupCoordinator method send_object asserts when there is only one GPU:

class GroupCoordinator:
    def send_object(self, obj: Any, dst: int) -> None:
        """Send the input object list to the destination rank."""
        """NOTE: `dst` is the local rank of the destination rank."""

        assert dst < self.world_size, f"Invalid dst rank ({dst})"

        assert dst != self.rank, (
            "Invalid destination rank. Destination rank is the same "
            "as the current rank."
        )
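
The assertion mirrors a check inside torch.distributed itself: dist.send refuses a destination rank equal to the caller's own rank, and with world_size == 1 the only possible destination is the sender. A minimal single-process sketch (independent of this repo; it assumes the gloo backend and an arbitrary free local port) reproduces the same error message:

import torch
import torch.distributed as dist

# A single-process group: the only rank is 0, so any point-to-point send
# necessarily targets the sender itself.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

try:
    dist.send(torch.tensor([1]), dst=0)  # dst == own rank
except (ValueError, RuntimeError) as e:
    # "Invalid destination rank: destination rank should not be the same
    # as the rank of the current process."
    print(e)

dist.destroy_process_group()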

So I temporarily fixed this by adding a dist.get_world_size() > 1 check:

if save_disk_path is not None:
    ......
elif dist.get_world_size() > 1:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")

Maybe there is a better way to fix this.
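
One possible alternative (a sketch only, not the project's actual code: it reuses output, local_rank, is_dp_last_group, and logger from the snippet above, and handle_output_locally is a hypothetical stand-in for however host.py hands the result to the HTTP layer) is to branch on the world size inside the last-group check, so a single-process run skips the self-send entirely:

if save_disk_path is not None:
    ......
elif is_dp_last_group():
    if dist.get_world_size() == 1:
        # Single-process run: rank 0 already holds the result, so there is
        # nothing to send; hand the output back locally instead.
        handle_output_locally(output)  # hypothetical hand-off
    else:
        # serialize output object and send it to rank 0, as before
        output_bytes = pickle.dumps(output)
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info("Output sent to rank 0")

This keeps the single-GPU special case next to the send path it replaces; if there is a matching dist.recv on rank 0, it would presumably need the same world-size guard.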

feifeibear (Collaborator) commented

Thanks for your help. Sorry for not fully testing the case where save_disk_path is not set. Would you mind opening a merge request with the updates above? I believe it would be very helpful.

feifeibear added the bug label on Dec 2, 2024
fy1214 (Contributor, Author) commented Dec 2, 2024

Sure.
