host.py gets an error #373

Closed
fy1214 opened this issue Dec 2, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

fy1214 (Contributor) commented Dec 2, 2024

I used the command python http-service/launch_host.py --config http-service/config.json to launch an HTTP server on a single GPU, with a config like:

{
    "nproc_per_node": 1,
    "model": "/mnt/models/FLUX.1-schnell",
    "pipefusion_parallel_degree": 1,
    "ulysses_degree": 1,
    "ring_degree": 1,
    "height": 512,
    "width": 512,
    "save_disk_path": "./results/",
    "use_cfg_parallel": false,
    "max_queue_size": 4
}
When the request body does not contain 'save_disk_path', like this:

curl -X POST http://127.0.0.1:6000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "A lovely rabbit",
        "num_inference_steps": 50,
        "seed": 42,
        "cfg": 7.5
    }'
the server produces this error:
[Rank 0] 2024-12-02 11:00:22 - ERROR - Error processing request 1733108408.2835336: Invalid destination rank: destination rank should not be the same as the rank of the current process.

After checking out the code, I found the reason:

if save_disk_path is not None:
    ......
else:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")

and the custom GroupCoordinator method send_object asserts when there is only one GPU:

class GroupCoordinator:
    def send_object(self, obj: Any, dst: int) -> None:
        """Send the input object list to the destination rank."""
        """NOTE: `dst` is the local rank of the destination rank."""

        assert dst < self.world_size, f"Invalid dst rank ({dst})"

        assert dst != self.rank, (
            "Invalid destination rank. Destination rank is the same "
            "as the current rank."
        )
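
The assertion mirrors a check inside torch.distributed itself: dist.send refuses a destination rank equal to the caller's own rank, and with world_size == 1 the only possible destination is the sender. A minimal single-process sketch (independent of this repo; it assumes the gloo backend and an arbitrary free local port) reproduces the same error message:

import torch
import torch.distributed as dist

# A single-process group: the only rank is 0, so any point-to-point send
# necessarily targets the sender itself.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

try:
    dist.send(torch.tensor([1]), dst=0)  # dst == own rank
except (ValueError, RuntimeError) as e:
    # "Invalid destination rank: destination rank should not be the same
    # as the rank of the current process."
    print(e)

dist.destroy_process_group()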

So I temporarily fixed this by adding a dist.get_world_size() > 1 check:

if save_disk_path is not None:
    ......
elif dist.get_world_size() > 1:
    if is_dp_last_group():
        # serialize output object
        output_bytes = pickle.dumps(output)
        # send output to rank 0
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info(f"Output sent to rank 0")

Maybe there is a better way to fix this.
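
One possible alternative (a sketch only, not the project's actual code: it reuses output, local_rank, is_dp_last_group, and logger from the snippet above, and handle_output_locally is a hypothetical stand-in for however host.py hands the result to the HTTP layer) is to branch on the world size inside the last-group check, so a single-process run skips the self-send entirely:

if save_disk_path is not None:
    ......
elif is_dp_last_group():
    if dist.get_world_size() == 1:
        # Single-process run: rank 0 already holds the result, so there is
        # nothing to send; hand the output back locally instead.
        handle_output_locally(output)  # hypothetical hand-off
    else:
        # serialize output object and send it to rank 0, as before
        output_bytes = pickle.dumps(output)
        dist.send(torch.tensor(len(output_bytes), device=f"cuda:{local_rank}"), dst=0)
        dist.send(torch.ByteTensor(list(output_bytes)).to(f"cuda:{local_rank}"), dst=0)
        logger.info("Output sent to rank 0")

This keeps the single-GPU special case next to the send path it replaces; if there is a matching dist.recv on rank 0, it would presumably need the same world-size guard.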

feifeibear (Collaborator) commented

Thanks for your help. Sorry for not fully testing the case where save_disk_path is not set. Would you mind opening a merge request with the updates above? I believe it would be very helpful.

feifeibear added the bug label on Dec 2, 2024
fy1214 (Contributor, Author) commented Dec 2, 2024

Sure.
