
ch 12: The code for exercise 12.5.3 is incomplete #109

Open
YueZhengMeng opened this issue Jun 30, 2024 · 0 comments

@YueZhengMeng (Contributor):

Exercise 12.5.3

Implement a more efficient allreduce function that aggregates different parameters across different GPUs. Why is it more efficient?

import torch

def allreduce(data: list):
    # Get the total number of processes that need to communicate
    world_size = len(data)

    # Move all tensors onto the same device
    device = data[0].device
    for i in range(1, world_size):
        data[i] = data[i].to(device)

    # Reduction phase
    for i in range(world_size - 1):
        if i % 2 == 0:
            src = i + 1
            dest = i
        else:
            src = i
            dest = i + 1
        torch.cuda.synchronize(device)  # synchronize computation on the device
        data[dest] += data[src]

    # Broadcast phase
    for i in range(1, world_size):
        src = 0
        dest = i
        torch.cuda.synchronize(device)  # synchronize computation on the device
        data[dest] = data[src].to(data[dest].device)

# Test
from d2l import torch as d2l
data = [torch.ones((1, 2), device=d2l.try_gpu(i)) * (i + 1) for i in range(2)]
print('Before allreduce:\n', data[0], '\n', data[1])
allreduce(data)
print('After allreduce:\n', data[0], '\n', data[1])
Before allreduce:
tensor([[1., 1.]], device='cuda:0')
tensor([[2., 2.]], device='cuda:1')
After allreduce:
tensor([[3., 3.]], device='cuda:0')
tensor([[3., 3.]], device='cuda:0')

The function given in the solution returns the correct result only when data contains two items, as in the example above. If data has more items, e.g.

# Test
data = [torch.ones((1, 2), device=d2l.try_gpu(i)) * (i + 1) for i in range(5)]

the output still amounts to summing and broadcasting only the first two items:

Before allreduce:
tensor([[1., 1.]], device='cuda:0')
tensor([[2., 2.]], device='cuda:1')
After allreduce:
tensor([[3., 3.]], device='cuda:0')
tensor([[3., 3.]], device='cuda:0')

In my opinion, the reduction-phase code of the function is incomplete: it does not correctly accumulate all of the items in data. Tracing the loop for world_size = 5 makes this clear: the only update that ever reaches data[0] is data[0] += data[1] (at i = 0), so the broadcast phase simply copies the partial sum of the first two items to every position.
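
For reference, here is a minimal sketch of a reduction that does accumulate every item. This is my own suggestion, not the book's code; the name allreduce_tree and the per-step .to(...) transfers are assumptions. It uses a binary-tree pattern, which also answers the exercise's "why is it more efficient" question: independent pairs of devices can transfer in parallel, so the reduce phase needs on the order of log2(world_size) sequential steps rather than world_size - 1.

import torch

def allreduce_tree(data: list):
    """Sum all tensors in `data`, then give every entry the full sum."""
    world_size = len(data)
    # Reduce phase: pairwise tree reduction. At each step, entry i pulls
    # the partial sum from entry i + step; pairs with disjoint indices
    # are independent and can communicate in parallel.
    step = 1
    while step < world_size:
        for i in range(0, world_size - step, 2 * step):
            data[i] += data[i + step].to(data[i].device)
        step *= 2
    # Broadcast phase: copy the complete sum from data[0] back to
    # each entry's original device.
    for i in range(1, world_size):
        data[i] = data[0].to(data[i].device)

With the five-item test above (values 1 through 5), every entry ends up as tensor([[15., 15.]]) on its own device, which is the expected allreduce result.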
