I made a quick patch to the server to test RPC, running phi-3 fully offloaded onto a remote GPU, and everything seemed OK. Timings over RPC:
pp: 258.19 tokens per second
tg: 48.41 tokens per second
Running the same model locally, directly on the remote machine's GPU (no RPC), gives:
pp: 563.30 tokens per second
tg: 92.00 tokens per second
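To put the two sets of timings side by side, here is a small sketch that computes RPC throughput as a fraction of local throughput. The helper name `relative_throughput` is made up for illustration; the only inputs are the four numbers quoted above.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: fraction of local throughput retained over RPC.
static double relative_throughput(double rpc_tps, double local_tps) {
    return rpc_tps / local_tps;
}

// Prints the RPC/local ratio for prompt processing (pp) and text generation (tg),
// using the timings reported in this issue.
static void print_rpc_overhead() {
    std::printf("pp: %.1f%% of local\n", 100.0 * relative_throughput(258.19, 563.30));
    std::printf("tg: %.1f%% of local\n", 100.0 * relative_throughput(48.41, 92.00));
}
```

With these numbers, RPC runs at roughly half the local speed for both prompt processing and generation.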
Possible Implementation
Patches are trivial:
```diff
 printf(" --port PORT port to listen (default (default: %d)\n", sparams.port);
+printf(" --rpc SERVERS comma separated list of RPC servers\n");

 } else if (arg == "--host") {
     if (++i >= argc) {
         invalid_param = true;
         break;
     }
     sparams.hostname = argv[i];
```
```diff
+} else if (arg == "--rpc") {
+    if (++i >= argc) {
+        invalid_param = true;
+        break;
+    }
+    params.rpc_servers = argv[i];
```
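The patch only stores the raw `--rpc` argument as a string; how that string is consumed downstream is not shown here. For anyone who wants to validate or log the individual endpoints, a minimal sketch of splitting the comma-separated list could look like the following. `split_rpc_servers` is a hypothetical helper and not part of the patch or of llama.cpp as far as this issue shows.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not from the patch): split a
// comma-separated list such as "hostA:50052,hostB:50052" into one
// entry per server, skipping empty entries.
static std::vector<std::string> split_rpc_servers(const std::string & servers) {
    std::vector<std::string> out;
    std::stringstream ss(servers);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) {
            out.push_back(item);
        }
    }
    return out;
}
```

For example, `split_rpc_servers("hostA:50052,hostB:50052")` yields two entries, one per server. The host names and ports here are placeholders.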