The lifecycle of ggml tensor #1109

soccercheng · 2025-02-11T01:14:43Z

I'm working on enabling GGML on a PCIe card with a RISC-V AMP (asymmetric multiprocessing) architecture. Some of the tests and examples already run on my port.

I'm curious about the lifecycle of ggml tensors. Looking at the CPU and RPC implementations, I can't find any clue about when and how ggml tensors are released, in particular on the RPC server side.

For example, in "examples/gpt-2", I see "identical" ggml tensors (same id, buffer, and data address) being set multiple times with different sizes:

[add_ggml_tensor] in the 1st graph compute

[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f73530070), [tensor(inp_tokens, 110fee3a0): buffer(0X110fee540), data(0X1e30983c0), data_size(0X10)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: A647E2F1-B5AD-40DE-B2A8-E6F60E4AA138, command=0X00000007
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735301f0), [tensor(position, 110fee1e0): buffer(0X110fee540), data(0X1e30983e0), data_size(0X10)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: 7DF94AE0-3B5F-4EC6-8A40-49EBA4AB2C6D, command=0X00000007
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735307f0), [tensor(KQ_mask, 110fedfe0): buffer(0X110fee540), data(0X1e3098400), data_size(0X40)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: 48A337B4-A73E-49EE-AC94-8C26B9D26BD8, command=0X00000007
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f73530370), [tensor(node_0, 110fede20): buffer(0X110fee540), data(0X1e30a2400), data_size(0X3000)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: B375DA85-462C-4ADF-8EAE-B1A2667B96C1, command=0X00000007
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735304f0), [tensor(node_1, 110fedc20): buffer(0X110fee540), data(0X1e30a6000), data_size(0X3000)]

[add_ggml_tensor] in the 2nd graph compute

[MCU]GGML: [WARN] add_ggml_tensor: Existing tensor id(0x636f73530070), [tensor(inp_tokens, 110fee3a0): buffer(0X110fee540), data(0X1e30983c0), data_size(0X10)]
[MCU]GGML: [WARN] add_ggml_tensor: [serialized tensor(inp_tokens, 1e3098280, id(0x636f73530070)): buffer(0X636f7350e6a0), data(0X1e30983c0), data_size(0X14)]
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f73530070), [tensor(inp_tokens, 110fb0ce0): buffer(0X110fee540), data(0X1e30983c0), data_size(0X14)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: 77D0C8F9-E3CD-4744-84CC-5F764FB20580, command=0X00000007
[MCU]GGML: [WARN] add_ggml_tensor: Existing tensor id(0x636f735301f0), [tensor(position, 110fee1e0): buffer(0X110fee540), data(0X1e30983e0), data_size(0X10)]
[MCU]GGML: [WARN] add_ggml_tensor: [serialized tensor(position, 1e3098280, id(0x636f735301f0)): buffer(0X636f7350e6a0), data(0X1e30983e0), data_size(0X14)]
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735301f0), [tensor(position, 110fb0b20): buffer(0X110fee540), data(0X1e30983e0), data_size(0X14)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: 5092049B-7AFD-40A2-BE7C-7022C9C85793, command=0X00000007
[MCU]GGML: [WARN] add_ggml_tensor: Existing tensor id(0x636f735307f0), [tensor(KQ_mask, 110fedfe0): buffer(0X110fee540), data(0X1e3098400), data_size(0X40)]
[MCU]GGML: [WARN] add_ggml_tensor: [serialized tensor(KQ_mask, 1e3098280, id(0x636f735307f0)): buffer(0X636f7350e6a0), data(0X1e3098400), data_size(0XB4)]
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735307f0), [tensor(KQ_mask, 110fb0920): buffer(0X110fee540), data(0X1e3098400), data_size(0XB4)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: C767DCF0-273B-470E-A90E-8F683501007F, command=0X00000007
[MCU]GGML: [WARN] add_ggml_tensor: Existing tensor id(0x636f73530370), [tensor(node_0, 110fede20): buffer(0X110fee540), data(0X1e30a2400), data_size(0X3000)]
[MCU]GGML: [WARN] add_ggml_tensor: [serialized tensor(node_0, 1e3098280, id(0x636f73530370)): buffer(0X636f7350e6a0), data(0X1e30a2400), data_size(0X3C00)]
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f73530370), [tensor(node_0, 110fb0760): buffer(0X110fee540), data(0X1e30a2400), data_size(0X3C00)]
[MCU]MCU_WORKERS: [INFO] (0) Processing command task: BBCFC35A-CEB4-4A76-9068-906E8B5B9720, command=0X00000007
[MCU]GGML: [WARN] add_ggml_tensor: Existing tensor id(0x636f735304f0), [tensor(node_1, 110fedc20): buffer(0X110fee540), data(0X1e30a6000), data_size(0X3000)]
[MCU]GGML: [WARN] add_ggml_tensor: [serialized tensor(node_1, 1e3098280, id(0x636f735304f0)): buffer(0X636f7350e6a0), data(0X1e30a6000), data_size(0X3C00)]
[MCU]GGML: [DEBUG] add_ggml_tensor: New tensor id(0x636f735304f0), [tensor(node_1, 110fb0560): buffer(0X110fee540), data(0X1e30a6000), data_size(0X3C00)]
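The log shows the same id, buffer, and data address arriving again in the second graph compute, with only the server-side tensor object address and the data size changing. If a port keeps its own id-to-tensor table, one option is to treat a repeated id as an update rather than a conflict. Below is a minimal sketch under that assumption; the table layout and the names tensor_entry, tensor_table_find, and tensor_table_upsert are hypothetical and are not part of ggml or its RPC server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-id record kept by a server port; not a ggml type. */
typedef struct {
    uint64_t id;      /* tensor id received from the client */
    uint64_t buffer;  /* remote buffer handle */
    uint64_t data;    /* address of the tensor data inside that buffer */
    size_t   size;    /* data size from the latest message */
    int      used;    /* slot in use? */
} tensor_entry;

#define TENSOR_TABLE_CAP 64

typedef struct {
    tensor_entry entries[TENSOR_TABLE_CAP];
} tensor_table;

/* Return the entry for id, or NULL if it has not been seen. */
static tensor_entry *tensor_table_find(tensor_table *t, uint64_t id) {
    assert(t != NULL);
    for (size_t i = 0; i < TENSOR_TABLE_CAP; i++) {
        if (t->entries[i].used && t->entries[i].id == id) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Insert a new id, or refresh the record of an id seen before.
   Returns 1 for a new id, 0 for an update, -1 if the table is full. */
static int tensor_table_upsert(tensor_table *t, uint64_t id, uint64_t buffer,
                               uint64_t data, size_t size) {
    tensor_entry *e = tensor_table_find(t, id);
    int is_new = (e == NULL);
    if (is_new) {
        /* Take the first free slot. */
        for (size_t i = 0; i < TENSOR_TABLE_CAP && e == NULL; i++) {
            if (!t->entries[i].used) {
                e = &t->entries[i];
            }
        }
        if (e == NULL) {
            return -1;
        }
        e->id = id;
        e->used = 1;
    }
    /* A repeated id only updates metadata; no tensor memory is freed here. */
    e->buffer = buffer;
    e->data = data;
    e->size = size;
    return is_new ? 1 : 0;
}
```

Called with the two inp_tokens records from the log (same id, size 0x10 and then 0x14), the first call returns 1 and the second returns 0, leaving a single entry whose size is 0x14.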

Could you provide some hints for reference?

slaren (Maintainer) · 2025-02-11T12:44:47Z

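For most backends, tensors are just pointers within a ggml_backend_buffer. Normally, individual tensors are never released; the memory they point to belongs to the buffer and goes away when the buffer itself is freed. There are some exceptions to this, but it holds for both the CPU and RPC backends.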
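To illustrate this ownership model, here is a minimal C sketch assuming a simple bump allocator: a tensor only records a pointer into a buffer, placing it never calls malloc, and the only release is freeing the whole buffer. The types and functions (sketch_buffer, sketch_tensor, sketch_tensor_alloc, sketch_buffer_reset, sketch_buffer_free) are hypothetical stand-ins, not the ggml_backend_buffer API, and the reset step is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a backend buffer: one allocation that owns all tensor data. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t offset;  /* next free byte (bump allocator) */
} sketch_buffer;

/* Hypothetical stand-in for a tensor: metadata plus a pointer into a buffer. */
typedef struct {
    const char *name;
    void *data;     /* points inside sketch_buffer.base; never freed on its own */
    size_t nbytes;
} sketch_tensor;

static int sketch_buffer_init(sketch_buffer *buf, size_t size) {
    buf->base = malloc(size);
    buf->size = (buf->base != NULL) ? size : 0;
    buf->offset = 0;
    return buf->base != NULL;
}

/* Place a tensor in the buffer: only the data pointer is set, nothing is malloc'ed. */
static int sketch_tensor_alloc(sketch_buffer *buf, sketch_tensor *t,
                               const char *name, size_t nbytes, size_t align) {
    assert(align > 0);
    size_t start = (buf->offset + align - 1) / align * align;
    if (start > buf->size || nbytes > buf->size - start) {
        return 0;  /* does not fit */
    }
    t->name = name;
    t->data = buf->base + start;
    t->nbytes = nbytes;
    buf->offset = start + nbytes;
    return 1;
}

/* Sketch assumption: reusing the buffer for the next graph just rewinds the offset,
   so new tensors may get the same data addresses as the old ones. */
static void sketch_buffer_reset(sketch_buffer *buf) {
    buf->offset = 0;
}

/* The only release: after this, every tensor that pointed into the buffer is dangling. */
static void sketch_buffer_free(sketch_buffer *buf) {
    free(buf->base);
    buf->base = NULL;
    buf->size = 0;
    buf->offset = 0;
}
```

With 32-byte alignment, two 16-byte tensors land at offsets 0 and 32. After sketch_buffer_reset, the next allocation gets offset 0 again, so a tensor with a different size can end up at the same data address, a pattern similar to the one in the log. Nothing is released per tensor; sketch_buffer_free is the single point where the memory goes away.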
