diff --git a/doc/deepvm memory management.md b/doc/deepvm memory management.md
index d96637c..2bd5695 100644
--- a/doc/deepvm memory management.md
+++ b/doc/deepvm memory management.md
@@ -20,6 +20,7 @@ deepvm memory management is based on a memory pool, reducing the overhead of allocating and releasing memory
 - **Single-threaded only**: the overhead of locks and other synchronization mechanisms can be avoided
 - Mainly supports the limited memory sizes of IoT environments: **4 GB per pool** (32-bit byte addressing), **1 GB per block** (representable in 30 bits)
+- For now, addressing is limited to 32 bits.
 - Restricted memory usage in deeplang: **no read/write permission management**; that is left to the VM and the language mechanisms
 
 #### Caveats
@@ -42,11 +43,11 @@ An allocated block consists of three parts: head, payload, and (possibly) padding.
 - The payload is memory that the outside can **use directly** (raw memory); reading and writing it directly is legal
 - The padding (optional) fills up whatever memory alignment leaves over
 
-| block head field | Purpose |
-| ----------------- | --------- |
-| [a] allocated (1 bit) | Indicates whether the current block is in use; here it should be 1 |
-| [p] previous block allocated (1 bit) | Indicates whether the (address-wise adjacent) previous block is in use |
-| [size] block size (30 bits) | Records the size of the payload memory region |
+| block head field                     | Purpose                                                                |
+| ------------------------------------ | ---------------------------------------------------------------------- |
+| [a] allocated (1 bit)                | Indicates whether the current block is in use; here it should be 1     |
+| [p] previous block allocated (1 bit) | Indicates whether the (address-wise adjacent) previous block is in use |
+| [size] block size (30 bits)          | Records the size of the head + payload memory region                   |
 
 #### Free Block: an unused memory block
 
@@ -62,12 +63,13 @@ An allocated block consists of three parts: head, payload, and (possibly) padding.
 
 Requests a block of memory from the pool. When the pool is initialized it contains only one free block, managed directly through the Remainder Block address pointer described below. Apart from the two flag bits in the head, none of its contents are defined.
 
-Note that the remainder block's block head must be guaranteed 4 bytes, while its payload may be 0 bytes. That is, a remainder block always exists
+Note that the remainder block's block head must be guaranteed 4 bytes, while its payload may be as small as 0 bytes. A remainder block must always exist.
 
 ![Remainder Block diagram](img/deepvm-mem-block-structure-remainder-block.svg)
 
 - allocated should be set to 0
 - When the remainder block is the first block, pre-allocated should be set to 1; in all other cases it is set by the immediately preceding block according to the actual situation.
+  - In practice, because of the merging mechanism, the P flag can be treated as always 1.
 
 ### Bins: a mechanism to speed up memory recycling
 
@@ -81,20 +83,20 @@
 
 For fast allocation, fast blocks normally do not take part in free-memory merging. When necessary they can be consolidated and reclaimed through other mechanisms.
 
-Note: since a singly linked list only needs to point in one direction, and all blocks on one chain have the same size, fast blocks contain no predecessor offset and no footer, keeping only the head and the successor offset. Because every block needs a 4-byte head plus a 4-byte successor offset pointing to another block, the smallest block is at least 8 bytes.
+Note: since a singly linked list only needs to point in one direction, and all blocks on one chain have the same size, fast blocks contain no predecessor pointer and no footer, keeping only the head and the successor pointer. Because every block needs a 4-byte head plus a 4-byte successor address pointing to another block, the smallest block is at least 8 bytes. For simplicity, fast bins use pointers (absolute addresses) rather than offsets.
 
 Consider bins of the following sizes[^note1]; the sketch after the table shows how a request is mapped to one of them:
 
-| Fast Bin Size (4 bytes head + n bytes payload + padding) | Comments |
-| --------------------------------------------------------- | -------- |
-| 8 bytes (4 + 4) | Most built-in numeric types (char / unicode char, bool, int32_t, single-precision floating-point numbers) |
-| 16 bytes (4 + 12) | Some wider built-in numeric types (int64_t, double-precision floating-point numbers) |
-| 24 bytes (4 + 20) | Some small composite structures (strings under 20 bytes, function tables with fewer than 5 entries) |
-| 32 bytes (4 + 28) | Some small composite structures |
-| 40 bytes (4 + 36) | Math library: 8-tuples |
-| 48 bytes (4 + 44) | Everything else; mainly to keep the Sorted bins efficient |
-| 56 bytes (4 + 52) | Same as above |
-| 64 bytes (4 + 60) | Same as above, and to satisfy 8-byte alignment |
+| Fast Bin Size (4 bytes head + n bytes payload + padding)   | Comments                                                                                                   |
+| ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
+| 8 bytes (4 + 4)                                             | Most built-in numeric types (char / unicode char, bool, int32_t, single-precision floating-point numbers)   |
+| 16 bytes (4 + 12)                                           | Some wider built-in numeric types (int64_t, double-precision floating-point numbers)                        |
+| 24 bytes (4 + 20)                                           | Some small composite structures (strings under 20 bytes, function tables with fewer than 5 entries)         |
+| 32 bytes (4 + 28)                                           | Some small composite structures                                                                              |
+| 40 bytes (4 + 36)                                           | Math library: 8-tuples                                                                                       |
+| 48 bytes (4 + 44)                                           | Everything else; mainly to keep the Sorted bins efficient                                                    |
+| 56 bytes (4 + 52)                                           | Same as above                                                                                                |
+| 64 bytes (4 + 60)                                           | Same as above, and to satisfy 8-byte alignment                                                               |
 
 [^note1]: Note 1: only these common use cases have come to mind so far
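> Annotation (not part of the patch): once the 4-byte head is added and the total is rounded up to a multiple of 8, mapping a request to one of these bins is a single shift; `deep_malloc_fast_bins` in the patch below computes it the same way. A minimal, self-contained sketch:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: map a requested payload size to a fast-bin index. A 4-byte head
 * is added, the total is rounded up to 8 bytes, and the bins are
 * 8, 16, ..., 64 bytes, so index = total/8 - 1. */
#define ALIGN_MEM_SIZE(size) ((((size) + 0x7) >> 3) << 3)

static uint32_t
fast_bin_index (uint32_t payload_size)
{
  uint32_t aligned = ALIGN_MEM_SIZE (payload_size + 4 /* block head */);
  return (aligned >> 3) - 1; /* 8-byte block -> bin 0, 64-byte -> bin 7 */
}

int main (void)
{
  printf ("%u\n", fast_bin_index (4));  /* 8-byte block  -> 0 */
  printf ("%u\n", fast_bin_index (20)); /* 24-byte block -> 2 */
  printf ("%u\n", fast_bin_index (60)); /* 64-byte block -> 7 */
  return 0;
}
```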
@@ -132,7 +134,7 @@ Sorted Bins allow partial allocation when allocating and merging of neighbors when freeing
 
 ### Global metadata
 
-A total of 88 bytes.
+A total of 96 bytes.
 
 ![Global Metadata diagram](img/deepvm-mem-global-metadata.svg)
 
@@ -142,12 +144,16 @@ Sorted Bins allow partial allocation when allocating and merging of neighbors when freeing
 
 #### Address of the first Sorted Block / head of the skip list
 
-8 bytes; this is also the address of the smallest sorted block.
+8 bytes; this is also the address of the smallest sorted block. By default it points to a 72-byte fake sorted block that serves as the head index.
 
-#### Remainder Block address
+#### Remainder Block start address
 
 8 bytes. The start address of the address-wise last free block, used to support some additional operations.
 
+#### Remainder Block end address
+
+8 bytes. The address of the byte just past the end of the address-wise last free block, used to support some additional operations.
+
 #### Fast Bins Array
 
 $8 \times 8 \textrm{ bytes} = 64 \textrm{ bytes}$. Stores, in order, the address of the first element of each fast bin; an empty bin's head points to itself.
 
@@ -168,16 +174,20 @@ $8 \times 8 \textrm{ bytes} = 64 \textrm{ bytes}$. Stores, in order, the address of the first element of each bin
 
 Assume byte-addressed memory with a little-endian layout
 
 1. Request `n` bytes of contiguous memory from the system with the appropriate function, and save the start address as `addr`
-2. Set the global available memory (the eight bytes at `addr`) to `n - 88`
-3. Set the skip-list head address (the eight bytes at `addr + 8`) to `addr + 8`
-4. Set the Remainder Block address (the eight bytes at `addr + 16`) to `addr + 88`
-5. Set each eight-byte group of the Fast Bin Array (`addr + 24` – `addr + 87`, 64 bytes in total) to its own address
-   - For example, the eight bytes starting at `addr + 24` should be set to `addr + 24`; the last group, starting at `addr + 80`, should be set to `addr + 80`
-6. Initialize the remainder block's head (`addr + 88` – `addr + 91`)
-   1. Let `m` be the size of the remainder block's payload, i.e. `m = n - 92`
-   2. Shift `m` left by 2 bits, then set the two vacated bits, from low to high, to 0 and 1
-   3. Sample code (C/C++): `*((int *)(addr + 88)) = ((n - 92) << 2) & 2`
-7. Initialization is done; the result is shown in the figure
+2. Set the global available memory (the eight bytes at `addr`) to `n - 172`
+3. Set the skip-list head address (the eight bytes at `addr + 8`) to `addr + 96`
+4. Initialize the skip-list head block (`addr + 96` – `addr + 167`)
+   1. Set the head to 0, i.e. the A/P flags are both 0 and block size = 0, the minimum value, so that it can never be taken
+   2. Set all of the offsets to 0
+   3. Set the level of indices to the maximum value, 13
+5. Set the Remainder Block start address (the eight bytes at `addr + 16`) to `addr + 168`
+6. Set the Remainder Block end address (the eight bytes at `addr + 24`) to `addr + n`
+7. Set each eight-byte group of the Fast Bin Array (`addr + 32` – `addr + 95`, 64 bytes in total) to `NULL`
+8. Initialize the remainder block's head (`addr + 168` – `addr + 171`)
+   1. Let `m` be the size of the remainder block's payload, i.e. `m = n - 172`
+   2. Shift `m` left by 2 bits, then set the two vacated bits, from low to high, to 0 and 1
+   3. Sample code (C/C++): `*((int *)(addr + 168)) = ((n - 172) << 2) | 2`
+9. Initialization is done; the result is shown in the figure, and a fuller sketch follows it
 
 ![deep_mem_init result diagram](img/deepvm-mem-init.svg)
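> Annotation (not part of the patch): a minimal sketch of the write sequence above, assuming 64-bit little-endian pointers and the field order of `struct sorted_block` from the header below (head, pred_offset, succ_offset, level_of_indices), which puts the fake head block's level field at offset 96 + 12. The helper name is mine:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: lay out the 96-byte metadata, the 72-byte fake
 * skip-list head block, and the remainder block inside a raw buffer. */
static void
sketch_mem_init (uint8_t *addr, uint64_t n)
{
  memset (addr, 0, 168); /* zero metadata + fake skip-list head block */

  *(uint64_t *)(addr + 0) = n - 172;       /* step 2: free memory        */
  *(uint8_t **)(addr + 8) = addr + 96;     /* step 3: skip-list head     */
  /* step 4: fake head block -- head = 0 and offsets = 0 via the memset */
  *(uint32_t *)(addr + 96 + 12) = 13;      /* step 4.3: level of indices */
  *(uint8_t **)(addr + 16) = addr + 168;   /* step 5: remainder start    */
  *(uint8_t **)(addr + 24) = addr + n;     /* step 6: remainder end      */
  /* step 7: fast bin array (addr+32 .. addr+95) is NULL via the memset */
  *(uint32_t *)(addr + 168)                /* step 8: size<<2, A=0, P=1  */
      = (uint32_t)((n - 172) << 2) | 2;
}
```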
@@ -193,8 +203,8 @@
 
 1. Compute the offset, and check directly whether the corresponding fast bin has a usable fast block.
 2. If there is one, remove it from the head of the list and jump to step 4
-   - Update the successor offset held by that fast block's predecessor so that it points to the block's own successor; if the block is already the last one, set it to 0
-3. If there is none, compute the remainder block's end address from its address and its payload size, carve a block of the requested size off at that end address, and return it. Update the size value in the remainder block's head.
+   - Update the fast bin's list head so that it points to the removed block's successor; if the block was already the last one, that successor is `NULL`
+3. If there is none, carve a block of the requested size off at the remainder block's end address and return it. Update the remainder block's end address.
 4. Update the global available memory size.
 5. Initialize the fast block to be returned; when done, return the address of its block payload (see the sketch after this list).
    - In the block head, set the allocated flag to 1 and the pre-allocated flag to 1, and set block size to the size of the payload (in **bytes**)
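> Annotation (not part of the patch): the fast path of steps 1–5 condensed into a sketch. The pool fields here (`bins`, `remainder_end`, `free_memory`) are simplified stand-ins for `struct mem_pool` in the patch, error handling is omitted, and the head encoding follows the header's convention (aligned total size with the A/P flags in the low bits):

```c
#include <stddef.h>
#include <stdint.h>

struct fake_pool
{
  void *bins[8];          /* fast-bin list heads          */
  uint8_t *remainder_end; /* one past the remainder block */
  uint64_t free_memory;
};

/* aligned_size = head + payload, a multiple of 8 in [8, 64] */
static void *
sketch_fast_malloc (struct fake_pool *p, uint32_t aligned_size)
{
  uint32_t idx = (aligned_size >> 3) - 1;
  uint8_t *block;

  if (p->bins[idx] != NULL) /* steps 1-2: pop the bin's head */
    {
      block = p->bins[idx];
      p->bins[idx] = *(void **)(block + 4); /* successor, NULL if last */
    }
  else /* step 3: carve off at the remainder block's end */
    {
      p->remainder_end -= aligned_size;
      block = p->remainder_end;
    }

  p->free_memory -= aligned_size;                /* step 4            */
  *(uint32_t *)block = aligned_size | 0x1 | 0x2; /* step 5: A=1, P=1  */
  return block + 4;                              /* payload after head */
}
```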
diff --git a/src/vm/Makefile b/src/vm/Makefile
index 8113ace..cfb00d6 100644
--- a/src/vm/Makefile
+++ b/src/vm/Makefile
@@ -1,7 +1,7 @@
 # Compiler and options
 CC=gcc
 CFLAGS=-Wall -std=gnu99
-
+
 # Target file
 TARGET=wasmvm
 SRCS = wasm_c_api.c\
@@ -28,15 +28,17 @@ SRCS = wasm_c_api.c\
 	bh_list.c\
 	bh_log.c\
 	bh_common.c\
-	bh_assert.c
+	bh_assert.c\
+	deep_mem_alloc.c\
+	./include/random/xoroshiro128plus.c
 
 INC = -I./include
 
 OBJS = $(SRCS:.c=.o)
 
 $(TARGET):$(OBJS)
 	$(CC) -o $@ $^ -lpthread -lm
-
+
 clean:
 	rm -rf $(TARGET) $(OBJS)
-
+
 %.o:%.c
 	$(CC) $(CFLAGS) $(INC) -o $@ -c -g $<
diff --git a/src/vm/deep_mem_alloc.c b/src/vm/deep_mem_alloc.c
new file mode 100644
index 0000000..6d7f0fc
--- /dev/null
+++ b/src/vm/deep_mem_alloc.c
@@ -0,0 +1,544 @@
+#include "deep_mem_alloc.h"
+#include "random/random.h"
+#include <string.h>
+
+struct mem_pool *pool;
+
+static void *deep_malloc_fast_bins (uint32_t size);
+static void *deep_malloc_sorted_bins (uint32_t size);
+static void deep_free_fast_bins (void *ptr);
+static void deep_free_sorted_bins (void *ptr);
+
+/* helper functions for maintaining the sorted_block skiplist */
+static void _split_into_two_sorted_blocks (struct sorted_block *block,
+                                           uint32_t aligned_size);
+static void _merge_into_single_block (struct sorted_block *curr,
+                                      struct sorted_block *next);
+
+static struct sorted_block *
+_allocate_block_from_skiplist (uint32_t aligned_size);
+
+static inline bool _sorted_block_is_in_skiplist (struct sorted_block *block);
+static struct sorted_block *
+_find_sorted_block_by_size_on_index (struct sorted_block *node, uint32_t size,
+                                     uint32_t index_level);
+static struct sorted_block *
+_find_sorted_block_by_size (struct sorted_block *node, uint32_t size);
+static void _insert_sorted_block_to_skiplist (struct sorted_block *block);
+static void _remove_sorted_block_from_skiplist (struct sorted_block *block);
+
+bool
+deep_mem_init (void *mem, uint32_t size)
+{
+  assert (mem != NULL);
+
+  if (size < sizeof (struct mem_pool))
+    {
+      return false; /* given buffer is too small */
+    }
+
+  memset (mem, 0, size);
+  mem_size_t aligned_size = ALIGN_MEM_SIZE_TRUNC (size);
+
+  pool = (struct mem_pool *)mem;
+  pool->free_memory = aligned_size - sizeof (struct mem_pool)
+                      - sizeof (struct sorted_block) - sizeof (block_head_t);
+  /* the first node in the list, to simplify implementation */
+  pool->sorted_block.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, sizeof (struct mem_pool)));
+  /* all other fields are set as 0 */
+  pool->sorted_block.addr->level_of_indices = SORTED_BLOCK_INDICES_LEVEL;
+
+  pool->remainder_block.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, sizeof (struct mem_pool) + sizeof (struct sorted_block)));
+  pool->remainder_block_end.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, aligned_size - 8)); // -8 for safety
+  for (int i = 0; i < FAST_BIN_LENGTH; ++i)
+    {
+      pool->fast_bins[i].addr = NULL;
+    }
+  // initialise remainder block's head
+  block_set_A_flag (&pool->remainder_block.addr->head, false);
+  block_set_P_flag (&pool->remainder_block.addr->head, true);
+
+  return true;
+}
+
+void
+deep_mem_destroy (void)
+{
+  pool = NULL;
+}
+
+void *
+deep_malloc (uint32_t size)
+{
+  assert (pool != NULL);
+
+  if (pool->free_memory < size)
+    {
+      return NULL;
+    }
+  /* route by the aligned total size so the fast-bin index stays in range */
+  if (ALIGN_MEM_SIZE (size + sizeof (block_head_t)) <= FAST_BIN_MAX_SIZE)
+    {
+      return deep_malloc_fast_bins (size);
+    }
+  return deep_malloc_sorted_bins (size);
+}
+
+static void *
+deep_malloc_fast_bins (uint32_t size)
+{
+  block_size_t aligned_size = ALIGN_MEM_SIZE (size + sizeof (block_head_t));
+  uint32_t offset = (aligned_size >> 3) - 1;
+  bool P_flag = false;
+  struct fast_block *ret = NULL;
+  block_size_t payload_size;
+
+  if (pool->fast_bins[offset].addr != NULL)
+    {
+      ret = pool->fast_bins[offset].addr;
+      pool->fast_bins[offset].addr = ret->next;
+      P_flag = prev_block_is_allocated (&ret->head);
+      payload_size = block_get_size (&ret->head) - sizeof (block_head_t);
+    }
+  else if (aligned_size <= get_remainder_size (pool))
+    {
+      ret = (struct fast_block *)(get_pointer_by_offset_in_bytes (
+          pool->remainder_block_end.addr,
+          -(int64_t)(aligned_size + sizeof (block_head_t))));
+      pool->remainder_block_end.addr = (struct sorted_block *)ret;
+
+      payload_size = aligned_size - sizeof (block_head_t);
+      block_set_size (&ret->head, aligned_size);
+      pool->free_memory -= sizeof (block_head_t);
+    }
+  else
+    {
+      return NULL;
+    }
+
+  memset (&ret->payload, 0, payload_size);
+  block_set_A_flag (&ret->head, true);
+  block_set_P_flag (&ret->head, P_flag);
+  pool->free_memory -= payload_size;
+
+  return &ret->payload;
+}
+
+static void *
+deep_malloc_sorted_bins (uint32_t size)
+{
+  block_size_t aligned_size = ALIGN_MEM_SIZE (size + sizeof (block_head_t));
+  struct sorted_block *ret = NULL;
+  block_size_t payload_size;
+
+  if ((pool->sorted_block.addr != NULL)
+      && ((ret = _allocate_block_from_skiplist (aligned_size)) != NULL))
+    {
+      /* pass */
+    }
+  else if (aligned_size <= get_remainder_size (pool))
+    {
+      /* no suitable sorted_block: carve from the remainder block */
+      ret = (struct sorted_block *)pool->remainder_block.addr;
+      block_set_size (&ret->head, get_remainder_size (pool));
+      _split_into_two_sorted_blocks (ret, aligned_size);
+      /* the tail created by the split becomes the new remainder block */
+      pool->remainder_block.addr = get_block_by_offset (ret, aligned_size);
+    }
+  else
+    {
+      return NULL;
+    }
+
+  payload_size = aligned_size - sizeof (block_head_t);
+  memset (&ret->payload, 0, payload_size);
+  block_set_A_flag (&ret->head, true);
+  block_set_P_flag (&get_block_by_offset (ret, aligned_size)->head, true);
+  pool->free_memory -= payload_size;
+
+  return &ret->payload;
+}
+
+void *
+deep_realloc (void *ptr, uint32_t size)
+{
+  return NULL;
+}
+
+void
+deep_free (void *ptr)
+{
+  assert (pool != NULL);
+
+  if (ptr == NULL)
+    {
+      return;
+    }
+
+  void *head
+      = get_pointer_by_offset_in_bytes (ptr, -(int64_t)sizeof (block_head_t));
+
+  if (!block_is_allocated ((block_head_t *)head))
+    {
+      return;
+    }
+  block_size_t size = block_get_size ((block_head_t *)head);
+
+  if (size <= FAST_BIN_MAX_SIZE)
+    {
+      deep_free_fast_bins (head);
+    }
+  else
+    {
+      deep_free_sorted_bins (head);
+    }
+}
+
+static void
+deep_free_fast_bins (void *ptr)
+{
+  struct fast_block *block = ptr;
+  block_size_t payload_size
+      = block_get_size (&block->head) - sizeof (block_head_t);
+  uint32_t offset = ((payload_size + sizeof (block_head_t)) >> 3) - 1;
+
+  memset (&block->payload, 0, payload_size);
+
+  block_set_A_flag (&block->head, false);
+  block->next = pool->fast_bins[offset].addr;
+  pool->fast_bins[offset].addr = block;
+
+  pool->free_memory += payload_size;
+}
+static void
+deep_free_sorted_bins (void *ptr)
+{
+  struct sorted_block *block = ptr;
+  struct sorted_block *the_other = NULL;
+  block_size_t payload_size
+      = block_get_size (&block->head) - sizeof (block_head_t);
+
+  memset (&block->payload, 0, payload_size);
+
+  block_set_A_flag (&block->head, false);
+
+  /* try to merge with the previous block when it is free; its size is
+     read from the footer kept in the 4 bytes just below this head */
+  if (!prev_block_is_allocated (&block->head))
+    {
+      block_size_t prev_size
+          = block_get_size ((block_head_t *)get_pointer_by_offset_in_bytes (
+              block, -(int64_t)sizeof (block_head_t)));
+      the_other = get_block_by_offset (block, -((int32_t)prev_size));
+      _merge_into_single_block (the_other, block);
+
+      block = the_other;
+    }
+
+  the_other = get_block_by_offset (block, block_get_size (&block->head));
+  if (!block_is_allocated (&the_other->head))
+    {
+      _merge_into_single_block (block, the_other);
+    }
+
+  if (!_sorted_block_is_in_skiplist (block))
+    {
+      _insert_sorted_block_to_skiplist (block);
+    }
+
+  pool->free_memory += payload_size;
+}
+
+bool
+deep_mem_migrate (void *new_mem, uint32_t size)
+{
+  return false;
+}
+
+/* helper functions for maintaining the sorted_block skiplist */
+static void
+_split_into_two_sorted_blocks (struct sorted_block *block,
+                               uint32_t aligned_size)
+{
+  struct sorted_block *new_block = get_block_by_offset (block, aligned_size);
+  block_size_t new_block_size = block_get_size (&block->head) - aligned_size;
+
+  memset (new_block, 0, new_block_size);
+  block_set_size (&new_block->head, new_block_size);
+  block_set_A_flag (&new_block->head, false);
+  block_set_P_flag (&new_block->head, false); /* by default */
+  _insert_sorted_block_to_skiplist (new_block);
+
+  block_set_size (&block->head, aligned_size);
+  pool->free_memory -= sizeof (block_head_t);
+}
+
+/**
+ * Assuming `curr` and `next` are contiguous in memory address,
+ * where curr < next.
+ * Attempt to remove both nodes from the list, merge them, and insert the
+ * merged block.
+ * NOTE:
+ * - will give `sizeof (block_head_t)` back to the pool's `free_memory`
+ **/
+static void
+_merge_into_single_block (struct sorted_block *curr, struct sorted_block *next)
+{
+  _remove_sorted_block_from_skiplist (curr);
+  _remove_sorted_block_from_skiplist (next);
+
+  block_size_t new_size
+      = block_get_size (&curr->head) + block_get_size (&next->head);
+
+  block_set_size (&curr->head, new_size);
+  memset (&curr->payload, 0, new_size - sizeof (curr->head));
+
+  _insert_sorted_block_to_skiplist (curr);
+
+  pool->free_memory += sizeof (block_head_t);
+}
+
+/** Obtain the most appropriate block from the sorted list if possible.
+ *
+ * - Obtain one with exactly the same size.
+ * - Obtain one with a bigger size, but split it into two sorted blocks
+ *   - return the part with exactly the requested size
+ *   - insert the rest into the sorted_block skiplist
+ *   - NOTE: this requires the block found to be at least
+ *     (`aligned_size + SORTED_BIN_MIN_SIZE`) big
+ * - NULL otherwise
+ *
+ * NOTE: The obtained block will be **removed** from the skiplist.
+ **/
+static struct sorted_block *
+_allocate_block_from_skiplist (uint32_t aligned_size)
+{
+  struct sorted_block *ret = NULL;
+
+  if ((pool->sorted_block.addr == NULL)
+      || ((ret = _find_sorted_block_by_size (pool->sorted_block.addr,
+                                             aligned_size))
+          == NULL))
+    {
+      return NULL;
+    }
+  if (block_get_size (&ret->head) != aligned_size)
+    {
+      if ((block_get_size (&ret->head) < aligned_size + SORTED_BIN_MIN_SIZE)
+          && (ret = _find_sorted_block_by_size (
+                  ret, aligned_size + SORTED_BIN_MIN_SIZE))
+                 == NULL)
+        {
+          return NULL;
+        }
+      _split_into_two_sorted_blocks (ret, aligned_size);
+    }
+  _remove_sorted_block_from_skiplist (ret);
+
+  return ret;
+}
+
+static inline bool
+_sorted_block_is_in_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  return (block->pred_offset != 0 || block->level_of_indices != 0);
+}
+/**
+ * returns a block with the desired size on the list at the given index
+ * level; if not possible, the greatest one that is smaller than desired.
+ *
+ * NOTE:
+ * - there will always be an infimum due to the existence of head
+ * - this function will not check nodes on other index levels
+ * - this function will not check whether there are predecessors in the
+ *   chain with the same key. It assumes the `node` given has embedded all
+ *   indices.
+ **/
+static struct sorted_block *
+_find_sorted_block_by_size_on_index (struct sorted_block *node, uint32_t size,
+                                     uint32_t index_level)
+{
+  assert (node != NULL);
+
+  struct sorted_block *curr = node;
+  struct sorted_block *prev = curr;
+
+  while (block_get_size (&curr->head) < size)
+    {
+      prev = curr; /* curr is the candidate of infimum. */
+      /* reached the end of the skiplist or the biggest smaller sorted block */
+      if (index_level >= SORTED_BLOCK_INDICES_LEVEL
+          || curr->offsets[index_level] == 0)
+        {
+          break;
+        }
+      curr = get_block_by_offset (curr, curr->offsets[index_level]);
+    }
+
+  /* prefer a chained node with no indices, to avoid copying indices on
+     removal; fall back to the matching index node itself. */
+  if (block_get_size (&curr->head) == size)
+    {
+      return (curr->succ_offset != 0)
+                 ? get_block_by_offset (curr, curr->succ_offset)
+                 : curr;
+    }
+
+  return prev;
+}
+
+/**
+ * returns a block with the desired size; if not possible, the least
+ * greater one
+ *
+ * NOTE:
+ * - returns NULL when the supremum is not in the list
+ **/
+static struct sorted_block *
+_find_sorted_block_by_size (struct sorted_block *node, uint32_t size)
+{
+  assert (node != NULL);
+
+  struct sorted_block *curr = node;
+
+  /* indices should only exist on the first node in each sub-list. */
+  while (curr->pred_offset != 0)
+    {
+      curr = get_block_by_offset (curr, curr->pred_offset);
+    }
+
+  while (block_get_size (&curr->head) < size)
+    {
+      uint32_t index_level
+          = SORTED_BLOCK_INDICES_LEVEL - curr->level_of_indices;
+
+      /* skip absent indices and hops that would overshoot the size */
+      while (index_level < SORTED_BLOCK_INDICES_LEVEL
+             && (curr->offsets[index_level] == 0
+                 || (block_get_size (
+                         &get_block_by_offset (curr,
+                                               curr->offsets[index_level])
+                              ->head)
+                     > size)))
+        {
+          index_level++;
+        }
+
+      /* reached the end of the skiplist or the biggest smaller sorted block */
+      if (index_level >= SORTED_BLOCK_INDICES_LEVEL
+          || curr->offsets[index_level] == 0)
+        {
+          break;
+        }
+
+      /* will not be NULL as curr's size is smaller than size */
+      curr = _find_sorted_block_by_size_on_index (curr, size, index_level);
+    }
+
+  if (block_get_size (&curr->head) < size)
+    {
+      /* every node reachable without overshooting is smaller than required:
+         step once along the bottom level, which links all index nodes,
+         to reach the least greater one, if any. */
+      if (curr->offsets[SORTED_BLOCK_INDICES_LEVEL - 1] == 0)
+        {
+          return NULL; /* all nodes are smaller than required. */
+        }
+      curr = get_block_by_offset (
+          curr, curr->offsets[SORTED_BLOCK_INDICES_LEVEL - 1]);
+    }
+
+  /* return a node with no indices to avoid copying indices. */
+  if (curr->succ_offset != 0)
+    {
+      curr = get_block_by_offset (curr, curr->succ_offset);
+    }
+
+  return curr;
+}
+static void
+_insert_sorted_block_to_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  assert (pool->sorted_block.addr != NULL);
+
+  block_size_t size = block_get_size (&block->head);
+  struct sorted_block *pos
+      = _find_sorted_block_by_size (pool->sorted_block.addr, size);
+
+  /* insert into the chain with the same size. */
+  if (pos != NULL && block_get_size (&pos->head) == size)
+    {
+      block->pred_offset = get_offset_between_blocks (block, pos);
+      if (pos->succ_offset != 0)
+        {
+          block->succ_offset
+              = pos->succ_offset - get_offset_between_blocks (pos, block);
+          /* keep the old successor's back pointer consistent */
+          get_block_by_offset (block, block->succ_offset)->pred_offset
+              = -block->succ_offset;
+        }
+      else
+        {
+          block->succ_offset = 0; /* end of chain */
+        }
+      pos->succ_offset = get_offset_between_blocks (pos, block);
+
+      return;
+    }
+
+  block->level_of_indices
+      = ((uint32_t)(next () >> 32)) % SORTED_BLOCK_INDICES_LEVEL + 1;
+
+  for (uint32_t index_level
+       = SORTED_BLOCK_INDICES_LEVEL - block->level_of_indices;
+       index_level < SORTED_BLOCK_INDICES_LEVEL; ++index_level)
+    {
+      pos = _find_sorted_block_by_size_on_index (pool->sorted_block.addr,
+                                                 size, index_level);
+      if (pos->offsets[index_level] != 0)
+        {
+          block->offsets[index_level]
+              = pos->offsets[index_level]
+                - get_offset_between_blocks (pos, block);
+        }
+      else
+        {
+          block->offsets[index_level] = 0;
+        }
+      pos->offsets[index_level] = get_offset_between_blocks (pos, block);
+    }
+}
+
+/**
+ * Remove the node and update all indices / offsets.
+ * May traverse the list multiple times
+ *
+ * NOTE: never removes a node with offsets when it has children in the chain
+ **/
+static void
+_remove_sorted_block_from_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  /* considering a node which is not in the skiplist */
+  /* assert (block->level_of_indices != 0 || block->pred_offset != 0); */
+
+  struct sorted_block *prev = NULL;
+  block_size_t size = block_get_size (&block->head);
+
+  for (uint32_t index_level
+       = SORTED_BLOCK_INDICES_LEVEL - block->level_of_indices;
+       index_level < SORTED_BLOCK_INDICES_LEVEL; ++index_level)
+    {
+      /* -1 to find the strictly smaller node. */
+      prev = _find_sorted_block_by_size_on_index (pool->sorted_block.addr,
+                                                  size - 1, index_level);
+      if (block->offsets[index_level] != 0)
+        {
+          prev->offsets[index_level] += block->offsets[index_level];
+        }
+      else
+        {
+          prev->offsets[index_level] = 0;
+        }
+    }
+
+  if (block->pred_offset != 0)
+    {
+      if (block->succ_offset != 0)
+        {
+          get_block_by_offset (block, block->pred_offset)->succ_offset
+              += block->succ_offset;
+          get_block_by_offset (block, block->succ_offset)->pred_offset
+              += block->pred_offset;
+        }
+      else
+        {
+          get_block_by_offset (block, block->pred_offset)->succ_offset = 0;
+        }
+    }
+  /* no other cases: if it is the first node, it should be the only node. */
+}
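> Annotation (not part of the patch): the skiplist above stores every link as an `int32_t` byte distance rather than an 8-byte pointer, which halves the per-link cost and keeps nodes valid if the whole pool is moved as one chunk; 0 doubles as "no link", since a real link can never point at its own node. A small demonstration of the arithmetic that `get_block_by_offset` / `get_offset_between_blocks` perform, with hypothetical helper names:

```c
#include <stdint.h>
#include <assert.h>

/* follow() / link_offset() mirror get_block_by_offset() and
 * get_offset_between_blocks() from the patch, on raw bytes. */
static void *
follow (void *node, int32_t offset)
{
  return (uint8_t *)node + offset;
}

static int32_t
link_offset (void *from, void *to)
{
  return (int32_t)((uint8_t *)to - (uint8_t *)from);
}

int main (void)
{
  uint8_t pool_[64];
  void *a = pool_ + 8, *b = pool_ + 40;

  int32_t off = link_offset (a, b); /* forward link: +32 bytes   */
  assert (follow (a, off) == b);
  assert (follow (b, -off) == a);   /* back link is the negation */
  return 0;
}
```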
diff --git a/src/vm/include/deep_mem_alloc.h b/src/vm/include/deep_mem_alloc.h
new file mode 100644
index 0000000..0919347
--- /dev/null
+++ b/src/vm/include/deep_mem_alloc.h
@@ -0,0 +1,185 @@
+#ifndef _DEEP_MEM_ALLOC_H
+#define _DEEP_MEM_ALLOC_H
+
+#include "bh_platform.h"
+
+#define FAST_BIN_LENGTH (8)
+#define FAST_BIN_MAX_SIZE (64)   /* 8 * 8 bytes */
+#define SORTED_BIN_MIN_SIZE (72) /* 64 + 8 bytes */
+
+#define A_FLAG_OFFSET (0) /* allocated */
+#define A_FLAG_MASK (0x00000001)
+#define P_FLAG_OFFSET (1) /* previous block is allocated */
+#define P_FLAG_MASK (0x00000002)
+#define BLOCK_SIZE_MULTIPLIER (3)
+#define BLOCK_SIZE_MASK (0xfffffff8) /* low 3 bits hold the flags */
+#define REMAINDER_SIZE_MULTIPLIER BLOCK_SIZE_MULTIPLIER
+#define REMAINDER_SIZE_MASK (~(uint64_t)0x7) /* 64-bit BLOCK_SIZE_MASK */
+
+#define SORTED_BLOCK_INDICES_LEVEL (13)
+
+/* need to update when *_SIZE_MULTIPLIER changes. */
+#define ALIGN_MEM_SIZE(size) ((((size) + 0x7) >> 3) << 3)
+#define ALIGN_MEM_SIZE_TRUNC(size) (((size) >> 3) << 3)
+
+typedef void *mem_t;
+typedef uint64_t mem_size_t;
+typedef uint32_t block_head_t;
+typedef uint32_t block_size_t;
+
+struct mem_pool
+{
+  uint64_t free_memory;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } sorted_block;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } remainder_block;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } remainder_block_end; /* should not be dereferenced */
+  union
+  {
+    uint64_t _padding;
+    struct fast_block *addr;
+  } fast_bins[FAST_BIN_LENGTH];
+};
+
+struct fast_block
+{
+  block_head_t head;
+  union
+  {
+    uint32_t _padding; /* TODO! update for 64 bits system */
+    struct fast_block *next;
+    void *payload;
+  };
+};
+
+struct sorted_block
+{
+  block_head_t head;
+  union
+  {
+    struct
+    {
+      int32_t pred_offset;
+      int32_t succ_offset;
+      uint32_t level_of_indices;
+      // bigger array index corresponds to lower index in skip list,
+      // i.e., skipping less nodes in the skip list
+      // 0 means this node is the last one in this level of index.
+      // offsets[SORTED_BLOCK_INDICES_LEVEL - 1] is the level where each node
+      // is connected one by one consecutively.
+      // The skip list is in ascending order by block's size.
+      int32_t offsets[SORTED_BLOCK_INDICES_LEVEL];
+      // padding
+      // uint32_t footer;
+    };
+    void *payload;
+  };
+};
+
+bool deep_mem_init (void *mem, uint32_t size);
+
+void deep_mem_destroy (void);
+
+void *deep_malloc (uint32_t size);
+
+void *deep_realloc (void *ptr, uint32_t size);
+
+void deep_free (void *ptr);
+
+bool deep_mem_migrate (void *new_mem, uint32_t size);
+
+static inline bool
+block_is_allocated (block_head_t const *head)
+{
+  return (*head) & A_FLAG_MASK;
+}
+
+static inline void
+block_set_A_flag (block_head_t *head, bool allocated)
+{
+  *head = allocated ? (*head | A_FLAG_MASK) : (*head & (~A_FLAG_MASK));
+}
+
+static inline bool
+prev_block_is_allocated (block_head_t const *head)
+{
+  return (*head) & P_FLAG_MASK;
+}
+
+static inline void
+block_set_P_flag (block_head_t *head, bool allocated)
+{
+  *head = allocated ? (*head | P_FLAG_MASK) : (*head & (~P_FLAG_MASK));
+}
+
+/**
+ * since the block is 8-bytes aligned, the smallest multiple of 8 that is
+ * not below the requested size is used instead.
+ *
+ * Aligned size, including the block head
+ **/
+static inline block_size_t
+block_get_size (block_head_t const *head)
+{
+  return (*head) & BLOCK_SIZE_MASK;
+}
+/**
+ * since the block is 8-bytes aligned, the smallest multiple of 8 that is
+ * not below `size` is stored instead.
+ *
+ * Aligned size, including the block head
+ **/
+static inline void
+block_set_size (block_head_t *head, block_size_t size)
+{
+  *head = (*head & (~BLOCK_SIZE_MASK)) /* preserve flags */
+          | ALIGN_MEM_SIZE (size);     /* ensure rounds up */
+}
+
+static inline void *
+get_pointer_by_offset_in_bytes (void *p, int64_t offset)
+{
+  return (uint8_t *)p + offset;
+}
+
+static inline int64_t
+get_offset_between_pointers_in_bytes (void *p, void *q)
+{
+  return (uint8_t *)p - (uint8_t *)q;
+}
+
+static inline struct sorted_block *
+get_block_by_offset (struct sorted_block *node, int32_t offset)
+{
+  return (struct sorted_block *)(get_pointer_by_offset_in_bytes ((mem_t *)node,
+                                                                 offset));
+}
+
+static inline int32_t
+get_offset_between_blocks (struct sorted_block *origin,
+                           struct sorted_block *target)
+{
+  return get_offset_between_pointers_in_bytes ((mem_t *)target,
+                                               (mem_t *)origin);
+}
+
+static inline mem_size_t
+get_remainder_size (struct mem_pool const *pool)
+{
+  return get_offset_between_pointers_in_bytes (pool->remainder_block_end.addr,
+                                               pool->remainder_block.addr);
+}
+
+#endif /* _DEEP_MEM_ALLOC_H */
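> Annotation (not part of the patch): a hypothetical end-to-end use of the API this header declares, assuming the header's own dependencies (e.g. `bh_platform.h`) are available in the include path; the buffer size and variable names are illustrative:

```c
#include "deep_mem_alloc.h"
#include <stdio.h>
#include <string.h>

int main (void)
{
  static uint8_t buffer[64 * 1024]; /* backing memory for the pool */

  if (!deep_mem_init (buffer, sizeof (buffer)))
    return 1;

  int32_t *counters = deep_malloc (8 * sizeof (int32_t)); /* fast-bin sized */
  char *name = deep_malloc (200);                         /* sorted-bin sized */

  if (counters != NULL && name != NULL)
    {
      memset (counters, 0, 8 * sizeof (int32_t));
      strcpy (name, "deepvm");
      printf ("%s\n", name);
    }

  deep_free (name);
  deep_free (counters);
  deep_mem_destroy ();
  return 0;
}
```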
diff --git a/src/vm/include/random/random.h b/src/vm/include/random/random.h
new file mode 100644
index 0000000..e8a2284
--- /dev/null
+++ b/src/vm/include/random/random.h
@@ -0,0 +1,19 @@
+#ifndef _VM_INCLUDE_RANDOM_H
+#define _VM_INCLUDE_RANDOM_H
+
+#include <stdint.h>
+
+uint64_t next (void);
+
+/* This is the jump function for the generator. It is equivalent
+   to 2^64 calls to next(); it can be used to generate 2^64
+   non-overlapping subsequences for parallel computations. */
+void jump (void);
+
+/* This is the long-jump function for the generator. It is equivalent to
+   2^96 calls to next(); it can be used to generate 2^32 starting points,
+   from each of which jump() will generate 2^32 non-overlapping
+   subsequences for parallel distributed computations. */
+void long_jump (void);
+
+#endif /* _VM_INCLUDE_RANDOM_H */
diff --git a/src/vm/include/random/xoroshiro128plus.c b/src/vm/include/random/xoroshiro128plus.c
new file mode 100644
index 0000000..0c70bf6
--- /dev/null
+++ b/src/vm/include/random/xoroshiro128plus.c
@@ -0,0 +1,111 @@
+/* Written in 2016-2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)
+
+To the extent possible under law, the author has dedicated all copyright
+and related and neighboring rights to this software to the public domain
+worldwide. This software is distributed without any warranty.
+
+See <http://creativecommons.org/publicdomain/zero/1.0/>. */
+
+#include "random.h"
+
+#include <stdint.h>
+
+/* This is xoroshiro128+ 1.0, our best and fastest small-state generator
+   for floating-point numbers. We suggest to use its upper bits for
+   floating-point generation, as it is slightly faster than
+   xoroshiro128++/xoroshiro128**. It passes all tests we are aware of
+   except for the four lower bits, which might fail linearity tests (and
+   just those), so if low linear complexity is not considered an issue (as
+   it is usually the case) it can be used to generate 64-bit outputs, too;
+   moreover, this generator has a very mild Hamming-weight dependency
+   making our test (http://prng.di.unimi.it/hwd.php) fail after 5 TB of
+   output; we believe this slight bias cannot affect any application. If
+   you are concerned, use xoroshiro128++, xoroshiro128** or xoshiro256+.
+
+   We suggest to use a sign test to extract a random Boolean value, and
+   right shifts to extract subsets of bits.
+
+   The state must be seeded so that it is not everywhere zero. If you have
+   a 64-bit seed, we suggest to seed a splitmix64 generator and use its
+   output to fill s.
+
+   NOTE: the parameters (a=24, b=16, c=37) of this version give slightly
+   better results in our test than the 2016 version (a=55, b=14, c=36).
+*/
+
+static inline uint64_t
+rotl (const uint64_t x, int k)
+{
+  return (x << k) | (x >> (64 - k));
+}
+
+// two random numbers obtained from www.random.org
+static uint64_t s[2] = { 0x562217302acf9a69, 0x2916753e667e5094 };
+
+uint64_t
+next (void)
+{
+  const uint64_t s0 = s[0];
+  uint64_t s1 = s[1];
+  const uint64_t result = s0 + s1;
+
+  s1 ^= s0;
+  s[0] = rotl (s0, 24) ^ s1 ^ (s1 << 16); // a, b
+  s[1] = rotl (s1, 37);                   // c
+
+  return result;
+}
+
+/* This is the jump function for the generator. It is equivalent
+   to 2^64 calls to next(); it can be used to generate 2^64
+   non-overlapping subsequences for parallel computations. */
+
+void
+jump (void)
+{
+  static const uint64_t JUMP[] = { 0xdf900294d8f554a5, 0x170865df4b3201fc };
+
+  uint64_t s0 = 0;
+  uint64_t s1 = 0;
+  for (int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
+    for (int b = 0; b < 64; b++)
+      {
+        if (JUMP[i] & UINT64_C (1) << b)
+          {
+            s0 ^= s[0];
+            s1 ^= s[1];
+          }
+        next ();
+      }
+
+  s[0] = s0;
+  s[1] = s1;
+}
+
+/* This is the long-jump function for the generator. It is equivalent to
+   2^96 calls to next(); it can be used to generate 2^32 starting points,
+   from each of which jump() will generate 2^32 non-overlapping
+   subsequences for parallel distributed computations. */
+
+void
+long_jump (void)
+{
+  static const uint64_t LONG_JUMP[]
+      = { 0xd2a98b26625eee7b, 0xdddf9b1090aa7ac1 };
+
+  uint64_t s0 = 0;
+  uint64_t s1 = 0;
+  for (int i = 0; i < sizeof LONG_JUMP / sizeof *LONG_JUMP; i++)
+    for (int b = 0; b < 64; b++)
+      {
+        if (LONG_JUMP[i] & UINT64_C (1) << b)
+          {
+            s0 ^= s[0];
+            s1 ^= s[1];
+          }
+        next ();
+      }
+
+  s[0] = s0;
+  s[1] = s1;
+}