diff --git a/doc/deepvm memory management.md b/doc/deepvm memory management.md
index d96637c..2bd5695 100644
--- a/doc/deepvm memory management.md
+++ b/doc/deepvm memory management.md
@@ -20,6 +20,7 @@ deepvm memory management is based on a memory pool, reducing the overhead of allocating and releasing memory
 - **Single-threaded only**: the overhead of locks and other synchronization mechanisms can be avoided
 - Mainly supports the limited memory sizes of IoT environments: **4 GB per pool** (32-bit byte addressing), **1 GB per block** (representable in 30 bits)
+- For now, addressing is limited to 32 bits.
 - Restricted memory usage in deeplang: **no read/write permission management**; that is left to the VM and the language mechanisms
 
 #### Caveats
@@ -42,11 +43,11 @@ An allocated block consists of three parts: head, payload, and (possibly) padding.
 - The payload is memory that the outside can **use directly** (raw memory); reading and writing it directly is legal
 - The padding (optional) fills up whatever memory alignment leaves over
 
-| block head field | Purpose |
-| ----------------- | --------- |
-| [a] allocated (1 bit) | Indicates whether the current block is in use; here it should be 1 |
-| [p] previous block allocated (1 bit) | Indicates whether the (address-wise adjacent) previous block is in use |
-| [size] block size (30 bits) | Records the size of the payload memory region |
+| block head field                     | Purpose                                                                |
+| ------------------------------------ | ---------------------------------------------------------------------- |
+| [a] allocated (1 bit)                | Indicates whether the current block is in use; here it should be 1     |
+| [p] previous block allocated (1 bit) | Indicates whether the (address-wise adjacent) previous block is in use |
+| [size] block size (30 bits)          | Records the size of the head + payload memory region                   |
 
 #### Free Block: an unused memory block
 
@@ -62,12 +63,13 @@ An allocated block consists of three parts: head, payload, and (possibly) padding.
 
 Requests a block of memory from the pool. When the pool is initialized it contains only one free block, managed directly through the Remainder Block address pointer described below. Apart from the two flag bits in the head, none of its contents are defined.
 
-Note that the remainder block's block head must be guaranteed 4 bytes, while its payload may be 0 bytes. That is, a remainder block always exists
+Note that the remainder block's block head must be guaranteed 4 bytes, while its payload may be as small as 0 bytes. A remainder block must always exist.
 
 ![Remainder Block diagram](img/deepvm-mem-block-structure-remainder-block.svg)
 
 - allocated should be set to 0
 - When the remainder block is the first block, pre-allocated should be set to 1; in all other cases it is set by the immediately preceding block according to the actual situation.
+  - In practice, because of the merging mechanism, the P flag can be treated as always 1.
 
 ### Bins: a mechanism to speed up memory recycling
 
@@ -81,20 +83,20 @@
 
 For fast allocation, fast blocks normally do not take part in free-memory merging. When necessary they can be consolidated and reclaimed through other mechanisms.
 
-Note: since a singly linked list only needs to point in one direction, and all blocks on one chain have the same size, fast blocks contain no predecessor offset and no footer, keeping only the head and the successor offset. Because every block needs a 4-byte head plus a 4-byte successor offset pointing to another block, the smallest block is at least 8 bytes.
+Note: since a singly linked list only needs to point in one direction, and all blocks on one chain have the same size, fast blocks contain no predecessor pointer and no footer, keeping only the head and the successor pointer. Because every block needs a 4-byte head plus a 4-byte successor address pointing to another block, the smallest block is at least 8 bytes. For simplicity, fast bins use pointers (absolute addresses) rather than offsets.
 
 Consider bins of the following sizes[^note1]; the sketch after the table shows how a request is mapped to one of them:
 
-| Fast Bin Size (4 bytes head + n bytes payload + padding) | Comments |
-| --------------------------------------------------------- | -------- |
-| 8 bytes (4 + 4) | Most built-in numeric types (char / unicode char, bool, int32_t, single-precision floating-point numbers) |
-| 16 bytes (4 + 12) | Some wider built-in numeric types (int64_t, double-precision floating-point numbers) |
-| 24 bytes (4 + 20) | Some small composite structures (strings under 20 bytes, function tables with fewer than 5 entries) |
-| 32 bytes (4 + 28) | Some small composite structures |
-| 40 bytes (4 + 36) | Math library: 8-tuples |
-| 48 bytes (4 + 44) | Everything else; mainly to keep the Sorted bins efficient |
-| 56 bytes (4 + 52) | Same as above |
-| 64 bytes (4 + 60) | Same as above, and to satisfy 8-byte alignment |
+| Fast Bin Size (4 bytes head + n bytes payload + padding)   | Comments                                                                                                   |
+| ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
+| 8 bytes (4 + 4)                                             | Most built-in numeric types (char / unicode char, bool, int32_t, single-precision floating-point numbers)   |
+| 16 bytes (4 + 12)                                           | Some wider built-in numeric types (int64_t, double-precision floating-point numbers)                        |
+| 24 bytes (4 + 20)                                           | Some small composite structures (strings under 20 bytes, function tables with fewer than 5 entries)         |
+| 32 bytes (4 + 28)                                           | Some small composite structures                                                                              |
+| 40 bytes (4 + 36)                                           | Math library: 8-tuples                                                                                       |
+| 48 bytes (4 + 44)                                           | Everything else; mainly to keep the Sorted bins efficient                                                    |
+| 56 bytes (4 + 52)                                           | Same as above                                                                                                |
+| 64 bytes (4 + 60)                                           | Same as above, and to satisfy 8-byte alignment                                                               |
 
 [^note1]: Note 1: only these common use cases have come to mind so far
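> Annotation (not part of the patch): once the 4-byte head is added and the total is rounded up to a multiple of 8, mapping a request to one of these bins is a single shift; `deep_malloc_fast_bins` in the patch below computes it the same way. A minimal, self-contained sketch:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: map a requested payload size to a fast-bin index. A 4-byte head
 * is added, the total is rounded up to 8 bytes, and the bins are
 * 8, 16, ..., 64 bytes, so index = total/8 - 1. */
#define ALIGN_MEM_SIZE(size) ((((size) + 0x7) >> 3) << 3)

static uint32_t
fast_bin_index (uint32_t payload_size)
{
  uint32_t aligned = ALIGN_MEM_SIZE (payload_size + 4 /* block head */);
  return (aligned >> 3) - 1; /* 8-byte block -> bin 0, 64-byte -> bin 7 */
}

int main (void)
{
  printf ("%u\n", fast_bin_index (4));  /* 8-byte block  -> 0 */
  printf ("%u\n", fast_bin_index (20)); /* 24-byte block -> 2 */
  printf ("%u\n", fast_bin_index (60)); /* 64-byte block -> 7 */
  return 0;
}
```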
@@ -132,7 +134,7 @@ Sorted Bins allow partial allocation when allocating and merging of neighbors when freeing
 
 ### Global metadata
 
-A total of 88 bytes.
+A total of 96 bytes.
 
 ![Global Metadata diagram](img/deepvm-mem-global-metadata.svg)
 
@@ -142,12 +144,16 @@ Sorted Bins allow partial allocation when allocating and merging of neighbors when freeing
 
 #### Address of the first Sorted Block / head of the skip list
 
-8 bytes; this is also the address of the smallest sorted block.
+8 bytes; this is also the address of the smallest sorted block. By default it points to a 72-byte fake sorted block that serves as the head index.
 
-#### Remainder Block address
+#### Remainder Block start address
 
 8 bytes. The start address of the address-wise last free block, used to support some additional operations.
 
+#### Remainder Block end address
+
+8 bytes. The address of the byte just past the end of the address-wise last free block, used to support some additional operations.
+
 #### Fast Bins Array
 
 $8 \times 8 \textrm{ bytes} = 64 \textrm{ bytes}$. Stores, in order, the address of the first element of each fast bin; an empty bin's head points to itself.
 
@@ -168,16 +174,20 @@ $8 \times 8 \textrm{ bytes} = 64 \textrm{ bytes}$. Stores, in order, the address of the first element of each bin
 
 Assume byte-addressed memory with a little-endian layout
 
 1. Request `n` bytes of contiguous memory from the system with the appropriate function, and save the start address as `addr`
-2. Set the global available memory (the eight bytes at `addr`) to `n - 88`
-3. Set the skip-list head address (the eight bytes at `addr + 8`) to `addr + 8`
-4. Set the Remainder Block address (the eight bytes at `addr + 16`) to `addr + 88`
-5. Set each eight-byte group of the Fast Bin Array (`addr + 24` – `addr + 87`, 64 bytes in total) to its own address
-   - For example, the eight bytes starting at `addr + 24` should be set to `addr + 24`; the last group, starting at `addr + 80`, should be set to `addr + 80`
-6. Initialize the remainder block's head (`addr + 88` – `addr + 91`)
-   1. Let `m` be the size of the remainder block's payload, i.e. `m = n - 92`
-   2. Shift `m` left by 2 bits, then set the two vacated bits, from low to high, to 0 and 1
-   3. Sample code (C/C++): `*((int *)(addr + 88)) = ((n - 92) << 2) & 2`
-7. Initialization is done; the result is shown in the figure
+2. Set the global available memory (the eight bytes at `addr`) to `n - 172`
+3. Set the skip-list head address (the eight bytes at `addr + 8`) to `addr + 96`
+4. Initialize the skip-list head block (`addr + 96` – `addr + 167`)
+   1. Set the head to 0, i.e. the A/P flags are both 0 and block size = 0, the minimum value, so that it can never be taken
+   2. Set all of the offsets to 0
+   3. Set the level of indices to the maximum value, 13
+5. Set the Remainder Block start address (the eight bytes at `addr + 16`) to `addr + 168`
+6. Set the Remainder Block end address (the eight bytes at `addr + 24`) to `addr + n`
+7. Set each eight-byte group of the Fast Bin Array (`addr + 32` – `addr + 95`, 64 bytes in total) to `NULL`
+8. Initialize the remainder block's head (`addr + 168` – `addr + 171`)
+   1. Let `m` be the size of the remainder block's payload, i.e. `m = n - 172`
+   2. Shift `m` left by 2 bits, then set the two vacated bits, from low to high, to 0 and 1
+   3. Sample code (C/C++): `*((int *)(addr + 168)) = ((n - 172) << 2) | 2`
+9. Initialization is done; the result is shown in the figure, and a fuller sketch follows it
 
 ![deep_mem_init result diagram](img/deepvm-mem-init.svg)
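> Annotation (not part of the patch): a minimal sketch of the write sequence above, assuming 64-bit little-endian pointers and the field order of `struct sorted_block` from the header below (head, pred_offset, succ_offset, level_of_indices), which puts the fake head block's level field at offset 96 + 12. The helper name is mine:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: lay out the 96-byte metadata, the 72-byte fake
 * skip-list head block, and the remainder block inside a raw buffer. */
static void
sketch_mem_init (uint8_t *addr, uint64_t n)
{
  memset (addr, 0, 168); /* zero metadata + fake skip-list head block */

  *(uint64_t *)(addr + 0) = n - 172;       /* step 2: free memory        */
  *(uint8_t **)(addr + 8) = addr + 96;     /* step 3: skip-list head     */
  /* step 4: fake head block -- head = 0 and offsets = 0 via the memset */
  *(uint32_t *)(addr + 96 + 12) = 13;      /* step 4.3: level of indices */
  *(uint8_t **)(addr + 16) = addr + 168;   /* step 5: remainder start    */
  *(uint8_t **)(addr + 24) = addr + n;     /* step 6: remainder end      */
  /* step 7: fast bin array (addr+32 .. addr+95) is NULL via the memset */
  *(uint32_t *)(addr + 168)                /* step 8: size<<2, A=0, P=1  */
      = (uint32_t)((n - 172) << 2) | 2;
}
```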
@@ -193,8 +203,8 @@
 
 1. Compute the offset, and check directly whether the corresponding fast bin has a usable fast block.
 2. If there is one, remove it from the head of the list and jump to step 4
-   - Update the successor offset held by that fast block's predecessor so that it points to the block's own successor; if the block is already the last one, set it to 0
-3. If there is none, compute the remainder block's end address from its address and its payload size, carve a block of the requested size off at that end address, and return it. Update the size value in the remainder block's head.
+   - Update the fast bin's list head so that it points to the removed block's successor; if the block was already the last one, that successor is `NULL`
+3. If there is none, carve a block of the requested size off at the remainder block's end address and return it. Update the remainder block's end address.
 4. Update the global available memory size.
 5. Initialize the fast block to be returned; when done, return the address of its block payload (see the sketch after this list).
    - In the block head, set the allocated flag to 1 and the pre-allocated flag to 1, and set block size to the size of the payload (in **bytes**)
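> Annotation (not part of the patch): the fast path of steps 1–5 condensed into a sketch. The pool fields here (`bins`, `remainder_end`, `free_memory`) are simplified stand-ins for `struct mem_pool` in the patch, error handling is omitted, and the head encoding follows the header's convention (aligned total size with the A/P flags in the low bits):

```c
#include <stddef.h>
#include <stdint.h>

struct fake_pool
{
  void *bins[8];          /* fast-bin list heads          */
  uint8_t *remainder_end; /* one past the remainder block */
  uint64_t free_memory;
};

/* aligned_size = head + payload, a multiple of 8 in [8, 64] */
static void *
sketch_fast_malloc (struct fake_pool *p, uint32_t aligned_size)
{
  uint32_t idx = (aligned_size >> 3) - 1;
  uint8_t *block;

  if (p->bins[idx] != NULL) /* steps 1-2: pop the bin's head */
    {
      block = p->bins[idx];
      p->bins[idx] = *(void **)(block + 4); /* successor, NULL if last */
    }
  else /* step 3: carve off at the remainder block's end */
    {
      p->remainder_end -= aligned_size;
      block = p->remainder_end;
    }

  p->free_memory -= aligned_size;                /* step 4            */
  *(uint32_t *)block = aligned_size | 0x1 | 0x2; /* step 5: A=1, P=1  */
  return block + 4;                              /* payload after head */
}
```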
diff --git a/src/vm/Makefile b/src/vm/Makefile
index 8113ace..cfb00d6 100644
--- a/src/vm/Makefile
+++ b/src/vm/Makefile
@@ -1,7 +1,7 @@
 # Compiler and options
 CC=gcc
 CFLAGS=-Wall -std=gnu99
-
+
 # Target file
 TARGET=wasmvm
 SRCS = wasm_c_api.c\
@@ -28,15 +28,17 @@ SRCS = wasm_c_api.c\
 	bh_list.c\
 	bh_log.c\
 	bh_common.c\
-	bh_assert.c
+	bh_assert.c\
+	deep_mem_alloc.c\
+	./include/random/xoroshiro128plus.c
 
 INC = -I./include
 
 OBJS = $(SRCS:.c=.o)
 
 $(TARGET):$(OBJS)
 	$(CC) -o $@ $^ -lpthread -lm
-
+
 clean:
 	rm -rf $(TARGET) $(OBJS)
-
+
 %.o:%.c
 	$(CC) $(CFLAGS) $(INC) -o $@ -c -g $<
diff --git a/src/vm/deep_mem_alloc.c b/src/vm/deep_mem_alloc.c
new file mode 100644
index 0000000..6d7f0fc
--- /dev/null
+++ b/src/vm/deep_mem_alloc.c
@@ -0,0 +1,544 @@
+#include "deep_mem_alloc.h"
+#include "random/random.h"
+#include <string.h>
+
+struct mem_pool *pool;
+
+static void *deep_malloc_fast_bins (uint32_t size);
+static void *deep_malloc_sorted_bins (uint32_t size);
+static void deep_free_fast_bins (void *ptr);
+static void deep_free_sorted_bins (void *ptr);
+
+/* helper functions for maintaining the sorted_block skiplist */
+static void _split_into_two_sorted_blocks (struct sorted_block *block,
+                                           uint32_t aligned_size);
+static void _merge_into_single_block (struct sorted_block *curr,
+                                      struct sorted_block *next);
+
+static struct sorted_block *
+_allocate_block_from_skiplist (uint32_t aligned_size);
+
+static inline bool _sorted_block_is_in_skiplist (struct sorted_block *block);
+static struct sorted_block *
+_find_sorted_block_by_size_on_index (struct sorted_block *node, uint32_t size,
+                                     uint32_t index_level);
+static struct sorted_block *
+_find_sorted_block_by_size (struct sorted_block *node, uint32_t size);
+static void _insert_sorted_block_to_skiplist (struct sorted_block *block);
+static void _remove_sorted_block_from_skiplist (struct sorted_block *block);
+
+bool
+deep_mem_init (void *mem, uint32_t size)
+{
+  assert (mem != NULL);
+
+  if (size < sizeof (struct mem_pool))
+    {
+      return false; /* given buffer is too small */
+    }
+
+  memset (mem, 0, size);
+  mem_size_t aligned_size = ALIGN_MEM_SIZE_TRUNC (size);
+
+  pool = (struct mem_pool *)mem;
+  pool->free_memory = aligned_size - sizeof (struct mem_pool)
+                      - sizeof (struct sorted_block) - sizeof (block_head_t);
+  /* the first node in the list, to simplify implementation */
+  pool->sorted_block.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, sizeof (struct mem_pool)));
+  /* all other fields are set as 0 */
+  pool->sorted_block.addr->level_of_indices = SORTED_BLOCK_INDICES_LEVEL;
+
+  pool->remainder_block.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, sizeof (struct mem_pool) + sizeof (struct sorted_block)));
+  pool->remainder_block_end.addr
+      = (struct sorted_block *)(get_pointer_by_offset_in_bytes (
+          mem, aligned_size - 8)); // -8 for safety
+  for (int i = 0; i < FAST_BIN_LENGTH; ++i)
+    {
+      pool->fast_bins[i].addr = NULL;
+    }
+  // initialise remainder block's head
+  block_set_A_flag (&pool->remainder_block.addr->head, false);
+  block_set_P_flag (&pool->remainder_block.addr->head, true);
+
+  return true;
+}
+
+void
+deep_mem_destroy (void)
+{
+  pool = NULL;
+}
+
+void *
+deep_malloc (uint32_t size)
+{
+  assert (pool != NULL);
+
+  if (pool->free_memory < size)
+    {
+      return NULL;
+    }
+  /* route by the aligned total size so the fast-bin index stays in range */
+  if (ALIGN_MEM_SIZE (size + sizeof (block_head_t)) <= FAST_BIN_MAX_SIZE)
+    {
+      return deep_malloc_fast_bins (size);
+    }
+  return deep_malloc_sorted_bins (size);
+}
+
+static void *
+deep_malloc_fast_bins (uint32_t size)
+{
+  block_size_t aligned_size = ALIGN_MEM_SIZE (size + sizeof (block_head_t));
+  uint32_t offset = (aligned_size >> 3) - 1;
+  bool P_flag = false;
+  struct fast_block *ret = NULL;
+  block_size_t payload_size;
+
+  if (pool->fast_bins[offset].addr != NULL)
+    {
+      ret = pool->fast_bins[offset].addr;
+      pool->fast_bins[offset].addr = ret->next;
+      P_flag = prev_block_is_allocated (&ret->head);
+      payload_size = block_get_size (&ret->head) - sizeof (block_head_t);
+    }
+  else if (aligned_size <= get_remainder_size (pool))
+    {
+      ret = (struct fast_block *)(get_pointer_by_offset_in_bytes (
+          pool->remainder_block_end.addr,
+          -(int64_t)(aligned_size + sizeof (block_head_t))));
+      pool->remainder_block_end.addr = (struct sorted_block *)ret;
+
+      payload_size = aligned_size - sizeof (block_head_t);
+      block_set_size (&ret->head, aligned_size);
+      pool->free_memory -= sizeof (block_head_t);
+    }
+  else
+    {
+      return NULL;
+    }
+
+  memset (&ret->payload, 0, payload_size);
+  block_set_A_flag (&ret->head, true);
+  block_set_P_flag (&ret->head, P_flag);
+  pool->free_memory -= payload_size;
+
+  return &ret->payload;
+}
+
+static void *
+deep_malloc_sorted_bins (uint32_t size)
+{
+  block_size_t aligned_size = ALIGN_MEM_SIZE (size + sizeof (block_head_t));
+  struct sorted_block *ret = NULL;
+  block_size_t payload_size;
+
+  if ((pool->sorted_block.addr != NULL)
+      && ((ret = _allocate_block_from_skiplist (aligned_size)) != NULL))
+    {
+      /* pass */
+    }
+  else if (aligned_size <= get_remainder_size (pool))
+    {
+      /* no suitable sorted_block: carve from the remainder block */
+      ret = (struct sorted_block *)pool->remainder_block.addr;
+      block_set_size (&ret->head, get_remainder_size (pool));
+      _split_into_two_sorted_blocks (ret, aligned_size);
+      /* the tail created by the split becomes the new remainder block */
+      pool->remainder_block.addr = get_block_by_offset (ret, aligned_size);
+    }
+  else
+    {
+      return NULL;
+    }
+
+  payload_size = aligned_size - sizeof (block_head_t);
+  memset (&ret->payload, 0, payload_size);
+  block_set_A_flag (&ret->head, true);
+  block_set_P_flag (&get_block_by_offset (ret, aligned_size)->head, true);
+  pool->free_memory -= payload_size;
+
+  return &ret->payload;
+}
+
+void *
+deep_realloc (void *ptr, uint32_t size)
+{
+  return NULL;
+}
+
+void
+deep_free (void *ptr)
+{
+  assert (pool != NULL);
+
+  if (ptr == NULL)
+    {
+      return;
+    }
+
+  void *head
+      = get_pointer_by_offset_in_bytes (ptr, -(int64_t)sizeof (block_head_t));
+
+  if (!block_is_allocated ((block_head_t *)head))
+    {
+      return;
+    }
+  block_size_t size = block_get_size ((block_head_t *)head);
+
+  if (size <= FAST_BIN_MAX_SIZE)
+    {
+      deep_free_fast_bins (head);
+    }
+  else
+    {
+      deep_free_sorted_bins (head);
+    }
+}
+
+static void
+deep_free_fast_bins (void *ptr)
+{
+  struct fast_block *block = ptr;
+  block_size_t payload_size
+      = block_get_size (&block->head) - sizeof (block_head_t);
+  uint32_t offset = ((payload_size + sizeof (block_head_t)) >> 3) - 1;
+
+  memset (&block->payload, 0, payload_size);
+
+  block_set_A_flag (&block->head, false);
+  block->next = pool->fast_bins[offset].addr;
+  pool->fast_bins[offset].addr = block;
+
+  pool->free_memory += payload_size;
+}
+static void
+deep_free_sorted_bins (void *ptr)
+{
+  struct sorted_block *block = ptr;
+  struct sorted_block *the_other = NULL;
+  block_size_t payload_size
+      = block_get_size (&block->head) - sizeof (block_head_t);
+
+  memset (&block->payload, 0, payload_size);
+
+  block_set_A_flag (&block->head, false);
+
+  /* try to merge with the previous block when it is free; its size is
+     read from the footer kept in the 4 bytes just below this head */
+  if (!prev_block_is_allocated (&block->head))
+    {
+      block_size_t prev_size
+          = block_get_size ((block_head_t *)get_pointer_by_offset_in_bytes (
+              block, -(int64_t)sizeof (block_head_t)));
+      the_other = get_block_by_offset (block, -((int32_t)prev_size));
+      _merge_into_single_block (the_other, block);
+
+      block = the_other;
+    }
+
+  the_other = get_block_by_offset (block, block_get_size (&block->head));
+  if (!block_is_allocated (&the_other->head))
+    {
+      _merge_into_single_block (block, the_other);
+    }
+
+  if (!_sorted_block_is_in_skiplist (block))
+    {
+      _insert_sorted_block_to_skiplist (block);
+    }
+
+  pool->free_memory += payload_size;
+}
+
+bool
+deep_mem_migrate (void *new_mem, uint32_t size)
+{
+  return false;
+}
+
+/* helper functions for maintaining the sorted_block skiplist */
+static void
+_split_into_two_sorted_blocks (struct sorted_block *block,
+                               uint32_t aligned_size)
+{
+  struct sorted_block *new_block = get_block_by_offset (block, aligned_size);
+  block_size_t new_block_size = block_get_size (&block->head) - aligned_size;
+
+  memset (new_block, 0, new_block_size);
+  block_set_size (&new_block->head, new_block_size);
+  block_set_A_flag (&new_block->head, false);
+  block_set_P_flag (&new_block->head, false); /* by default */
+  _insert_sorted_block_to_skiplist (new_block);
+
+  block_set_size (&block->head, aligned_size);
+  pool->free_memory -= sizeof (block_head_t);
+}
+
+/**
+ * Assuming `curr` and `next` are contiguous in memory address,
+ * where curr < next.
+ * Attempt to remove both nodes from the list, merge them, and insert the
+ * merged block.
+ * NOTE:
+ * - will give `sizeof (block_head_t)` back to the pool's `free_memory`
+ **/
+static void
+_merge_into_single_block (struct sorted_block *curr, struct sorted_block *next)
+{
+  _remove_sorted_block_from_skiplist (curr);
+  _remove_sorted_block_from_skiplist (next);
+
+  block_size_t new_size
+      = block_get_size (&curr->head) + block_get_size (&next->head);
+
+  block_set_size (&curr->head, new_size);
+  memset (&curr->payload, 0, new_size - sizeof (curr->head));
+
+  _insert_sorted_block_to_skiplist (curr);
+
+  pool->free_memory += sizeof (block_head_t);
+}
+
+/** Obtain the most appropriate block from the sorted list if possible.
+ *
+ * - Obtain one with exactly the same size.
+ * - Obtain one with a bigger size, but split it into two sorted blocks
+ *   - return the part with exactly the requested size
+ *   - insert the rest into the sorted_block skiplist
+ *   - NOTE: this requires the block found to be at least
+ *     (`aligned_size + SORTED_BIN_MIN_SIZE`) big
+ * - NULL otherwise
+ *
+ * NOTE: The obtained block will be **removed** from the skiplist.
+ **/
+static struct sorted_block *
+_allocate_block_from_skiplist (uint32_t aligned_size)
+{
+  struct sorted_block *ret = NULL;
+
+  if ((pool->sorted_block.addr == NULL)
+      || ((ret = _find_sorted_block_by_size (pool->sorted_block.addr,
+                                             aligned_size))
+          == NULL))
+    {
+      return NULL;
+    }
+  if (block_get_size (&ret->head) != aligned_size)
+    {
+      if ((block_get_size (&ret->head) < aligned_size + SORTED_BIN_MIN_SIZE)
+          && (ret = _find_sorted_block_by_size (
+                  ret, aligned_size + SORTED_BIN_MIN_SIZE))
+                 == NULL)
+        {
+          return NULL;
+        }
+      _split_into_two_sorted_blocks (ret, aligned_size);
+    }
+  _remove_sorted_block_from_skiplist (ret);
+
+  return ret;
+}
+
+static inline bool
+_sorted_block_is_in_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  return (block->pred_offset != 0 || block->level_of_indices != 0);
+}
+/**
+ * returns a block with the desired size on the list at the given index
+ * level; if not possible, the greatest one that is smaller than desired.
+ *
+ * NOTE:
+ * - there will always be an infimum due to the existence of head
+ * - this function will not check nodes on other index levels
+ * - this function will not check whether there are predecessors in the
+ *   chain with the same key. It assumes the `node` given has embedded all
+ *   indices.
+ **/
+static struct sorted_block *
+_find_sorted_block_by_size_on_index (struct sorted_block *node, uint32_t size,
+                                     uint32_t index_level)
+{
+  assert (node != NULL);
+
+  struct sorted_block *curr = node;
+  struct sorted_block *prev = curr;
+
+  while (block_get_size (&curr->head) < size)
+    {
+      prev = curr; /* curr is the candidate of infimum. */
+      /* reached the end of the skiplist or the biggest smaller sorted block */
+      if (index_level >= SORTED_BLOCK_INDICES_LEVEL
+          || curr->offsets[index_level] == 0)
+        {
+          break;
+        }
+      curr = get_block_by_offset (curr, curr->offsets[index_level]);
+    }
+
+  /* prefer a chained node with no indices, to avoid copying indices on
+     removal; fall back to the matching index node itself. */
+  if (block_get_size (&curr->head) == size)
+    {
+      return (curr->succ_offset != 0)
+                 ? get_block_by_offset (curr, curr->succ_offset)
+                 : curr;
+    }
+
+  return prev;
+}
+
+/**
+ * returns a block with the desired size; if not possible, the least
+ * greater one
+ *
+ * NOTE:
+ * - returns NULL when the supremum is not in the list
+ **/
+static struct sorted_block *
+_find_sorted_block_by_size (struct sorted_block *node, uint32_t size)
+{
+  assert (node != NULL);
+
+  struct sorted_block *curr = node;
+
+  /* indices should only exist on the first node in each sub-list. */
+  while (curr->pred_offset != 0)
+    {
+      curr = get_block_by_offset (curr, curr->pred_offset);
+    }
+
+  while (block_get_size (&curr->head) < size)
+    {
+      uint32_t index_level
+          = SORTED_BLOCK_INDICES_LEVEL - curr->level_of_indices;
+
+      /* skip absent indices and hops that would overshoot the size */
+      while (index_level < SORTED_BLOCK_INDICES_LEVEL
+             && (curr->offsets[index_level] == 0
+                 || (block_get_size (
+                         &get_block_by_offset (curr,
+                                               curr->offsets[index_level])
+                              ->head)
+                     > size)))
+        {
+          index_level++;
+        }
+
+      /* reached the end of the skiplist or the biggest smaller sorted block */
+      if (index_level >= SORTED_BLOCK_INDICES_LEVEL
+          || curr->offsets[index_level] == 0)
+        {
+          break;
+        }
+
+      /* will not be NULL as curr's size is smaller than size */
+      curr = _find_sorted_block_by_size_on_index (curr, size, index_level);
+    }
+
+  if (block_get_size (&curr->head) < size)
+    {
+      /* every node reachable without overshooting is smaller than required:
+         step once along the bottom level, which links all index nodes,
+         to reach the least greater one, if any. */
+      if (curr->offsets[SORTED_BLOCK_INDICES_LEVEL - 1] == 0)
+        {
+          return NULL; /* all nodes are smaller than required. */
+        }
+      curr = get_block_by_offset (
+          curr, curr->offsets[SORTED_BLOCK_INDICES_LEVEL - 1]);
+    }
+
+  /* return a node with no indices to avoid copying indices. */
+  if (curr->succ_offset != 0)
+    {
+      curr = get_block_by_offset (curr, curr->succ_offset);
+    }
+
+  return curr;
+}
+static void
+_insert_sorted_block_to_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  assert (pool->sorted_block.addr != NULL);
+
+  block_size_t size = block_get_size (&block->head);
+  struct sorted_block *pos
+      = _find_sorted_block_by_size (pool->sorted_block.addr, size);
+
+  /* insert into the chain with the same size. */
+  if (pos != NULL && block_get_size (&pos->head) == size)
+    {
+      block->pred_offset = get_offset_between_blocks (block, pos);
+      if (pos->succ_offset != 0)
+        {
+          block->succ_offset
+              = pos->succ_offset - get_offset_between_blocks (pos, block);
+          /* keep the old successor's back pointer consistent */
+          get_block_by_offset (block, block->succ_offset)->pred_offset
+              = -block->succ_offset;
+        }
+      else
+        {
+          block->succ_offset = 0; /* end of chain */
+        }
+      pos->succ_offset = get_offset_between_blocks (pos, block);
+
+      return;
+    }
+
+  block->level_of_indices
+      = ((uint32_t)(next () >> 32)) % SORTED_BLOCK_INDICES_LEVEL + 1;
+
+  for (uint32_t index_level
+       = SORTED_BLOCK_INDICES_LEVEL - block->level_of_indices;
+       index_level < SORTED_BLOCK_INDICES_LEVEL; ++index_level)
+    {
+      pos = _find_sorted_block_by_size_on_index (pool->sorted_block.addr,
+                                                 size, index_level);
+      if (pos->offsets[index_level] != 0)
+        {
+          block->offsets[index_level]
+              = pos->offsets[index_level]
+                - get_offset_between_blocks (pos, block);
+        }
+      else
+        {
+          block->offsets[index_level] = 0;
+        }
+      pos->offsets[index_level] = get_offset_between_blocks (pos, block);
+    }
+}
+
+/**
+ * Remove the node and update all indices / offsets.
+ * May traverse the list multiple times
+ *
+ * NOTE: never removes a node with offsets when it has children in the chain
+ **/
+static void
+_remove_sorted_block_from_skiplist (struct sorted_block *block)
+{
+  assert (block != NULL);
+  /* considering a node which is not in the skiplist */
+  /* assert (block->level_of_indices != 0 || block->pred_offset != 0); */
+
+  struct sorted_block *prev = NULL;
+  block_size_t size = block_get_size (&block->head);
+
+  for (uint32_t index_level
+       = SORTED_BLOCK_INDICES_LEVEL - block->level_of_indices;
+       index_level < SORTED_BLOCK_INDICES_LEVEL; ++index_level)
+    {
+      /* -1 to find the strictly smaller node. */
+      prev = _find_sorted_block_by_size_on_index (pool->sorted_block.addr,
+                                                  size - 1, index_level);
+      if (block->offsets[index_level] != 0)
+        {
+          prev->offsets[index_level] += block->offsets[index_level];
+        }
+      else
+        {
+          prev->offsets[index_level] = 0;
+        }
+    }
+
+  if (block->pred_offset != 0)
+    {
+      if (block->succ_offset != 0)
+        {
+          get_block_by_offset (block, block->pred_offset)->succ_offset
+              += block->succ_offset;
+          get_block_by_offset (block, block->succ_offset)->pred_offset
+              += block->pred_offset;
+        }
+      else
+        {
+          get_block_by_offset (block, block->pred_offset)->succ_offset = 0;
+        }
+    }
+  /* no other cases: if it is the first node, it should be the only node. */
+}
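> Annotation (not part of the patch): the skiplist above stores every link as an `int32_t` byte distance rather than an 8-byte pointer, which halves the per-link cost and keeps nodes valid if the whole pool is moved as one chunk; 0 doubles as "no link", since a real link can never point at its own node. A small demonstration of the arithmetic that `get_block_by_offset` / `get_offset_between_blocks` perform, with hypothetical helper names:

```c
#include <stdint.h>
#include <assert.h>

/* follow() / link_offset() mirror get_block_by_offset() and
 * get_offset_between_blocks() from the patch, on raw bytes. */
static void *
follow (void *node, int32_t offset)
{
  return (uint8_t *)node + offset;
}

static int32_t
link_offset (void *from, void *to)
{
  return (int32_t)((uint8_t *)to - (uint8_t *)from);
}

int main (void)
{
  uint8_t pool_[64];
  void *a = pool_ + 8, *b = pool_ + 40;

  int32_t off = link_offset (a, b); /* forward link: +32 bytes   */
  assert (follow (a, off) == b);
  assert (follow (b, -off) == a);   /* back link is the negation */
  return 0;
}
```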
diff --git a/src/vm/include/deep_mem_alloc.h b/src/vm/include/deep_mem_alloc.h
new file mode 100644
index 0000000..0919347
--- /dev/null
+++ b/src/vm/include/deep_mem_alloc.h
@@ -0,0 +1,185 @@
+#ifndef _DEEP_MEM_ALLOC_H
+#define _DEEP_MEM_ALLOC_H
+
+#include "bh_platform.h"
+
+#define FAST_BIN_LENGTH (8)
+#define FAST_BIN_MAX_SIZE (64)   /* 8 * 8 bytes */
+#define SORTED_BIN_MIN_SIZE (72) /* 64 + 8 bytes */
+
+#define A_FLAG_OFFSET (0) /* allocated */
+#define A_FLAG_MASK (0x00000001)
+#define P_FLAG_OFFSET (1) /* previous block is allocated */
+#define P_FLAG_MASK (0x00000002)
+#define BLOCK_SIZE_MULTIPLIER (3)
+#define BLOCK_SIZE_MASK (0xfffffff8) /* low 3 bits hold the flags */
+#define REMAINDER_SIZE_MULTIPLIER BLOCK_SIZE_MULTIPLIER
+#define REMAINDER_SIZE_MASK (~(uint64_t)0x7) /* 64-bit BLOCK_SIZE_MASK */
+
+#define SORTED_BLOCK_INDICES_LEVEL (13)
+
+/* need to update when *_SIZE_MULTIPLIER changes. */
+#define ALIGN_MEM_SIZE(size) ((((size) + 0x7) >> 3) << 3)
+#define ALIGN_MEM_SIZE_TRUNC(size) (((size) >> 3) << 3)
+
+typedef void *mem_t;
+typedef uint64_t mem_size_t;
+typedef uint32_t block_head_t;
+typedef uint32_t block_size_t;
+
+struct mem_pool
+{
+  uint64_t free_memory;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } sorted_block;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } remainder_block;
+  union
+  {
+    uint64_t _padding;
+    struct sorted_block *addr;
+  } remainder_block_end; /* should not be dereferenced */
+  union
+  {
+    uint64_t _padding;
+    struct fast_block *addr;
+  } fast_bins[FAST_BIN_LENGTH];
+};
+
+struct fast_block
+{
+  block_head_t head;
+  union
+  {
+    uint32_t _padding; /* TODO! update for 64 bits system */
+    struct fast_block *next;
+    void *payload;
+  };
+};
+
+struct sorted_block
+{
+  block_head_t head;
+  union
+  {
+    struct
+    {
+      int32_t pred_offset;
+      int32_t succ_offset;
+      uint32_t level_of_indices;
+      // bigger array index corresponds to lower index in skip list,
+      // i.e., skipping less nodes in the skip list
+      // 0 means this node is the last one in this level of index.
+      // offsets[SORTED_BLOCK_INDICES_LEVEL - 1] is the level where each node
+      // is connected one by one consecutively.
+      // The skip list is in ascending order by block's size.
+      int32_t offsets[SORTED_BLOCK_INDICES_LEVEL];
+      // padding
+      // uint32_t footer;
+    };
+    void *payload;
+  };
+};
+
+bool deep_mem_init (void *mem, uint32_t size);
+
+void deep_mem_destroy (void);
+
+void *deep_malloc (uint32_t size);
+
+void *deep_realloc (void *ptr, uint32_t size);
+
+void deep_free (void *ptr);
+
+bool deep_mem_migrate (void *new_mem, uint32_t size);
+
+static inline bool
+block_is_allocated (block_head_t const *head)
+{
+  return (*head) & A_FLAG_MASK;
+}
+
+static inline void
+block_set_A_flag (block_head_t *head, bool allocated)
+{
+  *head = allocated ? (*head | A_FLAG_MASK) : (*head & (~A_FLAG_MASK));
+}
+
+static inline bool
+prev_block_is_allocated (block_head_t const *head)
+{
+  return (*head) & P_FLAG_MASK;
+}
+
+static inline void
+block_set_P_flag (block_head_t *head, bool allocated)
+{
+  *head = allocated ? (*head | P_FLAG_MASK) : (*head & (~P_FLAG_MASK));
+}
+
+/**
+ * since the block is 8-bytes aligned, the smallest multiple of 8 that is
+ * not below the requested size is used instead.
+ *
+ * Aligned size, including the block head
+ **/
+static inline block_size_t
+block_get_size (block_head_t const *head)
+{
+  return (*head) & BLOCK_SIZE_MASK;
+}
+/**
+ * since the block is 8-bytes aligned, the smallest multiple of 8 that is
+ * not below `size` is stored instead.
+ *
+ * Aligned size, including the block head
+ **/
+static inline void
+block_set_size (block_head_t *head, block_size_t size)
+{
+  *head = (*head & (~BLOCK_SIZE_MASK)) /* preserve flags */
+          | ALIGN_MEM_SIZE (size);     /* ensure rounds up */
+}
+
+static inline void *
+get_pointer_by_offset_in_bytes (void *p, int64_t offset)
+{
+  return (uint8_t *)p + offset;
+}
+
+static inline int64_t
+get_offset_between_pointers_in_bytes (void *p, void *q)
+{
+  return (uint8_t *)p - (uint8_t *)q;
+}
+
+static inline struct sorted_block *
+get_block_by_offset (struct sorted_block *node, int32_t offset)
+{
+  return (struct sorted_block *)(get_pointer_by_offset_in_bytes ((mem_t *)node,
+                                                                 offset));
+}
+
+static inline int32_t
+get_offset_between_blocks (struct sorted_block *origin,
+                           struct sorted_block *target)
+{
+  return get_offset_between_pointers_in_bytes ((mem_t *)target,
+                                               (mem_t *)origin);
+}
+
+static inline mem_size_t
+get_remainder_size (struct mem_pool const *pool)
+{
+  return get_offset_between_pointers_in_bytes (pool->remainder_block_end.addr,
+                                               pool->remainder_block.addr);
+}
+
+#endif /* _DEEP_MEM_ALLOC_H */
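> Annotation (not part of the patch): a hypothetical end-to-end use of the API this header declares, assuming the header's own dependencies (e.g. `bh_platform.h`) are available in the include path; the buffer size and variable names are illustrative:

```c
#include "deep_mem_alloc.h"
#include <stdio.h>
#include <string.h>

int main (void)
{
  static uint8_t buffer[64 * 1024]; /* backing memory for the pool */

  if (!deep_mem_init (buffer, sizeof (buffer)))
    return 1;

  int32_t *counters = deep_malloc (8 * sizeof (int32_t)); /* fast-bin sized */
  char *name = deep_malloc (200);                         /* sorted-bin sized */

  if (counters != NULL && name != NULL)
    {
      memset (counters, 0, 8 * sizeof (int32_t));
      strcpy (name, "deepvm");
      printf ("%s\n", name);
    }

  deep_free (name);
  deep_free (counters);
  deep_mem_destroy ();
  return 0;
}
```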
diff --git a/src/vm/include/random/random.h b/src/vm/include/random/random.h
new file mode 100644
index 0000000..e8a2284
--- /dev/null
+++ b/src/vm/include/random/random.h
@@ -0,0 +1,19 @@
+#ifndef _VM_INCLUDE_RANDOM_H
+#define _VM_INCLUDE_RANDOM_H
+
+#include <stdint.h>
+
+uint64_t next (void);
+
+/* This is the jump function for the generator. It is equivalent
+   to 2^64 calls to next(); it can be used to generate 2^64
+   non-overlapping subsequences for parallel computations. */
+void jump (void);
+
+/* This is the long-jump function for the generator. It is equivalent to
+   2^96 calls to next(); it can be used to generate 2^32 starting points,
+   from each of which jump() will generate 2^32 non-overlapping
+   subsequences for parallel distributed computations. */
+void long_jump (void);
+
+#endif /* _VM_INCLUDE_RANDOM_H */
diff --git a/src/vm/include/random/xoroshiro128plus.c b/src/vm/include/random/xoroshiro128plus.c
new file mode 100644
index 0000000..0c70bf6
--- /dev/null
+++ b/src/vm/include/random/xoroshiro128plus.c
@@ -0,0 +1,111 @@
+/* Written in 2016-2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)
+
+To the extent possible under law, the author has dedicated all copyright
+and related and neighboring rights to this software to the public domain
+worldwide. This software is distributed without any warranty.
+
+See <http://creativecommons.org/publicdomain/zero/1.0/>. */
+
+#include "random.h"
+
+#include <stdint.h>
+
+/* This is xoroshiro128+ 1.0, our best and fastest small-state generator
+   for floating-point numbers. We suggest to use its upper bits for
+   floating-point generation, as it is slightly faster than
+   xoroshiro128++/xoroshiro128**. It passes all tests we are aware of
+   except for the four lower bits, which might fail linearity tests (and
+   just those), so if low linear complexity is not considered an issue (as
+   it is usually the case) it can be used to generate 64-bit outputs, too;
+   moreover, this generator has a very mild Hamming-weight dependency
+   making our test (http://prng.di.unimi.it/hwd.php) fail after 5 TB of
+   output; we believe this slight bias cannot affect any application. If
+   you are concerned, use xoroshiro128++, xoroshiro128** or xoshiro256+.
+
+   We suggest to use a sign test to extract a random Boolean value, and
+   right shifts to extract subsets of bits.
+
+   The state must be seeded so that it is not everywhere zero. If you have
+   a 64-bit seed, we suggest to seed a splitmix64 generator and use its
+   output to fill s.
+
+   NOTE: the parameters (a=24, b=16, c=37) of this version give slightly
+   better results in our test than the 2016 version (a=55, b=14, c=36).
+*/
+
+static inline uint64_t
+rotl (const uint64_t x, int k)
+{
+  return (x << k) | (x >> (64 - k));
+}
+
+// two random numbers obtained from www.random.org
+static uint64_t s[2] = { 0x562217302acf9a69, 0x2916753e667e5094 };
+
+uint64_t
+next (void)
+{
+  const uint64_t s0 = s[0];
+  uint64_t s1 = s[1];
+  const uint64_t result = s0 + s1;
+
+  s1 ^= s0;
+  s[0] = rotl (s0, 24) ^ s1 ^ (s1 << 16); // a, b
+  s[1] = rotl (s1, 37);                   // c
+
+  return result;
+}
+
+/* This is the jump function for the generator. It is equivalent
+   to 2^64 calls to next(); it can be used to generate 2^64
+   non-overlapping subsequences for parallel computations. */
+
+void
+jump (void)
+{
+  static const uint64_t JUMP[] = { 0xdf900294d8f554a5, 0x170865df4b3201fc };
+
+  uint64_t s0 = 0;
+  uint64_t s1 = 0;
+  for (int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
+    for (int b = 0; b < 64; b++)
+      {
+        if (JUMP[i] & UINT64_C (1) << b)
+          {
+            s0 ^= s[0];
+            s1 ^= s[1];
+          }
+        next ();
+      }
+
+  s[0] = s0;
+  s[1] = s1;
+}
+
+/* This is the long-jump function for the generator. It is equivalent to
+   2^96 calls to next(); it can be used to generate 2^32 starting points,
+   from each of which jump() will generate 2^32 non-overlapping
+   subsequences for parallel distributed computations. */
+
+void
+long_jump (void)
+{
+  static const uint64_t LONG_JUMP[]
+      = { 0xd2a98b26625eee7b, 0xdddf9b1090aa7ac1 };
+
+  uint64_t s0 = 0;
+  uint64_t s1 = 0;
+  for (int i = 0; i < sizeof LONG_JUMP / sizeof *LONG_JUMP; i++)
+    for (int b = 0; b < 64; b++)
+      {
+        if (LONG_JUMP[i] & UINT64_C (1) << b)
+          {
+            s0 ^= s[0];
+            s1 ^= s[1];
+          }
+        next ();
+      }
+
+  s[0] = s0;
+  s[1] = s1;
+}