add QRL example #7

Merged
merged 1 commit on Mar 16, 2023
4 changes: 4 additions & 0 deletions README.md
@@ -20,5 +20,9 @@ Please see the website [docs](https://scq-cloud.github.io/).

## Examples

The example shows how quantum reinforcement learning interacts with Quafu to solve the CartPole environment.

Refer to https://github.com/enchanted123/quantum-RL-with-quafu for more details.

## Authors
This project is developed by the quantum cloud computing team at the Beijing Academy of Quantum Information Sciences.
333 changes: 333 additions & 0 deletions examples/quantum_rl/Quantum RL with Quafu.ipynb
@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5bbddd29",
"metadata": {},
"source": [
"# Reinforcement learining with quantum computing cloud Quafu"
]
},
{
"cell_type": "markdown",
"id": "d891b68d",
"metadata": {},
"source": [
"Before starting the journey of executing reinforcement learining(RL) task on real quantum devices with Quafu, you have to make sure the environment is consistent. The following code is based on python 3.8 to meet the need of the specific version of tensorflow. Then, you can install the follwing packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fc952f8",
"metadata": {},
"outputs": [],
"source": [
"!pip install pyquafu \n",
"!pip install tensorflow==2.7.0\n",
"!pip install tensorflow-quantum==0.7.2\n",
"!pip install gym"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "40def399",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-03-15 14:42:36.950239: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"2023-03-15 14:42:38.671011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20593 MB memory: -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:5b:00.0, compute capability: 8.6\n",
"2023-03-15 14:42:38.672331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22306 MB memory: -> device: 1, name: GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6\n",
"2023-03-15 14:42:38.673411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22143 MB memory: -> device: 2, name: Tesla P40, pci bus id: 0000:25:00.0, compute capability: 6.1\n"
]
}
],
"source": [
"# model imports\n",
"import tensorflow as tf\n",
"import argparse\n",
"import re\n",
"import cirq\n",
"import gym\n",
"import numpy as np\n",
"from functools import reduce\n",
"from PIL import Image\n",
"from quafu import QuantumCircuit as quafuQC\n",
"from quafu import Task, User\n",
"\n",
"import models.quantum_genotypes as genotypes\n",
"from models.quantum_models import generate_circuit\n",
"from models.quantum_models import generate_model_policy as Network\n",
"from models.quantum_models import get_model_circuit_params"
]
},
{
"cell_type": "markdown",
"id": "f3319328",
"metadata": {},
"source": [
"Set some parameters about the RL task:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1f8d50c1",
"metadata": {},
"outputs": [],
"source": [
"parser = argparse.ArgumentParser('Reinforcement learining with quantum computing cloud Quafu')\n",
"parser.add_argument('--env_name', type=str, default='CartPole-v1', help='environment name')\n",
"parser.add_argument('--state_bounds', type=np.array, default=np.array([2.4, 2.5, 0.21, 2.5]), help='state bounds')\n",
"parser.add_argument('--n_qubits', type=int, default=4, help='the number of qubits')\n",
"parser.add_argument('--n_actions', type=int, default=2, help='the number of actions')\n",
"parser.add_argument('--arch', type=str, default='NSGANet_id10', help='which architecture to use')\n",
"parser.add_argument('--shots', type=int, default=1000, help='the number of sampling')\n",
"parser.add_argument('--backend', type=str, default='ScQ-P20', help='which backend to use')\n",
"parser.add_argument('--model_path', type=str, default='./weights/weights_id10_quafu.h5', help='path of pretrained model')\n",
"args = parser.parse_args(args=[])"
]
},
{
"cell_type": "markdown",
"id": "0b75c1ea",
"metadata": {},
"source": [
"According to the results retrieved by Quafu, you can compute expectations with observables($Z_{0} * Z_{1} * Z_{2} * Z_{3}$ for CartPole) as the follwing function:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "79aebf20",
"metadata": {},
"outputs": [],
"source": [
"def get_res_exp(res):\n",
" # access to probabilities of all possibilities \n",
" prob = res.probabilities\n",
" sumexp = 0\n",
" for k, v in prob.items():\n",
" count = 0\n",
" for i in range(len(k)):\n",
" if k[i] == '1':\n",
" count += 1\n",
" if count % 2 == 0:\n",
" sumexp += v\n",
" else:\n",
" sumexp -= v\n",
" return sumexp"
]
},
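{
"cell_type": "markdown",
"id": "c0ffee01",
"metadata": {},
"source": [
"As a quick sanity check (an illustration added here with made-up numbers, not a real Quafu result), the expectation of $Z_{0}Z_{1}Z_{2}Z_{3}$ is the total probability of even-parity bitstrings minus that of odd-parity ones:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee02",
"metadata": {},
"outputs": [],
"source": [
"# hypothetical stand-in for a Quafu result object; only .probabilities is used\n",
"class FakeRes:\n",
"    probabilities = {'0000': 0.5, '0011': 0.3, '0001': 0.2}\n",
"\n",
"# even-parity bitstrings contribute +p, odd-parity ones -p: 0.5 + 0.3 - 0.2 = 0.6\n",
"print(get_res_exp(FakeRes()))"
]
},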
{
"cell_type": "markdown",
"id": "e78631a5",
"metadata": {},
"source": [
"It's important to construct a process to send circuits to Quafu and get results from it. The next part shows the whole pipeline of involing Cirq circuits with Quafu and acquire expectations with quantum devices."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7b611594",
"metadata": {},
"outputs": [],
"source": [
"def get_quafu_exp(circuit):\n",
" # convert Cirq circuts to qasm\n",
" openqasm = circuit.to_qasm(header='')\n",
" openqasm = re.sub('//.*\\n', '', openqasm)\n",
" openqasm = \"\".join([s for s in openqasm.splitlines(True) if s.strip()])\n",
" \n",
" # fill in with your token, register on website http://quafu.baqis.ac.cn/\n",
" user = User()\n",
" user.save_apitoken(\" \")\n",
" \n",
" # initialize to Quafu circuits\n",
" q = quafuQC(args.n_qubits)\n",
" q.from_openqasm(openqasm)\n",
" \n",
" # create the task\n",
" task = Task()\n",
" task.load_account()\n",
" \n",
" # choose sampling number and specific quantum devices\n",
" shots = args.shots \n",
" task.config(backend=args.backend, shots=shots, compile=True)\n",
" task_id = task.send(q, wait=True).taskid\n",
" print('task_id:', task_id)\n",
" \n",
" # retrieve the result of completed tasks and compute expectations\n",
" task_status = task.retrieve(task_id).task_status\n",
" if task_status == 'Completed':\n",
" task = Task()\n",
" task.load_account()\n",
" res = task.retrieve(task_id)\n",
" OB = get_res_exp(res)\n",
" return task_id, tf.convert_to_tensor([[OB]])"
]
},
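{
"cell_type": "markdown",
"id": "c0ffee03",
"metadata": {},
"source": [
"To see the conversion step in isolation, the following sketch (added for illustration; it builds a toy circuit and submits nothing, so no token is needed) applies the same Cirq-to-OpenQASM-to-Quafu conversion as `get_quafu_exp`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee04",
"metadata": {},
"outputs": [],
"source": [
"# build a toy 4-qubit Cirq circuit and convert it the same way get_quafu_exp does\n",
"toy_qubits = cirq.GridQubit.rect(1, args.n_qubits)\n",
"toy_circuit = cirq.Circuit([cirq.H(toy_qubits[0]), cirq.CNOT(toy_qubits[0], toy_qubits[1])])\n",
"qasm = toy_circuit.to_qasm(header='')\n",
"qasm = re.sub('//.*\\n', '', qasm)\n",
"qasm = \"\".join([s for s in qasm.splitlines(True) if s.strip()])\n",
"print(qasm)\n",
"\n",
"# load the OpenQASM string into a local Quafu circuit; no task is sent here\n",
"toy_q = quafuQC(args.n_qubits)\n",
"toy_q.from_openqasm(qasm)"
]
},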
{
"cell_type": "markdown",
"id": "12875e09",
"metadata": {},
"source": [
"The next post-processing layer apply stored action-specific weights on expectation values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "67e2e227",
"metadata": {},
"outputs": [],
"source": [
"class Alternating_(tf.keras.layers.Layer):\n",
" def __init__(self, obsw):\n",
" super(Alternating_, self).__init__()\n",
" self.w = tf.Variable(\n",
" initial_value=tf.constant(obsw), dtype=\"float32\", trainable=True, name=\"obsw\")\n",
"\n",
" def call(self, inputs):\n",
" return tf.matmul(inputs, self.w)"
]
},
{
"cell_type": "markdown",
"id": "7a0a0ef0",
"metadata": {},
"source": [
"Then the softmax layer outputs the policy of the agent to choose next actions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3c69ae1e",
"metadata": {},
"outputs": [],
"source": [
"def get_obs_policy(obsw):\n",
" process = tf.keras.Sequential([ Alternating_(obsw),\n",
" tf.keras.layers.Lambda(lambda x: x * 1.0),\n",
" tf.keras.layers.Softmax()\n",
" ], name=\"obs_policy\")\n",
" return process"
]
},
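{
"cell_type": "markdown",
"id": "c0ffee05",
"metadata": {},
"source": [
"As a small illustration (the weights below are made up, not the trained ones), feeding a scalar expectation through the weighted layer and the softmax yields a two-action probability distribution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee06",
"metadata": {},
"outputs": [],
"source": [
"# hypothetical weights mapping one expectation value to two action logits\n",
"demo_obsw = np.array([[1.0, -1.0]], dtype=np.float32)\n",
"demo_policy = get_obs_policy(demo_obsw)\n",
"\n",
"# an expectation of 0.6 gives logits (0.6, -0.6), so action 0 is more likely\n",
"print(demo_policy(tf.constant([[0.6]])))"
]
},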
{
"cell_type": "markdown",
"id": "ed82bd69",
"metadata": {},
"source": [
"Prepare for loading model weights:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3b1d0556",
"metadata": {},
"outputs": [],
"source": [
"qubits = cirq.GridQubit.rect(1, args.n_qubits)\n",
"genotype = eval(\"genotypes.%s\" % args.arch)\n",
"ops = [cirq.Z(q) for q in qubits]\n",
"observables = [reduce((lambda x, y: x * y), ops)] # Z_0*Z_1*Z_2*Z_3\n",
"model = Network(qubits, genotype, args.n_actions, observables)\n",
"model.load_weights(args.model_path)"
]
},
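{
"cell_type": "markdown",
"id": "c0ffee07",
"metadata": {},
"source": [
"You can confirm that the observable built by `reduce` is the four-qubit parity operator $Z_{0}Z_{1}Z_{2}Z_{3}$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee08",
"metadata": {},
"outputs": [],
"source": [
"# the product of single-qubit Z operators is a single cirq.PauliString\n",
"print(observables[0])"
]
},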
{
"cell_type": "markdown",
"id": "77547584",
"metadata": {},
"source": [
"The follwing part builds an interaction between the agent in CartPole environment and Quafu. Every action choice means a task completed by quantum devices and finally, you can get a gif picturing the whole process."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37cd2946",
"metadata": {},
"outputs": [],
"source": [
"# update gym to the version having render_mode, which is 0.26.1 in this file\n",
"# initialize the environment\n",
"env = gym.make(args.env_name, render_mode=\"rgb_array\")\n",
"state, _ = env.reset()\n",
"frames = []\n",
"\n",
"# set the number of episodes\n",
"for epi in range(20):\n",
" im = Image.fromarray(env.render())\n",
" frames.append(im) \n",
" \n",
" # get PQC model parameters and expectations\n",
" stateb = state/args.state_bounds\n",
" newtheta, newlamda = get_model_circuit_params(qubits, genotype, model)\n",
" circuit, _, _ = generate_circuit(qubits, genotype, newtheta, newlamda, stateb)\n",
" _, expectation = get_quafu_exp(circuit)\n",
" \n",
" # get policy model parameters\n",
" obsw = model.get_layer('observables-policy').get_weights()[0]\n",
" obspolicy = get_obs_policy(obsw)\n",
" policy = obspolicy(expectation)\n",
" print('policy:', policy)\n",
" \n",
" # choose actions and make a step\n",
" action = np.random.choice(args.n_actions, p=policy.numpy()[0])\n",
" state, reward, terminated, truncated, _ = env.step(action)\n",
" if terminated or truncated:\n",
" print(epi+1)\n",
" break\n",
"env.close()\n",
"\n",
"# save to your path\n",
"frames[1].save('./visualization/id10_quafu_gif/gym_CartPole_20.gif', save_all=True, append_images=frames[2:], optimize=False, duration=20, loop=0)\n",
"# for example, ./visualization/id10_quafu_gif/gym_CartPole_20.gif"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96034943",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "tfjyx",
"language": "python",
"name": "tfjyx"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"vscode": {
"interpreter": {
"hash": "8453d71d5161753483e9e20455923c6b6a939f471903d1c75f4816d93f062f94"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}