add QRL example #7

Merged
merged 1 commit on Mar 16, 2023
4 changes: 4 additions & 0 deletions README.md
@@ -20,5 +20,9 @@ Please see the website [docs](https://scq-cloud.github.io/).

## Examples

The example shows how quantum reinforcement learning interacts with Quafu to solve the CartPole environment.

Refer to https://github.com/enchanted123/quantum-RL-with-quafu for more details.

## Authors
This project is developed by the quantum cloud computing team at the Beijing Academy of Quantum Information Sciences.
333 changes: 333 additions & 0 deletions examples/quantum_rl/Quantum RL with Quafu.ipynb
@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5bbddd29",
"metadata": {},
"source": [
"# Reinforcement learining with quantum computing cloud Quafu"
]
},
{
"cell_type": "markdown",
"id": "d891b68d",
"metadata": {},
"source": [
"Before starting the journey of executing reinforcement learining(RL) task on real quantum devices with Quafu, you have to make sure the environment is consistent. The following code is based on python 3.8 to meet the need of the specific version of tensorflow. Then, you can install the follwing packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fc952f8",
"metadata": {},
"outputs": [],
"source": [
"!pip install pyquafu \n",
"!pip install tensorflow==2.7.0\n",
"!pip install tensorflow-quantum==0.7.2\n",
"!pip install gym"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "40def399",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-03-15 14:42:36.950239: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"2023-03-15 14:42:38.671011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20593 MB memory: -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:5b:00.0, compute capability: 8.6\n",
"2023-03-15 14:42:38.672331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22306 MB memory: -> device: 1, name: GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6\n",
"2023-03-15 14:42:38.673411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22143 MB memory: -> device: 2, name: Tesla P40, pci bus id: 0000:25:00.0, compute capability: 6.1\n"
]
}
],
"source": [
"# model imports\n",
"import tensorflow as tf\n",
"import argparse\n",
"import re\n",
"import cirq\n",
"import gym\n",
"import numpy as np\n",
"from functools import reduce\n",
"from PIL import Image\n",
"from quafu import QuantumCircuit as quafuQC\n",
"from quafu import Task, User\n",
"\n",
"import models.quantum_genotypes as genotypes\n",
"from models.quantum_models import generate_circuit\n",
"from models.quantum_models import generate_model_policy as Network\n",
"from models.quantum_models import get_model_circuit_params"
]
},
{
"cell_type": "markdown",
"id": "f3319328",
"metadata": {},
"source": [
"Set some parameters about the RL task:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1f8d50c1",
"metadata": {},
"outputs": [],
"source": [
"parser = argparse.ArgumentParser('Reinforcement learining with quantum computing cloud Quafu')\n",
"parser.add_argument('--env_name', type=str, default='CartPole-v1', help='environment name')\n",
"parser.add_argument('--state_bounds', type=np.array, default=np.array([2.4, 2.5, 0.21, 2.5]), help='state bounds')\n",
"parser.add_argument('--n_qubits', type=int, default=4, help='the number of qubits')\n",
"parser.add_argument('--n_actions', type=int, default=2, help='the number of actions')\n",
"parser.add_argument('--arch', type=str, default='NSGANet_id10', help='which architecture to use')\n",
"parser.add_argument('--shots', type=int, default=1000, help='the number of sampling')\n",
"parser.add_argument('--backend', type=str, default='ScQ-P20', help='which backend to use')\n",
"parser.add_argument('--model_path', type=str, default='./weights/weights_id10_quafu.h5', help='path of pretrained model')\n",
"args = parser.parse_args(args=[])"
]
},
{
"cell_type": "markdown",
"id": "0b75c1ea",
"metadata": {},
"source": [
"According to the results retrieved by Quafu, you can compute expectations with observables($Z_{0} * Z_{1} * Z_{2} * Z_{3}$ for CartPole) as the follwing function:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "79aebf20",
"metadata": {},
"outputs": [],
"source": [
"def get_res_exp(res):\n",
" # access to probabilities of all possibilities \n",
" prob = res.probabilities\n",
" sumexp = 0\n",
" for k, v in prob.items():\n",
" count = 0\n",
" for i in range(len(k)):\n",
" if k[i] == '1':\n",
" count += 1\n",
" if count % 2 == 0:\n",
" sumexp += v\n",
" else:\n",
" sumexp -= v\n",
" return sumexp"
]
},
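{
"cell_type": "markdown",
"id": "c0ffee01",
"metadata": {},
"source": [
"As a quick sanity check (an illustration added here with made-up numbers, not a real Quafu result), the expectation of $Z_{0}Z_{1}Z_{2}Z_{3}$ is the total probability of even-parity bitstrings minus that of odd-parity ones:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee02",
"metadata": {},
"outputs": [],
"source": [
"# hypothetical stand-in for a Quafu result object; only .probabilities is used\n",
"class FakeRes:\n",
"    probabilities = {'0000': 0.5, '0011': 0.3, '0001': 0.2}\n",
"\n",
"# even-parity bitstrings contribute +p, odd-parity ones -p: 0.5 + 0.3 - 0.2 = 0.6\n",
"print(get_res_exp(FakeRes()))"
]
},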
{
"cell_type": "markdown",
"id": "e78631a5",
"metadata": {},
"source": [
"It's important to construct a process to send circuits to Quafu and get results from it. The next part shows the whole pipeline of involing Cirq circuits with Quafu and acquire expectations with quantum devices."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7b611594",
"metadata": {},
"outputs": [],
"source": [
"def get_quafu_exp(circuit):\n",
" # convert Cirq circuts to qasm\n",
" openqasm = circuit.to_qasm(header='')\n",
" openqasm = re.sub('//.*\\n', '', openqasm)\n",
" openqasm = \"\".join([s for s in openqasm.splitlines(True) if s.strip()])\n",
" \n",
" # fill in with your token, register on website http://quafu.baqis.ac.cn/\n",
" user = User()\n",
" user.save_apitoken(\" \")\n",
" \n",
" # initialize to Quafu circuits\n",
" q = quafuQC(args.n_qubits)\n",
" q.from_openqasm(openqasm)\n",
" \n",
" # create the task\n",
" task = Task()\n",
" task.load_account()\n",
" \n",
" # choose sampling number and specific quantum devices\n",
" shots = args.shots \n",
" task.config(backend=args.backend, shots=shots, compile=True)\n",
" task_id = task.send(q, wait=True).taskid\n",
" print('task_id:', task_id)\n",
" \n",
" # retrieve the result of completed tasks and compute expectations\n",
" task_status = task.retrieve(task_id).task_status\n",
" if task_status == 'Completed':\n",
" task = Task()\n",
" task.load_account()\n",
" res = task.retrieve(task_id)\n",
" OB = get_res_exp(res)\n",
" return task_id, tf.convert_to_tensor([[OB]])"
]
},
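{
"cell_type": "markdown",
"id": "c0ffee03",
"metadata": {},
"source": [
"To see the conversion step in isolation, the following sketch (added for illustration; it builds a toy circuit and submits nothing, so no token is needed) applies the same Cirq-to-OpenQASM-to-Quafu conversion as `get_quafu_exp`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee04",
"metadata": {},
"outputs": [],
"source": [
"# build a toy 4-qubit Cirq circuit and convert it the same way get_quafu_exp does\n",
"toy_qubits = cirq.GridQubit.rect(1, args.n_qubits)\n",
"toy_circuit = cirq.Circuit([cirq.H(toy_qubits[0]), cirq.CNOT(toy_qubits[0], toy_qubits[1])])\n",
"qasm = toy_circuit.to_qasm(header='')\n",
"qasm = re.sub('//.*\\n', '', qasm)\n",
"qasm = \"\".join([s for s in qasm.splitlines(True) if s.strip()])\n",
"print(qasm)\n",
"\n",
"# load the OpenQASM string into a local Quafu circuit; no task is sent here\n",
"toy_q = quafuQC(args.n_qubits)\n",
"toy_q.from_openqasm(qasm)"
]
},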
{
"cell_type": "markdown",
"id": "12875e09",
"metadata": {},
"source": [
"The next post-processing layer apply stored action-specific weights on expectation values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "67e2e227",
"metadata": {},
"outputs": [],
"source": [
"class Alternating_(tf.keras.layers.Layer):\n",
" def __init__(self, obsw):\n",
" super(Alternating_, self).__init__()\n",
" self.w = tf.Variable(\n",
" initial_value=tf.constant(obsw), dtype=\"float32\", trainable=True, name=\"obsw\")\n",
"\n",
" def call(self, inputs):\n",
" return tf.matmul(inputs, self.w)"
]
},
{
"cell_type": "markdown",
"id": "7a0a0ef0",
"metadata": {},
"source": [
"Then the softmax layer outputs the policy of the agent to choose next actions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3c69ae1e",
"metadata": {},
"outputs": [],
"source": [
"def get_obs_policy(obsw):\n",
" process = tf.keras.Sequential([ Alternating_(obsw),\n",
" tf.keras.layers.Lambda(lambda x: x * 1.0),\n",
" tf.keras.layers.Softmax()\n",
" ], name=\"obs_policy\")\n",
" return process"
]
},
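{
"cell_type": "markdown",
"id": "c0ffee05",
"metadata": {},
"source": [
"As a small illustration (the weights below are made up, not the trained ones), feeding a scalar expectation through the weighted layer and the softmax yields a two-action probability distribution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee06",
"metadata": {},
"outputs": [],
"source": [
"# hypothetical weights mapping one expectation value to two action logits\n",
"demo_obsw = np.array([[1.0, -1.0]], dtype=np.float32)\n",
"demo_policy = get_obs_policy(demo_obsw)\n",
"\n",
"# an expectation of 0.6 gives logits (0.6, -0.6), so action 0 is more likely\n",
"print(demo_policy(tf.constant([[0.6]])))"
]
},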
{
"cell_type": "markdown",
"id": "ed82bd69",
"metadata": {},
"source": [
"Prepare for loading model weights:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3b1d0556",
"metadata": {},
"outputs": [],
"source": [
"qubits = cirq.GridQubit.rect(1, args.n_qubits)\n",
"genotype = eval(\"genotypes.%s\" % args.arch)\n",
"ops = [cirq.Z(q) for q in qubits]\n",
"observables = [reduce((lambda x, y: x * y), ops)] # Z_0*Z_1*Z_2*Z_3\n",
"model = Network(qubits, genotype, args.n_actions, observables)\n",
"model.load_weights(args.model_path)"
]
},
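{
"cell_type": "markdown",
"id": "c0ffee07",
"metadata": {},
"source": [
"You can confirm that the observable built by `reduce` is the four-qubit parity operator $Z_{0}Z_{1}Z_{2}Z_{3}$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee08",
"metadata": {},
"outputs": [],
"source": [
"# the product of single-qubit Z operators is a single cirq.PauliString\n",
"print(observables[0])"
]
},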
{
"cell_type": "markdown",
"id": "77547584",
"metadata": {},
"source": [
"The follwing part builds an interaction between the agent in CartPole environment and Quafu. Every action choice means a task completed by quantum devices and finally, you can get a gif picturing the whole process."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37cd2946",
"metadata": {},
"outputs": [],
"source": [
"# update gym to the version having render_mode, which is 0.26.1 in this file\n",
"# initialize the environment\n",
"env = gym.make(args.env_name, render_mode=\"rgb_array\")\n",
"state, _ = env.reset()\n",
"frames = []\n",
"\n",
"# set the number of episodes\n",
"for epi in range(20):\n",
" im = Image.fromarray(env.render())\n",
" frames.append(im) \n",
" \n",
" # get PQC model parameters and expectations\n",
" stateb = state/args.state_bounds\n",
" newtheta, newlamda = get_model_circuit_params(qubits, genotype, model)\n",
" circuit, _, _ = generate_circuit(qubits, genotype, newtheta, newlamda, stateb)\n",
" _, expectation = get_quafu_exp(circuit)\n",
" \n",
" # get policy model parameters\n",
" obsw = model.get_layer('observables-policy').get_weights()[0]\n",
" obspolicy = get_obs_policy(obsw)\n",
" policy = obspolicy(expectation)\n",
" print('policy:', policy)\n",
" \n",
" # choose actions and make a step\n",
" action = np.random.choice(args.n_actions, p=policy.numpy()[0])\n",
" state, reward, terminated, truncated, _ = env.step(action)\n",
" if terminated or truncated:\n",
" print(epi+1)\n",
" break\n",
"env.close()\n",
"\n",
"# save to your path\n",
"frames[1].save('./visualization/id10_quafu_gif/gym_CartPole_20.gif', save_all=True, append_images=frames[2:], optimize=False, duration=20, loop=0)\n",
"# for example, ./visualization/id10_quafu_gif/gym_CartPole_20.gif"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96034943",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "tfjyx",
"language": "python",
"name": "tfjyx"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"vscode": {
"interpreter": {
"hash": "8453d71d5161753483e9e20455923c6b6a939f471903d1c75f4816d93f062f94"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}