update Gradient descent notebook
- use minus log-likelihood for PET and fix wrong comments about ascent/descent for MR and PET
- fix has_astra => have_astra
- no longer use relative step-size as it fails for MR and CT
- hide PET warnings
KrisThielemans committed Mar 19, 2024
1 parent 33fae8f commit e42d18d
Showing 1 changed file with 86 additions and 53 deletions.
139 changes: 86 additions & 53 deletions notebooks/Synergistic/gradient_descent_mr_pet_ct.ipynb
@@ -7,7 +7,7 @@
"# Gradient descent for MR, PET and CT\n",
"This demonstration shows how to do image reconstruction using gradient descent for different modalities. \n",
"\n",
"It builds on the the notebook *acquisition_model_mr_pet_ct.ipynb*. The first part of the notebook which creates acquisition models and simulates data from the brainweb is the same code but with fewer comments. If anything is unclear, then please refer to the other notebook to get some further information.\n",
"It builds on the notebook [acquisition_model_mr_pet_ct.ipynb](acquisition_model_mr_pet_ct.ipynb). The first part of the notebook which creates acquisition models and simulates data from the brainweb is the same code but with fewer comments. If anything is unclear, then please refer to the other notebook to get some further information.\n",
"\n",
"This demo is a jupyter notebook, i.e. intended to be run step by step.\n",
"You could export it as a Python file and run it one go, but that might\n",
@@ -18,20 +18,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**WARNING**: At present, this notebook actually implements gradient ascent for PET, and it uses some sign tricks for CT/MR. This is because the `sirf.STIR` objective functions correspond to log-likelihood (or posterior), which needs to be maximised, while this is not so common in CT/MR. We will have to edit this notebook to make this clearer in the next version!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Author: Christoph Kolbitsch, Edoardo Pasca \n",
"First version: 23rd of April 2021 \n",
"Updated: 26nd of June 2021 \n",
"Author: Christoph Kolbitsch, Edoardo Pasca, Kris Thielemans \n",
"First version: 23rd of April 2021\n",
"\n",
"CCP SyneRBI Synergistic Image Reconstruction Framework (SIRF). \n",
"Copyright 2015 - 2017, 2021 Rutherford Appleton Laboratory STFC. \n",
"Copyright 2015 - 2019 University College London. \n",
"Copyright 2015 - 2019, 2024 University College London. \n",
"Copyright 2021 Physikalisch-Technische Bundesanstalt.\n",
"\n",
"This is software developed for the Collaborative Computational\n",
@@ -109,6 +101,23 @@
" print(\"CIL ASTRA plugin is not installed\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make the PET output less verbose about warnings etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pet.set_verbosity(0)\n",
"_ = pet.MessageRedirector(warn=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -424,7 +433,7 @@
"tags": []
},
"source": [
"# Gradient descent (CT, MR) or ascent (PET)"
"# Gradient descent"
]
},
{
@@ -461,15 +470,15 @@
"metadata": {},
"outputs": [],
"source": [
"obj_fun_pet_sirf = pet.make_Poisson_loglikelihood(raw_pet)\n",
"obj_fun_pet_sirf = pet.make_Poisson_loglikelihood(raw_pet, acq_model = acq_mod_pet)\n",
"obj_fun_pet_sirf.set_up(bwd_pet)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `obj_func_pet` cannot be called directly but `obj_func_pet.value()` has to be used, we write a quick wrapper around it"
"As this demo will implement gradient descent, we need to use multiply the log-likelihood with -1, so we write a quick wrapper around it."
]
},
{
@@ -479,14 +488,14 @@
"outputs": [],
"source": [
"def obj_fun_pet(curr_image_estimate):\n",
" return(obj_fun_pet_sirf.value(curr_image_estimate))"
" return(-obj_fun_pet_sirf(curr_image_estimate))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gradient is also already implemented and can simply be calculated by calling `obj_fun_pet.gradient`. Nevertheless, to be more explicit and consistent with the other modalities we will define a new function to do the job:"
"And of course the same for the gradient."
]
},
{
@@ -498,7 +507,7 @@
"def obj_fun_grad_pet(curr_image_estimate):\n",
" # The 0 here means, only the gradient for subset 0 is returned. \n",
" # We will just accept this as is here, because subsets are too advanced for this demo.\n",
" return(obj_fun_pet_sirf.gradient(curr_image_estimate, 0))"
" return( -1*obj_fun_pet_sirf.gradient(curr_image_estimate, 0))"
]
},
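A quick way to convince yourself that the sign conventions in these two wrappers are consistent is a finite-difference check of the directional derivative. The following is only an illustrative sketch (it is not part of the notebook) and assumes the `obj_fun_pet`, `obj_fun_grad_pet` and `bwd_pet` objects defined above, with a single-subset acquisition model so that the subset-0 gradient is the full gradient:

```python
# Illustrative sanity check (hypothetical, not in the notebook): compare the analytic
# directional derivative of the negative Poisson log-likelihood with a finite difference.
test_image = bwd_pet.clone()
delta = test_image * 0.01   # small perturbation proportional to the image (keeps voxels >= 0)

finite_diff = obj_fun_pet(test_image + delta) - obj_fun_pet(test_image)
directional = obj_fun_grad_pet(test_image).dot(delta)

# The two numbers should roughly agree, confirming that both wrappers describe
# the same (to-be-minimised) negative log-likelihood.
print(finite_diff, directional)
```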
{
@@ -515,12 +524,7 @@
"outputs": [],
"source": [
"def make_positive(image_data):\n",
" # The idea is to create an image with all 0s (zero_image) and then we can take the maximum over each voxel.\n",
" # If the voxel value is larger than 0, then the original value is returned. If it is smaller than 0, then \n",
" # the value of the zero_image, i.e. 0, is returned.\n",
" zero_image = image_data.clone()\n",
" zero_image.fill(0.0)\n",
" image_data = image_data.maximum(zero_image)\n",
" image_data = image_data.maximum(0)\n",
" return image_data"
]
},
@@ -579,8 +583,7 @@
"outputs": [],
"source": [
"def obj_fun_grad_ct(curr_image_estimate):\n",
" # We are returning the negative gradient, because we are descending\n",
" return(-least_squares_cil.gradient(curr_image_estimate))"
" return(least_squares_cil.gradient(curr_image_estimate))"
]
},
{
Expand All @@ -594,7 +597,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And last but not least __MR__ . If you want to know more about the objective function of __MR__ and its gradient, then please have a look at the notebook *d_undersampled_reconstructions.ipynb*."
"And last but not least __MR__ . If you want to know more about the objective function of __MR__ and its gradient, then please have a look at the notebook *../MR/d_undersampled_reconstructions.ipynb*."
]
},
{
Expand All @@ -615,15 +618,30 @@
"outputs": [],
"source": [
"def obj_fun_grad_mr(curr_image_estimation):\n",
" # We are returning the negative gradient, because we are descending\n",
" return(-acq_mod_mr.backward(acq_mod_mr.forward(curr_image_estimate) - raw_mr))"
" return(acq_mod_mr.backward(acq_mod_mr.forward(curr_image_estimate) - raw_mr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have all our `obj_func` and `obj_func_grad` we will select one modality and then implement the gradient descent/ascent appproach. We also need an image `init_image` to start with. Here we will simply use the simple reconstruction which we did above."
"Note that we could have used CIL's `LeastSquares` as well here."
]
},
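A minimal sketch of that alternative, assuming `acq_mod_mr`, `raw_mr` and `bwd_mr` from earlier in the notebook and that CIL's `LeastSquares` accepts the SIRF MR operator and acquisition data directly:

```python
# Hypothetical sketch: build the MR least-squares objective with CIL instead of the
# hand-written wrappers. With c=0.5 this is 0.5*||A x - y||^2, whose gradient
# A^H(A x - y) matches obj_fun_grad_mr above (the objective value may differ from
# obj_fun_mr by a constant scale factor, depending on how that function was normalised).
from cil.optimisation.functions import LeastSquares

least_squares_mr = LeastSquares(acq_mod_mr, raw_mr, c=0.5)

print(least_squares_mr(bwd_mr))               # objective value at the simple reconstruction
grad_mr = least_squares_mr.gradient(bwd_mr)   # same direction as obj_fun_grad_mr(bwd_mr)
```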
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implement gradient descent for one modality (and hence all)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have all our `obj_func` and `obj_func_grad` we will select one modality and then implement the gradient descent/ascent appproach. We also need an image `init_image` to start with. Here we will simply use the simple reconstruction which we did above.\n",
"\n",
"We will implement the code using some common names. Set `curr_modality` to what you want to try."
]
},
{
@@ -632,22 +650,28 @@
"metadata": {},
"outputs": [],
"source": [
"curr_modality = 'pet' # pet, ct, mr\n",
"curr_modality = 'mr' # pet, ct, mr\n",
"\n",
"if curr_modality.lower() == 'pet':\n",
" obj_fun = obj_fun_pet\n",
" obj_fun_grad = obj_fun_grad_pet\n",
" init_image = bwd_pet\n",
" raw_data = raw_pet\n",
" acq_model = acq_mod_pet\n",
"elif curr_modality.lower() == 'ct':\n",
" if not has_astra:\n",
" if not have_astra:\n",
" raise ImportError('ASTRA toolbox is not installed')\n",
" obj_fun = obj_fun_ct\n",
" obj_fun_grad = obj_fun_grad_ct\n",
" init_image = bwd_ct \n",
" init_image = bwd_ct\n",
" raw_data = raw_ct\n",
" acq_model = acq_mod_ct\n",
"elif curr_modality.lower() == 'mr':\n",
" obj_fun = obj_fun_mr\n",
" obj_fun_grad = obj_fun_grad_mr\n",
" init_image = bwd_mr \n",
" init_image = bwd_mr\n",
" raw_data = raw_mr\n",
" acq_model = acq_mod_mr\n",
"else:\n",
" raise NameError('{:} not recognised'.format(curr_modality,)) "
]
@@ -656,13 +680,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we come to the probably most important paramter for gradient descent, the infamous __step-size__ . Unfortunately, the gradient only gives us the direction along which we need to update the image, but does not tell us by how much we have to go in this direction. Therefore, we need to define how far we step along the gradient direction in each iteration by hand. \n",
"\n",
"To make sure the step-size is adapted to each modality as much as possible, we won't define the step-size directly, but we will calculate the step-size as `tau` times the norm of the current image estimate. Nevertheless, this is still just a guess. \n",
"Now we come to the probably most important parameter for gradient descent, the infamous __step-size__ . Unfortunately, the gradient only gives us the direction along which we need to update the image, but does not tell us by how much we have to go in this direction. Therefore, we need to define how far we step along the gradient direction in each iteration by hand. \n",
"\n",
"If the step-size is too small, we have very slow convergence. If the step-size is too large, then we are overstepping our target image and the objective function starts to oscillate. The value below is reasonable for __PET__, but for the other modalities you will have to optimise it to ensure good convergence.\n",
"To make sure the step-size is adapted to each modality as much as possible, we won't define the step-size directly. If the step-size is too small, we have very slow convergence. If the step-size is too large, then we are overstepping our target image and the objective function starts to oscillate. You could have to optimise it to ensure good convergence.\n",
"\n",
"Note that in the implementation below, we have used a \"relative step-size\" `tau` such that the norm of the increment (or step) is `tau curr_image_estmate.norm()`, which is intuitive, but theoretically and practically not well-motivated. Feel free to experiment with other step-size rules (see the end of this notebook for some suggestions)."
"Note that in the implementation below, we have used a \"relative step-size\" `tau` such that at the first iteration, the norm of the increment (or step) is `tau curr_image_estimate.norm()`, which is intuitive. Feel free to experiment with other step-size rules (see the end of this notebook for some suggestions)."
]
},
{
@@ -671,7 +693,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Relative step-size\n",
"# Relative step-size (this value is ok for PET, but not so much for MR!)\n",
"tau = .3"
]
},
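For the least-squares modalities (MR and CT) a more principled alternative to hand-tuning `tau` is a fixed step-size of 1/L, where L is the Lipschitz constant of the gradient, i.e. the squared operator norm of the acquisition model (this is also one of the suggestions at the end of the notebook). A rough power-iteration sketch, assuming the `acq_model` and `init_image` selected above:

```python
# Hypothetical sketch: estimate L = ||A||^2 with a few power iterations and use 1/L as a
# fixed step-size for the least-squares objectives (MR/CT). This is not appropriate for the
# PET Poisson log-likelihood, whose curvature depends on the current image estimate.
x = init_image.clone()
x = x * (1.0 / x.norm())
for _ in range(10):
    x = acq_model.adjoint(acq_model.direct(x))   # apply A^T A (A^H A for MR)
    L_estimate = x.norm()                        # converges to the largest eigenvalue of A^T A
    x = x * (1.0 / L_estimate)                   # re-normalise for the next iteration

fixed_step_size = 1.0 / L_estimate
print(fixed_step_size)
```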
@@ -696,7 +718,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we need to create an image as a starting point for the gradient descent algorithm and we will also create a numpy array, where we can store the image estimate at each iteration"
"Then we need to create an image as a starting point for the gradient descent algorithm and we will also create a numpy array, where we can store the image estimate at each iteration.\n",
"\n",
"Gradient descent is an additive algorithm, so it is a bit sensitive to the \"scale\" of the initial image. Using the wrong values could lead to slow convergence initially. Here we will fix the scale if the initial image by making sure that its forward projection roughly matches the raw data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scale = raw_data.norm() / acq_model.direct(init_image).norm()\n",
"print(scale)\n",
"# let's use a slightly larger scale as illustration.\n",
"curr_image_estimate = init_image * (scale * 1.1)"
]
},
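As a quick illustrative check (not in the notebook), the forward projection of the rescaled starting image should now have a norm close to that of the raw data, up to the deliberate factor 1.1:

```python
# Illustrative check of the rescaling above.
print(acq_model.direct(curr_image_estimate).norm(), raw_data.norm())
```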
{
@@ -705,9 +741,6 @@
"metadata": {},
"outputs": [],
"source": [
"# Initial image\n",
"curr_image_estimate = init_image.clone()\n",
"\n",
"# Array for images at each iteration\n",
"image_iteration = numpy.ndarray((num_iters + 1,), dtype=object)\n",
"image_iteration[0] = curr_image_estimate\n",
@@ -734,11 +767,12 @@
" # First we calculate the gradient to find out how we need to update our current image estimate\n",
" grad = obj_fun_grad(curr_image_estimate)\n",
"\n",
" # Compute step-size relative to current image-norm\n",
" step_size = tau * curr_image_estimate.norm() / grad.norm()\n",
"\n",
" # Perform gradient ascent step\n",
" curr_image_estimate = curr_image_estimate + step_size * grad\n",
" # Compute step-size relative to initial image-norm (avoiding division by zero)\n",
" if i == 1:\n",
" step_size = tau * curr_image_estimate.norm() / (grad.norm() + 1e-19)\n",
" print(step_size)\n",
" # Perform gradient descent step\n",
" curr_image_estimate = curr_image_estimate - step_size * grad\n",
"\n",
" # In PET and CT we have to ensure values are positive. \n",
" if curr_modality.lower() == 'ct' or curr_modality.lower() == 'pet':\n",
@@ -760,7 +794,7 @@
"# Plot objective function values\n",
"plt.figure()\n",
"plt.title('Objective function value')\n",
"plt.plot(obj_func_values, 'o-b')"
"plt.plot(obj_func_values[2:], 'o-b')"
]
},
{
@@ -785,10 +819,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You could now experiment witih different step-size schemes. Some suggestions:\n",
"- change to a fixed step size, i.e. remove the `grad.norm()`\n",
"You could now experiment with different step-size schemes. Some suggestions:\n",
"- use a decreasing step-size (sometimes called \"relaxation\"), e.g. $\\alpha/(n+\\beta)$ with $n$ the iteration number\n",
"- for MR and CT, the maximum step-size can be determined from the norm of the forward model\n",
"- for MR and CT, the maximum step-size can be determined from the norm of the forward model (or the Lipschitz constant of the gradient of the objective function)\n",
"- back-tracking line search\n",
"\n",
"You could also have a look at the `cil.optimisation.algorithms.GD` algorithms from CIL, as illustrated in the [01_optimisation_gd_fista CIL-demo](\n",
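One of the suggestions above, back-tracking line search, can be sketched in a few lines. This is a hypothetical helper (not part of the commit) that reuses the `obj_fun`, `obj_fun_grad` and `curr_image_estimate` names from the notebook:

```python
# Crude back-tracking: shrink the step until the objective actually decreases.
# A proper Armijo rule would also demand a sufficient-decrease margin, and for PET
# you may want to apply make_positive() to the trial image before evaluating obj_fun.
def backtracking_step_size(obj_fun, curr_image, grad, initial_step=1.0, shrink=0.5, max_tries=20):
    f0 = obj_fun(curr_image)
    step = initial_step
    for _ in range(max_tries):
        if obj_fun(curr_image - step * grad) < f0:
            return step
        step *= shrink
    return step

# Inside the descent loop one would then use, for example:
#   step_size = backtracking_step_size(obj_fun, curr_image_estimate, grad)
#   curr_image_estimate = curr_image_estimate - step_size * grad
```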
