{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Training: Hybrid case" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "In this tutorial we train a pure quantum :term:`PennyLane` :term:`model` to solve a toy problem: classifying whether a given sentence is about cooking or computing. We also train a hybrid model that determines whether a given pair of sentences are talking about different topics.\n", "\n", "We use an :py:class:`.IQPAnsatz` to convert :term:`string diagrams ` into :term:`quantum circuits `. When passing these circuits to the :py:class:`PennyLaneModel`, they are automatically converted into :term:`PennyLane` circuits.\n", "\n", ":download:`Download code <../_code/trainer-hybrid.ipynb>`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparation\n", "\n", "We start by specifying some training hyperparameters and importing NumPy and PyTorch." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "BATCH_SIZE = 10\n", "EPOCHS = 15\n", "LEARNING_RATE = 0.1\n", "SEED = 42" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import random\n", "import numpy as np\n", "\n", "torch.manual_seed(SEED)\n", "random.seed(SEED)\n", "np.random.seed(SEED)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Input data\n", "\n", "Let's read the data and print some example sentences." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def read_data(filename):\n", "    labels, sentences = [], []\n", "    with open(filename) as f:\n", "        for line in f:\n", "            t = float(line[0])\n", "            labels.append([t, 1-t])\n", "            sentences.append(line[1:].strip())\n", "    return labels, sentences\n", "\n", "\n", "train_labels, train_data = read_data('../examples/datasets/mc_train_data.txt')\n", "dev_labels, dev_data = read_data('../examples/datasets/mc_dev_data.txt')\n", "test_labels, test_data = read_data('../examples/datasets/mc_test_data.txt')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "import os\n", "\n", "TESTING = int(os.environ.get('TEST_NOTEBOOKS', '0'))\n", "\n", "if TESTING:\n", "    train_labels, train_data = train_labels[:2], train_data[:2]\n", "    dev_labels, dev_data = dev_labels[:2], dev_data[:2]\n", "    test_labels, test_data = test_labels[:2], test_data[:2]\n", "    EPOCHS = 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['skillful man prepares sauce .',\n", " 'skillful man bakes dinner .',\n", " 'woman cooks tasty meal .',\n", " 'man prepares meal .',\n", " 'skillful woman debugs program .']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Targets are represented as 2-dimensional arrays:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_labels[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating and parameterising diagrams" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The 
first step is to convert the sentences into :term:`string diagrams `." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Tagging sentences.\n", "Parsing tagged sentences.\n", "Turning parse trees to diagrams.\n", "Tagging sentences.\n", "Parsing tagged sentences.\n", "Turning parse trees to diagrams.\n", "Tagging sentences.\n", "Parsing tagged sentences.\n", "Turning parse trees to diagrams.\n" ] } ], "source": [ "from lambeq import BobcatParser\n", "\n", "reader = BobcatParser(verbose='text')\n", "\n", "raw_train_diagrams = reader.sentences2diagrams(train_data)\n", "raw_dev_diagrams = reader.sentences2diagrams(dev_data)\n", "raw_test_diagrams = reader.sentences2diagrams(test_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simplify diagrams" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We simplify the diagrams by removing cups with :py:class:`~.RemoveCupsRewriter`; this reduces the number of :term:`post-selections ` in a diagram, allowing them to be evaluated more efficiently." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from lambeq import RemoveCupsRewriter\n", "\n", "remove_cups = RemoveCupsRewriter()\n", "\n", "train_diagrams = [remove_cups(diagram) for diagram in raw_train_diagrams]\n", "dev_diagrams = [remove_cups(diagram) for diagram in raw_dev_diagrams]\n", "test_diagrams = [remove_cups(diagram) for diagram in raw_test_diagrams]" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can visualise these diagrams using :py:meth:`~lambeq.backend.grammar.Diagram.draw`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train_diagrams[0].draw()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create circuits" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "In order to run the experiments on a quantum computer, we apply a quantum :term:`ansatz ` to the string diagrams. For this experiment, we will use an :py:class:`.IQPAnsatz`, where noun wires (``n``) and sentence wires (``s``) are represented by one-qubit systems." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from lambeq import AtomicType, IQPAnsatz\n", "\n", "ansatz = IQPAnsatz({AtomicType.NOUN: 1, AtomicType.SENTENCE: 1},\n", " n_layers=1, n_single_qubit_params=3)\n", "\n", "train_circuits = [ansatz(diagram) for diagram in train_diagrams]\n", "dev_circuits = [ansatz(diagram) for diagram in dev_diagrams]\n", "test_circuits = [ansatz(diagram) for diagram in test_diagrams]\n", "\n", "train_circuits[0].draw(figsize=(8, 8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "### Instantiate model" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We instantiate a :py:class:`.PennyLaneModel`, by passing all diagrams to the class method :py:meth:`.PennyLaneModel.from_diagrams`. \n", "\n", "We also set `probabilities=True` so that the model outputs probabilities, rather than quantum states, which follows the behaviour of real quantum computers. \n", "\n", "Furthermore, we set `normalize=True` so that the output probabilities sum to one. This helps to prevent passing very small values to any following layers in a hybrid model." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from lambeq import PennyLaneModel\n", "\n", "all_circuits = train_circuits + dev_circuits + test_circuits\n", "\n", "# if no backend_config is provided, the default is used, which is the same as below\n", "backend_config = {'backend': 'default.qubit'} # this is the default PennyLane simulator\n", "model = PennyLaneModel.from_diagrams(all_circuits,\n", " probabilities=True,\n", " normalize=True,\n", " backend_config=backend_config)\n", "model.initialise_weights()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running on a real quantum computer" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can choose to run the model on a real quantum computer, using :term:`Qiskit` with IBMQ, or the Honeywell QAPI.\n", "\n", "To use IBM devices we have to save our IBMQ API token to the :term:`PennyLane` configuration file, as in the cell below." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import pennylane as qml\n", "\n", "qml.default_config['qiskit.ibmq.ibmqx_token'] = 'my_API_token'\n", "qml.default_config.save(qml.default_config.path)\n", "backend_config = {'backend': 'qiskit.ibmq',\n", " 'device': 'ibmq_manila',\n", " 'shots': 1000}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "if TESTING:\n", " backend_config = None" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "q_model = PennyLaneModel.from_diagrams(all_circuits,\n", " probabilities=True,\n", " normalize=True,\n", " backend_config=backend_config)\n", "q_model.initialise_weights()" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To use Honeywell/Quantinuum devices we have to pass the email address of an account with access to the Honeywell/Quantinuum QAPI to the :term:`PennyLane` configuration file.\n", "\n", "The first time you run a circuit on a Honeywell device, you will be prompted to enter your password. \n", "\n", "You can then run circuits without entering your password again for 30 days." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "qml.default_config['honeywell.global.user_email'] = ('my_Honeywell/Quantinuum_'\n", " 'account_email')\n", "qml.default_config.save(qml.default_config.path)\n", "\n", "backend_config = {'backend': 'honeywell.hqs',\n", " 'device': 'H1-1E',\n", " 'shots': 1000}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "if TESTING:\n", " backend_config = None" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "h_model = PennyLaneModel.from_diagrams(all_circuits,\n", " probabilities=True,\n", " normalize=True,\n", " backend_config=backend_config)\n", "h_model.initialise_weights()" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Running these models on a real quantum computer takes a significant amount of time as the circuits must be sent to the backend and queued, so in the remainder of this tutorial we will use `model`, which uses the default :term:`PennyLane` simulator, 'default.qubit'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create datasets" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To facilitate data shuffling and batching, ``lambeq`` provides a native :py:class:`.Dataset` class. Shuffling is enabled by default, and if not specified, the batch size is set to the length of the dataset." 
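] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the shuffling and batching behaviour concrete, here is a minimal pure-Python sketch. The helper `iter_batches` is hypothetical and only mimics the behaviour described above; it is not how `lambeq.Dataset` is implemented.

```python
import random


def iter_batches(data, targets, batch_size=None, shuffle=True, seed=0):
    # Hypothetical stand-in for the behaviour described in the text:
    # shuffle by default, and use one full-size batch if no size is given.
    if batch_size is None:
        batch_size = len(data)
    order = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [data[i] for i in idx], [targets[i] for i in idx]


sentences = ['s{}'.format(i) for i in range(25)]
labels = [i % 2 for i in range(25)]

batches = list(iter_batches(sentences, labels, batch_size=10))
batch_sizes = [len(x) for x, _ in batches]
full = list(iter_batches(sentences, labels))
print(batch_sizes, len(full))
```

With 25 items and a batch size of 10 this yields batches of 10, 10 and 5 items, while omitting the batch size gives a single batch, which is the setting used for the validation set in this tutorial."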
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from lambeq import Dataset\n", "\n", "train_dataset = Dataset(train_circuits,\n", "                        train_labels,\n", "                        batch_size=BATCH_SIZE)\n", "\n", "val_dataset = Dataset(dev_circuits, dev_labels)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Training can either be done using the :py:class:`.PytorchTrainer`, or by using native PyTorch. We give examples of both in the following sections." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define loss and evaluation metric" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "When using :py:class:`.PytorchTrainer` we first define our evaluation metric and loss function, which in this case will be the accuracy and the mean-squared error, respectively." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def acc(y_hat, y):\n", "    return (torch.argmax(y_hat, dim=1) ==\n", "            torch.argmax(y, dim=1)).sum().item()/len(y)\n", "\n", "def loss(y_hat, y):\n", "    return torch.nn.functional.mse_loss(y_hat, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialise trainer" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "As :term:`PennyLane` is compatible with PyTorch autograd, :py:class:`.PytorchTrainer` can automatically use many of the PyTorch optimizers, such as Adam, to train our model." 
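] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The accuracy and mean-squared-error functions passed to the trainer can be checked by hand on a tiny batch. The sketch below re-implements them in plain Python (the names `acc_py` and `mse_py` and the example values are made up for illustration), assuming the default mean reduction of `torch.nn.functional.mse_loss`, which averages over every element.

```python
def argmax(row):
    # Index of the largest entry in a list.
    return max(range(len(row)), key=row.__getitem__)


def acc_py(y_hat, y):
    # Fraction of rows whose predicted class matches the one-hot target.
    hits = sum(argmax(p) == argmax(t) for p, t in zip(y_hat, y))
    return hits / len(y)


def mse_py(y_hat, y):
    # Mean of the squared differences over all elements.
    sq = [(p - t) ** 2 for ps, ts in zip(y_hat, y) for p, t in zip(ps, ts)]
    return sum(sq) / len(sq)


y_hat = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]   # model outputs
y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]       # one-hot targets

print(acc_py(y_hat, y), mse_py(y_hat, y))
```

Here two of the three predictions are correct, and the loss is the average of six squared errors."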
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "from lambeq import PytorchTrainer\n", "\n", "trainer = PytorchTrainer(\n", " model=model,\n", " loss_function=loss,\n", " optimizer=torch.optim.Adam,\n", " learning_rate=LEARNING_RATE,\n", " epochs=EPOCHS,\n", " evaluate_functions={'acc': acc},\n", " evaluate_on_train=True,\n", " use_tensorboard=False,\n", " verbose='text',\n", " seed=SEED)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can now pass the datasets to the :py:meth:`~lambeq.Trainer.fit` method of the trainer to start the training." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 1: train/loss: 0.1207 valid/loss: 0.0919 train/time: 0.71s valid/time: 0.16s train/acc: 0.7857 valid/acc: 0.8667\n", "Epoch 2: train/loss: 0.0486 valid/loss: 0.1035 train/time: 0.50s valid/time: 0.17s train/acc: 0.9286 valid/acc: 0.9000\n", "Epoch 3: train/loss: 0.0364 valid/loss: 0.0621 train/time: 0.49s valid/time: 0.17s train/acc: 0.9429 valid/acc: 0.9333\n", "Epoch 4: train/loss: 0.0466 valid/loss: 0.0392 train/time: 0.62s valid/time: 0.18s train/acc: 0.9857 valid/acc: 1.0000\n", "Epoch 5: train/loss: 0.0120 valid/loss: 0.0126 train/time: 0.49s valid/time: 0.18s train/acc: 0.9857 valid/acc: 1.0000\n", "Epoch 6: train/loss: 0.0014 valid/loss: 0.0178 train/time: 0.49s valid/time: 0.17s train/acc: 1.0000 valid/acc: 1.0000\n", "Epoch 7: train/loss: 0.0022 valid/loss: 0.0079 train/time: 0.60s valid/time: 0.17s train/acc: 1.0000 valid/acc: 1.0000\n", "Epoch 8: train/loss: 0.0041 valid/loss: 0.0061 train/time: 0.48s valid/time: 0.17s train/acc: 1.0000 valid/acc: 1.0000\n", "Epoch 9: train/loss: 0.0003 valid/loss: 0.0108 train/time: 0.47s valid/time: 0.17s train/acc: 1.0000 valid/acc: 1.0000\n", "Epoch 10: train/loss: 
0.0001 valid/loss: 0.0205 train/time: 0.60s valid/time: 0.17s train/acc: 1.0000 valid/acc: 0.9667\n", "Epoch 11: train/loss: 0.0001 valid/loss: 0.0281 train/time: 0.50s valid/time: 0.16s train/acc: 1.0000 valid/acc: 0.9667\n", "Epoch 12: train/loss: 0.0005 valid/loss: 0.0309 train/time: 0.54s valid/time: 0.22s train/acc: 1.0000 valid/acc: 0.9667\n", "Epoch 13: train/loss: 0.0004 valid/loss: 0.0314 train/time: 0.57s valid/time: 0.18s train/acc: 1.0000 valid/acc: 0.9667\n", "Epoch 14: train/loss: 0.0004 valid/loss: 0.0308 train/time: 0.52s valid/time: 0.17s train/acc: 1.0000 valid/acc: 0.9667\n", "Epoch 15: train/loss: 0.0011 valid/loss: 0.0286 train/time: 0.47s valid/time: 0.17s train/acc: 1.0000 valid/acc: 0.9667\n", "\n", "Training completed!\n", "train/time: 8.06s train/time_per_epoch: 0.54s train/time_per_step: 0.08s valid/time: 2.60s valid/time_per_eval: 0.17s\n" ] } ], "source": [ "trainer.fit(train_dataset, val_dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n", "\n", "Finally, we visualise the results and evaluate the model on the test data." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final test accuracy: 0.9666666666666667\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig, ((ax_tl, ax_tr), (ax_bl, ax_br)) = plt.subplots(2, 2,\n", " sharex=True,\n", " sharey='row',\n", " figsize=(10, 6))\n", "ax_tl.set_title('Training set')\n", "ax_tr.set_title('Development set')\n", "ax_bl.set_xlabel('Iterations')\n", "ax_br.set_xlabel('Iterations')\n", "ax_bl.set_ylabel('Accuracy')\n", "ax_tl.set_ylabel('Loss')\n", "\n", "colours = iter(plt.rcParams['axes.prop_cycle'].by_key()['color'])\n", "range_ = np.arange(1, trainer.epochs+1)\n", "ax_tl.plot(range_, trainer.train_epoch_costs, color=next(colours))\n", "ax_bl.plot(range_, trainer.train_eval_results['acc'], color=next(colours))\n", "ax_tr.plot(range_, trainer.val_costs, color=next(colours))\n", "ax_br.plot(range_, trainer.val_eval_results['acc'], color=next(colours))\n", "\n", "# print test accuracy\n", "pred = model(test_circuits)\n", "labels = torch.tensor(test_labels)\n", "\n", "print('Final test accuracy: {}'.format(acc(pred, labels)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using standard PyTorch\n", "\n", "As we have a small dataset, we can use early stopping to prevent overfitting to the training data. In this case, we evaluate the performance of the model on the validation dataset every 5 epochs, and save a checkpoint if the validation accuracy has improved. If it does not improve for 10 epochs, we end the training, and load the model with the best validation accuracy." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def accuracy(circs, labels):\n", " probs = model(circs)\n", " return (torch.argmax(probs, dim=1) ==\n", " torch.argmax(torch.tensor(labels), dim=1)).sum().item()/len(circs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training is the same as standard PyTorch. 
We initialize an optimizer, pass it the model parameters, and then run a training loop in which we compute the loss, run a backwards pass to compute the gradients, and then take an optimizer step." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0\n", "Train loss: 1.835998997092247\n", "Dev acc: 0.5333333333333333\n", "Epoch: 5\n", "Train loss: 0.19097438035532832\n", "Dev acc: 0.9\n", "Epoch: 10\n", "Train loss: 0.05956625810358673\n", "Dev acc: 0.9666666666666667\n" ] } ], "source": [ "model = PennyLaneModel.from_diagrams(all_circuits)\n", "model.initialise_weights()\n", "optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)\n", "\n", "best = {'acc': 0, 'epoch': 0}\n", "\n", "for i in range(EPOCHS):\n", "    epoch_loss = 0\n", "    for circuits, labels in train_dataset:\n", "        optimizer.zero_grad()\n", "        probs = model(circuits)\n", "        loss = torch.nn.functional.mse_loss(probs,\n", "                                            torch.tensor(labels))\n", "        epoch_loss += loss.item()\n", "        loss.backward()\n", "        optimizer.step()\n", "\n", "    if i % 5 == 0:\n", "        dev_acc = accuracy(dev_circuits, dev_labels)\n", "\n", "        print('Epoch: {}'.format(i))\n", "        print('Train loss: {}'.format(epoch_loss))\n", "        print('Dev acc: {}'.format(dev_acc))\n", "\n", "        if dev_acc > best['acc']:\n", "            best['acc'] = dev_acc\n", "            best['epoch'] = i\n", "            model.save('model.lt')\n", "        elif i - best['epoch'] >= 10:\n", "            print('Early stopping')\n", "            break\n", "\n", "if best['acc'] > accuracy(dev_circuits, dev_labels):\n", "    model.load('model.lt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate test accuracy" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final test accuracy: 0.9\n" ] } ], "source": [ "print('Final test accuracy: {}'.format(accuracy(test_circuits, test_labels)))" ] }, { "cell_type": "markdown", "metadata": { 
"raw_mimetype": "text/markdown" }, "source": [ "## Hybrid models" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "This model determines whether a pair of diagrams are about the same or different topics.\n", "\n", "It does this by first running the pair circuits to get a probability output for each, and then concatenating them together and passing them to a simple neural network.\n", "\n", "We expect the circuits to learn to output [0, 1] or [1, 0] depending on the topic they are referring to (cooking or computing), and the neural network to learn the XOR function to determine whether the topics are the same (output 0) or different (output 1).\n", "\n", ":term:`PennyLane` allows us to train both the circuits and the NN simultaneously using PyTorch autograd." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "BATCH_SIZE = 50\n", "EPOCHS = 100\n", "LEARNING_RATE = 0.1\n", "SEED = 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the probability outputs from our circuits are guaranteed to be positive, we transform these outputs `x` by `2 * (x - 0.5)`, giving inputs to the neural network in the range [-1, 1]. \n", "\n", "This helps us to avoid \"dying ReLUs\", which could otherwise occur if all the input weights to a given hidden neuron were negative; in this case, the overall input to the neuron would be negative, and ReLU would set the output of it to 0, leading to the gradient of all these weights being 0 for all samples, causing the neuron to never learn. \n", "\n", "(A couple of alternative approaches could also involve initialising all the neural network weights to be positive, or using `LeakyReLU` as the activation function)." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "from torch import nn\n", "\n", "class XORSentenceModel(PennyLaneModel):\n", "    def __init__(self, **kwargs):\n", "        PennyLaneModel.__init__(self, **kwargs)\n", "\n", "        self.xor_net = nn.Sequential(nn.Linear(4, 10),\n", "                                     nn.ReLU(),\n", "                                     nn.Linear(10, 1),\n", "                                     nn.Sigmoid())\n", "\n", "    def forward(self, diagram_pairs):\n", "        first_d, second_d = zip(*diagram_pairs)\n", "        evaluated_pairs = torch.cat((self.get_diagram_output(first_d),\n", "                                     self.get_diagram_output(second_d)),\n", "                                    dim=1)\n", "        evaluated_pairs = 2 * (evaluated_pairs - 0.5)\n", "        return self.xor_net(evaluated_pairs)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make paired dataset\n", "\n", "Our model is going to determine whether a given pair of sentences are talking about different topics, so we need to construct a dataset of pairs of diagrams for the train, dev, and test data." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "from itertools import combinations\n", "\n", "def make_pair_data(diagrams, labels):\n", "    pair_diags = list(combinations(diagrams, 2))\n", "    pair_labels = [int(x[0] == y[0]) for x, y in combinations(labels, 2)]\n", "    return pair_diags, pair_labels\n", "\n", "train_pair_circuits, train_pair_labels = make_pair_data(train_circuits,\n", "                                                        train_labels)\n", "dev_pair_circuits, dev_pair_labels = make_pair_data(dev_circuits,\n", "                                                    dev_labels)\n", "test_pair_circuits, test_pair_labels = make_pair_data(test_circuits,\n", "                                                      test_labels)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "There are lots of pairs (2415 train pairs), so we'll sample a subset to make this example train more quickly." 
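] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pair count follows from simple combinatorics: 2415 equals 70 choose 2, which is what a training set of 70 sentences would produce (the figure of 70 is inferred from the pair count, not read from the data files). A small standard-library sketch, using made-up one-hot labels, shows the count and the labelling rule used by `make_pair_data`.

```python
from itertools import combinations
from math import comb

n_pairs = comb(70, 2)

# Toy one-hot labels: two sentences of one topic, one of the other.
toy_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Same rule as in make_pair_data: 1 if both sentences share a topic.
toy_pair_labels = [int(x[0] == y[0]) for x, y in combinations(toy_labels, 2)]
print(n_pairs, toy_pair_labels)
```

Note that under this rule a label of 1 marks a pair with the same topic and 0 a pair with different topics."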
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "TRAIN_SAMPLES, DEV_SAMPLES, TEST_SAMPLES = 300, 200, 200" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "if TESTING:\n", " TRAIN_SAMPLES, DEV_SAMPLES, TEST_SAMPLES = 1, 1, 1" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "train_pair_circuits, train_pair_labels = (\n", " zip(*random.sample(list(zip(train_pair_circuits, train_pair_labels)), \n", " TRAIN_SAMPLES)))\n", "dev_pair_circuits, dev_pair_labels = (\n", " zip(*random.sample(list(zip(dev_pair_circuits, dev_pair_labels)), DEV_SAMPLES)))\n", "test_pair_circuits, test_pair_labels = (\n", " zip(*random.sample(list(zip(test_pair_circuits, test_pair_labels)), TEST_SAMPLES)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialise model" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "As :py:class:`XORSentenceModel` inherits from :py:class:`.PennyLaneModel`, we can again pass in `probabilities=True` and `normalize=True` to :py:meth:`~XORSentenceModel.from_diagrams`." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "all_pair_circuits = (train_pair_circuits +\n", " dev_pair_circuits +\n", " test_pair_circuits)\n", "a, b = zip(*all_pair_circuits)\n", "\n", "model = XORSentenceModel.from_diagrams(a + b)\n", "model.initialise_weights()\n", "model = model\n", "\n", "train_pair_dataset = Dataset(train_pair_circuits,\n", " train_pair_labels,\n", " batch_size=BATCH_SIZE)\n", "\n", "optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train and log accuracies\n", "\n", "We train the model using pure PyTorch in the exact same way as above." 
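] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The early-stopping bookkeeping used in the training loops can be isolated from the model. The sketch below is plain Python; the function `early_stop_epoch` and the accuracy values are illustrative only. It checks the dev accuracy every 5 epochs and stops once 10 epochs have passed without improvement.

```python
def early_stop_epoch(dev_accs, every=5, patience=10):
    # dev_accs maps an epoch index to the dev accuracy measured there.
    # Returns (best, stop_epoch); stop_epoch is None if never triggered.
    best = {'acc': 0, 'epoch': 0}
    for i in sorted(dev_accs):
        if i % every != 0:
            continue
        if dev_accs[i] > best['acc']:
            best = {'acc': dev_accs[i], 'epoch': i}
        elif i - best['epoch'] >= patience:
            return best, i
    return best, None


# Illustrative accuracies that peak at epoch 10 and then degrade.
accs = {0: 0.50, 5: 0.60, 10: 0.95, 15: 0.80, 20: 0.55}
best, stopped_at = early_stop_epoch(accs)
print(best, stopped_at)
```

In this example the best checkpoint is the one from epoch 10, and training stops at epoch 20, the first check at which 10 epochs have elapsed without a better dev accuracy."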
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def accuracy(circs, labels):\n", "    predicted = model(circs)\n", "    return (torch.round(torch.flatten(predicted)) ==\n", "            torch.Tensor(labels)).sum().item()/len(circs)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0\n", "Train loss: 4.291878283023834\n", "Dev acc: 0.53\n", "Epoch: 5\n", "Train loss: 3.321199357509613\n", "Dev acc: 0.55\n", "Epoch: 10\n", "Train loss: 0.38510115444660187\n", "Dev acc: 0.955\n", "Epoch: 15\n", "Train loss: 0.9513051249086857\n", "Dev acc: 0.77\n", "Epoch: 20\n", "Train loss: 4.628978729248047\n", "Dev acc: 0.525\n", "Early stopping\n" ] } ], "source": [ "best = {'acc': 0, 'epoch': 0}\n", "\n", "for i in range(EPOCHS):\n", "    epoch_loss = 0\n", "    for circuits, labels in train_pair_dataset:\n", "        optimizer.zero_grad()\n", "        predicted = model(circuits)\n", "        loss = torch.nn.functional.binary_cross_entropy(\n", "            torch.flatten(predicted), torch.Tensor(labels))\n", "        epoch_loss += loss.item()\n", "        loss.backward()\n", "        optimizer.step()\n", "\n", "    if i % 5 == 0:\n", "        dev_acc = accuracy(dev_pair_circuits, dev_pair_labels)\n", "\n", "        print('Epoch: {}'.format(i))\n", "        print('Train loss: {}'.format(epoch_loss))\n", "        print('Dev acc: {}'.format(dev_acc))\n", "\n", "        if dev_acc > best['acc']:\n", "            best['acc'] = dev_acc\n", "            best['epoch'] = i\n", "            model.save('xor_model.lt')\n", "        elif i - best['epoch'] >= 10:\n", "            print('Early stopping')\n", "            break\n", "\n", "if best['acc'] > accuracy(dev_pair_circuits, dev_pair_labels):\n", "    model.load('xor_model.lt')\n", "    model = model" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final test accuracy: 0.945\n" ] } ], "source": [ "print('Final test accuracy: {}'.format(accuracy(test_pair_circuits,\n", "                                                test_pair_labels)))" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analysing the internal representations of the model\n", "\n", "We hypothesised that the quantum circuits would be able to separate the representations of sentences about cooking from those of sentences about computing, and that the classical NN would learn to XOR these representations to give the model output. Here we can look at parts of the model separately to determine whether this hypothesis was accurate.\n", "\n", "First, we can look at the output of the NN when given the 4 possible binary inputs to XOR." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.9993979 ],\n", " [0.65196735],\n", " [0.00569755],\n", " [0.1350544 ]], dtype=float32)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xor_labels = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]\n", "# the first two entries correspond to the same label for both sentences, the last two to different labels\n", "xor_tensors = torch.tensor(xor_labels).float()\n", "\n", "model.xor_net(xor_tensors).detach().numpy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that when the labels are the same, the outputs are greater than 0.5 (about 0.999 and 0.652), and when the labels are different, the outputs are less than 0.5 (about 0.006 and 0.135). The NN therefore seems to have learned the XOR function, up to the labelling convention in which an output of 1 means that the topics are the same.\n", "\n", "We can also look at the outputs of some of the test circuits to determine whether they have been able to separate the two classes of sentences." 
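] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before turning to the circuits, the four network outputs shown above can be compared with the pair labels directly. The sketch below is plain Python; the numbers are copied from the output cell, and thresholding at 0.5 mirrors the `torch.round` call in the accuracy function.

```python
xor_inputs = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
net_outputs = [0.9993979, 0.65196735, 0.00569755, 0.1350544]

# Target used for the pairs: 1 if both halves encode the same topic.
targets = [int(v[:2] == v[2:]) for v in xor_inputs]

# Threshold the network outputs at 0.5.
predictions = [int(o > 0.5) for o in net_outputs]
print(targets, predictions)
```

All four thresholded outputs agree with the targets, although the second input (0.65) is classified with a much smaller margin than the others."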
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "FOOD_IDX, IT_IDX = 0, 6\n", "symbol_weight_map = dict(zip(model.symbols, model.weights))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "if TESTING: \n", "    FOOD_IDX, IT_IDX = 0, 0" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "woman prepares tasty dinner .\n" ] }, { "data": { "text/plain": [ "array([0.42397027, 0.57602973])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(test_data[FOOD_IDX])\n", "\n", "p_circ = test_circuits[FOOD_IDX].to_pennylane(probabilities=True)\n", "p_circ.initialise_concrete_params(symbol_weight_map)\n", "unnorm = p_circ.eval().detach().numpy()\n", "\n", "unnorm / np.sum(unnorm)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "skillful person runs software .\n" ] }, { "data": { "text/plain": [ "array([0.95847886, 0.04152114])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(test_data[IT_IDX])\n", "\n", "p_circ = test_circuits[IT_IDX].to_pennylane(probabilities=True)\n", "p_circ.initialise_concrete_params(symbol_weight_map)\n", "unnorm = p_circ.eval().detach().numpy()\n", "\n", "unnorm / np.sum(unnorm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From these examples, it seems that the circuits are able to differentiate between the two topics: the sentence about computing is mapped close to [1, 0] (about [0.96, 0.04]), while the sentence about food leans towards [0, 1], though less decisively (about [0.42, 0.58])." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. 
rubric:: See also:\n", "\n", "- `Training: Classical case <./trainer-classical.ipynb>`_\n", "- `Training: Quantum case <./trainer-quantum.ipynb>`_\n", "- `Advanced: Manual training <../manual-training.rst>`_" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }