Glossary

adjoint: In lambeq, each pregroup type \(p\) has a left (\(p^l\)) and a right (\(p^r\)) adjoint, which are used to represent arguments in composite types. For example, a transitive verb has type \(n^r \cdot s \cdot n^l\), meaning it expects a noun argument on both sides in order to return a sentence.
ansatz (plural: ansätze): A map that determines choices such as the number of qubits that every wire of a string diagram is associated with and the concrete parameterised quantum states that correspond to each word. For the classical case, an ansatz determines the number of dimensions associated with each type, and the way that large tensors are represented as matrix product states.
bag-of-words: A compositional model of meaning which represents a sentence as a multiset of words; that is, it does not take into account the order of words or any other syntactic relationship between them.
Bobcat: A state-of-the-art statistical CCG parser based on [SC2021]. Bobcat is lambeq’s default parser.
cap: A special morphism in a rigid category, which, together with a cup morphism, obey certain conditions called snake equations. In diagrammatic form, a cap is depicted as a wire with downward concavity (\(\cap\)). In the context of DisCoCat, a cap is mostly used to “bridge” disconnected wires in order to alter the normal “flow” of information from one word to another, for example in cases such as type-raising.
category: In category theory, a category is a mathematical structure that consists of a collection of objects and a collection of morphisms between objects, forming a labelled directed graph. A category has two basic properties: the ability to compose the arrows associatively and the existence of an identity arrow for each object. lambeq structures are expressed in terms of a monoidal category.
categorical quantum mechanics (CQM): The study of quantum foundations and quantum information using paradigms from mathematics and computer science, specifically monoidal categories. The primitive objects of study are physical processes and the different ways that these can be composed. The field was initiated by Samson Abramsky and Bob Coecke in 2004 [AC2004].
CCGBank: The CCG version of Penn Treebank, a corpus of over 49,000 human-annotated syntactic trees created by Julia Hockenmaier and Mark Steedman [HS2007].
Combinatory Categorial Grammar (CCG): A grammar formalism inspired by combinatory logic and developed by Mark Steedman [Ste2000]. It defines a number of combinators (application, composition, and type-raising being the most common) that operate on syntactically-typed lexical items, by means of natural-deduction-style proofs. CCG is categorised as a mildly context-sensitive grammar, standing between context-free and context-sensitive grammars in the Chomsky hierarchy and providing a good trade-off between expressive power and computational complexity.
compact closed category: A symmetric rigid category. The symmetry of the category causes the left and right duals of an object to coincide: \(A^l=A^r=A^*\). A pregroup grammar is often referred to as a non-symmetric compact closed category.
compositional model: A model that produces semantic representations of sentences by composing together the semantic representations of the words within them. An example of a compositional model is DisCoCat.
cup: A special morphism in a rigid category, which, together with a cap morphism, obey certain conditions called snake equations. In diagrammatic form, a cup is depicted as a wire with upward concavity (\(\cup\)). In the context of DisCoCat, a cup usually represents a tensor contraction between two-word representations.
depccg: A statistical CCG parser for English and Japanese [YNM2017].
DisCoCat: The DIStributional COmpositional CATegorical model of natural language meaning developed by Bob Coecke, Mehrnoosh Sadrzadeh and Steve Clark [CSC2010]. The model applies a functor \(F: \textrm{Grammar} \to \textrm{Meaning}\) whose left-hand side is a free pregroup over a partially ordered set of basic grammar types, and the right-hand side is the category whose morphisms describe a sequence of operations that can be evaluated on a classical or quantum computer.
DisCoPy: DIStributional COmpositional PYthon. A Python library for working with monoidal categories [FTC2020]. It includes abstractions for creating all standard quantum gates and building quantum circuits. Additionally, it is equipped with many language-related features, such as support for pregroup grammars and functors for implementing compositional models.
Frobenius algebra: In the context of a symmetric monoidal category, a Frobenius algebra provides morphisms \(\Delta: A \to A\otimes A\) and \(\mu: A\otimes A \to A\) for any object \(A\), satisfying certain conditions (the so-called Frobenius equations) and implementing the notion of a spider. In lambeq and DisCoCat, spiders can be used to implement rewrite rules [Kea2014] [Kar2016] [SCC2014a] [SCC2014b].
functor: A structure-preserving transformation from one category to another. lambeq’s pipeline is essentially a chain of functorial transformations from a grammar category to a category accommodating the meaning of a sentence.
IQP circuit: Instantaneous Quantum Polynomial. A circuit which interleaves layers of Hadamard quantum gates with diagonal unitaries.
loss function: In machine learning, a function that estimates how far the prediction of a model is from its true value. The purpose of training is to minimise the loss over the training set.
matrix product state (MPS): A factorization of a large tensor into a chain-like product of smaller tensors. lambeq is equipped with ansätze that implement various forms of matrix product states, allowing the execution of large tensor networks on classical hardware.
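The saving can be seen from a simple parameter count. The sketch below assumes an open-ended chain with one core per index and a single bond dimension throughout; these choices and the helper names are illustrative and do not describe the exact layout of lambeq's ansätze. The factorisation is exact only when the bond dimension is large enough, and is otherwise an approximation.

```python
def full_tensor_size(n_indices, dim):
    """Number of entries in a dense tensor with n_indices indices of size dim."""
    return dim ** n_indices


def mps_size(n_indices, dim, bond_dim):
    """Number of entries in an open-ended chain of cores with a fixed bond dimension."""
    if n_indices == 1:
        return dim
    boundary = 2 * dim * bond_dim                     # two end cores: dim x bond_dim
    interior = (n_indices - 2) * dim * bond_dim ** 2  # inner cores: bond x dim x bond
    return boundary + interior


full = full_tensor_size(10, 2)
mps = mps_size(10, 2, 4)
print(full, mps)
```

The dense tensor grows exponentially with the number of indices, while the chain grows linearly.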
model: A lambeq model is a class holding the trainable weights and other model-specific information, used in supervised learning. A model is always associated with a specific backend, such as PyTorch, NumPy, or tket, and is paired with a matching trainer.
monoidal category: A category equipped with the monoidal product \(\otimes\) and monoidal unit \(I\), providing an abstraction suitable for quantum computation. Categorical quantum mechanics (CQM) and DisCoCat are both based on the mathematical framework of monoidal categories.
natural language processing (NLP): The use of computational methods for solving language-related problems.
NISQ: Noisy Intermediate-Scale Quantum. A term for characterising the current state of quantum hardware, where quantum processors still contain a small number of qubits, and are not advanced enough to reach fault-tolerance nor large enough to profit substantially from quantum supremacy.
noise: Undesired artefacts that cause the measurement outcome of a quantum circuit to deviate from the ideal distribution.
parser: A statistical tool that converts a sentence into a hierarchical representation that reflects the syntactic relationships between the words (a syntax tree) based on a specific grammar formalism.
PennyLane: A Python library for differentiable programming of quantum computers, developed by Xanadu, enabling quantum machine learning. See more here.
post-selection: The act of conditioning the probability space on a particular event. In practice, this involves disregarding measurement outcomes where a particular qubit does not match the post-selected value.
pregroup grammar: A grammar formalism developed by Joachim Lambek in 1999 [Lam1999] based on the notion of a pregroup. Pregroup grammars are closely related to categorial grammars (such as CCG). In category-theoretic terms, a pregroup grammar forms a rigid category, sometimes also referred to as a non-symmetric compact closed category.
pytket: A Python interface for the tket compiler.
PyTorch: An open source machine learning framework primarily developed by Meta AI.
Qiskit: An open-source SDK developed by IBM Research for working with quantum computers at the level of circuits, pulses, and algorithms.
quantum circuit: A sequence of quantum gates, measurements, and initializations of qubits that expresses a computation in a quantum computer. The purpose of lambeq is to convert sentences into quantum circuits that can be evaluated on quantum hardware.
quantum gate: An atomic unit of computation operating on a small number of qubits. Quantum gates are the building blocks of quantum circuits.
quantum NLP (QNLP): The design and implementation of NLP models that exploit certain quantum phenomena such as superposition, entanglement, and interference to perform language-related tasks on quantum hardware.
qubit: The quantum analogue of a bit and the most basic unit of information carrier in a quantum computer. It is associated with a property of a physical system such as the spin of an electron (“up” or “down” along some axis), and has a state that lives in a 2-dimensional complex vector space.
reader: In lambeq, an object that translates a sentence into a string diagram based on a certain compositional scheme. Versions of a bag-of-words model and a word-sequence model are implemented in lambeq using readers.
rewrite rule: A functorial transformation that changes the wiring of a specific box (representing a word) in a string diagram to simplify the diagram or to make it more amenable to implementation on the hardware of choice.
rewriter: An object that acts on a string diagram, applying some form of functorial or procedural transformation.
rigid category: A monoidal category where every object \(A\) has a left dual \(A^l\) and a right dual \(A^r\), both equipped with cup and cap morphisms obeying the so-called snake equations. A pregroup grammar is an example of a rigid category.
shots: A collection of measurement outcomes from a particular quantum circuit.
snake equations: Identities that hold between the dual objects of a monoidal category and allow the “yanking” of wires and the rewriting and simplification of diagrams. In lambeq, the .grammar.Diagram.normal_form() method uses the snake equations in order to “stretch” the wires of a diagram and provide a normal form for it.
spider: Another name for a Frobenius algebra.
string diagram: A diagrammatic representation that reflects computations in a monoidal category, an abstraction well-suited to model the way a quantum computer works and processes data. String diagrams are the native form of representing sentences in lambeq and DisCoCat, since they remain close to quantum circuits, yet are independent of any low-level design decisions depending on hardware. They can be seen as enriched tensor networks.
syntax tree: A hierarchical representation of a sentence that reflects the syntactic relationships between the words, given a specific grammar. The first step in lambeq’s pipeline given a sentence is to produce a CCG syntax tree for it, which is then converted into a string diagram.
symbol: In lambeq, a symbol corresponds to a trainable part of a tensor network or a quantum circuit. In the classical case, symbols are associated with tensors in a tensor network, while in the quantum case symbols represent numbers expressing rotation angles on qubits in a quantum circuit.
symmetric monoidal category: A monoidal category equipped with swaps, such that, for any two objects \(A\) and \(B\), we have \(A\otimes B \cong B\otimes A\). lambeq’s string diagrams are expressed in a symmetric monoidal category.
swap: A crossing of wires in a symmetric monoidal category. lambeq uses swaps in order to translate crossed composition rules in CCG derivations into a string diagram form [YK2021].
tensor network: A directed acyclic graph expressing a (multi-)linear computation between tensors. The vertices of the graph are multi-linear tensor maps, and the edges correspond to vector spaces. Tensor networks have found many applications in quantum mechanics. lambeq’s string diagrams can be seen as tensor networks with additional properties.
tensor train: A basic tensor network in which all tensors have the same shape and each tensor is connected to the next one following a predefined order. In lambeq, tensor trains are used to implement word-sequence models.
tket: Stylised \(\textrm{t}|\textrm{ket}\rangle\). A quantum software development platform produced by Cambridge Quantum. The heart of tket is a language-agnostic optimising compiler designed to generate code for a variety of NISQ devices, which has several features designed to minimise the influence of device error.
trainer: In lambeq, a trainer is a class related to a given backend (for example PyTorch, NumPy, tket and so on) that is used for supervised learning. A trainer is always paired with a matching model, a structure that contains the trainable weights and other parameters of the model.
tree reader: In lambeq, a tree reader converts a sentence into a monoidal diagram by following directly its CCG syntax tree, as provided by a parser. In other words, no explicit pregroup diagram is generated. Composition takes place by boxes that combine word states based on the grammatical rules found in the tree.
word-sequence model: A compositional model that respects the order of words in a sentence, but does not take into account any other syntactic information.