Command-line interface¶
While lambeq is primarily intended for programmatic use, since Release 0.2.0 it has also been equipped with a command-line interface that provides immediate and easy access to most of the toolkit’s functionality. For example, this addition allows lambeq to be used as a dual parser, capable of providing syntactic derivations in both pregroup and CCG form.
A summary of the available options is given below.
lambeq [-h] [-v] [-m {string-diagram,pregroups,ccg}] [-i INPUT_FILE]
[-f {json,pickle,text-unicode,text-ascii,image}]
[-g {png,pdf,jpeg,jpg,eps,pgf,ps,raw,rgba,svg,svgz,tif,tiff}]
[-u [KEY=VAR ...]] [-o OUTPUT_FILE | -d OUTPUT_DIR]
[-p {bobcat,depccg}] [-t] [-s] [-r {spiders,stairs,cups,tree}]
[-c [ROOT_CAT ...]] [-w [REWRITE_RULE ...]]
[-a {iqp,tensor,spider,mps}] [-n [KEY=VAR ...]] [-y STORE_ARGS]
[-l LOAD_ARGS]
[input_sentence]
To get detailed help about the available options, type:
$ lambeq --help
The following sections provide an introduction to the command-line interface usage via specific examples, while all available options are described in depth in Section Detailed Options.
Basic usage¶
The most straightforward use of the command-line interface of lambeq is as a pregroup or CCG parser. The output formalism is controlled by the --mode option, which can be set to string-diagram, pregroups, or ccg.
- The string-diagram mode is the default, producing a string diagram that faithfully follows the CCG derivation returned by the parser; this may include swaps introduced by certain CCG features such as cross-composition and “unary” type-changing rules.
- The pregroups mode removes any swaps from the string diagram by changing the ordering of the atomic types, converting it into a valid pregroup form as given in [Lam99]. (The pregroups mode is further described later in Section Strict Pregroups Mode.)
- The ccg mode returns the original CCG tree, instead of a string or pregroup diagram.
For example, to get the default string diagram output for a sentence, use the following command:
$ lambeq "John gave Mary a flower"
John gave Mary a flower
──── ───────────── ──── ───── ──────
n n.r·s·n.l·n.l n n·n.l n
╰─────╯ │ │ ╰────╯ │ ╰─────╯
│ ╰─────────────╯
lambeq will use the default BobcatParser to parse the sentence and output the string diagram in the console with text drawing characters.
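Each cup in the diagram above corresponds to one pregroup contraction. As a worked check, using the standard pregroup rules $n \cdot n^r \to 1$ and $n^l \cdot n \to 1$ (written in LaTeX notation, where $n^r$ and $n^l$ stand for the console's n.r and n.l), the types of the five words reduce to the sentence type $s$:

```latex
\begin{align*}
     & n \cdot (n^r \cdot s \cdot n^l \cdot n^l) \cdot n \cdot (n \cdot n^l) \cdot n \\
\to\ & s \cdot n^l \cdot n^l \cdot n \cdot n \cdot n^l \cdot n
     && (n \cdot n^r \to 1,\ \text{John with gave}) \\
\to\ & s \cdot n^l \cdot n \cdot n^l \cdot n
     && (n^l \cdot n \to 1,\ \text{gave with Mary}) \\
\to\ & s \cdot n^l \cdot n
     && (n^l \cdot n \to 1,\ \text{a with flower}) \\
\to\ & s
     && (n^l \cdot n \to 1,\ \text{gave with a})
\end{align*}
```

The four contractions match the four cups drawn in the console output, and the single remaining wire carries the type s.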
In order to get the corresponding CCG derivation, type:
$ lambeq -m ccg "John gave Mary a flower"
John gave Mary a flower
════ ═══════════ ════ ═══ ══════
n ((s\n)/n)/n n n/n n
────────────────> ──────────>
(s\n)/n n
─────────────────────────────>
s\n
───────────────────────────────────<
s
Use the following command to read an entire file of sentences, tokenise them, parse them with the default parser, and store the pregroup or CCG diagrams in a new file:
$ lambeq -i sentences.txt -t -o diagrams.txt
Note
Unless stated otherwise, the remaining examples in this document use the default string-diagram mode.
In the above example, the file sentences.txt is expected to contain one sentence per line. The output will be written to the file diagrams.txt.
If your input file does not contain one sentence per line, you can add the --split_sentences (or -s) flag.
If the text output is not sufficient for your purposes, you can ask lambeq to prepare images of the diagrams in a variety of formats and store them in a specific folder:
$ lambeq -i sentences.txt -t -d image_folder -f image -g png
lambeq will prepare a png file for each of the sentences and store it in the folder image_folder, using the line number of the sentence in the input file to name the image file, e.g. diagram_1.png, diagram_2.png, and so on.
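The naming scheme described above can be sketched in a few lines of Python. The helper expected_image_names is hypothetical (it is not part of lambeq) and only illustrates the diagram_<line number>.<format> pattern, assuming that numbering starts at 1 as in the example names.

```python
def expected_image_names(sentences, image_format="png"):
    """Map each expected image file name to its sentence.

    Numbering starts at 1, following the diagram_1.png, diagram_2.png
    pattern. Hypothetical helper for illustration, not lambeq API.
    """
    return {
        f"diagram_{line_no}.{image_format}": sentence
        for line_no, sentence in enumerate(sentences, start=1)
    }


# Stand-in for the lines of sentences.txt (one sentence per line).
sentences = ["John gave Mary a flower", "Mary does not like John"]
for name, sentence in expected_image_names(sentences).items():
    print(f"{name}: {sentence}")
```

A mapping like this can be handy for pairing each generated image with its source sentence, for instance when building figure captions.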
Note
Image generation is currently available only in string-diagram and pregroups modes.
It is also possible to parse a single sentence and store it as an image, for example in PDF format for use in a paper. In this case, you can name the file yourself and apply specific format options, such as the exact size of the figure or the font size used in the diagram. Note that it is not necessary to specify the image format if it is already given by the file name extension (e.g. diagram.pdf).
$ lambeq -f image -u fig_width=16 fig_height=3 fontsize=12 \
> -o diagram.pdf \
> "Mary does not like John"
Strict pregroups mode¶
We have already seen that lambeq can provide its outputs as string diagrams or CCG trees. There is also a third mode available (pregroups), which removes any swaps from the string diagram and converts it into a strict pregroup form, conforming to the definition of a formal pregroup grammar. Swaps can be introduced by cross-composition and unary rules in the original CCG derivation. For example, consider the following CCG tree:
$ lambeq -t -m ccg "The best movie I've ever seen"
The best movie I 've ever seen
═══ ════ ═════ ═ ═══════════ ═══════════ ═══════
n/n n/n n n (s\n)/(s\n) (s\n)\(s\n) (s\n)/n
──────────> ─────>T ─────────────────────<Bx
n s/(s\n) (s\n)/(s\n)
───────────────> ───────────────────────────────>B
n (s\n)/n
────────────────────────────────────────>B
s/n
───────────────────────────────────────<U>
n\n
───────────────────────────────────────────────────────────<
n
Note that “'ve” and “ever” are combined using cross-composition (the Bx rule), and there is also a unary (<U>) type-changing rule, from s/n to n\n. CCG parsers use these features to avoid associating a single word with many different types, thereby keeping the size of the vocabulary relatively small. When this derivation is converted into a string diagram, it takes the following form:
$ lambeq -t "The best movie I've ever seen"
The best movie I 've ever seen
───── ───── ───── ─ ─────────── ─────────────── ─────────
n·n.l n·n.l n n n.r·s·s.l·n s.r·n.r.r·n.r·n n.r·s·n.r
│ ╰───╯ ╰─────╯ │ │ │ │ ╰─╮─╯ │ │ │ │ │ │
│ │ │ │ │ ╭─╰─╮ │ │ │ │ │ │
│ │ │ │ ╰╮─╯ ╰─╮──╯ │ │ │ │ │
│ │ │ │ ╭╰─╮ ╭─╰──╮ │ │ │ │ │
│ │ │ ╰──╯ ╰─╮─╯ ╰─╮──╯ │ │ │ │
│ │ │ ╭─╰─╮ ╭─╰──╮ │ │ │ │
│ │ ╰────────╯ ╰─╮──╯ ╰╮─╯ │ │ │
│ │ ╭─╰──╮ ╭╰─╮ │ │ │
│ ╰────────────────╯ ╰─╮──╯ ╰───╯ │ │
│ ╭─╰──╮ │ │
│ │ ╰─────────╯ │
│ ╰────────╮────────╯
│ ╭────────╰────────╮
╰──────────────────────────────────────────╯ │
Even for relatively short sentences like the one above, the swaps may result in diagrams that are difficult to read and follow. In cases where diagrammatic clarity and conformance to a strict pregroup form are important, one can use the pregroups mode:
$ lambeq -t -m pregroups "The best movie I've ever seen"
The best movie I 've ever seen
───── ───── ───── ─ ───────────── ───────
n·n.l n·n.l n n n.r·n.r·s.l·n n.r·s·n
│ ╰───╯ ╰─────╯ ╰───╯ │ │ ╰───╯ │ │
╰────────────────────────────╯ ╰─────────╯ │
Note that the order of the types in the new diagram has been changed in a way that does not require swaps, while the two words “'ve” and “ever”, which in the original derivation were interwoven using swaps (the result of cross-composition), have now been merged into a single token.
Warning
The pregroups mode gains diagrammatic simplicity and conformance to a formal pregroup grammar at the cost of a larger vocabulary, since each word is associated with more types than before and new words (combined tokens) are added to the vocabulary. Depending on the size of your dataset, this might lead to data sparsity problems during training.
Note
To convert a string diagram into a strict pregroup diagram programmatically, one can use the RemoveSwapsRewriter class.
Using a reader¶
Note
Option only applicable to string and pregroup diagrams.
Instead of the parser, users may prefer to apply one of the available readers, each corresponding to a different compositional scheme. For example, to encode a sentence as a tensor train:
$ lambeq -r cups "John gave Mary a flower"
START John gave Mary a flower
───── ───── ───── ───── ───── ──────
s s.r·s s.r·s s.r·s s.r·s s.r·s
╰─────╯ ╰───╯ ╰───╯ ╰───╯ ╰───╯ │
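The type assignment visible in this output can be mimicked with a short Python sketch. The function cups_reader_types is hypothetical and uses plain strings in place of lambeq's type objects; it only reproduces the pattern shown above, where a START token of type s is prepended and every word receives the type s.r·s.

```python
def cups_reader_types(sentence):
    """Pair each token with the type shown in the console output.

    A START token of type s is prepended and every word gets s.r·s.
    Plain strings stand in for lambeq's type objects; this is an
    illustration, not the library's implementation.
    """
    words = sentence.split()
    return [("START", "s")] + [(word, "s.r·s") for word in words]


for token, type_ in cups_reader_types("John gave Mary a flower"):
    print(f"{token:>6}  {type_}")
```

Each s on the right of one token is contracted with the s.r of the next token, so a sentence of any length leaves exactly one open wire of type s.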
Readers can be used for batch processing of entire files with the -i option, exactly as in the parser case.
$ lambeq -r cups -i sentences.txt -o diagrams.txt
Note
Some readers, such as the spiders_reader and stairs_reader instances of the LinearReader class, or an instance of TreeReader, may convert the pregroup diagram into a monoidal form that is too complicated to be rendered properly in a text console. In these cases, diagrams cannot be displayed as text.
Rewrite rules and ansätze¶
Note
Option only applicable to string and pregroup diagrams.
The command-line interface supports all stages of the lambeq pipeline, such as the application of rewrite rules and the use of ansätze for converting sentences into quantum circuits or tensor networks. For example, to read a file of sentences, parse them, apply the prepositional_phrase and determiner rewrite rules, and use an IQPAnsatz with 1 qubit assigned to the sentence type, 1 qubit to the noun type, and 2 IQP layers, use the command:
$ lambeq -i sentences.txt -t -f image -g png \
> -w prepositional_phrase determiner \
> -a iqp -n dim_n=1 dim_s=1 n_layers=2 \
> -d image_folder
Note
Since rewrite rules and ansätze can produce output that is too complicated to be properly rendered in purely text form, text output in the console is not available for these cases.
For the classical case, applying a SpiderAnsatz with 2 dimensions assigned to the sentence type and 4 dimensions to the noun type, together with the same rewrite rules as above, can be done with the following command:
$ lambeq -i sentences.txt -t -f image -g png \
> -w prepositional_phrase determiner \
> -a spider -n dim_n=4 dim_s=2 \
> -d image_folder
Other options¶
To store the lambeq.backend.grammar.Diagram objects (for string diagrams) or the CCGTree objects (for CCG trees) in json or pickle format, type:
$ lambeq -f pickle -i sentences.txt -o diagrams.pickle
or
$ lambeq -f json -i sentences.txt -o diagrams.json
Text output is also available using ASCII-only characters:
$ lambeq -f text-ascii "John gave Mary a flower."
John gave Mary a flower.
____ _____________ ____ _____ _______
n n.r s n.l n.l n n n.l n
\_____/ | | \____/ | \______/
| \_____________/
To avoid repeating long commands, arguments can be stored in a YAML file conf.yaml by adding the argument -y conf.yaml.
To load the configuration from this file next time, add -l conf.yaml. Any arguments that are not provided on the command line will be taken from that file. If an argument is specified both on the command line and in the configuration file, the command-line argument takes priority.
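The priority rule can be sketched as a simple dictionary merge. In this sketch the function merge_arguments is hypothetical, plain dicts stand in for the contents of conf.yaml (reading and writing YAML is omitted), and a value of None is assumed to mean "not provided on the command line"; lambeq's own handling may differ in detail.

```python
def merge_arguments(cli_args, stored_args):
    """Combine stored and command-line arguments; the command line wins.

    Both inputs are dicts mapping option names to values. A value of
    None in cli_args is treated as "not provided on the command line".
    """
    merged = dict(stored_args)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


# Stand-in for the options previously saved with -y conf.yaml.
stored = {"mode": "pregroups", "tokenise": True, "output_format": "text-unicode"}
# Only --mode was given on the command line this time.
cli = {"mode": "ccg", "tokenise": None, "output_format": None}
print(merge_arguments(cli, stored))
```

Here the stored tokenise and output_format values are kept, while the stored mode is overridden by the one given on the command line.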
Detailed options¶
Command-line interface for lambeq.
usage: lambeq [-h] [-v] [-m {string-diagram,pregroups,ccg}] [-i INPUT_FILE]
[-f {json,pickle,text-unicode,text-ascii,image}]
[-g {png,pdf,jpeg,jpg,eps,pgf,ps,raw,rgba,svg,svgz,tif,tiff}]
[-u [KEY=VAR ...]] [-o OUTPUT_FILE | -d OUTPUT_DIR]
[-p {bobcat,depccg}] [-t] [-s] [-r {spiders,stairs,cups,tree}]
[-c [ROOT_CAT ...]] [-w [REWRITE_RULE ...]]
[-a {iqp,tensor,spider,mps}] [-n [KEY=VAR ...]] [-y STORE_ARGS]
[-l LOAD_ARGS]
[input_sentence]
Positional Arguments¶
- input_sentence
Sentence to parse.
Default: “”
Named Arguments¶
- -v, --version
show program’s version number and exit
- -m, --mode
Possible choices: string-diagram, pregroups, ccg
Mode used for the output. Default value: string-diagram
Default: “string-diagram”
- -i, --input_file
File to parse.
Output¶
Options related to output format.
- -f, --output_format
Possible choices: json, pickle, text-unicode, text-ascii, image
Format of the output. Use json and pickle to store the lambeq objects in the respective formats, or text-unicode, text-ascii and image to store directly the derivations in diagrammatic form. Default value: text-unicode
- -g, --image_format
Possible choices: png, pdf, jpeg, jpg, eps, pgf, ps, raw, rgba, svg, svgz, tif, tiff
When image is selected as output_format, this option specifies the required image type. It does not have any effect when any other option is selected as output_format. Default value: png
- -u, --output_options
Possible choices: fig_width=<int> (default: None), fig_height=<int> (default: None), fontsize=<int> (default: None)
A list of keyword=value items that define options for the output format. Available options are fig_width=<int> (default: None), fig_height=<int> (default: None), fontsize=<int> (default: None).
Default: {}
- -o, --output_file
File to write the output. When output_format is json, text-ascii, or text-unicode and this argument is not provided, lambeq will output to stdout. This argument is ignored when output_format is image, in which case output_dir needs to be provided.
- -d, --output_dir
When image is selected as output_format, this option specifies the directory where the image files will be stored. It has no effect when any other option is selected as output_format.
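The notation [-o OUTPUT_FILE | -d OUTPUT_DIR] in the usage summary means that the two options cannot be combined. The following sketch shows how such a constraint behaves in Python's argparse, whose usage format matches the one shown; the parser below is a stand-alone demo with only these two options, not lambeq's actual parser.

```python
import argparse


def build_output_parser():
    """Build a tiny demo parser with only the -o and -d options."""
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-o", "--output_file")
    group.add_argument("-d", "--output_dir")
    return parser


parser = build_output_parser()
print(parser.parse_args(["-o", "diagrams.txt"]))

try:
    # argparse prints a usage error to stderr and raises SystemExit.
    parser.parse_args(["-o", "diagrams.txt", "-d", "image_folder"])
except SystemExit:
    print("rejected: -o and -d cannot be combined")
```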
Parser¶
Options related to parser.
- -p, --parser
Possible choices: bobcat, depccg
Choice of a parser. Mutually exclusive with using a reader. If None, BobcatParser is used.
- -t, --tokenise
Tokenises the input before sending it to the parser. If not used, the parser assumes that the text is already tokenised.
Default: False
- -s, --split_sentences
Use spaCy sentence splitting to split the text into sentences. Not required if only one sentence is provided or if sentences are already given one per line.
Default: False
- -r, --reader
Possible choices: spiders, stairs, cups, tree
Choice of a reader. Mutually exclusive with using a parser.
- -c, --root_categories
A list of acceptable categories at the root of the diagram.
Rewriter¶
Rewrite options.
- -w, --rewrite_rules
Possible choices: auxiliary, connector, determiner, postadverb, preadverb, prepositional_phrase, coordination, curry, object_rel_pronoun, subject_rel_pronoun
A list of rewrite rules. Available options: [‘auxiliary’, ‘connector’, ‘determiner’, ‘postadverb’, ‘preadverb’, ‘prepositional_phrase’, ‘coordination’, ‘curry’, ‘object_rel_pronoun’, ‘subject_rel_pronoun’]
Ansätze¶
Options related to ansatz choices.
- -a, --ansatz
Possible choices: iqp, tensor, spider, mps
Ansatz to be used. This determines if the result will be a quantum circuit or a tensor network.
- -n, --ansatz_options
Possible choices: dim_n=<int> (default: 2), dim_s=<int> (default: 2), n_layers=<int> (default: 2), n_single_qubit_params=<int> (default: 3), bond_dim=<int> (default: 3), max_order=<int> (default: 3)
A list of keyword=value items that define options for the selected ansatz. Available options are dim_n=<int> (default: 2), dim_s=<int> (default: 2), n_layers=<int> (default: 2), n_single_qubit_params=<int> (default: 3), bond_dim=<int> (default: 3), max_order=<int> (default: 3).
Default: {}
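As an illustration of the KEY=VAR convention, the sketch below turns a list of such items into a dictionary of integers layered over the defaults listed above. The function parse_key_value_items is hypothetical, and rejecting unknown keys is a design choice of this sketch rather than documented lambeq behaviour.

```python
# Defaults as listed for --ansatz_options.
ANSATZ_DEFAULTS = {
    "dim_n": 2,
    "dim_s": 2,
    "n_layers": 2,
    "n_single_qubit_params": 3,
    "bond_dim": 3,
    "max_order": 3,
}


def parse_key_value_items(items, defaults):
    """Turn ['dim_n=1', 'n_layers=2'] into a dict of ints over defaults."""
    options = dict(defaults)
    for item in items:
        key, sep, value = item.partition("=")
        if not sep or key not in defaults:
            raise ValueError(f"invalid option: {item!r}")
        options[key] = int(value)  # raises ValueError for non-integers
    return options


print(parse_key_value_items(["dim_n=1", "dim_s=1", "n_layers=2"], ANSATZ_DEFAULTS))
```

The same helper would work for --output_options by passing a different defaults dictionary.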
Configuration¶
Options for storing and loading the command-line arguments to/from files.
- -y, --store_args
File to store the parameters in YAML format for future use.
- -l, --load_args
Load and use a set of stored parameters from the specified file.