lambeq.tokeniser

class lambeq.tokeniser.SpacyTokeniser[source]

Bases: Tokeniser

Tokeniser class based on spaCy.
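
A minimal usage sketch (the import path follows the module path above; the tokens shown are illustrative, since the exact output depends on the underlying spaCy pipeline):

>>> from lambeq.tokeniser import SpacyTokeniser
>>> tokeniser = SpacyTokeniser()
>>> tokeniser.tokenise_sentence("John gave Mary a flower.")
['John', 'gave', 'Mary', 'a', 'flower', '.']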

__init__() → None [source]
split_sentences(text: str) → list[str] [source]

Split input text into a list of sentences.

Parameters:
text : str

A single string that contains one or multiple sentences.

Returns:
list of str

A list of sentences, one sentence per string.
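
For example (split points are decided by spaCy's sentence segmenter, so the exact output is indicative):

>>> tokeniser = SpacyTokeniser()
>>> tokeniser.split_sentences("This is a sentence. This is (another) sentence!")
['This is a sentence.', 'This is (another) sentence!']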

tokenise_sentence(sentence: str) → list[str]

Tokenise a sentence.

Parameters:
sentence : str

An untokenised sentence.

Returns:
list of str

A tokenised sentence, given as a list of tokens (strings).
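
Continuing with the tokeniser instance from above, a short sketch showing that spaCy-style tokenisation also separates punctuation and clitics (the exact tokens depend on the spaCy model):

>>> tokeniser.tokenise_sentence("This sentence isn't worth £100.")
['This', 'sentence', 'is', "n't", 'worth', '£', '100', '.']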

tokenise_sentences(sentences: Iterable[str]) → list[list[str]] [source]

Tokenise a list of sentences.

Parameters:
sentences : list of str

A list of untokenised sentences.

Returns:
list of list of str

A list of tokenised sentences, where each sentence is a list of tokens.
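
For batch use, e.g. preparing a dataset, the method maps each sentence to its own token list (output illustrative):

>>> tokeniser.tokenise_sentences(["Alice loves Bob.", "Bob loves Alice."])
[['Alice', 'loves', 'Bob', '.'], ['Bob', 'loves', 'Alice', '.']]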

class lambeq.tokeniser.Tokeniser[source]

Bases: ABC

Base class for all tokenisers.

abstract split_sentences(text: str) → list[str] [source]

Split input text into a list of sentences.

Parameters:
text : str

A single string that contains one or multiple sentences.

Returns:
list of str

A list of sentences, one sentence per string.

tokenise_sentence(sentence: str) → list[str] [source]

Tokenise a sentence.

Parameters:
sentence : str

An untokenised sentence.

Returns:
list of str

A tokenised sentence, given as a list of tokens (strings).

abstract tokenise_sentences(sentences: Iterable[str]) → list[list[str]] [source]

Tokenise a list of sentences.

Parameters:
sentences : list of str

A list of untokenised sentences.

Returns:
list of list of str

A list of tokenised sentences, where each sentence is a list of tokens (strings).
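
To write a custom tokeniser, only the two abstract methods above need to be implemented; tokenise_sentence already has a concrete implementation in the base class (presumably delegating to tokenise_sentences, though this reference does not show the implementation). Below is a hypothetical whitespace-based subclass, meant as an illustration of the interface rather than a usable tokeniser:

    from collections.abc import Iterable

    from lambeq.tokeniser import Tokeniser

    class WhitespaceTokeniser(Tokeniser):
        """Hypothetical tokeniser that splits sentences on whitespace only."""

        def split_sentences(self, text: str) -> list[str]:
            # Naive split on full stops; a real tokeniser would use a
            # proper sentence segmenter.
            return [s.strip() + '.' for s in text.split('.') if s.strip()]

        def tokenise_sentences(self,
                               sentences: Iterable[str]) -> list[list[str]]:
            # One list of whitespace-separated tokens per input sentence.
            return [sentence.split() for sentence in sentences]

    tokeniser = WhitespaceTokeniser()
    print(tokeniser.tokenise_sentence("Alice loves Bob."))
    # expected (if tokenise_sentence delegates to tokenise_sentences):
    # ['Alice', 'loves', 'Bob.']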