Expressions
Rational expressions, or expressions for short, denote (rational) languages in a compact way. Since Vcsn supports weighted expressions, they actually can denoted rational series.
This page documents the syntax and transformations (called identities) that are applied to every expression. The list of available algorithms using expression is in the Algorithms page.
Syntax
The syntax for rational expressions is as follows (with increasing precedence):
\z
, the empty expression.\e
, the empty word.a
, the lettera
.'l'
, the labell
(useful, for instance, when labels are words, or to denote a letter which is an operator:'+'
denotes the+
letter).[abcd]
, letter class, equivalent to(a+b+c+d)
.[a-d]
, one letter of the current alphabet betweena
andd
. If the alphabet is ,[a-d]
denotes[ad]
, not[abcd]
.[^a-dz]
, one letter of the current alphabet that is not part of[a-dz]
.[^]
, any letter of the current alphabet ("any letter other that none").(e)
,e
.e+f
, the addition (disjunction, union) ofe
andf
(note the use of+
,|
denotes tupling).e@f
, the composition ofe
andf
.e|f
, the tupling ofe
andf
.e&f
, the conjunction (intersection) ofe
andf
.e:f
, the shuffle product (interleaving) ofe
andf
.e&:f
, the infiltration product ofe
andf
.e/f
, the right division ofe
byf
.e\f
, the left division off
bye
.ef
ande.f
, the multiplication (concatenation) ofe
andf
.<k>e
, the left exterior product (left-scalar product) ofe
byk
.e<k>
, the right exterior product (right-scalar product) ofe
byk
.e*
ande{*}
, any number of repetitions ofe
(the Kleene closure ofe
).e{n}
, the power (repeated multiplication) ofe
n
times:ee ... e
.e{n,m}
, any repetition ofe
betweenn
andm
, i.e., the sum of the powers ofe
betweenn
andm
:e{n}+e{n+1}+ ... +e{m}
.e{n,}
, the sum of powers ofe
at leastn
times:e{n}e*
.e{,m}
, at mostm
repetitions ofe
:e{0,m}
.e{+}
, at least onee
:e{1,}
.e?
,e{?}
,e
optional:e{0,1}
.e{c}
, the complement ofe
.e{T}
, the transposition ofe
.e{t}
, the transpose ofe
.
where e
and f
denote expressions, a
a label, k
a weight, and n
and m
natural numbers.
Please note that contrary to "regexps" (as in grep, perl, etc.):
spaces are ignored
+
denotes the choice, not|
.
denotes the concatenation, use[^]
to mean "any letter"
You can also create or edit interactively an expression while seeing the resulting automaton by using %expression
var
in a notebook. You will also be able to chose the identity and the automaton generating algorithm you want to use to build the automaton.
Identities
Some rewriting rules are applied on the expressions to "simplify" them. The strength of this simplification is graduated.
none
: no identities at all. Some algorithms, such asderived_term
, might not terminate.trivial
: the trivial identities only are applied.associative
: the associative identities are added.linear
: the linear identities are added. This is the default.distributive
: the distributive identities are added to linear.agressive
: in addition to the linear identities, some optimizations are performed.
Trivial Identities
$$ \newcommand{\eword}{\varepsilon} \newcommand{\lweight}[2]{\bra{#1}{#2}} \newcommand{\rweight}[2]{#1\bra{#2}} \newcommand{\lmulq}[2]{\bra{#1}^?{#2}} \newcommand{\rmulq}[2]{#1\bra{#2}^?} \newcommand{\bra}[1]{\langle#1\rangle} \newcommand{\K}{\mathbb{K}} \newcommand{\zed}{\mathsf{0}} \newcommand{\und}{\mathsf{1}} \newcommand{\zeK}{0_{\K}} \newcommand{\unK}{1_{\K}} \newcommand{\Ed}{\mathsf{E}} \newcommand{\Fd}{\mathsf{F}} \newcommand{\Gd}{\mathsf{G}} \newcommand{\AND}{\mathbin{\&}} \newcommand{\ldiv}{\mathbin{\backslash}} \newcommand{\comp}{\mathbin{@}} \newcommand{\tuple}{\mathbin{|}} \newcommand{\coloneqq}{\mathrel{\vcenter{:}}=} \begin{gather*} % \tag{add} \Ed+\zed \Rightarrow \Ed \quad \zed+\Ed \Rightarrow \Ed \$.7ex] %\tag{kmul} \begin{aligned}[t] \lweight{\zeK}{\Ed} & \Rightarrow \zed & \lweight{\unK}{\Ed} & \Rightarrow \Ed & \lweight{k}{\zed} & \Rightarrow \zed & \lweight{k}{\lweight{h}{\Ed}} &\Rightarrow \lweight{kh}{\Ed} \\ \rweight{\Ed}{\zeK} & \Rightarrow \zed & \rweight{\Ed}{\unK} & \Rightarrow \Ed & \rweight{\zed}{k} & \Rightarrow \zed & \rweight{\rweight{\Ed}{k}}{h} &\Rightarrow \rweight{\Ed}{kh} \end{aligned}\\ \rweight{(\lweight{k}{\Ed})}{h} \Rightarrow \lweight{k}{(\rweight{\Ed}{h})} \quad \rweight{\ell}{k} \Rightarrow \lweight{k}{\ell} \\ %\tag{mul} \Ed \cdot \zed \Rightarrow \zed \quad \zed \cdot \Ed \Rightarrow \zed \\ (\lmulq{k}{\und}) \cdot \Ed \Rightarrow \lweight{k}{\Ed} \quad \Ed \cdot (\lmulq{k}{\und}) \Rightarrow \rweight{\Ed}{k} \\ %\tag{star} \zed^\star \Rightarrow \und \\ % \tag{and} \zed \AND \Ed \Rightarrow \zed \quad \Ed \AND \zed \Rightarrow \zed \\ \zed^c \AND \Ed \Rightarrow \Ed \quad \Ed \AND \zed^c \Rightarrow \Ed \\ (\lmulq{k}{\ell}) \AND (\lmulq{h}{\ell}) \Rightarrow \lweight{kh}{\ell} \quad (\lmulq{k}{\ell}) \AND (\lmulq{h}{\ell'}) \Rightarrow \zed \\ (\lweight{k}{\Ed})^{c} \Rightarrow \Ed^{c} \quad (\rweight{\Ed}{k})^{c} \Rightarrow \Ed^{c} \\ {\Ed^c}^c \Rightarrow \Ed \text{ if the weights are Boolean ($\mathbb{B}\mathbb{F}_2$)} \ 0 \ldiv \Ed \Rightarrow 0 \quad \Ed \ldiv 0 \Rightarrow 0 \quad 1 \ldiv \Ed \Rightarrow \Ed \ 0^t \Rightarrow 0 \quad \ell^t \Rightarrow \ell \ \Ed \tuple \zed \Rightarrow \zed \quad \zed \tuple \Ed \Rightarrow \zed \ \Ed \comp \zed \Rightarrow \zed \quad \zed \comp \Ed \Rightarrow \zed \ \und \comp \und \Rightarrow \und \end{gather*} $$
where ParseError: KaTeX parse error: Undefined control sequence: \Ed at position 1: \̲E̲d̲ stands for any rational expression, is any letter, ParseError: KaTeX parse error: Undefined control sequence: \eword at position 25: …l'\in A \cup \{\̲e̲w̲o̲r̲d̲\} denote two different labels, ParseError: KaTeX parse error: Undefined control sequence: \K at position 9: k, h\in \̲K̲ are weights, and ParseError: KaTeX parse error: Undefined control sequence: \lmulq at position 1: \̲l̲m̲u̲l̲q̲{k}{\ell} denotes either ParseError: KaTeX parse error: Undefined control sequence: \lweight at position 1: \̲l̲w̲e̲i̲g̲h̲t̲{k}{\ell}, or in which case ParseError: KaTeX parse error: Undefined control sequence: \unK at position 5: k = \̲u̲n̲K̲ in the right-hand side. Any subexpression of a form listed to the left of a '' is rewritten as indicated on the right.
Associative Identities
In addition to the trivial identities, the binary operators (addition, conjunction, multiplication) are made associative. Actually, they become variadic instead of being strictly binary. ParseError: KaTeX parse error: Undefined control sequence: \Ed at position 18: …begin{align} \̲E̲d̲ ̲+ (\Fd + \Gd) …
Linear Identities
In addition to the associative identities, the addition is made commutative. Actually, members of sums are now sorted, and weights of equal terms are added. Some identities requires the weightset to be a commutative semiring (i.e., the product of scalars is commutative). $$ \begin{align} \Fd+\Ed & \Rightarrow \Ed+\Fd && \text{if $\Ed < \FdParseError: KaTeX parse error: Expected 'EOF', got '}' at position 1: }̲ \\ \lweight{…$
Distributive Identities
In addition to the linear identities, the multiplication and multiplication by a scalar are distributed on the sum. ParseError: KaTeX parse error: Undefined control sequence: \lweight at position 20: …gin{gather*} \̲l̲w̲e̲i̲g̲h̲t̲{k}{(\Ed+\Fd)} …
Agressive Identities
In addition to the linear identities, the following optimizations are made. $$ \begin{align*} {\Ed^*}^* & \Rightarrow \Ed^* && \text{if the weightset is $\mathbb{B}ParseError: KaTeX parse error: Expected 'EOF', got '}' at position 1: }̲ \\ {\Ed^T}^T…$
Examples
The following helping routine takes a list of expressions as text (*es
), converts them into genuine expression objects (ctx.expression(e, id)
) for each id
, formats them into LaTeX, and puts them in a DataFrame for display.
Input | Trivial | Associative | Linear | Distributive | Agressive | |
---|---|---|---|---|---|---|
0 | a | |||||
1 | a** | |||||
2 | a+b+c | |||||
3 | a+(b+c) | |||||
4 | a+b+c+d | |||||
5 | b+a | |||||
6 | [ab][ab] |
A few more examples, with weights in :
Input | Trivial | Associative | Linear | Distributive | Agressive | |
---|---|---|---|---|---|---|
0 | a | |||||
1 | a** | 1.1-3: RatE[{a...} -> Q](distributive): value is not starrable: a*\na**\n^^^\n while reading expression: "a**" | ||||
2 | a+a+a | |||||
3 | a+a+b | |||||
4 | a+b+a | |||||
5 | <2>(a+b) | |||||
6 | ([ab]+[ab]){2} | |||||
7 | <2>ab<3>cd<5> |
Try it!
The following piece of code defines an interactive function for you to try your own expression. Enter an expression in the text area, then click on the "Run" button.
Failed to display Jupyter Widget of type interactive
.
If you're reading this message in Jupyter Notebook or JupyterLab, it may mean that the widgets JavaScript is still loading. If this message persists, it likely means that the widgets JavaScript library is either not installed or not enabled. See the Jupyter Widgets Documentation for setup instructions.
If you're reading this message in another notebook frontend (for example, a static rendering on GitHub or NBViewer), it may mean that your frontend doesn't currently support widgets.