Derived-Term Automata of Multitape Expressions with Composition
This page is a complement to the paper Derived-Term Automata of Multitape Expressions with Composition. This page exists in several forms:
A Dynamic Notebook, which can be edited, played with
A static HTML page, whose graphical rendering is always correct.
More information is available here:
You may change the cells, and run then. To run a cell in a notebook, hit "Control-Enter" (in which case the focus stays on the same cell) or "Shift-Enter" (focus goes to the next cell). Beware that depending on the requested operations, Vcsn may generate and compile code, which may be a really slow process on small machines (about a minute): be patient! However, the code is compiled only once: successive uses will be way faster.
To run all the cells anew, select "Restart & Run All" in the "Kernel" menu above. ParseError: KaTeX parse error: \newcommand{\coloneqq} attempting to redefine \coloneqq; use \renewcommand
Introduction: Motivating Examples
First we introduce the context we are interested in: labels are letter-or-empty-word (lan
), two tapes (lat
), weights are in the tropical semiring .
The following examples show how the multiplication and addition of behave:
The first automaton, , is obtained from the following rational expression (the empty word is noted \e
in Vcsn, and rendered even as an expression, whereas in the paper it is written ).
Or, using syntactic sugar (, ):
Admittedly, the display of multitape labels on transition could be improved. The rather cryptic [ε|a-a|εb|ε]
denotes ε|a+ε|b+a|ε+b|ε
. This will be addressed in future versions of Vcsn.
Ng's automaton is:
Or, cleaned from state decorations:
Mohri factors his automaton in the composition of and :
Name | Expression | Automaton |
The resulting automaton is exactly the automaton .
Section 2.2: Weighted Rational Expressions
The set of operators supported by Vcsn is quite large, see the documentation for details. We will show a few simple examples.
We start with a simple example of a single tape expression whose labels are letters-or-empty-word (lan
), and weights are rational numbers. We first introduce what in Vcsn is called a context which is simply an object to type automata, expressions, expansions, etc.
From this context, we can build (typed) expressions:
Let's introduce qexp
as a shorthand to build an expression from a string.
The usual quantifier () is written {+}
, to disambiguate from the binary operator ():
The following examples show the trivial identities in action.
. Input | . None | . Trivial | |
---|---|---|---|
0 | \z+a | ||
1 | a+a | ||
2 | \za | ||
3 | \ea | ||
4 | (\e\za+\z)* |
Vcsn also provides Python operators to assemble expressions.
The Python operator |
builds a multitape expression from simple tape ones, and the Python operator @
denotes composition. As a matter of fact, the reason why composition is denoted in Vcsn is because that is the only remaining operator available in Python. Besides, more beautiful symbols, such as for instance, are technical to type.
Section 3: Rational Expansions
We start with a simple example of a single tape expression whose labels are letters-or-empty-word (lan
), and weights are rational numbers.
Its expansion is:
And its derived-term automaton (contrary to the paper, in Vcsn the empty expression is denoted instead of ):
which is smaller than its standard automaton:
The following example demonstrates how successors of a state are computed. It uses the shuffle operator, denoted :
, for illustration.
Examples 1 to 4: A Simple Multitape Expression (also Figure 1)
Labels are pairs of letter-or-empty-word (lat<lan, lan>
), and weights are rational numbers (q
).
The expression is:
Its expansion is:
The derived-term automaton of , , is:
The ten shortest accepted multitape words are:
Example 5: An Exponential Number of States
We introduce a three-tape context. Unfortunately the automatic graphical rendering is poor.
From multitape automata currently Vcsn extracts only simple-tape expressions over multitape generators (e.g. ) instead of general mutitape expressions (e.g. ):
In the case of the corresponding five-tape expression, the generated graphical rendering is so messy that it is totally useless (try it if you want!). So instead of displaying the automaton, we list its states.
Example 6: A Sed-like Substitution
Again, the extracted expression is less readable than the one we started from.
A More Complex Expression
The previous examples often looked like sed-like substitutions, in the sense that the first tape was often a composite expression, but the second tape a simple label. There is no such limitation.
Section 5.3: Derived-Term Automaton with Composition
Here we show, as will be discussed in Section 6.2, the role played by identities. Here, they are fully disabled (via the 'none'
argument).
Then we show an example with the endmarker (as discussed in Section 6.4).
Example 8
In this example we used symbolic weights, . To simulate this we introduce expressions whose weights are expressions (whose weights are in ):
Then we build and display the expression, its expansion, and its derived-term automaton.
ParseError: KaTeX parse error: Undefined control sequence: \varepsilon@ at position 277: …h\right\rangle \̲v̲a̲r̲e̲p̲s̲i̲l̲o̲n̲@̲\left(a|\vareps…
Section 6.1: Discussion on the Validity of Automata
The following function takes a rational expression, and generates derived-term automata from denormalized expansions (simpler equation, but generates many spontaneous transitions), and from "normal" expansions. The former are often simpler, but may generate invalid automata.
Expression | Denormalized | Normal |
Expression | Denormalized | Normal |
Expression | Denormalized | Normal |
Expression | Denormalized | Normal |
Expression | Denormalized | Normal |
Section 6.2: Discussion on Identities
No identities: Expression | No identities: Automaton |
Trivial identities: Expression | Trivial identities: Automaton |
No identities: Expression | No identities: Automaton |
Error: Q: value is not starrable: 1 | |
Trivial identities: Expression | Trivial identities: Automaton |
No identities: Expression | No identities: Automaton |
Trivial identities: Expression | Trivial identities: Automaton |
Runtime Performances
We compare the quality of the generated automata (smaller is better) and the duration of the generation procedure (again, smaller is better). Vcsn feature two generation procedures for multitape expressions: derived-term, and "inductive".
"Inductive" is a simple recursive translation of the rational expression that applies the corresponding operation on the automata. For single-tape expressions, it uses the "standard" operations on automata, which are the ones used to implement the expression.standard algorithm. The comparison will be performed on twenty randomly generated expressions.
Rational Expression | DT: transitions | Ind: transitions | DT: states | Ind: states | DT: duration | Ind: duration |
3 | 3 | 3 | 4 | 33ms | 30ms | |
0 | 5 | 0 | 6 | 48ms | 90ms | |
0 | 3 | 0 | 4 | 31ms | 75ms | |
11 | 11 | 6 | 6 | 148ms | 46ms | |
4 | 7 | 3 | 5 | 48ms | 65ms | |
ParseError: KaTeX parse error: Undefined control sequence: \varepsilon@ at position 45: … \,\left(\left(\̲v̲a̲r̲e̲p̲s̲i̲l̲o̲n̲@̲\left(a|\vareps… | 0 | 0 | 0 | 3 | 13ms | 76ms |
1 | 1 | 2 | 2 | 36ms | 50ms | |
16 | 27 | 5 | 9 | 153ms | 61ms | |
1 | 2 | 2 | 3 | 33ms | 52ms | |
3 | 3 | 2 | 4 | 40ms | 34ms | |
3 | 3 | 4 | 4 | 45ms | 29ms | |
3 | 3 | 2 | 4 | 38ms | 30ms | |
4 | 4 | 4 | 5 | 29ms | 42ms | |
3 | 4 | 3 | 4 | 70ms | 50ms | |
4 | 4 | 3 | 5 | 25ms | 37ms | |
0 | 0 | 0 | 1 | 11ms | 2ms | |
2 | 2 | 2 | 2 | 47ms | 15ms | |
0 | 0 | 1 | 1 | 13ms | 3ms | |
3 | 4 | 3 | 4 | 26ms | 55ms | |
2 | 3 | 2 | 4 | 30ms | 35ms |
Unfortunately, random expressions are rarely interesting (as can be seen by the number of transitions).
Now we compare the behavior of both algorithms on a familly of rational expressions: .
Rational Expression | DT: transitions | Ind: transitions | DT: states | Ind: states | DT: duration | Ind: duration |
6 | 16 | 2 | 5 | 36ms | 45ms | |
5 | 15 | 2 | 6 | 31ms | 56ms | |
6 | 16 | 3 | 7 | 35ms | 68ms | |
7 | 17 | 4 | 8 | 42ms | 82ms | |
8 | 18 | 5 | 9 | 45ms | 95ms | |
9 | 19 | 6 | 10 | 49ms | 109ms | |
10 | 20 | 7 | 11 | 54ms | 123ms | |
11 | 21 | 8 | 12 | 60ms | 141ms | |
12 | 22 | 9 | 13 | 63ms | 155ms | |
13 | 23 | 10 | 14 | 69ms | 174ms | |
14 | 24 | 11 | 15 | 73ms | 189ms | |
15 | 25 | 12 | 16 | 76ms | 202ms | |
16 | 26 | 13 | 17 | 83ms | 220ms | |
17 | 27 | 14 | 18 | 84ms | 238ms | |
18 | 28 | 15 | 19 | 103ms | 304ms | |
19 | 29 | 16 | 20 | 101ms | 299ms | |
20 | 30 | 17 | 21 | 103ms | 314ms | |
21 | 31 | 18 | 22 | 107ms | 344ms | |
22 | 32 | 19 | 23 | 109ms | 356ms | |
23 | 33 | 20 | 24 | 130ms | 370ms |
Section 6.6: Discussion on the Performances
In this section, we compare the size of automata generated either using the "standard automaton" construction (aka, Glushkov automata), or the expansion-based derived-term automaton as exposed in the paper.
Expression | DTerm states | Dterm transitions | Denorm DTerm states | Denorm Dterm transitions | Inductive states | Inductive transitions |
1 | 6 | 1 | 6 | 7 | 42 | |
2 | 14 | 2 | 14 | 9 | 60 | |
1 | 5 | 1 | 5 | 6 | 30 | |
1 | 5 | 1 | 5 | 6 | 30 | |
1 | 6 | 11 | 26 | 7 | 42 | |
4 | 7 | 4 | 7 | 13 | 16 | |
3 | 3 | 3 | 3 | 4 | 4 | |
7 | 19 | 7 | 19 | 8 | 26 | |
31 | 211 | 31 | 211 | 32 | 242 | |
3 | 8 | 3 | 8 | 5 | 14 | |
2 | 2 | 5 | 8 | 3 | 3 | |
3 | 6 | 3 | 6 | 4 | 9 | |
1 | 0 | 1 | 0 | 1 | 0 | |
3 | 6 | 3 | 6 | 3 | 6 | |
1 | 0 | 1 | 0 | 1 | 0 |