## Loops and Overloops for Tree Walking Automata: Companion Page## Vincent HUGOT |

This web page complements the following paper:

Loops and Overloops for Tree Walking Automata

co-authors:*P.C. Héam, O. Kouchnarenko*

Springer LNCS – International Conference CIAA’11 (Long Paper) — March 2011

Additional webpage and slides.

co-authors:

Springer LNCS – International Conference CIAA’11 (Long Paper) — March 2011

Additional webpage and slides.

[Go back to my main page].

This software merely serves to illustrate the algorithms of the paper, which I think is useful given the explosive character of these transformations, which prohibits doing things by hand even for very trivial TWA.

Note about the source code: (in case you read it, which you really don’t need to do)

Although I have some confidence that
the code is correct, it is *very* messy, as it was written before
the transformation was fully formalised – and it was not intended
to be read by anyone else...
Also note that there is a lot of source code which is completely
unrelated to TWA – this is in fact my work tool for all kinds of
tree automata research.
So, in a nutshell: read and use at your own risk.

Related work:

The only other program I known
of which implements the TWA → BUTA transformation (loop-based only, of course) is
dtwa-tools^{1}, available here,
which was developed for
this article.
However, I am fairly certain the algorithm which it implements
is not entirely correct.
We have discussed in our paper how the BUTA states given in the proof sketches
– 𝕋 × 2^{Q2} –
were surprising, as loops cannot be considered independently
of the symbol which supports them. Which
is why our loop-based algorithm uses states in
Σ × 𝕋 × 2^{Q2}.
In dtwa-tools,
the implementor followed the proof sketches, and
went around the lack of symbol by allowing loops
to be closed if *there exists a symbol*
for which there is an ↑-transition
which closes the loop
(and this is exactly what we did at first, too).
However, for the reason discussed in the paper,
this may compute more loops than actually exist –
and we have given a trivial counter-example.

Binaries are available for the following operating systems:

The OCaml source code can be downloaded from my subversion
repository at http://vincent-hugot-projects.googlecode.com/svn/trunk/ocaml/TAged-emptiness2/.
Note that this repository is in constant evolution, which means that
the latest version of the code may have little to do with
the binary file you downloaded.
For this reason, you may want to download the exact revision
corresponding to the binaries you want to compile – or whose
code you simply desire to read. This is why the revision
number is provided – as annotations of the form *(rXXX)* –
in the above list. Note that the revision number pertains to the whole
repository, and not to any specific project in it.

If you want to build the proof of concept from source, you will need to
pull at least ocaml/common (shared files) and ocaml/Taged-emptiness2 (the POC),
and run setup.sh from common, and then make.sh from the POC’s directory.
You will need wget, ocaml, camlp4 (with camlp4-extras), and menhir.
Of course, this assumes some “Unix-y” environment, capable of running shell scripts.
I work with GNU/Linux (Kubuntu), tested with Cygwin under Windows XP SP3, and *assume*
it would work with a Mac. If you are using the Windows/MinGW or Windows/MVSC version of OCaml,
then you will probably have to do a little bit of work to convert the shell scripts into
MSDOS BAT files.

In this section, POC designates the name of the proof-of-concept executable, for instance poc.linux if you downloaded the linux version. The $ indicates a command prompt, regardless of which OS you are actually using.

The program is used in the following way:

$ POC [dbg] [_clean] twa <TWA> <TREE>

- <TWA> is the path to a file describing an input TWA.
For instance, the example TWA X of the article
is coded by the file
// left-most leaf is 'a' TWA Alphabet a 0 b 0 c 0 f 2 g 2 h 2 States ql // left qu // up Inits ql Finals qu Rules a ql {@,<} * qu ALL qu < ! qu ALL 2 ql {@,<} < ql

The syntax is rather straight-forward:- Extra whitespace is ignored.
- Alphabet is followed by a whitespace-separated list of couples
“σ k”, with the effect of adding the symbol σ to Σ
_{k}. - States, Inits and Finals correspond to the sets Q, I and F, respectively, and are all whitespace-separated lists of state symbols.
- Both symbols and states are identifiers matching the following
regular expression
['_' '-' 'A'-'Z' 'a'-'z' '0'-'9' '*' '$' '#' ''' '~']+

- Rules is followed by a whitespace-separated list of “generalised rules”, each of which encodes one or several rules. Each of these is a whitespace-separated list of five mail arguments, (though some whitespace may also appear in some of these arguments). The order of the arguments corresponds to that used in the paper, that is to say ⟨ Σ, Q, I, F, Δ ⟩, and the rule about replacing a single element by a set of elements applies.
- The types are denoted by @, < and >,
corresponding to ⋆
*(root)*,*0**(left son)*and*1**(right son)*, respectively. - The moves are denoted by !, *, < and >,
corresponding to ↑
*(move to parent)*, ↺*(stay here)*, ↙*(go to left son)*and ↘*(go to right son)*, respectively. - ALL and ALL 2 stand respectively for Σ and Σ
_{2}. More generally, ALL k designates Σ_{k}, for k ∈ ℕ. They are shortcuts, equivalent to writing the set*in-extenso*, for instance in the example, ALL 2 expands to {f, g, h}. - Comments in C++ style are supported, as well as OCaml style. That is to say
- The rest of a line is ignored after a //
- Comment blocks are delimited by /* ... */, or (* ... *), and can be nested.

- The suggested file extension for TWA files is .twa, but you can use whatever you like.

- <TREE> is the path to the file coding an input tree. The example tree in the article was simply
g(f(a,b), c)

- Extra whitespace is ignored.
- For big trees, you can use either ( .. ) or { .. } as delimiters – and they must be correctly matched.

- [_clean] is an optional argument which activates or deactivates
the post-processing of generated BUTA by a
*cleanup*operation. Cleanup is enabled by default. The output makes it clear whether this option was on or off. See the following paper for more information about cleanup:

Random Generation of Positive TAGEDs wrt. the Emptiness Problem

co-authors:*P.C. Héam, O. Kouchnarenko*

INRIA Research Report — November 2010

Corresponding slides. - [dbg] is an optional argument which toggles the use of debug statements on the standard error channel. This is disabled by default, and not terribly useful.

EXAMPLE: The output presented in the next section has been generated from a working copy of my subversion repository, using the command :

$ POC _clean twa t/lncs.twa t/ciaa-tree

The program generates a PDF file such as this one: . In this section we give a few explanations concerning the contents of such a file.

- Preamble:
A representation of the input TWA, which
is very close to the input format and
should therefore be fairly self-explanatory.
Note that the sets all appear with their cardinality. For instance

alphab = #6{a/0, b/0, c/0, f/2, g/2, h/2}

denotes the set Σ = {a,b,c, f,g,h}, where the notation “σ/k” means that the symbol σ is of arity k. So in that case we have Σ

_{0}= {a,b,c} and Σ_{2}= {f,g,h}. The #6 at the beginning simply indicates that |Σ| = 6. - Deterministic run:
The TWA attempts a run on the tree;
if it is deterministic, then
this is sufficient to
determine whether the tree is accepted.
If not, whenever several rules apply,
the smallest rule is always chosen – wrt. an arbitrary
ordering of the rules. Thus a non-deterministic TWA
may or may not accept a tree in its language during
this run – in that case the power-run in the next section
is used to decide whether the tree is accepted.
This run is represented in two ways:

- As a sequence of successive configurations written (α, q)
- As a sequence of successive configurations written by representing the whole tree, with the current state occupying the current node. This gives a much more visual representation of the run.

The convention used throughout the document is that states are all written in bold, with

__final__states (resp. initial states) being additionally__underlined__(resp. in italics). - Nondeterministic power-run:
This is a representation of the tree,
where each node α is labelled by the set
of all states q such that there exists an initial state q
_{i}∈ F and a run (є, q_{i}) ↠ ... ↠ (α, q).In other word, each node contains exactly all states in which the TWA can visit this node.

Once a power run has been computed, it is easy to determine whether or not the tree is accepted: it is if and only if a final state appears at the root of the tree.

- Transformations into BUTA:
This section presents representations
of the BUTA resulting from the loop-based and
overloop-based.
There are two parts for each BUTA:

- The correspondance table between the actual
states of the BUTA – which are just numbers –
and the “theoretical” states in
Σ × 𝕋 × 2
^{Q2}or 𝕋 × 2^{Q2}. - The BUTA itself. The code should be fairly self-explanatory,
and is very similar to that of TWA.
Note that the first line of a BUTA looks like, say,
TAGED 'fromTWA-Sigma' [1986] = {

where ’fromTWA-Sigma’ is an arbitrary “name” given to the automaton, and the number between square brackets, here [1986], is its size. The size of a BUTA ⟨Σ,Q,F,Δ⟩ is defined – in the usual way – as:

The final lines which appear at the end of the representation of any BUTA∥ ⟨Σ,Q,F,Δ⟩ ∥ = |Q| + ∑ f(p _{1},…,p_{n})→q ∈ Δ[arity(f) +2]. ==rel = #0{} <>rel = #0{}

are meaningless and should be ignored. They are simply remnants of the fact that the implemented BUTA are actually the more general model of TAGEDs, which admit global equality and disequality constraints. Since we use neither here, those sets are always empty.

For each of these BUTA, a power run is provided. The idea is the same as that of power runs for TWA. This time, each node α is labelled by the set of all states q such that the subtree t|

_{α}∈ Lang(B, q), where B is the BUTA.Or, it could also be seen as the run of a determinised version of B on the tree.

A tree is accepted by the BUTA iff there is a final state at the root of the power run.

- The correspondance table between the actual
states of the BUTA – which are just numbers –
and the “theoretical” states in
Σ × 𝕋 × 2
- Membership by overloops: Shows the set of overloops on the tree. Note that this should be exactly the same set O as in the state (⋆, O) at the top of the power run for the overloop-based BUTA.
- Term in language: Attempts to build a tree accepted by the BUTA – if any exists.
- Over-Approximation:
Runs the over-approximation,
and dumps the contents of the three sets L
_{τ}when the algorithm ends. Note that in practice it stops the instant it finds an accepting overloop.

If you have any questions which are not covered by this document, you are welcome to contact me at vincent.hugot@gmail.com. Time permitting, I will update this page with the answer.

- 1
- version 0.3
- 2

This document was translated from L^{A}T_{E}X byH^{E}V^{E}A.