Loops and Overloops for Tree Walking Automata: Companion Page

Vincent HUGOT

This web page complements the following paper:

Loops and Overloops for Tree Walking Automata
co-authors: P.C. Héam, O. Kouchnarenko
Springer LNCS – International Conference CIAA’11 (Long Paper) — March 2011
Additional webpage and slides.

[Go back to my main page].

1 Introduction
2 Getting the Proof of Concept
- 2.1 Precompiled Binaries
- 2.2 Building From Source
3 Using the Program
- 3.1 The Input
- 3.2 The Output
4 Questions

1 Introduction

This software merely serves to illustrate the algorithms of the paper, which I think is useful given the explosive character of these transformations, which prohibits doing things by hand even for very trivial TWA.

Note about the source code: (in case you read it, which you really don’t need to do)

Although I have some confidence that the code is correct, it is very messy, as it was written before the transformation was fully formalised – and it was not intended to be read by anyone else... Also note that there is a lot of source code which is completely unrelated to TWA – this is in fact my work tool for all kinds of tree automata research. So, in a nutshell: read and use at your own risk.

Related work:

The only other program I known of which implements the TWA → BUTA transformation (loop-based only, of course) is dtwa-tools¹, available here, which was developed for this article. However, I am fairly certain the algorithm which it implements is not entirely correct. We have discussed in our paper how the BUTA states given in the proof sketches – 𝕋 × 2^Q² – were surprising, as loops cannot be considered independently of the symbol which supports them. Which is why our loop-based algorithm uses states in Σ × 𝕋 × 2^Q². In dtwa-tools, the implementor followed the proof sketches, and went around the lack of symbol by allowing loops to be closed if there exists a symbol for which there is an ↑-transition which closes the loop (and this is exactly what we did at first, too). However, for the reason discussed in the paper, this may compute more loops than actually exist – and we have given a trivial counter-example.

2 Getting the Proof of Concept

2.1 Precompiled Binaries

Binaries are available for the following operating systems:

Linux : (r720)
Windows : (r720)²
Mac OS : N/A

2.2 Building From Source

The OCaml source code can be downloaded from my subversion repository at http://vincent-hugot-projects.googlecode.com/svn/trunk/ocaml/TAged-emptiness2/. Note that this repository is in constant evolution, which means that the latest version of the code may have little to do with the binary file you downloaded. For this reason, you may want to download the exact revision corresponding to the binaries you want to compile – or whose code you simply desire to read. This is why the revision number is provided – as annotations of the form (rXXX) – in the above list. Note that the revision number pertains to the whole repository, and not to any specific project in it.

If you want to build the proof of concept from source, you will need to pull at least ocaml/common (shared files) and ocaml/Taged-emptiness2 (the POC), and run setup.sh from common, and then make.sh from the POC’s directory. You will need wget, ocaml, camlp4 (with camlp4-extras), and menhir. Of course, this assumes some “Unix-y” environment, capable of running shell scripts. I work with GNU/Linux (Kubuntu), tested with Cygwin under Windows XP SP3, and assume it would work with a Mac. If you are using the Windows/MinGW or Windows/MVSC version of OCaml, then you will probably have to do a little bit of work to convert the shell scripts into MSDOS BAT files.

3 Using the Program

3.1 The Input

In this section, POC designates the name of the proof-of-concept executable, for instance poc.linux if you downloaded the linux version. The $ indicates a command prompt, regardless of which OS you are actually using.

The program is used in the following way:

$ POC [dbg] [_clean] twa <TWA> <TREE>

<TWA> is the path to a file describing an input TWA. For instance, the example TWA X of the article is coded by the file
```
// left-most leaf is 'a'
TWA 

  Alphabet a 0 b 0 c 0 f 2 g 2 h 2
  States 
    ql // left 
    qu // up
  Inits ql
  Finals qu
  Rules 

    a       ql  {@,<}    * qu 
    ALL     qu  <        ! qu 
    ALL 2   ql  {@,<}    < ql  
```
The syntax is rather straight-forward:
- Extra whitespace is ignored.
- Alphabet is followed by a whitespace-separated list of couples “σ k”, with the effect of adding the symbol σ to Σ_k.
- States, Inits and Finals correspond to the sets Q, I and F, respectively, and are all whitespace-separated lists of state symbols.
- Both symbols and states are identifiers matching the following regular expression
```
['_' '-' 'A'-'Z' 'a'-'z' '0'-'9' '*' '$' '#' ''' '~']+ 
```
- Rules is followed by a whitespace-separated list of “generalised rules”, each of which encodes one or several rules. Each of these is a whitespace-separated list of five mail arguments, (though some whitespace may also appear in some of these arguments). The order of the arguments corresponds to that used in the paper, that is to say ⟨ Σ, Q, I, F, Δ ⟩, and the rule about replacing a single element by a set of elements applies.
- The types are denoted by @, < and >, corresponding to ⋆ (root), 0 (left son) and 1 (right son), respectively.
- The moves are denoted by !, *, < and >, corresponding to ↑ (move to parent), ↺ (stay here), ↙ (go to left son) and ↘ (go to right son), respectively.
- ALL and ALL 2 stand respectively for Σ and Σ₂. More generally, ALL k designates Σ_k, for k ∈ ℕ. They are shortcuts, equivalent to writing the set in-extenso, for instance in the example, ALL 2 expands to {f, g, h}.
- Comments in C++ style are supported, as well as OCaml style. That is to say
  - The rest of a line is ignored after a //
  - Comment blocks are delimited by /* ... */, or (* ... *), and can be nested.
- The suggested file extension for TWA files is .twa, but you can use whatever you like.
<TREE> is the path to the file coding an input tree. The example tree in the article was simply
```
 g(f(a,b), c)
```
- Extra whitespace is ignored.
- For big trees, you can use either ( .. ) or { .. } as delimiters – and they must be correctly matched.
[_clean] is an optional argument which activates or deactivates the post-processing of generated BUTA by a cleanup operation. Cleanup is enabled by default. The output makes it clear whether this option was on or off. See the following paper for more information about cleanup:

Random Generation of Positive TAGEDs wrt. the Emptiness Problem
co-authors: P.C. Héam, O. Kouchnarenko
INRIA Research Report — November 2010
Corresponding slides.
[dbg] is an optional argument which toggles the use of debug statements on the standard error channel. This is disabled by default, and not terribly useful.

EXAMPLE: The output presented in the next section has been generated from a working copy of my subversion repository, using the command :

$ POC _clean twa t/lncs.twa t/ciaa-tree

3.2 The Output

The program generates a PDF file such as this one: . In this section we give a few explanations concerning the contents of such a file.

Preamble: A representation of the input TWA, which is very close to the input format and should therefore be fairly self-explanatory.
Note that the sets all appear with their cardinality. For instance
```
alphab = #6{a/0, b/0, c/0, f/2, g/2, h/2}
```
denotes the set Σ = {a,b,c, f,g,h}, where the notation “σ/k” means that the symbol σ is of arity k. So in that case we have Σ₀ = {a,b,c} and Σ₂ = {f,g,h}. The #6 at the beginning simply indicates that |Σ| = 6.
Deterministic run: The TWA attempts a run on the tree; if it is deterministic, then this is sufficient to determine whether the tree is accepted. If not, whenever several rules apply, the smallest rule is always chosen – wrt. an arbitrary ordering of the rules. Thus a non-deterministic TWA may or may not accept a tree in its language during this run – in that case the power-run in the next section is used to decide whether the tree is accepted.
This run is represented in two ways:
1. As a sequence of successive configurations written (α, q)
2. As a sequence of successive configurations written by representing the whole tree, with the current state occupying the current node. This gives a much more visual representation of the run.
The convention used throughout the document is that states are all written in bold, with final states (resp. initial states) being additionally underlined (resp. in italics).
Nondeterministic power-run: This is a representation of the tree, where each node α is labelled by the set of all states q such that there exists an initial state q_i ∈ F and a run (є, q_i) ↠ ... ↠ (α, q).
In other word, each node contains exactly all states in which the TWA can visit this node.
Once a power run has been computed, it is easy to determine whether or not the tree is accepted: it is if and only if a final state appears at the root of the tree.
Transformations into BUTA: This section presents representations of the BUTA resulting from the loop-based and overloop-based.
There are two parts for each BUTA:
1. The correspondance table between the actual states of the BUTA – which are just numbers – and the “theoretical” states in Σ × 𝕋 × 2^Q² or 𝕋 × 2^Q².
2. The BUTA itself. The code should be fairly self-explanatory, and is very similar to that of TWA. Note that the first line of a BUTA looks like, say,
```
TAGED 'fromTWA-Sigma' [1986] = {
```
  where ’fromTWA-Sigma’ is an arbitrary “name” given to the automaton, and the number between square brackets, here [1986], is its size. The size of a BUTA ⟨Σ,Q,F,Δ⟩ is defined – in the usual way – as:
  ∥ ⟨Σ,Q,F,Δ⟩ ∥ = |Q| +
  
  ∑
  
  f(p₁,…,p_n)→q ∈ Δ
  
  [arity(f) +2].
  
  The final lines which appear at the end of the representation of any BUTA
```
==rel = #0{}     <>rel = #0{}
```
  are meaningless and should be ignored. They are simply remnants of the fact that the implemented BUTA are actually the more general model of TAGEDs, which admit global equality and disequality constraints. Since we use neither here, those sets are always empty.
For each of these BUTA, a power run is provided. The idea is the same as that of power runs for TWA. This time, each node α is labelled by the set of all states q such that the subtree t|_α∈ Lang(B, q), where B is the BUTA.
Or, it could also be seen as the run of a determinised version of B on the tree.
A tree is accepted by the BUTA iff there is a final state at the root of the power run.
Membership by overloops: Shows the set of overloops on the tree. Note that this should be exactly the same set O as in the state (⋆, O) at the top of the power run for the overloop-based BUTA.
Term in language: Attempts to build a tree accepted by the BUTA – if any exists.
Over-Approximation: Runs the over-approximation, and dumps the contents of the three sets L_τ when the algorithm ends. Note that in practice it stops the instant it finds an accepting overloop.

4 Questions

If you have any questions which are not covered by this document, you are welcome to contact me at vincent.hugot@gmail.com. Time permitting, I will update this page with the answer.

1: version 0.3
2: Compiled with Cygwin. Thus you may need the Cygwin DLL to run it.

This document was translated from L^AT_EX by H^EV^EA.