CS 600.226 Data Structures: Class Challenge
FSA Minimization Algorithms
The bottom-up algorithm (acyclic FSA only).
The top-down general algorithm.
The Bottom-Up Algorithm for Acyclic FSAs
- Initialization
  - Find all states with no outgoing arcs. (Since the FSA is
    acyclic, at least one such state always exists.)
  - Split the resulting set into one or two sets according to
    FINAL status (if the set contains only final states, or only
    non-final states, you get a single set). Collapse each set into one
    new state, re-linking all affected arcs accordingly. Call the
    resulting set of one or two new states PROCESSED.
- Main loop (continue until the termination condition inside the loop is met):
  - Collect all predecessors of the states in the PROCESSED set
    whose arcs all lead to states that have been in PROCESSED at
    some point (no matter how many iterations ago). Call the
    resulting set CANDIDATES.
  - If the CANDIDATES set is empty, write out the resulting FSA
    and stop here. It's done.
  - If it is not empty, group the CANDIDATES into subsets of states
    that have the same FINAL status (all of them are either final or
    not final) and share all outgoing arcs (i.e., their arcs point to
    the same states using the same arc symbols). [Some subsets might be
    of size 1, i.e., some states cannot be merged with any other state
    from the CANDIDATES set.]
  - Merge the states within each of these subsets, reattaching arcs as
    appropriate. [For subsets of size 1 no arc reattachment is
    necessary, but consider such states as merged, too.]
  - Call the resulting set of merged states PROCESSED (including
    those resulting from size 1 subsets, i.e., trivial do-nothing
    mergers), and repeat the Main loop.
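The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `minimize_acyclic`, the dictionary-based arc representation, and the choice of the smallest state number as the name of a merged state are illustrative and do not come from the course materials. The sketch folds the Initialization step into the main loop (states with no outgoing arcs trivially satisfy the CANDIDATES condition when nothing has been processed yet), and instead of rewiring arcs in place it records, for every state, the state it is merged into.

```python
from collections import defaultdict


def minimize_acyclic(arcs, finals):
    """Map every state of an acyclic FSA to the state it is merged into.

    arcs:   {state: {symbol: target}}, with an entry (possibly empty) per state
    finals: set of final states
    """
    merged_into = {}          # states "PROCESSED in the past" -> merged state
    pending = set(arcs)       # states not processed yet
    while pending:
        # CANDIDATES: pending states whose arcs all lead to processed states.
        # In the first round these are the states with no outgoing arcs.
        candidates = sorted(s for s in pending
                            if all(t in merged_into for t in arcs[s].values()))
        if not candidates:
            raise ValueError("the FSA contains a cycle")
        # Group candidates by FINAL status and by (symbol, merged target) arcs.
        groups = defaultdict(list)
        for s in candidates:
            outgoing = frozenset((sym, merged_into[t])
                                 for sym, t in arcs[s].items())
            groups[(s in finals, outgoing)].append(s)
        # Merge each group; the smallest member lends its name (an assumption).
        for members in groups.values():
            for s in members:
                merged_into[s] = members[0]
        pending.difference_update(candidates)
    return merged_into


# Hypothetical toy input: states 1 and 2 behave identically, so they merge.
toy_arcs = {0: {"a": 1, "b": 2}, 1: {"a": 3}, 2: {"a": 3}, 3: {}}
toy_result = minimize_acyclic(toy_arcs, finals={3})
print(toy_result)
```

Returning a state-to-representative map keeps the sketch short; writing out the minimized FSA then amounts to replacing every state in every arc by its representative.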
Example (bottom-up algorithm)
Input FSA
Final states are shown in green (2,3,5,8,9). The only input symbols are a and b.
The FSA thus recognizes exactly the following strings:
a,
b,
bb,
bbba,
bbbb,
baaa,
baab,
ab,
abba,
abbb,
aaaa,
aaab.
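The list above can be checked mechanically. The sketch below encodes the example FSA as a Python dictionary and enumerates the strings it accepts. The arcs are reconstructed from the worked example on this page rather than copied from the original figure; in particular, which of the final states 8 and 9 each arc out of 6 and 7 enters is an assumption (the text only requires that 6 and 7 show the same arc pattern once 8 and 9 are merged), and state 1 is assumed to be the start state. The names `EXAMPLE_ARCS`, `EXAMPLE_FINALS`, and `accepted_strings` are illustrative.

```python
# Example FSA, reconstructed from the text (arcs into 8 and 9 are an assumption).
EXAMPLE_ARCS = {
    1: {"a": 2, "b": 3},
    2: {"a": 4, "b": 5},
    3: {"a": 4, "b": 5},
    4: {"a": 6},
    5: {"a": 10, "b": 7},
    6: {"a": 8, "b": 9},
    7: {"a": 8, "b": 9},
    8: {}, 9: {}, 10: {},
}
EXAMPLE_FINALS = {2, 3, 5, 8, 9}


def accepted_strings(arcs, finals, start):
    """List every string accepted by an acyclic FSA (depth-first traversal)."""
    found = []

    def walk(state, prefix):
        if state in finals:
            found.append(prefix)
        for symbol, target in sorted(arcs[state].items()):
            walk(target, prefix + symbol)

    walk(start, "")
    return found


strings = accepted_strings(EXAMPLE_ARCS, EXAMPLE_FINALS, start=1)
print(len(strings), strings)
```

The traversal terminates only because the FSA is acyclic; on an FSA with loops the set of accepted strings may be infinite.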
Initial Step
Nodes with no outgoing arcs, grouped according to their FINAL status:
- Final: {8,9}
- Not Final: {10}
Call the two new states "9" (for {8,9}) and "10" (for {10}). The
PROCESSED set is now {9,10}.
The FSA after merge:
Main Loop
Iteration 1
- PROCESSED: {9,10}
- CANDIDATES: {6,7} (but not 5, since
there is an arc from 5 to 7, and state 7 has not been PROCESSED yet)
Now the CANDIDATES set is merged into a single state (let's call it
"6"), since the pattern of outgoing arcs from 6 and 7 is the same.
The FSA after merge (states which have been PROCESSED in the past have
a thick circle around them):
Iteration 2
- PROCESSED: {6} (just state 6, the state merged in the previous
iteration; 9 and 10 remain marked as "PROCESSED in the past",
but there is no need to include them in the PROCESSED set again)
- CANDIDATES: {4,5} (now 5 has finally made it, because both 10 and 6 have been PROCESSED in the past)
The states 4 and 5 (CANDIDATES) cannot be merged (not only does 4 have
no outgoing arc marked b, but 5 is final whereas 4
is not). They have to be kept separate (say, keeping their names,
"4" and "5").
The FSA after merge:
Iteration 3
- PROCESSED: {4,5}
- CANDIDATES: {2,3}
States 2 and 3 can be merged (say, into "2"): both arcs labeled
a go to state 4, and both arcs labeled
b go to state 5. The FSA after merge:
Iteration 4
- PROCESSED: {2}
- CANDIDATES: {1}
No merge possible, but treat "1" as if merged.
The FSA after merge:
Iteration 5
- PROCESSED: {1}
- CANDIDATES: {} ... so that's it!
The Top-Down Algorithm for (General) FSAs
This algorithm works for general FSAs (i.e., FSAs that may contain
loops). If the input is guaranteed to be acyclic, keep the following
trade-off in mind: the top-down version needs [linearly] less memory
than the bottom-up version, but it might take substantially longer to
compute unless the 'outgoing arc sharing' tests for the splits (see
below) are implemented in a clever way.
- Initialization
  - Split all the states of the input FSA into two sets according
    to their FINAL status: one set contains the final states, and the
    second the rest (regardless of whether any of them have outgoing
    arcs). If either set is empty, discard it. [If the discarded
    set is an empty set of final states, then the minimized FSA consists
    of a non-final start state only. Not very interesting.]
  - Call the resulting collection of (one or two) sets an
    EQUISET (it defines an equivalence relation on the set of all
    states of the FSA). Call an element of the EQUISET (i.e., a (sub)set
    of states) a CLASS (of states).
- Main loop (continue until the termination condition inside the loop is met):
  - For each CLASS (from the EQUISET), split the CLASS (i.e., a set
    of states) further into one or more subsets of states, such that
    states in the same subset share all of their outgoing arcs (i.e.,
    their arcs point to the same CLASS (not states!) using the same arc
    symbols). [This can be a very slow part of the algorithm if a
    straightforward implementation is used without any search speedup
    techniques.]
  - If no CLASS can be split further, write out the resulting
    FSA (keeping only one representative of each CLASS, and removing
    arcs as appropriate), and stop here. It's done.
  - Otherwise (at least one CLASS has been split into two or
    more subsets), create a new EQUISET containing all the new
    CLASSes that result from the split(s). [Thus, the number of
    CLASSes in the EQUISET is always increasing.]
  - Repeat the Main loop.
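The loop above can be sketched in Python as repeated partition refinement. This is a minimal sketch under stated assumptions: the names `relabel` and `minimize_top_down`, the dictionary-based arc representation, and the integer class ids are illustrative and not part of the course materials. Each state receives a key made of its current CLASS and the set of (symbol, target CLASS) pairs of its arcs; states with equal keys stay together, and the loop stops when the number of classes no longer grows. It is the straightforward version, without any of the search speedups mentioned above.

```python
def relabel(keys):
    """Give each distinct key a small integer id, in order of first appearance."""
    ids = {}
    return {state: ids.setdefault(key, len(ids)) for state, key in keys.items()}


def minimize_top_down(arcs, finals):
    """Return {state: class_id}; states sharing a class id are equivalent.

    arcs:   {state: {symbol: target}}, with an entry (possibly empty) per state
    finals: set of final states
    """
    states = sorted(arcs)
    # Initialization: one class for final states, one for the rest.
    cls = relabel({s: s in finals for s in states})
    while True:
        # Key = (current class, arcs expressed as symbol -> target CLASS).
        keys = {s: (cls[s],
                    frozenset((sym, cls[t]) for sym, t in arcs[s].items()))
                for s in states}
        new_cls = relabel(keys)
        # The key contains the old class, so classes can only split;
        # an unchanged class count therefore means no split happened.
        if len(set(new_cls.values())) == len(set(cls.values())):
            return new_cls
        cls = new_cls


# Hypothetical cyclic input: 1 and 2 form a loop and are both final.
demo_arcs = {0: {"a": 1}, 1: {"a": 2}, 2: {"a": 1}}
demo_classes = minimize_top_down(demo_arcs, finals={1, 2})
print(demo_classes)
```

Unlike the bottom-up procedure, nothing here depends on the FSA being acyclic, which is why the toy input contains a loop.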
Example (top-down algorithm)
Input FSA
Final states are shown in green (2,3,5,8,9). The only input symbols are a and b.
The FSA thus recognizes exactly the following strings:
a,
b,
bb,
bbba,
bbbb,
baaa,
baab,
ab,
abba,
abbb,
aaaa,
aaab.
Initial Step
Initial split: {2,3,5,8,9} (final states, C1), {1,4,6,7,10} (non-final
states, C2).
Iteration 1
Transition table: (orig. state) x (symbol) -> CLASS (for C1):
Orig. state: Symbol | 2 | 3 | 5 | 8 | 9 |
| a | C2 | C2 | C2 | - | - |
| b | C1 | C1 | C2 | - | - |
Splitting C1 into: {2,3} (new name for next iteration: C1),
{5} (C2), and {8,9} (C3).
Transition table: (orig. state) x (symbol) -> CLASS (for C2):
Orig. state: Symbol | 1 | 4 | 6 | 7 | 10 |
| a | C1 | C2 | C1 | C1 | - |
| b | C1 | - | C1 | C1 | - |
Splitting C2 into: {1,6,7} (new name for next iteration: C4),
{4} (C5), and {10} (C6).
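The two tables of this iteration can be reproduced with a single split step. The sketch below is illustrative only: the function name `split_once` and the list-of-sets representation of the EQUISET are assumptions, and the arcs of the example FSA are reconstructed from the tables on this page (which of the states 8 and 9 each arc out of 6 and 7 enters is an assumption). Each state's table column becomes a tuple of (symbol, class index) pairs, and states with equal columns stay in the same subset.

```python
# Example FSA, reconstructed from the text (arcs into 8 and 9 are an assumption).
ARCS = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5},
        4: {"a": 6}, 5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 9},
        7: {"a": 8, "b": 9}, 8: {}, 9: {}, 10: {}}


def split_once(arcs, classes):
    """Split every CLASS by the CLASSes that its members' arcs lead to."""
    class_of = {s: i for i, members in enumerate(classes) for s in members}
    result = []
    for members in classes:
        groups = {}
        for s in sorted(members):
            # One table column: the target CLASS for every symbol with an arc.
            column = tuple(sorted((sym, class_of[t])
                                  for sym, t in arcs[s].items()))
            groups.setdefault(column, []).append(s)
        result.extend(set(group) for group in groups.values())
    return result


initial = [{2, 3, 5, 8, 9}, {1, 4, 6, 7, 10}]   # final / non-final states
after_1 = split_once(ARCS, initial)
after_2 = split_once(ARCS, after_1)
print(after_1)
print(after_2)
```

Under these assumptions, `after_1` lists the six classes named C1 to C6 above, in that order.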
Iteration 2
Transition table: (orig. state) x (symbol) -> CLASS (for C1):
Orig. state: Symbol | 2 | 3 |
| a | C5 | C5 |
| b | C2 | C2 |
No further split possible (good :-)). Gets the name C1 (again).
NB: even though no split is possible now, the CLASS might split
in the future, if any of the classes on the other end of outgoing
arcs do split.
Transition table: (orig. state) x (symbol) -> CLASS (for C2):
Orig. state: Symbol | 5 |
| a | C6 |
| b | C4 |
Obviously, any CLASS containing one state only may be
safely left intact for the remaining time, except it always gets a new
class name: C2 ({5}; not quite 'new' in this case, but...)
Transition table: (orig. state) x (symbol) -> CLASS (for C3):
Orig. state: Symbol | 8 | 9 |
| a | - | - |
| b | - | - |
No further split possible. Name for next iteration: C3 ({8,9}).
Transition table: (orig. state) x (symbol) -> CLASS (for C4):
Orig. state: Symbol | 1 | 6 | 7 |
| a | C1 | C3 | C3 |
| b | C1 | C3 | C3 |
Splitting C4 into: {1} (new name for next iteration: C4),
{6,7} (C5).
Since C5 and C6 already contain only one state ({4} and {10},
respectively), assign them only a new name for the next iteration:
C5->C6 ({4}), C6->C7 ({10}).
Iteration 3
Only tables with 2 or more columns shown:
Transition table: (orig. state) x (symbol) -> CLASS (for C1):
Orig. state: Symbol | 2 | 3 |
| a | C6 | C6 |
| b | C2 | C2 |
No further split possible. Name for next iteration: C1.
Transition table for C2 not shown (1 state only: {5}). New name: C2.
Transition table: (orig. state) x (symbol) -> CLASS (for C3):
Orig. state: Symbol | 8 | 9 |
| a | - | - |
| b | - | - |
No further split possible. Name for next iteration: C3.
Transition table for C4 not shown (1 state only: {1}). New name: C4.
Transition table: (orig. state) x (symbol) -> CLASS (for C5):
Orig. state: Symbol | 6 | 7 |
| a | C3 | C3 |
| b | C3 | C3 |
No further split possible. Name for next iteration: C5.
Transition table for C6 not shown (1 state only: {4}). New name: C6.
Transition table for C7 not shown (1 state only: {10}). New name: C7.
No class split during this iteration - termination
condition met.
Write out the resulting FSA, picking one state from each of the
sets {2,3}, {6,7}, and {8,9} as the representative of its
CLASS. Ignore arcs going nowhere (after the nodes are deleted).
Alternatively, simply consider the CLASSes to be states,
number them (easy!) and write out the resulting FSA:
Resulting minimal FSA (using CLASSes as states):
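Writing out the FSA with CLASSes as states can be sketched as follows. The class names C1 to C7 are the ones reached in the last iteration above; everything else is an assumption made for illustration: the function name `quotient_fsa`, the dictionary representation, state 1 as the start state, and the arcs of the input FSA, which are reconstructed from the tables on this page (which of the states 8 and 9 each arc out of 6 and 7 enters is assumed).

```python
# Example FSA, reconstructed from the text (arcs into 8 and 9 are an assumption).
ARCS = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5},
        4: {"a": 6}, 5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 9},
        7: {"a": 8, "b": 9}, 8: {}, 9: {}, 10: {}}
FINALS = {2, 3, 5, 8, 9}
# Final CLASSes of the example: state -> class name.
CLASS_OF = {2: "C1", 3: "C1", 5: "C2", 8: "C3", 9: "C3",
            1: "C4", 6: "C5", 7: "C5", 4: "C6", 10: "C7"}


def quotient_fsa(arcs, finals, start, class_of):
    """Build the FSA whose states are the CLASSes.

    Assumes class_of is a stable partition (no CLASS can be split further),
    so all members of a CLASS agree on every (symbol, target CLASS) pair.
    """
    new_arcs = {c: {} for c in set(class_of.values())}
    for state, outgoing in arcs.items():
        for symbol, target in outgoing.items():
            new_arcs[class_of[state]][symbol] = class_of[target]
    new_finals = {class_of[s] for s in finals}
    return new_arcs, new_finals, class_of[start]


min_arcs, min_finals, min_start = quotient_fsa(ARCS, FINALS, 1, CLASS_OF)
for name in sorted(min_arcs):
    print(name, "final" if name in min_finals else "non-final", min_arcs[name])
```

Under these assumptions the result has seven states, with C4 as the start state and C1, C2, and C3 as the final states.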