# FSA Minimization Algorithms


• The bottom-up algorithm (acyclic FSAs only).
• The top-down algorithm (general FSAs).

## The Bottom-Up Algorithm for Acyclic FSAs

• Initialization
  • Find all states with no outgoing arcs. (Since the FSA is acyclic, at least one such state always exists.)
  • Split the resulting set into one or two sets according to their FINAL status (if the set contains only final states, or only non-final states, you get a single set). Collapse each set into one new state, re-linking all affected arcs accordingly. Call the resulting (one- or two-element) set PROCESSED.
• Main loop (continue until the termination condition inside the loop is met):
  • Find all predecessors of the states in the PROCESSED set all of whose arcs lead to states that have been PROCESSED at some point in the past (no matter how many iterations ago). Call the resulting set CANDIDATES.
  • If the CANDIDATES set is empty, write out the resulting FSA and stop here. It's done.
  • If it is not empty, partition the CANDIDATES into subsets of states that have the same FINAL status (all of them are either final or non-final) and share all outgoing arcs (i.e., their arcs point to the same states using the same arc symbols). [Some subsets might be of size 1, i.e., some states cannot be merged with any other state from the CANDIDATES set but 'with itself'.]
  • Merge the states in each of these subsets, reattaching arcs as appropriate. [For subsets of size 1, obviously no arc reattachment is necessary, but consider such states as merged, too.]
  • Call the resulting set of merged states PROCESSED (including those resulting from size-1 subsets, i.e., trivial do-nothing merges), and repeat the Main loop.
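The steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch under our own assumptions: the FSA is deterministic and acyclic, states are integers, and arcs are stored in a hypothetical dictionary `arcs[state][symbol] = target`. The function name `minimize_bottom_up` and the convention that a merged state is named after its smallest member are ours (the page names merged states freely, e.g. "9" for {8,9}). The example FSA at the end is reconstructed from the tables of the top-down example on this page; arc targets the page leaves open (e.g. whether 6 and 7 lead to 8 or 9) are our choice and change only the state names of the result.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (our assumption): arcs[state][symbol] = target state.
Arcs = Dict[int, Dict[str, int]]


def minimize_bottom_up(states: Set[int], arcs: Arcs,
                       finals: Set[int]) -> Tuple[Set[int], Arcs, Set[int]]:
    """Bottom-up minimization of an ACYCLIC deterministic FSA (illustrative sketch)."""
    rep: Dict[int, int] = {}  # original state -> representative of its merged group

    def merge(group: Set[int]) -> Set[int]:
        # States merge if they have the same FINAL status and the same outgoing arcs
        # (same symbols leading to the same, already merged, target states).
        buckets: Dict[tuple, List[int]] = {}
        for s in sorted(group):
            key = (s in finals,
                   tuple(sorted((sym, rep[t]) for sym, t in arcs.get(s, {}).items())))
            buckets.setdefault(key, []).append(s)
        merged: Set[int] = set()
        for members in buckets.values():
            for s in members:
                rep[s] = members[0]  # the smallest member names the merged state
            merged.add(members[0])
        return merged

    # Initialization: states without outgoing arcs, split by FINAL status, collapsed.
    processed = merge({s for s in states if not arcs.get(s)})

    # Main loop. "PROCESSED in the past" == the state already has a representative.
    while True:
        candidates = {
            s for s in states
            if s not in rep
            and any(rep.get(t) in processed for t in arcs.get(s, {}).values())
            and all(t in rep for t in arcs.get(s, {}).values())
        }
        if not candidates:
            break
        processed = merge(candidates)

    # Write out the result: one state per group, arcs redirected to representatives.
    new_states = set(rep.values())
    new_arcs = {r: {sym: rep[t] for sym, t in arcs.get(r, {}).items()} for r in new_states}
    return new_states, new_arcs, finals & new_states


# Example FSA reconstructed from this page (some arc targets are our assumption).
states = set(range(1, 11))
arcs = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6},
        5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 8}, 7: {"a": 9, "b": 9}}
finals = {2, 3, 5, 8, 9}
new_states, new_arcs, new_finals = minimize_bottom_up(states, arcs, finals)
print(sorted(new_states))  # [1, 2, 4, 5, 6, 8, 10]
```

The groups found are {8,9}, {10}, {6,7}, {4}, {5}, {2,3} and {1}, i.e., the same seven states that the example below arrives at by hand.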

## Example (bottom-up algorithm)

#### Input FSA

Final states are shown in green (2, 3, 5, 8, 9); state 1 is the start state. The only input symbols are a and b.

The FSA thus recognizes exactly the following strings:
a, b, bb, bbba, bbbb, baaa, baab, ab, abba, abbb, aaaa, aaab.
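The following sketch encodes the input FSA as a Python dictionary and enumerates its language, as a check of the encoding against the list above. The encoding is our reconstruction from the transition tables of the top-down example on this page; the exact targets of a few arcs (e.g. whether 6 and 7 lead to 8 or 9, or which of 2 and 3 is reached on a from state 1) are our assumptions. Because the FSA is acyclic, a plain depth-first search over all paths terminates.

```python
# Reconstructed example FSA (hypothetical encoding): ARCS[state][symbol] = target.
ARCS = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6},
        5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 8}, 7: {"a": 9, "b": 9}}
FINALS = {2, 3, 5, 8, 9}
START = 1


def language(arcs, finals, start):
    """Enumerate all accepted strings of an ACYCLIC FSA by depth-first search."""
    accepted = set()
    stack = [(start, "")]
    while stack:
        state, prefix = stack.pop()
        if state in finals:
            accepted.add(prefix)
        for sym, target in arcs.get(state, {}).items():
            stack.append((target, prefix + sym))
    return accepted


print(sorted(language(ARCS, FINALS, START), key=lambda w: (len(w), w)))
# ['a', 'b', 'ab', 'bb', 'aaaa', 'aaab', 'abba', 'abbb', 'baaa', 'baab', 'bbba', 'bbbb']
```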

#### Initial Step

Nodes with no outgoing arcs, grouped according to their FINAL status:
• Final: {8,9}
• Not Final: {10}

Call the two new states "9" (for {8,9}) and "10" (for {10}). The PROCESSED set contains {9,10}. The FSA after merge:

#### Iteration 1

• PROCESSED: {9,10}
• CANDIDATES: {6,7} (but not 5, since there is an arc from 5 to 7, and state 7 has not been PROCESSED yet)
Now the CANDIDATES set is merged into a single state (let's call it "6"), since the pattern of outgoing arcs from 6 and 7 is the same. The FSA after merge (states which have been PROCESSED in the past have a thick circle around them):

#### Iteration 2

• PROCESSED: {6} (yes, just state 6, i.e., the state merged in the previous iteration; 9 and 10 remain marked as "PROCESSED in the past", but there is no need to include them in the PROCESSED set again)
• CANDIDATES: {4,5} (now finally 5 has made it, because both 10 as well as 6 have been PROCESSED in the past)
The states 4 and 5 (CANDIDATES) cannot be merged: not only does 4 have no outgoing arc labeled b, but 5 is also final whereas 4 is not. They have to be kept separate (let's keep their names, "4" and "5"). The FSA after merge:

#### Iteration 3

• PROCESSED: {4,5}
• CANDIDATES: {2,3}
States 2 and 3 can be merged (say, into "2"): both are final, both arcs labeled a go to state 4, and both arcs labeled b go to state 5. The FSA after merge:

#### Iteration 4

• PROCESSED: {2}
• CANDIDATES: {1}
No merge possible, but treat "1" as if merged. The FSA after merge:

#### Iteration 5

• PROCESSED: {1}
• CANDIDATES: {} ... so that's it!
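The merge test applied to the CANDIDATES in Iterations 2 and 3 compares two things: the FINAL status and the set of (symbol, target) arcs. A minimal Python sketch follows; the helper names `signature` and `can_merge` are hypothetical, the arcs are those of the candidates after the earlier merges (state 7 already merged into "6"), and the target of the a-arc of state 4 is our reconstruction.

```python
def signature(state, arcs, finals):
    """FINAL status plus the set of outgoing (symbol, target) arcs."""
    return (state in finals, frozenset(arcs.get(state, {}).items()))


def can_merge(p, q, arcs, finals):
    # Two CANDIDATES may be merged iff their signatures are identical.
    return signature(p, arcs, finals) == signature(q, arcs, finals)


# Arcs of the CANDIDATES of Iterations 2 and 3 after the earlier merges;
# only the states involved are listed (hypothetical encoding).
ARCS = {2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6}, 5: {"a": 10, "b": 6}}
FINALS = {2, 3, 5}

print(can_merge(4, 5, ARCS, FINALS))  # False: 5 is final and has a b-arc, 4 has neither
print(can_merge(2, 3, ARCS, FINALS))  # True: both final, a -> 4, b -> 5
```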

## The Top-down Algorithm for (General) FSAs

This algorithm works for general FSAs (i.e., including FSAs with loops). If the input is guaranteed to be acyclic, either algorithm can be used, and the following trade-off has to be taken into consideration: the top-down version needs [linearly] less memory than the bottom-up version, but it might take substantially longer to compute, unless the 'outgoing arc sharing' tests for the splits (see below) are implemented in a clever way.
• Initialization
  • Split all the states of the input FSA into two sets according to their FINAL status: one set contains the final states, the other the rest (regardless of whether the states have any outgoing arcs). If either set is empty, discard it. [If the discarded set is the (empty) set of final states, then the minimized FSA consists of a non-final start state only. Not very interesting.]
  • Call the resulting collection of (one or two) sets an EQUISET (it defines an equivalence relation on the set of all states of the FSA). Call an element of the EQUISET (i.e., a (sub)set of states) a CLASS (of states).
• Main loop (continue until the termination condition inside the loop is met):
  • For each CLASS in the EQUISET, split the CLASS further into one or more subsets of states such that states in the same subset share all of their outgoing arcs (i.e., their arcs point to the same CLASS of the current EQUISET (not the same states!) using the same arc symbols). [This can be a very slow part of the algorithm if a straightforward implementation is used without any search speed-up techniques.]
  • If no CLASS can be split further, write out the resulting FSA (keeping only one representative of each CLASS, redirecting arcs that lead into removed states to the representative of their CLASS, and dropping the arcs that leave removed states), and stop here. It's done.
  • Otherwise (at least one CLASS has been split into two or more subsets), create a new EQUISET containing all the new CLASSes resulting from the split(s). [Thus, the number of CLASSes in the EQUISET strictly increases in every iteration; since it can never exceed the number of states, the loop is guaranteed to terminate.]
  • Repeat the Main loop.
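The top-down loop above can be sketched in Python as follows. It is essentially the partition-refinement scheme commonly known as Moore's algorithm. As one way of making the split test less naive, each CLASS is split in a single pass by bucketing its states on the tuple of CLASSes their arcs reach, instead of comparing states pairwise. The dictionary encoding `arcs[state][symbol] = target`, the function name `minimize_top_down`, and the use of `None` for a missing arc are our assumptions; the example FSA is our reconstruction of the one on this page.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Arcs = Dict[int, Dict[str, int]]  # hypothetical encoding: arcs[state][symbol] = target


def minimize_top_down(states: Set[int], arcs: Arcs, finals: Set[int],
                      alphabet: List[str]) -> List[FrozenSet[int]]:
    """Split CLASSes until nothing changes; returns the final EQUISET (sketch)."""
    # Initialization: split by FINAL status and discard an empty set.
    equiset = [frozenset(c) for c in (states & finals, states - finals) if c]
    while True:
        # Index of the CLASS that each state belongs to in the current EQUISET.
        class_of = {s: i for i, cls in enumerate(equiset) for s in cls}
        new_equiset: List[FrozenSet[int]] = []
        for cls in equiset:
            # Key = CLASS (not state!) reached on each symbol; None means "no arc".
            buckets: Dict[Tuple[Optional[int], ...], Set[int]] = {}
            for s in cls:
                key = tuple(class_of.get(arcs.get(s, {}).get(sym)) for sym in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_equiset.extend(frozenset(b) for b in buckets.values())
        if len(new_equiset) == len(equiset):  # no CLASS was split: done
            return equiset
        equiset = new_equiset


# Example FSA reconstructed from this page (some arc targets are our assumption).
states = set(range(1, 11))
arcs = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6},
        5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 8}, 7: {"a": 9, "b": 9}}
finals = {2, 3, 5, 8, 9}
classes = minimize_top_down(states, arcs, finals, ["a", "b"])
print(sorted(sorted(c) for c in classes))  # [[1], [2, 3], [4], [5], [6, 7], [8, 9], [10]]
```

Unlike the bottom-up sketch, nothing here assumes acyclicity, so the same function also handles FSAs with loops.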

## Example (top-down algorithm)

#### Input FSA

Final states are shown in green (2, 3, 5, 8, 9); state 1 is the start state. The only input symbols are a and b.

The FSA thus recognizes exactly the following strings:
a, b, bb, bbba, bbbb, baaa, baab, ab, abba, abbb, aaaa, aaab.

#### Initial Step

Initial split: {2,3,5,8,9} (final states, C1), {1,4,6,7,10} (non-final states, C2).

#### Iteration 1

Transition table: (orig. state) x (symbol) -> CLASS (for C1):

| Symbol \ Orig. state | 2 | 3 | 5 | 8 | 9 |
|---|---|---|---|---|---|
| a | C2 | C2 | C2 | - | - |
| b | C1 | C1 | C2 | - | - |

Splitting C1 into: {2,3} (new name for next iteration: C1), {5} (C2), and {8,9} (C3).

Transition table: (orig. state) x (symbol) -> CLASS (for C2):

| Symbol \ Orig. state | 1 | 4 | 6 | 7 | 10 |
|---|---|---|---|---|---|
| a | C1 | C2 | C1 | C1 | - |
| b | C1 | - | C1 | C1 | - |

Splitting C2 into: {1,6,7} (new name for next iteration: C4), {4} (C5), and {10} (C6).
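The two tables of Iteration 1 and the resulting splits can be reproduced with a few lines of Python. This is a sketch: the helper `split_once` is hypothetical, the CLASS names follow this page, and the arcs of the input FSA are our reconstruction from the tables (a few targets the page leaves open are our choice, which does not change the CLASSes reached).

```python
# Example FSA as reconstructed from this page (some arc targets are assumptions).
ARCS = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6},
        5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 8}, 7: {"a": 9, "b": 9}}
# EQUISET after the Initial Step.
EQUISET = {"C1": {2, 3, 5, 8, 9}, "C2": {1, 4, 6, 7, 10}}


def split_once(equiset, arcs, alphabet="ab"):
    """Split every CLASS by its transition-table column (CLASS reached per symbol)."""
    class_name = {s: name for name, members in equiset.items() for s in members}
    result = {}
    for name, members in equiset.items():
        groups = {}
        for s in sorted(members):
            # One column of the table: CLASS reached on a, on b; "-" if there is no arc.
            column = tuple(class_name.get(arcs.get(s, {}).get(sym), "-") for sym in alphabet)
            groups.setdefault(column, []).append(s)
        result[name] = list(groups.values())
    return result


for name, parts in split_once(EQUISET, ARCS).items():
    print(name, "->", parts)
# C1 -> [[2, 3], [5], [8, 9]]
# C2 -> [[1, 6, 7], [4], [10]]
```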

#### Iteration 2

Transition table: (orig. state) x (symbol) -> CLASS (for C1):

| Symbol \ Orig. state | 2 | 3 |
|---|---|---|
| a | C5 | C5 |
| b | C2 | C2 |

No further split possible (good :-)), so the CLASS keeps the name C1. NB: even though no split is possible now, the CLASS might still split in a later iteration, if any of the CLASSes its outgoing arcs point to is split.

Transition table: (orig. state) x (symbol) -> CLASS (for C2):

| Symbol \ Orig. state | 5 |
|---|---|
| a | C6 |
| b | C4 |

Obviously, a CLASS containing only one state can never be split, so it may safely be left intact for the rest of the computation; it only gets a new class name for the next iteration: C2 ({5}; not quite 'new' in this case, but...)

Transition table: (orig. state) x (symbol) -> CLASS (for C3):

| Symbol \ Orig. state | 8 | 9 |
|---|---|---|
| a | - | - |
| b | - | - |

No further split possible. Name for next iteration: C3 ({8,9}).

Transition table: (orig. state) x (symbol) -> CLASS (for C4):

| Symbol \ Orig. state | 1 | 6 | 7 |
|---|---|---|---|
| a | C1 | C3 | C3 |
| b | C1 | C3 | C3 |

Splitting C4 into: {1} (new name for next iteration: C4), {6,7} (C5).

Since C5 and C6 contain only one state each ({4} and {10}, respectively), they only get new names for the next iteration: C5->C6 ({4}), C6->C7 ({10}).

#### Iteration 3

Only tables for CLASSes with two or more states (i.e., two or more columns) are shown:

Transition table: (orig. state) x (symbol) -> CLASS (for C1):

| Symbol \ Orig. state | 2 | 3 |
|---|---|---|
| a | C6 | C6 |
| b | C2 | C2 |

No further split possible. Name for next iteration: C1.

Transition table for C2 not shown (1 state only: {5}). New name: C2.

Transition table: (orig. state) x (symbol) -> CLASS (for C3):

| Symbol \ Orig. state | 8 | 9 |
|---|---|---|
| a | - | - |
| b | - | - |

No further split possible. Name for next iteration: C3.

Transition table for C4 not shown (1 state only: {1}). New name: C4.

Transition table: (orig. state) x (symbol) -> CLASS (for C5):

| Symbol \ Orig. state | 6 | 7 |
|---|---|---|
| a | C3 | C3 |
| b | C3 | C3 |

No further split possible. Name for next iteration: C5.

Transition table for C6 not shown (1 state only: {4}). New name: C6.

Transition table for C7 not shown (1 state only: {10}). New name: C7.

No class split during this iteration - termination condition met.

Write out the resulting FSA by picking one state from each of the sets {2,3}, {6,7}, and {8,9} as the representative of its CLASS. Delete the other states together with their outgoing arcs, and redirect any arc that led to a deleted state to the representative of that state's CLASS.

Alternatively, simply treat the CLASSes as states, number them (easy!), and write out the resulting FSA:

Resulting minimal FSA (using CLASSes as states):
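Writing out the result with CLASSes as states can be sketched as follows: each CLASS from the last iteration becomes one state, its arcs are copied from any member and redirected to CLASSes, a CLASS is final if its members are final, and the start state is the CLASS of the original start state. The dictionary encoding and the reconstructed arcs of the input FSA (a few targets the page leaves open) are our assumptions; the language comparison at the end works only because the FSA is acyclic.

```python
# Input FSA as reconstructed from this page (hypothetical encoding).
ARCS = {1: {"a": 2, "b": 3}, 2: {"a": 4, "b": 5}, 3: {"a": 4, "b": 5}, 4: {"a": 6},
        5: {"a": 10, "b": 7}, 6: {"a": 8, "b": 8}, 7: {"a": 9, "b": 9}}
FINALS = {2, 3, 5, 8, 9}
START = 1
# Final EQUISET of Iteration 3, with the CLASS names used on this page.
CLASSES = {"C1": {2, 3}, "C2": {5}, "C3": {8, 9}, "C4": {1},
           "C5": {6, 7}, "C6": {4}, "C7": {10}}
class_of = {s: name for name, members in CLASSES.items() for s in members}

# All members of a CLASS reach the same CLASSes, so any member's arcs can be used.
min_arcs = {name: {sym: class_of[t] for sym, t in ARCS.get(min(members), {}).items()}
            for name, members in CLASSES.items()}
min_finals = {name for name, members in CLASSES.items() if members & FINALS}
min_start = class_of[START]


def language(arcs, finals, start):
    """All strings accepted by an ACYCLIC FSA (depth-first over all paths)."""
    accepted, stack = set(), [(start, "")]
    while stack:
        state, prefix = stack.pop()
        if state in finals:
            accepted.add(prefix)
        for sym, target in arcs.get(state, {}).items():
            stack.append((target, prefix + sym))
    return accepted


for name, out in min_arcs.items():
    print(name, "final" if name in min_finals else "non-final", out)
print("start:", min_start)
print(language(min_arcs, min_finals, min_start) == language(ARCS, FINALS, START))  # True
```

With this reconstruction, the minimal FSA has seven states (C1 to C7), starts in C4, and its final states are C1, C2 and C3.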