Regular Expression to NFA
Topics
Purpose
-
In this unit, our goal is to show that any regular expression
r can be transformed into an "equivalent" NFA N
where Lr = L(N).
- High Level Comments/Question
The Constructive Proof
- Preliminary Question
- What will we perform induction on in this proof?
- Any regular expression has finite length.
- In particular, any regular expression applies the three operators
*, +, and concatenation a finite number of times.
- We will prove this result by induction on the number of operators
in r
- We will prove that any regular expression r can be
transformed into an NFA N such that
Lr = L(N) by induction on the number of
operators in r.
- The Basis Step
- The number of operators is 0
- Given a finite alphabet
,
there are only a finite number of regular expressions with 0
operators (how many?).
- Below, we list each such regular expression which you can
click on to see the appropriate NFA.
- The Inductive Step
- The Inductive Hypothesis (strong induction)
- We assume that any regular expression rwith between 0 and
n operators can be transformed into an equivalent
NFA N where Lr = L(N) for
n
0.
- The Statement to Be Proven in the Inductive Step
- Any regular expression rwith
n+1 operators can be transformed into an equivalent
NFA N where Lr = L(N) for
n
0.
- The proof of the Inductive Step
- Let r be any regular expression with n+1 operators where
n
0.
- Examining the precedence of the n+1 operators, one of the
operators must be the "last" to be applied.
- This leads to three cases depending on which operator this last
operator is:
- concatenation:
- In this case, r = st where both s and t
are regular expressions with at most n operators.
(Why?)
- Since both s and t
are regular expressions with at most n operators,
we can apply the induction hypothesis to conclude that
there exists two NFAs Ns and
Nt such that
Ls = L(Ns) and
Lt = L(Nt).
- We now show how NFAs
Ns and Nt
can be combined to form an NFA
Nr such that
Lr = L(Nr).
- The NFA Nr will be the NFAs
Ns and Nt connected as follows:
- The initial state of NFA Nr will be the
initial state of NFA Ns.
- The set of final states of NFA Nr will be the
set of final states of NFA Nt.
- The set of transitions of NFA Nr will be the
set of transitions of NFAs
Ns and Nt plus
transitions between all the final states of
NFA Ns to the initial state of
Nt.
- Click here for a picture
(not yet active).
- Argument that Lr = L(Nr)
(not yet filled in).
- + (Choice):
- In this case, r = s+t where both s and t
are regular expressions with at most n operators.
- Since both s and t
are regular expressions with at most n operators,
we can apply the induction hypothesis to conclude that
there exists two NFAs Ns and
Nt such that
Ls = L(Ns) and
Lt = L(Nt).
- We now show how NFAs
Ns and Nt
can be combined to form an NFA
Nr such that
Lr = L(Nr).
- The NFA Nr will be the NFAs
Ns and Nt connected as follows:
- The initial state of NFA Nr will be a new
state q.
- The set of final states of NFA Nr will be the
set of final states of NFA Ns and the
set of final states of NFA Nt.
- The set of transitions of NFA Nr will be the
set of transitions of NFAs
Ns and Nt plus
transitions between q and the initial state of both
NFA Ns and NFA
Nt.
- Click here for a picture
(not yet active).
- Argument that Lr = L(Nr)
(not yet filled in).
- * (Closure):
- In this case, r = s* where s is
a regular expression with n operators.
- Since s
is a regular expression with n operators,
we can apply the induction hypothesis to conclude that
there exists an NFA Ns such that
Ls = L(Ns).
- We now show how NFA
Ns
can be altered to form an NFA
Nr such that
Lr = L(Nr).
- The NFA Nr will be NFA
Ns modified as follows:
- The initial state of NFA Nr will be a new
state q.
- The set of final states of NFA Nr will be the
set of final states of NFA Ns.
- The set of transitions of NFA Nr will be the
set of transitions of NFA
Ns plus
transitions between all the final states of NFA
NFA Ns and the new state q as well
as an
transition between the new state q
and the initial state of the
NFA Ns.
- Click here for a picture
(not yet active).
- Argument that Lr = L(Nr)
(not yet filled in).
Example with Questions:
r = (ab+b)* + (a + ba)a
- How many operators are in the regular
expression r?
- Which of the operators of r
is the "last" operator of r?
- Based on the answer to the above question,
what would s and t be?
- Answer the same three questions for
the regular expression s that is at least part
of the answer to question 3 above.
- Applying the above construction
without simplification, what does the
transition diagram for the NFA corresponding to ab
look like?
- Applying the above construction
without simplification, what does the
transition diagram for the NFA corresponding to ab+b
look like?
- Applying the above construction
without simplification, what does the
transition diagram for the NFA corresponding to
(ab+b)* look like?
- Applying the above construction
without simplification, what does the
complete transition diagram for the NFA N look like?
- How does this proof fit into the
generic element proof technique
framework, and what basic property of regular expressions
did we use?
- Give a recursive algorithm description
(in pseudocode or some programming language)
of the transformation of a regular expression to an equivalent
NFA.
- Complete list of answers to
the above questions.
Regular Languages Chapter (up one level)
Table of Contents (up two levels)