Step-by-step Conversion of Regular Expressions to C Code

On the regular expression: ((a⋅b)|c)*

THOMPSON'S CONSTRUCTION
Convert the regular expression to an NFA. The subexpressions are labelled r1 = a, r2 = b, r3 = r1⋅r2, r4 = c, r5 = r3|r4.

Step 1: construct NFA for r1 = a.
    1 --a--> 2

Step 2: construct NFA for r2 = b.
    3 --b--> 4

Step 3: construct NFA for r3 = a⋅b. The accept state of r1 is merged with the start state of r2, so state 3 disappears.
    1 --a--> 2 --b--> 4

Step 4: construct NFA for r4 = c.
    5 --c--> 6

Step 5: construct NFA for r5 = (a⋅b)|c. New start state 7 and new accept state 8 are added.
    7 --𝜀--> 1    1 --a--> 2 --b--> 4    4 --𝜀--> 8
    7 --𝜀--> 5    5 --c--> 6             6 --𝜀--> 8

Step 6: construct NFA for r5*. New start state 9 and new accept state 10 are added.
    9 --𝜀--> 7     9 --𝜀--> 10
    7 --𝜀--> 1     7 --𝜀--> 5
    1 --a--> 2     2 --b--> 4     5 --c--> 6
    4 --𝜀--> 8     6 --𝜀--> 8
    8 --𝜀--> 7     8 --𝜀--> 10

SUBSET CONSTRUCTION
Convert the NFA to a DFA.

Draw transition table for DFA (Dstates). Its columns are: NFA States, DFA State, and Next State on a, b, c.

Add 𝜀-closure(9) = {9,7,1,5,10} as DFA start state A.

Subset construction: algorithm

    while (there is an unmarked state T in Dstates) {
        mark T;
        for (each input symbol a) {
            U = 𝜀-closure(move(T, a));
            Dtran[T, a] = U;
            if (U is not in Dstates)
                add U as unmarked state to Dstates;
        }
    }

Mark state A.
    Compute 𝜀-closure(move(A, a)) = 𝜀-closure({2}) = {2}. New state B.
    Compute 𝜀-closure(move(A, b)). move(A, b) is empty, so there is no transition (-).
    Compute 𝜀-closure(move(A, c)) = 𝜀-closure({6}) = {6,8,10,7,1,5}. New state C.

Mark B.
    Compute 𝜀-closure(move(B, a)). No transition (-).
    Compute 𝜀-closure(move(B, b)) = 𝜀-closure({4}) = {4,8,7,1,5,10}. New state D.
    Compute 𝜀-closure(move(B, c)). No transition (-).

Mark C.
    Compute 𝜀-closure(move(C, a)) = {2} = B.
    Compute 𝜀-closure(move(C, b)). No transition (-).
    Compute 𝜀-closure(move(C, c)) = {6,8,10,7,1,5} = C.

Mark D.
    Compute 𝜀-closure(move(D, a)) = {2} = B.
    Compute 𝜀-closure(move(D, b)). No transition (-).
    Compute 𝜀-closure(move(D, c)) = {6,8,10,7,1,5} = C.

No unmarked states remain, so Dstates is complete:

    NFA States        DFA State    Next State
                                   a    b    c
    {9,7,1,5,10}      A            B    -    C
    {2}               B            -    D    -
    {6,8,10,7,1,5}    C            B    -    C
    {4,8,7,1,5,10}    D            B    -    C

Draw DFA
Start state: A. Accepting states: A, C and D (each contains the NFA accept state 10).
    A --a--> B    A --c--> C
    B --b--> D
    C --a--> B    C --c--> C
    D --a--> B    D --c--> C

TRANSLATION TO C
Convert the DFA into C code.
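/* Returns 1 if the whole string matches ((a⋅b)|c)*, 0 otherwise.
   Each label is one DFA state; next points at the unread input. */
int match(const char *next) {
    goto A;

A:  /* start state, accepting */
    if (*next == '\0') return 1;
    if (*next == 'a') { next++; goto B; }
    if (*next == 'c') { next++; goto C; }
    return 0;

B:  /* an 'a' has been read and a 'b' must follow; not accepting */
    if (*next == '\0') return 0;
    if (*next == 'b') { next++; goto D; }
    return 0;

C:  /* accepting */
    if (*next == '\0') return 1;
    if (*next == 'a') { next++; goto B; }
    if (*next == 'c') { next++; goto C; }
    return 0;

D:  /* accepting */
    if (*next == '\0') return 1;
    if (*next == 'a') { next++; goto B; }
    if (*next == 'c') { next++; goto C; }
    return 0;
}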
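
The goto version hard-codes each row of the DFA transition table as a block of if statements. As a point of comparison, the same table could instead be stored as data and walked by a loop. The following is only a minimal sketch of that alternative, not part of the original translation: the names match_table, symbol_index, dtran and accepting are hypothetical, and the extra ST_DEAD state is an assumption introduced here to stand in for the "-" entries.

```c
/* DFA states from the transition table. ST_DEAD is a hypothetical
   extra state (not in the original table) that represents "-". */
enum { ST_A, ST_B, ST_C, ST_D, ST_DEAD, NUM_STATES };

/* Map an input character to a table column: a=0, b=1, c=2.
   Any other character is outside the alphabet and yields -1. */
static int symbol_index(char ch) {
    switch (ch) {
    case 'a': return 0;
    case 'b': return 1;
    case 'c': return 2;
    default:  return -1;
    }
}

/* Dtran from the subset construction, one row per DFA state. */
static const int dtran[NUM_STATES][3] = {
    /* a        b        c       */
    { ST_B,    ST_DEAD, ST_C    },  /* A */
    { ST_DEAD, ST_D,    ST_DEAD },  /* B */
    { ST_B,    ST_DEAD, ST_C    },  /* C */
    { ST_B,    ST_DEAD, ST_C    },  /* D */
    { ST_DEAD, ST_DEAD, ST_DEAD },  /* DEAD */
};

/* A, C and D contain NFA state 10, so they are accepting. */
static const int accepting[NUM_STATES] = { 1, 0, 1, 1, 0 };

/* Returns 1 if the whole string matches ((a.b)|c)*, 0 otherwise. */
int match_table(const char *next) {
    int state = ST_A;
    for (; *next != '\0'; next++) {
        int sym = symbol_index(*next);
        if (sym < 0) return 0;            /* character not in {a,b,c} */
        state = dtran[state][sym];
        if (state == ST_DEAD) return 0;   /* no transition: reject early */
    }
    return accepting[state];
}
```

Under these assumptions, match_table("abcab") follows A, B, D, C, B, D and returns 1, while match_table("a") stops in B and returns 0. The table form keeps the code the same size as the DFA grows; the goto form avoids the table lookup and keeps the current state implicit in the program counter.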