Definition:Regular Expression

From ProofWiki
Jump to navigation Jump to search



Definition

A regular expression is an algebraic structure on an alphabet $\Sigma$ defined as follows:

  • The empty-set regular expression, $\O$, is a regular expression.
  • The empty-word regular expression, $\epsilon$, is a regular expression.
  • Every $\sigma \in \Sigma$ is a regular expression. (These are called literals.)
  • If $R_1$ and $R_2$ are regular expressions, $R_1 R_2$ is a regular expression (concatenation).
  • If $R_1$ and $R_2$ are regular expressions, $R_1 \mid R_2$ is a regular expression (alternation).
  • If $R$ is a regular expression, $R^*$ is a regular expression (Kleene star).


Every regular expression $R$ on an alphabet $\Sigma$ defines a language $\map L R \subseteq \Sigma^*$, where $\Sigma^*$ is the set of all (finite-length) words (sequences) of symbols in $\Sigma$:

  • $\map L \O = \O$ (the empty set).
  • $\map L \epsilon = \set {\sqbrk \,}$ (the set containing only the empty word).
  • If $R$ is a literal $\sigma$, $\map L R = \set {\sqbrk \sigma}$ (i.e., the set containing only the single-symbol word “$\sigma$”).
  • If $R$ is a concatenation $R_1 R_2$, $\map L R = \set {w_1 w_2: w_1 \in \map L {R_1}, w_2 \in \map L {R_2} }$, where $w_1 w_2$ is the concatenation of the words $w_1$ and $w_2$.
  • If $R$ is an alternation $R_1 \mid R_2$, $\map L R = L {R_1} \cup \map L {R_2}$.
  • If $R$ is a Kleene star $R_1^*$, $\map L R$ is the smallest set satisfying the following:
    • $\sqbrk \, \in \map L R$ (the empty word is in the set);
    • if $w_1 \in \map L R$ and $w_2 \in \map L {R_1}$, then $w_1 w_2 \in \map L R$.