# The Arithmetic Expression Module

[ Reference ]

[AEXP Module Source File: aexp.c. ]
[AEXP Module Source File File: aexp_token.c ]
[AEXP Module Public Header File: aexp.h. ]
[AEXP Module Private Header File: aexpp.h. ]

The Arithmetic Expression (AEXP) Module allows for encapsulation and evaluation of arithmetic expressions in infix notation, such as "2 * (3 + 5)." It incorporates constants, such as PI, and common mathematical functions, such as cosine and natural logarithm. The module implementation is based loosely on the "Interactive Expression Evaluator" described in your textbook, Data structures and Program Design in C, Second Edition, by Robert Kruse et al. (Prentice Hall, New Jersey, 1997).

The overall flow of the logic is as follows:

1. The user instantiates an AEXP object, specifying an arithmetic expression (as a string) and the length of the arithmetic expression. Requiring the length of the expression string gives the user the option of specifying a non-null-terminated string. This is particularly useful in the case where the expression to be evaluated is embedded in a longer string, such as "The value of the expression '3 + (2 * 24)' is."
2. We make a copy of the expression string and store it internally; the internal copy will be null-terminated.
3. We parse the expression string into an array of tokens.
4. The array of tokens is translated into a new array, representing the expression in postfix notation (i.e. Reverse Polish Notation, or RPN).
5. The tokenized expression in postfix form is evaluated.

Following is a more detailed description of the process and its implementation.

### Tokenization

#### Parsing the String

A word is parsed from the string. A word is a sequence of one or more ASCII characters that fall into one of the following patterns:
1. Number
A number is a string of digits optionally including a single period (decimal point). Each of the following is a number:

3       4.1       .77

2. Word
Any sequence of alphanumeric characters beginning with an alphabetic character. (I know, this is confusing; this means that we can have a word word as opposed to, say, a number word.) A word of this type must correspond to a recognizable constant (such as pi) or function (such as sin).
3. Symbol
A sequence of one or two non-alphanumeric, non-whitespace characters.
```typedef struct aexp__word_s
{
const char      *str;
size_t          len;
AEXP__TOKEN_p_t token;
double          value;
} AEXP__WORD_t, *AEXP__WORD_p_t;```

Where:

str
is a pointer to the first character of the word.
len
is the length of the word (note that a word is not null-terminated).
token
is a pointer to a struct that describes the token corresponding to a word (assuming the word corresponds to a valid token; see below).
value
is the value of the word. The value of a number word or of a word corresponding to a constant (such as pi) is the value of the number or constant. The value of a symbol word or a word corresponding to a function, is the value obtained by evaluating that portion of the expression. A special case is made for left- and right- parenthesis symbol words; these words are discarded in the process that converts the prefix token array into a postfix token array, and are never assigned a value. For example, the expression:
`3 + cos( 0 )`
consists of 6 words:
• 3 and 0 have the values 3 and 0;
• cos has the value of the cosine of 0, which is 1;
• + has the value of 3 + 1, or 4;
• ( (left-parenthesis) and ) (right-parenthesis) have no value.

#### Token Mapping

```typedef struct aexp__token_s
{
const char*          name;
AEXP__TOKEN_TYPE_e_t type;
int                  priority;
double               value;
AEXP__EVALUATE_p_t   evaluate;
} AEXP__TOKEN_t, *AEXP__TOKEN_p_t;```

Where:

name
is the name of the token. Note that name may consist of a symbol, such as "*" (for the multiplication operator token) as well as more recognizable names such as "pi" and "radians." The special name "x" is used for all numeric constants (e.g. 10 and 3.14). The unary plus and unary minus operators have special internal names denoted by AEXP__UP and AEXP__UM, respectively.
type
specifies the type of the token. Possible values (as declared in aexpp.h) are:
AEXP__NONE
This value is only used to label invalid tokens.
AEXP__END_EXPR
This value is presently not used by the module.
AEXP__LEFT_PAREN
Denotes a left parenthesis.
AEXP__RIGHT_PAREN
Denotes a right parenthesis.
AEXP__UNARY_OP
Denotes a unary operator. Unary plus, unary minus and, presently, all functions have this type.
AEXP__BINARY_OP
Denotes a binary operator.
AEXP__OPERAND
This value applies to the internal constants e and pi, and to all numeric constants.
priority
the priority of the token. Priority is only relevant to tokens of type AEXP__UNARY_OP and AEXP__BINARY_OP. All other tokens are arbitrarily assigned a priority of 0. The priority of the operators follows the usual conventions:
• The highest priority is assigned to unary operators (note that, in this context, a function such as sin is considered a unary operator). All unary operators have the same priority.
• Exponentiation has the second highest priority.
• The third highest priority is assigned to the multiplication and division operators.
• The lowest priority is given to the addition and subtraction operators.
value
is the value of the token, if the token has an intrinsic value. For example, the token pi has a value beginning with 3.14; the symbols for addition and multiplication, + and *, on the other hand, have no intrinsic value.
evaluate
is the address of a function that is used to evaluate the subexpression consisting of an operator and its operand(s). Tokens that are not operators have an evaluate value of NULL.

Following is a portion of the table of token values used by the module; the full table can be found in aexp_token.c:

```static AEXP__TOKEN_t tokens_[] =
{
{ "(",       AEXP__LEFT_PAREN,  0, 0,        null_eval   },
{ ")",       AEXP__RIGHT_PAREN, 0, 0,        null_eval   },
{ AEXP__UM,  AEXP__UNARY_OP,    7, 0,        negate_eval },
{ AEXP__UP,  AEXP__UNARY_OP,    7, 0,        pos_eval    },
{ "abs",     AEXP__UNARY_OP,    7, 0,        abs_eval    },
{ "sqrt",    AEXP__UNARY_OP,    7, 0,        sqrt_eval   },
. . .
{ "**",      AEXP__BINARY_OP,   6, 0,        raise_eval  },
{ "x",       AEXP__OPERAND,     0, 0.0,      null_eval   },
{ "pi",      AEXP__OPERAND,     0, AEXP__PI, null_eval   },
{ "e",       AEXP__OPERAND,     0, AEXP__E,  null_eval   }
};```

### Conversion to Postfix Form

Infix to Postfix Conversion

```Infix expression:   3  +  4  *  2
Postfix expression: 3  4  2  *  +
(Priority of * forces 4 * 2 to evaluate first.)

Infix expression:   (3  +  4)  *  2
Postfix expression: 3  4  +  2  *
(Parentheses force 3 + 4 to evaluate first.)

Infix expression:   (3  +  4)  *  (2  -  1)
Postfix expression: 3  4  +  2  1  -  *

Infix expression:   3  *  log(  10  )
Postfix expression: 3  10  log  *
(Priority of unary operator log forces log( 10 ) to evaluate first.)

Infix expression:   log(  10  )  *  3
Postfix expression: 10  log  3  *```

To perform the conversion we will use a stack and a simplified form of Edger Dijkstra's "Shunting Yard" algorithm which is expressed (absent error checking) as follows:

```while infix array is not traversed
next token
if token is an operand
else if token is a left parenthesis
push token onto stack
else if token is a right parenthesis
pop token from stack
while token is not a left paren
pop token from stack
(note: left and right parens are discarded)
else
(token must be an operator)
set termination flag to false
while !flag AND stack is not empty
peek at top of stack
if top-token is a left parenthesis
set flag to TRUE
else if top-token priority < token priority
set flag to TRUE
else
pop top-token from stack
(end while !flag AND stack is not empty)
push token onto stack
(end while infix array is not traversed)
while stack is not empty
pop token from stack

The complete implementation of this algorithm is found in aexp.c

### Evaluation of the Postfix Expression

Once the expression has been successfully converted to postfix form it may be evaluated. The algorithm for doing so is relatively straightforward; once again, a stack will be employed; we will also need a temporary array, (denoted ops) suitable for holding two tokens to store operands pending subexpression evaluation. The algorithm (with error checking omitted) is as follows:
```while the postfix array is not traversed
next token
if token is an operand
push the token's value onto stack
else
(token must be an operator)
if token is a binary operator
pop a token from the stack to ops[1]
pop a token from the stack to ops[0]
invoke the token's evaluator passing ops
push the resulting value onto the stack
(end while postfix array is not traversed)
(Note: there should be exactly one value left on the stack.)
pop the last value off the stack
flag this value as the value of the expression```

### Expression Representation

```typedef struct aexp__expr_s
{
char        *expr_str;
size_t      len;
CDA_BOOL_t  valid;
VEC_ID_t    infix;
VEC_ID_t    postfix;
double      value;
int         errnum;
} AEXP__EXPR_t, *AEXP__EXPR_p_t;```

Where:

expr_str
is a pointer to a buffer that holds the expression string. This buffer is allocated at the time the expression struct is created; the string provided by the caller is copied into the buffer.
len
is the length of the expression string.
valid
is initially true and is set to false any time the expression is determined to be invalid.
infix
is the array (expressed as a vector) to hold the tokens parsed from the expression string.
postifx
is the array (expressed as a vector) to hold the tokens in the postfix form of the expression string (see Conversion to Postfix Form ).
value
holds the value of the expression.
if parsing fails, points to the first character that can't be parsed in expr_str.
errnum
if an evaluation error occurs, holds the value of errno at the time the evaluation fails.

### Notes on Error Checking

Extensive error checking takes place while parsing and evaluating an expression string. Below are a few notes regarding this process.

1. Floating Point Numbers
When checking floating point numbers (type double) it is necessary to make sure that the number is valid before using it in an operation, such as comparing it to zero. This is done with the isnan facility (for is not a number) declared in math.h. In Microsoft, this facility is named _isnan, so
aexpp.h declares the macro ISNAN which is equivalent to _isnan in Microsoft environments, but equivalent to isnan elsewhere:
```// isnan in Microsoftland is _isnan
#if defined( _WIN32 )
#define ISNAN _isnan
#else
#define ISNAN isnan
#endif```
2. Once parsing is successfully completed, and before postfix conversion takes place, the infix expression must undergo the following two validations:

A leading position is a position within an expression at which a subexpression may begin. Here are some examples:

3 + 4 * 6 3, 4, 6
(3 + 4) * 6 left-paren, 3, 4, 6
-3 + 7 unary minus, 3, 7

This leads to two questions: how do we identify a leading position, and what tokens may legitimately occupy a leading position?

• The beginning of an expression;
• Immediately after a right parenthesis;
• Immediately after a unary operator; and
• Immediately after a binary operator.

Tokens that may appear in a leading position are:

• An operand (type AEXP__OPERAND);
• A unary operator (type AEXP__UNARY_OP); and
• A left parenthesis (type AEXP__LEFT_PAREN).

So, after successfully parsing an expression, we can identify all the leading positions and verify that they are occupied by valid tokens:

```if leading-position
if token-type != OPERAND
and token-type != UNARY-OP
and token-type != left-parenthesis
expression is invalid```
2. Balanced Parentheses
We make sure that an expression contains the same number of right parentheses as left parentheses; we also make sure that the number of right parentheses never exceeds the number of left parentheses at any point in the expression: