The Arithmetic Expression (AEXP) Module allows for encapsulation and evaluation of arithmetic expressions in

infixnotation, such as "2 * (3 + 5)." It incorporates constants, such as PI, and common mathematical functions, such ascosineandnatural logarithm.The module implementation is based loosely on the "Interactive Expression Evaluator" described in your textbook,Data structures and Program Design in C, Second Edition,by Robert Kruse et al. (Prentice Hall, New Jersey, 1997).The overall flow of the logic is as follows:

- The user instantiates an AEXP object, specifying an arithmetic expression (as a string) and the length of the arithmetic expression. Requiring the length of the expression string gives the user the option of specifying a non-null-terminated string. This is particularly useful in the case where the expression to be evaluated is embedded in a longer string, such as "The value of the expression '3 + (2 * 24)' is."
- We make a copy of the expression string and store it internally; the internal copy will be null-terminated.
- We parse the expression string into an array of tokens.
- The array of tokens is translated into a new array, representing the expression in
postfixnotation (i.e.Reverse Polish Notation,orRPN).- The tokenized expression in postfix form is evaluated.
Following is a more detailed description of the process and its implementation.

## Tokenization

Identifying a token in a string consists of two steps: parsing the string into

wordsand mapping thewordstotokens.## Parsing the String

Awordis parsed from the string. Awordis a sequence of one or more ASCII characters that fall into one of the following patterns:

Number

A number is a string of digits optionally including a single period (decimal point). Each of the following is a number:3 4.1 .77

Word

Any sequence of alphanumeric characters beginning with an alphabetic character. (I know, this is confusing; this means that we can have awordword as opposed to, say, anumberword.) A word of this type must correspond to a recognizable constant (such aspi) or function (such assin).Symbol

A sequence of one or two non-alphanumeric, non-whitespace characters.In this module a

wordis described by the AEXP__WORD_t struct declared in aexpp.h:typedef struct aexp__word_s { const char *str; size_t len; AEXP__TOKEN_p_t token; double value; } AEXP__WORD_t, *AEXP__WORD_p_t;Where:

- str
- is a pointer to the first character of the word.
- len
- is the length of the word (note that a word is
notnull-terminated).- token
- is a pointer to a struct that describes the token corresponding to a word (assuming the word corresponds to a valid token; see below).
- value
- is the value of the word. The value of a
numberword or of a word corresponding to a constant (such aspi) is the value of the number or constant. The value of asymbolword or a word corresponding to a function, is the value obtained by evaluating that portion of the expression. A special case is made for left- and right- parenthesissymbolwords; these words are discarded in the process that converts the prefix token array into a postfix token array, and are never assigned a value. For example, the expression:3 + cos( 0 )consists of 6 words:

3and0have the values 3 and 0;coshas the value of thecosine of 0,which is 1;+has the value of 3 + 1, or 4;((left-parenthesis) and)(right-parenthesis) have no value.## Token Mapping

Words are mapped to tokens via a hash table. The word is the key and the value is a pointer to a struct of type AEXP__TOKEN_t, declared in aexpp.h:

typedef struct aexp__token_s { const char* name; AEXP__TOKEN_TYPE_e_t type; int priority; double value; AEXP__EVALUATE_p_t evaluate; } AEXP__TOKEN_t, *AEXP__TOKEN_p_t;Where:

- name
- is the name of the token. Note that
namemay consist of a symbol, such as "*" (for the multiplication operator token) as well as more recognizable names such as "pi" and "radians." The special name "x" is used for all numeric constants (e.g. 10 and 3.14). The unary plus and unary minus operators have special internal names denoted by AEXP__UP and AEXP__UM, respectively.- type
- specifies the type of the token. Possible values (as declared in aexpp.h) are:

- AEXP__NONE
- This value is only used to label invalid tokens.
- AEXP__END_EXPR
- This value is presently not used by the module.
- AEXP__LEFT_PAREN
- Denotes a left parenthesis.
- AEXP__RIGHT_PAREN
- Denotes a right parenthesis.
- AEXP__UNARY_OP
- Denotes a unary operator. Unary plus, unary minus and, presently, all functions have this type.
- AEXP__BINARY_OP
- Denotes a binary operator.
- AEXP__OPERAND
- This value applies to the internal constants
eandpi,and to all numeric constants.- priority
- the priority of the token. Priority is only relevant to tokens of type AEXP__UNARY_OP and AEXP__BINARY_OP. All other tokens are arbitrarily assigned a priority of 0. The priority of the operators follows the usual conventions:

- The highest priority is assigned to unary operators (note that, in this context, a function such as
sinis considered a unary operator). All unary operators have the same priority.- Exponentiation has the second highest priority.
- The third highest priority is assigned to the multiplication and division operators.
- The lowest priority is given to the addition and subtraction operators.
- value
- is the value of the token, if the token has an intrinsic value. For example, the token
pihas a value beginning with 3.14; the symbols for addition and multiplication, + and *, on the other hand, have no intrinsic value.- evaluate
- is the address of a function that is used to evaluate the subexpression consisting of an operator and its operand(s). Tokens that are not operators have an
evaluatevalue of NULL.Following is a portion of the table of token values used by the module; the full table can be found in aexp_token.c:

static AEXP__TOKEN_t tokens_[] = { { "(", AEXP__LEFT_PAREN, 0, 0, null_eval }, { ")", AEXP__RIGHT_PAREN, 0, 0, null_eval }, { AEXP__UM, AEXP__UNARY_OP, 7, 0, negate_eval }, { AEXP__UP, AEXP__UNARY_OP, 7, 0, pos_eval }, { "abs", AEXP__UNARY_OP, 7, 0, abs_eval }, { "sqrt", AEXP__UNARY_OP, 7, 0, sqrt_eval }, . . . { "**", AEXP__BINARY_OP, 6, 0, raise_eval }, { "x", AEXP__OPERAND, 0, 0.0, null_eval }, { "pi", AEXP__OPERAND, 0, AEXP__PI, null_eval }, { "e", AEXP__OPERAND, 0, AEXP__E, null_eval } };## Conversion to Postfix Form

After the string is fully parsed the expression consists of an array of tokens arranged in infix order. The next step is to translate this array into a new array in postfix order. A complete discussion of the postfix expression form (also known as

reverse polish notation) can be found elsewhere, such as Wikipedia. . For this discussion let's just look at a couple of examples:Infix expression: 3 + 4 * 2 Postfix expression: 3 4 2 * + (Priority of * forces 4 * 2 to evaluate first.) Infix expression: (3 + 4) * 2 Postfix expression: 3 4 + 2 * (Parentheses force 3 + 4 to evaluate first.) Infix expression: (3 + 4) * (2 - 1) Postfix expression: 3 4 + 2 1 - * Infix expression: 3 * log( 10 ) Postfix expression: 3 10 log * (Priority of unary operatorlogforces log( 10 ) to evaluate first.) Infix expression: log( 10 ) * 3 Postfix expression: 10 log 3 *To perform the conversion we will use a stack and a simplified form of Edger Dijkstra's "Shunting Yard" algorithm which is expressed (absent error checking) as follows:

while infix array is not traversed next token if token is an operand add token to postfix array else if token is a left parenthesis push token onto stack else if token is a right parenthesis pop token from stack while token is not a left paren add token to postfix array pop token from stack(note: left and right parens are discarded)else(token must be an operator)set termination flag to false while !flag AND stack is not empty peek at top of stack if top-token is a left parenthesis set flag to TRUE else if top-token priority < token priority set flag to TRUE else pop top-token from stack add top-token to postfix array(end while !flag AND stack is not empty)push token onto stack(end while infix array is not traversed)while stack is not empty pop token from stack add token to postfix arrayThe complete implementation of this algorithm is found in aexp.c

## Evaluation of the Postfix Expression

Once the expression has been successfully converted to postfix form it may be evaluated. The algorithm for doing so is relatively straightforward; once again, a stack will be employed; we will also need a temporary array, (denotedops) suitable for holding two tokens to store operands pending subexpression evaluation. The algorithm (with error checking omitted) is as follows:while the postfix array is not traversed next token if token is an operand push the token's value onto stack else(token must be an operator)if token is a binary operator pop a token from the stack to ops[1] pop a token from the stack to ops[0] invoke the token'sevaluatorpassing ops push the resulting value onto the stack(end while postfix array is not traversed)(Note: there should be exactly one value left on the stack.)pop the last value off the stack flag this value as the value of the expression## Expression Representation

The overall expression is described by the AEXP__EXPR_t struct which is declared in aexpp.h:

typedef struct aexp__expr_s { char *expr_str; size_t len; CDA_BOOL_t valid; VEC_ID_t infix; VEC_ID_t postfix; double value; const char *bad_str; size_t bad_str_len; int errnum; } AEXP__EXPR_t, *AEXP__EXPR_p_t;Where:

- expr_str
- is a pointer to a buffer that holds the expression string. This buffer is allocated at the time the expression struct is created; the string provided by the caller is copied into the buffer.
- len
- is the length of the expression string.
- valid
- is initially true and is set to false any time the expression is determined to be invalid.
- infix
- is the array (expressed as a vector) to hold the tokens parsed from the expression string.
- postifx
- is the array (expressed as a vector) to hold the tokens in the postfix form of the expression string (see
Conversion to Postfix Form).- value
- holds the value of the expression.
- bad_str
- if parsing fails, points to the first character that can't be parsed in
expr_str.- bad_str_len
- is the length of
bad_str.- errnum
- if an evaluation error occurs, holds the value of
errnoat the time the evaluation fails.## Notes on Error Checking

Extensive error checking takes place while parsing and evaluating an expression string. Below are a few notes regarding this process.

- Floating Point Numbers

When checking floating point numbers (typedouble) it is necessary to make sure that the number is valid before using it in an operation, such as comparing it to zero. This is done with theisnanfacility (foris not a number) declared in math.h. In Microsoft, this facility is named_isnan,so aexpp.h declares the macroISNANwhich is equivalent to_isnanin Microsoft environments, but equivalent toisnanelsewhere:// isnan in Microsoftland is _isnan #if defined( _WIN32 ) #define ISNAN _isnan #else #define ISNAN isnan #endif- Once parsing is successfully completed, and before postfix conversion takes place, the infix expression must undergo the following two validations:

- Leading Position

Aleading positionis a position within an expression at which a subexpression may begin. Here are some examples:

Expression Leading Positions 3 + 4 * 6 3, 4, 6 (3 + 4) * 6 left-paren, 3, 4, 6 -3 + 7 unary minus, 3, 7 This leads to two questions: how do we identify a

leading position,and what tokens may legitimately occupy a leading position?Leading positions occur at:

- The beginning of an expression;
- Immediately after a right parenthesis;
- Immediately after a unary operator; and
- Immediately after a binary operator.
Tokens that may appear in a leading position are:

- An operand (type AEXP__OPERAND);
- A unary operator (type AEXP__UNARY_OP); and
- A left parenthesis (type AEXP__LEFT_PAREN).
So, after successfully parsing an expression, we can identify all the leading positions and verify that they are occupied by valid tokens:

if leading-position if token-type != OPERAND and token-type != UNARY-OP and token-type != left-parenthesis expression is invalid- Balanced Parentheses

We make sure that an expression contains the same number of right parentheses as left parentheses; we also make sure that the number of right parentheses never exceeds the number of left parentheses at any point in the expression:

**
Copyright © 2007 by Jack Straub
jstraub@centurytel.net
**