
Character Handling and Logic

This section describes the facilities for handling non-numerical data in Fortran. Character data are present in almost all programs, if only in the form of file names and error messages, and the facilities for character manipulation are now quite powerful. The logical data type is even more indispensable, since a logical expression is used in every IF statement.

Character Facilities

The character data type differs from all the others in one important respect: every character item has a fixed length. This specifies the number of characters it holds.

The length of a literal character constant is just the number of characters between the enclosing apostrophes (except that two consecutive apostrophes within the string count as one). Thus:
'it''s'
is a character constant of length four. Because the length of every character variable, array, and function has to be specified in advance it is nearly always necessary to use CHARACTER statements to declare them, for example:
CHARACTER NAME*20, ADDRSS(3)*40, ZIP*7
The same applies to named character constants but for these a special notation sets the length to that of the attached constant, which saves the trouble of counting characters:

 
      CHARACTER TITLE*(*) 
      PARAMETER (TITLE = 'Latest mailing list')

The fixed length of character objects makes it easy to output data in a fixed format as when printing a table with neatly aligned columns, but sometimes it would be more convenient to have a variable length string type as some other languages do. The rules for character assignment go some way towards this: if an expression is too short then blanks are appended to it; if it is too long then characters are removed from the right-hand end. For many purposes, therefore, it is only necessary to ensure that character variables are at least as long as the longest string you need to store in them.

When transferring character information to procedures the length of the dummy argument can be set automatically to that of the corresponding actual argument. With this passed length notation it is easy to write general-purpose character handling procedures. This is described further in section 9.5.

The most common operations carried out on character strings are splitting them up and joining them together. Any section of a character variable or array element can be extracted by using the substring notation. Strings (and substrings) can be joined end to end by using the concatenation operator in a character expression. These are described in the next two sections.

Another fairly common requirement is to search for a particular sequence of characters within a longer string: this can be done with the intrinsic function INDEX.

Other intrinsic functions ICHAR and CHAR are provided to convert a single character to an integer or vice-versa according to its position within the native character set. More complicated conversions from a numerical data type to character form and vice-versa are best carried out using the internal file READ and WRITE statements, which allow the power of the format specification to be applied to the task. This mechanism is described in section 10.3.

Character strings can be compared to each other using relational operators or intrinsic functions. The latter use the ASCII collating sequence irrespective of the native character code. Further details are given in section 7.6.

Character Substrings

The substring notation can be used to select any contiguous section of any character variable or array element. The characters in any string are numbered starting from one on the left: the lower bound cannot be altered as it can in arrays. A substring is selected simply by giving the first and last character positions of the extract. For example, with:

 
      CHARACTER METAL*10 
      METAL = 'CADMIUM'

then METAL(1:3) has the value 'CAD' while METAL(8:8) has the value blank because the value is padded out with blanks to its declared length.

Substrings must be at least one character long. They can be used in general in the same ways as character variables. Continuing with the last example, the assignment statement:
METAL(3:4) = 'ES'
will change the value of METAL to 'CAESIUM   ' (with three blanks at the end, since the total length stays at 10).

Substring Rules

The parentheses denoting a substring must contain a colon: there may be an integer expression on either side of the colon. The first expression denotes the initial character position, the second one the last character position. Both values must be within the range 1 to LEN, where LEN is the length of the parent string, and the length of the resulting substring must not be less than one.

Although the colon must always be present, the two integer expressions are optional. The default value for the first one is one, the default for the second is the position of the last character of the parent string. Thus, staying with the last example: METAL(:2) has the value 'CA' while METAL(7:) has the value 'M' with three blanks.

With array elements the substring expression follows the subscript expression, for example:

 
      CHARACTER PLAY(30)*80 
      PLAY(10) = 'AS YOU LIKE IT'

Then the substring PLAY(10)(4:11) has the value 'YOU LIKE'. Substrings can be used in expressions anywhere except in the definition of a statement function; they can also be used on the left-hand side of an assignment statement, and can also be defined by input/output statements.
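The substring rules can be tried out with a short program along the following lines. This is only a sketch: the program name SUBSTR is invented, and the brackets are printed merely to make trailing blanks visible.

```fortran
      PROGRAM SUBSTR
*     Illustrative sketch of the substring notation.
      CHARACTER METAL*10, PLAY(30)*80
      METAL = 'CADMIUM'
*     The first three characters, then a one-character substring.
      PRINT *, METAL(1:3)
      PRINT *, '[', METAL(8:8), ']'
*     Omitted bounds default to 1 and to the declared length.
      PRINT *, METAL(:2)
      PRINT *, '[', METAL(7:), ']'
*     Assigning to a substring changes only positions 3 and 4.
      METAL(3:4) = 'ES'
      PRINT *, '[', METAL, ']'
*     With an array element the substring follows the subscript.
      PLAY(10) = 'AS YOU LIKE IT'
      PRINT *, PLAY(10)(4:11)
      END
```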

Character Expressions

The character operator // is used to concatenate, or join, two character strings. It is, in fact, the only character operator that Fortran provides.

The length of the result is just the sum of the lengths of the operands. Parentheses may be used in character expressions but make no difference to the result. Note that any embedded or trailing blanks (spaces) will be reproduced exactly in the resulting string.
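A minimal sketch of concatenation follows. The program name, variable names and strings are invented for illustration; the brackets in the output simply make the blanks visible.

```fortran
      PROGRAM CONCAT
*     Illustrative sketch: the names and strings are invented.
      CHARACTER A*3, B*6, C*9, PADDED*5, D*11
      A = 'NEW'
      B = 'CASTLE'
*     The length of the result is LEN(A) + LEN(B) = 9.
      C = A // B
      PRINT *, '[', C, ']'
*     PADDED holds 'NEW' plus two trailing blanks, which are kept.
      PADDED = 'NEW'
      D = PADDED // B
      PRINT *, '[', D, ']'
      PRINT *, LEN(A // B), LEN(PADDED // B)
      END
```

Here C should hold 'NEWCASTLE', while D keeps the two blanks that pad PADDED out to its declared length, giving 'NEW  CASTLE'.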

The general form of a character-expression is thus:
character-operand
or character-expression // character-operand
where character-operand can be any of the following:

character constant (literal or named),
character variable,
character array element,
character substring,
character function reference.

There is one special restriction on character concatenation in procedures: a passed-length dummy argument can only be an operand of the concatenation operator in an assignment statement. This seemingly arbitrary rule allows the compiler to determine how much work-space is required.

Character Assignment Statements

The character assignment statement has the general form:
char-var = character-expression
where char-var can be a character variable, array element, or substring.

There is one important restriction on character assignment statements: none of the characters being referenced in the expression on the right may be defined in char-var on the left, that is to say there can be no overlap. Thus the assignment statement:
STRING(1:N) = STRING(10:)
is valid only as long as N is no higher than 9. It is, of course, easy to get around this restriction by using a temporary character variable with a suitable length.
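One way of using a temporary variable is sketched below, assuming that TEMP is declared at least as long as STRING. The program name OVERLP and the test data are hypothetical.

```fortran
      PROGRAM OVERLP
*     Sketch: discard the first N characters of STRING by way of
*     a temporary, so that source and destination never overlap.
      CHARACTER STRING*20, TEMP*20
      INTEGER N
      STRING = 'ABCDEFGHIJ'
      N = 3
      TEMP = STRING(N+1:)
      STRING = TEMP
*     STRING now starts with 'DEFGHIJ', followed by blanks.
      PRINT *, '[', STRING, ']'
      END
```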

Note that when a value is assigned to a substring (as in the last example) the other characters in the parent string are not affected at all. If the string was previously undefined then the other character positions will still be undefined; otherwise they will retain their previous contents.

The expression and the character object to which its value is assigned may have different lengths: if the expression is longer then the excess characters on the right are lost; if it is shorter then blanks are appended. Care is needed to declare adequate lengths or else the results can be unexpected:

 
      CHARACTER AUTHOR*30, SHORT*5, EXPAND*10 
      AUTHOR = 'SHAKESPEARE, WILLIAM' 
      SHORT = AUTHOR 
      EXPAND = SHORT

The resulting value of EXPAND will be 'SHAKE     ' where the last five characters are blanks.

Character Intrinsic Functions

The four main character intrinsic functions are described in this section. There are another four functions provided to compare character strings with each other using the ASCII collating sequence: these are described in section 7.6.

`CHAR` and `ICHAR`

These two functions perform integer to character conversion and vice-versa using the internal code of the machine. Although most computers now use the ASCII character code, it is by no means universal, so these functions can only be used in a very limited way in portable software.

CHAR(I) returns the character at position I in the code table. For example, on a machine using ASCII code, CHAR(74) = 'J', since "J" is the character number 74 in the ASCII code table.

ICHAR(STRING) returns the integer position in the code table of the first character of the argument STRING. On a machine using ASCII code it is the inverse of the CHAR example above: since CHAR(74) = 'J', ICHAR('J') = 74.
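The two conversions can be sketched as follows. The values given in the comments assume an ASCII machine, and the program name CODES is invented for illustration.

```fortran
      PROGRAM CODES
*     Sketch: the values in the comments assume the ASCII code.
      CHARACTER C*1
      INTEGER I
      I = ICHAR('J')
      C = CHAR(I + 1)
*     On an ASCII machine I is 74 and C is 'K'.
      PRINT *, I, ' ', C
      END
```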

`INDEX`

INDEX is a search function; it takes two character arguments and returns an integer result. INDEX(S1, S2) searches for the character-string S2 in another string S1, which is usually longer. If S2 is present in S1 the function returns the character position at which it starts. If there is no match (or S1 is shorter than S2) then it returns the value zero. For example:

 
      CHARACTER*20 SPELL 
      SPELL  = 'ABRACADABRA' 
      K      = INDEX(SPELL, 'RA')

Here K will be set to 3 because this is the position of the first occurrence of the string 'RA'. To find the second occurrence it is necessary to restart the search at the next character in the main string, for example:
L = INDEX(SPELL(K+1:), 'RA')
This will return the value 7 because the first occurrence of 'RA' in the substring 'ACADABRA' is at position 7. To find its position in the parent string the offset, K, must be added, making 10.
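The restart technique extends naturally to a loop that reports every occurrence. The following is a sketch only; the program name FINDAL and the variable START are hypothetical.

```fortran
      PROGRAM FINDAL
*     Sketch: list every position of 'RA' in SPELL.
      CHARACTER SPELL*20
      INTEGER K, START
      SPELL = 'ABRACADABRA'
      START = 1
   10 CONTINUE
      K = INDEX(SPELL(START:), 'RA')
      IF (K .GT. 0) THEN
*        K is relative to START, so add the offset START-1.
         PRINT *, 'Found at position', START + K - 1
         START = START + K
         IF (START .LE. LEN(SPELL)) GO TO 10
      END IF
      END
```

With the value of SPELL used above this should report positions 3 and 10.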

The INDEX function is often useful when manipulating character information. Suppose, for example, we have a string NAME containing a person's surname and initials, e.g.
Mozart,W.A.
The name can be reformatted to put the initials before the surname and omit the comma like this:

 
      CHARACTER NAME*25, PERSON*25 
*... 
      KCOMMA = INDEX(NAME, ',') 
      KSPACE = INDEX(NAME, ' ') 
      PERSON = NAME(KCOMMA+1:KSPACE-1) // NAME(1:KCOMMA-1)

Then PERSON will contain the string 'W.A.Mozart' (with blanks appended to the length of 25). Note that a separate variable, PERSON, was necessary because of the rule about overlapping strings in assignments.

`LEN`

The LEN function takes a character argument and returns its length as an integer. The argument may be a local character variable or array element but this will just return a constant. LEN is more useful in procedures where character dummy arguments (and character function names) may have their length passed over from the calling unit, so that the length may be different on each procedure call. The length returned by LEN is that declared for the item. Sometimes it is more useful to find the length excluding trailing blanks. The next function does just that, using LEN in the process.

 
      INTEGER FUNCTION LENGTH(STRING) 
*Returns length of string ignoring trailing blanks 
      CHARACTER*(*) STRING 
      DO 15, I = LEN(STRING), 1, -1 
         IF(STRING(I:I) .NE. ' ') GO TO 20 
15    CONTINUE 
20    LENGTH = I 
      END
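A possible use of LENGTH is sketched below. The program name SHOWLN is invented, and the function is repeated only so that the example compiles on its own. Since a substring must contain at least one character, the all-blank case is tested separately.

```fortran
      PROGRAM SHOWLN
*     Sketch: print a string without its trailing blanks.
      CHARACTER NAME*25
      INTEGER LENGTH, N
      NAME = 'Mozart'
      N = LENGTH(NAME)
*     A substring needs at least one character, so guard N = 0.
      IF (N .GT. 0) THEN
         PRINT *, LEN(NAME), N, ' [', NAME(1:N), ']'
      ELSE
         PRINT *, 'NAME is entirely blank'
      END IF
      END

      INTEGER FUNCTION LENGTH(STRING)
*     Copy of the function above, repeated for completeness.
      CHARACTER*(*) STRING
      DO 15, I = LEN(STRING), 1, -1
         IF (STRING(I:I) .NE. ' ') GO TO 20
   15 CONTINUE
   20 LENGTH = I
      END
```

Here LEN(NAME) is the declared length, 25, while LENGTH(NAME) is 6.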

Relational Expressions

A relational expression compares the values of two arithmetic expressions or two character expressions: the result is a logical value, either true or false. Relational expressions are commonly used in IF statements, as in this example:

 
      IF(SENSOR .GT. UPPER) THEN 
          CALL COOL 
      ELSE IF(SENSOR .LT. LOWER) THEN 
          CALL HEAT 
      END IF

The relational operators have forms such as .GT. and .LT. because the Fortran character set does not include the usual characters > and <. Relational expressions are most commonly used in IF statements, but any logical variable or array element may be used to store a logical value for use later on.

 
      CHARACTER*10 OPTION 
      LOGICAL EXIT 
      EXIT = OPTION .EQ. 'FINISH'  
*... 
      IF(EXIT) STOP 'Finish requested'

Logical expressions are covered in more detail in the next section.

General Forms of Relational Expression

arithmetic-exprn rel-op arithmetic-exprn
or character-exprn rel-op character-exprn
In either case the resulting expression has the logical type. The relational operator rel-op can be any of the following:

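.EQ.    equal to
.GE.    greater than or equal to
.GT.    greater than
.LE.    less than or equal to
.LT.    less than
.NE.    not equal to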

Note that these operators need a decimal point at either end to distinguish them from symbolic names.

Arithmetic Comparisons

When two arithmetic values of differing data type are compared, a conversion is automatically applied to one of them (as in arithmetic expressions) to bring it to the type of the other. The direction of conversion is always:
integer -> real -> complex or double precision

When comparing integer expressions, there is a considerable difference between the .LE. and .LT. operators, and similarly between .GE. and .GT., so that you should consider carefully what action is required in the limiting case before selecting the appropriate operator.

In comparisons involving the other arithmetic types you should remember that the value of a number may not be stored exactly. This means that it is unwise to rely on tests involving the .EQ. and .NE. operators except in special cases, for example if one of the values has previously been set to zero or some other small integer.
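A common alternative to testing real values with .EQ. is to compare them within a tolerance, as in the sketch below. The program name TOLER and the relative tolerance EPS are arbitrary illustrative choices; a suitable tolerance depends on the problem and on the precision of the machine.

```fortran
      PROGRAM TOLER
*     Sketch: compare reals within a tolerance, not with .EQ.
*     The tolerance EPS is an arbitrary illustrative choice.
      REAL X, Y, EPS
      PARAMETER (EPS = 1.0E-5)
      X = 0.1 * 3.0
      Y = 0.3
*     The result of the exact test may vary between systems.
      IF (X .EQ. Y) THEN
         PRINT *, 'Exactly equal'
      ELSE
         PRINT *, 'Not exactly equal'
      END IF
      IF (ABS(X - Y) .LE. EPS * MAX(ABS(X), ABS(Y))) THEN
         PRINT *, 'Equal to within the tolerance'
      END IF
      END
```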

There are two restrictions on complex values: firstly they cannot be compared at all to ones of double precision type. Secondly they cannot use relational operators other than .EQ. and .NE. because there is no simple linear ordering of complex numbers.

Character Comparisons

A character value can only be compared to another character value; if they do not have the same length then the shorter one is padded out with blanks to the length of the other before the comparison takes place. Tests for equality (or inequality) do not depend on the character code: the two strings are just compared character by character until a difference is found. Comparisons using the other operators (.GE., .GT., .LE., and .LT.) do, however, depend on the local character code. The two expressions are compared one character position at a time until a difference is found: the result then depends on the relative positions of the two characters in the local collating sequence, i.e. the order in which the characters appear in the character code table.

The Fortran Standard specifies that the collating sequence used by all systems must have the following basic properties:

all the upper-case letters are in order, A < B < C etc.
all digits are in order, 0 < 1 < 2 etc.
all digits precede all letters or vice-versa,
the blank (space) character precedes letters and digits.

It does not, however, specify whether letters precede digits or follow them. As a result, if strings of mixed text are sorted using relational operators the results may be machine dependent. For example, the expression
'APPLE' .LT. 'APRICOT'
is always true because the two strings first differ at the third character position, and the letter 'P' precedes 'R' in all Fortran collating sequences. However:
'A1' .GT. 'AONE'
will have a value true if your system uses EBCDIC but false if it uses ASCII, because the digits follow letters in the former and precede them in the latter.

In order to allow character comparisons to be made in a truly portable way, Fortran has provided four additional intrinsic functions. These perform character comparisons using the ASCII collating sequence no matter what the native character code of the machine. These functions are:

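LGE(S1, S2)    greater than or equal to
LGT(S1, S2)    greater than
LLE(S1, S2)    less than or equal to
LLT(S1, S2)    less than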

They take two character arguments (of any length) and return a logical value. Thus the expression:
LGT('A1', 'AONE')
will always have the value false.
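The difference between the two kinds of comparison can be seen with a sketch such as the following, in which the program and variable names are invented.

```fortran
      PROGRAM COLLAT
*     Sketch: native versus ASCII-based comparison.
      LOGICAL NATIVE, ASCII
*     This depends on the native collating sequence.
      NATIVE = 'A1' .GT. 'AONE'
*     This always uses the ASCII sequence, so it is false.
      ASCII = LGT('A1', 'AONE')
      PRINT *, NATIVE, ASCII
      END
```

On an ASCII machine both values should be false; on an EBCDIC machine only the first would be true.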

Character comparisons are case-sensitive on machines which have lower-case letters in their character set. It is advisable to convert both arguments to the same case beforehand.

Guidelines

Systems which support both upper and lower-case characters are usually case-sensitive: before testing for the presence of particular keywords or commands it is usually best to convert an input string to a standard case, usually upper-case. Unfortunately there are no standard intrinsic functions to do this, though many systems provide them as an extension.
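In the absence of a standard intrinsic, a conversion routine can be written using INDEX, as sketched below. UPCASE is a hypothetical name, not a standard procedure, and the sketch assumes that the system accepts lower-case letters in character constants. It does not depend on the positions of the letters in the native character code.

```fortran
      PROGRAM CASES
*     Sketch: convert a command to upper case before testing it.
      CHARACTER OPTION*10
      OPTION = 'Finish'
      CALL UPCASE(OPTION)
      IF (OPTION .EQ. 'FINISH') PRINT *, 'Finish requested'
      END

      SUBROUTINE UPCASE(STRING)
*     Hypothetical routine: replaces each lower-case letter in
*     STRING by the corresponding upper-case letter.
      CHARACTER*(*) STRING
      CHARACTER LOWER*26, UPPER*26
      INTEGER I, K
      DATA LOWER /'abcdefghijklmnopqrstuvwxyz'/
      DATA UPPER /'ABCDEFGHIJKLMNOPQRSTUVWXYZ'/
      DO 10, I = 1, LEN(STRING)
         K = INDEX(LOWER, STRING(I:I))
         IF (K .GT. 0) STRING(I:I) = UPPER(K:K)
   10 CONTINUE
      END
```

The two alphabets are held in variables set by DATA statements rather than in named constants, because the substring notation applies only to character variables and array elements.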

In character sorting operations where the strings contain mixtures of letters, digits, or other symbols, you should use the intrinsic functions to make the program portable. In other character comparisons, however, the relational operator notation is probably preferable because it has a more familiar form and may be slightly more efficient.

Logical Expressions

Logical expressions can be used in logical assignment statements, but are most commonly encountered in IF statements where there is a compound condition, for example:

 
       IF(AGE .GE. 60 .OR. (STATUS .EQ. 'WIDOW' .AND. 
     $   NCHILD .GT. 0)) THEN

This combines the values of three relational expressions, two of them comparing arithmetic values, the other character values. The logical operators such as .AND. and .OR. also need decimal points at either end to distinguish them from symbolic names. The .OR. operator performs an inclusive or; the exclusive or operator is called .NEQV.

Rules

A logical expression can have any of the following forms:

logical-term
.NOT. logical-term
logical-expression logical-operator logical-term

Where: logical-term can be any of the following:

logical constant (literal or named),
logical variable,
logical array element,
logical function reference,
logical expression enclosed in parentheses,
relational expression.

and the logical operator can be any of the following:

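.AND.     logical and
.OR.      logical or (inclusive)
.EQV.     logical equivalence
.NEQV.    logical non-equivalence (exclusive or)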

Note that the rules of logical expressions only allow two successive operators to occur if the second of them is the unary operator .NOT. which negates the value of its operand. The effects of the four binary logical operators are shown in the table below for the four possible combinations of operands, x and y.

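x       y       x .AND. y   x .OR. y   x .EQV. y   x .NEQV. y
false   false   false       false      true        false
false   true    false       true       false       true
true    false   false       true       false       true
true    true    true        true       true        false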
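The effects of the four binary operators can also be tabulated by a short program, sketched here with invented names. List-directed output normally shows logical values as T and F.

```fortran
      PROGRAM TRUTH
*     Sketch: print the four binary logical operators for every
*     combination of the operands X and Y.
      LOGICAL X, Y, VAL(2)
      INTEGER I, J
      DATA VAL / .FALSE., .TRUE. /
      PRINT *, 'X Y .AND. .OR. .EQV. .NEQV.'
      DO 20, I = 1, 2
         DO 10, J = 1, 2
            X = VAL(I)
            Y = VAL(J)
            PRINT *, X, Y, X .AND. Y, X .OR. Y, X .EQV. Y, X .NEQV. Y
   10    CONTINUE
   20 CONTINUE
      END
```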

Note that a logical expression can have operands which are complete relational expressions, and these can in turn contain arithmetic expressions. The complete order of precedence of the operators in a general expression is as follows:

arithmetical operators (in the order defined in section 6.1 above).
relational operators
.NOT.
.AND.
.OR.
.EQV. and .NEQV.

If the operators .EQV. and .NEQV. are used at the same level in an expression they are evaluated from left to right.

These rules reduce the need for parentheses in logical expressions, thus:
(X .GT. A) .OR. (Y .GT. B)
would have exactly the same meaning if all the parentheses had been omitted.

A Fortran system is not required to evaluate every term in a logical expression completely if its value can be determined more simply. In the above example, if X had been greater than A then it would not be necessary to compare Y and B for the expression would have been true in either case. This improves efficiency but means that functions with side-effects should not be used.
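The converse also holds: a program should not rely on the second operand being skipped. Where one test is only safe if another has succeeded, a nested IF fixes the order of evaluation, as in this sketch (the names and values are invented).

```fortran
      PROGRAM GUARD
*     Sketch: a nested IF guarantees the order of evaluation.
      INTEGER N
      REAL TOTAL, LIMIT
      N = 0
      TOTAL = 10.0
      LIMIT = 2.5
*     Writing  N .NE. 0 .AND. TOTAL/N .GT. LIMIT  would be unsafe,
*     because the division might still be evaluated when N is 0.
      IF (N .NE. 0) THEN
         IF (TOTAL / N .GT. LIMIT) PRINT *, 'Mean exceeds limit'
      END IF
      PRINT *, 'Done'
      END
```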

Guidelines

Complicated logical and relational expressions can be hard to read especially if they extend on to several successive lines. It helps to line up similar conditions on successive lines, and to use parentheses.

Logical Assignment Statements

A logical assignment statement has the form:
logical-var = logical-expression

Where the logical-var can be a logical variable or array element. Logical variables and array elements are mainly used to store the values of relational expressions until some later point where they are used in IF statements.
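A minimal sketch of logical assignment, in which the limits and the sensor reading are invented values:

```fortran
      PROGRAM LOGASN
*     Sketch: the limits and the reading are invented values.
      REAL SENSOR, UPPER, LOWER
      LOGICAL HOT, COLD, OK
      UPPER = 30.0
      LOWER = 10.0
      SENSOR = 35.5
*     Store the results of the comparisons for later use.
      HOT = SENSOR .GT. UPPER
      COLD = SENSOR .LT. LOWER
      OK = .NOT. (HOT .OR. COLD)
      IF (HOT) PRINT *, 'Cooling required'
      IF (COLD) PRINT *, 'Heating required'
      IF (OK) PRINT *, 'Within limits'
      END
```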


Clive Page
Tue Feb 27 11:14:41 GMT 2001