
Bacon, Alcohol, Tobacco, Firearms, Explosives

Things I love

Archive for September 20th, 2011

Trigraphs, digraphs: part two

Nobody guessed in the comments, so I’ll go ahead and answer. Trigraphs were added to C89 because the invariant subset of ISO 646 (“Invariant ASCII”) was not rich enough to express C.

The solution is an internationally agreed-upon repertoire, in terms of which an international representation of C can be defined. The ISO has defined such a standard: ISO 646 describes an invariant subset of ASCII. The characters in the ASCII repertoire used by C and absent from the ISO 646 repertoire are: # [ ] { } \ | ~ ^. Given this repertoire, the Committee faced the problem of defining representations for the absent characters. The obvious idea of defining two-character escape sequences fails because C uses all the characters which are in the ISO 646 repertoire: no single escape character is available. The best that can be done is to use a trigraph – an escape digraph followed by a distinguishing character. ?? was selected as the escape digraph because it is not used anywhere else in C (except as noted below); it suggests that something unusual is going on. The third character was chosen with an eye to graphical similarity to the character being represented.
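To make the mechanism concrete, here is a rough sketch of the character-level replacement described above: scan for the ?? escape digraph, then look at the third character. The function name replace_trigraphs and the two lookup tables are my own for illustration, not anything from the standard or a real compiler, and an actual preprocessor does this as part of its earliest translation phase rather than on an in-memory string.

```c
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs and the characters they stand for,
   in matching order. They are kept in separate tables so that this source
   file never contains a question-mark pair that a compiler could rewrite. */
static const char TRIGRAPH_THIRD[] = "=(/)'<!>-";
static const char TRIGRAPH_REPL[]  = "#[\\]^{|}~";

/* Hypothetical helper: copy src into dst (capacity cap, including the
   terminating NUL), replacing each trigraph with its single character.
   Returns the number of characters written, not counting the NUL. */
size_t replace_trigraphs(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0) {
        return 0;
    }
    while (*src != '\0' && out + 1 < cap) {
        /* Two question marks followed by a known third character. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(TRIGRAPH_THIRD, src[2]);
            if (hit != NULL) {
                dst[out++] = TRIGRAPH_REPL[hit - TRIGRAPH_THIRD];
                src += 3;
                continue;
            }
        }
        /* Anything else, including a lone question-mark pair, is copied. */
        dst[out++] = *src++;
    }
    dst[out] = '\0';
    return out;
}
```

If you call this from your own code, write test inputs with the \? escape (for example "?\?<" for the left-brace trigraph) so the compiler does not replace the trigraph inside your string literal before the function ever sees it.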

Since digraphs were out for C89, how did they end up in C99? They were imported from AMD1 (Amendment 1), AKA C95. Why were they added? Trigraphs are ugly.

C90 addresses the problem in a different way. It provides replacements at the level of individual characters using three-character sequences called trigraphs (see §5.2.1.1). For example, ??< is entirely equivalent to {, even within a character constant or string literal. While this approach provides a solution for the known limitations of EBCDIC (except for the exclamation mark) and ISO/IEC 646, the result is arguably not highly readable.

Thus, AMD1 provides a set of more readable digraphs (see §6.4.6). These are two-character alternate spellings for several of the operators and punctuators that can be hard to read with ISO/IEC 646 national variants. Trigraphs are still required within character constants and string literals, but at least the more common operators and punctuators can have more suggestive spellings using digraphs.
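For comparison, here is a small sketch of what digraphs look like in practice. The function names sum_first and literal_length are made up for this example. The digraphs <: :> <% %> stand in for [ ] { }, and because digraphs are alternate spellings of tokens rather than character replacements, the one inside the string literal is left alone.

```c
#include <stddef.h>
#include <string.h>

/* Sum the first count elements of values.
   <: and :> are alternate spellings of [ and ]; <% and %> of { and }. */
int sum_first(const int values<::>, int count)
<%
    int total = 0;
    for (int i = 0; i < count; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* Inside a string literal a digraph is just two ordinary characters,
   so this literal holds a less-than sign and a percent sign. */
size_t literal_length(void)
<%
    return strlen("<%");
%>
```

That second function is the point of the last sentence in the quote: if you need a brace inside a string on a system that lacks one, a digraph will not help and you are back to trigraphs.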

Not what I thought, either.


Written by Ry Jones

20 September 2011 at 12:33

Posted in Tech
