Codes
The first widely used character code was Morse code, developed in 1838 by Samuel F. B. Morse (1791–1872). This two-symbol, dot-and-dash code represents the letters of the alphabet by varying the number of symbols between one and four. If one considers the symbols to be similar to bits, then the number character set, 0 to 9, uses 1 to 5 bits.
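The one-to-four symbol range for letters can be checked with a short Python sketch. The table below is an assumption: it uses the International Morse assignments in common use today, which differ in places from the 1838 original, and the `encode` helper is a hypothetical name introduced only for illustration.

```python
# International Morse code for the 26 letters (an assumption: the
# 1838 original differed in some assignments).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode(text):
    """Encode letters as Morse, separating characters with spaces."""
    return " ".join(MORSE[ch] for ch in text.upper())


lengths = [len(code) for code in MORSE.values()]
print(min(lengths), max(lengths))        # 1 4

# Two symbols in sequences of length 1 to 4 give 2 + 4 + 8 + 16 patterns,
# which is enough to cover 26 letters.
patterns = sum(2 ** n for n in range(1, 5))
print(patterns)                          # 30

print(encode("sos"))                     # ... --- ...
```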
In 1874 Jean-Maurice Émile Baudot (1845–1903) received a patent for a printing telegraph. He also introduced a code using 5 bits per character. Five bits can be combined in 32 different ways, enough for uppercase letters and a few control characters. To include the number set, Baudot devised a shift to another level, much like the Caps Lock key on a keyboard. The shift provides the number set, punctuation symbols, and control character representations for the 32 separate, 5-bit combinations. The control characters include the carriage return and the line feed. All the control characters are available in both letter and figure shift modes. The letters were in the lower shift mode and the figures in the upper shift mode. Early teletype machines punched the messages into paper tapes and read tapes to send messages. Later, machines were designed to print out the messages in character form. Some teletypes for the hearing impaired in use today are based on the Baudot code.
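The shift mechanism can be sketched as a small state machine in Python. Every code value below is hypothetical and chosen only for illustration; they are not the real Baudot assignments. The names `LTRS`, `FIGS`, and `decode` are likewise assumptions, as is the choice to start in letter mode.

```python
# Hypothetical 5-bit code values, for illustration only.
# They are NOT the real Baudot assignments.
LTRS = 0b11111  # switch to letter shift (toy value)
FIGS = 0b11011  # switch to figure shift (toy value)

LETTERS = {0b00001: "A", 0b00010: "B", 0b00011: "C"}
FIGURES = {0b00001: "1", 0b00010: "2", 0b00011: "3"}
# Control characters mean the same thing in either shift mode.
CONTROLS = {0b01000: "\r", 0b00100: "\n"}


def decode(codes):
    """Decode a sequence of 5-bit codes, tracking the current shift."""
    table = LETTERS  # assumption: the receiver starts in letter shift
    out = []
    for code in codes:
        if code == LTRS:
            table = LETTERS
        elif code == FIGS:
            table = FIGURES
        elif code in CONTROLS:
            out.append(CONTROLS[code])
        else:
            out.append(table[code])
    return "".join(out)


print(2 ** 5)  # 32 combinations from five bits

message = [0b00001, 0b00010, FIGS, 0b00001, 0b00010,
           0b01000, 0b00100, LTRS, 0b00011]
print(repr(decode(message)))  # 'AB12\r\nC'
```

The same 5-bit value (for example `0b00001`) decodes to a letter or a figure depending on the most recent shift code, which is how 32 combinations can serve more than 32 characters.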
Early computers used a 6-bit version of Baudot's code that had more control and format characters in addition to the uppercase letters and the ten numeric characters. The increase to 6 bits, or 64 combinations, eliminated the need for the shift control to switch from letter to numeric characters. There was no urgency to improve the character code by adding lowercase letters and more punctuation symbols, as computers were considered calculation machines. By the late 1950s, computers were used more widely for commercial purposes, and the variation in the control character set from system to system became a drawback. In response, the American Standards Association (ASA) developed a standardized code. The ASA was composed of various corporations, including IBM, AT&T, and an AT&T subsidiary, Teletype Corporation, the manufacturer of the most widely used communications equipment.
In 1963 the first version of the American Standard Code for Information Interchange (ASCII) was introduced. IBM waited until the 1980s to use it, while AT&T's immediate acceptance of it made ASCII the standard for communications. This new code was based on 7 bits, allowing for 128 characters in the character code table. This initial version did not have a lowercase letter set. It did include all the COBOL (COmmon Business-Oriented Language) graphics characters based on the earlier FIELDATA code used by the military, added more control characters such as a line feed, and simplified some of the transmission control codes. Collating problems were solved by separating the number set from the letter set in the table, and ordering the letter set to allow collating by letter using simple arithmetic comparisons.
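The collating property can be illustrated with a brief Python sketch. It assumes the code values of present-day ASCII as exposed by Python's built-in `ord` and the `ascii` codec; the helper name `ascii_less` is hypothetical.

```python
digits = "0123456789"
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Digits occupy one contiguous run of code values, uppercase letters another.
digit_codes = [ord(c) for c in digits]
letter_codes = [ord(c) for c in letters]
print(digit_codes[0], digit_codes[-1])    # 48 57
print(letter_codes[0], letter_codes[-1])  # 65 90


def ascii_less(a, b):
    """Collate two ASCII strings by comparing their code values."""
    return a.encode("ascii") < b.encode("ascii")


print(2 ** 7)                             # 128 entries in a 7-bit table
print(ascii_less("APPLE", "BANANA"))      # True
print(ascii_less("7", "A"))               # True: digits sort before letters

words = sorted(["COBOL", "ASCII", "BAUDOT"], key=lambda s: s.encode("ascii"))
print(words)                              # ['ASCII', 'BAUDOT', 'COBOL']
```

Because the letters are in alphabetical order within one unbroken run, sorting text reduces to comparing integers, with no lookup table needed.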
The next version of ASCII in 1967 included the lowercase letter set, FORTRAN graphic characters, square brackets, curly braces, and others. Control character changes were made and a small set of international characters was added. The control characters were relocated to the first half of the table. This version of ASCII remained the standard for thirty years.
Meanwhile, back at IBM, a different character code came into use. Why? Perhaps because the origin of IBM goes back to Herman Hollerith (1860–1929), the punched card, and the 6-bit character code. The early IBM mainframes used a 6-bit character code, Binary Coded Decimal Interchange Code (BCDIC). In 1964 a proprietary code, Extended Binary Coded Decimal Interchange Code (EBCDIC), was created for use on IBM System/360 mainframe computers. This 8-bit code was an extension of the earlier code. It included most of the characters in the ASCII code but with differing bit-string representations. For example, the ASCII representation of M, padded to 8 bits, is 01001101; the EBCDIC representation of M is 11010100.
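The two bit strings for M can be reproduced with Python's standard codecs. This sketch assumes `cp037`, one of several EBCDIC code pages shipped with Python, as a stand-in for "EBCDIC"; the helper name `bits` is hypothetical.

```python
def bits(data):
    """Render each byte as an 8-digit binary string."""
    return " ".join(format(byte, "08b") for byte in data)


ascii_m = "M".encode("ascii")
ebcdic_m = "M".encode("cp037")  # assumption: cp037 as the EBCDIC variant

print(bits(ascii_m))    # 01001101
print(bits(ebcdic_m))   # 11010100
```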
Multiple versions of EBCDIC character code sets had to be created as the mainframe market spread throughout the world. Another difficulty arose when translating from or into ASCII. Because there was a difference in the character sets, the translation was slow and error-prone. The general trend is to convert EBCDIC data files into ASCII or other non-proprietary code formats.
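A minimal Python sketch of such a translation follows, again assuming the `cp037` code page. The function name `ebcdic_to_ascii` and the sample record are hypothetical. The cent sign is used because it exists in `cp037` but has no ASCII equivalent, so it shows where information can be lost.

```python
def ebcdic_to_ascii(data, codec="cp037"):
    """Translate EBCDIC bytes to ASCII bytes.

    Characters with no ASCII equivalent are replaced by '?'.
    """
    text = data.decode(codec)
    return text.encode("ascii", errors="replace")


# "\u00a2" is the cent sign, present in cp037 but absent from ASCII.
record = "PRICE: 5\u00a2".encode("cp037")
print(ebcdic_to_ascii(record))                    # b'PRICE: 5?'

# Text limited to characters shared by both codes survives unchanged.
print(ebcdic_to_ascii("HELLO".encode("cp037")))   # b'HELLO'

# A strict conversion refuses the unmappable character instead.
try:
    record.decode("cp037").encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)                                  # False
```

Choosing between silent replacement and a strict failure is a design decision: replacement keeps a batch job running but corrupts data, while the strict mode surfaces the mismatch immediately.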
In the 1980s, the growth of international business generated interest in a multilingual character code. The International Organization for Standardization (ISO) and a group of American computer firms started on methods to produce a universal character code set. Unicode, which gives a unique number to each character, resulted from merging the two efforts. It is still evolving and currently uses a single 256-by-256 grid that supports 65,536 character points and unifies similar characters, especially within the Asian languages. Unicode is supported by multiple industries and companies ranging from Apple Computer, Inc., IBM Corporation, and Hewlett Packard to Oracle, Microsoft Corporation, and Sun Microsystems. It is supported by operating systems and browsers. Unicode is capable of transporting data through many platforms without data corruption.
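The idea of one unique number per character can be seen with Python's built-in `ord` and `chr`. This is only a sketch: the helper name `code_point` is hypothetical, and the sample characters were picked for illustration.

```python
def code_point(ch):
    """Format a character's Unicode number in the usual U+XXXX notation."""
    return "U+{:04X}".format(ord(ch))


# Size of a single 256-by-256 grid.
grid_size = 256 * 256
print(grid_size)                 # 65536

# Latin capital M, Latin small e with acute, and a CJK ideograph.
samples = ["M", "\u00e9", "\u4e2d"]
points = [code_point(ch) for ch in samples]
print(points)                    # ['U+004D', 'U+00E9', 'U+4E2D']

# The mapping is reversible: each number identifies exactly one character.
print(chr(0x4D))                 # M
```

Note that M keeps the value 77 (hexadecimal 4D) that it has in ASCII, so plain ASCII text carries over to Unicode numbering unchanged.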