Saturday, February 2, 2008

MYSQL PART I

Database :
A database is a collection of data that is organized so that its contents can be easily accessed, managed and updated. The software used to manage and query a database is known as a Database Management System (DBMS). Then came the concept of the Relational Database Management System (RDBMS). A relational database is a database where data are stored in more than one table, each one containing different types of data. The different tables can be linked so that information from the separate tables can be used together. This is explained below using an example.

Example :
Consider the students' personal information and test marks in a school. Suppose the students' information and test results are stored separately: we can get a student's personal information, such as the address, from the first file, and the student's mark in a test from the other file.
But consider a situation where we want to get the address of a student as well as his marks. This becomes hard when we have a large volume of data. If we have a studentID stored in both files, then we can easily relate the details and retrieve them together.
In relational databases, a table is a set of data elements (cells) that is organized, defined and stored using a model of horizontal rows and vertical columns. A table has a specified number of columns but can have any number of rows (i.e., it has a fixed structure but can hold any amount of data). Here every column is known as a field, and every row is called a record.
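The idea of relating two tables through a shared studentID can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module (SQLite, not MySQL) as a stand-in relational database, and the table names personal and results as well as the sample row are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a relational DBMS.
conn = sqlite3.connect(":memory:")

# Two separate tables (hypothetical names) that share the StudID column.
conn.execute("CREATE TABLE personal (StudID INTEGER PRIMARY KEY, Name TEXT, Address TEXT)")
conn.execute("CREATE TABLE results (StudID INTEGER, Marks INTEGER)")
conn.execute("INSERT INTO personal VALUES (1, 'steve', '5th cross street')")
conn.execute("INSERT INTO results VALUES (1, 100)")

# The shared StudID relates a row in one table to a row in the other,
# so the address and the marks can be fetched together.
row = conn.execute(
    "SELECT p.Name, p.Address, r.Marks "
    "FROM personal AS p JOIN results AS r ON p.StudID = r.StudID "
    "WHERE p.StudID = ?",
    (1,),
).fetchone()
print(row)
conn.close()
```

Without the common StudID column there would be no reliable way to tell which marks belong to which address.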
MySQL
MySQL is one of the popular Relational Database Management Systems. Now let us see an example of a simple database which consists of a single table. Consider the same example we took earlier, a student database. The table may have different fields such as StudID, Name, Marks, Address, Phone. These five fields constitute a table named student. StudID, Name and so on are fields, and each row is a record.
StudID  Name   Marks  Address           Phone
1       steve  100    5th cross street  2456987
Now let us move on to MySQL and see how to create, use and remove a database.
The following will not be needed if you have installed MySQL as a service.
Starting MySQL using command line:
Let's see how to start MySQL from the Windows command line manually.
To start mysqld from the command line, first open a console window, i.e., Start -> Run.., then type cmd or command. After opening the console window, change to the directory where your MySQL is installed. For example:
C:\> cd "C:\Program Files\MySQL\MySQL Server 4.1\bin"
After changing to that directory, start the MySQL server as given below:
C:\Program Files\MySQL\MySQL Server 4.1\bin> mysqld
The version depends upon the mysql server you have installed. The path may also vary depending on the MySQL installation on your system.
You can stop the MySQL server using the below command:
C:\> "C:\Program Files\MySQL\MySQL Server 4.1\bin\mysqladmin" -u root shutdown
The above commands will help you to start and stop the MySQL server.
Connecting MySQL server :
There are three ways to connect to a MySQL server. They are:
Command Prompt
MySQL Command Line Client
External MySQL Tools
Command Prompt :
You can connect to MySQL from your console window, i.e., Start -> Run.., then type cmd or command to open the Command Prompt window.
After opening the console window, change to the directory where your MySQL is installed. For example:
C:\> cd "C:\Program Files\MySQL\MySQL Server 4.1\bin"
After changing to that directory, enter the below command to connect to the MySQL server:
C:\Program Files\MySQL\MySQL Server 4.1\bin> mysql.exe -u root
The path may vary depending on the MySQL installation on your system. Instead of root you can also connect by giving your username.
MySQL Command Line Client :
To connect to a MySQL server using the command line client, go to
Start -> Programs -> MySQL -> MySQL Server 4.1 -> MySQL Command Line Client. The command line client window will open; enter the password to start your queries.
External MySQL Tools :
You can also connect to MySQL using an external tool like MySQL Query Browser.
Before creating a database, check whether a database with the name you are going to use already exists. Check this with the following SHOW statement:
mysql> show databases;
This query will list the available databases. Please note that SQL keywords in MySQL are case insensitive, so you can give the query in different cases. So show dataBASES; and SHOW dataBASES; will also work.
Once you have confirmed that you don't have a database with the name you intend to create, you can create your own database by,
mysql> create database sample;
Please note that the database name is case sensitive only on Unix. The above query will create an empty database that doesn't contain any tables.
If you want to create tables in a database, you first have to select the database. To select a database, enter the following query :
mysql> USE sample;
Database changed
Here sample is the database you want to select. The USE command doesn't need a semicolon at the end of the query.
You can use the following command to view the current database that you're connected to:
mysql> select database();
+------------+
| database() |
+------------+
| sample     |
+------------+
Understand the difference between USE database and select database(): the former selects a database and the latter displays the currently selected one. After selecting the database you can create tables and perform other such operations.
Note : You have to select the database using the USE statement every time you connect to the MySQL server or when you want to change the database.
If you type the following query you will see a message like Empty set, i.e., there are no tables in the selected database.
mysql> show tables;
Empty set (0.00 sec)
A database can be removed or deleted using the DROP statement. The following example deletes the database sample.
mysql> drop database sample;
Query OK, 1 row affected (0.05 sec)
This query permanently removes the database sample.
DROP DATABASE drops all tables in the database and deletes the database itself. Once the DROP command has been used, the database can no longer be used, so we should be careful with this statement.
Data types :
Definition : A data type is the characteristic of columns and variables that defines what types of data values they can store. It indicates whether a data item represents a number, date, character string, etc.
Data types are used to indicate the type of each field we create in a table. MySQL supports a number of datatypes in three important categories:
Numeric types
Date and Time types
String(Character) types
Before creating a table, identify whether a column should be a text, number, or date type. Each column in a table has a data type. Choose the smallest type that can still hold the largest expected input value.
For example, if the number of students in a school is in the hundreds, set the column as an unsigned SMALLINT (which allows values from 0 to 65535). Note that a display width such as SMALLINT(3) affects only how values are displayed and does not limit them to three digits.
We should be careful: when a string five characters long is inserted into a CHAR(3) field, the final two characters will be truncated. It is better to set the maximum length for text and number columns as well as other attributes such as UNSIGNED.
Square brackets ('[' and ']') indicate optional parts of type definitions.
Now let us move on to an overview of MySQL datatypes.
Numeric Datatypes :
The numeric data types are as follows:
BIT TINYINT BOOLEAN SMALLINT MEDIUMINT INT INTEGER BIGINT
FLOAT DOUBLE DECIMAL
Let's look at the numeric datatypes briefly.
BIT :
BIT is a synonym for TINYINT(1). (This applies to MySQL versions before 5.0.3; from 5.0.3 on, BIT is a separate bit-field type.)
TINYINT[(M)] :
A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.
BOOL, BOOLEAN :
These types are synonyms for TINYINT(1). A value of zero is considered false. Non-zero values are considered true.
SMALLINT :
A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
MEDIUMINT :
A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
INT :
A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
INTEGER :
This type is a synonym for INT.
BIGINT :
A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.
FLOAT :
A small (single-precision) floating-point number. The values are from -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38.
DOUBLE :
A normal-size (double-precision) floating-point number. The values are from -1.7976931348623157E+308 to -2.2250738585072014E-308, 0, and 2.2250738585072014E-308 to 1.7976931348623157E+308.
DECIMAL :
The maximum number of digits(M) for DECIMAL is 64.
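The integer ranges listed above follow directly from the storage size of each type. The sketch below recomputes them in Python; the byte sizes (1, 2, 3, 4 and 8) are not stated in this post and are filled in here as the standard MySQL storage sizes, and the function name int_range is just an illustrative choice.

```python
# Storage size in bytes of each MySQL integer type (not given in the text above).
INT_TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(n_bytes, unsigned=False):
    """Return (min, max) for an integer stored in n_bytes bytes."""
    bits = 8 * n_bytes
    if unsigned:
        return 0, 2**bits - 1
    # Signed values use two's complement: half the range is negative.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in INT_TYPE_BYTES.items():
    print(name, int_range(size), int_range(size, unsigned=True))
```

Making a column UNSIGNED does not add storage; it only shifts the same number of values into the non-negative range.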
Date and Time Data Types :
DATE TIME DATETIME TIMESTAMP YEAR
DATE :
A Date. The range is 1000-01-01 to 9999-12-31. The date values are displayed in YYYY-MM-DD format.
TIME :
A Time. The range is -838:59:59 to 838:59:59. The time values are displayed in HH:MM:SS format.
DATETIME :
A Date and Time combination. The range is 1000-01-01 00:00:00 to 9999-12-31 23:59:59. The datetime values are displayed in YYYY-MM-DD HH:MM:SS format.
TIMESTAMP :
A Timestamp. The range is 1970-01-01 00:00:01 UTC to partway through the year 2037. A TIMESTAMP column is useful for recording the date and time of an INSERT or UPDATE operation.
YEAR :
A Year. The year values are displayed either in two-digit or four-digit format. The range of values for a four-digit is 1901 to 2155. For two-digit, the range is 70 to 69, representing years from 1970 to 2069.
For all the date and time columns, we can also assign the values using either strings or numbers.
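The two-digit YEAR rule described above (70 to 99 mean 1970 to 1999, and 00 to 69 mean 2000 to 2069) can be written out as a small function. This is a sketch of the rule as stated in this post, not MySQL's own code, and the function name is hypothetical.

```python
def expand_two_digit_year(yy):
    """Map a two-digit year to a four-digit year using the 70..69 rule."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a value between 0 and 99")
    # 70..99 belong to the 1900s, 00..69 to the 2000s.
    return 1900 + yy if yy >= 70 else 2000 + yy


for yy in (70, 99, 0, 69):
    print(yy, "->", expand_two_digit_year(yy))
```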
String data types :
CHAR VARCHAR TINYTEXT TEXT BLOB MEDIUMTEXT LONGTEXT
BINARY VARBINARY ENUM SET
CHAR() :
It is a fixed length string and is mainly used when the data is not going to vary much in its length. It ranges from 0 to 255 characters long. When CHAR values are stored, they are right padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
VARCHAR() :
It is a variable length string and is mainly used when the data may vary in length. It ranges from 0 to 255 characters long. VARCHAR values are not padded when they are stored.
TINYTEXT, TINYBLOB :
A string with a maximum length of 255 characters.
TEXT :
TEXT columns are treated as character strings (non-binary strings). A TEXT column has a maximum length of 65535 characters.
BLOB :
BLOB stands for Binary Large OBject. It can hold a variable amount of data. BLOB columns are treated as byte strings (binary strings). A BLOB column has a maximum length of 65535 bytes.
MEDIUMTEXT, MEDIUMBLOB :
These have a maximum length of 16777215 characters (bytes for MEDIUMBLOB).
LONGTEXT, LONGBLOB :
These have a maximum length of 4294967295 characters (bytes for LONGBLOB).
BINARY :
The BINARY is similar to the CHAR type. It stores the value as binary byte strings instead of non-binary character strings.
VARBINARY :
The VARBINARY is similar to the VARCHAR type. It stores the value as binary byte strings instead of non-binary character strings.
ENUM() :
An enumeration. Each column value may be one of a specified list of possible values. It can store only one of the values declared in the list inside the ( ) brackets. The ENUM list can contain up to 65535 values.
SET() :
A set. Each column may have more than one of the specified possible values. It contains up to 64 list items and can store more than one choice. SET values are represented internally as integers.
If CHAR and VARCHAR columns are used in the same table, then MySQL will automatically change the CHAR into VARCHAR for compatibility reasons. The ( ) brackets let you specify the maximum number of characters that will be used in the column.
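The padding and truncation behaviour described for CHAR and VARCHAR can be imitated in plain Python to make it concrete. This is only a simulation of the rules given in this post (right padding on store, trailing spaces removed on retrieval, overlong values cut to the declared length); the function names are hypothetical and nothing here talks to a MySQL server.

```python
def store_char(value, length):
    """Simulate storing into CHAR(length): truncate, then right-pad with spaces."""
    return value[:length].ljust(length)


def retrieve_char(stored):
    """Simulate reading a CHAR value back: trailing spaces are removed."""
    return stored.rstrip(" ")


def store_varchar(value, length):
    """Simulate storing into VARCHAR(length): truncate, but never pad."""
    return value[:length]


print(repr(store_char("ab", 5)))                 # padded to five characters
print(repr(retrieve_char(store_char("ab", 5))))  # padding removed again
print(repr(store_char("hello", 3)))              # final two characters lost
print(repr(store_varchar("ab", 5)))              # stored as is
```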

Thursday, January 31, 2008

Packet Stream XML-RPC

In addition to providing facilities for your program to speak the standard XML-RPC language, XML-RPC For C/C++ provides facilities for a variation on the standard that works in some applications for which XML-RPC is insufficient, and is much faster. We call it packet stream XML-RPC.

Packet stream XML-RPC does not use HTTP, as regular XML-RPC does. Packet stream XML-RPC has a concept of a long-lived client/server connection and you perform multiple RPCs over a single connection. That connection is typically a TCP connection.
One advantage of packet stream XML-RPC over regular XML-RPC is that a communicant can discover when its partner dies, no matter how violently, because the connection goes away. So for example, you could have a client that turns a machine on and off by sending RPCs to it. If for any reason the client isn't running, the machine should be in its quiescent off state. So the client creates a packet stream XML-RPC connection to the machine when the client comes up and maintains it throughout the client's life. When the client shuts down, it normally sends the RPC to turn the machine off before closing the connection, but if the client should crash while the machine is on, the machine finds out about it (because the connection dies) and turns itself off.
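The underlying mechanism, noticing that the partner is gone because the long-lived connection ends, can be shown with plain sockets. The sketch below does not use Xmlrpc-c or the packet stream protocol at all; it assumes a local socket pair as a stand-in for the TCP connection, and the message b"turn_on" is an invented placeholder, not a real RPC.

```python
import socket

# A connected pair of sockets stands in for the client/machine connection.
client, machine = socket.socketpair()

# The client asks the machine to turn on (placeholder message, not real XML-RPC).
client.sendall(b"turn_on")
request = machine.recv(1024)
machine_on = request == b"turn_on"

# Simulate the client dying: its end of the connection goes away.
client.close()

# recv() returning an empty bytes object means the peer closed the connection,
# so the machine falls back to its quiescent off state.
data = machine.recv(1024)
if data == b"":
    machine_on = False

print("machine on:", machine_on)
machine.close()
```

With one HTTP request per RPC there is no such lasting connection whose loss could serve as this signal.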
Another advantage of packet stream XML-RPC is that without all the overhead of HTTP, it generally runs faster and with less CPU and network resources. The lack of TCP connection setup and teardown for an individual RPC also helps. Note that Xmlrpc-c's HTTP-based facilities do HTTP persistent connections so that a stream of RPCs can use just one TCP connection, which means this isn't as great an advantage of packet stream XML-RPC as one might think.
Another reason to prefer packet stream XML-RPC is that the Xmlrpc-c facilities for packet stream let you create the connection between client and server independently, which gives you greater flexibility. You create the connection, then hand it over to Xmlrpc-c for use in performing or serving RPCs. So for example, one user had the problem that he didn't want his client to have to know the server's IP address; it was more sensible for the server to know the client's IP address. With regular XML-RPC, this can't work because the client is an HTTP client and has to initiate a TCP connection. But with packet stream, he was able to have the server initiate a TCP connection and the client merely accept it. With the complete connection in hand, he invoked Xmlrpc-c facilities on the client side to send RPCs to the server.
You can use packet stream XML-RPC to maintain state across RPCs too. For example, you can have a login RPC and then consider the source of every subsequent RPC on that connection to be authenticated. In regular XML-RPC, this sort of session is considerably more complex. When you use it this way, you are straying from the RPC concept, which requires that each RPC stand alone, but it might nonetheless be the best design.
Packet stream XML-RPC is not a public standard. Only XML-RPC For C/C++ implements it. So you use it only in applications where you supply both client and server software, not in putting up a public server for people to access with existing client programs or vice versa.

Monday, January 28, 2008

A tutorial on character code issues

The basics
In computers and in data transmission between them, i.e. in digital data processing and transfer, data is internally presented as octets, as a rule. An octet is a small unit of data with a numerical value between 0 and 255, inclusively.

The numerical values are presented in the normal (decimal) notation here, but notice that other presentations are used too, especially octal (base 8) or hexadecimal (base 16) notation. Octets are often called bytes, but in principle, octet is a more definite concept than byte. Internally, octets consist of eight bits (hence the name, from Latin octo 'eight'), but we need not go into bit level here. However, you might need to know what the phrase "first bit set" or "sign bit set" means, since it is often used. In terms of numerical values of octets, it means that the value is greater than 127. In various contexts, such octets are sometimes interpreted as negative numbers, and this may cause various problems.
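The notations and the "sign bit set" remark can be checked with a short script. It is a minimal sketch using only built-in Python functions; the helper names high_bit_set and as_signed are made up for the illustration.

```python
def high_bit_set(octet):
    """True when the first (most significant) bit of the octet is set."""
    return octet > 127  # equivalent to (octet & 0x80) != 0


def as_signed(octet):
    """Interpret the same eight bits as a signed (two's complement) number."""
    return int.from_bytes(bytes([octet]), "big", signed=True)


# Decimal, octal and hexadecimal notation, plus the signed interpretation.
for value in (65, 127, 128, 228, 255):
    print(value, oct(value), hex(value), high_bit_set(value), as_signed(value))
```

Octet 228, for instance, turns into -28 when read as a signed value, which is the kind of surprise the text warns about.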
Different conventions can be established regarding how an octet or a sequence of octets presents some data. For instance, four consecutive octets often form a unit that presents a real number according to a specific standard. We are here interested in the presentation of character data (or string data; a string is a sequence of characters) only.
In the simplest case, which is still widely used, one octet corresponds to one character according to some mapping table (encoding). Naturally, this allows at most 256 different characters to be represented. There are several different encodings, such as the well-known ASCII encoding and the ISO Latin family of encodings. The correct interpretation and processing of character data of course requires knowledge about the encoding used. For HTML documents, such information should be sent by the Web server along with the document itself, using so-called HTTP headers (cf. MIME headers).
Previously the ASCII encoding was usually assumed by default (and it is still very common). Nowadays ISO Latin 1, which can be regarded as an extension of ASCII, is often the default. The current trend is to avoid giving such a special position to ISO Latin 1 among the variety of encodings.
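Why the encoding must be known can be seen by decoding one and the same octet under different assumptions. The sketch below uses Python's built-in codecs; ISO 8859-5 (Cyrillic) is picked here only as a contrasting example and is not discussed in the text.

```python
octet = bytes([228])

# The same octet gives different characters under different mapping tables.
as_latin1 = octet.decode("latin-1")      # ISO Latin 1
as_cyrillic = octet.decode("iso8859_5")  # ISO 8859-5, chosen for contrast
print(as_latin1, as_cyrillic)

# ASCII defines nothing above 127, so the octet cannot be interpreted at all.
try:
    octet.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("valid ASCII:", ascii_ok)
```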
Definitions
The following definitions are not universally accepted and used. In fact, one of the greatest causes of confusion around character set issues is that terminology varies and is sometimes misleading.
character repertoire
A set of distinct characters. No specific internal presentation in computers or data transfer is assumed. The repertoire per se does not even define an ordering for the characters; ordering for sorting and other purposes is to be specified separately. A character repertoire is usually defined by specifying names of characters and a sample (or reference) presentation of characters in visible form. Notice that a character repertoire may contain characters which look the same in some presentations but are regarded as logically distinct, such as Latin uppercase A, Cyrillic uppercase A, and Greek uppercase alpha. For more about this, see a discussion of the character concept later in this document.
character code
A mapping, often presented in tabular form, which defines a one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. That is, it assigns a unique numerical code, a code position, to each character in the repertoire. In addition to being often presented as one or more tables, the code as a whole can be regarded as a single table and the code positions as indexes. As synonyms for "code position", the following terms are also in use: code number, code value, code element, code point, code set value - and just code. Note: The set of nonnegative integers corresponding to characters need not consist of consecutive numbers; in fact, most character codes have "holes", such as code positions reserved for control functions or for eventual future use to be defined later.
character encoding
A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. In the simplest case, each character is mapped to an integer in the range 0 - 255 according to a character code and these are used as such as octets. Naturally, this only works for character repertoires with at most 256 characters. For larger sets, more complicated encodings are needed. Encodings have names, which can be registered.
Notice that a character code assumes or implicitly defines a character repertoire. A character encoding could, in principle, be viewed purely as a method of mapping a sequence of integers to a sequence of octets. However, quite often an encoding is specified in terms of a character code (and the implied character repertoire). The logical structure is still the following:
A character repertoire specifies a collection of characters, such as "a", "!", and "ä".
A character code defines numeric codes for characters in a repertoire. For example, in the ISO 10646 character code the numeric codes for "a", "!", "ä", and "‰" (per mille sign) are 97, 33, 228, and 8240. (Note: Especially the per mille sign, presenting 0/00 as a single character, can be shown incorrectly on display or on paper. That would be an illustration of the symptoms of the problems we are discussing.)
A character encoding defines how sequences of numeric codes are presented as (i.e., mapped to) sequences of octets. In one possible encoding for ISO 10646, the string a!ä‰ is presented as the following sequence of octets (using two octets for each character): 0, 97, 0, 33, 0, 228, 32, 48.
For a more rigorous explanation of these basic concepts, see Unicode Technical Report #17: Character Encoding Model.
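The three levels can be reproduced with the example string from the list above. In this sketch Python's ord() supplies the code numbers (Python strings use the same code positions as ISO 10646), and the two-octet encoding is assumed to be big-endian UTF-16, which matches the octet sequence quoted above for these characters; UTF-8 is added only to show that another encoding of the same code numbers gives different octets.

```python
# Repertoire level: four characters (a, !, a with diaeresis, per mille sign).
text = "a!\u00e4\u2030"

# Character code level: each character has a numeric code position.
codes = [ord(ch) for ch in text]
print(codes)

# Encoding level: code numbers mapped to octets, two octets per character.
two_octet = list(text.encode("utf-16-be"))
print(two_octet)

# A different encoding of the same code numbers yields different octets.
utf8 = list(text.encode("utf-8"))
print(utf8)
```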
The phrase character set is used in a variety of meanings. It might denote just a character repertoire but it may also refer to a character code, and quite often a particular character encoding is implied too.
Unfortunately the word charset is used to refer to an encoding, causing much confusion. It is even the official term to be used in several contexts by Internet protocols, in MIME headers.
Quite often the choice of a character repertoire, code, or encoding is presented as the choice of a language. For example, Web browsers typically confuse things quite a lot in this area. A pulldown menu in a program might be labeled "Languages", yet consist of character encoding choices (only). A language setting is quite distinct from character issues, although naturally each language has its own requirements on character repertoire. Even more seriously, programs and their documentation very often confuse the above-mentioned issues with the selection of a font.
Examples of character codes
Good old ASCII
The basics of ASCII
The name ASCII, originally an abbreviation for "American Standard Code for Information Interchange", denotes an old character repertoire, code, and encoding.
Most character codes currently in use contain ASCII as their subset in some sense. ASCII is the safest character repertoire to be used in data transfer. However, not even all ASCII characters are "safe"!
ASCII has been used and is used so widely that often the word ASCII refers to "text" or "plain text" in general, even if the character code is something else! The words "ASCII file" quite often mean any text file as opposite to a binary file.
The definition of ASCII also specifies a set of control codes ("control characters") such as linefeed (LF) and escape (ESC). But the character repertoire proper, consisting of the printable characters of ASCII, is the following (where the first item is the blank, or space, character):
  ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
The appearance of characters varies, of course, especially for some special characters. Some of the variation and other details are explained in The ISO Latin 1 character repertoire - a description with usage notes.
A formal view on ASCII
The character code defined by the ASCII standard is the following: code values are assigned to characters consecutively in the order in which the characters are listed above (rowwise), starting from 32 (assigned to the blank) and ending up with 126 (assigned to the tilde character ~). Positions 0 through 31 and 127 are reserved for control codes. They have standardized names and descriptions, but in fact their usage varies a lot.
The character encoding specified by the ASCII standard is very simple, and the most obvious one for any character code where the code numbers do not exceed 255: each code number is presented as an octet with the same value.
Octets 128 - 255 are not used in ASCII. (This allows programs to use the first, most significant bit of an octet as a parity bit, for example.)
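The formal description translates almost literally into code: the printable characters occupy code values 32 to 126, each encoded as an octet with the same value, and the unused first bit is free for other purposes. The sketch below shows this; the choice of even parity and the helper names are assumptions made for the illustration.

```python
# Code values 32..126 are the printable ASCII characters, in order.
printable = "".join(chr(code) for code in range(32, 127))
octets = printable.encode("ascii")

# The ASCII encoding is the identity: octet value equals code number.
same_values = list(octets) == list(range(32, 127))
print(len(printable), same_values)


def with_even_parity(octet):
    """Store an even-parity bit in the most significant bit of a 7-bit code."""
    if not 0 <= octet <= 127:
        raise ValueError("not an ASCII code value")
    parity = bin(octet).count("1") % 2
    return octet | (parity << 7)


def strip_parity(octet):
    """Drop the parity bit to recover the ASCII code value."""
    return octet & 0x7F


print(with_even_parity(ord("A")), with_even_parity(ord("C")))
```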
National variants of ASCII
There are several national variants of ASCII. In such variants, some special characters have been replaced by national letters (and other symbols). There is great variation here, and even within one country and for one language there might be different variants. The original ASCII is therefore often referred to as US-ASCII; the formal standard (by ANSI) is ANSI X3.4-1986.
The phrase "original ASCII" is perhaps not quite adequate, since the creation of ASCII started in the late 1950s, and several additions and modifications were made in the 1960s. The 1963 version had several unassigned code positions. The ANSI standard, where those positions were assigned, mainly to accommodate lower case letters, was approved in 1967/1968, later modified slightly. For the early history, including pre-ASCII character codes, see Steven J. Searle's A Brief History of Character Codes in North America, Europe, and East Asia and Tom Jennings' ASCII: American Standard Code for Information Infiltration. See also Jim Price's ASCII Chart, Mary Brandel's 1963: ASCII Debuts, and the computer history documents, including the background and creation of ASCII, written by Bob Bemer, "father of ASCII".
The international standard ISO 646 defines a character set similar to US-ASCII but with code positions corresponding to US-ASCII characters @[\]{} as "national use positions". It also gives some liberties with characters #$^`~. The standard also defines "international reference version (IRV)", which is (in the 1991 edition of ISO 646) identical to US-ASCII. Ecma International has issued the ECMA-6 standard, which is equivalent to ISO 646 and is freely available on the Web.
Within the framework of ISO 646, and partly otherwise too, several "national variants of ASCII" have been defined, assigning different letters and symbols to the "national use" positions. Thus, the characters that appear in those positions - including those in US-ASCII - are somewhat "unsafe" in international data transfer, although this problem is losing significance. The trend is towards using the corresponding codes strictly for US-ASCII meanings; national characters are handled otherwise, giving them their own, unique and universal code positions in character codes larger than ASCII. But old software and devices may still reflect various "national variants of ASCII".
The following table lists ASCII characters which might be replaced by other characters in national variants of ASCII. (That is, the code positions of these US-ASCII characters might be occupied by other characters needed for national use.) The lists of characters appearing in national variants are not intended to be exhaustive, just typical examples.
dec  oct  hex  glyph  official Unicode name  National variants
35   43   23   #      number sign            £ Ù
36   44   24   $      dollar sign            ¤
64   100  40   @      commercial at          É § Ä à ³
91   133  5B   [      left square bracket    Ä Æ ° â ¡ ÿ é
92   134  5C   \      reverse solidus        Ö Ø ç Ñ ½ ¥
93   135  5D   ]      right square bracket   Å Ü § ê é ¿
94   136  5E   ^      circumflex accent      Ü î
95   137  5F   _      low line               è
96   140  60   `      grave accent           é ä µ ô ù
123  173  7B   {      left curly bracket     ä æ é à ° ¨
124  174  7C   |      vertical line          ö ø ù ò ñ f
125  175  7D   }      right curly bracket    å ü è ç ¼
126  176  7E   ~      tilde                  ü ¯ ß ¨ û ì ´ _
Almost all of the characters used in the national variants have been incorporated into ISO Latin 1. Systems that support ISO Latin 1 in principle may still reflect the use of national variants of ASCII in some details; for example, an ASCII character might get printed or displayed according to some national variant. Thus, even "plain ASCII text" is thereby not always portable from one system or application to another.
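The portability problem is easy to reproduce. The sketch below builds one sample variant from the first entry listed in the table for six of the national use positions and shows how the same octets read under US-ASCII and under that variant. The mapping is an illustrative assumption assembled from the table, not the definition of any particular national standard, and the helper name is hypothetical.

```python
# Sample variant: code position -> national character (first table entries).
SAMPLE_VARIANT = {91: "Ä", 92: "Ö", 93: "Å", 123: "ä", 124: "ö", 125: "å"}


def decode_variant(octets, variant):
    """Decode 7-bit octets, replacing national use positions per the variant."""
    return "".join(variant.get(octet, chr(octet)) for octet in octets)


data = b"a[i] = {x|y}"
as_us_ascii = data.decode("ascii")
as_variant = decode_variant(data, SAMPLE_VARIANT)
print(as_us_ascii)
print(as_variant)
```

Program source text is a typical victim, since brackets and braces sit exactly on the national use positions.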
Control characters (control codes)
The rôle of the so-called control characters in character codes is somewhat obscure. Character codes often contain code positions which are not assigned to any visible character but reserved for control purposes. For example, in communication between a terminal and a computer using the ASCII code, the computer could regard octet 3 as a request for terminating the currently running process. Some older character code standards contain explicit descriptions of such conventions whereas newer standards just reserve some positions for such usage, to be defined in separate standards or agreements such as "C0 controls" (tabulated in my document on ASCII control codes) and "C1 controls", or specifically ISO 6429. And although the definition quoted above suggests that "control characters" might be regarded as characters in the Unicode terminology, perhaps it is more natural to regard them as control codes.
Control codes can be used for device control such as cursor movement, page eject, or changing colors. Quite often they are used in combination with codes for graphic characters, so that a device driver is expected to interpret the combination as a specific command and not display the graphic character(s) contained in it. For example, in the classical VT100 controls, ESC followed by the code corresponding to the letter "A" or something more complicated (depending on mode settings) moves the cursor up. To take a different example, the Emacs editor treats ESC A as a request to move to the beginning of a sentence. Note that the ESC control code is logically distinct from the ESC key in a keyboard, and many other things than pressing ESC might cause the ESC control code to be sent. Also note that phrases like "escape sequences" are often used to refer to things that don't involve ESC at all and operate at a quite different level. Bob Bemer, the inventor of ESC, has written a "vignette" about it: That Powerful ESCAPE Character -- Key and Sequences.
One possible form of device control is changing the way a device interprets the data (octets) that it receives. For example, a control code followed by some data in a specific format might be interpreted so that any subsequent octets are to be interpreted according to a table identified in some specific way. This is often called "code page switching", and it means that control codes could be used to change the character encoding. And it is then more logical to consider the control codes and associated data at the level of fundamental interpretation of data rather than direct device control. The international standard ISO 2022 defines powerful facilities for using different 8-bit character codes in a document.
Widely used formatting control codes include carriage return (CR), linefeed (LF), and horizontal tab (HT), which in ASCII occupy code positions 13, 10, and 9. The names (or abbreviations) suggest generic meanings, but the actual meanings are defined partly in each character code definition, partly - and more importantly - by various other conventions "above" the character level. The "formatting" codes might be seen as a special case of device control, in a sense, but more naturally, a CR or a LF or a CR LF pair (to mention the most common conventions) when used in a text file simply indicates a new line. As regards control codes used for line structuring, see Unicode technical report #13 Unicode Newline Guidelines. See also my Unicode line breaking rules: explanations and criticism. The HT (TAB) character is often used for real "tabbing" to some predefined writing position. But it is also used e.g. for indicating data boundaries, without any particular presentational effect, for example in the widely used "tab separated values" (TSV) data format.
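A short sketch makes the code positions and the line conventions concrete. It mixes CR LF, LF and CR line ends on purpose and then splits the lines into tab separated fields; the sample records are invented for the illustration.

```python
# Code positions of carriage return, linefeed and horizontal tab.
positions = (ord("\r"), ord("\n"), ord("\t"))
print(positions)

# Invented TSV data using three different new line conventions.
raw = b"name\tmarks\r\nsteve\t100\nanna\t95\r"

# splitlines() accepts CR LF, LF and CR as line ends.
lines = raw.decode("ascii").splitlines()

# Here HT only marks field boundaries; it has no presentational effect.
rows = [line.split("\t") for line in lines]
print(rows)
```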
A control code, or a "control character" cannot have a graphic presentation (a glyph) in the same way as normal characters have. However, in Unicode there is a separate block Control Pictures which contains characters that can be used to indicate the presence of a control code. They are of course quite distinct from the control codes they symbolize - U+241B symbol for escape is not the same as U+001B escape! On the other hand, a control code might occasionally be displayed, by some programs, in a visible form, perhaps describing the control action rather than the code. For example, upon receiving octet 3 in the example situation above, a program might echo back (onto the terminal) *** or INTERRUPT or ^C. All such notations are program-specific conventions. Some control codes are sometimes named in a manner which seems to bind them to characters. In particular, control codes 1, 2, 3, ... are often called control-A, control-B, control-C, etc. (or CTRL-A or C-A or whatever). This is associated with the fact that on many keyboards, control codes can be produced (for sending to a computer) using a special key labeled "Control" or "Ctrl" or "CTR" or something like that together with letter keys A, B, C, ... This in turn is related to the fact that the code numbers of characters and control codes have been assigned so that the code of "Control-X" is obtained from the code of the upper case letter X by a simple operation (subtracting 64 decimal). But such things imply no real relationships between letters and control codes. The control code 3, or "Control-C", is not a variant of letter C at all, and its meaning is not associated with the meaning of C.
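The arithmetic behind the "Control-X" names, and the separate Control Pictures characters, can be sketched as follows. The function names are hypothetical; the only facts used are the subtraction of 64 described above and the Control Pictures block starting at U+2400, which is consistent with U+241B being the symbol for escape (code 27).

```python
def control_code(letter):
    """Code of Control-<letter>: the upper case letter's code minus 64."""
    upper = letter.upper()
    if len(upper) != 1 or not "A" <= upper <= "Z":
        raise ValueError("expected a single letter A-Z")
    return ord(upper) - 64


def caret_notation(code):
    """Program-specific visible form such as ^C for control code 3."""
    return "^" + chr(code + 64)


def control_picture(code):
    """Unicode Control Pictures character symbolizing a C0 control code."""
    if not 0 <= code <= 31:
        raise ValueError("expected a control code 0-31")
    return chr(0x2400 + code)


print(control_code("C"), caret_notation(3), hex(ord(control_picture(27))))
```

The result of control_picture() is an ordinary graphic character; sending it to a terminal does not trigger the control action it depicts.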