This tutorial covers Unicode and its UTF-8 encoding standard. It starts from scratch by explaining bits and binary structures, continues with the ASCII, extended ASCII and Unicode character sets, and ends with some conclusions on how to use UTF-8 encoding and decoding with Flash & PHP.
What are bytes and bits?
Bit means “binary digit” and is the smallest unit of computerized data. A bit is a base-2 digit, i.e. it has a value of either 0 or 1.
A byte is an amount of memory, a certain collection of bits, originally variable in size but now almost always eight bits. This makes 2^{8} or 256 possible values for a byte.
byte =   1      2      3      4      5      6      7      8
         bit    bit    bit    bit    bit    bit    bit    bit
         1/0    1/0    1/0    1/0    1/0    1/0    1/0    1/0
Some example bytes could be 00000001 or 11111111 or 01010011.
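As a small illustration of the 256 possible values, the following Python sketch enumerates every 8-bit pattern. The variable name `patterns` is only chosen for this example.

```python
from itertools import product

# Build every combination of eight binary digits as a string.
patterns = ["".join(bits) for bits in product("01", repeat=8)]

print(len(patterns))               # 256
print(patterns[0], patterns[-1])   # 00000000 11111111
print("01010011" in patterns)      # True
```

Each additional bit doubles the number of combinations, which is why eight bits give 2^{8} patterns.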
Now, how can we calculate the decimal value of such a binary-encoded byte? What we need is a conversion from base 2 to base 10.
Every 1 or 0 in these binary values is associated with a power of 2. For 8 bits it looks like the following:
byte =   1            2            3            4            5            6            7            8
         128 (2^{7})  64 (2^{6})   32 (2^{5})   16 (2^{4})   8 (2^{3})    4 (2^{2})    2 (2^{1})    1 (2^{0})
         1/0          1/0          1/0          1/0          1/0          1/0          1/0          1/0
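The place values in the table can be reproduced with a short Python sketch. The name `weights` is an assumption made for this example.

```python
# Exponents run from 7 (leftmost bit) down to 0 (rightmost bit).
weights = [2 ** exponent for exponent in range(7, -1, -1)]

print(weights)       # [128, 64, 32, 16, 8, 4, 2, 1]
print(sum(weights))  # 255
```

The sum of all eight weights is 255, the largest value a byte can hold when every bit is 1.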
The calculation of the decimal equivalent of the binary value 00000001:
byte =   128   64    32    16    8     4     2     1
         0     0     0     0     0     0     0     1
= 1
The calculation of the decimal equivalent of the binary value 11111111:
byte =   128   64    32    16    8     4     2     1
         1     1     1     1     1     1     1     1
= 128+64+32+16+8+4+2+1 = 255
The calculation of the decimal equivalent of the binary value 01010011:
byte =   128   64    32    16    8     4     2     1
         0     1     0     1     0     0     1     1
= 64+16+2+1 = 83
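The three calculations above follow the same rule: add up the place values of every bit that is 1. A minimal Python sketch of that rule is shown here. The function name `byte_to_decimal` is hypothetical and exists only for this illustration.

```python
def byte_to_decimal(bits: str) -> int:
    """Convert an 8-character string of 0s and 1s to its decimal value."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight binary digits")
    weights = [128, 64, 32, 16, 8, 4, 2, 1]
    # Add the weight of every position whose bit is 1.
    return sum(weight for weight, bit in zip(weights, bits) if bit == "1")


print(byte_to_decimal("00000001"))  # 1
print(byte_to_decimal("11111111"))  # 255
print(byte_to_decimal("01010011"))  # 83
```

Python's built-in `int("01010011", 2)` performs the same base 2 to base 10 conversion. The explicit version is shown only to mirror the tables.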
What is ASCII?
ASCII stands for American Standard Code for Information Interchange and is a standard for assigning numerical values to the letters of the Roman alphabet and to typographic characters. The ASCII character set can be represented by 7 bits. This makes 2^{7} or 128 different values, and therefore 128 characters.
Because ASCII uses only 7 of the 8 bits available in a byte, the first bit is always 0: 0xxxxxxx.
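To see the leading 0 in practice, the following Python sketch prints the 8-bit pattern of a few ASCII characters. The helper name `ascii_bits` is an assumption made for this example. It relies on Python's built-in `ord`, which returns a character's code point. For the first 128 code points, that value matches the ASCII code.

```python
def ascii_bits(character: str) -> str:
    """Return the 8-bit pattern of a single ASCII character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not an ASCII character")
    # Zero-pad to eight binary digits so the unused first bit is visible.
    return format(code, "08b")


print(ord("S"), ascii_bits("S"))  # 83 01010011
print(ord("A"), ascii_bits("A"))  # 65 01000001
```

The pattern for "S" is the same byte, 01010011, that has the decimal value 83.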
Below is a table of decimal values, their binary representations and the characters assigned to them by the ASCII standard. The first 32 characters are control characters. To read more:
Source: http://www.zehnet.de/2005/02/12/unicodeutf8tutorial/