How to write in Turkish on the web

Why is this a problem?

When computers were first invented, no one thought that more than a handful of characters would be needed. It was decided that 256 characters (one byte per character) would be enough. Unfortunately, this original character set does not include six of the special characters used in Turkish: ı (small dotless i) and İ (large dotted I); ğ (soft g) and Ğ (soft G); ş (s cedilla) and Ş (S cedilla).

What’s the solution?

To get round this problem, new character sets have been created. The original character set is now called Latin 1 (ISO-8859-1). For Turkish you can use Latin 5 (ISO-8859-9), which replaces six rarely needed Icelandic characters in Latin 1 with the six Turkish ones.
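If you want to check this for yourself, here is a minimal Python sketch. It assumes Python's built-in "iso-8859-1" and "iso-8859-9" codecs; the helper name can_encode is my own invention for illustration.

```python
# The six Turkish letters that Latin 1 lacks.
TURKISH_SPECIALS = "ıİğĞşŞ"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Letters that cannot be stored in each character set.
missing_in_latin1 = [ch for ch in TURKISH_SPECIALS if not can_encode(ch, "iso-8859-1")]
missing_in_latin5 = [ch for ch in TURKISH_SPECIALS if not can_encode(ch, "iso-8859-9")]

print("Missing in Latin 1:", missing_in_latin1)
print("Missing in Latin 5:", missing_in_latin5)
```

The same helper shows the price of the swap: an Icelandic letter such as þ can be encoded in Latin 1 but not in Latin 5.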

Computers sold in Turkey are set up to use the Latin 5 character set straight from the keyboard. In our case, we have an English keyboard and computer and use Latin 5 only for displaying web documents. To work with this character set, place the following line in the head section of your web page:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9">

or, if you save the page as UTF-8 instead, use this line:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
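Whichever line you choose, the charset it names has to match the encoding the file is actually saved in. Here is a rough Python sketch of saving a page as Latin 5. The page text, the file name index.html and the helper save_page are all made up for illustration.

```python
import tempfile
from pathlib import Path

CHARSET = "iso-8859-9"  # must be the same charset the meta tag declares

# Hypothetical sample page; the meta tag names the charset used to save it.
PAGE = f"""<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset={CHARSET}">
<title>Türkçe</title>
</head>
<body><p>Işık, ağaç, şeker, İstanbul</p></body>
</html>
"""


def save_page(path: Path, text: str, charset: str) -> bytes:
    """Encode text in the given charset, write it to path, return the bytes."""
    data = text.encode(charset)
    path.write_bytes(data)
    return data


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "index.html"  # hypothetical file name
    written = save_page(target, PAGE, CHARSET)
    # Latin 5 is a single-byte set, so bytes and characters match one to one.
    print(len(written), "bytes for", len(PAGE), "characters")
```

If the file were saved in one encoding and declared as another, the browser would show the wrong letters.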

The new characters will now display properly in your web browser but will look different in your text editor, or whatever tool you are using to write your documents, because the editor still shows each byte as its Latin 1 character.
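To see what "look different" means in practice, this small Python sketch encodes a word as Latin 5 and then reads the same bytes back as Latin 1, which is roughly what an editor set up for English does. The sample word is just an example I picked.

```python
word = "ışığı"  # sample Turkish word, chosen for illustration

# Bytes as they would be stored in a Latin 5 file.
raw = word.encode("iso-8859-9")

# The same bytes interpreted by a tool that assumes Latin 1.
as_latin1 = raw.decode("iso-8859-1")

# The same bytes interpreted correctly as Latin 5.
as_latin5 = raw.decode("iso-8859-9")

print("Stored bytes:   ", raw.hex())
print("Read as Latin 1:", as_latin1)
print("Read as Latin 5:", as_latin5)
```

The bytes never change; only the character set used to read them does.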

The table shows how the Turkish characters missing in Latin 1 are assigned in the two character sets:

Code    Latin 1    Latin 5
0253    ý          ı
0221    Ý          İ
0240    ð          ğ
0208    Ð          Ğ
0254    þ          ş
0222    Þ          Ş
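The table can be reproduced by decoding each code once as Latin 1 and once as Latin 5. This is only a sketch, assuming Python's built-in codecs for the two character sets.

```python
# The six codes that Latin 1 and Latin 5 assign differently.
CODES = [253, 221, 240, 208, 254, 222]

# For each code, decode the single byte under both character sets.
rows = [
    (code, bytes([code]).decode("iso-8859-1"), bytes([code]).decode("iso-8859-9"))
    for code in CODES
]

print("Code  Latin 1  Latin 5")
for code, latin1_char, latin5_char in rows:
    print(f"{code:04d}  {latin1_char}        {latin5_char}")
```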

For reference, here are the Turkish letters that differ from English ones, with the Latin 5 code for each and a keyboard shortcut where there is one:

Character    Name                Latin 5    Shortcut
ı            small dotless I     0253
I            large dotless I     I
i            small dotted i      i
İ            large dotted I      0221
ö            o with diaeresis    0246       Ctrl+: then o
Ö            O with diaeresis    0214       Ctrl+: then O
ü            u with diaeresis    0252       Ctrl+: then u
Ü            U with diaeresis    0220       Ctrl+: then U
ğ            yumuşak g           0240
Ğ            yumuşak G           0208
ç            c cedilla           0231       Ctrl+, then c
Ç            C cedilla           0199       Ctrl+, then C
ş            s cedilla           0254
Ş            S cedilla           0222

Not forgetting the circumflex characters used in some words of Arabic origin:

Character    Name            Latin 5    Shortcut
â            a circumflex    0226       Ctrl+^ then a
û            u circumflex    0251       Ctrl+^ then u

If there isn’t a keyboard shortcut for the character you want, you can type it by holding down the Alt key and entering its four-digit code on the numeric keypad. Don’t be tempted to use Microsoft Word as your text editor just so that you can assign shortcut keys to the missing characters (although you might want to do this anyway for typing Turkish documents).

[Image: the Turkish letters]

That’s great – but how did you type the first table?

Good question. The first table displays both the Latin 1 and the Latin 5 versions of the re-used characters. This document can’t be in Latin 1, else I could not have displayed the Latin 5 characters; but nor can it be in Latin 5, else I could not have displayed the Latin 1 characters.

To get all the characters you have to use Unicode, for example in the UTF-8 encoding. Unicode is an ongoing project to have all the characters in the world in a single system. It comes in three flavours: UTF-8, UTF-16 and UTF-32. The UTF-8 flavour uses between one and four bytes per character instead of always one. The first 256 Unicode code points are the same as Latin 1, so the special Latin 5 characters have to use codes beyond this range (four-digit hexadecimal codes), and each of them takes two bytes in UTF-8. Only the first 128 characters (plain ASCII) fit in a single UTF-8 byte.
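A quick way to see the byte counts is to encode a few letters and measure them. This is a small Python sketch; the helper name utf8_length and the choice of sample letters are mine.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One plain ASCII letter, one Latin 1 letter, then the six Turkish specials.
SAMPLES = "aöıİğĞşŞ"

sizes = {ch: utf8_length(ch) for ch in SAMPLES}

for ch, size in sizes.items():
    # Show the Unicode code point, the UTF-8 bytes in hex, and the byte count.
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), size)
```

Only the plain letter a fits in one byte here; ö and all six Turkish specials take two.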

Character    Latin 5    Unicode (HTML reference)
ı            0253       &#x0131;
İ            0221       &#x0130;
ğ            0240       &#x011F;
Ğ            0208       &#x011E;
ş            0254       &#x015F;
Ş            0222       &#x015E;
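You don't have to look these codes up by hand. Here is a short Python sketch that builds the reference for any character from its Unicode code point and turns it back again with the standard html module. The helper name to_reference is hypothetical.

```python
import html


def to_reference(ch: str) -> str:
    """Build the hexadecimal HTML character reference for one character."""
    return f"&#x{ord(ch):04X};"


# References for the six Turkish letters that Latin 1 lacks.
references = {ch: to_reference(ch) for ch in "ıİğĞşŞ"}

for ch, ref in references.items():
    # html.unescape converts the reference back into the character.
    print(ch, ref, html.unescape(ref))
```

These references work whatever charset the page declares, which is how both sets of characters can sit in one table.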

As you can see from the table, Unicode is a right pain. You are better off using Latin 5 whenever you can.

Acknowledgements

Information on character sets other than Latin 1 is hard to find. If you are using Windows NT / 2000 / XP, the Unicode character set is available in Character Map.

Resource: http://lavocah.org/turkce/special.html
