“They speak Arabic in Iran, Pakistan, and Xinjiang, right?”
No, that’s wrong.
“OK, what I meant is that they write Farsi, Urdu, and Uyghur in the Arabic script. That’s right, isn’t it?”
That’s closer, but still not right.
We need to discuss two different but intertwingled ideas, one general and one specific:
- The general idea is the difference between speech and writing. As I’ve mentioned more than once before, language is speech; writing is merely a way to represent speech. Two very closely related languages — Croatian and Serbian, for example — can use different alphabets (Roman and Cyrillic in this instance). Two completely unrelated languages — Basque and Indonesian, for example — can use the same alphabet (Roman in this instance). But in all cases the spoken language came first, and then a decision was reached about how to represent it. If a particular writing system is a poor fit for the language, it might be chosen anyway, and then some modifications are usually made. Turkish, for example, was once written in the Arabic alphabet (mostly for religious reasons) and now is written in the Roman alphabet (mostly for political and economic reasons). The Arabic alphabet was truly unsuitable, as we’ll discuss below, but even the Roman alphabet needed some extra letters, such as ç and ı (dotted and undotted i are two different letters in Turkish).
- The specific idea is that the Arabic alphabet was modified in three different ways for Farsi, Urdu, and Uyghur, two of which are Indo-Iranian languages and one of which is a Turkic language — all in contrast to Arabic, which is a Semitic language.
OK, so let’s look at variations of the Arabic writing system, a system that we discussed very briefly back in August. We’ll rely primarily on a blog and a video. The blog is Morph, a wonderful British publication that describes itself as “a blog about languages and linguistic history published by members of the Surrey Morphology Group, at the University of Surrey, UK, as part of our project on the Loss of Inflection.” Check out their fascinating and informative post from three weeks ago, titled “Arabic-based scripts,” where you’ll learn about the basic structure of Arabic writing, especially the role of vowels (see also my post on Quidditch in Yiddish a couple of weeks ago). Go read the Morph post yourself; it not only includes a fully digestible take on writing in Farsi, Urdu, and Uyghur but also throws in a lagniappe about the astonishing Thaana script from the Maldives. There’s not much that I can add to it, so I’ll settle for reproducing some visuals from their post:
- An Arabic text, with the observation that “nothing above and below the red lines actually appear in every-day texts or in handwriting” (those marks are there to aid children and foreigners, just as the vowels in Hebrew are marked for those of us who need them but not for the locals).
- An Urdu text.
- A Uyghur text, including the following comment: “The parts circled are the Uyghur innovations that would be incorrect in Arabic, Persian or in Urdu. Notice their proportion.”
- A Dhivehi text, written in Thaana script, which is still based on Arabic, although you would never know it.
I promised to explain why the Arabic alphabet proved unsuitable for Turkish. There are two basic reasons. One is that the sound inventories of the two languages differ. The other is that vowels are relatively unimportant in Arabic (as they are in Hebrew and Middle Egyptian) and can thus be left unmarked, except in texts meant for children and for us foreigners, but they are relatively important in Turkish (as they are in English, which you can see by wondering whether “WLL” means wall, well, or will).
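If you'd like to try the “WLL” experiment yourself, here is a rough Python sketch. It rests on two assumptions of mine: it treats only the letters a, e, i, o, and u as vowels (a simplification of English spelling), and it uses a tiny made-up word list. The function names `consonant_skeleton` and `group_by_skeleton` are just illustrative. The sketch strips the vowels from each word and groups together the words that end up looking identical:

```python
import re
from collections import defaultdict


def consonant_skeleton(word: str) -> str:
    """Drop the letters a, e, i, o, u (an assumed, simplified vowel set)
    and uppercase whatever is left."""
    return re.sub(r"[aeiou]", "", word.lower()).upper()


def group_by_skeleton(words):
    """Group words that become identical once their vowels are removed."""
    groups = defaultdict(list)
    for word in words:
        groups[consonant_skeleton(word)].append(word)
    return dict(groups)


# A small illustrative word list, not a real dictionary.
words = ["wall", "well", "will", "bat", "bet", "bit", "but"]
groups = group_by_skeleton(words)

for skeleton, candidates in groups.items():
    print(skeleton, "->", ", ".join(candidates))
```

Every word in a group collapses to the same consonant string, so a reader of vowel-less English would have to guess from context, which is the problem the paragraph above describes for a language where vowels carry a lot of the load.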
Finally, watch this video, which will teach you everything you want to know about Arabic consonants. Many parts of it go by too quickly, so you may need to pause and rewind from time to time: