Albanian (shqip [ʃcip], or gjuha shqipe [ˈɟuha ˈʃcipɛ], meaning "Albanian language") is an independent branch of the Indo-European language family spoken by over 5 million people, primarily in Albania, Kosovo and the Republic of Macedonia but also in other areas of Southern Europe in which there is an Albanian population, including Montenegro and the Preševo Valley of southern Serbia. It is the official language of Albania and Kosovo.
Centuries-old communities speaking Albanian dialects can be found scattered in Croatia (the Arbanasi), Greece (the Arvanites and some communities in Epirus, Western Macedonia and Western Thrace), Italy (the Arbëreshë in Southern Italy, Sicily and Calabria) as well as in Romania and Ukraine. There is also a large Albanian diaspora.
The language is spoken by approximately 7 million people, primarily in Albania, Kosovo, Greece, Italy, Macedonia and Montenegro. However, due to the large Albanian diaspora, the total number of speakers is much higher than the number of native speakers in Southern Europe.
The Albanian language is the official language of Albania and Kosovo and is spoken fluently by the majority of both countries' populations. Albanian is a recognised minority language in Croatia, Italy, Macedonia, Montenegro, Romania and Serbia. Albanian is also spoken in the Thesprotia and Preveza regional units and in a few villages in the Ioannina and Florina regional units in Greece.
Albanian is the third most spoken language in Italy. This is due to the strong historical ties between the countries. Italy has a historical Albanian minority of about 500,000, known as Arbëreshë, who are scattered across southern Italy. Approximately 1 million Albanians from Kosovo are dispersed throughout Germany, Switzerland and Austria. These are mainly refugees from Kosovo who migrated during the Kosovo War. In Switzerland, Albanian is the seventh most spoken language, with 176,293 native speakers.
There are large numbers of Albanian speakers in the United States, Argentina, Chile, Uruguay and Canada. Some of the first ethnic Albanians to arrive in the United States were Arbëreshë. The Arbëreshë have a strong sense of identity and are unique in that they speak an archaic dialect of Tosk Albanian called Arbëresh.
In North America (United States and Canada) there are approximately 250,000 Albanian speakers. The language is spoken in the eastern United States in cities and states such as New York City, New Jersey, Boston, Philadelphia, Ohio and Detroit. Greater New Orleans has a large Arbëresh community. Wherever there are Italians, there are often a few Arbëreshë among them. Arbëreshë Americans are therefore often indistinguishable from Italian Americans, having been assimilated into the Italian American community.
Asia and Oceania
Approximately 1.3 million people of Albanian ancestry live in Turkey, more than 500,000 of whom recognize their ancestry, language and culture. Other estimates, however, place the number of people in Turkey with Albanian ancestry or background as high as 5 million. The vast majority of this population is assimilated and no longer fluent in the language, though a vibrant Albanian community maintains its distinct identity in Istanbul to this day.
In Egypt there are around 18,000 Albanians, mostly Tosk speakers. Many are descendants of the Janissaries of Muhammad Ali Pasha, an Albanian who became Wāli and self-declared Khedive of Egypt and Sudan. In addition to the dynasty that he established, a large part of the former Egyptian and Sudanese aristocracy was of Albanian origin. In addition to the recent emigrants, there are older diasporic communities around the world.
Standard Albanian is based on the Tosk dialect of southern Albania. The Albanian language has two distinct dialects: Tosk, spoken in the south, and Gheg, spoken in the north. The Shkumbin river is the rough dividing line between the two dialects.
Gheg is divided into four sub-dialects: Northwest Gheg, Northeast Gheg, Central Gheg, and Southern Gheg. It is primarily spoken in northern Albania and throughout Montenegro, Kosovo and northwestern Macedonia. One fairly divergent dialect is the Upper Reka dialect, which is nevertheless classified as Central Gheg. There is also a diaspora dialect in Croatia, the Arbanasi dialect.
Tosk is divided into five sub-dialects: Northern Tosk (the one with the most speakers), Labërisht, Çam, Arvanitika, and Arbëresh. Tosk is spoken in southern Albania, southwestern Macedonia and northern and southern Greece. Cham Albanian is spoken in northwestern Greece, while Arvanitika is spoken by the Arvanites in southern Greece. In addition, Arbëresh is spoken by the Arbëreshë people, descendants of 15th- and 16th-century migrants who settled in southeastern Italy, in small communities in the regions of Sicily and Calabria.
The Albanian language has been written using many different alphabets since the earliest records from the 14th century. The history of Albanian orthography is closely related to the cultural orientation and knowledge of certain foreign languages among Albanian writers. The earliest written Albanian records come from the Gheg area, in makeshift spellings based on Italian or Greek and sometimes in Arabic characters. Originally, the Tosk dialect was written in the Greek alphabet and the Gheg dialect in the Latin script. Both dialects had also been written in the Ottoman Turkish version of the Arabic script, in Cyrillic, and in some local alphabets (Elbasan, Vithkuqi, Todhri, Veso Bey, Jan Vellara and others). More specifically, writers from northern Albania under the influence of the Catholic Church used Latin letters, those in southern Albania under the influence of the Greek Orthodox Church used Greek letters, while others throughout Albania under the influence of Islam used Arabic letters. There were initial attempts to create an original Albanian alphabet during the 1750–1850 period. These attempts intensified after the League of Prizren and culminated in the Congress of Manastir, held by Albanian intellectuals from 14 to 22 November 1908 in Manastir (present-day Bitola), which decided which alphabet to use and what the standardized spelling would be for standard Albanian. This is how the literary language remains. The alphabet is the Latin alphabet with the addition of the letters ë and ç and nine digraphs: dh, th, xh, gj, nj, ll, rr, zh and sh.
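Because the digraphs count as single letters, splitting a word into letters is not the same as splitting it into characters. The following is a minimal Python sketch, assuming the nine digraphs of the modern 36-letter alphabet (dh, gj, ll, nj, rr, sh, th, xh, zh), with ng treated as two separate letters. The function name `split_letters` is hypothetical, and the greedy left-to-right match may mis-split the rare words in which two separate letters happen to meet, for example across a compound boundary.

```python
# Digraphs of the modern Albanian alphabet; each counts as one letter.
# Assumption: the nine-digraph inventory of the 36-letter alphabet.
DIGRAPHS = ("dh", "gj", "ll", "nj", "rr", "sh", "th", "xh", "zh")


def split_letters(word: str) -> list[str]:
    """Split a word into alphabet letters, matching digraphs greedily."""
    text = word.lower()
    letters = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Two characters that together form one letter.
            letters.append(pair)
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


print(split_letters("shqip"))  # ['sh', 'q', 'i', 'p']
print(split_letters("gjuha"))  # ['gj', 'u', 'h', 'a']
```

A dictionary sort in Albanian alphabetical order could be built on the same splitting step, since dh sorts after d and ll after l.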
The Albanian language occupies an independent branch of the Indo-European language tree. In 1854, Albanian was demonstrated to be an Indo-European language by the philologist Franz Bopp. Albanian was formerly compared by a few Indo-European linguists with Germanic and Balto-Slavic, all of which share a number of isoglosses with Albanian. Other linguists linked the Albanian language with Latin, Greek and Armenian, while placing Germanic and Balto-Slavic in another branch of Indo-European.
According to the central hypothesis of a project undertaken by the Austrian Science Fund, old Albanian had a significant influence on the development of many languages in the Balkans. Intensive research now aims to confirm this theory. This little-known language is being researched using all available texts before a comparison with other Balkan languages is carried out. The outcome of this work will include the compilation of a lexicon providing an overview of all old Albanian verbs.
As project leader Dr. Schumacher explains, the research is already bearing fruit: "So far, our work has shown that Old Albanian contained numerous modal levels that allowed the speaker to express a particular stance to what was being said. Compared to the existing knowledge and literature, these modal levels are actually more extensive and more nuanced than previously thought. We have also discovered a great many verbal forms that are now obsolete or have been lost through restructuring — until now, these forms have barely even been recognized or, at best, have been classified incorrectly." These verbal forms are crucial to explaining the linguistic history of Albanian and its internal usage. However, they can also shed light on the reciprocal relationship between Albanian and its neighbouring languages. The researchers are following various leads which suggest that Albanian played a key role in the Balkan Sprachbund. For example, it is likely that Albanian is the source of the suffixed definite article in Romanian, Bulgarian and Macedonian, as this has been a feature of Albanian since ancient times.
The first written mention of the Albanian language was on 14 July 1284 in Dubrovnik in modern Croatia when a certain Matthew, witness of a crime, stated: "I heard a voice shouting on the mountainside in the Albanian tongue" (Latin: Audivi unam vocem, clamantem in monte in lingua albanesca). The first audio recording of Albanian was made by Norbert Jokl on April 4, 1914 in Vienna. During the five-century period of the Ottoman presence in Albania, the language was not officially recognized until 1909, when the Congress of Dibra decided that Albanian schools would finally be allowed.
Albanian is considered an isolate within the Indo-European language family; no other language has been conclusively linked to its branch. The only other languages that are the sole surviving members of a branch of Indo-European are Armenian and Greek.
The Albanian language is part of the Indo-European language group and is considered to have evolved from one of the Paleo-Balkan languages of antiquity, although it is still uncertain which particular Paleo-Balkan language represents the ancestor of Albanian, or where in Southern Europe that population lived. In general there is insufficient evidence to connect Albanian with any one of those languages, whether one of the Illyrian languages (the option most historians favour), Thracian or Dacian. Among these possibilities, Illyrian is typically held to be the most probable, though insufficient evidence still clouds the discussion.
Although Albanian shares lexical isoglosses with Greek, Germanic, and to a lesser extent Balto-Slavic, the vocabulary of Albanian is quite distinct. In 1995, Taylor, Ringe and Warnow, using quantitative linguistic techniques, found that Albanian appears to comprise a "subgroup with Germanic". However, they argued that this fact is hardly significant, as Albanian has lost much of its original vocabulary and morphology, and so this "apparently close connection to Germanic rests on only a couple of lexical cognates - hardly any evidence at all".
Early linguistic influences
The earliest loanwords attested in Albanian come from Doric Greek, whereas the strongest influence came from Latin. According to Matthew C. Curtis, the loanwords do not necessarily indicate the geographical location of the ancestor of the Albanian language. According to other linguists, however, the borrowed words can give an idea of the place of origin and the evolution of the Albanian language. According to another group of linguists, Albanian originates from an area located east of its present geographic spread, because of the several lexical items common to Albanian and Romanian.
The period during which Proto-Albanian and Latin interacted was protracted, lasting roughly from the 2nd century BC to the 5th century AD. This is reflected in roughly three layers of borrowings, the largest number belonging to the second layer. The first, with the fewest borrowings, was a time of less important interaction. The final period, probably preceding the Slavic or Germanic invasions, also has a notably smaller number of borrowings. Each layer is characterized by a different treatment of most vowels, the first layer having several that follow the evolution of Early Proto-Albanian into Albanian; later layers reflect vowel changes endemic to Late Latin and presumably Proto-Romance. Other formative changes include the syncretism of several noun case endings, especially in the plural, as well as a large-scale palatalization.
A brief period followed, between the 7th and the 9th centuries, that was marked by heavy borrowings from Southern Slavic, some of which predate the "o-a" shift common to the modern forms of this language group. Starting in the latter 9th century, there was a period characterized by protracted contact with the Proto-Romanians, or Vlachs, though lexical borrowing seems to have been mostly one-sided, from Albanian into Romanian. Such borrowing indicates that the Romanians migrated from an area where the majority was Slavic (i.e. Middle Bulgarian) to an area with a majority of Albanian speakers (i.e. Dardania, where Vlachs are recorded in the 10th century). Their movement is probably related to the expansion of the Bulgarian Empire into Albania around that time.
Jernej Kopitar (1780–1844) was the first to note Latin's influence on Albanian and claimed "the Latin loanwords in the Albanian language had the pronunciation of the time of Emperor Augustus". Kopitar gave examples such as Albanian qiqer from Latin cicer (meaning chickpeas), qytet from civitas (meaning city), peshk from piscis (meaning fish) and shigjetë from sagitta (meaning arrow). The hard pronunciations of Latin ⟨c⟩ and ⟨g⟩ are retained as palatal and velar stops in the Albanian loanwords. Gustav Meyer (1888) and Wilhelm Meyer-Lübke (1914) later corroborated this. Meyer noted the similarity between the Albanian verbs shqipoj and shqiptoj (both meaning to enunciate) and the Latin word excipio (meaning to welcome). Therefore, he believed that the word Shqiptar (meaning Albanian) was derived from shqipoj, which in turn was derived from the Latin word excipio. Johann Georg von Hahn, an Austrian linguist, had proposed the same theory in 1854.
Eqrem Çabej also noticed, among other things, the archaic Latin elements in Albanian:
- Latin /au/ becomes Albanian /a/ in the earliest borrowings: aurum → ar ; gaudium → gaz ; laurus → lar. Latin /au/ is retained in later borrowings, but is altered in a way similar to Greek: causa → kafshë ; laud → lavd.
- Latin /ō/ becomes Albanian /e/ in the oldest Latin borrowings: pōmum → pemë ; hōra → herë. An analogous mutation occurred from Proto-Indo-European to Albanian; PIE *nōs became Albanian ne, PIE *oḱtō + suffix -ti- became Albanian tetë etc.
- Latin unstressed internal and initial syllables are lost in Albanian: cubitus → kub ; medicus → mjek ; paludem > V. Latin padule → pyll. An analogous mutation occurred from Proto-Indo-European to Albanian. In contrast, in later Latin borrowings, the internal syllable is retained: paganus → pagan ; plaga → plagë etc.
- Latin /tj/, /dj/, /kj/ palatalized to Albanian /s/, /z/, /c/: vitius → ves ; ratio → arsye ; radius → rreze ; facies → faqe ; socius → shoq etc. In turn, Latin /s/ was altered to /ʃ/ in Albanian.
Haralambie Mihăescu demonstrated that:
- Some 85 Latin words have survived in Albanian but not (as inherited) in any Romance language. A few examples include bubulcus → bujk, hibernalia → mërrajë, sarcinarius → shelqëror, trifurcus → tërfurk, accipiter → skifter, chersydrus → kuçedër, spleneticum → shpretkë, solanum → shullë.
- 151 Albanian words of Latin origin were not inherited in Romanian. A few examples include Albanian mik from Latin amicus, or armik from inimicus, arsye from rationem, bekoj from benedicere, qelq from calix (calicis), kështjellë from castellum, qind from centum, gjel from gallus, gjymtyrë from iunctura, mjek from medicus, rrjetë from rete, shpresoj from sperare, vullnet from voluntas (voluntatis).
- Some Albanian church terminology has phonetic features which demonstrate its very early borrowing from Latin. A few examples include Albanian altar from Latin altare, engjëll from angelus, bekoj from benedicere, i krishterë from christianus, kryq from crux (crucis), kishë from ecclesia, ipeshkv from episcopus, ungjill from evangelium, mallkoj from maledicere, meshë from missa, murg from monacus, pagan from paganus.
Other authors have detected Latin loanwords in Albanian with an ancient sound pattern from the 1st century BC, for example, Albanian qingëlë from Latin cingula and Albanian e vjetër from Latin vetus/veteris. The Romance languages inherited these words from Vulgar Latin: Vulgar Latin *cingla became N. Romanian chinga, meaning "belly band, saddle girth", and Vulgar Latin vetrānus > *vatrānus became N. Romanian bătrân, meaning "old".
Albanian, Basque, and the surviving Celtic languages such as Breton and Welsh, are the non-Romance languages today that have this sort of extensive Latin element dating from ancient Roman times, which have undergone the sound changes associated with the languages. Other languages in or near the former Roman area either came on the scene later (Turkish, the Slavic languages, Arabic) or borrowed little from Latin despite coexisting with it (Greek, German), although German does have a few such ancient Latin borrowings (Fenster, Käse, Köln).
Romanian scholars such as Vatasescu and Mihaescu, using lexical analysis of the Albanian language, have concluded that Albanian was heavily influenced by an extinct Romance language that was distinct from both Romanian and Dalmatian. Because the Latin words common only to Romanian and Albanian are significantly fewer than those common only to Albanian and Western Romance, Mihaescu argues that the Albanian language evolved in a region with much greater contact with Western Romance regions than with Romanian-speaking regions, and locates this region in present-day Albania, Kosovo and Western Macedonia, spanning east to Bitola and Pristina.
Historical presence and location
The place and the time in which the Albanian language was formed are uncertain. American linguist Eric Hamp stated that during an unknown chronological period a pre-Albanian population (termed "Albanoid" by Hamp) inhabited areas stretching from Poland to the southwestern Balkans. Further analysis has suggested that the language was formed in a mountainous region rather than on a plain or seacoast: while the words for plants and animals characteristic of mountainous regions are entirely original, the names for fish and for agricultural activities (such as ploughing) are borrowed from other languages.
A deeper analysis of the vocabulary, however, shows that this could be a consequence of a prolonged Latin domination of the coastal and plain areas of the country, rather than evidence of the original environment where the Albanian language was formed. For example, the word for 'fish' is borrowed from Latin, but not the word for 'gills', which is native. Also indigenous are the words for 'ship', 'raft', 'navigation', 'sea shelves' and a few names of fish kinds, but not the words for 'sail', 'row' and 'harbor' (objects pertaining to navigation itself) or for a large part of the sea fauna. This rather shows that Proto-Albanians were pushed away from coastal areas in early times (probably after the Latin conquest of the region), thus losing large parts (or the majority) of the sea-environment lexicon. A similar phenomenon can be observed with agricultural terms. While the words for 'arable land', 'corn', 'wheat', 'cereals', 'vineyard', 'yoke', 'harvesting', 'cattle breeding', etc. are native, the words for 'ploughing', 'farm' and 'farmer', agricultural practices, and some harvesting tools are foreign. This, again, points to intense contact with other languages and people, rather than providing evidence of a possible Urheimat.
The centre of Albanian settlement remained the Mat river. In 1079, they were recorded farther south in the valley of the Shkumbin river. The Shkumbin, a seasonal stream that lies near the old Via Egnatia, is approximately the boundary of the primary dialect division for Albanian, Tosk and Gheg. The characteristics of Tosk and Gheg in the treatment of the native and loanwords from other languages are evidence that the dialectal split preceded the Slavic migration to the Balkans, which means that in that period (the 5th to 6th centuries AD), Albanians were occupying nearly the same area around the Shkumbin river, which straddled the Jireček Line.
References to the existence of Albanian as a distinct language survive from the 14th century, but they failed to cite specific words. The oldest surviving documents written in Albanian are the "formula e pagëzimit" (Baptismal formula), Un'te paghesont' pr'emenit t'Atit e t'Birit e t'Spertit Senit. ("I baptize thee in the name of the Father, and the Son, and the Holy Spirit") recorded by Pal Engjelli, Bishop of Durrës in 1462 in the Gheg dialect, and some New Testament verses from that period.
The oldest known Albanian printed book, Meshari, or "missal", was written in 1555 by Gjon Buzuku, a Roman Catholic cleric. In 1635, Frang Bardhi wrote the first Latin–Albanian dictionary. The first Albanian school is believed to have been opened by Franciscans in 1638 in Pdhanë.
One of the earliest dictionaries of the Albanian language dates from 1693: the Italian-language manuscript Pratichae Schrivaneschae, authored by the Montenegrin sea captain Julije Balović, includes a multilingual dictionary of hundreds of the most often used words in everyday life in the Italian, Slavo-Illirico, Greek, Albanian and Turkish languages.
Although Albanian has been referred to as the "weird sister" for several words that do not correspond to IE cognates, it has retained many proto-IE features: for example, the demonstrative pronoun *ḱo- is ancestral to Albanian ky/kjo and English he but not to English this or to Russian etot.
Albanian is compared to other Indo-European languages below, but note that Albanian has exhibited some notable instances of semantic drift (such as motër meaning "sister" rather than "mother" or the Latin loans gjelbër and verdhë having become switched in meaning).
|Old Church Slavonic||мѣсѧць|
Albanian–PIE phonological correspondences
Phonologically, Albanian is not so conservative. Like many IE stocks, it has merged the two series of voiced stops (e.g. both *d and *dʰ became d). In addition, voiced stops tend to disappear between vowels. There is almost complete loss of final syllables and very widespread loss of other unstressed syllables (e.g. mik, "friend" from Lat. amicus). PIE *o appears as a (also as e if a high front vowel i follows), while *ē and *ā become o, and PIE *ō appears as e. The palatals, velars, and labiovelars all remain distinct before front vowels, a conservation found otherwise in Luwian and related Anatolian languages. Thus PIE *ḱ, *k, and *kʷ become th, q, and s, respectively (before back vowels *ḱ becomes th, while *k and *kʷ merge as k). Another remarkable retention is the preservation of initial *h₄ as Alb. h (all other laryngeals disappear completely).
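The three-way contrast among the voiceless dorsal stops can be written out as a small lookup. The sketch below is illustrative only: the names `DORSAL_REFLEXES` and `dorsal_reflex`, the series labels and the boolean flag are hypothetical, and it covers just the three voiceless series in the two environments named above, ignoring clusters and any other conditioning.

```python
# Albanian outcomes of the PIE voiceless dorsal stops, as stated in the text:
# before front vowels the three series stay distinct (th, q, s);
# before back vowels the palatal gives th, while the plain velar and the
# labiovelar merge as k.
DORSAL_REFLEXES = {
    "palatal": {"front": "th", "back": "th"},
    "velar": {"front": "q", "back": "k"},
    "labiovelar": {"front": "s", "back": "k"},
}


def dorsal_reflex(series: str, before_front_vowel: bool) -> str:
    """Return the Albanian reflex for a PIE voiceless dorsal series."""
    environment = "front" if before_front_vowel else "back"
    return DORSAL_REFLEXES[series][environment]


print(dorsal_reflex("labiovelar", True))  # s
print(dorsal_reflex("velar", False))      # k
print(dorsal_reflex("palatal", False))    # th
```

The first two calls correspond to the outcomes seen in sjell "to fetch, bring" and kam "I have" in the correspondence table.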
|*p||p||*pékʷo—"cook"||pjek "to cook, roast, bake"|
|*b||b||*sorbéi̯e/o—"drink, slurp"||gjerb "to drink"|
|*bʰ||b||*bʰaḱeh₂—"bean"||bathë "broad bean"|
|*t||t||*túh₂—"thou"||ti "you (singular)"|
|dh||*pérde/o—"fart"||pjerdh "to fart"|
|g||*dl̥h₁gʰós—"long"||gjatë "long" (Tosk dial. glatë)|
|*dʰ||d||*dʰégʷʰe/o—"burn"||djeg "to burn"|
- Between vowels or after r
|*ḱ||th||*ḱéh₁mi—"I say"||them "I say"|
|ç/c||*ḱentro—"to stick"||çandër "prop"|
|*ǵ||dh||*ǵómbʰos—"tooth, peg"||dhëmb "tooth"|
|d||*ǵēusnō—"to enjoy"||dua "to love, want"|
|*ǵʰ||dh||*ǵʰedi̯e/o—"to defecate"||dhjes "to defecate"|
|d||*ǵʰr̥sdʰi—"grain, barley"||drithë "grain"|
|*k||k||*kágʰmi—"I catch, grasp"||kam "I have"|
|q||*klau-ei̯e/o—"to weep"||qaj "to weep, cry" (Gheg qanj, Salamis kla)|
|gj||*h₁reuge—"to retch"||regj "to tan hides"|
|gj||*gʰédni̯e/o—"get"||gjej "to find" (Gheg gjêj)|
|s||*kʷéle/o—"turn"||sjell "to fetch, bring"|
|z||*gʷērHu—"heaviness"||zor "heaviness; trouble"|
|*gʷʰ||g||*dʰégʷʰe/o—"to burn"||djeg "to burn"|
|z||*h1en-dʰogʷʰéi̯e/o—"to ignite"||ndez "to kindle, turn on"|
|h||*nosōm—"us" (gen.)||nahe "us" (dat.)|
|sht||*h₂osti "bone"||asht "bone"|
|∅||h₁ésmi—"am"||jam "to be"|
- Between vowels
- Between vowels and after u̯/i̯/r/k (ruki law)
- Cluster -sd-
- Cluster -sḱ-
- Cluster -sp-
- Cluster -st-
- Dissimilation with following vowel
|*i̯||gj||*i̯ése/o—"to ferment"||gjesh "to knead"|
|j||*i̯uHs—"you" (nom.)||ju "you (plural)"|
|∅||*bʰéri̯ō—"bear, carry"||bie(r) "to bring"|
|*u̯||v||*u̯oséi̯e/o—"to dress"||vesh "to wear, dress"|
|*n||n||*nōs—"we" (acc.)||ne "we"|
|nj||*eni-h₁ói-no—"that one"||një "one" (Gheg njâ, njo)|
|∅/^||*pénkʷe—"five"||pesë, Gheg pês "five"|
|r||*ǵʰeimen—"winter"||dimër "winter" (Gheg dimën)|
|ll||*kʷéle/o—"turn"||sjell "to fetch, bring"|
|rr||*u̯rh₁ḗn—"sheep"||rrunjë "yearling lamb"|
|*l̥||uj||*u̯ĺ̥kʷos—"wolf"||ujk "wolf" (Chamian ulk)|
|*r̥||ri, ir||*ǵʰr̥sdom—"grain, barley"||drithë "grain"|
- Before i, e, a
- Before back vowels
- After front vowels
- After all other vowels
|*h1||∅||*h₁ésmi—"am"||jam "to be"|
|*i||i||*sínos—"bosom"||gji "bosom, breast"|
|*e||e||*pénkʷe—"five"||pesë "five" (Gheg pês)|
|je||*u̯étos—"year" (loc.)||vjet "last year"|
|*a||a||*bʰaḱeh₂- "bean"||bathë "bean"|
Until the early 20th century, Albanian writing developed in three main literary traditions: Gheg, Tosk, and Arbëreshë. Throughout this time, an intermediate subdialect spoken around Elbasan served as a lingua franca among the Albanians, but was less prevalent in writing. The Congress of Manastir, held by Albanian writers in 1908, recommended the use of the Elbasan subdialect for literary purposes and as the basis of a unified national language. While technically classified as a southern Gheg variety, the Elbasan speech is closer to Tosk in phonology and is practically a hybrid between other Gheg subdialects and literary Tosk.
Between 1916 and 1918, the Albanian Literary Commission met in Shkodër under the leadership of Luigj Gurakuqi with the purpose of establishing a unified orthography for the language. The Commission, made up of representatives from the north and south of Albania, reaffirmed the Elbasan subdialect as the basis of a national tongue. The rules published in 1917 defined spelling for the Elbasan variety for official purposes. The Commission did not, however, discourage publications in one of the dialects, but rather laid a foundation for Gheg and Tosk to gradually converge into one.
When the Congress of Lushnje met in the aftermath of World War I to form a new Albanian government, the 1917 decisions of the Literary Commission were upheld. The Elbasan subdialect remained in use for administrative purposes, and many new writers embraced it for creative writing. Gheg and Tosk continued to develop freely and interaction between the two dialects increased.
At the end of World War II, however, the new communist regime radically imposed the use of the Tosk dialect in all facets of life: administration, education, and literature. Most Communist leaders were Tosks from the south. Standardization was directed by the Albanian Institute of Linguistics and Literature of the Academy of Sciences of Albania. Two dictionaries were published in 1954: an Albanian language dictionary and a Russian–Albanian dictionary. New orthography rules were eventually published in 1967 and, as Drejtshkrimi i gjuhës shqipe (Orthography of the Albanian Language), in 1973.
More recent dictionaries from the Albanian government are Fjalori Drejtshkrimor i Gjuhës Shqipe (Orthographic Dictionary of the Albanian Language, 1976) and Fjalori Gjuhës së Sotme Shqipe (Dictionary of Today's Albanian Language, 1980). Prior to World War II, dictionaries consulted by developers of the standard included Lexikon tis Alvanikis glossis (Albanian: Fjalori i Gjuhës Shqipe; Kostandin Kristoforidhi, 1904), Fjalori i Bashkimit (1908), and Fjalori i Gazullit (1941).
Calls for reform
Since the fall of the communist regime, Albanian orthography has stirred heated debate among scholars, writers, and public opinion in Albania and Kosovo, with hardliners opposed to any changes in the orthography, moderates supporting varying degrees of reform, and radicals calling for a return to the Elbasan dialect. Criticism of Standard Albanian has centred on the exclusion of the 'me+' infinitive and the Gheg lexicon. Critics say that Standard Albanian disenfranchises and stigmatizes Gheg speakers, affecting the quality of writing and impairing effective public communication. Supporters of the Tosk standard view the 1972 Congress as a milestone achievement in Albanian history and dismiss calls for reform as efforts to "divide the nation" or "create two languages." Moderates, who are especially prevalent in Kosovo, generally stress the need for a unified Albanian language, but believe that the 'me+' infinitive and Gheg words should be included. Proponents of the Elbasan dialect have been vocal, but have gathered little support in public opinion. In general, those involved in the language debate come from diverse backgrounds and there is no significant correlation between one's political views, geographic origin, and position on Standard Albanian.
Many writers have continued to write in the Elbasan dialect, but other Gheg variants have found much more limited use in literature. Most publications, however, adhere to a strict policy of not accepting submissions that are not written in Tosk. Some print media even translate direct speech, replacing the 'me+' infinitive with other verb forms and making other changes in grammar and word choice. Even authors who have published in the Elbasan dialect will frequently write in the Tosk standard.
In recent years, a group of academics from Albania and Kosovo have proposed minor changes in the orthography. Hardline academics boycotted the initiative, while other reformers have viewed it as superficial. Media such as Rrokum and Java have offered content that is almost exclusively in the Elbasan dialect. Meanwhile, author and linguist Agim Morina has promoted a reformed version of the Tosk standard that aims at reflecting the natural development of the language among all Albanians. Morina's variant incorporates the 'me+' infinitive, accommodates Gheg features, and provides for simpler and dialect-neutral rules.
Albanian is the medium of instruction in most Albanian schools. The literacy rate in Albania for the total population, age 9 or older, is about 99%. Elementary education is compulsory (grades 1–9), but most students continue at least until a secondary education. Students must pass graduation exams at the end of the 9th grade and at the end of the 12th grade in order to continue their education.
Standard Albanian has 7 vowels and 29 consonants. Like English, Albanian has the dental fricatives /θ ð/, which are rare cross-linguistically. They are written as th and dh, and are similar to the consonants at the beginning of English thin and this.
Gheg uses long and nasal vowels, which are absent in Tosk, and the mid-central vowel ë is lost at the end of the word. The stress is fixed mainly on the last syllable. Gheg n (femën: compare English feminine) changes to r by rhotacism in Tosk (femër).
|IPA||Description||Written as||Pronounced as in|
|p||Voiceless bilabial plosive||p||spin|
|b||Voiced bilabial plosive||b||bat|
|t||Voiceless alveolar plosive||t||stand|
|d||Voiced alveolar plosive||d||debt|
|k||Voiceless velar plosive||k||scar|
|ɡ||Voiced velar plosive||g||go|
|t͡s||Voiceless alveolar affricate||c||hats|
|d͡z||Voiced alveolar affricate||x||goods|
|t͡ʃ||Voiceless postalveolar affricate||ç||chin|
|d͡ʒ||Voiced postalveolar affricate||xh||jet|
|c͡ç||Voiceless palatal affricate||q||~china (RP)|
|ɟ͡ʝ||Voiced palatal affricate||gj||~gem (RP)|
|f||Voiceless labiodental fricative||f||far|
|v||Voiced labiodental fricative||v||van|
|θ||Voiceless dental fricative||th||thin|
|ð||Voiced dental fricative||dh||then|
|s||Voiceless alveolar fricative||s||son|
|z||Voiced alveolar fricative||z||zip|
|ʃ||Voiceless postalveolar fricative||sh||show|
|ʒ||Voiced postalveolar fricative||zh||vision|
|h||Voiceless glottal fricative||h||hat|
|r||Alveolar trill||rr||Spanish perro|
|ɾ||Alveolar tap||r||Spanish pero|
|l||Alveolar lateral approximant||l||lean|
|ɫ||Velarized alveolar lateral approximant||ll||ball|
- The contrast between flapped r and trilled rr is the same as in Spanish or Armenian. In most dialects, as in standard Albanian, the single "r" changes from an alveolar flap /ɾ/ into a retroflex flap [ɽ], or even an alveolar approximant [ɹ], when it is at the end of a word.
- The palatal nasal /ɲ/ corresponds to the Spanish ñ and the French and Italian gn. It is pronounced as one sound, not a nasal plus a glide.
- The ll sound is a velarised lateral, close to English dark L.
- The letter ç is sometimes written ch where technical limitations prevent its use, by analogy with English ch and with the other digraphs xh, sh, and zh. Usually it is written simply c, or more rarely q, with context resolving any ambiguity.
- Many speakers merge the palatal sounds q and gj into the palatoalveolar sounds ç and xh. This is especially common in Northern Gheg, but is increasingly the case in Tosk as well.
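The spelling-to-sound correspondences in the consonant table can be sketched as a small lookup with greedy digraph matching. This is only an illustration: the names `CONSONANTS`, `split_graphemes` and `consonants_to_ipa` are hypothetical, the mapping covers only the rows listed in the table above, and any letter not in the table (vowels, for instance) is passed through unchanged. Greedy matching is itself an assumption and may mis-split letter sequences that happen to meet across a morpheme boundary.

```python
import unicodedata

# Spelling -> IPA pairs copied from the consonant table above.
CONSONANTS = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "ɡ",
    "c": "t͡s", "x": "d͡z", "ç": "t͡ʃ", "xh": "d͡ʒ", "q": "c͡ç", "gj": "ɟ͡ʝ",
    "f": "f", "v": "v", "th": "θ", "dh": "ð", "s": "s", "z": "z",
    "sh": "ʃ", "zh": "ʒ", "h": "h", "rr": "r", "r": "ɾ", "l": "l", "ll": "ɫ",
}


def split_graphemes(word):
    """Split a word into graphemes, preferring two-letter digraphs."""
    word = unicodedata.normalize("NFC", word).lower()
    graphemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in CONSONANTS:
            graphemes.append(pair)  # digraph such as sh, xh, rr, ll
            i += 2
        else:
            graphemes.append(word[i])  # single letter
            i += 1
    return graphemes


def consonants_to_ipa(word):
    """Replace each listed consonant grapheme with its IPA symbol."""
    return [CONSONANTS.get(g, g) for g in split_graphemes(word)]
```

With this sketch, shqip is split into sh, q, i, p, and only the consonants are replaced by the IPA symbols from the table.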
|IPA||Description||Written as||Pronounced as in|
|i||Close front unrounded vowel||i||seed|
|ɛ||Open-mid front unrounded vowel||e||bed|
|a||Open central unrounded vowel||a||father, Spanish casa|
|ɔ||Open-mid back rounded vowel||o||law|
|y||Close front rounded vowel||y||French tu, German über|
|u||Close back rounded vowel||u||boot|
|ə||Mid central vowel||ë||about|
Although the Indo-European schwa (ə or -h2-) was preserved in Albanian, in some cases it was lost, possibly when a stressed syllable preceded it. Until the standardization of the modern Albanian alphabet, in which the schwa is spelled as ë, as in the work of Gjon Buzuku in the 16th century, various vowels and gliding vowels were employed, including ae by Lekë Matrënga and é by Pjetër Bogdani in the late 16th and early 17th century. The schwa in Albanian has a great degree of variability from extreme back to extreme front articulation. Within the borders of Albania, the phoneme is pronounced about the same in both the Tosk and the Gheg dialect due to the influence of standard Albanian. However, in the Gheg dialects spoken in the neighbouring Albanian-speaking areas of Kosovo and Macedonia, the phoneme is still pronounced as back and rounded.
Albanian has a canonical word order of SVO (subject–verb–object) like English and many other Indo-European languages. Albanian nouns are inflected by gender (masculine, feminine and neuter) and number (singular and plural). There are five declensions with six cases (nominative, accusative, genitive, dative, ablative, and vocative), although the vocative only occurs with a limited number of words, and the forms of the genitive and dative are identical (a genitive is produced when the prepositions i/e/të/së are used with the dative). Some dialects also retain a locative case, which is not present in standard Albanian. The cases apply to both definite and indefinite nouns, and there are numerous cases of syncretism.
The following shows the declension of mal (mountain), a masculine noun which takes "i" in the definite singular:
|Case||Indefinite singular||Indefinite plural||Definite singular||Definite plural|
|Nominative||një mal (a mountain)||male (mountains)||mali (the mountain)||malet (the mountains)|
|Genitive||i/e/të/së një mali||i/e/të/së maleve||i/e/të/së malit||i/e/të/së maleve|
|Ablative||(prej) një mali||(prej) malesh||(prej) malit||(prej) maleve|
The following shows the declension of zog (bird), a masculine noun which takes "u" in the definite singular:
|Case||Indefinite singular||Indefinite plural||Definite singular||Definite plural|
|Nominative||një zog (a bird)||zogj (birds)||zogu (the bird)||zogjtë (the birds)|
|Genitive||i/e/të/së një zogu||i/e/të/së zogjve||i/e/të/së zogut||i/e/të/së zogjve|
|Ablative||(prej) një zogu||(prej) zogjsh||(prej) zogut||(prej) zogjve|
The following table shows the declension of the feminine noun vajzë (girl):
|Case||Indefinite singular||Indefinite plural||Definite singular||Definite plural|
|Nominative||një vajzë (a girl)||vajza (girls)||vajza (the girl)||vajzat (the girls)|
|Genitive||i/e/të/së një vajze||i/e/të/së vajzave||i/e/të/së vajzës||i/e/të/së vajzave|
|Ablative||(prej) një vajze||(prej) vajzash||(prej) vajzës||(prej) vajzave|
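The syncretism mentioned above can be checked directly against the three tables. The following sketch stores the bare noun forms from the tables (without një, the linking particles i/e/të/së, or the preposition prej) and reports in which columns two cases share a form. The names `PARADIGMS`, `SLOTS` and `syncretic_slots` are hypothetical, and only the rows shown in the tables are included.

```python
# Column order used in the declension tables above.
SLOTS = (
    "indefinite singular",
    "indefinite plural",
    "definite singular",
    "definite plural",
)

# Bare noun forms copied from the tables for mal, zog and vajzë.
PARADIGMS = {
    "mal": {
        "nominative": ("mal", "male", "mali", "malet"),
        "genitive": ("mali", "maleve", "malit", "maleve"),
        "ablative": ("mali", "malesh", "malit", "maleve"),
    },
    "zog": {
        "nominative": ("zog", "zogj", "zogu", "zogjtë"),
        "genitive": ("zogu", "zogjve", "zogut", "zogjve"),
        "ablative": ("zogu", "zogjsh", "zogut", "zogjve"),
    },
    "vajzë": {
        "nominative": ("vajzë", "vajza", "vajza", "vajzat"),
        "genitive": ("vajze", "vajzave", "vajzës", "vajzave"),
        "ablative": ("vajze", "vajzash", "vajzës", "vajzave"),
    },
}


def syncretic_slots(noun, case_a, case_b):
    """Return the table columns in which two cases have the same form."""
    forms = PARADIGMS[noun]
    return [
        slot
        for slot, a, b in zip(SLOTS, forms[case_a], forms[case_b])
        if a == b
    ]
```

For all three nouns, the genitive and ablative forms coincide everywhere except in the indefinite plural (maleve vs. malesh, zogjve vs. zogjsh, vajzave vs. vajzash).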
- The definite article can be in the form of noun suffixes, which vary with gender and case.
- For example, in singular nominative, masculine nouns add -i, or those ending in -g/-k/-h take -u (to avoid palatalization):
- mal (mountain) / mali (the mountain);
- libër (book) / libri (the book);
- zog (bird) / zogu (the bird).
- Feminine nouns take the suffix -(i/j)a:
- veturë (car) / vetura (the car);
- shtëpi (house) / shtëpia (the house);
- lule (flower) / lulja (the flower).
- Neuter nouns take -t.
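The suffix rules in this list can be sketched as a short function. This is a rough illustration rather than a full account of Albanian morphology: the function name `definite_nominative_singular` is hypothetical, and the feminine rules (final -ë replaced by -a, final -e replaced by -ja, otherwise -a added) are assumptions inferred only from the three feminine examples above. Stem changes such as libër / libri are not handled.

```python
import unicodedata


def definite_nominative_singular(noun, gender):
    """Attach the definite suffix to an indefinite nominative singular noun."""
    noun = unicodedata.normalize("NFC", noun)
    if gender == "masculine":
        # Nouns ending in -g/-k/-h take -u; all others take -i.
        return noun + ("u" if noun.endswith(("g", "k", "h")) else "i")
    if gender == "feminine":
        if noun.endswith("ë"):
            return noun[:-1] + "a"   # veturë -> vetura (assumed pattern)
        if noun.endswith("e"):
            return noun[:-1] + "ja"  # lule -> lulja (assumed pattern)
        return noun + "a"            # shtëpi -> shtëpia
    if gender == "neuter":
        return noun + "t"            # neuter nouns take -t
    raise ValueError(f"unknown gender: {gender!r}")
```

Applied to the examples in the list, the sketch returns mali, zogu, vetura, shtëpia and lulja.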
Albanian has developed an analytical verbal structure in place of the earlier synthetic system, inherited from Proto-Indo-European. Its complex system of moods (six types) and tenses (three simple and five complex constructions) is distinctive among Balkan languages. There are two general types of conjugations.
Albanian verbs, like those of other Balkan languages, have an "admirative" mood (mënyra habitore) that is used to indicate surprise on the part of the speaker or to imply that an event is known to the speaker by report and not by direct observation. In some contexts, this mood can be translated using English "apparently".
- Ti flet shqip. "You speak Albanian." (indicative)
- Ti fliske shqip! "You (surprisingly) speak Albanian!" (admirative)
- Rruga është e mbyllur. "The street is closed." (indicative)
- Rruga qenka e mbyllur. "(Apparently,) The street is closed." (admirative)
For more information on verb conjugation and on inflection of other parts of speech, see Albanian morphology.
Albanian word order is relatively free. To say 'Agim ate all the oranges' in Albanian, one may use any of the following orders, with slight pragmatic differences:
- SVO: Agimi i hëngri të gjithë portokallët.
- SOV: Agimi të gjithë portokallët i hëngri.
- OVS: Të gjithë portokallët i hëngri Agimi.
- OSV: Të gjithë portokallët Agimi i hëngri.
- VSO: I hëngri Agimi të gjithë portokallët.
However, the most common order is subject–verb–object, and negation is expressed by the particles nuk or s' in front of the verb, for example:
- Toni nuk flet anglisht "Tony does not speak English";
- Toni s'flet anglisht "Tony doesn't speak English";
- Nuk e di "I do not know";
- S'e di "I don't know".
However, the verb can optionally occur in sentence-initial position, especially with verbs in the non-active form (forma joveprore):
- Parashikohet një ndërprerje "An interruption is anticipated".
In imperative sentences, the particle mos is used for negation:
- Mos harro "do not forget!".
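The choice of negative particle described above can be summarised in a small sketch. The function `negate` and its parameters are hypothetical; it simply places nuk, s' or mos in front of the verb phrase (including any clitic such as e) and does no grammatical analysis.

```python
def negate(verb_phrase, subject="", imperative=False, short=False):
    """Put the appropriate negative particle in front of the verb phrase."""
    if not verb_phrase:
        raise ValueError("verb_phrase must not be empty")
    if imperative:
        negated = "mos " + verb_phrase   # imperative sentences use mos
    elif short:
        negated = "s'" + verb_phrase     # s' attaches directly to the next word
    else:
        negated = "nuk " + verb_phrase   # nuk is a separate word
    sentence = f"{subject} {negated}".strip()
    # Capitalise the first letter of the sentence.
    return sentence[0].upper() + sentence[1:]
```

For instance, `negate("flet anglisht", subject="Toni")` gives "Toni nuk flet anglisht", and `negate("harro", imperative=True)` gives "Mos harro".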
Earliest undisputed texts
The earliest known texts in Albanian:
-  The work is a manuscript decorated with golden miniatures and colored initials, divided in three parts. Pages 1–97 deal with theology, 98–146 with philosophy, and pages 147–208 with a history of the known world from AD 153 to 1209. On the final page of the manuscript we find a note by the author "With the assistance and great love of the blessed Lord, I finished this in the year 1210 on the 9th day of March."
- the "formula e pagëzimit" (Baptismal Formula), which dates back to 1462 and was authored by Pal Engjëlli (or Paulus Angelus) (c. 1417 – 1470), Archbishop of Durrës. Engjëlli was a close friend and counsellor of Skanderbeg. It was written in a pastoral letter for a synod at the Holy Trinity in Mat and read in Latin characters as follows: Unte paghesont premenit Atit et Birit et Spertit Senit (standard Albanian: "Unë të pagëzoj në emër të Atit, të Birit e të Shpirtit të Shenjtë"; English: "I baptize you in the name of the Father and the Son and the Holy Spirit"). It was discovered and published in 1915 by Nicolae Iorga.
- the Fjalori i Arnold von Harfit (Arnold Ritter von Harff's lexicon), a short list of Albanian phrases with German glosses, dated 1496.
- a song, recorded in the Greek alphabet, retrieved from an old codex that was written in Greek. The document is also called "Perikopeja e Ungjillit të Pashkëve" or "Perikopeja e Ungjillit të Shën Mateut" ("The Song of the Easter Gospel" or "The Song of Saint Matthew's Gospel"). Although the codex is dated to the 14th century, the song, written in Albanian by an anonymous writer, seems to be a 16th-century writing. The document was found by Arbëreshë people who had emigrated to Italy in the 15th century.
- The first book in Albanian is the Meshari ("The Missal"), written by Gjon Buzuku between 20 March 1554 and 5 January 1555. The book was written in the Gheg dialect in the Latin script with some Slavic letters adapted for Albanian vowels. The book was discovered in 1740 by Gjon Nikollë Kazazi, the Albanian archbishop of Skopje. It contains the liturgies of the main holidays. There are also texts of prayers and rituals and catechetical texts. The grammar and the vocabulary are more archaic than those in the Gheg texts from the 17th century. The 188 pages of the book comprise about 154,000 words with a total vocabulary of c. 1,500 different words. The text is archaic yet easily interpreted because it is mainly a translation of known texts, in particular portions of the Bible. The book also contains passages from the Psalms, the Book of Isaiah, the Book of Jeremiah, the Letters to the Corinthians, and many illustrations. The uniformity of spelling seems to indicate an earlier tradition of writing. The only known copy of the Meshari is held by the Apostolic Library. In 1968 the book was published with transliterations and comments by linguists.
Disputed earlier text
In 1967 two scholars claimed to have found a brief text in Albanian inserted into the Bellifortis text, a book written in Latin dating to 1402–1405.
"A star has fallen in a place in the woods, distinguish the star, distinguish it.
Distinguish the star from the others, they are ours, they are.
Do you see where the great voice has resounded? Stand beside it
That thunder. It did not fall. It did not fall for you, the one which would do it.
Like the ears, you should not believe ... that the moon fell when ...
Try to encompass that which spurts far ...
Call the light when the moon falls and no longer exists ..."
Dr. Robert Elsie, a specialist in Albanian studies, considers that "The Todericiu/Polena Romanian translation of the non-Latin lines, although it may offer some clues if the text is indeed Albanian, is fanciful and based, among other things, on a false reading of the manuscript, including the exclusion of a whole line."
In 1635, Frang Bardhi (1606–1643) published in Rome his Dictionarium latino-epiroticum, the first known Latin-Albanian dictionary. Other scholars who studied the language during the 17th century include Andrea Bogdani (1600–1685), author of the first Latin-Albanian grammar book, and Nilo Katalanos (1637–1694).
Cognates with Illyrian
- Andena/Andes/Andio/Antis: personal Illyrian names based on a root-word and- or ant-, found in both the southern and the Dalmatian-Pannonian (including modern Bosnia and Herzegovina) onomastic provinces; cf. Alb. andë (northern Albanian dialect, or Gheg) and ëndë (southern Albanian dialect, or Tosk) "appetite, pleasure, desire, wish"; Andi proper name, Andizetes, an Illyrian people inhabiting the Roman province of Pannonia.
- aran "field"; cf. Alb. arë; plural ara
- Ardiaioi/Ardiaei, name of an Illyrian people, cf. Alb. ardhja "arrival" or "descent", connected to hardhi "vine-branch, grape-vine", with a sense development similar to Germanic *stamniz, meaning both stem, tree stalk and tribe, lineage. However, the weakness of this theory is that so far there is no certainty as to the historical or etymological development of either ardhja/hardhi or Ardiaioi, as with many other words.
- Bilia "daughter"; cf. Alb. bijë, dial. bilë
- Bindo/Bindus, an Illyrian deity from Bihać, Bosnia and Herzegovina; cf. Alb. bind "to convince" or "to make believe", përbindësh "monster".
- bounon, "hut, cottage"; cf. Alb bun
- brisa, "husk of grapes"; cf. Alb bërsí "lees, dregs; mash" ( < PA *brutiā)
- Barba- "swamp", a toponym from Metubarbis; possibly related to Alb. bërrakë "swampy soil"
- can- "dog"; related to Alb. qen
- Daesitiates, a name of an Illyrian people, cf. Alb. dash "ram", corresponding contextually with south Slavonic dasa "ace", which might represent a borrowing and adaptation from Illyrian (or some other ancient language).
- mal, "mountain"; cf. Alb mal
- bardi, "white"; cf. Alb bardhë
- drakoina "supper"; cf. Alb. darkë, drekë
- drenis, "deer"; cf. Alb dre, dreni
- delme "sheep"; cf. Alb dele, Gheg dialect delme
- dard, "pear"; cf. Alb dardhë
- Hyllus (the name of an Illyrian king); cf. Alb. yll (hyll in some northern dialects) "star", also Alb. hyj "god", Ylli proper name.
- sīca, "dagger"; cf. Alb thikë or thika "knife"
- Ulc-, "wolf" (pln. Ulcinium); cf. Alb ujk "wolf", ulk (Northern Dialect)
- loúgeon, "pool"; cf. Alb lag, legen "to wet, soak, bathe, wash" ( < PA *lauga), lëgatë "pool" ( < PA *leugatâ), lakshte "dew" ( < PA laugista)
- mag- "great"; cf. Alb. i madh "big, great"
- mantía "bramblebush"; Old and dial. Alb mandë "berry, mulberry" (mod. Alb mën, man)
- rhinos, "fog, mist"; cf. Old Alb ren "cloud" (mod. Alb re, rê) ( < PA *rina)
- Vendum "place"; cf. Proto-Alb. wen-ta (Mod. Alb. vend)
Early Greek loans
There are some 30 Ancient Greek loanwords in Albanian. Many of these reflect a dialect which voiced its aspirates, as did the Macedonian dialect. Other loanwords are Doric; these words mainly refer to commodity items and trade goods and probably came through trade with a now-extinct intermediary.
- bletë; "hive, bee" < Attic mélitta "bee" (vs. Ionic mélissa).
- drapër; "sickle" < (NW) drápanon
- kumbull; "plum" < kokkúmelon
- lakër; "cabbage, green vegetables" < láchanon "green; vegetable"
- lëpjetë; "orach, dock" < lápathon
- leva (lyej); "to smear, oil" < *liwenj < *elaiwā < Gk elai(w)ṓn "oil"
- mokër; "millstone" < (NW) māchaná "device, instrument"
- mollë; "apple" < mēlon "fruit"
- pjepër; "melon" < pépōn
- presh; "leek" < práson
- shpellë; "cave" < spḗlaion
- trumzë; "thyme" < (NW) thýmbrā, thrýmbrē
Gothic loans
- fat; "groom, husband" < Goth brūþfaþs "bridegroom"
- horr; "scoundrel", horrë; "hussy, whore" < Goth hors "adulterer", *hora "whore"
- shkulkë; "boundary marker for pastures made of branches" < Late Latin sculca < Goth skulka "guardian"
- shkumë; "foam" < Late Latin < Goth skūma
- tirq; "trousers" < Late Latin tubrucus < Goth *þiobrok "knee-britches"; cf. OHG dioh-bruoh, Eng thigh, breeches
It is assumed that Greek and Balkan Latin (the ancestor of Romanian and other Balkan Romance languages) exerted a great influence on Albanian. Examples of words borrowed from Latin: qytet < civitas (city), qiell < caelum (sky), mik < amicus (friend).
After the Slavs arrived in the Balkans, the Slavic languages became an additional source of loanwords. The rise of the Ottoman Empire meant an influx of Turkish words; this also entailed the borrowing of Persian and Arabic words through Turkish. Some Turkish personal names, such as Altin, are also in use. Some loanwords from Modern Greek also exist, especially in the south of Albania. Many of the borrowed words have since been replaced by words of Albanian origin or by modern Latinized (international) words.
- Kosovo is the subject of a territorial dispute between the Republic of Kosovo and the Republic of Serbia. The Republic of Kosovo unilaterally declared independence on 17 February 2008, but Serbia continues to claim it as part of its own sovereign territory. The two governments began to normalise relations in 2013, as part of the Brussels Agreement. Kosovo has received formal recognition as an independent state from 110 out of 193 United Nations member states.
- This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo Declaration of Independence.