Originating in Africa, Homo sapiens spread across the globe, taking human language with it. A project is now underway to trace the genealogy of the world's languages with the help of advanced methods borrowed from big data, genetics and geostatistics.
Text: Roger Nickl
Translation: Astrid Freuler
The story of modern humans began around 300,000 years ago. It is assumed that Homo sapiens originated in Africa, from where it gradually conquered the world. With it came language, which developed in as many different ways as the humans who used it. To ensure survival, both the physical attributes and the language of population groups adapted to different external conditions. The genealogy of the world's languages therefore mirrors the history of humankind – culturally, geographically, but also genetically. "The development of language is an evolutionary process," says linguist Balthasar Bickel. "Language is passed down from generation to generation, just like our genes. And similar to our genetic heritage, it changes over time through mutation and selection. New words and linguistic structures are continually added, while those that are no longer required disappear."
Balthasar Bickel is professor of comparative linguistics at UZH and oversees the National Center of Competence in Research (NCCR) Evolving Language. One of its aims is to trace the genealogy of human language. This means being able to explain where it originated, how it developed, and how the more than 300 language families and 7,000 languages that exist in the world today relate to each other. To find out, Bickel's team are using highly advanced scientific methods – including some not commonly associated with the field of linguistics. Working alongside Bickel are population geneticist Chiara Barbieri and geoinformatics specialist Robert Weibel. Weibel is professor of geographic information science, while Barbieri oversees the Human Genetic Diversity across Languages and Cultures research group at the Department of Evolutionary Biology and Environmental Studies of UZH. Together, the researchers want to investigate the connection between the spread of humans across the globe and how the different languages developed.
In the field of linguistics, the evolution of human language is roughly divided into two stages – the periods before and after the Neolithic Revolution. Prior to the Neolithic Revolution, our ancestors mainly lived as hunter-gatherers, feeding themselves on berries and plants and on the animals they hunted. Grouped into small communities, the hunter-gatherers led a nomadic life. This meant that they constantly had to adapt to new circumstances. At the same time, there was little contact between the widely scattered clans. According to Bickel, these conditions led to the rapid development and diversification of language. "This gives us reason to believe that there have always been many different languages."
From hunting to farming
The hunter-gatherer phase of Homo sapiens lasted several hundred thousand years. Then, around 10,000 years ago, the Neolithic Revolution set in, which completely changed the way most people lived – the nomadic hunter-gatherers settled and became farmers. This laid the foundations for the early advanced civilizations and for larger organizational structures.
It also brought a significant shift in how languages evolved, as larger language communities started to develop. One reason for this was the volatile nature of agriculture – while some harvests produced a surplus, other crops failed. The surplus needed to be sold and shortages had to be compensated through acquisition. "In order to trade, the farmers had to enlarge their social network," says Balthasar Bickel. And of course, trade required the ability to communicate with other groups. Over time, this led to the formation of more widespread language families that shared certain characteristics.
Agriculture and livestock breeding also allowed greater numbers of people to be fed. As a result, there was a massive growth in the world's population, which in turn forced our ancestors to seek new means of subsistence. "People began to migrate," says UZH population geneticist Chiara Barbieri. The farmers explored new regions, where they formed new groups. This led to a diversification of the genetic makeup, but also of the languages, Barbieri explains. At the same time, the various groups would have remained in contact with each other through trade relationships and through wars. "People within a wider region were just about able to communicate," says Balthasar Bickel.
When people travel, so does their language
Because language migrates with people, genetics and language are closely linked. A community's migration also changes its language over time, and where genetic and linguistic similarities coincide, this can point to a shared origin. As part of the NCCR Evolving Language, these connections are being investigated and integrated into models.
For this purpose, geostatistics and informatics specialist Robert Weibel and his team have developed simulations which highlight past events such as large-scale migration in a particular part of the world, triggered by environmental factors such as climatic changes. Such migratory movements would have led to new contacts being formed between previously distant groups of people, but could also cause related groups to drift apart. Both of these changes are reflected in the genetic makeup and the language of a community.
Population geneticist Chiara Barbieri is therefore exploring how different population groups are related to each other and at what point in history they went their separate ways. This enables her to trace historical developments, which also provide an indication of how individual languages branched out and continued to evolve, for example by growing closer to new linguistic influences, or by drifting away from related languages. "It allows us to determine where genetic and linguistic genealogies correspond and where they don't," says Barbieri. Where they don't correspond, something disruptive must have happened to break the connection between gene pool and language, such as a war or a new political power structure.
Hungary is a good example of this. Genetically, Hungarians are just like other western Europeans. "Their genetic profile is more or less identical with the German, Swiss and Czech profile," says Chiara Barbieri. Yet the Hungarian language is entirely different to the languages of neighboring areas. In fact, Hungarian originally came from Siberia. "Such discrepancies to the genetic narrative indicate that a population adopted a new language," explains Balthasar Bickel. "In the case of Hungarian, we also know that this happened for purely military reasons." Hungarian was the language of a Siberian military elite which assumed power in the region of modern-day Hungary. The language of the rulers was then taken up by the natives, just as the Gauls once had to learn Latin.
Unknown primal language
With their research into the genealogy of language, Bickel, Barbieri and Weibel are working towards realizing a long-held ambition – to write the genealogy of the world's languages, with all the detours and deviations that occurred along the way. It is an ambition shared by linguists as far back as the 19th century, including the Indo-Europeanist August Schleicher (1821–1868). He compared different Indo-European languages in an attempt to reconstruct their common origin in a primal language of the distant past.
Schleicher understood linguistics as a science and based his work on the idea that languages – similar to biological species – develop in accordance with the laws of evolution. To explain the relationships between the various Indo-European languages – which include German and English, but also Persian – Schleicher used family trees that he created from written sources of the different languages. Schleicher's language trees were rooted in an ancient and unknown primal language. From this base they grew and branched out over the centuries, right through to the youngest products of linguistic history, the languages of his time.
The researchers of Evolving Language are in essence continuing the work carried out by scholars such as August Schleicher. Unlike their predecessors, however, they have highly technical means and methods to help them find answers to the many questions surrounding linguistic evolution and to fill in the gaps in our knowledge. Chiara Barbieri and Balthasar Bickel are comparing words using the same statistical methods that Barbieri uses to analyze the gene sequences of specific population groups. "We have long lists – just as you would for genetic analysis – with hundreds of positions for words such as Vogel, bird, oiseau, ..." explains Bickel. "We use them to demonstrate which words were originally related and which weren't."
These statistical calculations enable the team to establish a genealogical tree, much as geneticists do for populations. The researchers are thus working their way from the outermost boughs to the main branches and on towards the trunk of the language tree. Their aim is to penetrate right down to the roots in the Neolithic Revolution. That is where Bickel places the boundary of how far even the most advanced scientific methods can take them towards understanding the proliferation of languages.
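To give a rough feel for how word lists can be turned into a tree, here is a minimal sketch in Python. It is not the method used by the NCCR team, which the article does not describe in detail. It assumes a tiny hand-picked word list (five concepts in five languages, lowercased), measures how different two languages are by the average normalized edit distance between their words, and then repeatedly joins the two closest groups (average-linkage clustering). All names in the code, such as WORDS and build_tree, are made up for illustration.

```python
from itertools import combinations

# Toy data (an assumption for illustration): five concepts per language,
# in the order night, mother, three, new, name. Real studies use hundreds
# of concepts and far more careful comparisons.
WORDS = {
    "German":  ["nacht", "mutter", "drei", "neu", "name"],
    "Dutch":   ["nacht", "moeder", "drie", "nieuw", "naam"],
    "English": ["night", "mother", "three", "new", "name"],
    "Italian": ["notte", "madre", "tre", "nuovo", "nome"],
    "Spanish": ["noche", "madre", "tres", "nuevo", "nombre"],
}


def levenshtein(a, b):
    """Number of single-letter insertions, deletions or substitutions
    needed to turn word a into word b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def language_distance(x, y):
    """Mean edit distance between paired words, scaled to the range 0..1."""
    scores = [levenshtein(a, b) / max(len(a), len(b))
              for a, b in zip(WORDS[x], WORDS[y])]
    return sum(scores) / len(scores)


def leaves(tree):
    """All language names contained in a (possibly nested) tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def cluster_distance(t1, t2):
    """Average linkage: mean distance over all cross-group language pairs."""
    pairs = [(x, y) for x in leaves(t1) for y in leaves(t2)]
    return sum(language_distance(x, y) for x, y in pairs) / len(pairs)


def build_tree(languages):
    """Repeatedly merge the two closest groups until one tree remains."""
    clusters = list(languages)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append((a, b))
    return clusters[0]


tree = build_tree(WORDS)
print(tree)
```

On this toy list the procedure places German, Dutch and English on one branch and Italian and Spanish on the other. Raw spelling similarity is only a crude stand-in for the question of whether words share an origin, and it cannot tell inherited words from borrowed ones.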
Prehistoric language fossils
Bickel's boundary of linguistic exploration can be placed around 10,000 years ago. Yet occasionally, linguists also come across "prehistoric language fossils" that are significantly older. One such fossil was discovered by the researchers in the Himalayan region. The inaccessible valleys of the high mountain ranges represent an island amid the rivers of language migration that characterize the history of Asia and Europe. Elsewhere, these migrations blurred the original linguistic diversity and replaced it with larger-scale language families.
But in the Himalayas, you can still find "very old language structures," states Bickel. These include striking grammatical constructions in Burushaski (northern Pakistan) and in Limbu (Nepal, Bhutan, India) to indicate possession of something. "We found possessive constructs that are dependent on the thing that is possessed. So, for example 'my cup' has a very different grammatical structure to 'my computer' or 'my brother'." Such grammatical differentiations aren't seen anywhere else on the entire Eurasian continent. The linguists did, however, find similar constructs in completely different parts of the world – for example in the languages spoken by the native peoples of America. "That could indicate that this grammatical construction is older than the Neolithic Revolution," says Balthasar Bickel.
It is presumed that the Americas were first populated around 18,000 years ago, via the then frozen Bering Strait. If this theory is correct, the possessive constructs give an indication of what languages were like before the boundary line of the Neolithic Revolution. "That would be truly fantastic," enthuses Balthasar Bickel. "That would mean that we are catching hold of the tail of something that may have been around for as long as humankind has existed." Bickel is referring to the tail of the original languages that were carried across the entire globe by the hunter-gatherers. And that really would be fantastic.