Lexical portal of Akkadian: fastText

This page introduces a new way of exploring lexical semantics in Akkadian.

Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE) has created a lexical portal that functions as a graphic semantic dictionary. Via this portal the user can explore semantic networks for one or more words of interest. By following the links, one can also trace attestations back to the dataset in Korp and from there to Oracc. This page gives a very short introduction to the portal and the research behind it. For more information on the methods, data and processes behind it, see our published works (listed below, with links to electronic offprints and datasets).

How to cite this portal: Heidi Jauhiainen, Aleksi Sahala, Tero Alstola, Sam Hardwick, Tommi Jauhiainen, Krister Lindén and Saana Svärd. “ANEE lexical portal of Akkadian: fastText.” URN: http://urn.fi/urn:nbn:fi:lb-2021060104

For metadata on the “ANEE lexical portal of Akkadian: fastText,” please see http://urn.fi/urn:nbn:fi:lb-2021060103.

Valuable help and feedback were offered by the other members of ANEE Team 1: Johannes Bach, Eleanor Bennett, Céline Debourse, Kaisa Autere, Evelien Vanderstraeten, Julia Giessler, Mikko Luukko, Sebastian Fink, Gina Konstantopoulos and Shana Zaia.

What is “ANEE lexical portal of Akkadian: fastText” and how was it created?

The approach we adopted has its roots in the classic work of Jost Trier and the Saussurean distinction that is commonly made between syntagmatic and paradigmatic relations in the meaning of words. Semantically, there is a paradigmatic connection between words that belong to the same general category. For example, in English, the concept “chair” belongs to the semantic domain “furniture,” together with “table” and “bed.” At the same time, there is a syntagmatic semantic connection between words that co-occur frequently (e.g., “pitch black”). The word “chair,” for instance, appears in many different contexts, with differing connotations. In the domain HOME the word “chair” could associate syntagmatically with words like “comfort” and “family,” as in the sentence: “A comfortable chair is important for the whole family.” At the same time, within the domain COMMERCE, “chair” can associate with “money,” “discount” or “store.” Therefore, in addition to “chair” having paradigmatic connections to “table” and “bed,” it belongs to a multitude of syntagmatic semantic categories.

We have used the methods of language technology to trace paradigmatic and syntagmatic relationships in a large corpus of the Akkadian language. The method called Pointwise Mutual Information (PMI) is able to capture the nuances of syntagmatic relations, and fastText can be used to explore paradigmatic relations. PMI detects words that co-occur frequently in the dataset. To continue with the simple example sentence: “A comfortable chair is important for the whole family,” PMI can calculate co-occurrence probabilities for words that occur close to “chair” (e.g., “comfortable” or “family”). These probabilities attest to syntagmatic relationships between lexemes. On the other hand, fastText can be used to find words which appear in similar semantic contexts. “Chair,” “bed,” and “table” are examples of words which are not necessarily used together or close to each other but are likely to appear in similar contexts. Such relations can be described as paradigmatic relations. The visualization of lexemes and their relationships as networks has proved to be the most fruitful approach to analyzing semantic domains created by our methods.
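As a rough illustration of the syntagmatic side, the following minimal Python sketch computes plain PMI for word pairs that co-occur within a small window. It is not the project's implementation: the function name `pmi_scores`, the window size, the way pairs are counted and the toy English tokens are all assumptions made for this example, and the published articles describe the exact PMI variant and parameters that were actually used.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2):
    """Plain PMI (log base 2) for word pairs that co-occur within `window` tokens."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pair_counts[tuple(sorted((left, right)))] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_pair = count / total_pairs          # estimated p(a, b)
        p_a = word_counts[a] / total_words    # estimated p(a)
        p_b = word_counts[b] / total_words    # estimated p(b)
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return scores


# Toy English "corpus" (hypothetical); real input would be Akkadian dictionary forms.
toy = ("comfortable chair family home comfortable chair store "
       "discount money store chair family").split()
for pair, score in sorted(pmi_scores(toy).items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 2))
```

A high score means that two words appear together more often than their individual frequencies would predict, which is the kind of evidence read here as a syntagmatic relationship.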

This short introduction is naturally a heavily abbreviated and simplified description of our work, but we hope that the portals will provide tools for scholars to reflect on the semantic domains of the words. The research is ongoing and the lexical portals presented here are by no means the final result. We welcome feedback from colleagues on the portals; please do not hesitate to get in touch with team leader Saana Svärd.

If you are interested in paradigmatic relationships between words, we recommend using the lexical portal on this page, which was created with fastText. The graph is also viewable in English. The English version is identical to the Akkadian version – we have simply replaced the Akkadian terms with their English equivalents.

Links to two networks of paradigmatic relationships, created with fastText

Akkadian version
English version
Help Page

A short note on data

The data used for the graphs was downloaded as JSON files from the Open Richly Annotated Cuneiform Corpus (Oracc) in February 2019. For the analysis we used a dataset consisting of 7,346 texts that are tagged in Oracc as written in “Akkadian.” These texts were written primarily in the Neo-Assyrian period (c. 930–612 BCE) in both Assyria and Babylonia, but earlier and later texts are also included. The texts belong to several genres, with royal inscriptions being the most prominent one in terms of word count.

We standardized the spellings of divine and place names and removed duplicate texts following the procedure explained in Alstola et al. (2019). We only used dictionary forms, as defined in Oracc (following the Concise Dictionary of Akkadian), of content words—nouns, verbs, and adjectives—while all other words were replaced with an underscore character as a placeholder. Since neither the cuneiform script nor the Oracc metadata indicates sentence endings, the text of each document is handled as one continuous line of text.
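The placeholder step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_training_line` is hypothetical, the part-of-speech tags `N`, `V` and `AJ` are assumed to mark nouns, verbs and adjectives, and the sample lemmas and tags are made up for the example rather than taken from the dataset.

```python
# Assumed part-of-speech tags for content words (noun, verb, adjective).
CONTENT_POS = {"N", "V", "AJ"}


def to_training_line(words):
    """Turn one document, given as (lemma, pos) pairs, into a single line of text.

    Content words keep their dictionary form; every other word becomes "_"
    so that the distance between content words is preserved.
    """
    return " ".join(lemma if pos in CONTENT_POS else "_" for lemma, pos in words)


# Hypothetical document fragment with illustrative tags.
doc = [("šarru", "N"), ("ša", "REL"), ("mātu", "N"), ("dannu", "AJ")]
print(to_training_line(doc))  # šarru _ mātu dannu
```

Keeping a placeholder instead of deleting the word means that window-based methods still see how far apart two content words originally stood.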

From the lexemes in our dataset, we selected all those that appear at least 5 times, 4,930 lexemes in total. For each of these lexemes, we then used PMI and fastText to produce a list of the semantically most similar words. These lists were then visualized with Gephi.
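The pipeline described above (frequency threshold, lists of most similar words, network file) can be sketched as follows. Please note the assumptions: the similarity measure here is a simple cosine over context counts, used as a stand-in because it needs no external library; it is not fastText and will not reproduce the portal's results. All function names, the window size, `top_k` and the `Source,Target,Weight` column layout (a common edge-list layout for spreadsheet import into network tools such as Gephi) are choices made for this example.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def frequent_lexemes(tokens, min_count=5):
    """Lexemes (placeholders excluded) that occur at least `min_count` times."""
    counts = Counter(t for t in tokens if t != "_")
    return {word for word, n in counts.items() if n >= min_count}


def context_vectors(tokens, vocab, window=2):
    """Count, for each lexeme in `vocab`, the words found within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in vocab:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] != "_":
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def neighbour_edges(vectors, top_k=3):
    """For each lexeme, keep its `top_k` most similar lexemes as weighted edges."""
    edges = []
    for word, vec in vectors.items():
        ranked = sorted(
            ((cosine(vec, other), name) for name, other in vectors.items() if name != word),
            reverse=True,
        )
        edges.extend((word, name, round(score, 3)) for score, name in ranked[:top_k] if score > 0)
    return edges


def edges_to_csv(edges):
    """Serialize edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


# Toy data: "chair" and "table" never stand next to each other but share a context.
tokens = ["chair", "wood", "table", "wood"] * 5
vocab = frequent_lexemes(tokens, min_count=5)
print(edges_to_csv(neighbour_edges(context_vectors(tokens, vocab, window=1))))
```

In the toy data, “chair” and “table” are linked although they are never adjacent, because both occur next to “wood.” This is the kind of paradigmatic relation that the portal explores with fastText.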

Please note that the lexical portal is diachronically “flat.” In order to have as much data as possible (which usually provides more accurate results for language technological methods), we have used all texts labeled “Akkadian” in Oracc. If you are interested in a particular set of data, we recommend using the Korp interface (Jauhiainen et al. 2019) or downloading the full dataset on which these graphs are based and creating more specific networks from that data (Jauhiainen et al. 2021).

Annotated bibliography

We have used these approaches in several articles. Selected articles are listed below, most with links to full-text articles.

Tero Alstola, Heidi Jauhiainen, Saana Svärd, Aleksi Sahala, and Krister Lindén. “Digital Approaches to Analyzing and Translating Emotion.” In Karen Sonik and Ulrike Steinert (eds.) Routledge Handbook of Emotions in the Ancient Near East. In peer review.

Saana Svärd, Tero Alstola, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2020. “Fear in Akkadian Texts: New Digital Perspectives on Lexical Semantics.” In Shih-Wei Hsu and Jaume Llop-Raduà (eds.) The Expression of Emotions in Ancient Egypt and Mesopotamia. Culture and History of the Ancient Near East. Leiden: Brill, pp. 470-502.
DOI: https://doi.org/10.1163/9789004430761_019
The most recent published article. It contains detailed information on how lexical networks can be created. Freely available behind DOI.

Tero Alstola, Shana Zaia, Aleksi Sahala, Heidi Jauhiainen, Saana Svärd, and Krister Lindén. 2019. “Aššur and his Friends: A Statistical Analysis of Neo-Assyrian Texts.” Journal of Cuneiform Studies 71: 159–80. Downloadable from here.

Saana Svärd, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2018. “Semantic Domains in Akkadian Texts.” In Vanessa Juloux, Amy Gansell, and Alessandro di Ludovico (eds.) CyberResearch on the Ancient Near East and Neighboring Regions: Case Studies on Archaeological Data, Objects, Texts, and Digital Archiving. Digital Biblical Studies 2. Leiden: Brill, pp. 224-256.
DOI: https://doi.org/10.1163/9789004375086_009
Proof-of-concept article. The first article in which our language technological methods were tested with Akkadian texts. Freely available behind DOI.

Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. “Enriching Word Vectors with Subword Information.” Transactions of the Association for Computational Linguistics 5: 135–46. https://doi.org/10.1162/tacl_a_00051

Datasets and scripts

Heidi Jauhiainen, Aleksi Sahala, Tero Alstola, Saana Svärd, & Krister Lindén. 2021. ANEE Lexical Portal - the dataset [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4646662.
The DOI leads to the Zenodo repository where the full dataset for the lexical portal can be found (both PMI and fastText graphs).

Heidi Jauhiainen, Aleksi Sahala & Tero Alstola 2019: Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019 [text corpus]. Kielipankki. Retrieved from http://urn.fi/urn:nbn:fi:lb-2019060601
The Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also offers statistical information and comparison of the search results. Downloading the query results is possible as well. For instructions on how to use it, follow the URN to the metadata and see “ORACC in Korp User Guide” (under the heading “Documentation”). The URN for the actual Oracc in Korp data.

“JavaScript GEXF Viewer for Gephi.” Found at https://github.com/raphv/gexf-js.
This code (needed to display the networks in the portal page) has been modified and improved by Sam Hardwick.

Acknowledgements to institutions and projects

Funders: Academy of Finland, University of Helsinki

Data and research infrastructures: FIN-CLARIN, Language Bank of Finland, all projects included in Oracc in Feb 2019 (ADsD: Astronomical Diaries Digital; ARIo: Achaemenid Royal Inscriptions online; blms: Bilinguals in Late Mesopotamian Scholarship; CASPo: Corpus of Akkadian Shuila-Prayers online; CAMS: Corpus of Ancient Mesopotamian Scholarship; CTIJ: Cuneiform Texts Mentioning Israelites, Judeans, and Other Related Groups; DCCLT: Digital Corpus of Cuneiform Lexical Texts; DCCMT: Digital Corpus of Cuneiform Mathematical Texts; eCUT: Electronic Corpus of Urartian Texts; ETCSRI: Electronic Text Corpus of Sumerian Royal Inscriptions; HBTIN: Hellenistic Babylonia: Texts, Iconography, Names; OBMC: Old Babylonian Model Contracts; RIAo: Royal Inscriptions of Assyria online; RIBo: Royal Inscriptions of Babylonia online; Rīm-Anum: The House of Prisoners; RINAP: Royal Inscriptions of the Neo-Assyrian Period; SAAo: State Archives of Assyria Online; Idrimi: Statue of Idrimi; akklove: Akkadian Love Literature; Contributions Amarna; CKST: Corpus of Kassite Sumerian Texts; Glass: Corpus of Glass Technological Texts; LaOCOST: Law and Order: Cuneiform Online Sustainable Tool; OBTA: Old Babylonian Tabular Accounts; Suhu: The Inscriptions of Suhu online)