The Berlin Shona Novel Corpus is one outcome of the research project "Changing Patterns in the Shona Novel", conducted at Humboldt University Berlin between 2013 and 2016 and funded by the DFG (Deutsche Forschungsgemeinschaft, German Research Foundation). It consists of annotated extracts from three Shona novels from Zimbabwe: Pfumo reropa by Patrick Chakaipa (1961), Ndiko kupindana kwamazuva by Charles Mungoshi (1975) and Mapenzi by Ignatius Mabasa (1999). Approximately 40 percent of the text of these novels was processed with “The Field Linguist's Toolbox”, a semi-automatic tool for morphological analysis provided free of charge by SIL. The software breaks the text into morphemes, to which glosses and part-of-speech tags are assigned.
The principal investigator of the project was Flora Veit-Wild, professor emerita of African Literatures and Cultures at Humboldt University. The research team members were Katja Kellerer, Isabelle Nguyen, Tsitsi Nyoni of Great Zimbabwe University (May 2013 – February 2014), Dr. Aquilina Mawadza (March – April 2014) and Dr. Jacob Mapara of Chinhoyi University of Technology (April 2015 – June 2016). Tom Güldemann, professor of African languages at Humboldt University, acted as linguistic advisor to the project. A brief description of the project in English and German is available on the homepage of the Department of African Studies. During a three-month stay in Harare, the research team was joined by two linguists from the University of Zimbabwe, Dr. Francis Matambirofa and Dr. Zvinashe Mamvura. With their help, the parsing of the texts was completed.
The establishment of the linguistic corpus went hand in hand with a literary analysis of the three Shona novels. Preliminary papers showing how the linguistic data could be used from a literary angle were presented at a workshop hosted by Chinhoyi University of Technology on February 23, 2016. The workshop was attended by around 20 Zimbabwean experts in linguistics and literary studies, who provided critical input into the work done by the Humboldt research team.
The workshop participants also discussed the possibility of incorporating the Berlin Shona Novel Corpus into the corpus work done at the Universities of Oslo and Zimbabwe within the framework of the ALLEX project. A future project would combine the detailed morphological analysis developed in Berlin with the vast amount of data compiled over many years in the ALLEX project, including newspaper articles and recordings of spoken language, and could thus result in the first major morphologically annotated corpus of Shona.
The Berlin Shona Novel Corpus represents the first attempt to analyse and annotate prominent literary works in Shona with the help of the Field Linguist’s Toolbox. In its present form, the corpus still contains open questions and inconsistencies. However, as a substantial outcome of the Berlin research project, it is made available here for interested researchers.
Annotated texts are available for download, individually as well as in bundles (sorted by author; see the dash tiles at the top of this page). The Toolbox software itself is available for download from SIL. The primary levels of analysis are:
The secondary levels are
The secondary levels pertain to 'modern' phenomena of code-switching, borrowing and slang, and were used only where relevant. For this reason, they do not appear in the Pfumo reropa files, but they do appear in the extracts from Mapenzi and, to some extent, in those from Ndiko kupindana kwamazuva.
The text ('tx') and free translation ('ft') lines are not part of the parsing process. For more details on morphological segmentation and glossing conventions, see our "Conventions and Settings".
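For readers who want to process the downloaded files outside Toolbox, the following is a minimal Python sketch of how backslash-marked interlinear records could be read. Only the 'tx' and 'ft' markers are named on this page; the markers 'ref', 'mb', 'ge' and 'ps' are common Toolbox defaults and are assumed here, and the sample record is invented for illustration rather than taken from the corpus. The markers actually used should be checked against the "Conventions and Settings".

```python
# Minimal sketch: read Toolbox-style records (one "\marker value" per line).
# Assumptions: "ref" starts a new record; "mb", "ge", "ps" hold morphemes,
# glosses and part-of-speech tags. Only "tx" and "ft" are confirmed markers.

# Invented sample record, NOT taken from the corpus.
SAMPLE = """\
\\ref demo.001
\\tx vanhu
\\mb va- nhu
\\ge CL2- person
\\ps pfx n
\\ft people
"""


def parse_records(text, record_marker="ref"):
    """Return a list of dicts mapping each marker to a list of its values."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank or unmarked lines are skipped in this sketch
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker or current is None:
            current = {}
            records.append(current)
        # A marker may repeat inside one record (wrapped interlinear blocks).
        current.setdefault(marker, []).append(value.strip())
    return records


def align_morphemes(record):
    """Pair morphemes with glosses and tags by whitespace-separated position.

    Toolbox aligns these lines by column; splitting on whitespace is a
    simplification that only works if no gloss contains a space.
    """
    morphemes = " ".join(record.get("mb", [])).split()
    glosses = " ".join(record.get("ge", [])).split()
    tags = " ".join(record.get("ps", [])).split()
    return list(zip(morphemes, glosses, tags))


if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(rec["tx"], rec["ft"], align_morphemes(rec))
```

Because the text and free translation lines are not part of the parsing process, the sketch treats them as plain strings and aligns only the morpheme-level lines.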
Questions regarding the project or the data should be directed to Flora Veit-Wild at firstname.lastname@example.org or Tom Güldemann at email@example.com.
Work on this project was funded by the Deutsche Forschungsgemeinschaft (DFG).