Vienna-Oxford
International Corpus of English

Frequently Asked Questions

What does VOICE stand for?

VOICE stands for Vienna-Oxford International Corpus of English. ‘Oxford’ is part of the name because Oxford University Press supported the VOICE project financially in its initial phase. ‘Vienna’ refers to the University of Vienna, where the corpus was compiled.

What is a corpus?

“In the language sciences a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description.” (p.1)
Kennedy, Graeme. 1998. An introduction to corpus linguistics. London: Longman.

What is English as a lingua franca (ELF)?

English as a lingua franca (ELF) constitutes an additionally acquired language system which serves as a common means of communication for speakers of different first languages.
ELF is currently the most common use of English world-wide. Millions of speakers from diverse cultural and linguistic backgrounds use ELF on a daily basis, routinely and successfully, in their professional, academic and personal lives.

Why does VOICE focus on spoken data?

Spoken interactions are immediate and at a remove from the stabilizing and standardizing influence of writing. They are overtly reciprocal and reveal the online negotiation of meaning in the production and reception of utterances, thus facilitating observations regarding mutual intelligibility among interlocutors.

How big is VOICE?

The current size of VOICE 1.1 Online is just over 1 million words of spoken ELF, equalling 110 hours and 35 minutes of recorded and transcribed interactions.

Which first languages are represented in VOICE?

Since the focus of VOICE at this stage is primarily, but not exclusively, on Europe, all major first languages spoken across Europe are represented in the corpus. In total, VOICE currently encompasses approximately 50 different first languages, including non-European ones.

Does VOICE/ELF include native speakers of English?

“English […] used as a ‘lingua franca’ [is] a ‘contact language’ between persons who share neither a common native tongue nor a common (national) culture, and for whom English is the chosen foreign language of communication.” (p.240)
Firth, Alan. 1996. "The discursive accomplishment of normality: on 'lingua franca' English and conversation analysis". Journal of Pragmatics 26, 237-259.

While Firth’s definition could be said to capture ELF in its purest form, it has to be remembered that ELF interactions often also include speakers from backgrounds where English is used as a first or second language. The VOICE project therefore works with a broader definition of ELF which includes English native speakers as well. Nevertheless, so-called non-native speakers of English commonly outnumber English native speakers in ELF interactions, a fact also represented in VOICE. Currently, speakers who have English as a first language make up less than 10 per cent of all speakers recorded in VOICE.

Does VOICE Online include audio files?

As of 24 November 2010, 23 recordings of transcribed speech events are also available for listening. The anonymized audio material is freely accessible from within the VOICE Online interface after free registration for the VOICE Online services. The audio material covers approximately 22 hours of field recordings, which equals about 20% of the entire corpus. We trust that this new feature will further increase the value of VOICE for research. For detailed information on using the new audio features, please refer to the subsection audio files in Using VOICE Online.
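The "about 20%" figure can be checked against the durations given in this FAQ. The following is a minimal sketch of that arithmetic, assuming the approximate 22 hours of audio and the 110 hours and 35 minutes stated for VOICE 1.1 Online; the variable and function names are illustrative only.

```python
# Durations taken from the FAQ text; the audio figure is approximate.
corpus_minutes = 110 * 60 + 35   # 110 hours 35 minutes of transcribed recordings
audio_minutes = 22 * 60          # approximately 22 hours of audio material


def audio_share(audio: int, total: int) -> float:
    """Return the audio portion as a percentage of the total duration."""
    return 100 * audio / total


share = audio_share(audio_minutes, corpus_minutes)
print(f"{share:.1f}% of the corpus has audio")  # prints "19.9% of the corpus has audio"
```

Because the 22-hour figure is itself rounded, the result should be read as roughly one fifth of the corpus rather than as an exact proportion.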

Is VOICE available for download?

As of 5 May 2011, VOICE XML is available for download. VOICE XML is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License and includes all corpus texts in XML format as well as derived HTML and TXT versions of the corpus with reduced mark-up. For more information on VOICE XML see Availability.
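For readers who want a first look at the markup after downloading VOICE XML, one possible starting point is to tally the element names in a file. The sketch below uses only the Python standard library. The sample string and its element names are hypothetical stand-ins and are not taken from the actual VOICE XML files, whose markup may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for a corpus file; the real VOICE XML markup may differ.
SAMPLE = """<text>
  <u who="S1">hello everyone</u>
  <u who="S2">hi <pause/> good morning</u>
</text>"""


def tally_elements(xml_source: str) -> Counter:
    """Count how often each element name occurs in an XML string."""
    root = ET.fromstring(xml_source)
    # Drop any namespace prefix of the form {uri}name so the counts stay readable.
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


counts = tally_elements(SAMPLE)
print(counts.most_common())
```

To apply the same function to a downloaded file, read its contents into a string first and pass that string in place of `SAMPLE`.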