Vienna-Oxford International Corpus of English

Using VOICE-Online

2 Corpus Tree

The corpus tree is displayed in the top left-hand corner of the application area. It gives an overview of VOICE, its structure, and the 151 individual texts it contains.

2.1 Corpus Header

The small icon next to the word VOICE represents the corpus header. When you click on this icon, detailed information about the VOICE project, the corpus design, sampling principles, the transcription of data, and speakers is displayed in the content area.

2.2 Corpus structure

The domain structure of VOICE is displayed when you click on the word VOICE.
[Figure: Domain structure of VOICE]
The two-letter acronyms listed beneath VOICE represent the five domains in the corpus: ED stands for educational, LE for leisure, PB for professional business, PO for professional organization, and PR for professional research and science. Definitions of these five domains are provided in the corpus header and are also displayed when the mouse cursor hovers over the acronyms in the corpus tree. The domain list is hidden again when you click on the word VOICE a second time.
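For readers who work with event IDs outside the web interface, the domain codes above can be kept in a small lookup table. The following is a minimal sketch; the names `VOICE_DOMAINS` and `domain_name` are illustrative and are not part of VOICE-Online.

```python
# Domain codes and names exactly as listed in the text above.
# VOICE_DOMAINS and domain_name are hypothetical helper names.
VOICE_DOMAINS = {
    "ED": "educational",
    "LE": "leisure",
    "PB": "professional business",
    "PO": "professional organization",
    "PR": "professional research and science",
}


def domain_name(code: str) -> str:
    """Return the full domain name for a two-letter domain code."""
    try:
        return VOICE_DOMAINS[code]
    except KeyError:
        raise ValueError(f"unknown domain code: {code!r}") from None


print(domain_name("PB"))  # professional business
```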

The individual corpus texts in the five domains are listed when you click on the respective domains in the corpus tree. The list of texts disappears when you click on the domains again.

2.3 Corpus texts

Each corpus text has a specific event ID (e.g. PRcon29).
[Figure: Corpus tree with list of event IDs]
The event ID specifies the domain the text belongs to (two capital letters, e.g. PR) and the speech event type it represents (three lower-case letters, e.g. con), followed by the number that the text was assigned in the VOICE database (e.g. 29). Together, these three parts uniquely identify the individual text in the corpus (e.g. PRcon29).
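The three-part structure of an event ID can be illustrated with a short parsing sketch. It assumes that every ID follows the pattern described above (two capital letters, three lower-case letters, then one or more digits); `EventID` and `parse_event_id` are hypothetical names and not part of VOICE-Online.

```python
import re
from typing import NamedTuple


class EventID(NamedTuple):
    """The three components of an event ID (illustrative type)."""
    domain: str       # two capital letters, e.g. "PR"
    event_type: str   # three lower-case letters, e.g. "con"
    number: int       # number assigned in the VOICE database, e.g. 29


# Assumed pattern: domain + speech event type + number, with nothing else.
EVENT_ID_PATTERN = re.compile(r"([A-Z]{2})([a-z]{3})(\d+)")


def parse_event_id(event_id: str) -> EventID:
    """Split an event ID such as 'PRcon29' into its three parts."""
    match = EVENT_ID_PATTERN.fullmatch(event_id)
    if match is None:
        raise ValueError(f"not a valid event ID: {event_id!r}")
    domain, event_type, number = match.groups()
    return EventID(domain, event_type, int(number))


print(parse_event_id("PRcon29"))
# EventID(domain='PR', event_type='con', number=29)
```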

Each corpus text belongs to one of five domains. Definitions of the five domains (ED, LE, PB, PO, and PR) are displayed when the mouse cursor hovers over a specific event ID, and can also be found in the corpus header.

Each corpus text is assigned to one of 10 speech event types. These are abbreviated as follows: con (conversation), int (interview), mtg (meeting), pan (panel), prc (press conference), qas (question-answer session), sed (seminar discussion), sve (service encounter), wgd (working group discussion), and wsd (workshop discussion). Definitions of the 10 speech event types can be found in the corpus header.
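The abbreviations above lend themselves to a simple lookup. The sketch below assumes that the speech event type always occupies the third to fifth characters of an event ID, as in PRcon29; `SPEECH_EVENT_TYPES` and `describe_event_type` are illustrative names, not part of VOICE-Online.

```python
# Abbreviations and expansions exactly as listed in the text above.
SPEECH_EVENT_TYPES = {
    "con": "conversation",
    "int": "interview",
    "mtg": "meeting",
    "pan": "panel",
    "prc": "press conference",
    "qas": "question-answer session",
    "sed": "seminar discussion",
    "sve": "service encounter",
    "wgd": "working group discussion",
    "wsd": "workshop discussion",
}


def describe_event_type(event_id: str) -> str:
    """Return the speech event type named by an event ID.

    Assumes the abbreviation sits directly after the two-letter domain code.
    """
    abbreviation = event_id[2:5]
    try:
        return SPEECH_EVENT_TYPES[abbreviation]
    except KeyError:
        raise ValueError(f"unknown speech event type in {event_id!r}") from None


print(describe_event_type("PRcon29"))  # conversation
```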

The individual texts are listed in alphabetical and then numerical order within each domain in the corpus tree.
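This ordering differs from a plain string sort, which would place PRcon29 before PRcon3. The following sketch reproduces it under the assumption that "alphabetical" refers to the speech event type abbreviation and "numerical" to the trailing number. Apart from PRcon29, the IDs in the example list are made up for illustration and need not exist in VOICE; `event_sort_key` is a hypothetical helper name.

```python
import re


def event_sort_key(event_id: str) -> tuple:
    """Sort key: domain, then speech event type, then number as an integer."""
    match = re.fullmatch(r"([A-Z]{2})([a-z]{3})(\d+)", event_id)
    if match is None:
        raise ValueError(f"not a valid event ID: {event_id!r}")
    domain, event_type, number = match.groups()
    return (domain, event_type, int(number))


# Illustrative IDs only; just PRcon29 is taken from the text above.
ids = ["PRcon29", "PRpan1", "PRcon3", "PRint10"]

print(sorted(ids, key=event_sort_key))
# ['PRcon3', 'PRcon29', 'PRint10', 'PRpan1']
```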

2.4 Text Header

The small icon next to the event ID represents the individual text header. When you click on this icon, detailed contextual information about the speech event is displayed in the content area. This information includes the title of the speech event, recording details, text classification, setting, creation history, speaker details, and number of words, as well as a short prose description of the speech event. The speaker details include first language(s), age, gender, and the number of speakers and interactants, as well as a specification of power relations and acquaintedness. Definitions of most of the text header categories can be found in the corpus header.

The text header can only be viewed when VOICE style is selected as the output style at the bottom of the content area.

2.5 Complete texts

The second small icon, next to the text header icon, represents the text itself. When you click on this icon, the complete text is displayed in the content area. Complete texts can be viewed in VOICE style or in plain style. Use the drop-down menu at the bottom of the content area to change the output style.