Vienna-Oxford International Corpus of English

Using VOICE-Online

3 Searching VOICE

3.1 Simple queries

The content area contains an input box in which a search can be submitted by entering a search item, i.e. a word or phrase to be searched for in the corpus.

3.2 Display of search results

The VOICE Online interface generally displays the search results as individual utterances, which are listed according to the sequence of events in the corpus tree.

By default, search results are rendered in VOICE style. To view the search results in other styles, see the different output styles.

Every occurrence of the search item is highlighted.

If the corpus structure is fully displayed in the corpus tree, all event IDs of speech events which contain the search item are indicated in bold print.
[Screenshot: Search results and corpus tree]

3.3 Search statistics

The basic search statistics are displayed in the line below the input box. The search engine reports the number of individual occurrences of the search item in the entire corpus, the number of utterances containing the search item, and the time in seconds needed to process the search. For example, ‘Found 279 in 248u in 0.462s’ means that there are 279 occurrences of the search item, that they appear in 248 utterances, and that the search took 0.462 seconds.
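The three figures in the statistics line can be picked apart mechanically, for instance when noting down results for several searches. The following is a minimal sketch, assuming the line always has the shape of the example above; the function name parse_stats and the dictionary keys are illustrative and not part of VOICE Online.

```python
import re

# Pattern modelled on the single example line "Found 279 in 248u in 0.462s".
# Assumption: other statistics lines follow the same layout.
STATS_RE = re.compile(r"Found (\d+) in (\d+)u in (\d+(?:\.\d+)?)s")


def parse_stats(line):
    """Split a statistics line into occurrences, utterances and seconds."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError(f"not a statistics line: {line!r}")
    return {
        "occurrences": int(match.group(1)),  # hits in the entire corpus
        "utterances": int(match.group(2)),   # utterances containing a hit
        "seconds": float(match.group(3)),    # processing time
    }


print(parse_stats("Found 279 in 248u in 0.462s"))
```

Since one utterance can contain the search item more than once, the number of occurrences is never smaller than the number of utterances.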

3.4 Browsing through the list of search results

When you have submitted a search, the search results are displayed in the content area. For the sake of usability and speed, the search results are split into pages of 25 utterances each. The letter u and the numbers in the top right corner of the content area show which utterances are currently displayed (e.g. u1-25). You can browse through the list of search results by clicking on the left and right arrows in the blue circles.
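The paging arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, and the label format for pages other than the first is an assumption extrapolated from the one documented example, u1-25.

```python
import math

PAGE_SIZE = 25  # utterances per page, as stated above


def page_count(total_utterances, per_page=PAGE_SIZE):
    """Number of pages needed to show all utterances."""
    return math.ceil(total_utterances / per_page)


def page_label(page, total_utterances, per_page=PAGE_SIZE):
    """Label such as 'u1-25' for a 1-based page number (assumed format)."""
    first = (page - 1) * per_page + 1
    if page < 1 or first > total_utterances:
        raise ValueError("page out of range")
    last = min(page * per_page, total_utterances)
    return f"u{first}-{last}"


print(page_count(248))      # 10
print(page_label(1, 248))   # u1-25
print(page_label(10, 248))  # u226-248
```

With the 248 utterances from the example in 3.3, the results would span ten pages, the last of which holds the remaining 23 utterances.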

3.5 Search options/Types of searches

VOICE Online allows for searches of individual words and multi-word phrases, which can be combined with different wildcard characters.

3.5.1 Wildcard character *

The wildcard character ‘*’ represents zero or more characters. It may be used at any position in the search item, i.e. at the beginning, at the end, or in the middle. For examples, see the table below.

3.5.2 Wildcard character ?

The wildcard character ‘?’ represents zero or one character. It may be used at any position in the search item, i.e. at the beginning, at the end, or in the middle. For examples, see the table below.

3.5.3 Wildcard character +

The wildcard character ‘+’ represents one or more characters. It may be used at any position in the search item, i.e. at the beginning, at the end, or in the middle. For examples, see the table below.

3.5.4 Examples of searching with wildcard characters

The following table lists some examples of searches with wildcard characters.

Type of search    Possible matches

* (zero or more characters)
*rise             e.g. arise, enterprise, rise
hous*             e.g. house, houses, housing
house *           e.g. house in, house the, house
* house *         e.g. the house and, a house in, own house with

? (zero or one character)
?ise              e.g. rise, wise
house?            e.g. house, houses
house ?           e.g. house, house i; NOT: e.g. houses, house the
? house ?         e.g. a house, house i

+ (one or more characters)
+rise             e.g. enterprise, surprise, sunrise; NOT: e.g. rise
house+            e.g. houses, household, housewives
house +           e.g. house in, house the, house again; NOT: houses
+ house +         e.g. a house in, same house like, lovely house and

Please note that spaces before or after wildcard characters and word characters are meaningful, e.g. see the difference between ‘house+’ and ‘house +’. Different wildcard characters can also be combined in searches (e.g. ‘* hous+ *’).
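For readers familiar with regular expressions, the wildcard behaviour described in this section can be approximated in a few lines of Python. This is a sketch of one possible reading of the rules, not the actual VOICE Online search engine. It makes two assumptions: a wildcard stands for non-space characters, and a wildcard standing on its own that may match zero characters (‘*’ or ‘?’) can be left unfilled, which is inferred from rows such as ‘house ?’ matching ‘house’. The function names are illustrative.

```python
import re

# Assumed translation of each wildcard into a regular-expression fragment.
WILDCARDS = {"*": r"\S*", "?": r"\S?", "+": r"\S+"}


def token_regex(token):
    """Compile one space-free part of a search item into a regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in token]
    return re.compile("".join(parts))


def matches_word(pattern, word):
    """True if a single word matches a single-word search item."""
    return token_regex(pattern).fullmatch(word) is not None


def matches_phrase(pattern, phrase):
    """True if the whole phrase matches the search item, word by word."""
    tokens = pattern.split()
    words = phrase.split()

    def align(i, j):
        if i == len(tokens):
            return j == len(words)
        rx = token_regex(tokens[i])
        # Option 1: the current token consumes the current word.
        if j < len(words) and rx.fullmatch(words[j]) and align(i + 1, j + 1):
            return True
        # Option 2 (assumption): a token that can match zero characters,
        # i.e. a lone '*' or '?', is allowed to consume nothing.
        return rx.fullmatch("") is not None and align(i + 1, j)

    return align(0, 0)


print(matches_word("+rise", "rise"))           # False
print(matches_word("house+", "houses"))        # True
print(matches_phrase("house +", "houses"))     # False
print(matches_phrase("house ?", "house i"))    # True
print(matches_phrase("house ?", "house the"))  # False
```

Because the search item is split at spaces before translation, ‘house+’ is treated as a single word pattern while ‘house +’ requires a second word, which mirrors the difference noted above.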

3.5.5 Additional search information

Parentheses indicating uncertain speech (according to the VOICE Transcription Conventions [2.1]) are generally ignored in searches, i.e. searching for ‘(house)’ is equal to searching for ‘house’.

Searching for mark-up, mark-up features, and contextual information specified as such in the VOICE Transcription Conventions [2.1] is currently not possible. This includes general tags for unintelligible speech, non-English speech, speaking modes, speaker noises, pronunciation variations and coinages, etc., but also anonymized items, pauses, and intonation.

Hyphens at the end of words are not regarded as word characters by the search engine. Searching for ‘twenty-’ will thus produce zero results, although words such as ‘twenty-five’ occur in the corpus. Searches with the wildcard characters ‘+’ or ‘*’, such as ‘twenty+’ or ‘twenty*’, will return results that include hyphenated words (such as ‘twenty-five’). It is possible to search for hyphenated words if the hyphen occurs within the search item, but not at its end. Thus searching for ‘twenty-five’ or ‘twenty-fi*’ does yield hits in the corpus.
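The notes on parentheses and trailing hyphens can be turned into a small check to run on a search item before submitting it. The helper below is hypothetical and not a feature of VOICE Online; it only restates the two rules above in code, and the wording of the warnings is illustrative.

```python
def preprocess_query(query):
    """Return (cleaned_query, warnings) for a search item.

    Hypothetical helper: it mirrors the documented rules that
    parentheses are ignored and that a word-final hyphen finds nothing.
    """
    warnings = []

    # Parentheses marking uncertain speech are ignored by the search.
    cleaned = query.replace("(", "").replace(")", "")
    if cleaned != query:
        warnings.append("parentheses are ignored in searches")

    # A hyphen at the end of a word is not treated as a word character.
    for token in cleaned.split():
        if token.endswith("-"):
            stem = token[:-1]
            warnings.append(
                f"'{token}' ends in a hyphen and will find nothing; "
                f"try '{stem}+' or '{stem}*'"
            )
    return cleaned, warnings


print(preprocess_query("(house)"))
print(preprocess_query("twenty-"))
print(preprocess_query("twenty-five"))
```

Note that the suggested alternatives ‘twenty+’ and ‘twenty*’ are broader than ‘twenty-’: they return hyphenated words among other results.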