Resources: New Website for the Corpus of Spoken Israeli Hebrew (CoSIH)

The website of the Corpus of Spoken Israeli Hebrew (CoSIH) has moved. The new address is http://cosih.com (Hebrew main page); the English version is at http://cosih.com/english/index.html

Plans for The Corpus of Spoken Israeli Hebrew (CoSIH) started to take shape in 1998. CoSIH aimed at compiling a large database of recordings of spoken Israeli Hebrew in order to facilitate research in a range of disciplines. A corpus is a preliminary desideratum for larger projects that cannot otherwise be accomplished. The research potential of such a corpus is extremely large, including, inter alia, applications in the following areas: general and theoretical linguistics, Hebrew language and linguistics, applied linguistics, language engineering, education, and cultural and sociological studies.

CoSIH was designed to include a representative sample of both demographically and contextually defined varieties. According to the model, CoSIH was to consist of a thousand sets of recordings (“cells”) of 5,000 words each, i.e., a corpus of five million words. We have taken a culture-dependent approach to the compilation of CoSIH. CoSIH aspires to bridge the gap between the infinite number of varieties used by the Israeli Hebrew speech community and their representation in the corpus, by characterizing their diversity in both demographic and contextual terms. CoSIH seems to be the first and only attempt to establish a representative corpus along the axes of both demographic and contextual variables, based on statistical and analytic criteria.

The informants for the recordings of CoSIH would be selected by random sampling of the Israeli population, in order to reflect the social structure of the Israeli Hebrew speech community. The segmentation of the corpus for analytic purposes would be done using well-defined criteria, notwithstanding the fact that all sociolinguistic data of the recorded informants will be made available to CoSIH’s end users. The working hypothesis of CoSIH is based on the demographic criteria that seem to be most significant for the representation of linguistic diversity in Israel: (1) place of birth, familial land of origin, ethnic group or religion; (2) age; (3) education; and (4) sex.

For the analysis of the contextual variables for each discourse, CoSIH’s working hypothesis is based on five variables. There are three primary variables: interpersonal relationships, discourse structure and discourse topic; and two secondary variables: number of participants and medium (i.e. face-to-face conversation and telephone conversation).
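As an illustration only, the planned design described above can be sketched as a small data model: each cell is characterized by the four demographic criteria and the five contextual variables, and the planned corpus size follows from the stated figures of a thousand cells of 5,000 words each. The class names, field names, and value types below are hypothetical and are not taken from CoSIH's own materials.

```python
from dataclasses import dataclass

# Figures stated in the text: a thousand cells of 5,000 words each.
NUM_CELLS = 1000
WORDS_PER_CELL = 5000


@dataclass(frozen=True)
class Demographics:
    """Demographic criteria (1)-(4); field names are hypothetical."""
    origin: str      # place of birth, familial land of origin, ethnic group or religion
    age: int
    education: str
    sex: str


@dataclass(frozen=True)
class Context:
    """Three primary and two secondary contextual variables; field names are hypothetical."""
    interpersonal_relationship: str  # primary
    discourse_structure: str         # primary
    discourse_topic: str             # primary
    num_participants: int            # secondary
    medium: str                      # secondary: "face-to-face" or "telephone"


@dataclass(frozen=True)
class Cell:
    """One set of recordings, described on both axes."""
    demographics: Demographics
    context: Context
    word_count: int = WORDS_PER_CELL


def planned_corpus_size(num_cells: int = NUM_CELLS,
                        words_per_cell: int = WORDS_PER_CELL) -> int:
    """Total number of words in the planned corpus."""
    return num_cells * words_per_cell


print(planned_corpus_size())  # 5000000
```

Keeping the demographic and contextual descriptions in separate records mirrors the two axes of the model, so a corpus could be segmented along either axis independently.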

A comprehensive study of the demographic and circumstantial variables in Hebrew discourse in Israel remains a desideratum. Therefore, in order to design a proper model for CoSIH, the corpus would be set up in phases, during which a research program would be undertaken to verify the working hypothesis suggested above.

This model was first published online, in both Hebrew and English. The English version was eventually published in Hary & Izre’el 2003. A more sophisticated model was published in English in Izre’el, Hary & Rahav 2001.

CoSIH was initiated, designed and operated by a team of Israeli and international scholars:

Core team: Shlomo Izre’el, Tel-Aviv University (director); Benjamin Hary, Emory University (principal investigator); John Du Bois, University of California at Santa Barbara (corpus analyst); Mira Ariel, Tel-Aviv University (discourse analysis and pragmatics); Giora Rahav, Tel-Aviv University (statistics and sociology). Esther Borochovsky-Bar Aba, Tel Aviv University (syntax) joined the team at a later stage.

Advisory board: Eliezer Ben-Rafael, Tel Aviv University (sociolinguistics – sociological aspects); Yaakov Bentolila, Ben Gurion University (sociolinguistics – linguistic aspects); Otto Jastrow, Universität Erlangen-Nürnberg (transcription, phonology, dialectology); Shmuel Bolozky, University of Massachusetts at Amherst (phonology, morphology); Geoffrey Khan, Cambridge University (syntax); Elana Shohamy, Tel Aviv University (language education).

The Present State of CoSIH

As of 2012, this ambitious project still awaits its realization. The limited financial support at our disposal enabled us to compile two sets of recordings: the first was made during the initial preparatory phase, and the second as a pilot study. The initial preparatory phase produced 11 recordings spanning at least 6 hours each, with some being much longer. Although we initially designed a pilot of 20 sets of 3-hour recordings, we eventually ended up with 42 sets, each including between 8 and 16 hours of uninterrupted recording of everyday speech. Taken together, we now possess recordings of 6 to 18 hours each by 53 volunteers, which we believe to be a reasonable source of data for the study of Spoken Hebrew. The recordings, all made between August 2000 and October 2002, are real-life conversations of CoSIH’s informants. As such, they naturally include both the speech of the volunteers who recorded them and that of their interlocutors.

 

New Article: Gonen et al., The Discourse Marker axshav (‘now’) in Spontaneous Spoken Hebrew

Gonen, Einat, Zohar Livnat, and Noam Amir. “The Discourse Marker axshav (‘now’) in Spontaneous Spoken Hebrew: Discursive and Prosodic Features.” Journal of Pragmatics 89 (2015): 69-84.

 
URL: http://dx.doi.org/10.1016/j.pragma.2015.09.005
 
Abstract

This study describes the discursive characteristics of the discourse marker axshav (‘now’) in spoken Hebrew and explores its prosodic features using instrumental methods. This is the first attempt to use acoustical analysis to examine the prosodic aspects of discourse markers in Hebrew.

The corpus includes more than 5 h of everyday Israeli Hebrew conversations, in which 106 occurrences of the word axshav were found. More than one-third of these occurrences were identified as DMs, while the others are temporal adverbials.

The main discursive functions of the DMs identified were segmentation; accentuation of the importance of certain pieces of information, sometimes by means of comparisons and contrasts; and holding the floor.

The acoustical analysis of the performances of axshav in both functions showed that most DMs have a characteristic intonation contour, including a sharp decrease in the frequency inside the second syllable. A comparison of the average duration of axshav as a DM with its duration as a temporal adverbial found a statistically significant difference: axshav was shorter as a DM, both in the first syllable and in the overall duration of the word. These findings seem to strengthen the hypothesis that prosodic data play a role in deciphering the function of axshav as a DM.