The Multimedia Parallel Corpus combines the features of the multimedia corpus and a parallel corpus, and is designed for comparative studies. The corpus consists of two independent zones, differing both in the nature of the material and in the way it is organized.
The Russian MultiPARC provides the opportunity to compare various film, television, radio and theater productions of the same play. Currently, the Russian MultiPARC includes Revizor (The Government Inspector) by Nikolai Gogol, presented in nine different productions, Vishnevyj sad (The Cherry Orchard) by Anton Chekhov in four productions, Djadja Vanja (Uncle Vanya) by Anton Chekhov in five productions, and Tri sestry (Three Sisters) by Anton Chekhov in four productions. The Russian MultiPARC makes it possible to compare the same remark as uttered by different speakers in the same circumstances. Such studies help establish the limits of variation in various aspects of spoken language and its gestural accompaniment, depending on factors related to the performer's personality, the time and style of the production, the director’s intentions, and more.
The technology used to prepare the corpus is quite complex and resembles the preparation of a multilingual parallel corpus of written translations of the same text. The published text of the play serves as the “anchor” text, against which all versions of its performance are compared. The text of the play is divided into fragments, according to which the audio or video recording of the production is fragmented. Each audio or video fragment is then aligned with its written transcript. Search results are presented in the form of clusters, with each cluster containing context from the printed text of the play that includes the requested element, along with corresponding fragments from all productions, accompanied by the respective videos.
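The clustering of search results described above can be pictured with a small sketch. It is illustrative only: the `Fragment` record, the `search_clusters` function, the production labels, the file names, and the sample lines are all hypothetical, and a plain substring match stands in for the corpus's actual search, which is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One audio/video fragment of one production (hypothetical record)."""
    fragment_id: int   # number of the matching fragment in the anchor text
    production: str    # label of the production (invented for this sketch)
    transcript: str    # what is actually said in this production
    media_ref: str     # pointer to the clip (invented file name)


def search_clusters(anchor_text, fragments, query):
    """Return one cluster per anchor fragment whose printed text contains query.

    anchor_text maps fragment_id -> printed text of the play.
    Each cluster holds the printed context plus the fragments of every
    production aligned to that anchor fragment.
    """
    by_id = defaultdict(list)
    for frag in fragments:
        by_id[frag.fragment_id].append(frag)

    clusters = []
    for fid in sorted(anchor_text):
        # Assumption: case-insensitive substring match instead of real corpus search.
        if query.lower() in anchor_text[fid].lower():
            clusters.append({
                "fragment_id": fid,
                "anchor": anchor_text[fid],
                "productions": by_id.get(fid, []),
            })
    return clusters


# Invented sample data, not taken from the corpus.
anchor = {1: "An inspector is coming.", 2: "What, an inspector?"}
fragments = [
    Fragment(1, "Production A", "An inspector is coming!", "a_001.mp4"),
    Fragment(1, "Production B", "The inspector is coming.", "b_001.mp4"),
    Fragment(2, "Production A", "What?", "a_002.mp4"),
]

for cluster in search_clusters(anchor, fragments, "coming"):
    print(cluster["anchor"])
    for frag in cluster["productions"]:
        print("  ", frag.production, "|", frag.transcript, "|", frag.media_ref)
```

Keying every production's fragments to the fragment number of the printed text is what lets a single query against the anchor text retrieve all performances of the same remark at once.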
The English-Russian MultiPARC includes fragments of TV series and films in English with Russian voice-over translation or dubbing, as well as various productions of plays in both Russian and English. This allows for the comparison and study of the speech behavior of people from different cultures who speak different languages but find themselves in similar situations.
Each film, both the original and the translation, is cut into small fragments (clips). The English and Russian transcripts of these fragments are also divided into corresponding fragments. Subsequently, two clips (English and Russian) and two transcripts (English and Russian) are aligned with each other. The numbering of clips and text fragments is consistent between the English and Russian versions.
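Because the numbering is shared across languages, the four-way alignment can be thought of as a join on the fragment number. The following is a minimal sketch under that assumption; the `AlignedUnit` type, the `align_by_number` function, and all file names and sample lines are hypothetical and do not reflect the corpus's internal format.

```python
from typing import Dict, List, NamedTuple


class AlignedUnit(NamedTuple):
    """Two clips and two transcripts sharing one fragment number (hypothetical)."""
    number: int
    en_clip: str
    ru_clip: str
    en_text: str
    ru_text: str


def align_by_number(
    en_clips: Dict[int, str],
    ru_clips: Dict[int, str],
    en_texts: Dict[int, str],
    ru_texts: Dict[int, str],
) -> List[AlignedUnit]:
    """Join clips and transcripts on their shared fragment number.

    Only numbers present in all four inputs yield an aligned unit;
    incomplete fragments are skipped rather than guessed at.
    """
    shared = set(en_clips) & set(ru_clips) & set(en_texts) & set(ru_texts)
    return [
        AlignedUnit(n, en_clips[n], ru_clips[n], en_texts[n], ru_texts[n])
        for n in sorted(shared)
    ]


# Invented sample data: fragment 2 has no Russian clip, so it is left out.
units = align_by_number(
    en_clips={1: "en_001.mp4", 2: "en_002.mp4"},
    ru_clips={1: "ru_001.mp4"},
    en_texts={1: "Good morning.", 2: "See you later."},
    ru_texts={1: "Доброе утро.", 2: "Увидимся."},
)
for unit in units:
    print(unit.number, unit.en_clip, unit.ru_clip, unit.en_text, unit.ru_text)
```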
Each text fragment is annotated in accordance with the standards of MURCO and the parallel corpus of the RNC and contains various levels of annotation, including metatextual, morphological (in both the original and the translation), semantic (in the Russian translation), accentological (in the Russian translation), and sociological annotation (information about the original performer and the dubbing performer). Upon user request, two pairs of clixts are given (in English and Russian), in which the video and text series are aligned with each other. This presentation of the material enables comparative studies of intonation and phonetics, vocabulary and semantics, phraseology, and syntax, as well as the analysis of gesticulation in English-language discourse and comparative gestural studies that set the obtained data against the MURCO data. Additionally, this corpus provides examples of a special type of speech activity in Russian: the translation of audiovisual texts, which is considered an independent type of translation activity.