Design and implementation: Martin Vavřín, Alexandr Rosen
Tools for word-to-word alignment and word-pair extraction: GIZA++ (Och, F. J. – Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), 19–51). Our thanks are due to Ondřej Bojar and David Mareček for advice on the installation. The results of the automatic extraction have not been reviewed.
We are grateful to Elżbieta Kaczmarska for prompting the development of Treq.
User interface: Martin Vavřín
Data extraction: Pavel Procházka, Martin Vavřín
Design support: Jan Kocek
How to cite Treq:
Help
Are you wondering how to best translate a word? Do you need to come up with a synonym or other suitable expression? Try Treq! Treq is a collection of bilingual dictionaries, built automatically from the InterCorp parallel corpus. The dictionaries are bidirectional, with all languages represented in the corpus on one side and Czech, English or Spanish on the other side.
To use Treq, first specify the language pair by selecting the source language (the language of the query) and the target language (the language of the potential equivalents). The query can be entered as a specific word form, as a lemma (Lemma), as a multiword unit (Multiword) or as a regular expression (Regex), and it can be made case-insensitive (A = a). The (Restrict to:) parameter determines which text types are searched: the fiction-oriented core texts, specific collections, or the entire corpus. Then enter your query (Query:) and click Search. The result is a list of all translation candidates for the given word, sorted by decreasing frequency by default. Clicking on a candidate lets you browse its occurrences in InterCorp and check the translation contexts. The frequency reported in the corpus may differ from the one shown in Treq, since the corpus query may also find instances where the potential equivalent corresponds to a different word.
Treq is based on texts from InterCorp, release 15. The first step was to align the original and translated texts word-to-word using statistical methods provided by the GIZA++ program (Och–Ney 2003). The aligned word pairs were then sorted and summarized. The results of the automatic extraction have not been reviewed; however, the relative frequency of a given pair may serve as an approximate indicator of reliability. The more often an equivalent occurs relative to the other equivalents of the same word, the more useful it is likely to be.
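As a rough illustration of the sorting and summarizing step, the following Python sketch counts aligned word pairs and ranks the candidates for each source word by frequency, adding the relative frequency as a share of all alignments of that word. It is not Treq's actual code: the function name `build_equivalents`, the toy Czech-English pairs and the in-memory table are assumptions made for this example, and the pairs are assumed to have already been read from a word alignment such as the one GIZA++ produces.

```python
from collections import Counter, defaultdict


def build_equivalents(aligned_pairs):
    """Summarize (source, target) word pairs into ranked candidate lists.

    Returns a dict mapping each source word to a list of
    (target, count, relative_frequency) tuples, most frequent first.
    """
    # Count how often each (source, target) pair was aligned.
    pair_counts = Counter(aligned_pairs)

    # Group the counted pairs by their source word.
    by_source = defaultdict(list)
    for (source, target), count in pair_counts.items():
        by_source[source].append((target, count))

    table = {}
    for source, candidates in by_source.items():
        total = sum(count for _, count in candidates)
        # Sort by decreasing count; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda item: (-item[1], item[0]))
        table[source] = [
            (target, count, count / total) for target, count in ranked
        ]
    return table


# Hypothetical aligned pairs (toy data, not taken from InterCorp).
pairs = [
    ("pes", "dog"),
    ("pes", "dog"),
    ("pes", "dog"),
    ("pes", "hound"),
    ("kočka", "cat"),
]

table = build_equivalents(pairs)
for target, count, share in table["pes"]:
    print(f"{target}\t{count}\t{share:.0%}")
```

In this toy data, "dog" accounts for three of the four alignments of "pes", so it is listed first with a share of 75%, while "hound" follows with 25%. A high share suggests a more dependable equivalent, but since the alignment is automatic, even frequent pairs should be checked in context.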