Term extraction in MemoQ

MemoQ is a recent love of mine, still full of surprises and with a promising future. For the non-translators out there, MemoQ is a computer-assisted translation (CAT) tool developed by a dedicated Hungarian team with high ambitions to challenge the status quo of the CAT tool marketplace.

One prominent feature in MemoQ 5 is term extraction. The underlying principle is the same as that of the Lexicon feature in Deja Vu X, but in my view MemoQ’s solution is better. The software opens a document, or a set of documents, and runs a quick statistical analysis on the text, searching for words and phrases that frequently occur together. These are the term candidates. The user can choose to ignore certain frequently used words such as “of”, “then”, “are”, etc., since they are usually not useful when building a term base. MemoQ calls these stop words, and the user can freely edit the list of stop words.
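To make the idea more concrete, here is a minimal sketch of frequency-based term candidate extraction. It is not MemoQ's actual algorithm, which I have not seen; the function name `extract_candidates`, the tiny `STOP_WORDS` set and the sample text are all made up for illustration. The two parameters mirror the kind of options mentioned in this post (maximum number of words in a term, minimum number of occurrences).

```python
from collections import Counter
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"of", "then", "are", "the", "a", "is", "in", "and", "to"}

def extract_candidates(text, max_words=3, min_occurrences=2):
    """Count word n-grams and keep the frequent ones as term candidates."""
    counts = Counter()
    # Handle each sentence separately so no candidate spans a sentence boundary.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"\w+", sentence)
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip candidates that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    frequent = [(t, c) for t, c in counts.items() if c >= min_occurrences]
    # Most frequent first, then longer phrases first, then alphabetical.
    return sorted(frequent, key=lambda tc: (-tc[1], -len(tc[0].split()), tc[0]))

sample = (
    "The translation memory stores segments. "
    "A translation memory is reused in later projects. "
    "The term base stores terms, and the term base is shared."
)
candidates = extract_candidates(sample)
for term, count in candidates:
    print(count, term)
```

A real tool would rank candidates with more than raw counts, but even this simple version shows why stop words matter: without the filter, "the" would dominate the list.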

Term extraction is done in a few straightforward steps:

1. You set up a regular translation project in MemoQ and add your documents, as well as any resources such as translation memories and existing term bases. In the Operations menu, you click on Extract Terms (at the bottom of the dropdown menu). A dialog opens where you can set term extraction options such as the maximum number of words to be considered as a term, the minimum number of occurrences, etc. The default values work well, and the beginner term extractor should be happy with them.

2. You start the process, which should take a few minutes. The result is a list of term candidates, sorted by relevance, occurrence and number of words. The translation of any term that is found in an existing term base is filled in automatically. In the lower left corner of the screen you can see a list of translation segments where the term candidate occurs. If you process an already translated bilingual document, both source and target segments will be listed. As you go through the list of term candidates, you can simply accept the pre-filled target term, copy and paste it from the lower left corner, or add it manually. You can also edit the source term. Once you’re happy with both the source and the target, you accept the term by pressing Ctrl+Enter.

3. Once you’re finished with the entire list, you can export the accepted terms into a new term base. To do this, click on the curved arrow icon above the list area, on the right edge of the dialog box. In the small dialog box that opens you can specify some metadata that will apply to all terms in the list (client, project, etc.). This will create a term base that you can use for future MemoQ projects.

4. It’s also possible to export your terms into a CSV file that you can open in Excel. In MemoQ’s Project home area, open Term bases, select the term base you want to export, then click on Export to CSV at the bottom. Another dialog box appears with a list of export options. I played around with them a bit before I found the right set for my purposes. You select the file name and the path, then you select Export as CSV. For encoding I selected Windows Latin 2 (1250); other settings produced garbled or otherwise incorrect characters for all the “special” Hungarian characters, i.e. characters with diacritical marks. As a delimiter I selected semicolon (the other ones didn’t work out as intended). In the Fields pane I deselected all but one field: the one I left selected was Term text (with wildcards), third from the bottom. At the end you simply click on Export.

Now you have a glossary list in two columns in an Excel file that you may want to sort alphabetically. Together with the term base file in MemoQ, this Excel file serves as a great resource for future translation projects that come up in this specific field. You can also use it as a bargaining chip for winning new clients or keeping existing ones.
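If you prefer to check or sort the exported file outside Excel, a few lines of Python will do. This is only a sketch under my own assumptions: that the export is a two-column, semicolon-delimited file in Windows-1250 (`cp1250` in Python) without a header row. The helper names `load_glossary` and `sort_glossary`, the file name and the sample terms are invented for the example, and the script writes its own small sample file so that it can run without a real export.

```python
import csv
import os
import tempfile

def load_glossary(path, encoding="cp1250", delimiter=";"):
    """Read a two-column glossary CSV into a list of (source, target) pairs."""
    with open(path, newline="", encoding=encoding) as f:
        rows = csv.reader(f, delimiter=delimiter)
        return [(r[0].strip(), r[1].strip()) for r in rows if len(r) >= 2]

def sort_glossary(pairs):
    """Sort by source term, ignoring case (plain code point order, no locale collation)."""
    return sorted(pairs, key=lambda p: p[0].casefold())

# Build a small sample file standing in for the real export (made-up terms).
sample_rows = [
    ("term base", "terminológiai adatbázis"),
    ("Segment", "szegmens"),
    ("pre-translation", "előfordítás"),
    ("fuzzy match", "hasonló találat"),
]
path = os.path.join(tempfile.mkdtemp(), "glossary.csv")
with open(path, "w", newline="", encoding="cp1250") as f:
    csv.writer(f, delimiter=";").writerows(sample_rows)

glossary = sort_glossary(load_glossary(path))
```

The encoding argument is the part that matters for Hungarian: letters such as ő and ű exist in Windows-1250 but not in the Western European code pages, so reading the file with the wrong encoding is one likely way to end up with garbled characters.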


About bancsaba

Discovering the world, step by step, word by word