How to convert TBX Glossary into tab-delimited format

The Microsoft glossary (MS Glossary) is a handy reference when translating computer- or IT-related documents. You can download it in TBX format from http://www.microsoft.com/Language/en-US/Terminology.aspx.

As a sample, I downloaded the Korean glossary (MicrosoftTermCollection.tbx). Now I want to convert it into tab-delimited format so that it can be imported into Excel.

There are several ways to convert a TBX file into tab-delimited format.

First, here is how to convert a TBX glossary into tab-delimited text format using ApSIC Xbench.

1. In Xbench, press "Project > New."

2. In the "Project Properties" dialogue, click "Add" and select "TBX/MARTIF Glossary."

3. Click "Next" to select your TBX files, and then "Open."

4. Click "Next" twice, then check your languages (for example, "en-US" (U.S. English) for the source and "ko-kr" (Korean) for the target), and click "OK."

5. Click "OK" to close the "Project Properties" dialogue.

Xbench - Converting TBX glossary into tab-delimited format

Now, you are ready to export the imported items into tab-delimited format.

6. In ApSIC Xbench, go to "Tools > Export Items."

7. Select "All items in project," choose "Tabbed Text File" under the "Output" tab, and specify the target file name.

8. Click "OK."

Xbench - Export Items
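
Before importing the exported file into Excel, it can be worth checking that every line really has the expected number of tab-separated columns. The following is a minimal sketch, not part of Xbench: the function name check_tabbed_file is hypothetical, and both the column count (2, for source and target) and the UTF-8 encoding are assumptions that you may need to adjust to match your export.

```python
import csv


def check_tabbed_file(path, expected_columns=2, encoding="utf-8"):
    """Return (row_count, line_numbers_with_unexpected_column_count).

    expected_columns and encoding are assumptions about the exported
    file; change them if your export looks different.
    """
    total = 0
    suspicious = []
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE treats quotation marks as ordinary characters,
        # so each physical line is read as exactly one row.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            total += 1
            if len(row) != expected_columns:
                suspicious.append(line_number)
    return total, suspicious
```

For example, check_tabbed_file("glossary.txt") (a hypothetical file name) returns the number of rows and a list of line numbers to look at by hand.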

You can also convert the MS glossary (MicrosoftTermCollection.tbx) into tab-delimited format using RegexBuddy, an easy-to-use regular expression tool. With it, you can convert any text-based glossary into tab-delimited format or into other formats such as TMX.

RegexBuddy

1. In RegexBuddy, go to "Replace" mode by pressing "Replace."

2. In the first box (1), enter the following regular expression:

<termEntry.*?"definition">(.*?)</descrip>.*?<term id="\d*">(.*?)</term>.*?"ko-kr"><ntig><termGrp><term id="\d*">(.*?)</term>

(This expression works only for the MS Glossary in TBX format. Other glossaries may need a modified expression.)

3. In the second box (2), enter the following:

\2\t\3

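(Here, \2 and \3 stand for the second and third captured groups, that is, the source term and the Korean term, and \t indicates a tab. The first group, the definition, is not used.)

4. Copy and paste all text from your TBX file into the third box (3).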

5. Select "Replace > List All Placements" under the Test tab. The output will be displayed in the bottom box (4).

6. Now, you can copy and paste all text from the bottom box into a text file.
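
If the glossary is too large to paste comfortably, the same search-and-replace can be sketched in Python. Treat this as a rough sketch under assumptions rather than a tested converter: the function names are hypothetical, the sample entry is made up to fit the expression above (it is not copied from the real Microsoft file), re.DOTALL is my assumption so that the dot can cross line breaks, and UTF-8 is an assumed file encoding.

```python
import re

# The expression from step 2, unchanged. Group 1 is the definition,
# group 2 the source term, group 3 the Korean term.
PATTERN = re.compile(
    r'<termEntry.*?"definition">(.*?)</descrip>.*?'
    r'<term id="\d*">(.*?)</term>.*?'
    r'"ko-kr"><ntig><termGrp><term id="\d*">(.*?)</term>',
    re.DOTALL,  # assumption: let "." match line breaks inside an entry
)

# Made-up entry, shaped only to satisfy the expression above.
SAMPLE = (
    '<termEntry id="1_1"><langSet xml:lang="en-US">'
    '<descripGrp><descrip type="definition">A made-up definition.'
    '</descrip></descripGrp>'
    '<ntig><termGrp><term id="1">file</term></termGrp></ntig></langSet>'
    '<langSet xml:lang="ko-kr"><ntig><termGrp><term id="2">파일</term>'
    '</termGrp></ntig></langSet></termEntry>'
)


def tbx_to_tab_lines(tbx_text):
    """Return one 'source<TAB>target' string per matched term entry."""
    # The template is the replacement text from step 3.
    return [match.expand(r"\2\t\3") for match in PATTERN.finditer(tbx_text)]


def convert_file(tbx_path, txt_path, encoding="utf-8"):
    """Read a TBX file and write the tab-delimited lines to txt_path."""
    with open(tbx_path, encoding=encoding) as source:
        lines = tbx_to_tab_lines(source.read())
    with open(txt_path, "w", encoding=encoding, newline="\n") as target:
        target.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    for line in tbx_to_tab_lines(SAMPLE):
        print(line)
```

As with the RegexBuddy steps, only matched entries are written out, so an entry that does not fit the expression is skipped without any warning.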

 

