The Translate Toolkit and Pootle
setup on OpenBSD 3.8
Contents
- Introduction
- Pootle overview
- Packages to be installed
- Installation Notes - READ THIS BEFORE INSTALLING
- General Notes
- Package Notes
- python
- py-subversion
- python-Levenshtein
- Kid
- libtidy
- pysqlite2
- lxml
- libxml2
- libxslt
- Installation Testing
- Post Installation
- Using Pootle
- Starting
- Gotchas
- the Web Interface
- Translation
- Definitions
- The translation process
- Translating a website
- Pootle Help
- Translation Toolkit Tools
Introduction
This is based on our experiences using Pootle as a platform for the translation of the Kondo Syokai websites into Japanese.
Some of what follows may read as critical. Even so, a big thank you is due to the developers of Pootle, whose generosity in making it available is much appreciated.
Pootle overview
From the package README, Pootle is "a web translation and translation management engine". But what does it do?
Pootle is a web-based translation tool. It brings the power of the internet to translation tasks by providing a simple platform for translation, well suited to the ad-lib, unstructured translation of free software. There are many Pootle projects, each one handling a translation task. However, the Pootle web interface is only the front end to a back office which prepares and manages the process, probably using the Translate Toolkit.
Packages to be installed
The following list of packages is derived from the README files, the files in the doc directories and the post-install notes of the downloaded packages, and possibly other sources.
Dependencies are shown by indenting the package name. The packages must be installed in reverse order, i.e. starting at the bottom of the list.
| PACKAGE | VERSION | SUMMARY | WEBSITE |
| Pootle | 1.1.0 | a web translation and translation management engine | Translate Toolkit and Pootle |
| --Translate Toolkit | 1.1.1 | a set of software and documentation designed to help make the lives of localizers both more productive and less frustrating | Translate Toolkit and Pootle |
| ----lxml | 2.0.5 | the most feature-rich and easy-to-use library for working with XML and HTML in the Python language | code speak |
| ------libxslt | 1.1.15 | XSLT support for libxml2 | xmlsoft.org - The XSLT C library for GNOME |
| ------libxml2 | 2.6.32 | XML toolkit from the GNOME project | xmlsoft.org - The XML C parser and toolkit of Gnome |
| ----pysqlite2 | 2.4.1 | an interface to the SQLite 3.x embedded relational database engine | the pysqlite wiki |
| ------sqlite | 3.5.8 | a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine | sqlite |
| ----psyco | 1.6 | a Python extension module which can massively speed up the execution of any Python code | Psyco |
| ----uTidyLib | 0.2 | the Python wrapper for the HTML cleaning library named TidyLib | uTidylib |
| ----iniparse | 0.2.3 | an INI parser for Python | Google code - iniparse |
| ------ctypes | 1.0.2 | a ffi (Foreign Function Interface) package for Python | the ctypes package |
| ------tidy | 050705 | Tidy is a tool that automatically fixes errors in your HTML, validates it and pretty prints it | OpenBSD Package Information for tidy-050705.tgz (i386) |
| ----python-Levenshtein | 0.10.1 | Levenshtein Python extension and C library | home page is down - Pootle site has some information |
| --jToolkit | 0.7.8 | a Python web application framework built on modpython and Apache | jToolkit |
| --ZIP | 2.3p0 | a compression and file packaging utility | OpenBSD Package Information for zip-2.3p0.tgz (i386) |
| --kid | 0.9.6 | a simple Python based template language for XML vocabularies | kid-templating.org |
| --elementtree | 1.2.6 | a light-weight toolkit for XML processing in Python | ElementTree |
| ----python-expat | 2.3.5 | support for the expat XML parser | OpenBSD Package Information for python-expat-2.3.5p2.tgz (i386) |
| --python | 2.4 | an interpreted, interactive, object-oriented programming language | OpenBSD Package Information for python-2.4.1p0.tgz (i386) |
| OPTIONAL | | | |
| Subversion | 1.2.1 | a free/open-source version control system | OpenBSD Package Information for subversion-1.2.1.tgz (i386) |
| --py-subversion | 1.2.1 | a set of bindings for the Python scripting language to Subversion | OpenBSD Package Information for py-subversion-1.2.1.tgz (i386) |
| --p5-SVN | 1.2.1 | a set of bindings for the perl scripting language to Subversion | OpenBSD Package Information for p5-SVN-1.2.1.tgz (i386) |
| PyLucene | before 2.0 | a Python extension for accessing Java Lucene | |
| Apache | | Proxying Pootle traffic | |
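Since the list is read from the bottom up at install time, it can help to keep the required package names in a plain text file, one per line in the order of the table, and print them reversed. The following is only a sketch: the function name install_order and the file name packages.txt are our own assumptions, not part of any of the packages.

```shell
#!/bin/sh
# install_order: print the lines of a package list in reverse order,
# so that the last entry in the list is printed first.
# Reads the file given as an argument, or standard input if none is given.
install_order() {
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }' "$@"
}
```

For example, "install_order packages.txt" prints the names in the order in which they should be installed. awk is used rather than a dedicated reverse command because it is available on any unix.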
Installation Notes - READ THIS BEFORE INSTALLING
General Notes
Installation was done on a clean install of OpenBSD 3.8
Disk Usage
| clean install | 364M |
| ports tree (which goes into /usr) | 121M |
| After Pootle Install: | 919M |
Install the OpenBSD packages first using pkg_add.
Unpack the other packages into a writable directory, then cd into each one in the order noted above and install it.
Extract bz2 files as follows:
$ bzip2 -dc package-name.tar.bz2 | tar -xvf -
extract gz files as follows:
$ gzip -dc package-name.tar.gz | tar -xvf -
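With this many archives to unpack, a small wrapper that chooses the right pipeline from the file name saves some typing. This is a minimal sketch using the two pipelines above; the function name unpack is our own and hypothetical.

```shell
#!/bin/sh
# unpack: extract a .tar.bz2 or .tar.gz archive into the current directory,
# using the same pipelines as shown above.
# usage: unpack ARCHIVE
unpack() {
    case "$1" in
        *.tar.bz2) bzip2 -dc "$1" | tar -xvf - ;;
        *.tar.gz)  gzip -dc "$1" | tar -xvf - ;;
        *) echo "unpack: unknown archive type: $1" >&2
           return 1 ;;
    esac
}
```

Anything that is not a .tar.bz2 or .tar.gz file is refused with an error rather than guessed at.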
Get the latest ez_setup tools for installing Python packages:
wget http://peak.telecommunity.com/dist/ez_setup.py
Then run ez_setup.py. You need to be root and in a directory which is writable.
Python packages are installed using:
# python setup.py install
For the others it is the usual:
# ./configure
# make
# make install
In binary packages such as rpm, deb and others, the libraries (which are needed to run a program) are packaged separately from the header files. However, the header files are needed to compile a program that uses the library. They are usually found in an include directory. If compilation fails because header files cannot be found, first check which version of the library the program depends on and whether the one you have is recent enough.
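Before digging into a failed build it is quick to check whether the header is on the machine at all. The following is a sketch only; the function name find_header is hypothetical and the directories to search are whatever include directories exist on your system.

```shell
#!/bin/sh
# find_header: print the first directory in the given list that contains
# the named header file. Returns 1 if none of them does.
# usage: find_header HEADER DIR...
find_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "$header not found in: $*" >&2
    return 1
}
```

For example, "find_header Python.h /usr/include /usr/local/include/python2.4" prints the directory that holds Python.h, if either of them does.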
Package Notes
python
after installation do:
# ln -s /usr/local/bin/python2.4 /usr/local/bin/python
# ln -s /usr/local/bin/pydoc2.4 /usr/local/bin/pydoc
py-subversion
The OpenBSD package is built against Python 2.3.5, which will also be installed automatically.
python-Levenshtein
The python-Levenshtein project seems to have disappeared from the net. Pootle added it to its SourceForge project for download, for the benefit of Pootle users.
Kid
Kid 0.9.x depends on ez_setup tools
libtidy
The libtidy homepage is at HTML Tidy Library Project
However I couldn't find a package and so used the OBSD package which appears to be the same
pysqlite2
You must uncomment the paths to the headers and libraries in setup.cfg
lxml
The lxml install.txt states that libxml2 2.6.20 or later is required. However, version 2.6.20 does not have the schematron header files, which are required for compilation. A later version is therefore required.
The lxml install.txt also says not to use libxml2 2.6.27 if you want to use XPath.
libxml2
xsltproc required for some tests during build
configure cannot find Python.h - you must add the path (/usr/local/include/python2.4) to configure
libxslt
depends on libxml2
configure cannot find Python.h - you must add the path (/usr/local/include/python2.4) to configure
Installation Testing
Post Installation
Put the Pootle config files in a standard location, e.g.:
# ln /usr/local/lib/python2.4/site-packages/Pootle/pootle.prefs /etc/Pootle/
# ln /usr/local/lib/python2.4/site-packages/Pootle/users.prefs /etc/Pootle/
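The ln commands assume that /etc/Pootle already exists and that it is on the same filesystem as the Pootle installation, since a hard link cannot cross filesystems. A more defensive version might look like the sketch below. The function name link_prefs is hypothetical, and the fallback to a symbolic link is our own assumption about what is acceptable, not something the Pootle documentation prescribes.

```shell
#!/bin/sh
# link_prefs: make sure DEST_DIR exists, then link the two Pootle
# preference files from SRC_DIR into it.
# A hard link is tried first; if that fails (for example across
# filesystems) a symbolic link is made instead, so SRC_DIR should be
# given as an absolute path.
# usage: link_prefs SRC_DIR DEST_DIR
link_prefs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in pootle.prefs users.prefs; do
        # leave entries that are already in place alone
        [ -e "$dest/$f" ] && continue
        ln "$src/$f" "$dest/$f" 2>/dev/null ||
            ln -s "$src/$f" "$dest/$f" || return 1
    done
}
```

For example, as root: "link_prefs /usr/local/lib/python2.4/site-packages/Pootle /etc/Pootle". Running it a second time changes nothing.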
Using Pootle
Starting
# PootleServer
to run in background
# PootleServer -B < /dev/null >> /var/log/pootle 2>&1
Pootle is installed in
/usr/local/lib/python2.4/site-packages/Pootle
Within this there is a subdirectory "po" in which all projects are stored
Gotchas
The admin must add himself to projects before he can edit them - you don't get project rights just because you are the admin
The following cannot be done through the web interface:
- removing files
- deleting languages from a project
Each project has one subdirectory per language. You can also use subdirectories below these, but with one important caveat: do not use the "Update from templates" button, since it will recursively go through all directories and put the po files in the project base directory, and files with the same name will overwrite one another. The "Update from templates" function is useful only if you have a project in a single directory and want to upload pot files from the web interface.
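To see how much damage flattening a directory tree would do, you can list the file names that occur more than once below a directory. This is a sketch of our own, not a Pootle tool; the function name duplicate_names is hypothetical.

```shell
#!/bin/sh
# duplicate_names: list file names that occur more than once anywhere
# below DIR. Every name printed here would collide if all the files
# were put into a single directory.
# usage: duplicate_names DIR
duplicate_names() {
    find "$1" -type f | sed 's|.*/||' | sort | uniq -d
}
```

If the function prints nothing, no two files below the directory share a name.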
the Web Interface
This deserves its own section, since navigating a project is so confusing that it amounts to a gotcha.
You would think that there would be one page for project administration, but there are several, and 2 completely different ways of getting there:
1. Admin>Projects
This is where you:
- add projects
- delete projects
2. Admin>Projects>Project Code
This is where you:
- set languages for the project
- update the project from templates (pot files)
3. My Account>Project (Administrate)
This is where you:
- add users to the project
- remove users from the project
- set user permissions for the project
There are 2 ways to get to a project for normal use:
1. All Projects>Project
This gets you to the project languages page
2. All Languages>Language>Project
This gets you to the project files page
You can also get to the above pages from:
My Account>Project
My Account>Language
We are all used to having cheat sheets for unix admin, but this is the first time one has been required to navigate a website.
Translation
Definitions
TM file
translation memory file, a po file
The translation process
Phase 1
A new file for translation
Phase 2
the source file is modified and the translation needs updating
PHASE 1
rc2po -P --charset=utf8 -i en.1.rc -o en.rc.pot
pot2po -i en.rc.pot -o ja.1.rc.po
poEdit ja.1.rc.po
po2rc -t en.1.rc -i ja.1.rc.po -o ja.1.rc
PHASE 2
rc2po -P --charset=utf8 -i en.2.rc -o en.rc.pot
pot2po -i en.rc.pot -t ja.1.rc.po -o ja.2.rc.po
Why use pot files? What is their purpose? Why not go directly to po files, e.g.
rc2po -P --charset=utf8 -i en.1.rc -o ja.1.rc.po
Because you can only do this the first time. After that it is
pot2po -i en.rc.pot -t ja.1.rc.po -o ja.2.rc.po
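The only difference between the two phases at the pot2po step is whether an earlier translation is passed with -t. A small helper can make that choice. The sketch below only prints the command so that it can be checked before it is run; the function name pot2po_command is hypothetical, and only the -i, -t and -o options shown above are used.

```shell
#!/bin/sh
# pot2po_command: print (do not run) the pot2po command for a template.
# If an existing translation is given and the file exists, it is passed
# with -t (phase 2); otherwise a fresh po file is initialised (phase 1).
# usage: pot2po_command POT NEW_PO [OLD_PO]
pot2po_command() {
    pot=$1
    new_po=$2
    old_po=${3:-}
    if [ -n "$old_po" ] && [ -f "$old_po" ]; then
        echo "pot2po -i $pot -t $old_po -o $new_po"
    else
        echo "pot2po -i $pot -o $new_po"
    fi
}
```

For example, "pot2po_command en.rc.pot ja.2.rc.po ja.1.rc.po" prints the phase 2 command if ja.1.rc.po exists and the phase 1 command if it does not.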
Translating a website
you can convert a bunch of html files to pot files using:
# html2po -P site pot
where
- site is the directory containing the html files
- pot is the destination directory - likely .../Pootle/po/project-code
This will create a copy of the directory structure of the source tree. However do NOT use:
admin > projects > project code > update from templates
to create the po files. This will put all po files in the base directory sequentially overwriting files with the same name.
Once the files have been translated you can get them back to html using
# po2html -t site xh site-xh
where
- site is the directory of the original files which provides the template for conversion
- xh is where the pot (now po) files were put
- site-xh is where you want the translated files to go
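Before converting back it is worth checking that every html file still has a po file to go with it. The sketch below lists the html files under site that have no counterpart under xh. It assumes that a file such as sub/about.html corresponds to sub/about.po, which is what we saw but which you should verify on your own tree; the function name missing_po is hypothetical.

```shell
#!/bin/sh
# missing_po: list the html files below SITE_DIR (as relative paths)
# that have no matching .po file below PO_DIR.
# Assumption: sub/about.html corresponds to sub/about.po.
# usage: missing_po SITE_DIR PO_DIR
missing_po() {
    site=$1
    po_dir=$2
    (cd "$site" && find . -type f -name '*.html') | sort |
    while IFS= read -r f; do
        rel=${f#./}
        [ -f "$po_dir/${rel%.html}.po" ] || echo "$rel"
    done
}
```

For example, "missing_po site xh" prints nothing when every page has a po file.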
The admin problem of course is to manage the conversion of pot files to po files, not overwriting old po files and managing translation memory in the process. This is key to understanding Pootle - the back office function is essential to the translation project and it is not web based.
HTML pages change a lot and, unlike programs, don't come out in nice regular versions - so using versioning doesn't seem to make sense. We simply need to generate a pot file as and when and get this into Pootle. However we do need to have versions for the translator - otherwise they will not be aware that the page has changed and will wonder what happened to their work. After some time the old po files will need deleting.
Use po2tmx to create kondosyokai.tmx from index.po.
html2po -P will give us our pot files, replicating the source tree.
pot2po --tm=kondosyokai.tmx then creates the po files, using kondosyokai.tmx as the translation memory.
Pootle Help
Pootle help in the installation
Check the webpages of all the installed packages
Translate Toolkit and Pootle wiki - use the search
For the Pootle and Translation Toolkit tools:
- use the --help option
- use the --manpage option e.g.
$ pot2po --manpage | /usr/bin/nroff -Tutf8 -mandoc | less
Man pages for relevant unix tools:
- gettext
# apropos gettext
Translation Toolkit Tools
From the README file:
Converters
oo2po - convert between OpenOffice.org GSI files and PO
oo2xliff - convert between OpenOffice.org GSI files and XLIFF
moz2po - convert from a Mozilla XPI file and PO. Including unpacking and building a translated XPI.
csv2po - convert PO format to CSV for editing in a spreadsheet program
php2po - PHP localisable string arrays converter.
ts2po - convert Qt Linguist (.ts) files to PO
txt2po - convert simple text files to PO
html2po - convert HTML to PO (beta)
xliff2po - XLIFF (XML Localisation Interchange File Format) converter
prop2po - convert Java .properties files to PO
po2wordfast - Wordfast Translation Memory converter
po2tmx - TMX (Translation Memory Exchange) converter
pot2po - PO file initialiser
csv2tbx - Create TBX (TermBase eXchange) files from Comma Separated Value (CSV) files
Tools Quality Assurance
pofilter - run any of the 40+ checks on your PO files
pomerge - merge corrected translations from pofilter back into your existing PO files.
poconflicts - identify conflicting use of terms
porestructure - restructures po files according to poconflict directives
pogrep - find words in PO files
Tools Other
pocompile - create a Gettext MO files from PO or XLIFF files
pocount - count translatable file formats (PO, XLIFF)
podebug - Create comment in your PO files' msgstr which can then be used to quickly track down mistranslations as the comments appear in the application.
posegment - Break a PO or XLIFF files into sentence segments, useful for creating a segmented translation memory.
poswap - uses a translation of another language that you would rather use than English as source language

