The Translate Toolkit and Pootle

setup on OpenBSD 3.8


 

Introduction

This is based on our experiences using Pootle as a platform for the translation of the Kondo Syokai websites into Japanese.

Some of what follows may be critical, but a big thank you is due to the developers of Pootle, whose generosity in making it available is much appreciated.

 

Pootle overview

According to the package README, Pootle is "a web translation and translation management engine". But what does it do?

Pootle is a web-based translation tool. It brings the power of the internet to translation tasks by providing a simple platform for translation, well suited to the ad hoc, unstructured translation of free software. There can be many Pootle projects, each one handling a translation task. However, the Pootle web interface is only the front end to a back office which prepares and manages the process, probably using the Translate Toolkit.

 

Packages to be installed

The following list of packages is derived from the README files, the files in the doc directories and the post-install notes of the downloaded packages, and possibly from other sources.

Dependencies are shown by indenting the package name (with leading hyphens). These packages must be installed in reverse order, i.e. starting at the bottom of the list.

PACKAGE VERSION SUMMARY, WEBSITE and DOWNLOAD SITE
Pootle 1.1.0 a web translation and translation management engine
Translate Toolkit and Pootle
download
--Translate Toolkit 1.1.1 a set of software and documentation designed to help make the lives of localizers both more productive and less frustrating
Translate Toolkit and Pootle
download
----lxml 2.0.5 the most feature-rich and easy-to-use library for working with XML and HTML in the Python language
code speak
download
------libxslt 1.1.15 XSLT support for libxml2
xmlsoft.org - The XSLT C library for GNOME
download
------libxml2 2.6.32 XML toolkit from the GNOME project
xmlsoft.org - The XML C parser and toolkit of Gnome
download
----pysqlite2 2.4.1 an interface to the SQLite 3.x embedded relational database engine
the pysqlite wiki
download
------sqlite 3.5.8 a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine
sqlite
download
----psyco 1.6 a Python extension module which can massively speed up the execution of any Python code
Psyco
download
----uTidyLib 0.2 the Python wrapper for the HTML cleaning library named TidyLib
uTidylib
download
----iniparse 0.2.3 an INI parser for Python
Google code - iniparse
download
------ctypes 1.0.2 an FFI (Foreign Function Interface) package for Python
the ctypes package
download
------tidy 050705 Tidy is a tool that automatically fixes errors in your HTML, validates it and pretty prints it
OpenBSD Package Information for tidy-050705.tgz (i386)
download
----python-Levenshtein 0.10.1 Levenshtein Python extension and C library
home page is down - the Pootle site has some information
download
--jToolkit 0.7.8 a Python web application framework built on modpython and Apache
jToolkit
download
--ZIP 2.3p0 a compression and file packaging utility
OpenBSD Package Information for zip-2.3p0.tgz (i386)
download
--kid 0.9.6 a simple Python based template language for XML vocabularies
kid-templating.org
download
--elementtree 1.2.6 a light-weight toolkit for XML processing in Python
ElementTree
download
----python-expat 2.3.5 support for the expat XML parser
OpenBSD Package Information for python-expat-2.3.5p2.tgz (i386)
download
--python 2.4 an interpreted, interactive, object-oriented programming language
OpenBSD Package Information for python-2.4.1p0.tgz (i386)
download
OPTIONAL
Subversion 1.2.1 a free/open-source version control system
OpenBSD Package Information for subversion-1.2.1.tgz (i386)
download
--py-subversion 1.2.1 a set of bindings for the Python scripting language to Subversion
OpenBSD Package Information for py-subversion-1.2.1.tgz (i386)
download
--p5-SVN 1.2.1 a set of bindings for the perl scripting language to Subversion
OpenBSD Package Information for p5-SVN-1.2.1.tgz (i386)
download
PyLucene before 2.0 a Python extension for accessing Java Lucene
Apache Proxying Pootle traffic

 

Installation Notes - READ THIS BEFORE INSTALLING

 

General Notes

Installation was done on a clean install of OpenBSD 3.8

Disk Usage

clean install 364M
ports tree (which goes into /usr) 121M
After Pootle Install: 919M

Install the OpenBSD packages first using pkg_add

Unpack the other packages into a writable directory, then cd into each one in the order noted above and install it

extract bz2 files as follows:

  $ bzip2 -dc package-name.tar.bz2 | tar -xvf -

extract gz files as follows:

  $ gzip -dc package-name.tar.gz | tar -xvf -
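The two extraction commands can be wrapped in one small helper. This is only a sketch: "unpack" is a hypothetical name, and it relies on nothing beyond the bzip2, gzip and tar invocations shown above.

```shell
# unpack: extract a .tar.gz or .tar.bz2 archive into the current directory.
# "unpack" is a hypothetical helper name, not part of any package.
unpack() {
    case "$1" in
        *.tar.bz2) bzip2 -dc "$1" | tar -xf - ;;
        *.tar.gz)  gzip -dc "$1" | tar -xf - ;;
        *) echo "unpack: unknown archive type: $1" >&2; return 1 ;;
    esac
}
```

Usage would be "unpack package-name.tar.bz2" from the writable directory.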

get the latest ez_setup tools for installing python packages:

  $ wget http://peak.telecommunity.com/dist/ez_setup.py

then run ez_setup.py with python - you need to be root and in a writable directory

python packages are installed by running the setup.py script in the unpacked package directory:

  # python setup.py install

for the others it is the usual

  # ./configure
  # make
  # make install

Binary packages such as rpm and deb usually separate the libraries (which are needed to run a program) from the header files (which are needed to compile a program that uses the library). The header files are usually in an include directory. If compilation fails because header files cannot be found, first check which version of the library the program depends on, and that the version you have is recent enough.
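Before compiling, it can save time to check whether a given header is installed at all. A minimal sketch, assuming locally built libraries put their headers under /usr/local/include; "find_header" is a hypothetical helper name.

```shell
# find_header: search an include tree for a header file by name.
# Usage: find_header NAME [DIR]
# DIR defaults to /usr/local/include, which is an assumption about where
# locally installed libraries keep their headers.
find_header() {
    find "${2:-/usr/local/include}" -name "$1" -print 2>/dev/null
}

# Example: lxml needs the schematron header from libxml2.
# find_header schematron.h
```

If nothing is printed, the header is missing from that tree and the library probably needs upgrading or reinstalling.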

 

Package Notes

 

python

after installation do:

  # ln -s  /usr/local/bin/python2.4  /usr/local/bin/python
  # ln -s  /usr/local/bin/pydoc2.4   /usr/local/bin/pydoc

 

py-subversion

The OpenBSD package is built against python 2.3.5, which will be installed automatically as well

 

python-Levenshtein

The python-Levenshtein project seems to have disappeared from the net. Pootle has added it to its SourceForge project as a download for the benefit of Pootle users.

 

Kid

Kid 0.9.x depends on ez_setup tools

 

libtidy

The libtidy homepage is at HTML Tidy Library Project

However, I couldn't find a package there and so used the OpenBSD tidy package, which appears to be the same

 

pysqlite2

You must uncomment the paths to the headers and libraries in setup.cfg
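The uncommenting can be done by hand in an editor, or with sed as sketched below. The key names include_dirs and library_dirs are an assumption about what pysqlite's setup.cfg contains, so look at the file first; "uncomment_paths" is a hypothetical helper name.

```shell
# uncomment_paths: remove the leading "#" from the include_dirs= and
# library_dirs= lines of a setup.cfg, keeping a backup as FILE.orig.
# The two key names are assumptions; adjust the patterns to match the
# commented-out lines actually present in your setup.cfg.
uncomment_paths() {
    cp "$1" "$1.orig" &&
    sed -e 's/^#[ ]*\(include_dirs[ ]*=\)/\1/' \
        -e 's/^#[ ]*\(library_dirs[ ]*=\)/\1/' "$1.orig" > "$1"
}
```

After running "uncomment_paths setup.cfg", check that the paths point at the directories where the sqlite headers and libraries were installed.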

 

lxml

lxml install.txt states that libxml2 2.6.20 or later is required. However, version 2.6.20 does not have the schematron header files, which are required for compilation, so a later version is needed.

lxml install.txt also states that libxml2 2.6.27 should not be used if you want to use xpath

 

libxml2

xsltproc required for some tests during build

configure cannot find Python.h - you must add the path (/usr/local/include/python2.4) to configure

 

libxslt

depends on libxml2

configure cannot find Python.h - you must add the path (/usr/local/include/python2.4) to configure
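These notes do not record exactly how the path was passed to configure, for either libxml2 or libxslt. One way that may work is the --with-python option, given the Python installation prefix rather than the include directory itself. Treat the option as an assumption and confirm it with "./configure --help" before relying on it; "configure_with_python" is a hypothetical helper name.

```shell
# configure_with_python: run ./configure in the current source directory,
# telling it where Python is installed.
# PREFIX (default /usr/local) is the installation prefix, so Python.h is
# expected under PREFIX/include/python2.4.
# The --with-python option is an assumption; check it first with:
#   ./configure --help | grep -i python
configure_with_python() {
    ./configure --with-python="${1:-/usr/local}"
}
```

Run it from the unpacked libxml2 or libxslt source directory, then continue with make and make install as usual.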

 

Installation Testing

 

Post Installation

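Put the Pootle config files in a standard location, creating the directory first if it does not exist. Hard links only work within one filesystem, so use ln -s instead if /etc and /usr/local are on different partitions. E.g.:

  # mkdir -p /etc/Pootle
  # ln /usr/local/lib/python2.4/site-packages/Pootle/pootle.prefs  /etc/Pootle/
  # ln /usr/local/lib/python2.4/site-packages/Pootle/users.prefs   /etc/Pootle/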

 

Using Pootle

 

Starting

  # PootleServer

to run in background

  # PootleServer -B < /dev/null >> /var/log/pootle 2>&1

Pootle is installed in

/usr/local/lib/python2.4/site-packages/Pootle

Within this there is a subdirectory "po" in which all projects are stored

 

Gotchas

The admin must add himself to projects before he can edit them - you don't get project rights just because you are the admin

The following cannot be done through the web interface:

- removing files

- deleting languages from a project

Each project has one subdirectory per language. You can also use subdirectories within these, but with one important caveat: do not use the "Update from templates" button, since it recursively goes through all directories and puts the po files in the project base directory, so files with the same name overwrite one another. The "Update from templates" function is useful only if you have a project in a single directory and want to upload pot files from the web interface.
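To see in advance which files would collide if a directory tree were flattened into one directory, you can list the file names that occur more than once. A sketch only; "duplicate_names" is a hypothetical helper name and the default pattern *.pot is an assumption about what the templates are called.

```shell
# duplicate_names: list file names that occur in more than one directory
# below DIR. Any name printed would be overwritten if all the files were
# put into a single directory.
# Usage: duplicate_names [DIR] [PATTERN]
# DIR defaults to the current directory, PATTERN to '*.pot'.
duplicate_names() {
    find "${1:-.}" -type f -name "${2:-*.pot}" -print |
        sed 's|.*/||' | sort | uniq -d
}
```

An empty result means every file name in the tree is unique.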

 

the Web Interface

This deserves its own section since navigating the project is so confusing it amounts to a gotcha.

You would think that there would be one page for project administration - but there are several and 2 completely different ways of getting there:

1. Admin>Projects
   This is where you:
   - add projects
   - delete projects
2. Admin>Projects>Project Code
   This is where you:
   - set languages for the project
   - update the project from templates (pot files)
3. My Account>Project (Administrate)
   This is where you:
   - add users to the project
   - remove users from the project
   - set user permissions for the project

There are 2 ways to get to a project for normal use:

1. All Projects>Project
   This gets you to the project languages page
2. All Languages>Language>project
   This gets you to the project files page

You can also get to the above pages from:

   My Account>Project
   My Account>language

We are all used to having cheat sheets for unix admin - but this is the first time one is required to navigate a website

 

Translation

 

Definitions

TM file

translation memory file, a po file

 

The translation process

Phase 1

A new file for translation

Phase 2

the source file is modified and the translation needs updating

PHASE 1

  $ rc2po -P --charset=utf8 -i en.1.rc -o en.rc.pot
  $ pot2po -i en.rc.pot -o ja.1.rc.po
  $ poEdit ja.1.rc.po
  $ po2rc -t en.1.rc -i ja.1.rc.po -o ja.1.rc

PHASE 2

  $ rc2po -P --charset=utf8 -i en.2.rc -o en.rc.pot
  $ pot2po -i en.rc.pot -t ja.1.rc.po -o ja.2.rc.po

Why use pot files? What is their purpose? Why not go directly to po files, e.g.

  $ rc2po -P --charset=utf8 -i en.1.rc -o ja.1.rc.po

Because you can only do this the first time. After that it is

  $ pot2po -i en.rc.pot -t ja.1.rc.po -o ja.2.rc.po
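The difference between the two phases is only whether an earlier translation is passed to pot2po with -t. A small wrapper can make that choice; this is a sketch, and "update_po" is a hypothetical helper name. It uses only the pot2po options shown above.

```shell
# update_po: create or refresh a po file from a pot template.
# Usage: update_po POT NEW_PO [OLD_PO]
# If OLD_PO (the previous translation) is given and exists, its
# translations are carried over with -t (phase 2). Otherwise a fresh po
# file is initialised from the template (phase 1).
update_po() {
    if [ -n "${3:-}" ] && [ -f "$3" ]; then
        pot2po -i "$1" -t "$3" -o "$2"
    else
        pot2po -i "$1" -o "$2"
    fi
}
```

For example, "update_po en.rc.pot ja.2.rc.po ja.1.rc.po" corresponds to the phase 2 command.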

 

Translating a website

you can convert a bunch of html files to pot files using:

  # html2po -P site pot

where

- site is the directory containing the html files

- pot is the destination directory - likely .../Pootle/po/project-code

This will create a copy of the directory structure of the source tree. However do NOT use:

admin > projects > project code > update from templates

to create the po files. This will put all po files in the base directory sequentially overwriting files with the same name.

Once the files have been translated you can get them back to html using

  # po2html -t site xh site-xh

where

- site is the directory of the original files which provides the template for conversion

- xh is where the pot (now po) files were put

- site-xh is where you want the translated files to go

The admin problem of course is to manage the conversion of pot files to po files, not overwriting old po files and managing translation memory in the process. This is key to understanding Pootle - the back office function is essential to the translation project and it is not web based.

HTML pages change a lot and, unlike programs, don't come out in nice regular versions - so using versioning doesn't seem to make sense. We simply need to generate a pot file as and when and get this into Pootle. However we do need to have versions for the translator - otherwise they will not be aware that the page has changed and will wonder what happened to their work. After some time the old po files will need deleting.

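The plan for handling updates is therefore:

- po2tmx creates a translation memory, kondosyokai.tmx, from index.po

- html2po -P gives us our pot files, replicating the source tree

- pot2po --tm=kondosyokai.tmx creates the po files, using the translation memory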
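Put together, the update could be scripted roughly as follows. This is a sketch under assumptions: the directory arguments are placeholders, the -l option (target language) of po2tmx is quoted from memory, and passing whole directories to pot2po should be checked against "pot2po --help". The names "build_tm" and "refresh_po" are hypothetical.

```shell
# Translation memory file, named as in the notes above.
TMX=kondosyokai.tmx

# build_tm: create the TMX translation memory from a translated po file.
# The -l ja option (target language) is an assumption; see po2tmx --help.
build_tm() {
    po2tmx -l ja -i "$1" -o "$TMX"
}

# refresh_po: regenerate the pot files from the site, then update the po
# files, carrying over old translations (-t) and using the translation
# memory (--tm).
# Usage: refresh_po SITE_DIR POT_DIR OLD_PO_DIR NEW_PO_DIR
refresh_po() {
    html2po -P "$1" "$2" &&
    pot2po --tm="$TMX" -t "$3" "$2" "$4"
}
```

Writing the new po files to a separate directory keeps the old ones intact until the translator has checked the result.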

 

Pootle Help

Pootle help in the installation

Check the webpages of all the installed packages

Translate Toolkit and Pootle wiki - use the search

For the Pootle and Translate Toolkit tools:

- use the --help option

- use the --manpage option e.g.

  $ pot2po --manpage | /usr/bin/nroff -Tutf8 -mandoc | less

Man pages for relevant unix tools:

- gettext

  # apropos gettext

 

Translate Toolkit Tools

From the README file:

      Converters
oo2po       - convert between OpenOffice.org GSI files and PO
oo2xliff    - convert between OpenOffice.org GSI files and XLIFF
moz2po      - convert from a Mozilla XPI file and PO.  Including unpacking
              and building a translated XPI.
csv2po      - convert PO format to CSV for editing in a spreadsheet program
php2po      - PHP localisable string arrays converter.
ts2po       - convert Qt Linguist (.ts) files to PO
txt2po      - convert simple text files to PO
html2po     - convert HTML to PO (beta)
xliff2po    - XLIFF (XML Localisation Interchange File Format) converter
prop2po     - convert Java .properties files to PO
po2wordfast - Wordfast Translation Memory converter
po2tmx      - TMX (Translation Memory Exchange) converter
pot2po      - PO file initialiser
csv2tbx     - Create TBX (TermBase eXchange) files from Comma Separated Value (CSV) files
      Tools Quality Assurance
pofilter      - run any of the 40+ checks on your PO files
pomerge       - merge corrected translations from pofilter back into your existing
                PO files.
poconflicts   - identify conflicting use of terms
porestructure - restructures po files according to poconflict directives
pogrep        - find words in PO files
      Tools Other
pocompile - create a Gettext MO files from PO or XLIFF files
pocount   - count translatable file formats (PO, XLIFF)
podebug   - Create comment in your PO files' msgstr which can then be used to quickly
            track down mistranslations as the comments appear in the application.
posegment - Break a PO or XLIFF files into sentence segments, useful
            for creating a segmented translation memory.
poswap    - uses a translation of another language that you would rather use
            than English as source language
   


 
