L10n:Web frameworks
So, you're using a new web framework or programming language and want to make it localizable...
Really?
First up, stop. No. Please use Playdoh for the frontend portions of your code; it is already wired up to localize the HTML and JavaScript sent to the browser. Playdoh and the libraries it ships with are all coded against the quirky and nuanced way our l10n community prefers things to be done.
Our l10n community is a large body of awesome, passionate volunteers who are asked to turn around lots of work in a short amount of time. If your i18n/l10n framework or the specific strings in your app aren't set up to their liking, they will reject your work and you'll have to do it over, maybe from scratch. (Been there, done that; now I get an l10n webdev involved as early as possible.)
What goes into a framework
Okay, if for whatever reason you really need to do L10n the Mozilla way without Playdoh, this page dissects Playdoh to capture the scope of implementing i18n/l10n on a web framework.
Translate Toolkit
The Translate Toolkit is a Python package that assists in localization of software.
It provides many command-line tools, such as:
- po2html
- php2po
- moz2po
- po2moz
In PHP applications, we use the plain gettext command-line tools. Python's syntax and library support for gettext make this impossible, so the Translate Toolkit provides a way to do gettext-like workflows on code.
Babel
Babel is a collection of tools for internationalizing Python applications.
Primarily, Babel provides l10n support to Jinja. Jinja is the foundation of the template library Jingo.
Tower
Tower builds on Babel to handle Mozilla-specific needs:
- Pulls strings from a variety of sources: Python, JavaScript, and .lhtml files.
- Collapses whitespace in all strings to prevent unwieldy msgids.
- Supports Gettext context (msgctxt) in all gettext and ngettext calls.
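The whitespace-collapsing idea can be sketched in a few lines of Python. This is a hypothetical helper written for illustration, not Tower's actual code; it assumes that "collapsing" means replacing every run of whitespace with a single space and trimming both ends.

```python
import re

# Hypothetical helper, not Tower's implementation: any run of spaces,
# tabs, or newlines becomes a single space.
_WHITESPACE = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Normalize a (possibly multi-line) template string into a compact msgid."""
    return _WHITESPACE.sub(" ", text).strip()
```

For example, a template block such as "\n  Welcome to\n      our   site!\n" becomes the msgid "Welcome to our site!". Whatever normalization you choose has to be applied both at extraction time and at lookup time; otherwise the runtime key won't match the msgid in the catalog.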
FunFactory
FunFactory (a webdev library) provides a LocaleURLMiddleware, which does locale detection and HTTP redirects. Again, this captures the Mozilla-specific way to do these things and accounts for user-agent bugs, etc.
It's configured to use locale codes in the same case as is preferred by our L10n community.
It also provides template helper functions that format a URL with the locale code at the beginning, which is our standard. In addition, it updates Django's URL reversing, so that reversing a view's short name gives you the locale-aware URL.
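The URL handling described above boils down to splitting a locale prefix off the request path and adding one back when building links. Below is a minimal Python sketch assuming a hypothetical SUPPORTED_LOCALES set; it is not FunFactory's implementation. Matching is case-insensitive, but the canonical casing from the supported list is returned, so a middleware could redirect /FR/about/ to /fr/about/.

```python
from typing import Optional, Tuple

# Hypothetical list; a real project reads this from its settings.
SUPPORTED_LOCALES = {"en-US", "fr", "de", "es-ES", "pt-BR"}
# Lowercased code -> canonical casing preferred by the l10n community.
_BY_LOWER = {code.lower(): code for code in SUPPORTED_LOCALES}


def split_locale(path: str) -> Tuple[Optional[str], str]:
    """Split '/fr/about/' into ('fr', '/about/').

    Returns (None, path) when the first path segment is not a supported locale.
    """
    parts = path.lstrip("/").split("/", 1)
    canonical = _BY_LOWER.get(parts[0].lower())
    if canonical is None:
        return None, path
    rest = "/" + (parts[1] if len(parts) > 1 else "")
    return canonical, rest


def locale_url(path: str, locale: str) -> str:
    """Prefix a locale-free path with a locale code: ('/about/', 'fr') -> '/fr/about/'."""
    return "/" + locale + "/" + path.lstrip("/")
```

For instance, split_locale("/en-us/about/") returns ("en-US", "/about/"), and locale_url("/about/", "fr") returns "/fr/about/". A request for a path with no locale prefix would be redirected to locale_url(path, detected_locale).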
Django/Jinja/Jingo
The Django web framework provides many i18n features. We use Jinja2 and Jingo in the Playdoh framework.
Django/Jinja provide ugettext and ugettext_lazy. These are typically aliased to _ and _lazy and used in Python code (models, controllers, and business logic).
There are several template tags: _, ngettext, and trans. These have been tweaked (the f and fe filters) based on feedback from the security team, to make their security reviews easier.
The trans tag can be used in templates to localize large chunks of text.
Lifecycle
Okay, that is a lot of different libraries and frameworks. When are they used? What is L10n?
Build Time
- Extract strings from sources (Server side programming language, templates, client side JavaScript).
- Merge changes into existing POT/PO files.
It's possible that the standard gettext tools will be sufficient and you won't have much work here, but that was not the case for our real-world Python projects.
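For the server-side Python portion of the extraction step, Python's own ast module is enough to find literal arguments to gettext-style calls. This is a minimal sketch assuming a hypothetical KEYWORDS set; it only sees plain function-name calls (not attribute calls like obj.gettext(...)), and templates and client-side JavaScript need their own parsers, which is where tools like Babel and Tower come in.

```python
import ast
from typing import List, Tuple

# Hypothetical keyword set; real extractors make this configurable.
KEYWORDS = {"_", "gettext", "ngettext", "_lazy"}


def extract_strings(source: str) -> List[Tuple[int, str, Tuple[str, ...]]]:
    """Return (line number, function name, string-literal args) for each keyword call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in KEYWORDS
        ):
            # Only literal strings can be extracted statically; variables such
            # as the plural count are skipped.
            literals = tuple(
                arg.value
                for arg in node.args
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
            )
            if literals:
                found.append((node.lineno, node.func.id, literals))
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(found)
```

For the two-line source `title = _("Hello")` / `msg = ngettext("%d file", "%d files", n)`, extract_strings returns [(1, "_", ("Hello",)), (2, "ngettext", ("%d file", "%d files"))].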
Runtime
On startup, you'll want a list of supported locales. On startup or on each request, you'll want to bind the 'messages' text domain for the current locale, so that you can use the .mo files under locale/${locale}/messages.mo.
When servicing a web request, you'll take the locale either from the URL or from the HTTP headers (Accept-Language), and look up strings using gettext functions. Your programming language's gettext support is the place to start.
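In Python, the standard library's gettext module covers this step. A minimal sketch, assuming a hypothetical locale/ directory and the 'messages' domain: note that Python's gettext looks for compiled catalogs at locale/<lang>/LC_MESSAGES/messages.mo and uses underscore-style names such as en_US, so URL-style codes are converted first.

```python
import gettext
from typing import Dict, Iterable

LOCALE_DIR = "locale"  # assumed project layout
DOMAIN = "messages"
_UNTRANSLATED = gettext.NullTranslations()


def to_gettext_name(locale: str) -> str:
    """Map a URL-style code such as 'en-US' to the gettext-style 'en_US'."""
    return locale.replace("-", "_")


def load_catalogs(locales: Iterable[str]) -> Dict[str, gettext.NullTranslations]:
    """At startup: load one catalog per supported locale.

    With fallback=True, a missing .mo file yields NullTranslations, which
    returns msgids unchanged instead of raising.
    """
    return {
        locale: gettext.translation(
            DOMAIN,
            localedir=LOCALE_DIR,
            languages=[to_gettext_name(locale)],
            fallback=True,
        )
        for locale in locales
    }


def translator_for(
    catalogs: Dict[str, gettext.NullTranslations], locale: str
) -> gettext.NullTranslations:
    """Per request: pick the catalog for the negotiated locale, or an untranslated fallback."""
    return catalogs.get(locale, _UNTRANSLATED)
```

With no .mo files present, every locale falls back to NullTranslations, so translator_for(load_catalogs(["fr"]), "fr").ngettext("%d file", "%d files", 3) % 3 returns "3 files".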
A locale in the URL should trump a locale in the HTTP headers, so that a French speaker can view a Spanish localization of a page.
Special care will be needed in how you integrate gettext into your web framework's templating engine. A naive implementation will force the security team to manually review every localized string as a possible XSS vector. Work with them to find a syntax that makes it easy to distinguish static strings from strings interpolated with user-supplied data.
Lazy strings are needed when you execute code without a request in context (for example, at module import time). You'll want to defer the string lookup until you have a locale from the request.
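The precedence rule can be expressed as a small negotiation function. This Python sketch is an illustration under stated assumptions, not FunFactory's algorithm: it parses q-values from Accept-Language, falls back from a regional tag such as fr-CA to its base language, and uses a hypothetical default of en-US.

```python
from typing import Iterable, List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    weighted = []
    for index, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _sep, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Negate q so sorting puts the best match first; index keeps header order on ties.
            weighted.append((-q, index, tag))
    return [tag for _neg_q, _index, tag in sorted(weighted)]


def pick_locale(
    url_locale: Optional[str],
    accept_language: str,
    supported: Iterable[str],
    default: str = "en-US",  # hypothetical default locale
) -> str:
    """A supported locale in the URL wins; otherwise use the best Accept-Language match."""
    by_lower = {code.lower(): code for code in supported}
    if url_locale and url_locale.lower() in by_lower:
        return by_lower[url_locale.lower()]
    for tag in parse_accept_language(accept_language):
        lowered = tag.lower()
        if lowered in by_lower:
            return by_lower[lowered]
        # Assumed fallback: regional tag 'fr-ca' -> base language 'fr'.
        base = lowered.split("-")[0]
        if base in by_lower:
            return by_lower[base]
    return default
```

With supported locales {"en-US", "fr", "es-ES"}, pick_locale("es-ES", "fr-FR,fr;q=0.9", ...) returns "es-ES" because the URL wins; with no URL locale, the same header yields "fr".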
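One way to make static and interpolated text easy to tell apart is to treat the translated string as the only trusted markup and escape every value substituted into it, which is the idea behind a filter like fe. The following Python sketch uses a hypothetical format_escaped helper with str.format-style placeholders; it is not jingo's code.

```python
from html import escape


def format_escaped(template: str, *args: object, **kwargs: object) -> str:
    """Fill a translated template, HTML-escaping every interpolated value.

    In this sketch the template (from the reviewed catalog) is trusted as-is;
    only the substituted values are treated as untrusted user data.
    """
    safe_args = [escape(str(value)) for value in args]
    safe_kwargs = {key: escape(str(value)) for key, value in kwargs.items()}
    return template.format(*safe_args, **safe_kwargs)
```

For example, format_escaped("Hello, {name}!", name="<b>Bob</b>") returns "Hello, &lt;b&gt;Bob&lt;/b&gt;!". Reviewers then only have to audit the call sites that use a non-escaping variant.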
Advanced Gettext
These frameworks also support advanced usage, such as Gettext context (msgctxt) in all gettext and ngettext calls.
Supporting these advanced features is like boiling a frog.
On day one, you can probably get by with _ and the gettext command-line tools; in a couple of hours you can have a working prototype. On day two, you'll need ngettext because of plurals. If your UI grows large, you may eventually need message contexts. If your programming language doesn't support them properly, the gettext command-line tools will stop being sufficient, so you'll need to write a parser for your programming language, your templates, etc., as well as an emitter for POT files.
It's hard to notice the water gradually getting hotter and hotter, until you're being boiled alive.
Okay, that is a bit dramatic, but you've been warned. i18n/l10n will eat up many days of your time, and other people's, troubleshooting and debugging subtle breakage.
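At the end of that road sits a POT emitter. A minimal Python sketch, assuming a hypothetical Message record, shows the parts that plain _ usage never exercises: msgctxt, msgid_plural, indexed msgstr slots, escaping, and de-duplication of identical messages. It omits the header entry (charset, Plural-Forms) that a real POT file carries.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical record for one extracted string."""
    msgid: str
    msgid_plural: Optional[str] = None
    msgctxt: Optional[str] = None


def po_quote(text: str) -> str:
    """Quote a string for a PO/POT file, escaping backslashes, quotes, newlines, tabs."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def emit_pot(messages: Iterable[Message]) -> str:
    """Render messages as POT entries with empty translations."""
    entries = []
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    for m in dict.fromkeys(messages):
        lines = []
        if m.msgctxt is not None:
            lines.append("msgctxt " + po_quote(m.msgctxt))
        lines.append("msgid " + po_quote(m.msgid))
        if m.msgid_plural is not None:
            lines.append("msgid_plural " + po_quote(m.msgid_plural))
            lines.append('msgstr[0] ""')
            lines.append('msgstr[1] ""')
        else:
            lines.append('msgstr ""')
        entries.append("\n".join(lines))
    return "\n\n".join(entries) + "\n"
```

Message contexts let the same English word, such as "Open" as a verb and as an adjective, become two separate entries that localizers can translate differently.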