What we want:
- A web page that detects the language from the browser, and, if a translation exists, displays that translation. If not, it falls back to the English version.
- A menu somewhere, that lets users pick from a list of supported languages, independently of the one set by their browser.
- An easy-to-use process for translators, that relies on the well-known tools of the trade (i.e. gettext and Poedit).
- All of the above in a single web page, so that we can keep all the common parts together, and don't have to duplicate changes.
Where we start:
- A web server that we control fully, and that natively supports UTF-8. I'll only say this once: In 2014, if you still don't use UTF-8 everywhere you can, then you don't deserve to host a web page, let alone administer a web server.
- A single index.html page, in English/UTF-8, that contains pure HTML (possibly with a little sprinkling of JavaScript, but not much else).
Prerequisites:
Because we have complete control of the server, we're going to use PHP Gettext.
Why? Because it relies on gettext, which is a mature translation framework with solid support (including a nice GUI translation application for Windows & Mac called Poedit), and also because the performance hit of using PHP Gettext seems to be minimal compared to the alternatives. Finally, using PHP gives us the ability to simply edit our existing HTML and insert PHP code wherever we need a translation, which should make the whole process a breeze.
Thus, the first two items you need to install on your server, if you don't have them already, are PHP (preferably v5 or later) and php-gettext, plus all the dependencies those two packages may have.
Then, you will need to install php5-intl, so that we can use the locale_accept_from_http() function call to detect the browser locale of our visitors.
Finally, you must ensure that your server serves ALL the locales you are planning to support, in UTF-8. In particular, issuing locale -a | grep utf8 on your server must return AN AWFUL LOT of entries (on mine, I get more than 150 of them, and that is the way it should be).
If issuing locale -a | grep utf8 | wc -l returns fewer than 100 entries, then, unless you are planning to restrict your site to only a small part of the world, you will need to first sort that out, for instance by installing the locales-all package. This is because gettext will not support a locale that is unknown to the system. For instance, if you don't see fr_CA.utf8 listed in your locale -a output, then no matter what you do, even if you have other French locales listed, gettext will not know how to handle browsers that are set to Canadian French. You have been warned!
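As a minimal sketch of that check, the helper below reports which of the locales you care about are missing from the system. The function name missing_locales is my own invention for illustration, and it assumes the glibc-style names (such as fr_CA.utf8) that locale -a prints on a typical Linux server.

```shell
#!/bin/sh
# Hypothetical helper (not part of gettext): print every wanted locale
# that does not appear in the list of installed locales read from stdin
# (one locale per line, in the format printed by `locale -a`).
missing_locales() {
    installed=$(cat)
    for wanted in "$@"; do
        # -q: quiet, -x: match the whole line, -F: literal string, not a regex
        if ! printf '%s\n' "$installed" | grep -qxF "$wanted"; then
            printf '%s\n' "$wanted"
        fi
    done
}
```

On the server, you would feed it the real list, for instance with: locale -a | missing_locales en_US.utf8 fr_FR.utf8 fr_CA.utf8. Any name it prints is a locale that gettext will not be able to use until you install it.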
Testing PHP gettext support:
At this stage, I will assume that you have php5, php5-intl, php-gettext and possibly other dependencies such as libapache2-mod-php5, gettext and co. installed. If you are using Apache2, you may also have to enable the PHP5 module, by symlinking php5.conf and php5.load in your /etc/apache2/mods-enabled/, and possibly edit php5.conf to allow running PHP scripts in user directories (which is disabled by default).
The first thing we'll do, to check that everything is in order before starting with localization, is simply create an info.php, at the same location where you have your index.html, that contains the following one-liner:
    <? phpinfo(); ?>
Now, you should navigate to <your_website>/info.php and confirm that:
- You get a whole bunch of PHP information from your server
- In this whole set of data, you see a line stating "GetText Support: enabled"
This test lets us confirm that PHP short tags (<? rather than <?php), which is what we'll use in the code below, are working, and also gives us some assurance that gettext is enabled. So make sure to edit your php.ini or conf settings, if you need to sort things out.
Once you have the above simple test going, you should delete that info.php file, as you don't want attackers to know too much about the PHP and server settings you're running under.
Let's get crackin'
With PHP now confirmed working, let's set our translation rolling with PHP-Gettext. For that I'm going to loosely follow this guide. I say loosely, because I found that it was woefully incomplete and left out the most crucial parts.
- Start by duplicating your existing index.html as index2.php. This will enable us to work on adding translations to index2.php without interfering with the existing site, until we're happy enough that we can replace index.html altogether. Of course we picked index2.php rather than index.php, to make sure our server doesn't try to serve the file we're testing over the live index.html that's assumed to already exist in that directory.
- In index2.php, and provided you want to test a French translation (you don't really have to speak French if you just want to test that things work), somewhere after the initial <html> tag, add the following PHP header:
    <?
    // Supported languages: full locale => array(matching prefix, menu label)
    $langs = array(
        'en_US' => array('en', 'English (International)'),
        'fr_FR' => array('fr', 'French (Français)'),
    );
    $locale = "en_US";
    if (isset($_SERVER["HTTP_ACCEPT_LANGUAGE"]))
        $locale = locale_accept_from_http($_SERVER["HTTP_ACCEPT_LANGUAGE"]);
    if (isset($_GET["locale"])) {
        $locale = $_GET["locale"];
    }
    $locale = preg_replace("/[^a-zA-Z_]/", "", substr($locale,0,5));
    foreach($langs as $code => $lang) {
        if(substr($locale,0,strlen($lang[0])) == $lang[0]) {
            $locale = $code;
            break;
        }
    }
    // Must append ".utf8" suffix here, else languages such as Azerbaijani won't work
    setlocale(LC_MESSAGES, $locale . ".utf8");
    // Also set the LANGUAGE variable, which may be needed on some systems
    putenv("LANGUAGE=" . $locale);
    bindtextdomain("index", "./locale");
    bind_textdomain_codeset("index", "UTF-8");
    textdomain("index");
    ?>
  What this code does is:
  - Create an array of languages that we will support from the language selection menu (here English and French). You'll notice that this is actually an array of arrays, but more about this later.
  - After setting the default to English, read the preferred locale from the browser, if HTTP_ACCEPT_LANGUAGE is defined (isset(...)), using locale_accept_from_http(). If that locale is not overridden with a ?locale= parameter passed on the URL, it's the one that will be used throughout the rest of the file.
  - Find if a locale parameter was passed on the URL and set the $locale variable to it if that's the case.
  - Sanitize the locale parameter to ensure that it contains only alphabetical characters or underscores, and is no more than 5 characters long (anything that can be entered by users must be considered potentially harmful and SHOULD BE SANITIZED!).
  - Ensure that if we get a short locale (eg. fr rather than fr_FR), or if we get a locale for a language we support, but for a region that we don't (eg. fr_CA), we convert it to the closest locale_REGION form we support. This is very important, as the browser may only provide us with fr or fr_CA when invoking locale_accept_from_http, and we want to have these locales mapped to fr_FR for subsequent processing.
  - Tell gettext that it should use UTF-8 and look for index.mo in a ./locale/<LOCALE>/LC_MESSAGES/ directory for translations (eg. ./locale/fr/LC_MESSAGES/index.mo).
- Somewhere in a div (eg. the one for a right sidebar) add the following code for the language selection menu:
    <select onchange="self.location='?locale='+this.options[this.selectedIndex].value">
    <? foreach($langs as $code => $lang): ?>
      <option <? if(substr($locale,0,strlen($lang[0])) == $lang[0]) echo "selected=\"selected\"";?> value="<?= $code;?>">
        <?= $lang[1]; ?>
      </option>
    <? endforeach; ?>
    </select>
  What this code does is:
  - Create a dropdown with all the languages from our $langs array.
  - Check whether the first characters of our $locale match the short language code from our array, and set the dropdown entry as the selected one if that is the case. This ensures that "French" will be selected in our dropdown, regardless of whether the locale is fr_CA, fr_FR or any of the other fr_XX locales.
  - When a user selects an entry from the dropdown, add a ?locale=en_US or ?locale=fr_FR to the URL, to force the page to be refreshed using that language.
- For every place where you want to translate a string, use something like <?= _("Hello, world");?>, where <?= is the short version of <?php echo and _( is the actual call to gettext. What gettext then does is find out if a translation exists for the string being passed as parameter, and either use that if it exists, or the original untranslated string otherwise.
- Of course, you can use the whole gamut of PHP function calls. Say, if you want to insert a variable in your translated string, such as a date, do something like: <? printf(_("Last updated %s:"), $last_date);?>.
  Also, if needed, and this is something that is very useful to know, you can insert translator notes using comments (/* ... */) within your PHP, before the _(...) calls. These comments will then be displayed for all translators to see in Poedit (as long as you used the -c option when creating your PO catalog with xgettext).
- Save your index2.php and confirm that you get to see the English strings, the dropdown with 2 entries, as well as ?locale=fr_FR or ?locale=en_US appended to the URL when you select an entry from the dropdown. Of course, since we haven't created any translation for French, the English text still displays when French is selected, as the default of gettext is to use the original if a translation is missing, but we will address that shortly.
- Create a ./locale/fr/LC_MESSAGES/ set of subdirectories, at the location where you have your index2.php page.
- Now we need to generate the gettext catalog, or POT, which is the file you will have to provide translators with, in order for them to start creating a translation. Now, while Poedit is supposed to be able to process a PHP file to generate a .pot, I couldn't for the life of me figure out how to do just that with the Windows version. Moreover, the .pot creation is really something you want to do on the server anyway, so, to cut a long story short, we're just going to call xgettext, using a script, to produce our .pot on the server. Here is the content of that script:
    #!/bin/sh
    # Extract the translatable strings (and translator comments, -c) into the POT
    xgettext --package-version=1.0 --from-code=UTF-8 \
      --copyright-holder="Pete Batard" --package-name="Rufus Homepage" \
      --msgid-bugs-address=pete@akeo.ie -L PHP -c -d index \
      -o ./locale/index.pot index2.php
    # Fill in the header placeholders that xgettext leaves behind
    sed --in-place ./locale/index.pot --expression='s/SOME DESCRIPTIVE TITLE/Rufus Homepage/'
    sed --in-place ./locale/index.pot --expression='1,6s/YEAR/2014/'
    sed --in-place ./locale/index.pot --expression='1,6s/PACKAGE/Rufus/'
    sed --in-place ./locale/index.pot --expression='1,6s/FIRST AUTHOR/Pete Batard/'
    sed --in-place ./locale/index.pot --expression='1,6s/EMAIL@ADDRESS/pete@akeo.ie/'
  Running the above, in the directory where we have our PHP, creates our index.pot under the ./locale/ subdirectory, and fills in some important variables that xgettext mysteriously doesn't seem to provide any means to set. As you can see, we used the -c option so that any notes to translators that we added using PHP comments are carried over.
- Now, we're getting into the part that is generally meant to be done by a translator: download the index.pot, and open it in Poedit. From there, set your target language (here fr_FR) and translate the various strings (eg. "Hello, world" → "Bonjour, monde"). Save your translation as index.po/index.mo (Poedit will create both files) and upload index.mo in ./locale/fr/LC_MESSAGES/.
- Voilà! If you did all of the above properly and select French in the dropdown or use a browser that has French as its preferred language, then you should now see the relevant sections translated. "C'est magique, non?"
- From there, you will of course need to add PHP for all of the page content that you want to see translated, by enclosing the English text in <?= _(...);?> sections (don't worry about the constant switching between HTML and PHP mode - PHP is designed to be very efficient at doing just that!). Once you're happy, just rename your index2.php to index.php (but make sure to remove your index.html first, or you may run into weird issues), and you are fully ready to get your content localized. To do that, just run the POT creation script again (make sure you edit the script if needed, so that it applies to index.php now), and provide index.pot to your translators. Then wait for them to send you their .mo files, edit the code above to add a new array line for each extra language, and watch in awe as visitors experience your site in that new language. Now, it wasn't that hard after all, was it?
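When a translator sends back a compiled catalog, dropping it in the right place is the only server-side step left. Here is a rough sketch of that step. The function name install_catalog is hypothetical, and it assumes the same index domain and ./locale base directory as the PHP header above.

```shell
#!/bin/sh
# Hypothetical helper: create the directory layout gettext expects
# (<base>/<lang>/LC_MESSAGES/) and copy a compiled catalog into it
# under the name of our text domain ("index", as in the PHP header).
# usage: install_catalog <lang> <path/to/file.mo> [base_dir]
install_catalog() {
    lang=$1
    mo_file=$2
    base=${3:-./locale}
    target="$base/$lang/LC_MESSAGES"
    # -p creates all the intermediate directories and is harmless if they exist
    mkdir -p "$target" || return 1
    cp "$mo_file" "$target/index.mo"
}
```

For instance, install_catalog fr index.mo places the French catalog at ./locale/fr/LC_MESSAGES/index.mo. If a translator only sends the .po file, the msgfmt tool from the gettext package can compile it first, with msgfmt -o index.mo index.po.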
Additional remarks:
Can't we just do away with the double fr_FR and fr in our array?
Unfortunately, no. The short explanation is that, even after you place your translation under a /fr/ subdirectory, so that it is used by default when your locale is fr_FR, fr_CA, fr_BE, fr_CH and so on, gettext still can't work with a locale that is just set to fr. This is because, as explained in the Prerequisites, if your system doesn't have an fr or fr.utf8 listed with locale -a, gettext just doesn't know how to handle that language.
Now, the long explanation as to why we couldn't just use a single fr_FR in our $langs array is: we want to smartly set our dropdown to French, even when fr_CA is provided, and we can't do something as simple as just picking the first two characters of the array locale, due to the fact that we will also want to support both pt_PT and pt_BR as well as zh_CN and zh_TW, as separate languages (because that's pretty much what they are). So, if we were to just try to isolate the substring up to the underscore, then if we had zh_CN defined before zh_TW in our array, Traditional Chinese speakers would see the dropdown set to Simplified Chinese and that's not what we want.
Thus, for our dropdown selection comparison, we must provide a value that is the lowest common denominator we want the language to apply to, which can be either a simple fr or es, or a longer pt_BR or zh_CN. But as we explained previously, we can't use that lowest common denominator for locale selection, as gettext might not know how to handle it. And that is why we need to duplicate part of the locale in two places in our array.
<rant>Of course, it would be oh so much simpler if OSes agreed that short locales without a region are perfectly valid entities by default, especially as gettext doesn't seem to have any issue accepting them when looking for .mo files, but hey, that's localization for you: no-one EVER manages to get it right...</rant>
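If you want to experiment with this matching rule at a shell prompt, without touching the page, the sketch below mimics the foreach loop of the PHP header. It is only an illustration, under a few assumptions: the LANGS table and the resolve_locale name are made up for the example, the pt and zh entries are the hypothetical future languages mentioned above, and falling back to en_US when nothing matches is a simplification of what the PHP does.

```shell
#!/bin/sh
# Hypothetical table mirroring $langs, one "<full locale>:<match prefix>"
# entry per line. The prefix is the lowest common denominator we compare
# against; the full locale is what we would hand over to setlocale().
LANGS='en_US:en
fr_FR:fr
pt_PT:pt_PT
pt_BR:pt_BR
zh_CN:zh_CN
zh_TW:zh_TW'

# Print the supported locale whose prefix matches the start of $1.
# Falls back to en_US when nothing matches (an assumption of this sketch).
resolve_locale() {
    requested=$1
    for entry in $LANGS; do
        code=${entry%%:*}      # text before the first colon
        prefix=${entry#*:}     # text after the first colon
        case $requested in
            "$prefix"*) printf '%s\n' "$code"; return 0 ;;
        esac
    done
    printf 'en_US\n'
}
```

With this table, resolve_locale fr_CA and resolve_locale fr both print fr_FR, while resolve_locale zh_TW prints zh_TW rather than zh_CN, which is the behaviour a two-character comparison could not give us.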
How about a real-life example?
Alright... Since I'm all about Open Source, let me show you exactly how I am applying all of the above to the Rufus Homepage. You can click the following to access the current index.php source for the Rufus site, as well as the locale/ subdirectory. There's also this guide, that I provide to any translator who volunteers to create a translation for the homepage. Hopefully, these will help you fill in any blanks, and allow you to provide an awesome multilingual web page!
What about right-to-left languages?
Look at the PHP source and look for the use of the $dir variable.