2014-11-18

Applying a series of Debian patches to an original source

Say you have a nice original source package, as well as bunch of extra Debian patches, which you want to apply to that source (for instance, you may want to compile Debian's grub 2.00-22 using the tarballs you picked here.

However, since Debian uses quilt, or whatever it's called, to automate the application of a series of patches, and you either don't have it on your system or don't want to bother with it (since you're only interested in the patches), you end up wanting to apply all the files from the patches directory of the .debian addon, and there's of course no way you'll want to do that manually.

The solution: Copy the patches/ directory from the Debian addon to the root of your orig source, and run the following shell script.

#!/bin/bash
while read p; do
  patch -p1 < ./patches/$p
done < ./patches/series

By Grabthar's hammer, what a timesaver!

2014-11-16

Compiling Grub4DOS on MinGW

Since chenall committed a handful of patches that I submitted to make the compilation of Grub4DOS on MinGW easier, I'm just going to jolt down some quick notes on how you can produce a working Grub4DOS on Windows.
Note that part of this guide is shamelessly copied from the RMPrepUSB Grub4DOS compilation guide.
  • If you don't already have a git client, download and install msys-git (a.k.a. "Git for Windows") from here.
  • Download the latest MinGW32 installer (mingw-get-setup.exe) by clicking the "Download Installer" button on the top right corner of the main MinGW site.
  • Keep the default options on the first screen (but you can change the destination directory if you want)
  • On the package selection screen, select
    • mingw-developer-toolkit
    • mingw-base
    • msys-base
  • Select menu InstallationApply Changes and click Apply
  • Now navigate to your msys directory, e.g.. C:\MinGW\msys\1.0\, and open the file etc\profile in a text editor.
  • Assuming that you installed msys-git in C:\Program Files (x86)\Git, change the following:
    if [ $MSYSTEM == MINGW32 ]; then
      export PATH=".:/usr/local/bin:/mingw/bin:/bin:$PATH"
    else
      export PATH=".:/usr/local/bin:/bin:/mingw/bin:$PATH"
    fi
    to
    if [ $MSYSTEM == MINGW32 ]; then
      export PATH=".:/usr/local/bin:/mingw/bin:/bin:/c/Program Files (x86)/Git/bin:$PATH"
    else
      export PATH=".:/usr/local/bin:/bin:/mingw/bin:/c/Program Files (x86)/Git/bin:$PATH"
    fi
    This is to ensure that your system will be able to invoke git. Of course, if you use a different git client, you can ignore this step.
  • Download nasm (current build is: http://www.nasm.us/pub/nasm/releasebuilds/2.11.06/win32/nasm-2.11.06-win32.zip) extract and copy nasm.exe to C:\MinGW\msys\1.0\bin (the other files in the zip archive can be discarded).
  • Download upx (current build is: ftp://ftp.heanet.ie/mirrors/sourceforge/u/up/upx/upx/3.91/upx391w.zip) extract and copy upx.exe to C:\MinGW\msys\1.0\bin (the other files in the zip archive can be discarded).
  • In C:\MinGW\msys\1.0\ launch msys.bat
  • In the shell that appears, issue the following command (this may be necessary to locate mingw-get):
    /postintall/pi.sh
    You should accept all the default options.
  • Now issue the following commands:
    mingw-get upgrade gcc=4.6.2-1
    mingw-get install mpc=0.8.1-1
    This will effectively downgrade your compiler to gcc 4.6.2, which is necessary as gcc 4.7 or later doesn't seem to produce a working grldr for the time being.
  • Download the latest Grub4DOS source from github by issuing the following command
    git clone https://github.com/chenall/grub4dos.git
    Note: By default this will download the source into C:\MinGW\msys\1.0\home\<your_user_name>\grub4dos\, but you can of course navigate to a different directory before issuing the git clone command if you want it elsewhere.
  • Run the following commands:
    cd grub4dos
    ./autogen.sh
    make
At the end of all this, you should end up with a grldr and grldr.mbr in the C:\MinGW\msys\1.0\home\<your_user_name>\grub4dos\stage2\ directory, which is what you want

IMPORTANT: Do not try to invoke ./configure directly on MinGW, as compilation will fail. Instead should ensure that you call autotools to re-generate configure and Makefiles that MinGW will be happy with. Note that you can run ./bootstrap.sh instead of ./autogen.sh, if you don't want configure to be invoked with the default options.

What's the deal with gcc 4.7 or later on MinGW?

I haven't really investigated the issue, but the end result is that grldr is 303 KB, vs 307 KB for gcc 4.6.2, and freezes at boot after displaying:
A20 Debug: C806 Done! ...

I'm getting an error about objcopy during the configure test...

That's because you're not listening to what I say and try to compile a version of Grub4DOS that doesn't contain the necessary updates for MinGW. You must use a version of the source that's more recent than 2014.11.14 and right now, that source is only available if you clone from git.

Dude, could you, like, also provide the steps to compile from Linux?

Sigh... Alright, since I'm a nice guy, and it's a lot simpler, I'll give you the steps for a bare Debian 7.7.0 x64 Linux setup:
aptitude install gcc glibc-devel.i686 gcc-multilib make autotools autoconf git nasm upx
git clone https://github.com/chenall/grub4dos.git
cd grub4dos
./autogen.sh
make
Happy now? Note that the Linux compiled version is usually a lot smaller than the MinGW32 compiled one.

2014-11-13

Visual Studio 2013 has now become essentially free...

See http://www.visualstudio.com/products/visual-studio-community-vs.

I'm just going to point out to the first 2 paragraph of the license terms:
1.   INSTALLATION AND USE RIGHTS.
a.   Individual license. If you are an individual working on your own applications to sell or for any other purpose, you may use the software to develop and test those applications.
b.   Organization licenses. If you are an organization, your users may use the software as follows:
  • Any number of your users may use the software to develop and test your applications released under Open Source Institute (OSI)-approved open source software licenses.
  • Any number of your users may use the software to develop and test your applications as part of online or in person classroom training and education, or for performing academic research.
  • If none of the above apply, and you are also not an enterprise (defined below), then up to 5 of your individual users can use the software concurrently to develop and test your applications.
  • If you are an enterprise, your employees and contractors may not use the software to develop or test your applications, except for open source and education purposes as permitted above. An “enterprise” is any organization and its affiliates who collectively have either (a) more than 250 PCs or users or (b) more than one million US dollars (or the equivalent in other currencies) in annual revenues, and “affiliates” means those entities that control (via majority ownership), are controlled by, or are under common control with an organization.
Basically, this means that even if you're a corporate user, you can legally install and use Visual Studio Community Edition, on any computer you want, to compile and/or contribute to Open Source projects, and this regardless of your company's internal policies regarding the installation of Software (otherwise any company could enact an internal policy such as "Microsoft software licenses don't apply here" to be entitled to install as many unlicensed copies of Windows as they like).
So I have to stress this out very vehemently: If a company or IT department tries to take your right to download and install Visual Studio 2013 Community Edition to compile or test Open Source projects, THEY ARE IN BREACH OF THE LAW!
The only case where you are not entitled to use Visual Studio Community Edition is if you're developing a closed source application for a company. But who in their right mind would ever want to do something like that anyway?... ;)

So all of a sudden, you no longer have to jump through hoops if you want to recompile, debug and contribute to rufus, libusb or libwdi/Zadig - simply install Visual Studio 2013, as you are fully entitled to (because all these projects use an OSI approved Open Source license), and get going!

Oh, and for the record, if you want to keep a copy of Visual Studio 2013 Community Edition, for offline installation, you should run the installer as:
vs_community.exe /layout
Note however that this will send you back 8 GB in terms of download size and disk space.

2014-10-08

Free SSL certificate for Open Source projects

Just going to point out that GlobalSign are currently offering a 1 year SSL certificate for Open Source projects for free.

Alas, this is only for a specific domain name, such as app.project.org, rather than for a wildcard domain, such as *.project.org, and at this stage, I'm not entirely sure if the certificate is also renewable for free after one year. But at least, this now allows me to offer access to Rufus from https://rufus.akeo.ie.

Oh, and once your site is set for SSL, you probably want to ensure that it is properly configured by running it through Qualys SSL Labs' excellent SSL analysis tool.

And I'm just going to jolt down that, to get a proper grade with Apache, you may have to edit your /etc/apache2/mods-enabled/ssl.conf and set the following:
SSLProtocol all -SSLv2 -SSLv3

SSLHonorCipherOrder on
SSLCipherSuite EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:EDH+aRSA:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:!RC4

2014-09-29

Getting proper coloured directory listing with Debian and Putty

Since I keep having to do that:

  1. In putty, in the Colours setting tab for your connection, make sure that "Indicate bolded text by changing" is set to "The colour" and not "The font"
  2. In Debian's bashrc, uncomment the line:
    force_color_prompt=yes

2014-06-30

So I built an NTFS EFI driver...

It's Free Software of course, and it only took me about two weeks to do so.

Since I've been doing it in my limited spare time, I might as well brag about it and say that, had I been able to work on this full time (which I sure wouldn't mind), it probably wouldn't have taken more than 7 days... Can't help but wonder how much a proprietary/non-free software development workflow would have had to budget, or outsource, to try to achieve the same thing, within the same amount of time.

At the very least, this demonstrates that, if you start with the the right resource set and, more importantly, if you stop being irrational about how "using the GPLv3 means the death knell of your software revenue stream", a project such as this one can easily and cheaply be completed in a matter of days.


Anyway, the driver itself is read-only (which is all I need for Rufus, as my intent is to use it there) and it could probably use some more polishing/cleanup, but it is stable enough to be used right now.

So, if you are interested in a redistributable and 100% Free Software read-only NTFS EFI driver, you should visit:
https://efi.akeo.ie (the link includes pre-built binaries).

Alternatively, you can also visit the github project page at:
https://github.com/pbatard/efifs

Now, I'd be ungrateful if I didn't mention that the main reason I was able to get something off the ground this quickly is thanks to the awesome developers behind the GRUB 2.0 project, who abstracted their file system framework enough, to make reusing their code in an EFI implementation fairly straightforward.
And I also have to thank the iPXE developers, who did most of the back-breaking work in figuring out a GPL friendly version of an EFI FS driver, that I could build on.
Finally, I was also able to reuse some of the good work from the rEFInd people (the GPLv3 compatible parts), which was big help!

But the lesson is: Don't waste your time with proprietary/non-free software. If you are both interested in being productive and budget-conscious, Free Software is where it's at!

2014-05-31

Restoring EFI NVRAM boot entries with rEFInd and Rufus

So, you reinstalled Windows, and it somehow screwed the nice EFI entry you had that booted your meticulously crafted EFI system partition? You know, the one you use with rEFInd or ELILO or whatever, to multiboot Linux, Windows, etc., and that has other goodies such as the EFI shell...

Well, here's how you can sort yourself out (shamelessly adapted from the always awesome and extremely comprehensive Arch Linux documentation):
  • Download the latest rEFInd CD-R image from here.
  • Extract the ISO and use Rufus to create a bootable USB drive. Make sure that, when you create the USB, you have "GPT partition scheme for UEFI computer" selected, under "Partition scheme and target system type".
  • Boot your computer in UEFI mode, and enter the EFI BIOS to select the USB as your boot device
  • On the rEFInd screen select "Start EFI shell".
  • At the 2.0 Shell > prompt type:
    bcfg boot dump

    This should confirm that some of your old entries have been unceremoniously wiped out by Windows.
  • Find the disk on which your old EFI partition resides by issuing something like:
    dir fs0:\efi

    NB: you can use the map command to get a list of all the disks and partitions detected during boot.
  • Once you have the proper fs# information (and provided you want to add an entry that boots into a rEFInd installed on your EFI system partition under EFI\refind\refind_x64.efi), issue something like:
    bcfg boot add 0 fs0:\EFI\refind\refind_x64.efi rEFInd
    Note: If needed you can also use bcfg boot rm # to remove existing entries.
  • Confirm that your entry has been properly installed as the first option, by re-issuing bcfg boot dump. Then remove the USB, reset your machine, and you should find that everything is back to normal.
NOTE: Make sure you use the latest rEFInd if you want an EFI shell that includes bcfg. Not all EFI shells will contain that command!

2014-05-30

RTF - Where's the FM?

I mean a hands-on manual on how to create an Rich Text Format file from scratch, not the friggin' 200 pages  specs! Plus, only Microsoft would provide a 200 pages Word Document as an executable... Oh well, it's not like I never saw IBM (or was it Intel?) providing some source code as a PDF file with page numbering.

Man, what a struggle to figure out how to get Arabic RTF content to properly display in an app's Rich Edit control.

If you try to be smart and have Wordpad produce your RTF for you, and even if you set your Arabic text to use an Unicode font, you end up with something like:

{\rtf1 ... {\fonttbl{\f0\fnil\fcharset0 Courier New;}{\f1\fnil\fcharset178 @Arial Unicode MS;}}
\pard\ltrpar\f0 Some blurb\f1\rtlch\lang1025\'da\'e3\'d1 \'c7\'e1\'d5\'e3\'cf\b0\f0\ltrch\lang6153\par
}
...which results in UTTER GARBAGE on screen in place of the Arabic!

I can't help but ask: what is the point of using an Unicode font, really, if that insanely dumb word processor that is Wordpad still insists on living in the 1980s, and switches codepages to insert CP-Whatever codepoints instead?

So here's what you actually want to do, manually:
  • remove the \lang switch
  • insert pure Unicode codepoints using \u
But of course, it wouldn't be as backwards as possible if Microsoft didn't also force you to specify Unicode codepoints in decimal, with no means whatsoever of specifying hex instead. So even if you know the Arabic UTF-16 sequence you want to insert, you will have to spend some time doing your decimal conversions, to, at last, get the properly working:

{\rtf1 ... {\fonttbl{\f0\fnil\fcharset0 Courier New;}{\f1\fnil\fcharset178 @Arial Unicode MS;}}
\pard\ltrpar\f0 Some blurb\f1\rtlch\u1575?\u1604?\u1589?\u1605?\u1583? \u1593?\u1605?\u1585?\ltrch\f0\
}

Heed my advice: If you design your format around the idea that no human will ever need to edit some data in a hurry in it, you're designing it all wrong...

As an aside, the above is also the reason why little-endian is an utter abomination that should be banned from the face of this earth: If I'm in a computer-controlled commercial airplane, that's lost all input, and, on account of the ground approaching fast, I'm in a bit of a hurry to figure out from a memory dump where the automatic pilot might store its altitude, to manually alter it, you bet that I'm gonna hope that whoever designed that plane picked a big-endian CPU, to slightly increase the probability of myself and all the other passengers not ending up as a pancake...

First rule of designing anything is to design with the idea that humans will always need to interact with your stuff, in ways that you'll never be able to devise.

So, Microsoft, next time you want to design something like RTF, please RTFM of Design rules and try to make it just a bit easier on people who need to manually interact with your stuff...

2014-05-29

Compiling and installing Grub2 for standalone USB boot

The goal here, is to produce the necessary set of files, to be written to an USB Flash Drive using dd (rather than using the Grub installer), so that it will boot through Grub 2.x and be able to process an existing grub.cfg that sits there.

As usual, we start from nothing. I'll also assume that you know nothing about the intricacies of Grub 2 with regards to the creation of a bootable USB, so let me start with a couple of primers:

  1. For a BIOS/USB boot, Grub 2 basically works on the principle of a standard MBR (boot.img), that calls a custom second stage (core.img), which usually sits right after the MBR (sector 1, or 0x200 on the UFD) and which is a flat compressed image containing the Grub 2 kernel plus a user hand-picked set of modules (.mod).
    These modules, which get added to the base kernel, should usually limit themselves to the ones required to access the set of file systems you want Grub to be able to read a config file from and load more individual modules (some of which need to be loaded to parse the config, such as normal.mod or terminal.mod).
    As you may expect, the modules you embed with the Grub kernel and the modules you load from the target filesystem are exactly the same, so you have some choice on whether to add them to the core image or load them from the filesystem.
     
  2. You most certainly do NOT want to use the automated Grub installer in order to boot an UFD. This is because the Grub installer is designed to try to boot the OS it is running from, rather than try to boot a random target in generic fashion. Thus, if you try to follow the myriad of quick Grub 2 guides you'll find floating around, you'll end up nowhere in terms of booting a FAT or NTFS USB Flash Drive, that should be isolated of everything else.
With the above in mind, it's time to get our hands dirty. Today, I'm going to use Linux, because my attempts to try to build the latest Grub 2 using either MinGW32 or cygwin failed miserably (crypto compilation issue for MinGW, Python issue for cygwin on top of the usual CRLF annoyances for shell scripts due to the lack of a .gitattributes). I sure wish I had the time to produce a set of fixes for Grub guys, but right now, that ain't gonna happen ⇒ Linux is is.

First step is to pick up the latest source, and, since we like living on the edge, we'll be using git rather than a release tarball:

git clone git://git.savannah.gnu.org/grub.git

Then, we bootstrap and attempt to configure for the smallest image size possible, by disabling NLS (which I had hoped would remove anything gettext but turns out not to be the case - see below).

cd grub
./autogen.sh
./configure --disable-nls
make -j2

After a few minutes, your compilation should succeed, and you should find that in the grub-core/ directory, you have a boot.img, kernel.img as well as a bunch of modules (.mod).

As explained above, boot.img is really our MBR, so that's good, but we're still missing the bunch of sectors we need to write right after that, that are meant to come from a core.img file.

The reason we don't have a core.img yet is because it is generated dynamically, and we need to tell Grub exactly what modules we want in there, as well as the disk location we want the kernel to look for additional modules and config files. To do just that, we need to use the Grub utility grub-mkimage.

Now that last part (telling grub that it should look at the USB generically and in isolation, and not give a damn about our current OS or disk setup) is what nobody on the Internet seems to have the foggiest clue about, so here goes: We'll want to tell Grub to use BIOS/MBR mode (not UEFI/GPT) and that we'll have one MBR partition on our UFD containing the boot data that's not included in boot.img/core.img and that it may need to proceed. And with BIOS setting our bootable UFD as the first disk (whatever gets booted is usually the first disk BIOS will list), we should tell Grub that our disk target is hd0. Furthermore, the first MBR partition on this drive will be identified as msdos1 (Grub calls MBR-like partitions msdos#, and GPT partitions gpt#, with the index starting at 1, rather than 0 as is the case for disks).

Thus, if we want to tell Grub that it needs to look for the first MBR partition on our bootable UFD device, we must specify (hd0,msdos1) as the root for our target.
With this being sorted, the only hard part remaining is figure out the basic modules we need, so that Grub has the ability to actually identify and read stuff on a partition that may be FAT, NTFS or exFAT. To cut a long story short, you'll need at least biosdisk and part_msdos, and then a module for each type of filesystem you want to be able to access. Hence the complete command:

cd grub-core/
../grub-mkimage -v -O i386-pc -d. -p\(hd0,msdos1\)/boot/grub biosdisk part_msdos fat ntfs exfat -o core.img

NB: If you want to know what the other options are for, just run ../grub-mkimage --help
Obviously, you could go crazy adding more file systems, but the one thing you want to pay attention is the size of core.img. That's because if you want to keep it safe and stay compatible with the largest choice of disk partitioning tools, you sure want to have core.img below 32KB - 512 bytes. The reason is there still exists a bunch of partitioning utilities out there that default to creating their first partition on the second "track" of the disk. And for most modern disks, including flash drives, a track will be exactly 64 sectors. What this all means is, if you don't want to harbour the possibility of overflowing core.img onto your partition data, you really don't want it to be larger than 32256 or 0x7E00 bytes.
OK, so now that we have core.img, it's probably a good idea to create a single partition on our UFD (May I suggest using Rufus to do just that? ;)) and format it to either FAT/FAT32, NTFS or exFAT.

Once this is done, we can flat-copy both the MBR, a.k.a. boot.img, and core.img onto those first sectors. The one thing you want to pay attention to here is, while copying core.img is no sweat, because we can just use a regular 512 byte sector size, for the MBR, you need to make sure that only the first 440 bytes of boot.img are copied, so as not to overwrite the partition data and the disk signature that also resides in the MBR and that has already been filled. So please pay close attention to the bs values below:

dd if=boot.img of=/dev/sdb bs=440 count=1
dd if=core.img of=/dev/sdb bs=512 seek=1 # seek=1 skips the first block (MBR)

Side note: Of course, instead of using plain old dd, one could have used Grub's custom grub-bios-setup like this:

../grub-bios-setup -d. -b ./boot.img -c ./core.img /dev/sdb

However, the whole point of this little post is to figure out a way to add Grub 2 support to Rufus, in which we'll have to do the copying of the img files without being able to rely on external tools. Thus I'd rather demonstrate that a dd copy works just as good as the Grub tool for this.
After having run the above, you may think that all that's left is copying a grub.cfg to /boot/grub/ onto your USB device, and watch the magic happen... but you'll be wrong.

Before you can even think about loading a grub.cfg, and at the very least, Grub MUST have loaded the following modules (which you'll find in your grub-core/ directory and that need to be copied on the target into a /boot/grub/i386-pc/ folder):
  • boot.mod
  • bufio.mod
  • crypto.mod
  • extcmd.mod
  • gettext.mod
  • normal.mod
  • terminal.mod
As to why the heck we still need gettext.mod, when we made sure we disabled NLS, and also why we must have crypto, when most usages of Grub don't care about it, your guess is as good as mine...

Finally, to confirm that everything works, you can add echo.mod to the list above, and create a /boot/grub/grub.cfg on your target with the following:

insmod echo
set timeout=5

menuentry "test" {
    echo "hello"
}

Try it, and you should find that your Grub 2 config is executing at long last, whether your target filesystem in FAT, NTFS or exFAT, and you can now build custom bootable Grub 2 USBs on top of that. Isn't that nice?

FINAL NOTE: In case you're using this to try boot an existing Grub 2 based ISO from USB (say Aros), be mindful that, since we are using the very latest Grub code, there is a chance that the modules from the ISO and the kernel we use in core may have some incompatibility. Especially, you may run into the obnoxious:

error: symbol 'grub_isprint' not found.

What this basically means is that there is a mismatch between your Grub 2 kernel version and Grub 2 module. To fix that you will need to use kernel and modules from the same source.

2014-01-14

Using PHP-Gettext to localize your web pages

This is what I am now using for the Rufus Homepage. As usual, it took way too long to find all the pieces needed to solve this specific problem, so I'm going to write a guide that has them all in a single place.

What we want:

  1. A web page that detects the language from the browser, and, if a translation exists, displays that translation. If not, it falls back to the English version.
  2. A menu somewhere, that lets users pick from a list of supported languages, independently of the one set by their browser.
  3. An easy to use process for translators, that relies on the well known tools of the trade (i.e. gettext and Poedit).
  4. All of the above in a single web page, so that can we keep all the common parts together, and don't have to duplicate changes.


Where we start:

  • A web server that we control fully, and that natively supports UTF-8. I'll only say this once: In 2014, if you still don't use UTF-8 everywhere you can, then you don't deserve to host a web page, let alone administer a web server.
  • An single index.html page, in English/UTF-8, that contains pure HTML (possibly with a little sprinkling of JavaScript, but not much else).
Aaaand, that's about it really.


Prerequisites:

Because we have complete control of the server, we're going to use PHP Gettext.
Why? Because it relies on gettext, which is a mature translation framework, with solid support (including a nice GUI translation application for Windows & Mac called Poedit) and also because the performance hit of using PHP Gettext seems to be minimal compared to the alternatives. Finally, using PHP gives us the ability to simply edit our existing HTML and insert PHP code wherever we need a translation, which should make the whole process a breeze.

Thus, the first two items you need to install on your server then, if you don't have them already, will be PHP (preferably v5 or later) as well as php-gettext, plus all dependencies those two packages may have.

Then, you will need to install is php5-intl, so that we can use the locale_accept_from_http() function call to detect the browser locale from our visitors.

Finally, you must ensure that your server serves ALL the locales you are planning to support, in UTF-8. Especially, issuing locale -a | grep utf8 on your server must return AN AWFUL LOT of entries (on mine, I get more than 150 of them, and that is the way it should be).
If issuing locale -a | grep utf8 | wc -l returns less than 100 entries, then, unless you are planning to restrict your site to only a small part of the world, you will need to first sort that out, for instance by installing the locales-all package. This is because gettext will not support a locale that is unknown to the system. For instance, if you don't see fr_CA.utf8 listed in your locale -a, then no matter what you do, even if you have other French locales listed, gettext will not know how to handle browsers that are set to Canadian French. You have been warned!


Testing PHP gettext support:

At this stage, I will assume that you have php5, php5-intl, php-gettext and possibly other dependencies such as libapache2-mod-php5, gettext and co. installed. If you are using Apache2, you may also have to enable the PHP5 module, by symlinking php5.conf and php5.load in your /etc/apache2/mods-enabled/, and possibly edit php5.conf to allow running PHP scripts in user directories (which is disabled by default).

The first thing we'll do, to check that everything is in order before starting with localization, is simply create an info.php, at the same location where you have your index.html, and that contains the following one liner:
<? phpinfo(); ?>

Now, you should navigate to <your_website>/info.php and confirm that:
  1. You get a whole bunch of PHP information from your server
  2. In this whole set of data, you see a line stating "GetText Support: enabled"
If you don't see any of the above, then you will need to sort your PHP settings before proceeding, as everything that follows relies on having at least the above working. For one, we want to confirm that both PHP and the short script form (<? rather than <?php), which is what we'll use in the code below, are working, and also, get some assurance that gettext is enabled. So make sure to edit your php.ini or conf settings, if you need to sort things out.

Once you got the above simple test going, you should delete that info.php file, as you don't want attackers to know too much about the PHP and server settings you're running under.


Let's get crackin'


With PHP now confirmed working, let's set our translation rolling with PHP-Gettext. For that I'm going to loosely follow this guide. I say loosely, because I found that it was woefully incomplete and left out the most crucial parts.
  1. Start by duplicate your existing index.html as index2.php. This will enable us to work on adding translations to index2.php without interfering with the existing site, until we're happy enough that we can replace index.html altogether. Of course we picked index2.php rather than index.php, to make sure our server doesn't try to serve the file we're testing over the live index.html that's assumed to already exist in that directory.

  2. In index2.php, and provided you want to test a French translation (you don't really have to speak French if you just want to test that things work), somewhere after the initial <html> tag, add the following PHP header:

    <?
    $langs = array(
      'en_US' => array('en', 'English (International)'),
      'fr_FR' => array('fr', 'French (Français)'),
    );
    $locale = "en_US";
    if (isset($_SERVER["HTTP_ACCEPT_LANGUAGE"]))
      $locale = locale_accept_from_http($_SERVER["HTTP_ACCEPT_LANGUAGE"]);
    if (isSet($_GET["locale"])) {
      $locale = $_GET["locale"];
    }
    $locale = preg_replace("/[^a-zA-Z_]/", "", substr($locale,0,5));
    foreach($langs as $code => $lang) {
      if(substr($locale,0,strlen($lang[0])) == $lang[0]) {
        $locale = $code;
        break;
      }
    }
    // Must append ".utf8" suffix here, else languages such as Azerbaijani won't work
    setlocale(LC_MESSAGES, $locale . ".utf8");
    // Also set the LANGUAGE variable, which may be needed on some systems
    putenv("LANGUAGE=" . $locale);
    bindtextdomain("index", "./locale");
    bind_textdomain_codeset("index", "UTF-8");
    textdomain("index");
    ?>

    What this code does is:
    • Create an array of languages that we will support from the language selection menu (here English and French). You'll notice that this is actually an array of arrays, but more about this later.
    • After setting the default to English, read the preferred locale from the browser, if HTTP_ACCEPT_LANGUAGE is defined (isset(...)), using locale_accept_from_http(). If that locale is not overridden with a ?locale= parameter passed on the URL, it's the one that will be used throughout the rest of the file.
    • Find if a locale parameter was passed on the URL and set the $locale variable to it if that's the case.
    • Sanitize the locale parameter to ensure that it only contains only alphabetical or underscore, and is no more than 5 characters long (anything that can be entered by users must be considered potentially harmful and SHOULD BE SANITIZED!).
    • Ensure that if we get a short locale (eg. fr rather than fr_FR), or if we get a locale for a language we support, but for a region that we don't (eg. fr_CA), we convert it to the closest locale_REGION form we support. This is very important, as the browser may only provide us with fr or fr_CA when invoking locale_accept_from_http and want to have these locales mapped to fr_FR for subsequent processing.
    • Tell gettext that it should use UTF-8 and look for index.mo in a ./locale/<LOCALE>/LC_MESSAGES/ for translations (eg. ./locale/fr/LC_MESSAGES/index.mo).

  3. Somewhere in a div (eg. the one for a right sidebar) add the following code for the language selection menu:

    <select onchange="self.location='?locale='+this.options[this.selectedIndex].value">
    <? foreach($langs as $code => $lang): ?>
      <option &lt? if(substr($locale,0,strlen($lang[0])) == $lang[0]) echo "selected=\"selected\"";?> value="<?= $code;?>">
      <?= $lang[1]; ?>
    </option>
    <? endforeach; ?>
    </select>

    What this code does is:
    • Create a dropdown with all the languages from our $langs array.
    • Check out if the first characters of our $locale matches the short language code from our array, and set the dropdown entry as the selected one if that is the case. This ensures that "French" will be selected in our dropdown, regardless of whether the locale is fr_CA, fr_FR or any of the other fr_XX locales.
    • When a user selects an entry from the dropdown, add a ?locale=en_US or ?locale=fr_FR to the URL, to force the page to be refreshed using that language.

  4. For every place where you want to translate a string, use something like <?= _("Hello, world");?>, where <?= is the short version of <?php echo and _( is the actual call to gettext. What gettext does then is, find out if a translation exists for the string being passed as parameter and either use that if it exists, or the original untranslated string otherwise.

  5. Of course, you can use the whole gamut of PHP function calls, and say, if you want to insert a variable in your translated string, such as a date, do something like:
    <? printf(_("Last updated %s:"), $last_date);?>.
    Also, if needed, and this is something that is very useful to know, you can insert translator notes using comments (/* ... */ within your PHP, before the _(...) calls. These comments will then be displayed for all translators to see in Poedit (as long as you used the -c option when creating your PO catalog with xgettext).

  6. Save your index2.php and confirm that you get to see the English strings, the dropdown with 2 entries, as well as ?locale=fr_FR or ?locale=en_US appended to the URL when you select an entry from the dropdown. Of course, since we haven't created any translation for French, the English text still displays when French is selected, as the default of gettext is to use the original if a translation is missing, but we will address that shortly.

  7. Create a ./locale/fr/LC_MESSAGES/ set of subdirectories, at the location where you have your index2.php page.

  8. Now we need to generate the gettext catalog, or POT, which is the file you will have to provide  translators with, in order for them to start creating a translation. Now, while Poedit is supposed to be able to process a PHP file to generate a .pot, I couldn't for the life of me figure out how to do just that with the Windows version. Moreover, the .pot creation is really something you want to do on the server anyway, so, to cut a long story short, we're just going to call xgettext, using a script, to produce our .pot on the server. Here is the content of that script:

    #!/bin/sh
    xgettext --package-version=1.0 --from-code=UTF-8 --copyright-holder="Pete Batard" --package-name="Rufus Homepage" --msgid-bugs-address=pete@akeo.ie -L PHP -c -d index -o ./locale/index.pot index2.php
    sed --in-place ./locale/index.pot --expression='s/SOME DESCRIPTIVE TITLE/Rufus Homepage/'
    sed --in-place ./locale/index.pot --expression='1,6s/YEAR/2014/'
    sed --in-place ./locale/index.pot --expression='1,6s/PACKAGE/Rufus/'
    sed --in-place ./locale/index.pot --expression='1,6s/FIRST AUTHOR/Pete Batard/'
    sed --in-place ./locale/index.pot --expression='1,6s/EMAIL@ADDRESS/pete@akeo.ie/'

    Running the above, in the directory where we have our PHP, creates our index.pot under the ./locale/ subdirectory, and fills in some important variables that xgettext mysteriously doesn't seem to provide any means to set. As you can see, we used the -c option so that any notes to translators that we added using PHP comments are carried over.

  9. Now, we're doing into the part that is generally meant to be done by a translator: download the index.pot, and open it in Poedit. From there, set your target language (here fr_FR) and translate the various strings (eg. "Hello, world""Bonjour, monde"). Save your translation as index.po/index.mo (Poedit will create both files) and upload index.mo in ./locale/fr/LC_MESSAGES/.

  10. Voilà! If you did all of the above properly and select French in the dropdown or use a browser that has French as its preferred language, then you should now see the relevant sections translated. "C'est magique, non?"

  11. From there, you will of course need to add PHP for all of the page content that you want to see translated, by enclosing the English text it into <? _(...);?> sections (don't worry about the constant switching between HTML and PHP mode - PHP is designed to be very efficient at doing just that!). Once you're happy, just rename your index2.php to index.php (but make sure to remove your index.html first, or you may run into weird issues), and you are fully ready to get your content localized. To do that, just run the POT creation script again (make sure you edit the script if needed, so that is applies to index.php now), and provide index.pot to your translators. Then wait for them to send your their .mo files, edit the code above to add a new array line for each extra language, and watch in awe as visitors experience your site in that new language. Now, it wasn't that hard after all, was it?


Additional remarks:

Can't we just do away with the double fr_FR and fr in our array?

Unfortunately, no. The short explanation is, even after you place your translation under a /fr/ subdirectory, so that it is used by default when your locale is fr_FR, fr_CA, fr_BE, fr_CH and so on, gettext still can't work with a locale that is just set to fr. This is because, as explained in the Prerequisites, if your system doesn't have an fr or fr.utf8 listed with locale -a, gettext just doesn't know how to handle it language.

Now. the long explanation as to why don't we couldn't just use a single fr_FR in our $langs array is: we want to smartly set our dropdown to French, even when fr_CA is provided, and we can't do something as simple as just picking the first two characters of the array locale, due to the fact that we will also want to support both pt_PT and pt_BR as well as zh_CN and zh_TW, as separate languages (because that's pretty much what they are). So, if we were to just try to isolate the substring up to the underscore, then if we had zh_CN defined before zh_TW in our array, Traditional Chinese speakers would see the dropdown set to Simplified Chinese and that's not what we want.

Thus, for our dropdown selection comparison, we must provide a value that is the lowest common denominator we want the language to apply to, which can be either a simple fr or es, or a longer pt_BR or zh_CN. But as we explained previously, we can't use that lowest common denominator for locale selection, as gettext might not know how to handle it. And that is why we need to duplicate part of the locale in two places in our array.

<rant>Of course, it would be oh so much simpler if OSes agreed that short locales without a region are perfectly valid entities by default, especially as gettext doesn't seem to have any issue accepting them when looking for .mo files, but hey, that's localization for you: no-one EVER manages to get it right...</rant>

How about a real-life example?

Alright... Since I'm all about Open Source, let me show you exactly how I am applying all of the above to the Rufus Homepage. You can click the following to access the current index.php source for the Rufus site, as well as the locale/ subdirectory. There's also this guide, that I provide to any translator who volunteered to create a translation for the homepage. Hopefully, these will help you fill any blanks, and allow you to provide an awesome multilingual web page!

What about right-to-left languages?

Look at the PHP source and look for the use of the $dir variable.