I18n:Updating Unicode version
This document describes the process of updating the files in the Mozilla codebase that are generated from Unicode data files.
Unicode properties
To regenerate the tables in nsUnicodePropertyData.cpp:
Download the current Unicode data files from http://www.unicode.org/Public/UNIDATA/
NB: not all the files are actually needed; currently, we require
- UnicodeData.txt
- Scripts.txt
- EastAsianWidth.txt
- BidiMirroring.txt
- HangulSyllableType.txt
- SpecialCasing.txt
- ReadMe.txt (to record version/date of the UCD)
- Unihan_Variants.txt (from Unihan.zip)
though this may change if we find a need for additional properties.
The Unicode data files listed above should be together in one directory.
We also require the file http://www.unicode.org/Public/security/latest/xidmodifications.txt
This file should be in a sub-directory "security" immediately below the directory containing the other Unicode data files.
We also require the latest data file for UTR50, currently revision-13: http://www.unicode.org/Public/vertical/revision-13/VerticalOrientation-13.txt
This file should be in a sub-directory "vertical" immediately below the directory containing the other Unicode data files.
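Before running the generator it can help to confirm that the directory layout matches the description above. The following is a minimal sketch, not part of the Mozilla tree: the helper name missing_ucd_files and its vertical_name parameter are assumptions made for illustration, and the file names are simply those listed in this section.

```python
from pathlib import Path

# File names copied from the list in this section.
REQUIRED_TOP_LEVEL = [
    "UnicodeData.txt",
    "Scripts.txt",
    "EastAsianWidth.txt",
    "BidiMirroring.txt",
    "HangulSyllableType.txt",
    "SpecialCasing.txt",
    "ReadMe.txt",
    "Unihan_Variants.txt",
]


def missing_ucd_files(ucd_dir, vertical_name="VerticalOrientation-13.txt"):
    """Return the relative paths of expected data files that are absent.

    vertical_name is a hypothetical parameter: the UTR50 file name changes
    with each revision, so the caller supplies the one that was downloaded.
    """
    root = Path(ucd_dir)
    expected = [Path(name) for name in REQUIRED_TOP_LEVEL]
    # The two extra files live in sub-directories of the UCD directory.
    expected.append(Path("security") / "xidmodifications.txt")
    expected.append(Path("vertical") / vertical_name)
    return [p.as_posix() for p in expected if not (root / p).is_file()]
```

An empty list means every file named above was found; anything else lists what still needs to be downloaded or moved into place.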
From intl/unicharutil/util, run the command:
perl ../tools/genUnicodePropertyData.pl /path/to/hb-common.h /path/to/UCD-directory
(where hb-common.h is found in the gfx/harfbuzz/src directory).
This will generate (or overwrite!) the files
- nsUnicodePropertyData.cpp
- nsUnicodeScriptCodes.h
in the current directory.
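If the regeneration step is run often, the command can be wrapped in a small script. This is only a sketch under stated assumptions: the function names and the source_root argument (the top of a Mozilla checkout) are hypothetical, while the script path, the working directory, and the location of hb-common.h are the ones given above.

```python
import subprocess
from pathlib import Path


def property_data_command(hb_common_h, ucd_dir):
    """Build the argument list for the generator command shown above."""
    return [
        "perl",
        "../tools/genUnicodePropertyData.pl",
        str(hb_common_h),
        str(ucd_dir),
    ]


def regenerate_property_data(source_root, ucd_dir):
    """Run the generator from intl/unicharutil/util (hypothetical wrapper)."""
    root = Path(source_root).resolve()
    cmd = property_data_command(
        root / "gfx" / "harfbuzz" / "src" / "hb-common.h",
        Path(ucd_dir).resolve(),
    )
    # The relative script path only resolves from this working directory.
    subprocess.run(cmd, cwd=root / "intl" / "unicharutil" / "util", check=True)
```

Resolving both paths to absolute form first avoids surprises from the change of working directory.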
Casing
We require Unicode data files from http://www.unicode.org/Public/UNIDATA/
As well as UnicodeData.txt downloaded in the previous step, we need
- SpecialCasing.txt
From intl/unicharutil/util, run the command:
perl ../tools/genSpecialCasingData.pl /path/to/UCD-directory/UnicodeData.txt /path/to/UCD-directory/SpecialCasing.txt > nsSpecialCasingData.cpp
This will generate (or overwrite!) the files
- nsSpecialCasingData.cpp
- all-lower-ref.html
- all-lower.html
- all-title-ref.html
- all-title.html
- all-upper-ref.html
- all-upper.html
in the current directory.
Then move the six *.html files to layout/reftests/text-transform.
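The move of the generated reftest files could be scripted roughly as follows. This is a sketch rather than an existing tool: move_reftests and its two arguments are hypothetical names, and the six file names are the ones listed above. It assumes the destination directory already exists, as it does in a Mozilla checkout.

```python
import shutil
from pathlib import Path

# The six generated reftest files named in this section.
REFTEST_FILES = [
    f"all-{case}{suffix}.html"
    for case in ("lower", "title", "upper")
    for suffix in ("", "-ref")
]


def move_reftests(generated_dir, reftest_dir):
    """Move the six generated *.html files into the reftest directory."""
    src = Path(generated_dir)
    dst = Path(reftest_dir)
    # Check everything up front so a partial move is never left behind.
    missing = [name for name in REFTEST_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError("not generated: " + ", ".join(missing))
    moved = []
    for name in REFTEST_FILES:
        target = dst / name
        shutil.move(str(src / name), str(target))
        moved.append(target)
    return moved
```

For example, move_reftests(".", "../../../layout/reftests/text-transform") would apply when run from intl/unicharutil/util, assuming the usual checkout layout.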
Normalization
Currently our normalization data is frozen at Unicode 3.2 to conform to RFC 3454 (Stringprep); see Bug 728180.
JavaScript Unicode support
To update SpiderMonkey's Unicode support:
- move into js/src/vm/
- run python ./make_unicode.py
- verify that UnicodeData.txt, CaseFolding.txt, and the derived files were correctly updated
Note that running python ./make_unicode.py FILENAME1 FILENAME2 instead uses FILENAME1 as a UnicodeData.txt and FILENAME2 as a CaseFolding.txt, if you ever want to generate new data without overwriting the current js/src/vm/UnicodeData.txt and js/src/vm/CaseFolding.txt.
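The two ways of invoking the script can be captured in one small helper. This is a sketch only: make_unicode_command is a hypothetical name, and it covers just the two forms described above (no arguments, or a UnicodeData.txt followed by a CaseFolding.txt).

```python
def make_unicode_command(unicode_data=None, case_folding=None):
    """Build the make_unicode.py command line, to be run from js/src/vm/.

    With no arguments the script is invoked in its default form; with two,
    the given files are passed as UnicodeData.txt and CaseFolding.txt.
    """
    if (unicode_data is None) != (case_folding is None):
        raise ValueError("pass both data files or neither")
    cmd = ["python", "./make_unicode.py"]
    if unicode_data is not None:
        cmd += [str(unicode_data), str(case_folding)]
    return cmd
```

The resulting list can be handed to subprocess.run with cwd set to js/src/vm/, since the script path is relative to that directory.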