Behdad Esfahbod
5f0eaaad12
[Indic] Fix base search in final_reordering
...
Fixes most Malayalam failures. Down from 1.6% to 0.38% now. Fixes a
few more in other scripts too.
2012-07-20 15:47:24 -04:00
Behdad Esfahbod
81202bd860
[Indic] Don't attach SM/VD to other characters
2012-07-20 15:14:51 -04:00
Behdad Esfahbod
efb4ad7356
Fix compiler warnings
...
If x is not constant, we cannot ASSERT_STATIC on it.
2012-07-20 14:27:38 -04:00
Behdad Esfahbod
f31d97e44e
[Indic] Form Telugu Reph out of Ra,Virama,ZWJ
...
Apparently this was approved in Feb 2012. No font yet.
2012-07-20 14:13:35 -04:00
Behdad Esfahbod
2e193b240e
[Indic] Don't split U+0AC9
...
Althought IndicMatraCategory.txt classifies it as Top_And_Right matra,
it does not have Unicode decomposition, and Uniscribe does not do
anything special about it either.
Gujarati failures down from 0.672% to 0.0130966%.
2012-07-20 14:02:35 -04:00
Behdad Esfahbod
30c3d5e9fc
[Indic] Simplify Uniscribe cluster emulation
...
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
2012-07-20 13:56:32 -04:00
Behdad Esfahbod
decf6ffca4
[Indic] Minor!
2012-07-20 13:51:31 -04:00
Behdad Esfahbod
9e4f94a72c
[Indic] Break syllables at Halant,ZWNJ
...
That's really what Uniscribe does, and explains a lot of pecularities of
Halant,ZWNJ before the base.
Sent Telugu from 1% failures to 0.03%. Improved Kannada and Malayalam
slightly. Fixed half of Bengali, and did NOT break anything!
2012-07-20 13:48:03 -04:00
Behdad Esfahbod
2c372b80f6
[Indic] Better check for applying 'init'
...
Specifically, don't apply 'init' if previous char is a joiner.
Fixes some more of Bengali.
2012-07-20 13:37:48 -04:00
Behdad Esfahbod
34a7440b7c
[GPOS] Don't zero mark advances
...
Fixes more of Telugu, Kannada, and Oriya.
May break things (outside Indic...), but we cannot think of any font relying
on this immediately.
2012-07-20 12:40:39 -04:00
Behdad Esfahbod
8ed248de77
[Indic] Minor
2012-07-20 11:42:24 -04:00
Behdad Esfahbod
d0e68dbd0b
[Indic] Implement reph positioning step 5
...
Not tuned, just copied from step 2. Fixes another 0.5% of Kannada
failures. 1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod
a9e45c32e4
[Indic] Don't let ZWNJ at the end of syllable affect base search
...
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod
20b68e699f
[Indic] Apply 'cjct' globally
...
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod
51e764de44
[Indic] Unbreak old scriptures
...
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod
900cf3d449
Minor
2012-07-20 10:18:23 -04:00
Behdad Esfahbod
87cd63266e
[Indic] Recategorize some Kannada right matras
...
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod
3604d64ced
[Indic] Recategorize GURMUKHI ADDAK
...
It's not in IndicSyllabicCategory.txt. Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod
8932858123
Minor
2012-07-19 21:02:38 -04:00
Behdad Esfahbod
47ef931f13
[buffer] Make sure out_info = info during GPOS
2012-07-19 20:52:44 -04:00
Behdad Esfahbod
ae63cf2062
Print line number during return when tracing
2012-07-19 20:45:41 -04:00
Behdad Esfahbod
5249f3aee1
[Indic] Unbreak Khmer
...
For Khmer, all consonants are subjoining. No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod
e0475345d5
[Indic] Apply 'akhn' globally
...
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod
c87bcddb10
[Indic] Add failing test for Kannada
2012-07-19 20:03:25 -04:00
Behdad Esfahbod
fa247ebe52
[Indic] Better position U+0CD5
...
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod
f055442716
[Indic] Lookup consonant position in the font
...
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod
74d1d88781
[GSUB] Fix would_apply() for LigatureSubst
2012-07-19 16:14:23 -04:00
Behdad Esfahbod
787f7d1e9b
[TODO] Minor
2012-07-19 15:29:13 -04:00
Behdad Esfahbod
be73a5f936
Add src/test-would-substitute tool
2012-07-19 15:12:18 -04:00
Behdad Esfahbod
e72b360ac6
Refactor / finish would_apply() operation
...
Untested.
2012-07-19 14:44:46 -04:00
Behdad Esfahbod
8c973ebf0f
[Indic] Implement per-script matra positioning
...
Following what the spec says.
Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%. Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod
8bb32458f9
[Indic] More refactoring
2012-07-19 13:04:44 -04:00
Behdad Esfahbod
9ccc6382ba
[Indic] Minor refactoring
2012-07-19 12:45:31 -04:00
Behdad Esfahbod
f83aaa3133
[Indic] Minor
2012-07-19 12:23:23 -04:00
Behdad Esfahbod
be8b9f5f71
[Indic] Start refactoring different matra positions per script
2012-07-19 12:11:12 -04:00
Behdad Esfahbod
deeb540a74
[test] Ignore tests with DOTTED CIRCLE in the output
2012-07-19 11:30:48 -04:00
Behdad Esfahbod
b01d9b3d90
[Indic] Disallow decomposition of a couple characters
...
This is a hack for now. Will be fixed when we do complex-shaper-driven
normalization properly.
The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
2012-07-19 11:25:49 -04:00
Behdad Esfahbod
422ecd2d3c
[Indic] Accept a forced Rakar sequence at the end of syllable
...
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra. If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe. Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence... So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the of syllables.
Fixes some 100 or more of Sinhala failures. Now at 622 only (0.23%).
2012-07-18 23:25:58 -04:00
Behdad Esfahbod
6fc1732003
[Indic] Allow joiners on both sides of Halant at the same time
...
The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
for Rakar. Fixes two-thousand Sinhala tests. Not many left.
2012-07-18 17:49:19 -04:00
Behdad Esfahbod
10cdc94eee
[Indic] In final reordering, find base, even if it disappeared
...
POS_BASE can disappear if base ligated backward. Define base as last
with position not after base.
Fixes a few hundred of Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
Behdad Esfahbod
9c4d24a3a6
[Indic] Minor
2012-07-18 17:29:10 -04:00
Behdad Esfahbod
3285e107c9
[Indic] Implement Sinhala "Al Lakuna" Reph behavior
...
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
Behdad Esfahbod
91cade7555
[Indic/Unicode] Decompose Sinhala split matras the way Uniscribe likes
...
Makes no visual difference.
Fixes most of the failures. Down from 15% to 1.3%!
2012-07-18 16:50:41 -04:00
Behdad Esfahbod
d8942dcbb4
Apply Tibetan (global) features.
...
Fixes all Tibetan failures. All 180k of them!
Merges back Hangul into the default shaper.
2012-07-18 16:34:10 -04:00
Behdad Esfahbod
552d19b7a1
[Indic] Treat Register Shifters like Nukta
...
Really this time.
Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod
e8cd81f76d
[Indic] Minor
2012-07-18 16:00:20 -04:00
Behdad Esfahbod
69f26bf39c
[Indic] Fix Matra reordering when base is at end of syllable
...
For example: U+915,U+200c,U+93f
Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod
d16ccc4ae7
Leave one extra item at the end of buffer allocation
...
Just in case, for the times we do out-of-bounds access.
jk
2012-07-18 15:43:55 -04:00
Behdad Esfahbod
075d671f10
[Indic] Fix out-of-bounds array access
2012-07-18 15:41:53 -04:00
Behdad Esfahbod
dcb527242b
[Indic] Allow joiners before matras
...
Fixes 1 more Devanagari test!
2012-07-18 15:32:26 -04:00