2485 Commits

Author SHA1 Message Date
Behdad Esfahbod
7b2a7dadd6 [Indic] Merge clusters before sorting
This should fix any instabilities in cluster formation that we were
speculating may happen with surrounding syllables.  Or most of it
perhaps.
2012-07-22 23:58:55 -04:00
Behdad Esfahbod
abb3239ef9 [Indic] Update clusters for left-matra even if matra didn't move
Fixes crashes reported with left matra under
non-uniscribe-bug-compatibility mode.
2012-07-22 23:55:19 -04:00
Behdad Esfahbod
60554f14d8 [Indic] Merge in Malayalam tests
From:
http://silpa.org.in/pub/tests/hb/ml/ml-harfbuzz-testdata.txt
2012-07-22 23:23:56 -04:00
Behdad Esfahbod
5c7081770c [Indic] Add extensive Sinhala tests
Generated by:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/utils/gen-unicode-sinhala.py
2012-07-22 23:20:27 -04:00
Behdad Esfahbod
2efe4707b1 [Indic] Add Sinhala tests
Merge tests from:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt
2012-07-22 23:17:59 -04:00
Behdad Esfahbod
3d4c111b7a Add a test case
2012-07-20 19:34:39 -04:00
Behdad Esfahbod
92a1ad7bef [Indic] Stop searching for base if a post form is found before below form
Improves Bengali and Gurmukhi.  Malayalam regressed a bit.  We will deal
with that later.
2012-07-20 18:55:15 -04:00
Behdad Esfahbod
4c450c703f [Indic] Recompose Bengali Ya,Nukta
This is a bunch of hacks for now.

Improves Bengali a bit.
2012-07-20 18:13:04 -04:00
Behdad Esfahbod
e9c0f152a3 [Uniscribe] Fix script fallback
Gurmukhi failures halved now.  Others changed slightly.
2012-07-20 17:37:48 -04:00
Behdad Esfahbod
5791f32915 [Indic] Allow a ZWNJ after SM's
Malayalam failures go way down.  Other scripts benefitted slightly too.
Sinhala had one or two test regressions, but...
2012-07-20 16:26:55 -04:00
Behdad Esfahbod
34ae336f3f [Indic] Improve Reph AfterMain positioning
Fixes 20 out of 48 failing Oriya tests.  Failure rate down to 0.066% now.
2012-07-20 16:17:28 -04:00
Behdad Esfahbod
bdd080431a [Indic] Reposition Oriya Candrabindu
Oriya failures down from 0.65% to 0.20%.
2012-07-20 16:03:09 -04:00
Behdad Esfahbod
5f0eaaad12 [Indic] Fix base search in final_reordering
Fixes most Malayalam failures.  Down from 1.6% to 0.38% now.  Fixes a
few more in other scripts too.
2012-07-20 15:47:24 -04:00
Behdad Esfahbod
81202bd860 [Indic] Don't attach SM/VD to other characters
2012-07-20 15:14:51 -04:00
Behdad Esfahbod
efb4ad7356 Fix compiler warnings
If x is not constant, we cannot ASSERT_STATIC on it.
2012-07-20 14:27:38 -04:00
Behdad Esfahbod
f31d97e44e [Indic] Form Telugu Reph out of Ra,Virama,ZWJ
Apparently this was approved in Feb 2012.  No font yet.
2012-07-20 14:13:35 -04:00
Behdad Esfahbod
2e193b240e [Indic] Don't split U+0AC9
Although IndicMatraCategory.txt classifies it as Top_And_Right matra,
it does not have Unicode decomposition, and Uniscribe does not do
anything special about it either.

Gujarati failures down from 0.672% to 0.0130966%.
2012-07-20 14:02:35 -04:00
Behdad Esfahbod
30c3d5e9fc [Indic] Simplify Uniscribe cluster emulation
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
2012-07-20 13:56:32 -04:00
Behdad Esfahbod
decf6ffca4 [Indic] Minor!
2012-07-20 13:51:31 -04:00
Behdad Esfahbod
9e4f94a72c [Indic] Break syllables at Halant,ZWNJ
That's really what Uniscribe does, and explains a lot of peculiarities of
Halant,ZWNJ before the base.

Sent Telugu from 1% failures to 0.03%.  Improved Kannada and Malayalam
slightly.  Fixed half of Bengali, and did NOT break anything!
2012-07-20 13:48:03 -04:00
Behdad Esfahbod
2c372b80f6 [Indic] Better check for applying 'init'
Specifically, don't apply 'init' if previous char is a joiner.

Fixes some more of Bengali.
2012-07-20 13:37:48 -04:00
Behdad Esfahbod
34a7440b7c [GPOS] Don't zero mark advances
Fixes more of Telugu, Kannada, and Oriya.

May break things (outside Indic...), but we cannot think of any font relying
on this immediately.
2012-07-20 12:40:39 -04:00
Behdad Esfahbod
8ed248de77 [Indic] Minor
2012-07-20 11:42:24 -04:00
Behdad Esfahbod
d0e68dbd0b [Indic] Implement reph positioning step 5
Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
failures.  1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod
a9e45c32e4 [Indic] Don't let ZWNJ at the end of syllable affect base search
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod
20b68e699f [Indic] Apply 'cjct' globally
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod
51e764de44 [Indic] Unbreak old scriptures
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod
900cf3d449 Minor
2012-07-20 10:18:23 -04:00
Behdad Esfahbod
87cd63266e [Indic] Recategorize some Kannada right matras
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod
3604d64ced [Indic] Recategorize GURMUKHI ADDAK
It's not in IndicSyllabicCategory.txt.  Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod
8932858123 Minor
2012-07-19 21:02:38 -04:00
Behdad Esfahbod
47ef931f13 [buffer] Make sure out_info = info during GPOS
2012-07-19 20:52:44 -04:00
Behdad Esfahbod
ae63cf2062 Print line number during return when tracing
2012-07-19 20:45:41 -04:00
Behdad Esfahbod
5249f3aee1 [Indic] Unbreak Khmer
For Khmer, all consonants are subjoining.  No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod
e0475345d5 [Indic] Apply 'akhn' globally
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod
c87bcddb10 [Indic] Add failing test for Kannada
2012-07-19 20:03:25 -04:00
Behdad Esfahbod
fa247ebe52 [Indic] Better position U+0CD5
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod
f055442716 [Indic] Lookup consonant position in the font
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod
74d1d88781 [GSUB] Fix would_apply() for LigatureSubst
2012-07-19 16:14:23 -04:00
Behdad Esfahbod
787f7d1e9b [TODO] Minor
2012-07-19 15:29:13 -04:00
Behdad Esfahbod
be73a5f936 Add src/test-would-substitute tool
2012-07-19 15:12:18 -04:00
Behdad Esfahbod
e72b360ac6 Refactor / finish would_apply() operation
Untested.
2012-07-19 14:44:46 -04:00
Behdad Esfahbod
8c973ebf0f [Indic] Implement per-script matra positioning
Following what the spec says.

Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%.  Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod
8bb32458f9 [Indic] More refactoring
2012-07-19 13:04:44 -04:00
Behdad Esfahbod
9ccc6382ba [Indic] Minor refactoring
2012-07-19 12:45:31 -04:00
Behdad Esfahbod
f83aaa3133 [Indic] Minor
2012-07-19 12:23:23 -04:00
Behdad Esfahbod
be8b9f5f71 [Indic] Start refactoring different matra positions per script
2012-07-19 12:11:12 -04:00
Behdad Esfahbod
deeb540a74 [test] Ignore tests with DOTTED CIRCLE in the output
2012-07-19 11:30:48 -04:00
Behdad Esfahbod
b01d9b3d90 [Indic] Disallow decomposition of a couple characters
This is a hack for now.  Will be fixed when we do complex-shaper-driven
normalization properly.

The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
2012-07-19 11:25:49 -04:00
Behdad Esfahbod
422ecd2d3c [Indic] Accept a forced Rakar sequence at the end of syllable
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence...  So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.

Fixes some 100 or more of Sinhala failures.  Now at 622 only (0.23%).
2012-07-18 23:25:58 -04:00