Commit Graph

285 Commits

Author SHA1 Message Date
Behdad Esfahbod
efed40b975 [indic] Minor refactoring of reph handling 2013-10-17 18:50:11 +02:00
Behdad Esfahbod
684fe59ff8 [indic] Minor refactoring of would_substitute() 2013-10-17 18:30:06 +02:00
Behdad Esfahbod
b5a0f69e47 [indic] Pass zero-context=false to would_substitute for newer scripts
For scripts without an old/new spec distinction, use zero-context=false.
This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
This will be useful and used in Javanese.
2013-10-17 18:04:23 +02:00
Behdad Esfahbod
c4e71ff36d [indic] Clean up Khmer and Sinhala base finding algorithm 2013-10-17 17:04:47 +02:00
Behdad Esfahbod
e10453e6fb [indic] Add BASE_POS_LAST_SINHALA
Previously we planted this into the mode used for Khmer.  There's not
really much in common between the two, so separate again.
2013-10-17 16:49:06 +02:00
Behdad Esfahbod
9ac6b01e0c [indic] Adjust Sinhala cluster merging under uniscribe
Similar to 190c8f2b60 but for
Sinhala.
2013-10-17 16:27:38 +02:00
Behdad Esfahbod
6b2abdcd20 [indic] Improve clusters in presence of reph 2013-10-17 13:15:43 +02:00
Behdad Esfahbod
42d0f55cbc [indic] Apply calt,clig in the same stage as presentation features
Whic means these twp are applied per-syllable now.  Apparently
in some Khmer fonts the clig interacts with presentation features.

Test case: U+1781,U+17D2,U+1789,U+17BB,U+17C6 with Mondulkiri-R.ttf
should produce one big ligature.
2013-10-17 13:06:22 +02:00
Behdad Esfahbod
ae9a5834df [indic] Fix pref vs blwf interaction
If a glyph can be both blwf and pref, we were wrongly sorting it
in the post position instead of below position.
2013-10-17 12:24:55 +02:00
Behdad Esfahbod
c7dacac02c [indic] Don't apply blwf before base under old-spec mode
Test case: U+09AC,U+09CD,U+09A6 with Lohit-Bengali 2.5.3.
2013-10-17 12:20:46 +02:00
Behdad Esfahbod
3756efaf4e [indic] Misc harmless fixes!
First, we were abusing OT_VD instead of OT_A.  Fix that
but moving OT_A in the grammar where it belongs (which
is different from what the spec says).

Also, only allow medial consonants after all other
consonants.  This doesn't affect any current character.

Finally, fix Halant attachment in presence of medial
consonants.  Again, this currently doesn't affect any
sequence.

I lied.  There's Gurmukhi U+0A75 which is Consonant_Medial.
Uniscribe allows one of those in each of these positions:
before matras, after matras and before syllable modifiers,
and after syllable modifiers!  We currently just allow
unlimited numbers of it, before matras.
2013-10-16 19:06:29 +02:00
Behdad Esfahbod
28d5daec94 [indic] More granular post-base cluster merging! 2013-10-16 12:32:12 +02:00
Behdad Esfahbod
9cb59d460e [indic] Fix cluster merging of left matras
The merge_clusters there was totally broken.
2013-10-16 11:34:07 +02:00
Behdad Esfahbod
190c8f2b60 [indic] Adjust cluster merging under uniscribe mode for Tamil
Apparently Uniscribe Tamil shaper doesn't ship chubby clusters
for Tamil.  Adjust to that.
2013-10-16 11:33:18 +02:00
Behdad Esfahbod
f5299eff5c [indic] Simplify reph logic
*Shouldn't* break anything.
2013-10-15 18:21:32 +02:00
Behdad Esfahbod
65a929b1c0 [indic] If Malayalam dot-reph formed a ligature, don't move it
Rachana-0.6 implements dot-reph by ligation, so we shouldn't move it.
Uniscribe doesn't either.  Test case:

  U+0D4E,U+0D1A,U+0D4D,U+0D1A,U+0D4D
2013-10-15 18:21:32 +02:00
Behdad Esfahbod
a01cbf6cbe [indic] Harmless reordering of Khmer features! 2013-10-15 18:21:32 +02:00
Behdad Esfahbod
eb10233b26 [indic] Apply 'kern' for all scripts except for Khmer in Uniscribe mode
Seems to better match Uniscribe.

Note: NotoSansTelugu-Regular has kern feature, so this fixes most of the
positioning failures there, except for the kern pairs blocked by a
(non-)joiner, in which case we (correctly) kern, but Uniscribe doesn't.
2013-10-15 18:21:32 +02:00
Behdad Esfahbod
30145272a7 [indic] Don't apply presentation features across syllables
More like Uniscribe...  We still allow user-defined features to
work across syllables, but not pres,blws,abs,psts,etc.

This "regressed" Sinhala numbers by 11.  These are cases were
there's Consonant followed by Ra,Halant,ZWJ at the of text.
The Ra,Halant,ZWJ ends up forming reph, which is wrong...
But before we were also ligating that reph with the previous
consonant.  That's even more wrong.  That's also what Uniscribe
does.

Current numbers:

BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271655 out of 271847 tests passed. 192 failed (0.070628%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-15 18:20:59 +02:00
Behdad Esfahbod
3c7b3641cf [indic] Handle Avagraha
It can come either at the end(ish!) of the syllable, or independently.
When independent, it accepts a few bits and pieces.
2013-10-15 13:14:31 +02:00
Behdad Esfahbod
8acbb6be27 [indic] Some scripts like blwf applied to pre-base characters
...while some don't!

Improved Bengali, Devanagari, Gurmukhi, Malayalam.

Updated numbers:

BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048134 out of 1048334 tests passed. 200 failed (0.0190779%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-15 12:29:07 +02:00
Behdad Esfahbod
bdd8873fd8 Revert "[Indic] don't apply 'calt' by default in Indic shaper"
This reverts commit 952121007c.

In light of discussion on the mailing list...
2013-08-07 17:58:25 -04:00
Jonathan Kew
952121007c [Indic] don't apply 'calt' by default in Indic shaper 2013-08-06 10:36:14 -04:00
Behdad Esfahbod
9245e98742 [Indic] Add Javanese config
We should add for other scripts too, send me the virama codepoint
and script name...
2013-06-26 20:57:58 -04:00
Behdad Esfahbod
a8cf7b43fa [Indic] Futher adjust ZWJ handling in Indic-like shapers
After the Ngapi hackfest work, we were assuming that fonts
won't use presentation features to choose specific forms
(eg. conjuncts).  As such, we were using auto-joiner behavior
for such features.  It proved to be troublesome as many fonts
used presentation forms ('pres') for example to form conjuncts,
which need to be disabled when a ZWJ is inserted.

Two examples:

	U+0D2F,U+200D,U+0D4D,U+0D2F with kartika.ttf
	U+0995,U+09CD,U+200D,U+09B7 with vrinda.ttf

What we do now is to never do magic to ZWJ during GSUB's main input
match for Indic-style shapers.  Note that backtrack/lookahead are still
matched liberally, as is GPOS.  This seems to be an acceptable
compromise.

As to the bug that initially started this work, that one needs to
be fixed differently:

  Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not
  provide same results as Windows8
  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers:

BENGALI: 353689 out of 354188 tests passed. 499 failed (0.140886%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048102 out of 1048334 tests passed. 232 failed (0.0221304%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-03-19 06:22:06 -04:00
Behdad Esfahbod
fb7c182bf9 [Indic] Minor 2013-03-06 00:53:24 -05:00
Behdad Esfahbod
8144936d07 [Indic] Work around fonts with broken new-spec tables
See comments, and this thread:

http://lists.freedesktop.org/archives/harfbuzz/2013-March/002990.html

Originally reported here:

https://code.google.com/p/chromium/issues/detail?id=96143

Doesn't change test suite numbers.
2013-03-05 20:08:59 -05:00
Behdad Esfahbod
41732f1fe3 [Indic] Help compiler put indic_features table in .rodata
The overridden "or" operator was preventing the flag expression from
being const, and putting the table in .data instead or .rodata.
2013-02-27 20:40:54 -05:00
Behdad Esfahbod
94789fd601 [Indic] Sort pre-base reordering consonants with post-forms
Before, we were marking them as below-form for initial reordering.
However, there is a rule that says "post consonants should follow
below consonsnts" for base determination purposes.  Malayalam has
port-form YA/VA, and RA is pre-base.  As such, for a sequence like
YA,Virama,YA,Virama,RA, the correct base is at index 0.  But
because the code was seeing RA as a below-base, it was stopping at
the second YA as base, instead of jumping it as a post-base.

By treating prebase-reordering consonants like post-forms, this
is fixed.

MALAYALAM went down from 351 to 265.  Other numbers didn't change:

BENGALI: 353686 out of 354188 tests passed. 502 failed (0.141733%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048069 out of 1048334 tests passed. 265 failed (0.0252782%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-02-26 21:22:37 -05:00
Behdad Esfahbod
cfc507c543 [Indic-like] Disable automatic joiner handling for basic shaping features
Not for Arabic, but for Indic-like scripts.  ZWJ/ZWNJ have special
meanings in those scripts, so let font lookups take full control.

This undoes the regression caused by automatic-joiners handling
introduced two commits ago.

We only disable automatic joiner handling for the "basic shaping
features" of Indic, Myanmar, and SEAsian shapers.  The "presentation
forms" and other features are still applied with automatic-joiner
handling.

This change also changes the test suite failure statistics, such that
a few scripts show more "failures".  The most affected is Kannada.
However, upon inspection, we believe that in most, if not all, of the
new failures, we are producing results superior to Uniscribe.  Hard to
count those!

Here's an example of what is fixed by the recent joiner-handling
changes:

  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers, for future reference:

BENGALI: 353892 out of 354188 tests passed. 296 failed (0.0835714%)
DEVANAGARI: 707336 out of 707394 tests passed. 58 failed (0.00819911%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047983 out of 1048334 tests passed. 351 failed (0.0334817%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-02-14 13:10:54 -05:00
Behdad Esfahbod
ec5448667b Add hb_ot_map_feature_flags_t
Code cleanup.  No (intended) functional change.
2013-02-14 12:53:57 -05:00
Behdad Esfahbod
e7ffcfafb1 Clean-up add_bool_feature 2013-02-14 11:58:13 -05:00
Behdad Esfahbod
1f91c39677 Indent 2013-02-13 09:38:40 -05:00
Behdad Esfahbod
a0cb9f33ee [Indic] Improve base finding in final_reordering
Fixes 5 Malayalam failures!

MALAYALAM: 1048016 out of 1048334 tests passed. 318 failed (0.0303338%)
2013-02-13 09:26:55 -05:00
Behdad Esfahbod
f22b7e7778 [Indic] Track base position when reordering things
Ouch, how did things ever work without this?!  The added test that has a
dot-reph as well as a pre-base reordering Ra perfectly demonstrates the
bug (tested with Nirmala font from Win8 for example).  Testing suggests
that Win8 shaper has the *exact* same bug / behavior that we used to
have.  Odd.
2013-02-13 07:32:46 -05:00
Behdad Esfahbod
85c51ec2e1 [Indic] Fix Eyelash Ra with old Devanagari spec 2013-02-12 18:17:39 -05:00
Behdad Esfahbod
63e48bc33b [Indic] Apply 'blwf' before 'half'
This reverts 167b625d98.  It didn't
matter before, but that's going to change with next commit.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod
70d6565711 [Indic] Apply 'vatu' before 'cjct'
This essentially reverts 1d6846db9e,
but that commit is from way back when.  We should be better
following the spec order now again.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod
bab02d339f Rename HB_OT_INDIC_OPTIONS env var to HB_OPTIONS
The Myanmar shaper now respects the uniscribe-bug-compatibility
option too.
2013-02-12 15:26:45 -05:00
Behdad Esfahbod
3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod
568000274c Adjust mark advance-width zeroing logic for Myanmar
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.

We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:

  - For Indic, no explicit zeroing happens whatsoever, which
    is the same as before,

  - For Myanmar, zero advance width of glyphs marked as marks
    *in GDEF*, and do that *before* applying GPOS.  This seems
    to be what the new Win8 Myanmar shaper does,

  - For everything else, zero advance width of glyphs that are
    from General_Category=Mn Unicode characters, and do so
    before applying GPOS.  This seems to be what Uniscribe does
    for Latin at least.

With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'.  See preivous
commit.
2013-02-12 09:44:57 -05:00
Behdad Esfahbod
98628cac9f Add Win8-style Myanmar shaper
Myanmar failures down from 51% to 0.00204648%!

MYANMAR: 1123860 out of 1123883 tests passed. 23 failed (0.00204648%)
2013-02-11 14:20:08 -05:00
Behdad Esfahbod
1df5644958 Minor 2013-02-11 14:18:09 -05:00
Behdad Esfahbod
9621e0ba29 [Indic] Fix bug introduced in 8b217f5ac5
Was breaking reph formation logic when the Ra is the only consonant.
Devanagari regression fixed.  Down to 57 failures again.  Ouch.
2013-02-11 12:59:36 -05:00
Behdad Esfahbod
ecd454b3cd [Indic] In old-spec shaping, don't move viramas around if seq ends with one
For example: u0c9a u0ccd u0c9a u0ccd with Lohit.  See:

https://bugs.freedesktop.org/show_bug.cgi?id=59118
2013-01-08 18:09:46 -06:00
Behdad Esfahbod
596740db04 [Indic] Insert dottedcircle after a lone Malayalam dot-reph 2012-12-21 19:41:04 -05:00
Behdad Esfahbod
8b217f5ac5 [Indic] Reorder Malayalam dot-reph to after base
Test sequence is simple: U+0D4E,U+0D15.  The doth-reph should be
reordered to after the Ka.

https://bugzilla.redhat.com/show_bug.cgi?id=799565
2012-12-21 15:49:26 -05:00
Behdad Esfahbod
742c4ee97e Minor 2012-12-21 15:35:03 -05:00
Behdad Esfahbod
b71b0bd9ee [Indic] Add link to Sinhala split matra section of the Sinhala spec 2012-12-05 19:20:31 -05:00
Behdad Esfahbod
0beb66e3a6 Fix warnings 2012-12-05 19:14:28 -05:00