Commit Graph

1750 Commits

Author SHA1 Message Date
Behdad Esfahbod
f7e8dcfd4f [Indic] Unbreak Devanagari
And this concludes the HarfBuzz Massala Hackfest.

I'd like to specially thank Jonathan Kew for doing all the decryption and
letting me get commit points.
2012-05-11 22:01:33 +02:00
Behdad Esfahbod
6a091df9b4 [Indic] Disambiguate sub vs post vs above matras
Bengali is at *just* above 5% now.
2012-05-11 21:42:27 +02:00
Behdad Esfahbod
9d0d319a4a [Indic] Position Bengali Reph before matras 2012-05-11 21:36:32 +02:00
Behdad Esfahbod
f893672511 [Indic] Start categorizing Reph per script 2012-05-11 21:10:03 +02:00
Behdad Esfahbod
a913b024d8 [Indic] Apply 'init' feature for Bengali
Error down from 20% to 7%.
2012-05-11 20:59:26 +02:00
Behdad Esfahbod
eed903b164 [Indic] Refactor for the arrival of 'init' feature
Yep, on Bengali now!
2012-05-11 20:50:53 +02:00
Behdad Esfahbod
18c06e189b [Indic] Add Uniscribe bug feature for dotted circle
For dotted-circle independent clusters, Uniscribe does no Reph shaping
for the exact sequence Ra+Halant+25CC.  Which also is the only possible
sequence with 25CC at the end.
2012-05-11 20:02:14 +02:00
Behdad Esfahbod
0831061efb [Indic] Refactoring 2012-05-11 19:07:58 +02:00
Behdad Esfahbod
7ea58db311 Minor 2012-05-11 18:58:57 +02:00
Behdad Esfahbod
9c09928989 [Indic] Allow multiple Consonants in Vowel/NBSP syllables
Uniscribe allows multiple Halant+Consonant after a Vowel.
Tests:
        * U+0905,U+094D,U+092B,U+094D,930,94d,930
2012-05-11 18:46:35 +02:00
Behdad Esfahbod
8c0aa486f3 [Indic] Allow two Nuktas per consonant
Uniscribe allows up to two nuktas per consonant and one per matra. It does so
independent of whether the consonant already has a nukta in it.  Tests:

        * U+0916,U+093C,U+0941
        * U+0959,U+093C,U+0941
        * U+0916,U+093C,U+093C,U+0941
        * U+0959,U+093C,U+093C,U+0941
        * U+0916,U+093C,U+093C,U+093C,U+0941
        * U+0959,U+093C,U+093C,U+093C,U+0941
        * 915,93c,93c,,94d,U+0916,U+093C,U+093C,U+093e,93c,93c
2012-05-11 18:13:42 +02:00
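Note: the test lists in these messages mix "U+0916" and bare "93c" notation and sometimes contain empty items. The following is a minimal Python sketch for turning such a list into code points and measuring the longest run of consecutive nuktas (U+093C). The function names are hypothetical and are not part of HarfBuzz.

```python
NUKTA = 0x093C  # DEVANAGARI SIGN NUKTA


def parse_sequence(text):
    """Parse a 'U+0916,93c,,94d' style list into integer code points.

    Items may carry a 'U+' prefix or be bare hex; empty items are skipped.
    """
    codepoints = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if item.upper().startswith("U+"):
            item = item[2:]
        codepoints.append(int(item, 16))
    return codepoints


def longest_nukta_run(codepoints):
    """Return the length of the longest run of consecutive nuktas."""
    longest = current = 0
    for cp in codepoints:
        current = current + 1 if cp == NUKTA else 0
        longest = max(longest, current)
    return longest
```

For example, longest_nukta_run(parse_sequence("U+0916,U+093C,U+093C,U+0941")) counts the two nuktas that follow U+0916.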
Behdad Esfahbod
3399a06e70 [Indic] Fix U+0952 and similar classification to match Uniscribe
See comments.
2012-05-11 17:54:26 +02:00
Behdad Esfahbod
11aa3ef18d [Indic] Treat U+0951..U+0954 all similar to U+0952 2012-05-11 17:30:48 +02:00
Behdad Esfahbod
5f131d3226 [GSUB/GPOS/Indic] Apply GSUB/GPOS within syllables only
This does not apply to the context matchings.

This regresses tests right now.  And we are not sure whether this is
the right thing to do for GPOS.  But we'll figure it out.
2012-05-11 17:29:40 +02:00
Behdad Esfahbod
8fd83aaf6e [GSUB/GPOS] Fix wrong buffer access in backward skippy mask matching 2012-05-11 17:18:37 +02:00
Behdad Esfahbod
ff24d1081a [Indic] Don't use syllable serial value 0 2012-05-11 17:07:08 +02:00
Behdad Esfahbod
892eb78782 [Indic] Implement Uniscribe Reph+Matra+Halant bug feature 2012-05-11 16:54:40 +02:00
Behdad Esfahbod
67ea29af49 [Indic] Add example of different Uniscribe behavior 2012-05-11 16:51:23 +02:00
Behdad Esfahbod
ebe29733d4 [Indic] Add runtime Uniscribe bug compatibility mode!
Enable by setting envvar:

  HB_OT_INDIC_OPTIONS=uniscribe-bug-compatible

Plus, LeftMatra+Halant "feature".
2012-05-11 16:43:12 +02:00
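Note: a runtime switch like this is typically read once from the environment and tested for a keyword. The sketch below shows that idea in Python. HarfBuzz itself is C/C++, and the substring check and the function name here are assumptions for illustration, not the library's actual code.

```python
import os


def uniscribe_bug_compatible(environ=None):
    """Return True if HB_OT_INDIC_OPTIONS requests Uniscribe bug compatibility.

    Assumption: the option is detected by a simple substring match on the
    variable's value. 'environ' may be passed explicitly for testing.
    """
    env = os.environ if environ is None else environ
    options = env.get("HB_OT_INDIC_OPTIONS", "")
    return "uniscribe-bug-compatible" in options
```

With the variable unset, the function returns False, so the default behavior stays unchanged.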
Behdad Esfahbod
616e692e29 [Indic] Add #define UNISCRIBE_BUG_COMPATIBLE 1 2012-05-11 16:25:02 +02:00
Behdad Esfahbod
6782bdae3b [Indic] Fix Left Matra + Halant reordering
As can be seen in: U+092B,U+093F,U+094D
2012-05-11 16:23:43 +02:00
Behdad Esfahbod
3c2ea9481b Minor 2012-05-11 16:23:38 +02:00
Behdad Esfahbod
203d71069c [GSUB/GPOS] Check all glyph masks when matching input 2012-05-11 16:01:44 +02:00
Behdad Esfahbod
668c6046c1 [Indic] Apply Reph mask to all POS_REPH glyphs
Needed for upcoming changes to GSUB/GPOS mask matching.
2012-05-11 15:34:13 +02:00
Behdad Esfahbod
4be46bade2 [Indic] Fix state machine to backtrack 2012-05-11 14:39:01 +02:00
Behdad Esfahbod
cee7187447 [Indic] Move syllable tracking from Indic to generic layer
This is to incorporate it into GSUB/GPOS processing.
2012-05-11 11:41:39 +02:00
Behdad Esfahbod
3bf27a9f0e [Indic] Disable conjuncts when a ZWJ happens
Not that the code makes any difference since the presence of ZWJ itself
causes the ligature to fail to match anyway.
2012-05-11 11:17:23 +02:00
Behdad Esfahbod
c6d904d67d [Indic] Fix bitops typo!
Another 1000 down!
2012-05-11 11:07:40 +02:00
Behdad Esfahbod
55fe2cf79b Make APPLY debug output print current index and codepoint
Yay!
2012-05-11 03:56:33 +02:00
Behdad Esfahbod
7bd2b04fea Minor 2012-05-11 03:40:58 +02:00
Behdad Esfahbod
cf26510dbb Some more...
Done.  I promise.
2012-05-11 03:35:08 +02:00
Behdad Esfahbod
9659523ca3 More beauty in debug output! 2012-05-11 03:33:36 +02:00
Behdad Esfahbod
cf26e88a5a Finish off debug output beautification 2012-05-11 03:16:57 +02:00
Behdad Esfahbod
d7bba01a35 Only print class name in debug output if there's one available 2012-05-11 02:46:26 +02:00
Behdad Esfahbod
85f73fa8da Only printout class name in tracing, if one is available
Makes debug output much more pleasant.
2012-05-11 02:40:42 +02:00
Behdad Esfahbod
98619ce4fa Minor 2012-05-11 02:34:06 +02:00
Behdad Esfahbod
acea183e98 Add return annotation for APPLY 2012-05-11 02:33:11 +02:00
Behdad Esfahbod
5ccfe8e215 /Minor/ 2012-05-11 02:19:41 +02:00
Behdad Esfahbod
0ab8c86217 Annotate SANITIZE return values
More to come, for APPLY, CLOSURE, etc.
2012-05-11 02:11:52 +02:00
Behdad Esfahbod
829e814ff3 Minor 2012-05-11 00:52:16 +02:00
Behdad Esfahbod
6eec6f406d Code reshuffling 2012-05-11 00:50:38 +02:00
Behdad Esfahbod
1e08830b4f Beautify debug output 2012-05-11 00:43:57 +02:00
Behdad Esfahbod
6f45538017 More massaging trace messaging 2012-05-10 23:24:43 +02:00
Behdad Esfahbod
b5fa37cb69 Minor 2012-05-10 23:09:48 +02:00
Behdad Esfahbod
208109703c Better trace message support infrastructure
We have varargs in the trace interface now.  To be used soon...
2012-05-10 23:06:58 +02:00
Behdad Esfahbod
02b2922fbf [Indic] Towards better Reph positioning
Fixed for Deva cases with two full-form consonants.  Failures **way** down.
Not much left to go :-).
2012-05-10 21:44:50 +02:00
Behdad Esfahbod
74e54cf446 [Indic] Add Ra back for scripts without Reph
We now check that the 'rphp' table exists before forming Reph, so
we don't need to comment out Ra for those scripts.
2012-05-10 21:22:58 +02:00
Behdad Esfahbod
2b70df5cc0 [Indic] Add note re Uniscribe clusters 2012-05-10 18:38:22 +02:00
Behdad Esfahbod
21d2803133 [Indic] Do clustering like Uniscribe does
Hindi Wikipedia failures down to 6639 (0.938381%)!
2012-05-10 18:34:34 +02:00
Behdad Esfahbod
8df5636968 [Indic] Reorder Reph to before the Halant after Matras
Uniscribe doesn't do it, but we want to, as it gives the Reph the
opportunity to interact with the Matras.  Test with mangal for example.
Sequence: <0930,094d,0915,094b,094d>
In test suite already.
2012-05-10 15:41:04 +02:00
Behdad Esfahbod
daf3234bdc [Indic] Don't clear the mask for Reph
This was removing the mandatory global 1 bit in the mask and hence
disabling GPOS for Reph!
2012-05-10 15:28:27 +02:00
Behdad Esfahbod
7708ee23cb [Indic] Improve Left Matra repositioning
Move its dependents too.
2012-05-10 14:48:25 +02:00
Behdad Esfahbod
dbb105883c [Indic] Do Reph repositioning in final reordering like the spec says
This introduced a failure, which we tracked down to a test case like this:

  U+092E,U+094B,U+094D,U+0930

The final character is a Ra that should be put in a syllable of its
own.  And we do.  But it will interact with the Halant before it.  So
now we finally are convinced that we have to limit features to syllable
boundaries.  That's coming after lunch!
2012-05-10 13:45:52 +02:00
Behdad Esfahbod
4705a70269 Minor 2012-05-10 13:09:08 +02:00
Behdad Esfahbod
4ac9e98d9d [Indic] Reorder left matras to be closer to base 2012-05-10 12:53:53 +02:00
Behdad Esfahbod
1a1fa8c655 [Indic] Treat the standalone cluster case reusing the consonant logic 2012-05-10 12:21:30 +02:00
Behdad Esfahbod
190eb31a16 [Indic] Minor 2012-05-10 12:21:30 +02:00
Behdad Esfahbod
c5306b6861 [Indic] Handle Vowel syllables
Reusing the consonant logic!
2012-05-10 12:21:30 +02:00
Behdad Esfahbod
6d8e0cb74c [Indic] Simplify Reph logic 2012-05-10 11:41:51 +02:00
Behdad Esfahbod
3d25079f8d [Indic] Don't form Reph if Ra is the only consonant in the syllable 2012-05-10 11:37:42 +02:00
Behdad Esfahbod
b99d63ae11 [Indic] Increase max syllable length
20 was way too low, one could hit a syllable with 7ish consonants with it.
2012-05-10 11:32:52 +02:00
Behdad Esfahbod
a391ff50b9 [Indic] Adjust base after sorting 2012-05-10 11:31:20 +02:00
Behdad Esfahbod
d3637edb24 [Indic] Don't return for long syllables. Just not sort. 2012-05-10 10:51:38 +02:00
Behdad Esfahbod
dfa0cade7f Fix Uniscribe clusters with multiple items 2012-05-09 19:10:07 +02:00
Behdad Esfahbod
86e5dd386a [Indic] Don't give up syllable parsing upon junk 2012-05-09 18:57:37 +02:00
Behdad Esfahbod
ef24cc8c8e [Indic] Towards multi-cluster syllables and final reordering 2012-05-09 18:10:20 +02:00
Behdad Esfahbod
a9844d41c6 Combine lig_id and lig_comp into one byte, to free up one for Indic 2012-05-09 17:53:13 +02:00
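Note: packing two small counters into one byte is usually done with shifts and masks. The commit message does not state the bit layout, so the 4-bit/4-bit split below is purely an assumption, and the function names are hypothetical.

```python
LIG_COMP_BITS = 4                       # assumed width of the component field
LIG_COMP_MASK = (1 << LIG_COMP_BITS) - 1


def pack_lig_props(lig_id, lig_comp):
    """Pack a ligature id and a component index into a single byte."""
    if not 0 <= lig_id < 16 or not 0 <= lig_comp < 16:
        raise ValueError("each field must fit in 4 bits")
    return (lig_id << LIG_COMP_BITS) | lig_comp


def unpack_lig_props(byte):
    """Split a packed byte back into (lig_id, lig_comp)."""
    return byte >> LIG_COMP_BITS, byte & LIG_COMP_MASK
```

The trade-off of any such layout is a smaller range per field in exchange for freeing a byte for other per-glyph data.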
Behdad Esfahbod
92332e5116 Minor 2012-05-09 17:40:00 +02:00
Behdad Esfahbod
dbccf87eef [Indic] Make room for more reordering positions 2012-05-09 17:24:39 +02:00
Behdad Esfahbod
d4480ace7f [Indic] Improve matra vs consonant ordering
Another 1.5% down.
2012-05-09 15:59:47 +02:00
Behdad Esfahbod
33c92e7695 [Indic] Categorize Anudatta 2012-05-09 15:41:51 +02:00
Behdad Esfahbod
19d984edaa [Indic] Make sure Reph jumps over all matras to the right
Another 12 thousand failures gone! (78 to go)
2012-05-09 15:21:13 +02:00
Behdad Esfahbod
9034641333 [Indic] Keep Vedic signs at the right too 2012-05-09 15:04:58 +02:00
Behdad Esfahbod
d1deaa2f5b Replace zerowidth invisible chars with a zero-advance space glyph
Like Uniscribe does.
2012-05-09 15:04:13 +02:00
Behdad Esfahbod
49e5da1591 [indic] Keep the syllable modifier marks to the right
Shaping failures on Hindi Wikipedia go down from 25% to 14%!
2012-05-09 13:23:27 +02:00
Behdad Esfahbod
5b12609093 Minor 2012-05-09 12:37:27 +02:00
Behdad Esfahbod
9ce939232b Minor 2012-05-09 12:03:09 +02:00
Behdad Esfahbod
76b3409de6 [indic] Better Reph matching 2012-05-09 11:52:32 +02:00
Behdad Esfahbod
df6d45c693 Minor 2012-05-09 11:38:31 +02:00
Behdad Esfahbod
412b91889d [indic] Apply Indic features in order 2012-05-09 11:07:18 +02:00
Behdad Esfahbod
1ac075b227 [indic] Apply rakaar forms
Fixes 10% of the failures against all of Hindi Wikipedia!
2012-05-09 11:06:47 +02:00
Behdad Esfahbod
1a2a4a0078 Fix warning and build issues
As reported by Jonathan Kew on the list.
2012-05-05 22:38:20 +02:00
Behdad Esfahbod
a5e39fed85 Minor 2012-04-25 00:14:46 -04:00
Behdad Esfahbod
1827dc208c Add hb_ot_shape_glyphs_closure()
Experimental API for now.
2012-04-24 16:56:37 -04:00
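Note: a glyph closure is commonly understood as the set of glyphs reachable from a starting set by applying substitutions repeatedly. The sketch below illustrates that fixed-point idea with a toy rule list. It does not use or describe the HarfBuzz API, and the rule representation is a hypothetical simplification.

```python
def glyph_closure(initial, substitutions):
    """Compute the set of glyphs reachable from 'initial'.

    'substitutions' is a list of (inputs, outputs) pairs: a rule can fire
    once all of its input glyphs are in the set, adding its outputs.
    Iterates until no rule adds anything new.
    """
    closure = set(initial)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in substitutions:
            if set(inputs) <= closure and not set(outputs) <= closure:
                closure |= set(outputs)
                changed = True
    return closure
```

A ligature rule such as (("f", "i"), ("fi",)) adds "fi" only when both "f" and "i" are already in the set.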
Behdad Esfahbod
bb09f0ec10 Minor 2012-04-24 16:02:12 -04:00
Behdad Esfahbod
29a7e306e3 Minor 2012-04-24 16:01:30 -04:00
Behdad Esfahbod
6c6ccaf575 Add a few more set operations
TODO: Tests for hb_set_t.
2012-04-24 14:23:01 -04:00
Behdad Esfahbod
5caece67ab Make closure() return void 2012-04-23 23:03:12 -04:00
Behdad Esfahbod
0b08adb353 Add hb_set_t 2012-04-23 22:44:59 -04:00
Behdad Esfahbod
5b93e8d94f Update copyright headers 2012-04-23 22:26:27 -04:00
Behdad Esfahbod
6a9be5bd35 Rename hb_glyph_map_t to hb_set_t 2012-04-23 22:23:17 -04:00
Behdad Esfahbod
a4385f0b0a Improve clustering 2012-04-23 22:20:14 -04:00
Behdad Esfahbod
8e3715f8a1 Minor 2012-04-23 22:18:54 -04:00
Behdad Esfahbod
d2984a241e Add map->substitute_closure() 2012-04-23 17:21:14 -04:00
Behdad Esfahbod
31081f7390 Implement closure() for Context and ChainContext lookups 2012-04-23 16:54:58 -04:00
Behdad Esfahbod
c64ddab3c3 Flesh out closure() for GSUB
The GSUBGPOS part still missing.
2012-04-23 15:28:35 -04:00
Behdad Esfahbod
0da132bde4 Fix Coverage iters 2012-04-23 14:21:33 -04:00
Behdad Esfahbod
3e32cd9570 Minor 2012-04-23 13:22:50 -04:00
Behdad Esfahbod
650ac00da3 Minor refactoring 2012-04-23 13:17:09 -04:00
Behdad Esfahbod
f94b0aa646 Add "closure" operation stubs to GSUB
Filling in.
2012-04-23 13:04:38 -04:00
Behdad Esfahbod
7d50d50263 Add Coverage iterators 2012-04-23 13:04:05 -04:00
Behdad Esfahbod
3ed4634ec3 Add Indic inspection tool 2012-04-19 22:35:01 -04:00
Behdad Esfahbod
a06411ecf9 Minor matra renumbering
Should have no visible effect.
2012-04-19 22:28:25 -04:00
Behdad Esfahbod
36608941f3 Add GSUB "would_apply" API
To be used in the Indic shaper later.  Unused for now.
2012-04-19 22:21:38 -04:00
Behdad Esfahbod
a5e40542ab Make font immutable in hb_shape() 2012-04-17 12:37:19 -04:00
Behdad Esfahbod
3cde23664f Minor note re Graphite 2012-04-17 11:44:49 -04:00
Behdad Esfahbod
4dc2449d92 Fix leak in graphite 2012-04-17 11:39:48 -04:00
Behdad Esfahbod
9ceca3aeb1 Fix ragel regexp in vowel-based syllable
As reported by datao zhang on the mailing list.
2012-04-16 21:05:51 -04:00
Behdad Esfahbod
b870afcd1b Rewrite ragel expression to better match the one on MS spec
https://www.microsoft.com/typography/otfntdev/devanot/shaping.aspx
2012-04-16 21:05:11 -04:00
Behdad Esfahbod
a5f1834f57 Apply 'liga' for vertical writing mode too
Apparently that's what Kazuraki uses to form vertical ligatures,
which suggests that it's what Adobe does.
2012-04-16 15:55:13 -04:00
Behdad Esfahbod
e74616b889 Add comment 2012-04-15 14:12:13 -04:00
Behdad Esfahbod
683b503f30 Minor 2012-04-14 20:47:14 -04:00
Behdad Esfahbod
b9f199c8e3 Move code around 2012-04-14 20:25:37 -04:00
Behdad Esfahbod
38a83019e6 Minor 2012-04-14 19:40:18 -04:00
Behdad Esfahbod
d4adade217 Add assert 2012-04-14 19:23:17 -04:00
Behdad Esfahbod
fe28b997fb Add HB_DIRECTION_IS_VALID 2012-04-14 19:19:26 -04:00
Behdad Esfahbod
5e88aa6682 Remove public enum names again
As was reported to me, glib-mkenum does not understand named enums,
so remove for now.
2012-04-14 18:51:50 -04:00
Behdad Esfahbod
4bf90f6483 Make HB_DIRECTION_INVALID be zero
This changes all the HB_DIRECTION_* enum member values, but is
nicer, in preparation for making hb_segment_properties_t public.
2012-04-12 17:38:23 -04:00
Behdad Esfahbod
6bd9b479b8 Hide backend-specific shape functions
Also remove shaper_options argument to hb_shape_full().  That was
unused and for "future".  Let it go.

More shaper API coming in preparation for plan/planned API.
2012-04-12 14:53:53 -04:00
Behdad Esfahbod
c6035cf802 Add names to enums
gdb was showing <anonymous enum> instead of useful stuff, so name
all our enums.
2012-04-12 13:23:59 -04:00
Behdad Esfahbod
d1c9eb458c Make it an error to include non-top-level headers
Users should #include <hb.h> (or hb-ft.h, hb-glib.h, etc), but
never things like hb-shape.h directly.  This makes it easier to
refactor headers later on without breaking compatibility.
2012-04-12 13:17:44 -04:00
Behdad Esfahbod
323190c27b Minor 2012-04-12 12:29:10 -04:00
Behdad Esfahbod
0e3361464b Fix bug with not setting Unicode props of the first character
Fixes Mongolian shaping issue:
https://bugs.freedesktop.org/show_bug.cgi?id=45695
2012-04-12 10:06:52 -04:00
Behdad Esfahbod
c65662b71e Fix left-matra positioning in Indic
Fixes 200 failures out of previous 4290 cases in the OO.o Indic
dictionary (of ~16000 entries).
2012-04-12 09:31:55 -04:00
Behdad Esfahbod
029a82d81d [hangul] Apply *jmo features to all Hangul chars
This is what old HB does.  Moreover, fixes rendering with Win8 malgun
font.  The Win7 version doesn't compose with either Uniscribe or HB,
but Win8 version works as expected, like Uniscribe, with this change.

Let's call Hangul done for now.
2012-04-11 22:00:46 -04:00
Behdad Esfahbod
41ae674f68 Don't create hb_apply_context_t per glyph!
I couldn't measure significant performance gains out of this; maybe
about 5% (with one million Malayalam strings).  Still, not bad.
But reminds me that optimizing this codebase without profiling first
is simply not going to work.  Oh well...
2012-04-11 17:13:50 -04:00
Behdad Esfahbod
4a1e02ef79 Fix shape to presentation forms font check
As reported by Jonathan Kew on the list.
2012-04-11 14:37:53 -04:00
Behdad Esfahbod
6062f5f014 Fix build with some compilers
As reported by Jonathan Kew on the list.
2012-04-11 14:19:55 -04:00
Behdad Esfahbod
acd88e659f In Arabic fallback shaping, check that the font has glyph for new char 2012-04-10 18:02:20 -04:00
Behdad Esfahbod
7752aa73e7 Minor 2012-04-10 17:22:14 -04:00
Behdad Esfahbod
939c010211 Implement Arabic fallback shaping mandatory ligatures 2012-04-10 17:20:05 -04:00
Behdad Esfahbod
b7d04eb606 Do Arabic fallback shaping 2012-04-10 16:44:38 -04:00
Behdad Esfahbod
ae4a2b9365 Generate fallback Arabic shaping table
Not hooked up yet.
2012-04-10 16:25:08 -04:00
Behdad Esfahbod
3b26f96ebe Add Thai shaper that does SARA AM decomposition / reordering
That's not in the OpenType spec, but it's what MS and Adobe do.
2012-04-10 10:52:07 -04:00
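Note: SARA AM (U+0E33) has the Unicode compatibility decomposition NIKHAHIT (U+0E4D) + SARA AA (U+0E32). A simplified Python sketch of the decomposition and reordering follows. Treating only U+0E48..U+0E4B as the marks that NIKHAHIT moves in front of is an assumption made for this illustration, and the function name is hypothetical.

```python
SARA_AM, NIKHAHIT, SARA_AA = 0x0E33, 0x0E4D, 0x0E32
TONE_MARKS = range(0x0E48, 0x0E4C)  # assumed set: MAI EK .. MAI CHATTAWA


def decompose_sara_am(codepoints):
    """Replace each SARA AM with NIKHAHIT + SARA AA.

    NIKHAHIT is inserted before any tone marks that immediately precede
    the SARA AM, so it ends up adjacent to the base character.
    """
    out = []
    for cp in codepoints:
        if cp != SARA_AM:
            out.append(cp)
            continue
        i = len(out)
        while i > 0 and out[i - 1] in TONE_MARKS:
            i -= 1
        out.insert(i, NIKHAHIT)
        out.append(SARA_AA)
    return out
```

For the input U+0E19, U+0E49, U+0E33 this yields U+0E19, U+0E4D, U+0E49, U+0E32.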
Behdad Esfahbod
d4cc44716c Move code around, in prep for Thai/Lao shaper 2012-04-07 21:52:28 -04:00
Behdad Esfahbod
c9a841f445 Add simple Hangul shaper that recomposes Jamo when feasible
Previously, we were NOT actually recomposing Hangul Jamo.  We do now.
The two lines in:

test/shaping/texts/in-tree/shaper-default/script-hangul/misc/misc.txt

Now render the same with the UnDotum.ttf font.  Previously the second
line was rendering boxes.

We can also start applying OpenType Jamo features later.  At this time,
I have no idea how the 'ljmo', 'vjmo', 'tjmo' features are supposed to
work.  Maybe someone can explain them to me?
2012-04-07 15:06:55 -04:00
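Note: recomposing conjoining Jamo into a precomposed syllable follows the arithmetic defined in the Unicode Standard. The Python sketch below shows that arithmetic only. It is not the shaper's code, and the function name is hypothetical.

```python
# Constants from the Unicode Standard's Hangul syllable composition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(l, v, t=None):
    """Compose leading (L), vowel (V) and optional trailing (T) Jamo.

    Returns the precomposed syllable code point, or None if any input is
    outside the composable Jamo ranges.
    """
    l_index = l - L_BASE
    v_index = v - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return None
    if t is None:
        t_index = 0
    else:
        t_index = t - T_BASE
        if not 0 < t_index < T_COUNT:
            return None
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index
```

For example, U+1112 + U+1161 + U+11AB composes to U+D55C.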
Behdad Esfahbod
9683184553 Implement normalization mode HB_OT_SHAPE_NORMALIZATION_MODE_COMPOSED_FULL
In this mode we try composing CCC=0 with CCC=0 characters.  Useful for
Hangul.
2012-04-07 15:06:47 -04:00
Behdad Esfahbod
bec2ac4fde Bring normalization algorithm closer to the spec
No logical difference so far.
2012-04-07 14:51:17 -04:00
Behdad Esfahbod
e02d925786 Flip logic around 2012-04-07 14:49:13 -04:00
Behdad Esfahbod
11138ccff7 Add normalize mode
In preparation for Hangul shaper.
2012-04-05 17:25:19 -04:00
Behdad Esfahbod
6769f21d57 More moving code around 2012-04-05 16:46:46 -04:00
Behdad Esfahbod
2db2a56682 Move code around 2012-04-05 16:40:37 -04:00
Behdad Esfahbod
cad3821f3d More sorting by Unicode version
This is the most convenient way to browse scripts.
2012-03-07 17:13:25 -05:00
Behdad Esfahbod
317b9504d7 Minor 2012-03-07 16:51:29 -05:00
Behdad Esfahbod
fa2673c1ee More Unicode script age annotation, and a couple more RTL scripts
Cross-checked with Mark Davis's spreadsheet at http://goo.gl/x9ilM
2012-03-07 15:52:02 -05:00
Behdad Esfahbod
6d4016f1ba Make src tests pass again 2012-03-07 15:33:14 -05:00
Behdad Esfahbod
7da435f08c Separate Unicode 3.1 and Unicode 3.2 additions 2012-03-07 15:20:20 -05:00
Behdad Esfahbod
f91136cb52 Route three Unicode 6.1 scripts through Indic shaper 2012-03-07 12:56:22 -05:00
Behdad Esfahbod
f32c0012ad Add Unicode 6.1.0 scripts 2012-03-07 12:53:34 -05:00
Behdad Esfahbod
50e810cd0e Lydian and Kharoshthi are right-to-left 2012-03-07 12:49:08 -05:00