Commit Graph

501 Commits

Author SHA1 Message Date
Behdad Esfahbod
09cbeb2246 Make bots happy
Fixes https://github.com/behdad/harfbuzz/issues/551
2017-10-03 13:22:07 +02:00
Behdad Esfahbod
1633513996 Add test for U+0A51
New Indic numbers are:

BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
GURMUKHI: 60729 out of 60747 tests passed. 18 failed (0.0296311%)
KANNADA: 951201 out of 951913 tests passed. 712 failed (0.0747968%)
KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)

Before 71c0a1429d GURMUKHI used to be at 15,
because Uniscribe seems to allow this character standalone, but that looks
wrong.
2017-10-02 20:28:56 +02:00
Behdad Esfahbod
8b2c94c43f Tweak ligature component matching for ligature formation
If two marks want to ligate and they belong to different components of the
same ligature glyph, and said ligature glyph is to be ignored according to
mark-filtering rules, then allow.

Example Burmese senquence:

  U+1004,U+103A,U+1039,U+101B,U+103D,U+102D

Test font provided by Norbert Lindenberg.

Fixes https://github.com/behdad/harfbuzz/issues/545
2017-10-02 20:03:35 +02:00
Behdad Esfahbod
71c0a1429d [indic] Fix shaping of U+0A51
Mark it as matra below to allow the sequence U+0A15, U+0A51, U+0A47.
Oh well...

Fixes https://github.com/behdad/harfbuzz/issues/524
2017-10-02 18:57:03 +02:00
Behdad Esfahbod
cc79b666bc [indic] Add test for 1a0a356a0f
https://github.com/behdad/harfbuzz/issues/538
2017-10-02 09:19:15 -04:00
Behdad Esfahbod
61a9d7e6d0 Minor 2017-09-04 19:48:52 -07:00
Behdad Esfahbod
03a5a6f873 [util] Add --unicodes to hb-view / hb-shape
Fixes https://github.com/behdad/harfbuzz/issues/154
2017-09-01 19:12:22 -07:00
Behdad Esfahbod
0e5b475d98 Minor 2017-09-01 18:28:47 -07:00
Behdad Esfahbod
3e1fc6d18b Minor 2017-09-01 10:46:48 -07:00
Behdad Esfahbod
04f009f848 Add test accidentally removed in previous commit 2017-09-01 10:38:25 -07:00
Behdad Esfahbod
06cb162cd7 [indic] Treat Consonant_With_Stacker as consonant
Fixes https://github.com/behdad/harfbuzz/issues/528
"Kannada JIHVAMULIYA and UPADHMANIYA insert dotted circles"
2017-09-01 10:34:56 -07:00
Behdad Esfahbod
099472e08b hb_buffer_diff() tweak
I like to have a mode where CONTAINS_NOTDEF and CONTAINS_DOTTEDCIRCLE are not
returned.  Abused a value of -1 for that.  hb-shape now uses it.  Fixes two
of the six tests failing with --verify in test/shaping/run-tests.sh.
2017-08-30 16:45:06 -07:00
Behdad Esfahbod
4387b059a0 [test] Add --verify to hb-shape
Disabled for now.  Will enable and fix failures after next release.
2017-08-23 14:35:58 -07:00
Behdad Esfahbod
3b54d0337e Add tests for 'avar' fix 5dc30451b8 2017-08-08 18:37:03 -07:00
Behdad Esfahbod
c1432bce3c [arabic] Adjust feature order again
Fixes https://github.com/behdad/harfbuzz/issues/505
2017-07-14 17:35:17 +01:00
Behdad Esfahbod
9dd29c681e [use] Allow up to two medial-below letters
Fixes https://github.com/behdad/harfbuzz/issues/376
2017-07-14 17:01:27 +01:00
Behdad Esfahbod
216b003c91 [use] Fix shaping of U+AA29 CHAM VOWEL SIGN AA
Part of https://github.com/behdad/harfbuzz/issues/376
Also see https://github.com/roozbehp/unicode-data/issues/6

Test added, using NotoSansCham built from Noto Phase III sources.
2017-07-14 16:38:51 +01:00
Behdad Esfahbod
3cc84f45b9 [indic] Fix https://github.com/behdad/harfbuzz/issues/478 2017-07-14 15:50:22 +01:00
Behdad Esfahbod
e359a4b8f5 [indic] Disable automatic ZWNJ handling for Indic features
Fixes https://github.com/behdad/harfbuzz/issues/294

Also fixes a bunch of other Indic issues.  Test results after:

BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951201 out of 951913 tests passed. 712 failed (0.0747968%)
KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)

Before:

BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2017-07-14 14:22:52 +01:00
Dominik Schlösser
3a73e0d5e1 Shaping tests for Tibetan vowels (#446)
* Shaping tests for Tibetan vowels

* Test-cases for the Dzongkha contractions with multiple vowel-signs added.

* going to be removed

* Extended contraction-test-cases to all test cases in contractions.txt that actually use multiple-vowels (113 cases)
2017-07-14 12:14:55 +01:00
Khaled Hosny
06cfe3f736 Do not skip TAG characters in glyph substitution (#487)
Hide them like Mongolian Free Variation Selectors instead.

Fixes https://github.com/behdad/harfbuzz/issues/463
2017-05-17 11:32:47 -07:00
Dominik Schlösser
c42869eb71 Small doc fix: make check runs the tests (#469) 2017-04-15 12:17:05 -07:00
Dominik Schloesser
8d256841ca Current fonttools (3.9.1) generate subset-file called font.subset.ttf instead of older font.ttf.subset 2017-03-27 12:05:35 +02:00
Dominik Schloesser
c2a9de15f5 Updated samples: record-it.sh is now record-test.sh 2017-03-27 12:05:35 +02:00
Khaled Hosny
f2e6c7ce51 [tools] Make hb-unicode-code work with Python 3
Related to https://github.com/behdad/harfbuzz/pull/445
2017-03-26 10:50:32 +02:00
Behdad Esfahbod
47e7a1800f Revert "Fix Context lookup application when moving back after a glyph delete"
This reverts commit b9b005f3a4.

This introduced invalid access cases. Revert until I fix correctly.
2017-03-10 13:23:02 -08:00
Behdad Esfahbod
b9b005f3a4 Fix Context lookup application when moving back after a glyph delete
This was broken forever, since days that we did not allow moving
tape backwards. Works now. Reported by Doug Felt.
2017-03-01 14:27:23 -08:00
Behdad Esfahbod
a11501444c Add few tests found by libFuzzer and oss-fuzz 2017-02-25 13:37:54 -08:00
jfkthame
44f7d6ecde Guard against underflow when adjusting length (#421)
* Guard against underflow when adjusting length

With the fuzz-testcase in mozilla bug 1295299, we end up with a recursed lookup that removes 3 items, when `match_positions[idx]` is 0, which results in (unsigned) `end` wrapping to a huge value.

Making `end` a signed int is probably the simplest route to a fix.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1295299.

* Add testcase for #421.
2017-02-16 19:03:24 -08:00
jfkthame
45766b673f [indic] Add support for Grantha marks that may be used in Tamil to th… (#401)
* [indic] Add support for Grantha marks that may be used in Tamil to the Indic table.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1331339.

Testcase: U+0BA4,U+0BC6,U+1133c,U+0BAA,U+1133c,U+0BC6,U+1133c

* [indic] Add test for Grantha nukta that is allowed in Tamil by ScriptExtensions.txt
2017-02-16 09:40:21 -08:00
Behdad Esfahbod
e888f642db Route Adlam through Arabic shaper
Fixes joined Adlam rendering.

Fixes https://github.com/googlei18n/noto-fonts/issues/828
2017-01-26 14:50:14 -08:00
Khaled Hosny
2452543fdd [ot] Fix automatic fraction for RTL scripts (#405)
The numbers for right-to-left scripts are processed also from right to
left, so the order of applying “numr” and “dnom” features should be
reversed in such case.

Fixes https://github.com/behdad/harfbuzz/issues/395
2017-01-18 12:48:13 -08:00
Behdad Esfahbod
31f7b1bb94 Add tests for USE using Marchen font and text
From http://www.babelstone.co.uk/Fonts/Marchen.html
2017-01-05 20:20:06 -08:00
Behdad Esfahbod
4b4a1b9f23 Fix assert fail with contextual matching
As discovered by libFuzzer / Chromium fuzzing.

Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=659496
CC https://github.com/behdad/harfbuzz/issues/139
2016-12-21 23:14:16 -06:00
Behdad Esfahbod
9f6144cdb9 [CBDT] Add test for fetching glyph extents 2016-12-04 19:55:17 -08:00
Behdad Esfahbod
d163cd9562 [tests] Add tests for vertical origin with ft and ot font-funcs 2016-10-26 18:27:48 +02:00
Behdad Esfahbod
91f2585411 Actually add test 2016-08-08 18:08:08 -07:00
Behdad Esfahbod
f1b76275da Add tests for Chinese language tags
Using font from https://github.com/behdad/harfbuzz/issues/300
2016-08-08 18:06:09 -07:00
Sascha Brawer
d34d3ac985 Support CPAL table 2016-06-19 13:17:57 +02:00
Khaled Hosny
988350586f [tests] Workaround Python 2 “narrow” builds
The so-called Python 2 “narrow” builds support UCS2 only, this is a
workaround to allow unichr to work with any Unicode character in such
builds. This fixes Travis-CI failure as it has narrow Python 2 builds.

Copied from:
https://github.com/behdad/fonttools/blob/master/Lib/fontTools/misc/py23.py
2016-06-18 23:01:58 +02:00
Behdad Esfahbod
6c0aa9e92b Fix build on droid.io 2016-05-06 17:50:53 +01:00
Behdad Esfahbod
9b6312f945 [use] Update to draft spec from Andrew Glass from August 2015 2016-05-06 17:41:49 +01:00
Behdad Esfahbod
30e6e29f0f [indic/use] Move Javanese from Indic shaper to USE
Fixes https://github.com/behdad/harfbuzz/issues/243

With javatext.ttf, the reodering medial Ra gets its advance width
zero'ed in Uniscribe implementation, and the font adds the advance
back.  Our Indic shaper does not do that, but USE does.  So, route
Javanese through USE.  That's what Microsoft does anyway.  Test:

  U+A9A5,U+A9BA

This also seems to fix the following sequence, and variations thereof:

  U+A99F,U+A9C0,U+A9A2,U+A9BF
2016-05-06 15:52:27 +01:00
Behdad Esfahbod
c6ee5f5f06 Add Javanese sample text 2016-05-06 15:39:02 +01:00
Behdad Esfahbod
f8061ae797 [tests] Fix test 2016-05-02 10:28:24 +02:00
Behdad Esfahbod
f00ab2a33a [hb-ot-font] Make 'glyf' table loading lazy
Apparently some clients have reference-table callbacks that copy the table.
As such, avoid loading 'glyf' table which is only needed if fallback positioning
happens.
2016-05-02 10:24:00 +02:00
Behdad Esfahbod
f68390f196 [test] Add text for Tibetan shorthand contractions
From http://www.babelstone.co.uk/Tibetan/Contractions.html
2016-04-27 02:44:35 -07:00
Behdad Esfahbod
b20305022a Do NOT ignore Mongolian Free Variation Selectors during matching
Fixes https://github.com/behdad/harfbuzz/issues/234
2016-04-26 16:41:17 -07:00
Behdad Esfahbod
6e55199b5f Add test for 6dd80faf0d 2016-04-04 16:24:27 -07:00
Behdad Esfahbod
af48e3d27c Fix recent test
Not sure why the FT functions were returning advance 1024.  This
caused failure on drone.io.  Switch to hb-ot-font and disable
glyph names.
2016-02-24 16:06:40 +09:00
Behdad Esfahbod
17c8317017 [tests] Fix for multiple options in test runner scripts 2016-02-24 16:06:23 +09:00
Behdad Esfahbod
ebd7431f82 Partially revert 86c68c7a2c
That commit moved the advance adjustment for mark positioning to
be applied immediately, instead of doing late before.  This breaks
if mark advances are zeroed late, like in Arabic.  Also, easier to
hit it in RTL scripts since a single mark with non-zero advance is
enough to hit the bug, whereas in LTR, at least two marks are needed.

This reopens https://github.com/behdad/harfbuzz/issues/211
The cursive+mark interaction is broken again.  To be fixed in a
different way.
2016-02-24 15:53:40 +09:00
Behdad Esfahbod
284481b312 Add test for mark positioning in rtl with non-zero mark advance
Apparently I broke this 86c68c7a2c.
Fix coming.
2016-02-24 15:52:37 +09:00
Behdad Esfahbod
56a84e8dd1 [tests] Allow commenting out tests to be skipped 2016-02-24 15:50:33 +09:00
Behdad Esfahbod
525cc7d28c Add note re only adding tests with Free Software fonts 2016-02-23 15:19:27 +09:00
Behdad Esfahbod
6a09d7e34b [test] Add README about how to add shaping tests 2016-02-23 13:47:16 +09:00
Behdad Esfahbod
f8ee7906d0 Remove MANIFEST files
They are unused currently.  We can add later if we hook them up
to anything useful.
2016-02-23 13:45:38 +09:00
Behdad Esfahbod
815bdd7700 In cluster-level=0, group ZWJ/ZWNJ with previous cluster
This better emulates Unicode grapheme clusters.

Note that Uniscribe does NOT do this, but should be harmless with most clients,
and improve fallback with clients that use HarfBuzz cluster as unit of fallback.

Fixes https://github.com/behdad/harfbuzz/issues/217
2016-02-22 18:22:44 +09:00
Behdad Esfahbod
c373155904 [fuzzing] Add test for recent fix
Test from https://github.com/behdad/harfbuzz/issues/223

I forgot that we do run hb-fuzzer on tests in shaping/tests/fuzzed.tests.
2016-02-19 15:13:07 +07:00
Behdad Esfahbod
da41e48f0a [USE] Zero mark advances by GDEF early
This is what Microsoft's implementation does.  Marks that need advance
need to add it back using 'dist' or other feature in GPOS.  Update tests to
match.
2016-02-16 17:16:33 +07:00
Behdad Esfahbod
86c68c7a2c [GPOS] Fix interaction of mark attachments and cursive chaining
Fixes https://github.com/behdad/harfbuzz/issues/211

What happens in that bug is that a mark is attached to base first,
then a second mark is cursive-chained to the first mark.  This only
"works" because it's in the Indic shaper where mark advances are
not zeroed.

Before, we didn't allow cursive to run on marks at all.  Fix that.
We also where updating mark major offsets at the end of GPOS, such
that changes in advance of base will not change the mark attachment
position.  That was superior to the alternative (which is what Uniscribe
does BTW), but made it hard to apply cursive to the mark after it
was positioned.  We could track major-direction offset changes and
apply that to cursive in the post process, but that's a much trickier
thing to do than the fix here, which is to immediately apply the
major-direction advance-width offsets...  Ie.:

https://github.com/behdad/harfbuzz/issues/211#issuecomment-183194739

If this breaks any fonts, the font should be fixed to do mark attachment
after all the advances are set up first (kerning, etc).

Finally, this, still doesn't make us match Uniscribe, for I explained
in that bug.  Looks like Uniscribe applies minor-direction cursive
adjustment immediate as well.  We don't, and we like it our way, at
least for now.  Eg. the sequence in the test case does this:

- The first subscript attaches with mark-to-base, moving in x only,
- The second subscript attaches with cursive attachment to first subscript
  moving in x only,
- A final context rule moves the first subscript up by 104 units.

The way we do, the final shift-up, also shifts up the second subscript
mark because it's cursively-attached.  Uniscribe doesn't.  We get:

[ttaorya=0+1307|casubscriptorya=0@-242,104+-231|casubscriptnarroworya=0@20,104+507]

while Uniscribe gets:

[ttaorya=0+1307|casubscriptorya=0@-242,104+-211|casubscriptnarroworya=0+487]

note the different y-offset of the last glyph.  In our view, after cursive,
things move together, period.
2016-02-16 16:07:20 +07:00
Behdad Esfahbod
5b5dc2c040 [tests] Add test for advance zeroing of an ASCII letter marked as mark in GDEF 2016-02-11 12:15:38 +07:00
Behdad Esfahbod
3fe0cf1040 Fix previous commit! 2016-02-10 18:43:43 +07:00
Behdad Esfahbod
293a210eee [tests] Fix fonts in cc4a78bf22
They had an invalid LookupFlag (32).
2016-02-10 18:39:59 +07:00
Behdad Esfahbod
cc4a78bf22 [tests] Add tests for Latin mark zeroing 2016-02-10 18:24:35 +07:00
Behdad Esfahbod
55ff34b9c1 [tests] Add tests for Thai mark zeroing 2016-02-10 18:24:32 +07:00
Behdad Esfahbod
43bb2b8fb0 Minor 2016-02-10 14:11:43 +07:00
Behdad Esfahbod
99d3495576 [test] Add test text for Kaithi 2016-01-06 12:21:54 +00:00
Behdad Esfahbod
6173c2a6fc Fix flaky test
This test font had a upem of 769, which results in rounding-related errors with
the FreeType font funcs.  Change the upem to 1024 to fix that.

Fixes https://github.com/behdad/harfbuzz/issues/201
2015-12-25 18:18:23 +01:00
Behdad Esfahbod
3fcae6d82d [tests] Add --reference, for re-recording tests 2015-12-25 18:18:02 +01:00
Behdad Esfahbod
2f02fc79a5 Improve ligature-component handling
We use three bits for lig_id these days, so we finally got a report of
two separate ligatures with the same lig_id happening adjacent to each
other, and then the component-handling code was breaking things.
Protect against that by ignoring same-lig-id but lig-comp=0 glyphs after
a new ligature.

Fixes https://github.com/behdad/harfbuzz/issues/198
2015-12-17 15:21:14 +00:00
Behdad Esfahbod
2ab0de9fbd [use] Fix halant detection
Before, we were just checking the use_category().  This detects as
halant a ligature that had the halant as first glyph (as seen in
NotoSansBalinese.)  Change that to use the is_ligated() glyph prop
bit.  The font is forming this ligature in ccmp, which is before
the rphf / pref tests.  So we need to make sure the "ligated" bit
survives those tests.  Since those only check the "substituted" bit,
we now only clear that bit for them and "ligated" survives.

Fixes https://github.com/behdad/harfbuzz/issues/180
2015-12-17 11:59:15 +00:00
Behdad Esfahbod
1c6a057dd1 Add tests for previous commit 2015-11-26 18:48:30 -05:00
Behdad Esfahbod
9cc1ed4fa6 Do not allow recursiving to same position and same lookup
This is just to make it harder to be extremely slow.  There definitely
are ways still, just harder.  Oh well... how do we tame this problem
without solving halting problem?!

Fixes https://github.com/behdad/harfbuzz/issues/174
2015-11-19 12:39:09 -08:00
Behdad Esfahbod
85062e3b46 Add tests for previous two commits
To fully test what these are supposed to test, they should be run
against libharfbuzz-fuzzing.la instead of libharfbuzz.la, but for
now just record the files.
2015-11-18 23:09:13 -08:00
Behdad Esfahbod
59821ab8b4 [arabic] Don't stretch over cased letters
Addresses
6e6f82b6f3 (commitcomment-14248516)
2015-11-06 16:27:44 -08:00
Behdad Esfahbod
5a7eb5d4d8 [fuzzing] Add test case for OOM
From https://github.com/behdad/harfbuzz/issues/161
2015-11-06 00:01:24 -08:00
Behdad Esfahbod
6e6f82b6f3 Implement SYRIAC ABBREVIATION MARK with 'stch' feature
The feature is enabled for any character in the Arabic shaper.
We should experiment with using it for Arabic subtending marks.
Though, that has a directionality problem as well, since those
are used with digits...

Fixes https://github.com/behdad/harfbuzz/issues/141
2015-11-05 17:46:34 -08:00
Behdad Esfahbod
04fd8517f8 Add tests for hyphen fallback
U+2011 is <noBreak> equivaent of U+2010, so we should do the fallback
for it.  Currently fails.
2015-11-04 17:39:26 -08:00
Behdad Esfahbod
550417117d [test] Drop hintings when subsetting fonts to record 2015-11-04 17:37:30 -08:00
Behdad Esfahbod
49ef630936 Adjust the width of various spaces if font does not cover them
See discussion here:
81ef4f407d

There's no way to disable this fallback, but I don't think it would
be needed.  Let's hope for the best!

Fixes https://github.com/behdad/harfbuzz/issues/153
2015-11-04 17:27:07 -08:00
Behdad Esfahbod
7793aad946 Normalize various spaces to space if font doesn't support
This resurrects the space fallback feature, after I disabled
the compatibility decomposition.  Now I can release HarfBuzz
again without breaking Pango!

It also remembers which space character it was, such that later
on we can approximate the width of this particular space
character.  That part is not implemented yet.

We normalize all GC=Zs chars except for U+1680 OGHA SPACE MARK,
which is better left alone.
2015-11-04 15:51:41 -08:00
Behdad Esfahbod
8b3c7f9ede [test] Support recording multiple lines of text in record-test.sh 2015-11-04 15:48:51 -08:00
Behdad Esfahbod
2f0dfd43cd Fix test expectation 2015-11-03 12:28:34 -08:00
Behdad Esfahbod
df698f3299 [ot-font] Fix hmtx table length checking, *again*
Exactly the same problem that I fixed in
63ef0b41dc

I rewrote the table checking yesterday in
67f8821fb2
and introduced the exact same issue again. :(
Good thing we have ongoing fuzzing going now.  Was discovered
immediately by libFuzzer.  Thanks kcc!

https://github.com/behdad/harfbuzz/issues/139#issuecomment-153449473
Fixes https://github.com/behdad/harfbuzz/issues/156
2015-11-03 12:15:12 -08:00
Behdad Esfahbod
67f8821fb2 [ot] Make bad-hmtx handling match FreeType
Also route fuzzing-related tests through hb-ot-font, to reduce dependency
on FreeType behavior for badly-broken fonts.  Fixes failing test with
FreeType master.
2015-11-02 15:37:29 -08:00
Behdad Esfahbod
338ffec9e4 Add tests for a couple of fixed issues found by libFuzzer
From:
https://github.com/behdad/harfbuzz/issues/139#issuecomment-147616887
https://github.com/behdad/harfbuzz/issues/139#issuecomment-148289957
2015-10-15 12:56:19 -03:00
Behdad Esfahbod
55db94be2b Add test for previous commit 2015-10-13 00:33:59 -04:00
Behdad Esfahbod
98c6fccc00 Add test for ee9b0b6cb5 2015-10-11 21:41:04 -04:00
Behdad Esfahbod
34379b49e6 Add test for previous fix 2015-10-09 12:34:02 -04:00
Behdad Esfahbod
b6d7d161a8 [tests] Add Hebrew test for normalization under cluster-level=1
Currently fails.
https://bugzilla.gnome.org/show_bug.cgi?id=541608
2015-09-01 16:12:44 +01:00
Behdad Esfahbod
fad2674874 Minor 2015-09-01 14:45:46 +01:00
Behdad Esfahbod
7368da6724 [test] Add test for cursive-positioning with mixed directions
Fails now.  Fix coming.  See thread "Issue with cursive attachment"
started by Khaled.  Test fonts were made by modifying test font
from Khaled to add more anchors.
2015-08-25 20:29:36 +01:00
Behdad Esfahbod
f3792342f6 [tests] Add test for fallback positioning with cluster_level > 0
For https://github.com/behdad/harfbuzz/pull/123
Currently fails.  Fix coming.
2015-08-08 18:03:16 +02:00
Behdad Esfahbod
df6cb84449 Merge branch 'use' 2015-07-26 19:40:55 +02:00
Behdad Esfahbod
786ba45847 [test] Encode Kharoshti text
Ouch!
2015-07-23 13:04:34 +01:00
Behdad Esfahbod
b423125503 [test] Add Batak and Buginese test texts 2015-07-23 13:01:55 +01:00
Behdad Esfahbod
b8c159ffcc [test] Remove shaper-sea texts under shaper-use 2015-07-23 13:00:44 +01:00
Behdad Esfahbod
67ba7320cc [test] Remove New Tai Lue texts
New Tai Lue changed encoding to visual, boring, model.
2015-07-23 12:58:21 +01:00
Behdad Esfahbod
c81d957a26 [test] Add tests for improved 'vert' feature 2015-07-23 12:50:48 +01:00
Behdad Esfahbod
8a6a16dbcb [test] Add recently added test
Ouch.
2015-07-23 12:49:09 +01:00
Behdad Esfahbod
895fb31c7f [test] Support additional options to hb-shape in micro-test suite 2015-07-23 12:14:03 +01:00
Behdad Esfahbod
582069172c Add test case for deleting default ignorables with positioning 2015-07-22 18:44:59 +01:00
Behdad Esfahbod
14b12f92a9 [USE] Add Kharoshti test data from Unicode proposal 2015-07-20 11:57:44 +01:00
Behdad Esfahbod
8f0a4d6714 [test] Ignor 'n' and 'i' in hb-unicode-encode
Allows accepting uniXXXX format.
2015-04-23 14:32:33 -07:00
Behdad Esfahbod
9868749abe [test] Use /usr/bin/env python instead of /usr/bin/python
Bug 76494 - #!/usr/bin/python in testsuite

https://bugs.freedesktop.org/show_bug.cgi?id=76494
2015-04-06 14:51:31 -07:00
Ebrahim Byagowi
363ceec3fb Make hb_test_tools.py compatible with python 3
On ArchLinux, /usr/bin/python is linked to python 3 so
HarfBuzz `make check` is broken there.

This makes hb_test_tools.py compatible with python 3 while
no breaking it on python 2.
2015-03-31 03:06:32 +04:30
Behdad Esfahbod
e6f80fa104 [indic] Allow ZWJ/ZWNJ before SM
In Oriya, a ZWJ/ZWNJ might be added before candrabindu to encourage
or stop ligation of the candrabindu.  This is clearly specified in
the Unicode section on Oriya.  Allow it there.  Note that Uniscribe
doesn't allow this.

Micro tests added using Noto Sans Oriya draft.

No changes in numbers.  Currently at:

BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048147 out of 1048334 tests passed. 187 failed (0.0178378%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2014-12-10 12:05:24 -08:00
Behdad Esfahbod
a1f27ac3c4 Update test expectation for previous commit 2014-10-02 16:54:33 -04:00
Behdad Esfahbod
715f27f85f [test] Fixup test 2014-10-01 16:53:00 -04:00
Behdad Esfahbod
c4308f895a Minor 2014-08-13 19:42:01 -04:00
Behdad Esfahbod
d5e61470fa [arabic] Fix fallback shaping regression
Was broken in 615d00ea25.

Fixes https://github.com/behdad/harfbuzz/pull/48

Micro-test added.
2014-08-05 14:19:36 -04:00
Behdad Esfahbod
ac53443f1c [hangul] Don't apply 'calt'
See comments.

Micro-test added.
2014-07-31 18:54:43 -04:00
Behdad Esfahbod
8292f96b2b [test] Fix record-test.sh 2014-07-31 18:54:43 -04:00
Behdad Esfahbod
9e834e29e0 [hebrew] Zero mark advance by GDEF late
Seems to be what Uniscribe does.

At this point I think it's work checking our default...

Fixes Bug 76767 - Zeroing of advance of 2nd component of multiple
substitution with SBL Hebrew
https://bugs.freedesktop.org/show_bug.cgi?id=76767

Micro-test added.
2014-07-26 20:34:01 -04:00
Behdad Esfahbod
6f2d9ba52a Add old-Myanmar shaper
Looks like Unsicribe responds to the 'mymr' tag by zeroing marks
GDEF_LATE instead of generic-shaper UNICODE_LATE.  Implement that.

Fixes
Bug 81775 - Incorrect Rendering with harfbuzz-ng myanmar unicode
https://bugs.freedesktop.org/show_bug.cgi?id=81775

Micro-test added based on Padauk.
2014-07-26 19:18:59 -04:00
Behdad Esfahbod
fc0daafab0 [indic] Handle old-spec Malayalam reordering with final Halant
See comment.

Micro-tests added.
2014-07-23 16:53:03 -04:00
Behdad Esfahbod
d218bdb26b Fix test runner under Windows 2014-07-22 18:02:11 -04:00
Behdad Esfahbod
00a57eb4b5 [test] Remove unused micro-font 2014-07-18 14:42:50 -04:00
Behdad Esfahbod
ed29b15f5d [test] Add more Mongolian variation selector tests
From
https://code.google.com/p/chromium/issues/detail?id=393896
2014-07-18 14:37:49 -04:00
Behdad Esfahbod
615d00ea25 [arabic] Apply init/medi/isol/fini/... in separate stages
Follows the order of the Arabic/Syriac specs.  Also don't stop
between rlig and calt in non-Arabic scripts.

Micro-tests for Arabic and Mongolian added for the latter.
2014-07-17 15:50:13 -04:00
Behdad Esfahbod
d21e997035 [test] Make record_test understand cmdline args to hb-shape 2014-07-17 15:30:17 -04:00
Behdad Esfahbod
164c13d73f Another try to fix Mongolian free variation selectors
This reverts bf029281 and fixes it properly.  That commit
was not enough as it was only inheriting the shaping_action
for prev_action, but not curr_action.

Micro-test added.

https://code.google.com/p/chromium/issues/detail?id=393896
2014-07-17 14:28:04 -04:00
Behdad Esfahbod
844f1a487d [tests] Add record-test.sh 2014-07-16 13:32:51 -04:00
Behdad Esfahbod
3b861421a7 Fix Mongolian Variation Selectors for fonts without GDEF
Originally we fixed those in 79d1007a50.
However, fonts like MongolianWhite don't have GDEF, but have IgnoreMarks
in their LigatureSubstitute init/etc features.  We were synthesizing a
GDEF class of mark for Mongolian Variation Selectors and as such the
ligature lookups where not matching.  Uniscribe doesn't do that.

I tried with more sophisticated fixes, like, if there is no GDEF and
a lookup-flag mismatch happens, instead of rejecting a match, try
skipping that glyph.  That surely produces some interesting behavior,
but since we don't want to support fonts missing GDEF more than we have
to, I went for this simpler fix which is to always mark
default-ignorables as base when synthesizing GDEF.

Micro-test added.

Fixes rest of https://bugs.freedesktop.org/show_bug.cgi?id=65258
2014-07-16 13:30:26 -04:00
Behdad Esfahbod
6bd5646f1b [tests] Remove bash'ish
Apparently on travis-ci, bash is linked to dash, which doesn't
understand "let".  Failing tests were not being noticed.  See eg:

  https://travis-ci.org/behdad/harfbuzz/jobs/29544211

Don't rely on bash.
2014-07-09 17:07:06 -04:00
Behdad Esfahbod
1d634cbb4b Fix base-position when 'pref' is NOT formed
If pre-base reordering Ra is NOT formed (or formed and then
broken up), we should consider that Ra as base.  This is
observable when there's a left matra or dotreph that positions
before base.

Now, it might be that we shouldn't do this if the Ra happend
to form a below form.  We can't quite deduce that right now...

Micro test added.  Also at:

https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
2014-06-12 17:10:35 -04:00
Behdad Esfahbod
0ff74b09d2 Add missing test file. Oops 2014-06-05 21:55:23 -04:00
Behdad Esfahbod
832a6f99b3 [indic] Don't reorder reph/pref if ligature was expanded
Normally if you want to, say, conditionally prevent a 'pref', you
would use blocking contextual matching.  Some designers instead
form the 'pref' form, then undo it in context.  To detect that
we now also remember glyphs that went through MultipleSubst.

In the only place that this is used, Uniscribe seems to only care
about the "last" transformation between Ligature and Multiple
substitions.  Ie. if you ligate, expand, and ligate again, it
moves the pref, but if you ligate and expand it doesn't.  That's
why we clear the MULTIPLIED bit when setting LIGATED.

Micro-test added.  Test: U+0D2F,0D4D,0D30 with font from:

[1]
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
2014-06-05 20:36:01 -04:00
Behdad Esfahbod
7977ca17aa [indic] Allow decimal and Brahmi digits as placeholders
Tests: U+0967,0951 U+0031,093F
2014-05-29 15:34:26 -04:00
Behdad Esfahbod
e8b5d64039 [indic] Do NOT allow reph formation on placeholders
Only allow it on DOTTED CIRCLE.  No effect on test numbers.

Test: U+0930,094D,00A0
2014-05-29 15:20:15 -04:00
Behdad Esfahbod
0a017ce169 Add tests for Myanmar Asat+MedialYa and MedialYa+Asat sequences
One of them currently produces dotted-circle.  Fix and detailed
message coming.
2014-05-14 16:44:16 -06:00
Behdad Esfahbod
659cd3c5b4 [test] Add test case for Tibetan sign PADMA
Currently fails.
2014-04-28 12:44:14 -07:00
Behdad Esfahbod
ee703bc3ef Reshuffle test data 2014-04-28 12:44:14 -07:00
Behdad Esfahbod
897c7b804d Add Khmer test for U+17DD 2014-04-10 16:27:13 -07:00
Behdad Esfahbod
2a473338da Add Myanmar test case from OpenType Myanmar spec 2014-03-10 15:04:46 -07:00
Behdad Esfahbod
1589859089 Minor 2014-03-10 14:57:55 -07:00
Behdad Esfahbod
2646aec1e6 Drop required automake version back to 1.11.3
Work around broken automake-1.13 changes.
2013-12-05 18:19:35 -05:00
Behdad Esfahbod
d913f98d88 Require automake 1.13
Fix tests build.

https://bugs.freedesktop.org/show_bug.cgi?id=71353
2013-12-04 19:59:48 -05:00
Behdad Esfahbod
9af91ca8ff Add more Myanmar test cases
All three are broken right now according to Roozbeh.

https://bugs.freedesktop.org/show_bug.cgi?id=71947
https://bugs.freedesktop.org/show_bug.cgi?id=71948
https://bugs.freedesktop.org/show_bug.cgi?id=71949
2013-11-25 17:47:19 -05:00
Behdad Esfahbod
b9d0077ac1 Fix win32 testing 2013-10-28 20:46:11 +01:00
Behdad Esfahbod
2e990a3d72 Make "make distcheck" happy 2013-10-28 20:23:07 +01:00
Behdad Esfahbod
5c558877da [indic] Allow up to two syllable modifiers
Bug 70509 - Candrabindu+Visarga doesn't work in Devanagari
https://bugs.freedesktop.org/show_bug.cgi?id=70509

We categorize both bindus and visarga as syllable-modifiers.
OT spec doesn't actually say what characters go in the syllable
modifier category, and allows one.  We just allow up to two now.

Test case: U+0930,U+0941,U+0901,U+0903

Uniscribe currently doesn't support that and produces a
dotted circle.
2013-10-16 11:18:09 +02:00
Behdad Esfahbod
65a929b1c0 [indic] If Malayalam dot-reph formed a ligature, don't move it
Rachana-0.6 implements dot-reph by ligation, so we shouldn't move it.
Uniscribe doesn't either.  Test case:

  U+0D4E,U+0D1A,U+0D4D,U+0D1A,U+0D4D
2013-10-15 18:21:32 +02:00
Behdad Esfahbod
c46f406973 [tests] Remove Myanmar micro-font and test 2013-10-15 18:21:32 +02:00
Behdad Esfahbod
30145272a7 [indic] Don't apply presentation features across syllables
More like Uniscribe...  We still allow user-defined features to
work across syllables, but not pres,blws,abs,psts,etc.

This "regressed" Sinhala numbers by 11.  These are cases were
there's Consonant followed by Ra,Halant,ZWJ at the of text.
The Ra,Halant,ZWJ ends up forming reph, which is wrong...
But before we were also ligating that reph with the previous
consonant.  That's even more wrong.  That's also what Uniscribe
does.

Current numbers:

BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271655 out of 271847 tests passed. 192 failed (0.070628%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-15 18:20:59 +02:00
Behdad Esfahbod
3c7b3641cf [indic] Handle Avagraha
It can come either at the end(ish!) of the syllable, or independently.
When independent, it accepts a few bits and pieces.
2013-10-15 13:14:31 +02:00
Behdad Esfahbod
2c85a3df09 Fix issue with automake 2013-10-14 19:41:52 +02:00
Behdad Esfahbod
841e20d083 Add test suite for shaping results
The new test suite runs tests included under
hb/test/shaping/tests/*.tests, which themselves reference
font files stored by sha1sum under hb/test/shaping/fonts/sha1sum.
The fonts are produced using a subsetter to only include glyphs
needed to run the test.

Four initial tests are added for (Chain)Context matching,
of which three currently fail.
2013-10-14 18:54:51 +02:00
Behdad Esfahbod
e2dab69291 Minor 2013-10-14 16:44:44 +02:00
Behdad Esfahbod
cc50bf5b13 Remove Hangul filler characters from Default_Ignorable chars
See discussion on mailing list.
2013-03-19 07:00:41 -04:00
Behdad Esfahbod
a8cf7b43fa [Indic] Futher adjust ZWJ handling in Indic-like shapers
After the Ngapi hackfest work, we were assuming that fonts
won't use presentation features to choose specific forms
(eg. conjuncts).  As such, we were using auto-joiner behavior
for such features.  It proved to be troublesome as many fonts
used presentation forms ('pres') for example to form conjuncts,
which need to be disabled when a ZWJ is inserted.

Two examples:

	U+0D2F,U+200D,U+0D4D,U+0D2F with kartika.ttf
	U+0995,U+09CD,U+200D,U+09B7 with vrinda.ttf

What we do now is to never do magic to ZWJ during GSUB's main input
match for Indic-style shapers.  Note that backtrack/lookahead are still
matched liberally, as is GPOS.  This seems to be an acceptable
compromise.

As to the bug that initially started this work, that one needs to
be fixed differently:

  Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not
  provide same results as Windows8
  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers:

BENGALI: 353689 out of 354188 tests passed. 499 failed (0.140886%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048102 out of 1048334 tests passed. 232 failed (0.0221304%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-03-19 06:22:06 -04:00
Behdad Esfahbod
6d69a2cec1 [tests] Add Malayalam tests frim cibu 2013-02-26 21:11:05 -05:00
Behdad Esfahbod
e0486fc1af [tests] Add Myanmar torture tests from Martin Hosken 2013-02-19 00:58:10 -05:00
Behdad Esfahbod
a3df9a7bf8 Minor
Moving files around
2013-02-19 00:50:46 -05:00
Behdad Esfahbod
b1f4407591 [SEA] Fix order of pre-base reordering Ra and left matras
The code was confused because it was expecting left matra to have
POS_PRE_M, like we do in the Myanmar shaper, but that is not what
we were doing in this shaper.  Rewrite to rely on category only.

Test case: U+AA06,U+AA34,U+AA2F
2013-02-17 12:12:37 -05:00
Behdad Esfahbod
05ac87813d [tests] Add Syriac Alaph shaping test cases 2013-02-15 09:26:41 -05:00
Behdad Esfahbod
126f39cd16 Add more dot-reph tests 2013-02-13 08:29:21 -05:00
Behdad Esfahbod
f22b7e7778 [Indic] Track base position when reordering things
Ouch, how did things ever work without this?!  The added test that has a
dot-reph as well as a pre-base reordering Ra perfectly demonstrates the
bug (tested with Nirmala font from Win8 for example).  Testing suggests
that Win8 shaper has the *exact* same bug / behavior that we used to
have.  Odd.
2013-02-13 07:32:46 -05:00
Behdad Esfahbod
cc5f24cde0 [tests] Add tests for Devanagary Eyelash Ra
Currently broken with Sanskrit 2003 font.
2013-02-12 18:17:12 -05:00
Behdad Esfahbod
64bb2ae857 Didn't mean to push this out
Ouch!
2013-02-12 16:29:25 -05:00
Behdad Esfahbod
f9b660534c [Myanmar] Use master Indic table for syllable data 2013-02-12 16:13:56 -05:00
Behdad Esfahbod
f60793e854 [tests] Add Cham sample 2013-02-12 15:45:59 -05:00
Behdad Esfahbod
3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod
fb96021206 Minor test reshufflings 2013-02-12 10:33:58 -05:00
Behdad Esfahbod
5676d5d527 [Indic] Make sure New Tai Lue works! 2013-02-12 10:31:14 -05:00
Behdad Esfahbod
bed687f886 Shuffle test data around 2013-02-11 14:24:03 -05:00
Behdad Esfahbod
ecd454b3cd [Indic] In old-spec shaping, don't move viramas around if seq ends with one
For example: u0c9a u0ccd u0c9a u0ccd with Lohit.  See:

https://bugs.freedesktop.org/show_bug.cgi?id=59118
2013-01-08 18:09:46 -06:00
Behdad Esfahbod
8b217f5ac5 [Indic] Reorder Malayalam dot-reph to after base
Test sequence is simple: U+0D4E,U+0D15.  The doth-reph should be
reordered to after the Ka.

https://bugzilla.redhat.com/show_bug.cgi?id=799565
2012-12-21 15:49:26 -05:00
Behdad Esfahbod
624933f676 Add Persian test cases from Mehran Mehr 2012-11-30 11:46:35 +02:00
Behdad Esfahbod
43b6531500 [Indic] Another try to unbreak Sinhala split matras
Just read the comments...
2012-11-16 13:14:26 -08:00
Behdad Esfahbod
f3584d3a3a Add test cases for Thai PUA shaping 2012-11-14 17:53:06 -08:00
Behdad Esfahbod
6b19fa4862 Adjust diff rule for the new hb-shape output format 2012-11-14 11:38:50 -08:00
Behdad Esfahbod
82c4d9880a Add Sinhala test case for split matra U+0DDA 2012-11-14 10:56:02 -08:00
Behdad Esfahbod
27f52dc3f6 Add Myanmar tests from UTN#11 2012-11-12 16:54:03 -08:00
Behdad Esfahbod
e6b86c8519 Add test for non-joining Mongolian letters
For U+1880..U+1886 Uniscribe thinks they are non-joining.
For U+1887 Uniscribe thinks it's joining, but looks wrong to me.
2012-11-05 15:18:49 -08:00
Behdad Esfahbod
f5e55754f9 Add Tifinagh test data 2012-11-02 13:53:18 -07:00
Behdad Esfahbod
c21498afd8 Add Mongolian and 'Phags-pa joining test cases 2012-11-02 10:21:26 -07:00
Behdad Esfahbod
911ed09698 Ignore gid0 in test results 2012-10-29 19:42:19 -07:00
Behdad Esfahbod
10b88d89ef Add Ethiopic test case
This sequence: U+120B,U+135F,U+120B with the Nyala font from Win7
exposes a GPOS bug in Uniscribe, in that the positioned mark is wrongly
moved as a result a following kern.

This is the one "failure" in the Ethiopic test suite :-).

ETHIOPIC: 118900 out of 118901 tests passed. 1 failed (0.000841036%)
2012-10-29 18:26:00 -07:00
Behdad Esfahbod
166b5cf7ec [Indic] Find syllables before any features are applied
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions.  That was then causing syllable match failures.  We now
find syllables before any features have been applied.

Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
2012-09-07 14:56:01 -04:00
Behdad Esfahbod
efb8d3eb71 Fixup test failure reporting
After we implemented dotted-circle, we were still ignoring any tests
that had dottedcircle in it for any of the shapers.  That meant that if
we wrongly outputted dottedcircle, the test was being ignored.  Ouch!

Fixing that shows regressions across the board.  Most are Uniscribe
bugs: NOT inserting dotted-circle when it should.  Some are arou
machine bugs.  This is in fact a nice way to catch Indic-machine
deficiencies and when I fix the regressions, our clusters should be
much closer to Uniscribe.  For now, we regressed from:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

To:

BENGALI: 353990 out of 354285 tests passed. 295 failed (0.0832663%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366447 out of 366506 tests passed. 59 failed (0.016098%)
GURMUKHI: 60707 out of 60809 tests passed. 102 failed (0.167738%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 298962 out of 299124 tests passed. 162 failed (0.0541581%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048074 out of 1048416 tests passed. 342 failed (0.0326206%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091835 out of 1091837 tests passed. 2 failed (0.000183178%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

Investigating.
2012-09-05 15:57:38 -04:00
Behdad Esfahbod
a4e75e4128 Minor 2012-08-27 15:54:15 -04:00
Behdad Esfahbod
206ab60573 [test] Move around 2012-08-10 09:06:30 -04:00
Behdad Esfahbod
7a484c601e [test] Add Urdu ligature sequences from CRULP 2012-08-10 09:05:29 -04:00
Behdad Esfahbod
70b3dc3272 Add Hebrew test 2012-07-30 12:40:18 -04:00
Behdad Esfahbod
a973b5ce86 [GSUB] Further adjustments to mark-attachment vs ligation interaction
The d1d69ec52e change broke Kannada badly,
since it was ligating consonants, pushing matra out, and then ligating
with the matra.  Adjust for that.  See comments.
2012-07-30 01:47:46 -04:00
Behdad Esfahbod
97a201becf Add Arabic tests for mark ligature component attachments 2012-07-29 20:37:29 -04:00
Behdad Esfahbod
5d874d566f [GPOS] Fix mark-to-mark positioning when one of the marks is a ligature
This commit: a3313e5400 broke MarkMarkPos
when one of the marks itself is a ligature.  That regressed 26 Tibetan
tests (up from zero!).  Fix that.  Tibetan back to zero.
2012-07-28 21:05:25 -04:00
Behdad Esfahbod
6411e74caf [Indic] Reposition Gurmukhi top matras to after post
The font is forming a post-base consonant in some samples, and Uniscribe
positions top matra on the post-base.  Do the same.

Gurmukhi failures down from 59 to 41 (0.0674242%).
2012-07-24 13:48:49 -04:00
Behdad Esfahbod
c3f769ba09 [Indic] Ignore Uniscribe output containing two zero-width space glyphs
Uniscribe is buggy and sometimes /eats/ a mark next to a non-joiner.
Most of Malayalam failures where actually hitting this bug.

Ignore test output with two zero-width space glyphs.  This is a hack
until we build up the test suite infrastructure better.

Bengali went down by 9, Devanagari by 2, Kannada by 130, Malayalm down
from 1197 to 307, Sinhala down by 16, Telugu down by 26.  New stats:

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 693573 out of 693628 tests passed. 55 failed (0.00792932%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1048109 out of 1048416 tests passed. 307 failed (0.0292823%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271715 out of 271847 tests passed. 132 failed (0.0485567%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970550 out of 970573 tests passed. 23 failed (0.00236973%)
2012-07-24 13:26:32 -04:00
Behdad Esfahbod
65c43accdc [Indic] Better position left-matra in Malayalam
Just put it before base, which is what's expected.

Malayalam failures down from 1559 to 1197 (0.114172%).

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1047219 out of 1048416 tests passed. 1197 failed (0.114172%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:36:47 -04:00
Behdad Esfahbod
88f413b56f [Indic] Implement Reph+Ya-Phalaa interaction
The sequence Ra,H,Ya in Bengali is ambigious and Unicode encoded that to
get Ya-Phalaa, one would place ZWJ before Halant.  Ie. a ZWJ,H sequence
requests subjoining, while a H,ZWJ requests Half form.  Implement that.

Bengali failures go down from 377 to 297 (0.0838308%).
Gujarati is down by 4 to 17 (0.0046384%).
Kannada is down by 226 to 957 (0.100534%).

Current status:

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1046857 out of 1048416 tests passed. 1559 failed (0.148701%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:04:36 -04:00
Behdad Esfahbod
330b329c89 [Indic] Unmark U+17D1 KHMER SIGN VIRIAM to NOT be a Virama
Fixes another 1 Khmer failure.  Down to 30 (0.0100293%) now.
2012-07-24 02:25:26 -04:00
Behdad Esfahbod
d90b8e841e [Indic] Reposition Khmer prebase-reordering Ra around split matras
In Khmer coeng model, a V,Ra can go *after* matras.  If it goes after a
split matra, it should be reordered to *before* the left part of such matra.

Khmer failures down from 136 to 39 (0.0130381%).
2012-07-24 02:11:18 -04:00
Behdad Esfahbod
7573799126 [Indic] Position Khmer U+17CE
Fixes another 6 Khmer failures.  Now at 136 (0.0454661%).
2012-07-24 01:32:07 -04:00
Behdad Esfahbod
2278eefcdb [Indic] In Sinhala, form forced Reph even if no other consonant found
Fixes another 10 Sinhala failures.  Down to 148 (0.0544424%).
2012-07-24 00:31:10 -04:00
Behdad Esfahbod
71fd5e80ad [Indic] Further adjust base algorithm for Sinhala
Apparently if there is C,V,ZWJ,C, the first C will be base, but if
it's C,ZWJ,V,C, the second one will be.

Note that Uniscribe implements this differently, by breaking syllable in
the case of C,ZWJ,V,C and putting the first consonant in one syllable
and the rest in the next syllable.

Sinhala failures down from 208 to 158 (0.0581209%).  No changes to
Khmer.
2012-07-24 00:21:16 -04:00
Behdad Esfahbod
73d71cc527 [Indic] End Vowel-based syllable at ZWJ
One Devanagari test regressed, plus 10 Malayalam (at 1545 now).

Fixed 120 Sinhala failures.  Now at 208 (0.0765136%).
2012-07-24 00:09:12 -04:00
Behdad Esfahbod
34c215036f [Indic] Improve Sinhala base algorithm and reph positioning
Sinhala does not have half forms.  And most (all?) consonants can be
base, except when preceded by ZWJ, which would request a subjoined form.
Hence switch the base algorithm to categorize with Khmer, start search
at start, and stop at a ZWJ.

Also, mark all pos=base consonants after base to be subjoined.  Mark
base itself to have pos=base.

Finally, adjust Sinhala's reph position to after-main.

Brings down Sinhala failures from 455 to 328 (0.120656%).
2012-07-23 23:51:29 -04:00
Behdad Esfahbod
771a8f5028 [Indic] exclude ligatures when matching on Indic category
If, say, a H,ZWJ,C ligature was formed, we don't want the code to detec
that as a Halant.  So, ignore ligatures when matching category in
final_reordering.

Sinhala failures down from 514 to 455 (0.167374%).
2012-07-23 20:09:30 -04:00
Behdad Esfahbod
42848453bf [Thai] Reorder U+0E3A THAI VOWEL SIGN PHINTHU
Uniscribe reorders U+0E3A to be after U+0E38 and U+0E39.  We do that by
modifying the ccc for U+0E3A.

Fixes the two remaining Thai failures (see previous commit).
2012-07-23 13:52:07 -04:00
Behdad Esfahbod
4a7f4f3e56 [Thai] Adjust SARA AM reordering to match Uniscribe
Adjust the list of marks before SARA AM that get the reordering
treatment.  Also adjust cluster formation to match Uniscribe.

With Wikipedia test data, now I see:

  - For Thai, with the Angsana New font from Win7, I see 54 failures out
    of over 4M tests  (0.00129107%).  Of the 54, two are legitimate
    reordering issues (fix coming soon), and the other 52 are simply
    Uniscribe using a zero-width space char instead of an unknown
    character for missing glyphs.  No idea why.  The missing-glyph
    sequences include one that is a Thai character followed by an Arabic
    Sokun.  Someone confused it with Nikhahit I assume!

  - For Lao, with the Dokchampa font from Win7, 33 tests fail out of
    54k (0.0615167%).  All seem to be insignificant mark positioning
    with two marks on a base.  Have to investigate.
2012-07-23 13:15:33 -04:00
Behdad Esfahbod
60554f14d8 [Indic] Merge in Malayalam tests
From:
http://silpa.org.in/pub/tests/hb/ml/ml-harfbuzz-testdata.txt
2012-07-22 23:23:56 -04:00
Behdad Esfahbod
5c7081770c [Indic] Add extensive Sinhala tests
Generated by:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/utils/gen-unicode-sinhala.py
2012-07-22 23:20:27 -04:00
Behdad Esfahbod
2efe4707b1 [Indic] Add Sinhala tests
Merge tests from:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt
2012-07-22 23:17:59 -04:00
Behdad Esfahbod
3d4c111b7a Add a test case 2012-07-20 19:34:39 -04:00
Behdad Esfahbod
bdd080431a [Indic] Reposition Oriya Candrabindu
Oriya failures down from 0.65% to 0.20%.
2012-07-20 16:03:09 -04:00
Behdad Esfahbod
87cd63266e [Indic] Recategorize some Kannada right matras
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod
c87bcddb10 [Indic] Add failing test for Kannada 2012-07-19 20:03:25 -04:00
Behdad Esfahbod
deeb540a74 [test] Ignore tests with DOTTED CIRCLE in the output 2012-07-19 11:30:48 -04:00
Behdad Esfahbod
422ecd2d3c [Indic] Accept a forced Rakar sequence at the end of syllable
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence...  So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the of syllables.

Fixes some 100 or more of Sinhala failures.  Now at 622 only (0.23%).
2012-07-18 23:25:58 -04:00
Behdad Esfahbod
10cdc94eee [Indic] In final reordering, find base, even if it disappeared
POS_BASE can disappear if base ligated backward.  Define base as last
with position not after base.

Fixes a few hundred of Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
Behdad Esfahbod
3285e107c9 [Indic] Implement Sinhala "Al Lakuna" Reph behavior
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
Behdad Esfahbod
552d19b7a1 [Indic] Treat Register Shifters like Nukta
Really this time.

Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod
69f26bf39c [Indic] Fix Matra reordering when base is at end of syllable
For example: U+915,U+200c,U+93f

Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod
391cc03317 [Indic] Allow halant group in Vowel and placeholder syllables
Fixes 2 out of 560 Devanagari failures.  AND:
Fixes 1 out of 2 Tamil failures.
2012-07-18 15:12:49 -04:00
Behdad Esfahbod
418d00dffd [Indic] Minor 2012-07-18 14:57:28 -04:00
Behdad Esfahbod
25bc489498 [Indic] Better categorize Register Shifters and Khmer Various signs
Down another 500 or so Khmer failures!
2012-07-17 17:53:03 -04:00
Behdad Esfahbod
34b5714906 [Indic] Treat Khmer Register Shifters more like Nuktas
Except that there may be a ZWNJ before a Register Shifter.
2012-07-17 14:09:32 -04:00
Behdad Esfahbod
0201e0a464 [Indic] Apply 'cfar' for Khmer
Mark stuff after a pre-base reordering Ro 'cfar'.  Used in Khmer.
This allows distinguishing the following cases with MS Khmer fonts:

  U+1784,U+17D2,U+179A,U+17D2,U+1782
  U+1784,U+17D2,U+1782,U+17D2,U+179A
2012-07-17 13:56:24 -04:00
Behdad Esfahbod
55f70ebfb9 [Indic] Position final subjoined consonants (and vowels) after matras
In Khmer, a final subjoined consonant or independent vowel can occur
after matras.  This final subjoined thing should NOT be reordered to
before the matra even though it's subjoined.

Fixes another 1k of the Khmer failures.  Not much left really.
2012-07-17 12:50:13 -04:00
Behdad Esfahbod
c50ed71e9a [Indic] Recategorize Khmer coeng sign as a separate category OT_Coeng
Amend the syllable structure to allow a final subscripted consonant
(Coeng+C) and a final subscripted independent vowel (Coeng+V).
Fixes another 2k of Khmer failures.
2012-07-17 11:54:28 -04:00
Behdad Esfahbod
74ccc6a132 [Indic] Move Halant with after-base consonants
Normally, we attach the Halant to the previous character and move it
with it.  For after-base consonants however, the Halant "belongs" to the
consonant after, so attach it so.

This fixes Bengali sequences involving post-base consonant Ya, which
should ligate with the Halant to form Ya Phala, but previously a
reordered matras was blocking the ligation.
2012-07-17 11:16:19 -04:00
Behdad Esfahbod
d5c4edcdd6 [Indic] Apply presentation-forms features all at once
Seems like this is what Uniscribe is doing, and does not break any fonts
we tested (with Devanagari, Malayalam, Khmer, and Bengali), while fixing
some Ra Phala sequences for Bengali with Vrinda.  Fixes another 2% of
Bengali failures (a couple more to go).
2012-07-17 10:40:59 -04:00
Behdad Esfahbod
6de103547e [test/arabic] Add Arabic tests for mark skipping
Expose a bug with Khaled's Hussaini Nastaleeq font.
2012-07-16 22:46:52 -04:00
Behdad Esfahbod
1167c7bfc9 Minor 2012-07-11 18:00:28 -04:00
Behdad Esfahbod
aa116582e6 Minor 2012-07-11 18:00:28 -04:00
Behdad Esfahbod
b0a6e58bb3 s/script-punjabi/script-gurmukhi/ 2012-06-04 10:21:22 -04:00
Behdad Esfahbod
4efdffec09 Minor Malayalam test case
From https://bugs.freedesktop.org/show_bug.cgi?id=45166
2012-05-28 10:45:50 -04:00
Behdad Esfahbod
dfff5b3021 Add Myanmar test case 2012-05-28 10:45:50 -04:00
Behdad Esfahbod
ff3524c21a Add Arabic diacritics tests 2012-05-23 21:50:43 -04:00
Behdad Esfahbod
a6de53664d Add CJK Compatibility Ideographs tests
From:
http://people.mozilla.org/~jdaggett/tests/cjkcompat.html
2012-05-18 15:04:35 -04:00
Behdad Esfahbod
f538fcb538 [test] Make tool usage easier by not requiring "--stdin"
Just default to it.  Added "--help" instead to get usage.
2012-05-12 15:34:40 +02:00
Behdad Esfahbod
a3273e30bb [Indic] Add more Malayalam tests 2012-05-12 13:34:18 +02:00
Behdad Esfahbod
5b16de97bc [Indic] Add tests for dottedcircle 2012-05-11 19:55:42 +02:00
Behdad Esfahbod
c071b99f15 [Indic] Add test for Left Matra with Halant
Uniscribe doesn't move the Halant, we do.  And do a broken job of it now.
2012-05-11 16:22:46 +02:00
Behdad Esfahbod
b20c9ebaf5 [Indic] Add test for matra group
The spec says: "[{M}+[N]+[H]]", and that's what Uniscribe implements.
We instead do: "{M+[N]+[H]}", which means we allow Nukta and Halant
after all Matras, not just the last one.  It makes more sense.
2012-05-10 18:31:17 +02:00
Behdad Esfahbod
61a58e26a5 [Indic] Add tricky reordering test cases
In the case of Consonant,LeftMatra,Halant, Uniscribe leaves the Halant
where it is, but we want to move it with the Matra as that makes more
logical sense.
2012-05-10 14:43:53 +02:00
Behdad Esfahbod
3943293a99 [Indic] Add joiner test cases for Devanagari 2012-05-09 15:27:56 +02:00
Behdad Esfahbod
2214a03900 Add hb-diff-ngrams 2012-05-09 09:54:54 +02:00
Behdad Esfahbod
178e6dce01 Add N-gram generator 2012-05-09 08:57:29 +02:00
Behdad Esfahbod
98669ceb77 Use groupby() 2012-05-09 08:16:15 +02:00
Behdad Esfahbod
c438a14b62 Add hb-diff-stat 2012-05-09 07:45:17 +02:00
Behdad Esfahbod
1058d031e2 Make hb-diff-filter-failtures retain all test info for failed tests 2012-05-09 07:35:28 +02:00
Behdad Esfahbod
f1eb008cc7 Add hb-diff-colorize
Accepts --format=html now.
2012-05-09 00:01:50 +02:00
Behdad Esfahbod
9155e4ffe0 Cleanup diff
Doesn't do --color anymore.  That will go into a new hb-diff-colorize
tool.
2012-05-08 22:44:21 +02:00
Behdad Esfahbod
7d22135b4c Make hb-diff faster 2012-05-08 19:38:49 +02:00
Behdad Esfahbod
a93e238e05 More tests 2012-05-08 18:55:29 +02:00
Behdad Esfahbod
585b107cde Add test caes for a minority language using Bengali
U+0985 BENGALI LETTER A followed by U+09D7 BENGALI AU LENGTH MARK.
According to Bobby de Vos on the mailing list, this results in a dotted
circle with most shaping engines, but is a legitimate sequence in this
minority language.

We reached the consensus on the list to NOT implement dotted-circle
in HarfBuzz.
2012-04-24 16:00:50 -04:00