harfbuzz/docs/usermanual-opentype-features.xml

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="shaping-and-shape-plans">
  <title>Shaping and shape plans</title>
  <para>
    Once you have your face and font objects configured as desired and
    your input buffer is filled with the characters you need to shape,
    all you need to do is call <function>hb_shape()</function>.
  </para>
  <para>
    HarfBuzz will return the shaped version of the text in the same
    buffer that you provided, but it will be in output mode. At that
    point, you can iterate through the glyphs in the buffer, drawing
    each one at the specified position or handing them off to the
    appropriate graphics library.
  </para>
  <para>
    For the most part, HarfBuzz's shaping step is straightforward from
    the outside. But that doesn't mean there will never be cases where
    you want to look under the hood and see what is happening on the
    inside. HarfBuzz provides facilities for doing that, too.
  </para>

  <section id="shaping-buffer-output">
    <title>Shaping and buffer output</title>
    <para>
      The <function>hb_shape()</function> function call takes four arguments: the font
      object to use, the buffer of characters to shape, an array of
      user-specified features to apply, and the length of that feature
      array. The feature array can be NULL, so for the sake of
      simplicity we will start with that case.
    </para>
    <para>
      Internally, HarfBuzz looks  at the tables of the font file to
      determine where glyph classes, substitutions, and positioning
      are defined, using that information to decide which
      <emphasis>shaper</emphasis> to use (<literal>ot</literal> for
      OpenType fonts, <literal>aat</literal> for Apple Advanced
      Typography fonts, and so on). It also looks at the direction,
      script, and language properties of the segment to figure out
      which script-specific shaping model is needed (at least, in
      shapers that support multiple options).
    </para>
    <para>
      If a font has a GDEF table, then that is used for
      glyph classes; if not, HarfBuzz will fall back to Unicode
      categorization by code point. If a font has an AAT <literal>morx</literal> table,
      then it is used for substitutions; if not, but there is a GSUB
      table, then the GSUB table is used. If the font has an AAT
      <literal>kerx</literal> table, then it is used for positioning; if not, but
      there is a GPOS table, then the GPOS table is used. If neither
      table is found, but there is a <literal>kern</literal> table, then HarfBuzz will
      use the <literal>kern</literal> table. If there is no <literal>kerx</literal>, no GPOS, and no
      <literal>kern</literal>, HarfBuzz will fall back to positioning marks itself.
    </para>
    <para>
      With a well-behaved OpenType font, you expect GDEF, GSUB, and
      GPOS tables to all be applied. HarfBuzz implements the
      script-specific shaping models in internal functions, rather
      than in the public API.
    </para>
    <para>
      The algorithms
      used for complex scripts can be quite involved; HarfBuzz tries
      to be compatible with the OpenType Layout specification
      and, wherever there is any ambiguity, HarfBuzz attempts to replicate the
      output of Microsoft's Uniscribe engine. See the <ulink
      url="https://docs.microsoft.com/en-us/typography/script-development/standard">Microsoft
      Typography pages</ulink> for more detail.
    </para>
    <para>
      In general, though, all that you need to know is that
      <function>hb_shape()</function> returns the results of shaping
      in the same buffer that you provided. The buffer's content type
      will now be set to
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal>, indicating
      that it contains shaped output, rather than input text. You can
      now extract the glyph information and positioning arrays:
    </para>
    <programlisting language="C">
      hb_glyph_info_t *glyph_info    = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
      hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
    </programlisting>
    <para>
      The glyph information array holds a <type>hb_glyph_info_t</type>
      for each output glyph, which has two fields:
      <parameter>codepoint</parameter> and
      <parameter>cluster</parameter>. Whereas, in the input buffer,
      the <parameter>codepoint</parameter> field contained the Unicode
      code point, it now contains the glyph ID of the corresponding
      glyph in the font. The <parameter>cluster</parameter> field is
      an integer that you can use to help identify when shaping has
      reordered, split, or combined code points; we will say more
      about that in the next chapter.
    </para>
    <para>
      The glyph positions array holds a corresponding
      <type>hb_glyph_position_t</type> for each output glyph,
      containing four fields: <parameter>x_advance</parameter>,
      <parameter>y_advance</parameter>,
      <parameter>x_offset</parameter>, and
      <parameter>y_offset</parameter>. The advances tell you how far
      you need to move the drawing point after drawing this glyph,
      depending on whether you are setting horizontal text (in which
      case you will have x advances) or vertical text (for which you
      will have y advances). The x and y offsets tell you where to
      move to start drawing the glyph; usually you will have both and
      x and a y offset, regardless of the text direction.
    </para>
    <para>
      Most of the time, you will rely on a font-rendering library or
      other graphics library to do the actual drawing of glyphs, so
      you will need to iterate through the glyphs in the buffer and
      pass the corresponding values off.
    </para>
  </section>

  <section id="shaping-opentype-features">
    <title>OpenType features</title>
    <para>
      OpenType features enable fonts to include smart behavior,
      implemented as "lookup" rules stored in the GSUB and GPOS
      tables. The OpenType specification defines a long list of
      standard features that fonts can use for these behaviors; each
      feature has a four-character reserved name and a well-defined
      semantic meaning.
    </para>
    <para>
      Some OpenType features are defined for the purpose of supporting
      complex-script shaping, and are automatically activated, but
      only when a buffer's script property is set to a script that the
      feature supports.
    </para>
    <para>
      Other features are more generic and can apply to several (or
      any) script, and shaping engines are expected to implement
      them. By default, HarfBuzz activates several of these features
      on every text run. They include <literal>abvm</literal>,
      <literal>blwm</literal>, <literal>ccmp</literal>,
      <literal>locl</literal>, <literal>mark</literal>,
      <literal>mkmk</literal>, and <literal>rlig</literal>.
    </para>
    <para>
      In addition, if the text direction is horizontal, HarfBuzz
      also applies the <literal>calt</literal>,
      <literal>clig</literal>, <literal>curs</literal>,
      <literal>dist</literal>, <literal>kern</literal>,
      <literal>liga</literal> and <literal>rclt</literal>, features.
    </para>
    <para>
      Additionally, when HarfBuzz encounters a fraction slash
      (<literal>U+2044</literal>), it looks backward and forward for decimal
      digits (Unicode General Category = Nd), and enables features
      <literal>numr</literal> on the sequence before the fraction slash,
      <literal>dnom</literal> on the sequence after the fraction slash,
      and <literal>frac</literal> on the whole sequence including the fraction
      slash.
    </para>
    <para>
      Some script-specific shaping models
      (see <xref linkend="opentype-shaping-models" />) disable some of the
      features listed above:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Hangul: <literal>calt</literal>
	</para>
      </listitem>
      <listitem>
        <para>
          Indic: <literal>liga</literal>
	</para>
      </listitem>
      <listitem>
        <para>
          Khmer: <literal>liga</literal>
	</para>
      </listitem>
    </itemizedlist>
    <para>
      If the text direction is vertical, HarfBuzz applies
      the <literal>vert</literal> feature by default.
    </para>
    <para>
      Still other features are designed to be purely optional and left
      up to the application or the end user to enable or disable as desired.
    </para>
    <para>
      You can adjust the set of features that HarfBuzz applies to a
      buffer by supplying an array of <type>hb_feature_t</type>
      features as the third argument to
      <function>hb_shape()</function>. For a simple case, let's just
      enable the <literal>dlig</literal> feature, which turns on any
      "discretionary" ligatures in the font:
    </para>
    <programlisting language="C">
      hb_feature_t userfeatures[1];
      userfeatures[0].tag = HB_TAG('d','l','i','g');
      userfeatures[0].value = 1;
      userfeatures[0].start = HB_FEATURE_GLOBAL_START;
      userfeatures[0].end = HB_FEATURE_GLOBAL_END;
    </programlisting>
    <para>
      <literal>HB_FEATURE_GLOBAL_END</literal> and
      <literal>HB_FEATURE_GLOBAL_END</literal> are macros we can use
      to indicate that the features will be applied to the entire
      buffer. We could also have used a literal <literal>0</literal>
      for the start and a <literal>-1</literal> to indicate the end of
      the buffer (or have selected other start and end positions, if needed).
    </para>
    <para>
      When we pass the <varname>userfeatures</varname> array to
      <function>hb_shape()</function>, any discretionary ligature
      substitutions from our font that match the text in our buffer
      will get performed:
    </para>
    <programlisting language="C">
      hb_shape(font, buf, userfeatures, num_features);
    </programlisting>
    <para>
      Just like we enabled the <literal>dlig</literal> feature by
      setting its <parameter>value</parameter> to
      <literal>1</literal>, you would disable a feature by setting its
      <parameter>value</parameter> to <literal>0</literal>. Some
      features can take other <parameter>value</parameter> settings;
      be sure you read the full specification of each feature tag to
      understand what it does and how to control it.
    </para>
  </section>

  <section id="shaping-shaper-selection">
    <title>Shaper selection</title>
    <para>
      The basic version of <function>hb_shape()</function> determines
      its shaping strategy based on examining the capabilities of the
      font file. OpenType font tables cause HarfBuzz to try the
      <literal>ot</literal> shaper, while AAT font tables cause HarfBuzz to try the
      <literal>aat</literal> shaper.
    </para>
    <para>
      In the real world, however, a font might include some unusual
      mix of tables, or one of the tables might simply be broken for
      the script you need to shape. So, sometimes, you might not
      want to rely on HarfBuzz's process for deciding what to do, and
      just tell <function>hb_shape()</function> what you want it to try.
    </para>
    <para>
      <function>hb_shape_full()</function> is an alternate shaping
      function that lets you supply a list of shapers for HarfBuzz to
      try, in order, when shaping your buffer. For example, if you
      have determined that HarfBuzz's attempts to work around broken
      tables gives you better results than the AAT shaper itself does,
      you might move the AAT shaper to the end of your list of
      preferences and call <function>hb_shape_full()</function>
    </para>
    <programlisting language="C">
      char *shaperprefs[3] = {"ot", "default", "aat"};
      ...
      hb_shape_full(font, buf, userfeatures, num_features, shaperprefs);
    </programlisting>
    <para>
      to get results you are happier with.
    </para>
    <para>
      You may also want to call
      <function>hb_shape_list_shapers()</function> to get a list of
      the shapers that were built at compile time in your copy of HarfBuzz.
    </para>
  </section>

  <section id="shaping-plans-and-caching">
    <title>Plans and caching</title>
    <para>
      Internally, HarfBuzz uses a structure called a shape plan to
      track its decisions about how to shape the contents of a
      buffer. The <function>hb_shape()</function> function builds up the shape plan by
      examining segment properties and by inspecting the contents of
      the font.
    </para>
    <para>
      This process can involve some decision-making and
      trade-offs — for example, HarfBuzz inspects the GSUB and GPOS
      lookups for the script and language tags set on the segment
      properties, but it falls back on the lookups under the
      <literal>DFLT</literal> tag (and sometimes other common tags)
      if there are actually no lookups for the tag requested.
    </para>
    <para>
      HarfBuzz also includes some work-arounds for
      handling well-known older font conventions that do not follow
      OpenType or Unicode specifications, for buggy system fonts, and for
      peculiarities of Microsoft Uniscribe. All of that means that a
      shape plan, while not something that you should edit directly in
      client code, still might be an object that you want to
      inspect. Furthermore, if resources are tight, you might want to
      cache the shape plan that HarfBuzz builds for your buffer and
      font, so that you do not have to rebuild it for every shaping call.
    </para>
    <para>
      You can create a cacheable shape plan with
      <function>hb_shape_plan_create_cached(face, props,
      user_features, num_user_features, shaper_list)</function>, where
      <parameter>face</parameter> is a face object (not a font object,
      notably), <parameter>props</parameter> is an
      <type>hb_segment_properties_t</type>,
      <parameter>user_features</parameter> is an array of
      <type>hb_feature_t</type>s (with length
      <parameter>num_user_features</parameter>), and
      <parameter>shaper_list</parameter> is a list of shapers to try.
    </para>
    <para>
      Shape plans are objects in HarfBuzz, so there are
      reference-counting functions and user-data attachment functions
      you can
      use. <function>hb_shape_plan_reference(shape_plan)</function>
      increases the reference count on a shape plan, while
      <function>hb_shape_plan_destroy(shape_plan)</function> decreases
      the reference count, destroying the shape plan when the last
      reference is dropped.
    </para>
    <para>
      You can attach user data to a shaper (with a key) using the
      <function>hb_shape_plan_set_user_data(shape_plan,key,data,destroy,replace)</function>
      function, optionally supplying a <function>destroy</function>
      callback to use. You can then fetch the user data attached to a
      shape plan with
      <function>hb_shape_plan_get_user_data(shape_plan, key)</function>.
    </para>
  </section>

</chapter>