skia2/include/core/SkAnnotation.h
Ben Wagner 225c8861b7 [pdf] Differentiate text from byte strings.
In PDF there are two different physical encodings of strings (as a
sequence of bytes). There is the literal string encoding which is
delimited by '(' and ') which stores the bytes as-is except for '(',
')', and the escape character '\' (which can introduce octal encoded
bytes). There is also the hex string encoding delimited by '<' and '>'
and the bytes are encoded in hex pairs (with an implicit '0' at the end
for odd length encodings).

The interpretation of these bytes depends on the logical string type of
the dictionary key. There is a base abstract (well, almost abstract
except for legacy purposes) string type. The subtypes of the string type
are `text string`, `ASCII string`, and `byte string`. The `text string`
is logically further subtyped into `PDFDocEncoded string` and `UTF-16BE
with BOM`. In theory any of these logical string types may have its
encoded bytes written out in either of the two physical string
encodings.

In practice for Skia this means there are two types of string to keep
track of, since `ASCII string` and `byte string` can be treated the
same (in this change they are both treated as `byte string`). If the
type is `text string` then the bytes Skia has are interpreted as UTF-8
and may be converted to `UTF-16BE with BOM` or used directly as
`PDFDocEncoded string` if that is valid. If the type is `byte string`
then the bytes Skia has may not be converted and must be written as-is.

This means that when Skia sets a dictionary key to a string value it
must, at the very least, remember if the key's type was `text string`.
This change replaces all `String` methods with `ByteString` and
`TextString` methods and updates all the callers to the correct one
based on the key being written.

With the string handling corrected, the `/ActualText` string is now
emitted with this new common code as well for better output and to
reduce code duplication. A few no longer used public APIs involving
these strings are removed. The documentation for the URI annotation is
updated to reflect reality.

This change outputs `UTF-16BE with BOM` with the hex string encoding
only and does not attempt to fix the literal string encoding which
always escapes bytes > 0x7F. These changes may be attempted in a
separate change.

Bug: chromium:1323159
Change-Id: I00bdd5c90ad1ff2edfb74a9de41424c4eeac5ccb
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/543084
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2022-05-24 18:46:42 +00:00

53 lines
1.4 KiB
C++

/*
* Copyright 2012 Google Inc.
*
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file.
*/
#ifndef SkAnnotation_DEFINED
#define SkAnnotation_DEFINED
#include "include/core/SkTypes.h"
class SkData;
struct SkPoint;
struct SkRect;
class SkCanvas;
/**
* Annotate the canvas by associating the specified URL with the
* specified rectangle (in local coordinates, just like drawRect).
*
* The URL is expected to be escaped and be valid 7-bit ASCII.
*
* If the backend of this canvas does not support annotations, this call is
* safely ignored.
*
* The caller is responsible for managing its ownership of the SkData.
*/
SK_API void SkAnnotateRectWithURL(SkCanvas*, const SkRect&, SkData*);
/**
* Annotate the canvas by associating a name with the specified point.
*
* If the backend of this canvas does not support annotations, this call is
* safely ignored.
*
* The caller is responsible for managing its ownership of the SkData.
*/
SK_API void SkAnnotateNamedDestination(SkCanvas*, const SkPoint&, SkData*);
/**
* Annotate the canvas by making the specified rectangle link to a named
* destination.
*
* If the backend of this canvas does not support annotations, this call is
* safely ignored.
*
* The caller is responsible for managing its ownership of the SkData.
*/
SK_API void SkAnnotateLinkToDestination(SkCanvas*, const SkRect&, SkData*);
#endif