In PDF there are two different physical encodings of strings (as a
sequence of bytes). There is the literal string encoding which is
delimited by '(' and ') which stores the bytes as-is except for '(',
')', and the escape character '\' (which can introduce octal encoded
bytes). There is also the hex string encoding delimited by '<' and '>'
and the bytes are encoded in hex pairs (with an implicit '0' at the end
for odd length encodings).
The interpretation of these bytes depends on the logical string type of
the dictionary key. There is a base abstract (well, almost abstract
except for legacy purposes) string type. The subtypes of the string type
are `text string`, `ASCII string`, and `byte string`. The `text string`
is logically further subtyped into `PDFDocEncoded string` and `UTF-16BE
with BOM`. In theory any of these logical string types may have its
encoded bytes written out in either of the two physical string
encodings.
In practice for Skia this means there are two types of string to keep
track of, since `ASCII string` and `byte string` can be treated the
same (in this change they are both treated as `byte string`). If the
type is `text string` then the bytes Skia has are interpreted as UTF-8
and may be converted to `UTF-16BE with BOM` or used directly as
`PDFDocEncoded string` if that is valid. If the type is `byte string`
then the bytes Skia has may not be converted and must be written as-is.
This means that when Skia sets a dictionary key to a string value it
must, at the very least, remember if the key's type was `text string`.
This change replaces all `String` methods with `ByteString` and
`TextString` methods and updates all the callers to the correct one
based on the key being written.
With the string handling corrected, the `/ActualText` string is now
emitted with this new common code as well for better output and to
reduce code duplication. A few no longer used public APIs involving
these strings are removed. The documentation for the URI annotation is
updated to reflect reality.
This change outputs `UTF-16BE with BOM` with the hex string encoding
only and does not attempt to fix the literal string encoding which
always escapes bytes > 0x7F. These changes may be attempted in a
separate change.
Bug: chromium:1323159
Change-Id: I00bdd5c90ad1ff2edfb74a9de41424c4eeac5ccb
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/543084
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Herb Derby <herb@google.com>
It was added for Chromium, and Chromium has switched to using
fTypeString instead.
Change-Id: I8cd8ae00b0c3abf3691ce14837afbe3be939538e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/316209
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
Add SkPDF::AttributeList::appendNodeIdArray so that clients
don't need to re-implement/duplicate NodeIdToString in order to
add attributes that express the relationship between nodes.
Follow-up to:
https://chromium-review.googlesource.com/c/chromium/src/+/2251058
This deletes appendNameArray and appendStringArray since there's
no immediate need for them, but we may add them back if needed.
Bug: chromium:607777
Change-Id: If9b1527f97c7b52bb1bdad3c0828067bb76f25f2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/297277
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
fChildren/fChildCount was replaced with fChildVector and Chromium
migrated to the new API a while back, it's now safe to remove the
older interface.
Bug: chromium:607777
Change-Id: I7311d3b51f1b71209dcc024ae51637536d560619
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/297260
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
In PDF files, "names" and "strings" are not the same thing, but I was
conflating them. Separate out the interfaces for adding attributes to
PDF struct tree elements so there's a way to add either a name or a string,
and similarly for arrays of names or arrays of strings.
Fix the table test to correctly use a name for the "Scope" attribute
and an array of strings for the "Headers" attribute.
Bug: chromium:607777
Change-Id: Ib30bded2bbcf96e31ba6925fb062615558dea0db
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/296338
Reviewed-by: Ben Wagner <bungeman@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
Commit-Queue: Derek Sollenberger <djsollen@google.com>
At the time Chromium is painting, we're passing node IDs
along with painting commands to enable tagging. However,
this assumes that all nodes will end up in the structure
tree, which we might not want.
Instead, allow the client to prune the structure tree
later before telling Skia to generate the PDF, but
keep all of the node IDs to be matched up with.
As an example, suppose the doc looks like this:
root id=1
paragraph id=2
div id=3
text1 id=4
link id=5
text2 id=6
The pruned tree passed to Skia would look like this:
root id=1
paragraph id=2 extra_ids=3,4
link id=5 extra_ids=6
We need to pass the extra node IDs into Skia so
that when content is tagged with id=4, we know to
map that to the paragraph node with id=2 instead.
Note that the resulting PDF document will *not*
have any of these extra IDs, they're all remapped
and consolidated.
While it's not strictly necessary that this is done
in Skia, it's easiest to implement it here. Doing the
same upstream would require replaying an SkPicture
and rewriting all of the node IDs.
Bug: chromium:607777
Change-Id: I0ecb62651e60b84cc5b9d053d7f7d3b9efda1470
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/272462
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Use a std::vector of std::unique_ptr for the children, rather than an
error-prone raw pointer.
Use a string for the element type, rather than an enum.
In both cases, both the old and new format are supported and tested
within Skia until Chromium has migrated.
Also add support for Alt and Lang, which are like attributes
but go right on the structure node. There are only a small
handful of attributes that go on the structure node so this
seems like the simplest solution.
Bug: chromium:607777
Change-Id: I4f315685df35dd9dcd8e1bca6d275cbeb8f2c67a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271816
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Originally landed: https://skia-review.googlesource.com/c/skia/+/268878
Reverted: https://skia-review.googlesource.com/c/skia/+/271858
The issue was with compilation when PDF support is disabled. See
the diff between patchsets 1 and 2.
This is an important part of writing a tagged PDF. Many of the nodes
in the document structure tree need additional attributes, just like
in HTML.
This change aims to add support for a few useful attributes, not to
be comprehensive.
Bug: chromium:1039816
Change-Id: I15f8b6c41d4fdaa4b6e21775ab6d26ec57eb0f5d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271916
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
This reverts commit 80474156d1.
Reason for revert: breaking chrome roll
https://ci.chromium.org/p/chromium/builders/try/cast_shell_linux/533554
Original change's description:
> Support adding attributes to PDF document structure nodes.
>
> This is an important part of writing a tagged PDF. Many of the nodes
> in the document structure tree need additional attributes, just like
> in HTML.
>
> This change aims to add support for a few useful attributes, not to
> be comprehensive.
>
> Bug: chromium:1039816
>
> Change-Id: I64a6b36b0b4ec42fd27ae4ad702afce95c95af5d
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/268878
> Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
> Commit-Queue: Mike Reed <reed@google.com>
> Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
> Reviewed-by: Mike Reed <reed@google.com>
> Reviewed-by: Derek Sollenberger <djsollen@google.com>
TBR=djsollen@google.com,reed@google.com,dmazzoni@chromium.org,dmazzoni@google.com
Change-Id: Iedd397303e870144e8d282db0cb81c535a783e8b
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:1039816
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271858
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Reed <reed@google.com>
This is an important part of writing a tagged PDF. Many of the nodes
in the document structure tree need additional attributes, just like
in HTML.
This change aims to add support for a few useful attributes, not to
be comprehensive.
Bug: chromium:1039816
Change-Id: I64a6b36b0b4ec42fd27ae4ad702afce95c95af5d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/268878
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Bug: skia:9552
Change-Id: Ifa199048e6d38ccb28f055b77128971411203188
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/249800
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
Current strategy: everything from the top
Things to look at first are the manual changes:
- added tools/rewrite_includes.py
- removed -Idirectives from BUILD.gn
- various compile.sh simplifications
- tweak tools/embed_resources.py
- update gn/find_headers.py to write paths from the top
- update gn/gn_to_bp.py SkUserConfig.h layout
so that #include "include/config/SkUserConfig.h" always
gets the header we want.
No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
All PDF Streams are immediatly serialized, and are never long-lived
in memory.
if EXPERIMENTAL fExecutor is set on the Document, streams are
compressed in parallel.
Results for PDFBigDocBench:
without patch 1807885.01 μs
with patch without executor 1802808.35 μs
with patch with executor 246313.72 μs
SkPDFStreamOut() function replaces SkPDFStream classes.
Page resources are all serialized early.
Several Document-level objects are serialzied early.
SkUUID introduced as top-level object.
Many {insert|append}ObjRef() converted to memory efficient
{insert|append}Ref().
Bug: skia:8630
Change-Id: Ic336917d0c8b9ac1c2423b43bfe9b49a3533fbff
Reviewed-on: https://skia-review.googlesource.com/c/176588
Auto-Submit: Hal Canary <halcanary@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
Adds an interface for the document creator to pass in a tree
of tags indicating the structure of the document, each with a type
(from a predetermined enum of possible types) and a node ID.
It also adds a setNodeId function to SkCanvas so that page content
can be associated with a particular tag. If both the tag tree and
marked content are present, Skia can now output a properly tagged
PDF.
An example program is included. When used properly, the PDF generated
by this patch is valid and the tags are parsed properly by Adobe
Acrobat. It handles many corner cases like content that spans more
than one page, or tags that don't correspond to any marked content, or
marked content that doesn't correspond to any tags.
However, it doesn't implement all of the features of PDF accessibility
yet, there are some additional attributes that can be associated with
some tags that need to be supported, too, in order to properly tag
things like figures and tables.
Bug: skia:8148
Change-Id: I2e448eca8ded8e1b29ba685663b557ae7ad7e23e
Reviewed-on: https://skia-review.googlesource.com/141138
Reviewed-by: Hal Canary <halcanary@google.com>