Commit Graph

19 Commits

Author SHA1 Message Date
Ben Wagner
225c8861b7 [pdf] Differentiate text from byte strings.
In PDF there are two different physical encodings of strings (as a
sequence of bytes). There is the literal string encoding which is
delimited by '(' and ') which stores the bytes as-is except for '(',
')', and the escape character '\' (which can introduce octal encoded
bytes). There is also the hex string encoding delimited by '<' and '>'
and the bytes are encoded in hex pairs (with an implicit '0' at the end
for odd length encodings).

The interpretation of these bytes depends on the logical string type of
the dictionary key. There is a base abstract (well, almost abstract
except for legacy purposes) string type. The subtypes of the string type
are `text string`, `ASCII string`, and `byte string`. The `text string`
is logically further subtyped into `PDFDocEncoded string` and `UTF-16BE
with BOM`. In theory any of these logical string types may have its
encoded bytes written out in either of the two physical string
encodings.

In practice for Skia this means there are two types of string to keep
track of, since `ASCII string` and `byte string` can be treated the
same (in this change they are both treated as `byte string`). If the
type is `text string` then the bytes Skia has are interpreted as UTF-8
and may be converted to `UTF-16BE with BOM` or used directly as
`PDFDocEncoded string` if that is valid. If the type is `byte string`
then the bytes Skia has may not be converted and must be written as-is.

This means that when Skia sets a dictionary key to a string value it
must, at the very least, remember if the key's type was `text string`.
This change replaces all `String` methods with `ByteString` and
`TextString` methods and updates all the callers to the correct one
based on the key being written.

With the string handling corrected, the `/ActualText` string is now
emitted with this new common code as well for better output and to
reduce code duplication. A few no longer used public APIs involving
these strings are removed. The documentation for the URI annotation is
updated to reflect reality.

This change outputs `UTF-16BE with BOM` with the hex string encoding
only and does not attempt to fix the literal string encoding which
always escapes bytes > 0x7F. These changes may be attempted in a
separate change.

Bug: chromium:1323159
Change-Id: I00bdd5c90ad1ff2edfb74a9de41424c4eeac5ccb
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/543084
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2022-05-24 18:46:42 +00:00
Lei Zhang
065c5d9408 Remove deprecated fType field in StructureElementNode.
It was added for Chromium, and Chromium has switched to using
fTypeString instead.

Change-Id: I8cd8ae00b0c3abf3691ce14837afbe3be939538e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/316209
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
2021-11-08 20:32:27 +00:00
Dominic Mazzoni
2eb3c17ba3 Add appendNodeIdArray to avoid code duplication.
Add SkPDF::AttributeList::appendNodeIdArray so that clients
don't need to re-implement/duplicate NodeIdToString in order to
add attributes that express the relationship between nodes.

Follow-up to:
https://chromium-review.googlesource.com/c/chromium/src/+/2251058

This deletes appendNameArray and appendStringArray since there's
no immediate need for them, but we may add them back if needed.

Bug: chromium:607777
Change-Id: If9b1527f97c7b52bb1bdad3c0828067bb76f25f2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/297277
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
2020-06-23 14:25:33 +00:00
Dominic Mazzoni
8c662a7d6e Get rid of deprecated API to add children to PDF tag nodes.
fChildren/fChildCount was replaced with fChildVector and Chromium
migrated to the new API a while back, it's now safe to remove the
older interface.

Bug: chromium:607777
Change-Id: I7311d3b51f1b71209dcc024ae51637536d560619
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/297260
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
2020-06-18 13:57:23 +00:00
Dominic Mazzoni
127c607818 Add separate PDF tag attribute interfaces for names and strings
In PDF files, "names" and "strings" are not the same thing, but I was
conflating them. Separate out the interfaces for adding attributes to
PDF struct tree elements so there's a way to add either a name or a string,
and similarly for arrays of names or arrays of strings.

Fix the table test to correctly use a name for the "Scope" attribute
and an array of strings for the "Headers" attribute.

Bug: chromium:607777
Change-Id: Ib30bded2bbcf96e31ba6925fb062615558dea0db
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/296338
Reviewed-by: Ben Wagner <bungeman@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
Commit-Queue: Derek Sollenberger <djsollen@google.com>
2020-06-15 15:46:29 +00:00
Dominic Mazzoni
2a016bad67 Allow passing multiple node IDs per PDF structure node.
At the time Chromium is painting, we're passing node IDs
along with painting commands to enable tagging. However,
this assumes that all nodes will end up in the structure
tree, which we might not want.

Instead, allow the client to prune the structure tree
later before telling Skia to generate the PDF, but
keep all of the node IDs to be matched up with.

As an example, suppose the doc looks like this:

root id=1
  paragraph id=2
    div id=3
      text1 id=4
    link id=5
      text2 id=6

The pruned tree passed to Skia would look like this:

root id=1
  paragraph id=2 extra_ids=3,4
    link id=5 extra_ids=6

We need to pass the extra node IDs into Skia so
that when content is tagged with id=4, we know to
map that to the paragraph node with id=2 instead.

Note that the resulting PDF document will *not*
have any of these extra IDs, they're all remapped
and consolidated.

While it's not strictly necessary that this is done
in Skia, it's easiest to implement it here. Doing the
same upstream would require replaying an SkPicture
and rewriting all of the node IDs.

Bug: chromium:607777
Change-Id: I0ecb62651e60b84cc5b9d053d7f7d3b9efda1470
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/272462
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
2020-02-24 18:21:16 +00:00
Dominic Mazzoni
6ffabbb4c4 Simplify interface to StructureElementNode.
Use a std::vector of std::unique_ptr for the children, rather than an
error-prone raw pointer.

Use a string for the element type, rather than an enum.

In both cases, both the old and new format are supported and tested
within Skia until Chromium has migrated.

Also add support for Alt and Lang, which are like attributes
but go right on the structure node. There are only a small
handful of attributes that go on the structure node so this
seems like the simplest solution.

Bug: chromium:607777
Change-Id: I4f315685df35dd9dcd8e1bca6d275cbeb8f2c67a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271816
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
2020-02-20 21:16:48 +00:00
Dominic Mazzoni
7dfb46e7f3 Re-land: Support adding attributes to PDF document structure nodes.
Originally landed: https://skia-review.googlesource.com/c/skia/+/268878
Reverted: https://skia-review.googlesource.com/c/skia/+/271858

The issue was with compilation when PDF support is disabled. See
the diff between patchsets 1 and 2.

This is an important part of writing a tagged PDF. Many of the nodes
in the document structure tree need additional attributes, just like
in HTML.

This change aims to add support for a few useful attributes, not to
be comprehensive.

Bug: chromium:1039816
Change-Id: I15f8b6c41d4fdaa4b6e21775ab6d26ec57eb0f5d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271916
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
2020-02-19 20:50:14 +00:00
Mike Reed
86b4388fdc Revert "Support adding attributes to PDF document structure nodes."
This reverts commit 80474156d1.

Reason for revert: breaking chrome roll

https://ci.chromium.org/p/chromium/builders/try/cast_shell_linux/533554

Original change's description:
> Support adding attributes to PDF document structure nodes.
> 
> This is an important part of writing a tagged PDF. Many of the nodes
> in the document structure tree need additional attributes, just like
> in HTML.
> 
> This change aims to add support for a few useful attributes, not to
> be comprehensive.
> 
> Bug: chromium:1039816
> 
> Change-Id: I64a6b36b0b4ec42fd27ae4ad702afce95c95af5d
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/268878
> Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
> Commit-Queue: Mike Reed <reed@google.com>
> Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
> Reviewed-by: Mike Reed <reed@google.com>
> Reviewed-by: Derek Sollenberger <djsollen@google.com>

TBR=djsollen@google.com,reed@google.com,dmazzoni@chromium.org,dmazzoni@google.com

Change-Id: Iedd397303e870144e8d282db0cb81c535a783e8b
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:1039816
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/271858
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Reed <reed@google.com>
2020-02-19 14:23:31 +00:00
Dominic Mazzoni
80474156d1 Support adding attributes to PDF document structure nodes.
This is an important part of writing a tagged PDF. Many of the nodes
in the document structure tree need additional attributes, just like
in HTML.

This change aims to add support for a few useful attributes, not to
be comprehensive.

Bug: chromium:1039816

Change-Id: I64a6b36b0b4ec42fd27ae4ad702afce95c95af5d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/268878
Commit-Queue: Dominic Mazzoni <dmazzoni@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Derek Sollenberger <djsollen@google.com>
2020-02-18 18:14:36 +00:00
Hal Canary
a1050ed2b1 SkPDF/docs: note that Sfntly subsetter is deprecated
Change-Id: I09ecee95653212268926d82739c0df3866d3a8d7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/253380
Auto-Submit: Hal Canary <halcanary@google.com>
Commit-Queue: Leon Scroggins <scroggo@google.com>
Reviewed-by: Leon Scroggins <scroggo@google.com>
2019-11-07 18:07:55 +00:00
Hal Canary
c056e169ea SkPDF: simplify Producer metadata logic
Bug: skia:9552
Change-Id: Ifa199048e6d38ccb28f055b77128971411203188
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/249800
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
2019-10-21 20:01:47 +00:00
Mike Klein
c0bd9f9fe5 rewrite includes to not need so much -Ifoo
Current strategy: everything from the top

Things to look at first are the manual changes:

   - added tools/rewrite_includes.py
   - removed -Idirectives from BUILD.gn
   - various compile.sh simplifications
   - tweak tools/embed_resources.py
   - update gn/find_headers.py to write paths from the top
   - update gn/gn_to_bp.py SkUserConfig.h layout
     so that #include "include/config/SkUserConfig.h" always
     gets the header we want.

No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
2019-04-24 16:27:11 +00:00
Hal Canary
25dd2c9fe9 SkPDF: runtime switch between subsetters
Bug: chromium:931719
Change-Id: I3199983d24a47b2b3921242a98ce83099ee869dd
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/202956
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Hal Canary <halcanary@skia.org>
2019-03-23 16:31:10 +00:00
Hal Canary
279b65ddb5 SkPDF: clean up public header
Change-Id: I8b3d3856a88021784aaeb18749c16446bae80002
Reviewed-on: https://skia-review.googlesource.com/c/179999
Auto-Submit: Hal Canary <halcanary@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
2019-01-02 15:55:33 +00:00
Hal Canary
9a3f554154 SkPDF: refactor streams, experimental threading
All PDF Streams are immediatly serialized, and are never long-lived
    in memory.

    if EXPERIMENTAL fExecutor is set on the Document, streams are
    compressed in parallel.

    Results for PDFBigDocBench:

        without patch                    1807885.01 μs
        with    patch without executor   1802808.35 μs
        with    patch with    executor    246313.72 μs

    SkPDFStreamOut() function replaces SkPDFStream classes.

    Page resources are all serialized early.

    Several Document-level objects are serialzied early.

    SkUUID introduced as top-level object.

    Many {insert|append}ObjRef() converted to memory efficient
    {insert|append}Ref().

Bug: skia:8630
Change-Id: Ic336917d0c8b9ac1c2423b43bfe9b49a3533fbff
Reviewed-on: https://skia-review.googlesource.com/c/176588
Auto-Submit: Hal Canary <halcanary@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
2018-12-18 20:48:25 +00:00
Dominic Mazzoni
656cefe65d Add initial support for generating tagged PDFs.
Adds an interface for the document creator to pass in a tree
of tags indicating the structure of the document, each with a type
(from a predetermined enum of possible types) and a node ID.
It also adds a setNodeId function to SkCanvas so that page content
can be associated with a particular tag. If both the tag tree and
marked content are present, Skia can now output a properly tagged
PDF.

An example program is included. When used properly, the PDF generated
by this patch is valid and the tags are parsed properly by Adobe
Acrobat. It handles many corner cases like content that spans more
than one page, or tags that don't correspond to any marked content, or
marked content that doesn't correspond to any tags.

However, it doesn't implement all of the features of PDF accessibility
yet, there are some additional attributes that can be associated with
some tags that need to be supported, too, in order to properly tag
things like figures and tables.

Bug: skia:8148
Change-Id: I2e448eca8ded8e1b29ba685663b557ae7ad7e23e
Reviewed-on: https://skia-review.googlesource.com/141138
Reviewed-by: Hal Canary <halcanary@google.com>
2018-09-27 19:35:40 +00:00
Hal Canary
bf79d5b837 SkPDF: add SK_API to new public factory.
Change-Id: I2396cb156d486f65d49a5f83a34886007b36026a
Reviewed-on: https://skia-review.googlesource.com/156009
Reviewed-by: Hal Canary <halcanary@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
2018-09-21 00:45:03 +00:00
Hal Canary
23564b9249 SkDocument: Factories now located in SkPDFDocument.h and SkXPSDocument.h
Change-Id: I48e73b27e52511292c2c0bd51ef0168766f53a80
Reviewed-on: https://skia-review.googlesource.com/152780
Commit-Queue: Hal Canary <halcanary@google.com>
Reviewed-by: Mike Reed <reed@google.com>
2018-09-20 18:21:07 +00:00