protobuf/docs/implementing_proto3_presence.md
2021-03-08 11:13:03 -08:00

401 lines
16 KiB
Markdown

# How To Implement Field Presence for Proto3
Protobuf release 3.12 adds experimental support for `optional` fields in
proto3. Proto3 optional fields track presence like in proto2. For background
information about what presence tracking means, please see
[docs/field_presence](field_presence.md).
## Document Summary
This document is targeted at developers who own or maintain protobuf code
generators. All code generators will need to be updated to support proto3
optional fields. First-party code generators developed by Google are being
updated already. However third-party code generators will need to be updated
independently by their authors. This includes:
- implementations of Protocol Buffers for other languages.
- alternate implementations of Protocol Buffers that target specialized use
cases.
- RPC code generators that create generated APIs for service calls.
- code generators that implement some utility code on top of protobuf generated
classes.
While this document speaks in terms of "code generators", these same principles
apply to implementations that dynamically generate a protocol buffer API "on the
fly", directly from a descriptor, in languages that support this kind of usage.
## Background
Presence tracking was added to proto3 in response to user feedback, both from
inside Google and [from open-source
users](https://github.com/protocolbuffers/protobuf/issues/1606). The [proto3
wrapper
types](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto)
were previously the only supported presence mechanism for proto3. Users have
pointed to both efficiency and usability issues with the wrapper types.
Presence in proto3 uses exactly the same syntax and semantics as in proto2.
Proto3 Fields marked `optional` will track presence like proto2, while fields
without any label (known as "singular fields"), will continue to omit presence
information. The `optional` keyword was chosen to minimize differences with
proto2.
Unfortunately, for the current descriptor protos and `Descriptor` API (as of
3.11.4) it is not possible to use the same representation as proto2. Proto3
descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do
not track presence. There is a lot of existing code that reflects over proto3
protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing
the semantics now would be risky, since old software would likely drop proto3
presence information, which would be a data loss bug.
To minimize this risk we chose a descriptor representation that is semantically
compatible with existing proto3 reflection. Every proto3 optional field is
placed into a one-field `oneof`. We call this a "synthetic" oneof, as it was not
present in the source `.proto` file.
Since oneof fields in proto3 already track presence, existing proto3
reflection-based algorithms should correctly preserve presence for proto3
optional fields with no code changes. For example, the JSON and TextFormat
parsers/serializers in C++ and Java did not require any changes to support
proto3 presence. This is the major benefit of synthetic oneofs.
This design does leave some cruft in descriptors. Synthetic oneofs are a
compatibility measure that we can hopefully clean up in the future. For now
though, it is important to preserve them across different descriptor formats and
APIs. It is never safe to drop synthetic oneofs from a proto schema. Code
generators can (and should) skip synthetic oneofs when generating a user-facing
API or user-facing documentation. But for any schema representation that is
consumed programmatically, it is important to keep the synthetic oneofs around.
In APIs it can be helpful to offer separate accessors that refer to "real"
oneofs (see [API Changes](#api-changes) below). This is a convenient way to omit
synthetic oneofs in code generators.
## Updating a Code Generator
When a user adds an `optional` field to proto3, this is internally rewritten as
a one-field oneof, for backward-compatibility with reflection-based algorithms:
```protobuf
syntax = "proto3";
message Foo {
// Experimental feature, not generally supported yet!
optional int32 foo = 1;
// Internally rewritten to:
// oneof _foo {
// int32 foo = 1 [proto3_optional=true];
// }
//
// We call _foo a "synthetic" oneof, since it was not created by the user.
}
```
As a result, the main two goals when updating a code generator are:
1. Give `optional` fields like `foo` normal field presence, as described in
[docs/field_presence](field_presence.md) If your implementation already
supports proto2, a proto3 `optional` field should use exactly the same API
and internal implementation as proto2 `optional`.
2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only
purpose is to make reflection-based algorithms work properly if they are
not aware of proto3 presence. The synthetic oneof should not appear anywhere
in the generated API.
### Satisfying the Experimental Check
If you try to run `protoc` on a file with proto3 `optional` fields, you will get
an error because the feature is still experimental:
```
$ cat test.proto
syntax = "proto3";
message Foo {
// Experimental feature, not generally supported yet!
optional int32 a = 1;
}
$ protoc --cpp_out=. test.proto
test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.
```
There are two options for getting around this error:
1. Pass `--experimental_allow_proto3_optional` to protoc.
2. Make your filename (or a directory name) contain the string
`test_proto3_optional`. This indicates that the proto file is specifically
for testing proto3 optional support, so the check is suppressed.
These options are demonstrated below:
```
# One option:
$ ./src/protoc test.proto --cpp_out=. --experimental_allow_proto3_optional
# Another option:
$ cp test.proto test_proto3_optional.proto
$ ./src/protoc test_proto3_optional.proto --cpp_out=.
$
```
The experimental check will be removed in a future release, once we are ready
to make this feature generally available. Ideally this will happen for the 3.13
release of protobuf, sometime in mid-2020, but there is not a specific date set
for this yet. Some of the timing will depend on feedback we get from the
community, so if you have questions or concerns please get in touch via a
GitHub issue.
### Signaling That Your Code Generator Supports Proto3 Optional
If you now try to invoke your own code generator with the test proto, you will
run into a different error:
```
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
test_proto3_optional.proto: is a proto3 file that contains optional fields, but
code generator --my_codegen_out hasn't been updated to support optional fields in
proto3. Please ask the owner of this code generator to support proto3 optional.
```
This check exists to make sure that code generators get a chance to update
before they are used with proto3 `optional` fields. Without this check an old
code generator might emit obsolete generated APIs (like accessors for a
synthetic oneof) and users could start depending on these. That would create
a legacy migration burden once a code generator actually implements the feature.
To signal that your code generator supports `optional` fields in proto3, you
need to tell `protoc` what features you support. The method for doing this
depends on whether you are using the C++
`google::protobuf::compiler::CodeGenerator`
framework or not.
If you are using the CodeGenerator framework:
```c++
class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator {
// Add this method.
uint64_t GetSupportedFeatures() const override {
// Indicate that this code generator supports proto3 optional fields.
// (Note: don't release your code generator with this flag set until you
// have actually added and tested your proto3 support!)
return FEATURE_PROTO3_OPTIONAL;
}
}
```
If you are generating code using raw `CodeGeneratorRequest` and
`CodeGeneratorResponse` messages from `plugin.proto`, the change will be very
similar:
```c++
void GenerateResponse() {
CodeGeneratorResponse response;
response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL);
// Generate code...
}
```
Once you have added this, you should now be able to successfully use your code
generator to generate a file containing proto3 optional fields:
```
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
```
### Updating Your Code Generator
Now to actually add support for proto3 optional to your code generator. The goal
is to recognize proto3 optional fields as optional, and suppress any output from
synthetic oneofs.
If your code generator does not currently support proto2, you will need to
design an API and implementation for supporting presence in scalar fields.
Generally this means:
- allocating a bit inside the generated class to represent whether a given field
is present or not.
- exposing a `has_foo()` method for each field to return the value of this bit.
- make the parser set this bit when a value is parsed from the wire.
- make the serializer test this bit to decide whether to serialize.
If your code generator already supports proto2, then most of your work is
already done. All you need to do is make sure that proto3 optional fields have
exactly the same API and behave in exactly the same way as proto2 optional
fields.
From experience updating several of Google's code generators, most of the
updates that are required fall into one of several patterns. Here we will show
the patterns in terms of the C++ CodeGenerator framework. If you are using
`CodeGeneratorRequest` and `CodeGeneratorReply` directly, you can translate the
C++ examples to your own language, referencing the C++ implementation of these
methods where required.
#### To test whether a field should have presence
Old:
```c++
bool MessageHasPresence(const google::protobuf::Descriptor* message) {
return message->file()->syntax() ==
google::protobuf::FileDescriptor::SYNTAX_PROTO2;
}
```
New:
```c++
// Presence is no longer a property of a message, it's a property of individual
// fields.
bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) {
return field->has_presence();
// Note, the above will return true for fields in a oneof.
// If you want to filter out oneof fields, write this instead:
// return field->has_presence && !field->real_containing_oneof()
}
```
#### To test whether a field is a member of a oneof
Old:
```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
return field->containing_oneof() != nullptr;
}
```
New:
```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
// real_containing_oneof() returns nullptr for synthetic oneofs.
return field->real_containing_oneof() != nullptr;
}
```
#### To iterate over all oneofs
Old:
```c++
bool IterateOverOneofs(const google::protobuf::Descriptor* message) {
for (int i = 0; i < message->oneof_decl_count(); i++) {
const google::protobuf::OneofDescriptor* oneof = message->oneof(i);
// ...
}
}
```
New:
```c++
bool IterateOverOneofs(const google::protobuf::Descriptor* message) {
// Real oneofs are always first, and real_oneof_decl_count() will return the
// total number of oneofs, excluding synthetic oneofs.
for (int i = 0; i < message->real_oneof_decl_count(); i++) {
const google::protobuf::OneofDescriptor* oneof = message->oneof(i);
// ...
}
}
```
## Updating Reflection
If your implementation offers reflection, there are a few other changes to make:
### API Changes
The API for reflecting over fields and oneofs should make the following changes.
These match the changes implemented in C++ reflection.
1. Add a `FieldDescriptor::has_presence()` method returning `bool`
(adjusted to your language's naming convention). This should return true
for all fields that have explicit presence, as documented in
[docs/field_presence](field_presence.md). In particular, this includes
fields in a oneof, proto2 scalar fields, and proto3 `optional` fields.
This accessor will allow users to query what fields have presence without
thinking about the difference between proto2 and proto3.
2. As a corollary of (1), please do *not* expose an accessor for the
`FieldDescriptorProto.proto3_optional` field. We want to avoid having
users implement any proto2/proto3-specific logic. Users should use the
`has_presence()` function instead.
3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method
returning `bool`, which indicates whether the `optional` keyword is present.
Message fields will always return `true` for `has_presence()`, so this method
can allow a user to know whether the user wrote `optional` or not. It can
occasionally be useful to have this information, even though it does not
change the presence semantics of the field.
4. If your reflection API may be used for a code generator, you may wish to
implement methods to help users tell the difference between real and
synthetic oneofs. In particular:
- `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic
oneof.
- `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`,
but returns `nullptr` if the oneof is synthetic.
- `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but
returns the number of real oneofs only.
### Implementation Changes
Proto3 `optional` fields and synthetic oneofs must work correctly when
reflected on. Specifically:
1. Reflection for synthetic oneofs should work properly. Even though synthetic
oneofs do not really exist in the message, you can still make reflection work
as if they did. In particular, you can make a method like
`Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at
the hasbit to determine if the oneof is present or not.
2. Reflection for proto3 optional fields should work properly. For example, a
method like `Reflection::HasField()` should know to look for the hasbit for a
proto3 `optional` field. It should not be fooled by the synthetic oneof into
thinking that there is a `case` member for the oneof.
Once you have updated reflection to work properly with proto3 `optional` and
synthetic oneofs, any code that *uses* your reflection interface should work
properly with no changes. This is the benefit of using synthetic oneofs.
In particular, if you have a reflection-based implementation of protobuf text
format or JSON, it should properly support proto3 optional fields without any
changes to the code. The fields will look like they all belong to a one-field
oneof, and existing proto3 reflection code should know how to test presence for
fields in a oneof.
So the best way to test your reflection changes is to try round-tripping a
message through text format, JSON, or some other reflection-based parser and
serializer, if you have one.
### Validating Descriptors
If your reflection implementation supports loading descriptors at runtime,
you must verify that all synthetic oneofs are ordered after all "real" oneofs.
Here is the code that implements this validation step in C++, for inspiration:
```c++
// Validation that runs for each message.
// Synthetic oneofs must be last.
int first_synthetic = -1;
for (int i = 0; i < message->oneof_decl_count(); i++) {
const OneofDescriptor* oneof = message->oneof_decl(i);
if (oneof->is_synthetic()) {
if (first_synthetic == -1) {
first_synthetic = i;
}
} else {
if (first_synthetic != -1) {
AddError(message->full_name(), proto.oneof_decl(i),
DescriptorPool::ErrorCollector::OTHER,
"Synthetic oneofs must be after all other oneofs");
}
}
}
if (first_synthetic == -1) {
message->real_oneof_decl_count_ = message->oneof_decl_count_;
} else {
message->real_oneof_decl_count_ = first_synthetic;
}
```