Integer.MAX_SIZE (0x7FFFFFF) #2228
M java/core/src/main/java/com/google/protobuf/CodedInputStream.java
Set DEFAULT_SIZE_LIMIT to Integer.MAX_SIZE (Was 64MB). This is how it was
in pre-2.7.0 pb. Changed size check to an overflow-conscious test (as it
is later in tryRefillBuffer (making sizeLimit a long was to disruptive).
M java/core/src/test/java/com/google/protobuf/CodedInputStreamTest.java
Add two tests that echo tests recently added over in c++ to test parse
of message sizes that are approach and are beyond the size limit.
* Uses build-helper-maven-plugin to add generated sources to the classpath
* Fixes an issue building with newer versions of the maven-compiler-plugin
(See https://issues.apache.org/jira/browse/MCOMPILER-240)
Currently some public API methods are defined in GenreatedMessage.java
and they have a generric return type:
class GeneratedMessage {
class Builder<BuilderType extends Builder<BuilderType>> {
public BuilderType setField(...);
public BuilderType setExtension(...);
}
}
With these definitions, the compiled byte code of a callsite will have
a direct reference to GeneratedMessage. For example:
fooBuilder.setField(...);
becomes:
##: invokevirtual // Method Builder.setField:(...)LGeneratedMessage.Builder
##: checkcast // class Builder
This will prevent us from updating generated classes to subclass a
different versioned GeneratedMessageV3 class in the future (we can't do
it in a binary compatible way).
This change addresses the problem by overriding these methods directly
in the generated class:
class Foo {
class Builder extends GeneratedMessage.Builder<Builder> {
public Builder setField(...) {
return super.setField(...);
}
}
}
After this, fooBuilder.setField(...) will be compiled to:
##: invokevirtual // Method Builder.setField:(...)LFoo.Builder
The callsites will no longer reference GeneratedMessage directly and we
can change Foo to subclass GeneratedMessageV3 without breaking binary
compatiblity.
The downside of this change is:
1. It increases generated code size (though it saves some instructions
on the callsites).
2. We can never stop generating these overrides because doing that
will break binary compatibility.
Change-Id: I879afbbc1325a66324a51565e017143489b06e97
Moving the files to their original location, so that opensource changes
can be picked during the internal merge. Those files will be moved into
the correct location after merging with internal code.
Note: do NOT merge this into master without the other internal
down-integration commit.
protobuf/java will become a parent pom that will contain two modules:
core - contains all of the code for the protobuf-java artifact
util - contains all of the code for the protobuf-java-util artifact
Also cleaned up various Maven warnings.
be properly set. writeTo() may be invoked without a call to
getSerializedSize(), so the generated serialization methods would
write a length of 0 for non-empty packed fields. Just call
getSerializedSize() at the beginning of writeTo(): although this
means that we may compute the byte size needlessly when there
are no packed fields, in practice, getSerializedSize() will
already have been called - all of the writeTo() wrappers in
AbstractMessageLite invoke it.
Tested: new unittest case in WireFormatTest.java now passes
All Languages
* Repeated fields of primitive types (types other that string, group, and
nested messages) may now use the option [packed = true] to get a more
efficient encoding. In the new encoding, the entire list is written
as a single byte blob using the "length-delimited" wire type. Within
this blob, the individual values are encoded the same way they would
be normally except without a tag before each value (thus, they are
tightly "packed").
C++
* UnknownFieldSet now supports STL-like iteration.
* Message interface has method ParseFromBoundedZeroCopyStream() which parses
a limited number of bytes from an input stream rather than parsing until
EOF.
Java
* Fixed bug where Message.mergeFrom(Message) failed to merge extensions.
* Message interface has new method toBuilder() which is equivalent to
newBuilderForType().mergeFrom(this).
* All enums now implement the ProtocolMessageEnum interface.
* Setting a field to null now throws NullPointerException.
* Fixed tendency for TextFormat's parsing to overflow the stack when
parsing large string values. The underlying problem is with Java's
regex implementation (which unfortunately uses recursive backtracking
rather than building an NFA). Worked around by making use of possesive
quantifiers.
Python
* Updated RPC interfaces to allow for blocking operation. A client may
now pass None for a callback when making an RPC, in which case the
call will block until the response is received, and the response
object will be returned directly to the caller. This interface change
cannot be used in practice until RPC implementations are updated to
implement it.
protoc
* Enum values may now have custom options, using syntax similar to field
options.
* Fixed bug where .proto files which use custom options but don't actually
define them (i.e. they import another .proto file defining the options)
had to explicitly import descriptor.proto.
* Adjacent string literals in .proto files will now be concatenated, like in
C.
C++
* Generated message classes now have a Swap() method which efficiently swaps
the contents of two objects.
* All message classes now have a SpaceUsed() method which returns an estimate
of the number of bytes of allocated memory currently owned by the object.
This is particularly useful when you are reusing a single message object
to improve performance but want to make sure it doesn't bloat up too large.
* New method Message::SerializeAsString() returns a string containing the
serialized data. May be more convenient than calling
SerializeToString(string*).
* In debug mode, log error messages when string-type fields are found to
contain bytes that are not valid UTF-8.
* Fixed bug where a message with multiple extension ranges couldn't parse
extensions.
* Fixed bug where MergeFrom(const Message&) didn't do anything if invoked on
a message that contained no fields (but possibly contained extensions).
* Fixed ShortDebugString() to not be O(n^2). Durr.
* Fixed crash in TextFormat parsing if the first token in the input caused a
tokenization error.
Java
* New overload of mergeFrom() which parses a slice of a byte array instead
of the whole thing.
* New method ByteString.asReadOnlyByteBuffer() does what it sounds like.
* Improved performance of isInitialized() when optimizing for code size.
Python
* Corrected ListFields() signature in Message base class to match what
subclasses actually implement.
* Some minor refactoring.
General
* License changed from Apache 2.0 to New BSD.
* It is now possible to define custom "options", which are basically
annotations which may be placed on definitions in a .proto file.
For example, you might define a field option called "foo" like so:
import "google/protobuf/descriptor.proto"
extend google.protobuf.FieldOptions {
optional string foo = 12345;
}
Then you annotate a field using the "foo" option:
message MyMessage {
optional int32 some_field = 1 [(foo) = "bar"]
}
The value of this option is then visible via the message's
Descriptor:
const FieldDescriptor* field =
MyMessage::descriptor()->FindFieldByName("some_field");
assert(field->options().GetExtension(foo) == "bar");
This feature has been implemented and tested in C++ and Java.
Other languages may or may not need to do extra work to support
custom options, depending on how they construct descriptors.
C++
* Fixed some GCC warnings that only occur when using -pedantic.
* Improved static initialization code, making ordering more
predictable among other things.
* TextFormat will no longer accept messages which contain multiple
instances of a singular field. Previously, the latter instance
would overwrite the former.
* Now works on systems that don't have hash_map.
Python
* Strings now use the "unicode" type rather than the "str" type.
String fields may still be assigned ASCII "str" values; they will
automatically be converted.
* Adding a property to an object representing a repeated field now
raises an exception. For example:
# No longer works (and never should have).
message.some_repeated_field.foo = 1
Protoc (parser)
- Improved error message when an enum value's name conflicts with another
symbol defined in the enum type's scope, e.g. if two enum types declared
in the same scope have values with the same name. This is disallowed for
compatibility with C++, but this wasn't clear from the error.
C++
- Restored the set_foo(const char*) accessor for "bytes" type because some
code inside Google depends on it. However, set_foo(const char*, int) is
still there (and actually is changed to take const void*).
- Fixed TokenizerTest when compiling with -DNDEBUG on Linux.
- Other irrelevant tweaks.
Java
- Fixed UnknownFieldSet's parsing of varints larger than 32 bits.
- Fixed TextFormat's parsing of "inf" and "nan".
- Fixed TextFormat's parsing of comments.
Python
- Fixed text_format_test on Windows where floating-point exponents sometimes
contain extra zeros.
- Improved readmes.
- Fixed incorrect definition of kint32min.
- Fixed absolute output paths on Windows.
- Added info to Java POM that will be required when we upload the
package to a Maven repo.