This check adds a few constraints on the way to build a project when we have
a proto file which imports another one. In particular, on projects which
build both C# and Java, it's easy to end up with exceptions like
Expected: included.proto but was src/main/protobuf/included.proto
A user may work around this issue, but it may add unnecessary constraints
on the layout of the project.
According to f3504cf3b1 (diff-ecb0b909ed572381a1c8d1994f09a948R309)
it has already been considered to get rid of this check, for
similar considerations, and because it doesn't exist in the Java code.
This should fix the failures in the conformance tests - although
it highlights the problem that we need to do this when changing
the conformance.proto file...
We now just perform the optimization within AddRange itself.
This is a breaking change in terms of "drop in the DLL", but is
source compatible, which should be fine.
This doesn't currently change the ordering in the implementation, but allows us to do so in the future.
We also need to change
https://developers.google.com/protocol-buffers/docs/reference/csharp-generated#singular
which states "Finally, unlike Dictionary<TKey, TValue>, MapField<TKey, TValue> preserves insertion order of entries."
(We can just remove that sentence, I think.)
Also added a standalone formatter test, for confidence.
Have validated that undoing the change in 835fb947 breaks the tests
(i.e. we are still testing that the change is required).
(There are documentation changes and new fields in descriptor.proto that have resulted
in changes to the serialized descriptor, but no breaking changes for C#.)
Overview of changes:
- A new C#-specific command-line option, legacy_enum_values, to revert to the old behavior
- When legacy_enum_values isn't specified, we strip the enum name as a prefix, and PascalCase the value name
- A new attribute within the C# code so that we can always tell the original in-proto name
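To make the default naming concrete, here's a rough Python sketch. It is not the protoc implementation: `csharp_enum_value_name` and its helpers are hypothetical names, I'm assuming the prefix match ignores case and underscores, and I'm assuming the legacy mode leaves the proto name untouched. The real generator may treat edge cases differently.

```python
def _strip_prefix(enum_name: str, value_name: str) -> str:
    """Drop enum_name from the front of value_name, ignoring case and underscores."""
    target = enum_name.replace("_", "").lower()
    matched = 0
    i = 0
    while i < len(value_name) and matched < len(target):
        ch = value_name[i]
        if ch == "_":
            i += 1
            continue
        if ch.lower() != target[matched]:
            return value_name  # Not a prefix: keep the original name.
        matched += 1
        i += 1
    if matched < len(target):
        return value_name
    remainder = value_name[i:].lstrip("_")
    # Never strip the whole name away.
    return remainder if remainder else value_name


def _pascal_case(name: str) -> str:
    """Turn SHOUTY_SNAKE_CASE into PascalCase."""
    return "".join(p[:1].upper() + p[1:].lower() for p in name.split("_") if p)


def csharp_enum_value_name(enum_name: str, value_name: str, legacy: bool = False) -> str:
    """Illustrative only: compute the C# name for an enum value."""
    if legacy:
        # Assumption: legacy_enum_values keeps the in-proto name as-is.
        return value_name
    return _pascal_case(_strip_prefix(enum_name, value_name))
```

With this sketch, a value COLOR_DARK_BLUE in an enum called Color would come out as DarkBlue, while a value that doesn't start with the enum name is only PascalCased.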
Regenerating the C# code with legacy_enum_values leads to code which still compiles and works - but
there's more still to do.
I've moved both protoc.exe and the proto files out of Google.Protobuf.
The .proto files aren't a slam-dunk, but it feels like they belong with protoc as you'd *use* them with protoc.
It's not clear to me whether we really need both an x86 and x64 version of protoc.exe, as x86 would work on 64-bit Windows anyway. Discuss :)
This makes no externally visible behavioral changes. Internally and non-behaviorally:
- We use a field (compiler-generated) to store the JsonName to avoid recomputing it repeatedly
- The documentation for JsonName is updated to reflect the meaning better
- Readonly autoprops and expression-bodied properties used where possible
This detects:
- An end-group tag with the wrong field number (doesn't match the start-group field)
- An end-group tag with no preceding start-group tag
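For reference, the two checks can be sketched like this in Python. This works on a flat list of already-decoded tags rather than a real CodedInputStream, and `check_group_tags` is a hypothetical name; only the tag layout (field number shifted left by 3, wire types 3 and 4 for start/end group) comes from the wire format itself.

```python
WIRETYPE_START_GROUP = 3
WIRETYPE_END_GROUP = 4


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and wire type into a tag, as the wire format does."""
    return (field_number << 3) | wire_type


def check_group_tags(tags) -> None:
    """Raise ValueError for a mismatched or unexpected end-group tag."""
    open_groups = []  # Field numbers of the groups we are currently inside.
    for tag in tags:
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == WIRETYPE_START_GROUP:
            open_groups.append(field_number)
        elif wire_type == WIRETYPE_END_GROUP:
            if not open_groups:
                raise ValueError("end-group tag with no preceding start-group tag")
            expected = open_groups.pop()
            if field_number != expected:
                raise ValueError(
                    f"end-group tag for field {field_number}, expected field {expected}"
                )
    # Unterminated groups are deliberately not checked in this sketch.
```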
Fixes issue #688.
This is a start to fixing issue #1212. It won't help for test protos,
conformance etc, but it will definitely be better than nothing, and
would have highlighted a change in descriptor.proto which broke C#
earlier.
Recently, descriptor.proto gained a GeneratedCodeInfo message, which means the generated code conflicts with our type.
Unfortunately this affects codegen as well, although this is a part of the public API which is very unlikely to affect hand-written code.
Generated code changes in next commit.
The usage of ICustomDiagnosticMessage here is non-essential - ToDiagnosticString
doesn't actually get called by ToString() in this case, due to JsonFormatter code. It was
intended to make it clearer that it *did* have a custom format... but then arguably I should
do the same for Value, Struct, Any etc.
Moving some of the code out of JsonFormatter and into Duration/Timestamp/FieldMask likewise
feels somewhat nice, somewhat nasty... basically there are JSON-specific bits of formatting, but
also domain-specific bits of computation. <sigh>
Thoughts welcome.
- Tighten up on Infinity/NaN handling in terms of whitespace handling (and test casing)
- Validate that values are genuinely integers when they've been parsed from a JSON number (ignoring the fact that 1.0000000000000000001 == 1 as a double...)
- Allow exponents and decimal points in string representations
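A small Python sketch of the integer validation, for illustration only. The function names are hypothetical, the int32 range is just one example target type, and the regular expression (plain JSON number grammar, no surrounding whitespace) is my assumption about what "string representations" should accept rather than a description of the parser.

```python
import math
import re

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

# JSON number grammar: optional sign, integer part, optional fraction and exponent.
_JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def int32_from_double(value: float) -> int:
    """Accept a parsed JSON number only if it is a genuine in-range integer."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("not a finite number")
    if value != math.floor(value):
        raise ValueError("not an integer")
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("out of range for int32")
    return int(value)


def int32_from_string(text: str) -> int:
    """Accept a string such as "1e2" or "1.0", but not "1.5" or " 1"."""
    if not _JSON_NUMBER.fullmatch(text):
        raise ValueError("not a valid number")
    return int32_from_double(float(text))
```

Note that 1.0000000000000000001 still passes, because it has already been rounded to exactly 1.0 by the time it is a double.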
The conformance tests now use types which are part of src/google/protobuf, so we need to include src in the proto path.
The notes around "fix-ups" have been out of date for some time now.
This addresses issue #1008, by creating a JsonFormatter which is private and only different
to JsonFormatter.Default in terms of reference equality.
Other plausible designs:
- The same, but expose the diagnostic-only formatter
- Add something to settings to say "I don't have a type registry at all"
- Change the behaviour of JsonFormatter.Default (bad idea IMO, as we really *don't* want the result of this used as regular JSON to be parsed)
Note that just trying to find a separate fix to issue #933 and using that to override Any.ToString() differently wouldn't work for messages that *contain* an Any.
Generated code changes follow in the next commit.
This required a rework of the tokenizer to allow for a "replaying" tokenizer, basically in case the @type value comes after the data itself. This rework is nice in some ways (all the pushback and object depth logic in one place) but is a little fragile in terms of token push-back when using the replay tokenizer. It'll be fine for the scenario we need it for, but we should be careful...
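The replay idea, very roughly, in Python. This is a sketch under my own assumptions, not the JsonTokenizer code: tokens are modelled as hypothetical `(kind, value)` tuples, and `find_type_then_replay` is an invented name. It buffers tokens until it has seen the top-level @type value, then hands back an iterator which replays the buffered tokens before continuing with the rest of the input.

```python
import itertools


def find_type_then_replay(tokens):
    """Return (type_url, iterator replaying every token from the start)."""
    source = iter(tokens)
    buffered = []
    depth = 0
    expecting_type_value = False
    for token in source:
        buffered.append(token)
        kind, value = token
        if expecting_type_value:
            # Replay what we have already consumed, then carry on with the rest.
            return value, itertools.chain(buffered, source)
        if kind == "start_object":
            depth += 1
        elif kind == "end_object":
            depth -= 1
            if depth == 0:
                break
        elif kind == "name" and value == "@type" and depth == 1:
            # Only a top-level @type counts; nested objects are skipped over.
            expecting_type_value = True
    raise ValueError("object has no @type field")
```

The fragile part mentioned above doesn't show up here, as this sketch has no push-back at all.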
There are corner cases where MessageDescriptor.{ClrType,Parser} will return null, and these are now documented. However, normally they *should* be implemented, even for descriptors of dynamic messages. Ditto FieldDescriptor.Accessor.
We'll still need a fair amount of work to implement dynamic messages, but this change means that the public API will remain intact.
Additionally, this change starts making use of C# 6 features in the files that it touches. This is far from exhaustive, and later PRs will have more.
Generated code changes coming in the next commit.
Generated code coming in next commit - in a subsequent PR I want to do a bit of renaming and redocumenting around this, in anticipation of DynamicMessage.
This is only thrown directly by JsonTokenizer, but surfaces from JsonParser as well. I've added doc comments to hopefully make everything clear.
The exception is actually thrown by the reader within JsonTokenizer, in anticipation of keeping track of the location within the document, but that change is not within this PR.
This includes all the well-known types except Any.
Some aspects are likely to require further work when the details of the JSON parsing expectations are hammered out in more detail. Some of these have "ignored" tests already.
Note that the choice *not* to use Json.NET was made for two reasons:
- Going from 0 dependencies to 1 dependency is a big hit, and there's not much benefit here
- Json.NET parses more leniently than we'd want; accommodating that would be nearly as much work as writing the tokenizer
This only really affects the JsonTokenizer, which could be replaced by Json.NET. The JsonParser code would be about the same length with Json.NET... but I wouldn't be as confident in it.
This changes how we approach JSON formatting in general - instead of looking at the field a value came from, we just look at the type of the value. It's possible this *could* be slightly inefficient, but if we start caring about JSON performance deeply, we'll probably want to rewrite all of this anyway. It's definitely simpler this way.
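As an illustration of "look at the type of the value", here's a minimal Python sketch. It has nothing to do with the actual JsonFormatter code: `format_value` is a hypothetical function, and the output spacing is arbitrary. The point is just that the dispatch needs no field descriptor.

```python
import json


def format_value(value) -> str:
    """Format a value as JSON based purely on its runtime type."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # Must come before int: bool is a subclass of int.
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # Handles quoting and escaping.
    if isinstance(value, list):
        return "[" + ", ".join(format_value(v) for v in value) + "]"
    if isinstance(value, dict):
        items = (json.dumps(str(k)) + ": " + format_value(v) for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```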
When we support dynamic messages, we'll need to modify JsonFormatter to handle enum values, as they won't be "real" .NET enums at that point. It shouldn't be hard to do though.
There are now summaries for:
- The Types nested class (which holds nested types)
- The file descriptor class for each proto
- The enum generated for each oneof
(Also fixed two typos.)
Generated code in next commit.
We still need the JSON representation, which relies on something like a DescriptorPool to fetch message types from based on the type URL. That will come a bit later.
(The DescriptorPool comment in this commit is just a note which will prove useful if we use DescriptorPool itself.)
This introduces a new C# option, base_namespace.
If the option is not specified, the behaviour is as before: no directories are generated.
If the option *is* specified, all C# namespaces must be relative to the base namespace, and the directories are generated relative to that namespace.
Example:
- Any.proto declares csharp_namespace = "Google.Protobuf.WellKnownTypes"
- We build with --csharp_out=Google.Protobuf --csharp_opt=base_namespace=Google.Protobuf
- The Any.cs file is generated in Google.Protobuf/WellKnownTypes (where it currently lives)
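The directory rule in the example above can be sketched in Python as follows. `output_directory` is a hypothetical helper, not part of protoc, and raising an error for a namespace outside the base is my reading of "must be relative to the base namespace".

```python
import posixpath


def output_directory(out_dir: str, csharp_namespace: str, base_namespace=None) -> str:
    """Work out where a generated file would be written."""
    if base_namespace is None:
        # Option not specified: behaviour as before, no directories generated.
        return out_dir
    if csharp_namespace == base_namespace:
        return out_dir
    prefix = base_namespace + "."
    if not csharp_namespace.startswith(prefix):
        raise ValueError(
            f"namespace {csharp_namespace} is not within base namespace {base_namespace}"
        )
    relative = csharp_namespace[len(prefix):]
    # One directory per remaining namespace segment.
    return posixpath.join(out_dir, *relative.split("."))
```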
We need a change to descriptor.proto before this will all work (it wasn't in the right C# namespace) but that needs the other descriptors to be regenerated too. See next commit...
We now do this in protoc instead of in the generation script.
Benefits:
- Generation script is simpler
- Detection is simpler as we now only need to care about one filename
- The embedded descriptor knows itself as "google/protobuf/descriptor.proto" avoiding dependency issues
This PR also makes the "invalid dependency" exception clearer in terms of expected and actual dependencies.
With this in place, generating APIs on github.com/google/googleapis works - previously annotations.proto failed.
Currently there's no access to the annotations (stored as extensions) but we could potentially expose those at a later date.
- Removed a TODO without change in DescriptorPool.LookupSymbol - the TODOs were around performance, and this is only used during descriptor initialization
- Make the CodedInputStream limits read-only, adding a static factory method for the rare cases when this is useful
- Extracted IDeepCloneable into its own file.
This is a bit of a grotty hack, as we need to sort of fake proto2 field presence, but with only a proto3 version of the descriptor messages (a bit like oneof detection).
Should be okay, but will need to be careful of this if we ever implement proto2.
Now the generated code doesn't need to check for end group tags, as it will skip whole groups at a time.
Currently it will ignore extraneous end group tags, which may or may not be a good thing.
Renamed ConsumeLastField to SkipLastField as it felt more natural.
Removed WireFormat.IsEndGroupTag as it's no longer useful.
This mostly fixes issue #688.
(Generated code changes coming in next commit.)
This is taking an approach of putting all the logic in JsonFormatter. That's helpful in terms of concealing the details of whether or not to wrap the value in quotes, but it does lack flexibility. I don't *think* we want to allow user-defined formatting of messages, so that much shouldn't be a problem.
While I've provided operators, I haven't yet provided the method equivalents. It's not clear to me that
they're actually a good idea, while we're really targeting C# developers who definitely *can* use the user-defined operators.
Additionally, change it to return the value passed, and make it generic with a class constraint.
A separate method doesn't have the class constraint, for more unusual scenarios.
- Fix nuspec paths
- Remove an obsolete part of the JSON build
- Add documentation and tests to reflection extension methods, and improve implementations
This requires .NET 4.5, and there are a few compatibility changes required around reflection.
Creating a PR from this to see how our CI systems handle it. Will want to add more documentation,
validation and probably tests before merging.
This is in aid of issue #590.
I think Jan was actually suggesting keeping both, but that feels redundant to me. The test diff is misleading here IMO, because I wouldn't expect real code using reflection to use several accessors one after another like this, unless it was within a loop. Evidence to the contrary would be welcome :)
This change also incidentally goes part way to fixing the issue of the JSON formatter not writing out the fields in field number order - with this change, it does except for oneofs, which we can fix in a follow-up change.
I haven't actually added a test with a message with fields deliberately out of order - I'm happy to do so though. It feels like it would make sense to be in src/google/protobuf, but it's not entirely clear what the rules of engagement are for adding new messages there. (unittest_proto3.proto?)
This is definitely not ready to ship - I'm "troubled" by the disconnect between a list of fields in declaration order, and a mapping of field accessors by field number/name. Discussion required, but I find that easier when we've got code to look at :)
Changes in brief:
1. Descriptor is now the entry point for all reflection.
2. IReflectedMessage has gone; there's now a Descriptor property in IMessage, which is explicitly implemented (due to the static property).
3. FieldAccessorTable has gone away
4. IFieldAccessor and OneofFieldAccessor still exist; we *could* put the functionality straight into FieldDescriptor and OneofDescriptor... I'm unsure about that.
5. There's a temporary property MessageDescriptor.FieldAccessorsByFieldNumber to make the test changes small - we probably want this to go away
6. Discovery for delegates is now via attributes applied to properties and the Clear method of a oneof
I'm happy with 1-3.
4 I'm unsure about - feedback welcome.
5 will go away
6 I'm unsure about, both in design and implementation. Should we have a ProtobufMessageAttribute too? Should we find all the relevant attributes in MessageDescriptor and pass them down, to avoid an O(N^2) scenario?
Generated code changes coming in the next commit.
- We do still generate the message types, as otherwise reflection breaks, even though it doesn't actually use those types.
- JSON handling hasn't been implemented yet
We don't use it in the runtime or generated code anywhere now, so the extra small performance boost isn't as critical, and it has some undesirable consequences.
The tests have needed to change as iterator block enumerators don't throw when we might expect them to.
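Python generators defer execution in much the same way as C# iterator blocks, so the effect can be sketched like this. It's an analogy rather than the actual test code, and the function names are made up: the argument check only runs when the first element is requested, not when the function is called.

```python
def positive_values(values):
    """Generator: nothing in the body runs until iteration starts."""
    if values is None:
        raise ValueError("values must not be None")
    for value in values:
        if value > 0:
            yield value


def raises_on_call(values) -> bool:
    """True if simply calling the generator function throws."""
    try:
        positive_values(values)
    except ValueError:
        return True
    return False


def raises_on_first_next(values) -> bool:
    """True if the exception only appears when we ask for the first element."""
    iterator = positive_values(values)
    try:
        next(iterator)
    except ValueError:
        return True
    except StopIteration:
        return False
    return False
```

So a test which expects an exception from the call itself has to be changed to iterate before asserting.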
This involves:
- Specifying a namespace in each proto (including ones we'd previously missed)
- Updating the generation script
- Changing codegen to implement IReflectedMessage.Fields explicitly (a good thing anyway)
- Changing reflection tests to take account of the explicit interface implementation
Non-generated code in this commit; generated code to follow
Change the C# namespace in descriptor.proto to Google.Protobuf.Reflection.
This then means changing where the generated code lives, which means updating the project file...
It also involves regenerating the C++ - which has updated the well-known types as well,
for no terribly obvious reason...
- The protos are no longer publicly exposed at all
- Oneof detection now works (as we default to -1, not 0)
- OneofDescriptor exposes the fields in the oneof
- Removed unnecessary code for replacing protos - remnant of extensions
- There's now just the non-generic form of IDescriptor
Note that now we need a proto3 version of addressbook.proto. This may affect other platforms, and could do with an overhaul to follow proto3 conventions anyway (e.g. repeated field names). Will need to think about that carefully before merging into master. Raised issue #565 for this.
- FieldAccessorTable is now non-generic
- We don't have a static field per message type in the umbrella class. (Message descriptors are accessed via the file descriptor.)
- Removed the "descriptor assigner" complication from the descriptor fixup; without extensions, we don't need it
- MapField implements IDictionary (more tests would be good...)
- RepeatedField implements IList (more tests would be good)
- Use expression trees to build accessors. (Will need to test this on various platforms... probably need a fallback strategy just using reflection directly.)
- Added FieldDescriptor.IsMap
- Added tests for reflection with generated messages
Changes to generated code coming in next commit.
- Added new line at the end of SampleEnum
- Moved GeneratedMessageTest.GetSampleMessage to a new class, SampleMessages, and renamed it to CreateFullTestAllTypes.
This is mostly just making things internal instead of public, removing and reordering a bunch of code in CodedInputStream/CodedOutputStream, and generally tidying up.
- Remove some old proto2-based C#-only messages
- Remove the "build" directory which only contained out-of-date files
- Remove the csharp_namespace option from proto2 messages
- Change "Google.ProtocolBuffers" to "Google.Protobuf" in other messages
ProtoDump isn't currently useful, but will be when ToString emits JSON: fixed.
ProtoBench: deleted; we should reinstate when there's a common proto3 benchmark.
ProtoMunge: deleted; not useful enough to merit fixing up.
Removed the [TestFixture] from ByteStringTest as Travis uses a recent enough version of NUnit.
- Change the default message hash code to 1 to be consistent with other code
- Change the empty list/map hash code to 0 as "empty map" is equivalent to "no map"
- Removed map fields from unittest_proto3.proto
- Created map_unittest_proto3.proto which is like map_unittest.proto but proto3-only
- Fixed factory methods in FieldCodec highlighted by using all field types :)
- Added tests for map serialization:
  - Extra fields within entries
  - Entries with value then key
  - Non-contiguous entries for the same map
  - Multiple entries for the same key
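What those cases are expected to do can be sketched in Python, working on already-decoded entries rather than real wire data. `merge_map_entries` is a hypothetical helper, and the defaults for a missing key or value are placeholders. I'm assuming the usual map entry layout (key is field 1, value is field 2) and that the last entry for a key wins.

```python
def merge_map_entries(entries, default_key="", default_value=0):
    """Build a dict from map entries given as lists of (field_number, value) pairs."""
    result = {}
    for entry in entries:
        key, value = default_key, default_value
        for field_number, field_value in entry:
            if field_number == 1:
                key = field_value
            elif field_number == 2:
                value = field_value
            # Any other field number within an entry is ignored.
        # A later entry for the same key replaces the earlier one.
        result[key] = value
    return result
```

Non-contiguous entries aren't modelled here: in this sketch they would just be more items in the same list, with other fields of the message filtered out beforehand.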
Changes to generated code coming in next commit
The solution as a whole doesn't build yet - we probably want to remove
ProtoDump and ProtoMunge entirely, and ProtoBench should use Jan's new
benchmarks for parity with Java.
The version of NUnit on my machine, packaged with Mono 3.12.1, is
only NUnit 2.4.2, which is extremely old - it still requires an explicit
[TestFixture] attribute on test fixtures. I've added one just for ByteStringTest
for the moment so that we can see some tests passing in Travis, but as part of
a separate PR we should work on making sure we're using a recent NUnit version.
(It may already be doing so, but we can check that once it's working and merged.)
- Make some members internal
- Remove a lot of FrameworkPortability that isn't required
- Start adding documentation comments
- Remove some more group-based members
- Not passing in "the last tag read" into Read*Array, g
We'll probably want a lot of the code from the serialization project when we do JSON, but enough of it will change that it's not worth keeping in a broken state for now.
This is effectively reimplementing List<T>, but with a few advantages:
- We know that an empty repeated field is common, so don't allocate an array until we need to
- With direct access to the array, we can easily convert enum values to int without boxing
- We can relax the restrictions over what happens if the repeated field is modified while iterating, avoiding so much checking
This is somewhat risky, in that reimplementing a building block like this is *always* risky, but hey...
(The performance benefits are significant...)
This mirrors commit 7c86bbbc7a in the pull request to
the main protobuf project, but also reduces the size of the buffer created. (There's no point in
creating a 1024-byte buffer if we're only skipping 5 bytes...)
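The buffer sizing amounts to something like the following Python sketch. `skip_bytes` and `skip_buffer_size` are hypothetical names, and I'm assuming 1024 stays as the upper limit for larger skips.

```python
import io


def skip_buffer_size(count: int, max_buffer_size: int = 1024) -> int:
    """Never allocate more than we actually need to skip."""
    return min(count, max_buffer_size)


def skip_bytes(stream, count: int, max_buffer_size: int = 1024) -> None:
    """Skip count bytes from a non-seekable stream by reading into a scratch buffer."""
    buffer = bytearray(skip_buffer_size(count, max_buffer_size))
    remaining = count
    while remaining > 0:
        # Only ask for as many bytes as are left to skip.
        view = memoryview(buffer)[: min(remaining, len(buffer))]
        read = stream.readinto(view)
        if not read:
            raise EOFError("stream ended before all bytes were skipped")
        remaining -= read
```

So `skip_bytes(io.BytesIO(data), 5)` would allocate a 5-byte buffer rather than a 1024-byte one.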
Remove ICodedInputStream and ICodedOutputStream, and rewrite CodedInputStream and CodedOutputStream to be specific to the binary format. If we want to support text-based formats, that can be a whole different serialization mechanism.
This makes repeated fields really awkward at the moment - but when we reimplement RepeatedField<T> to be backed by an array, we can cast the array directly...
Cache a reference to Encoding.UTF8 - the property access is (rather surprisingly) significant.
Additionally, when we detect that the string is all ASCII (due to the computed length in bytes being the length in characters), we can perform the encoding very efficiently ourselves.
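The ASCII detection works like this, sketched in Python with hypothetical function names. In Python the length calculation has to encode the string anyway, so there's no speed-up here. It only shows the logic: if the UTF-8 byte count equals the character count, every character is a single byte, so each one can be copied across directly.

```python
def utf8_byte_count(text: str) -> int:
    """Stand-in for a cheap byte-count call; here it simply encodes the string."""
    return len(text.encode("utf-8"))


def encode_string(text: str) -> bytes:
    """Encode to UTF-8, with a direct copy when the text is all ASCII."""
    if utf8_byte_count(text) == len(text):
        # Every character took exactly one byte, so all are below 0x80.
        return bytes(map(ord, text))
    return text.encode("utf-8")
```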
We still have some protos which aren't generated how we want them to be:
- Until we have an option to specify the "umbrella" class, DescriptorProtoFile
will be broken. (The change of name here affects the reflection descriptor,
which accounts for most of the change. That's easier than trying to work out
exactly which occurrences of Descriptor need changing though.)
- That change affects UnittestCustomOptions
- Issue #307 breaks Unittest.cs
After this commit, we don't have the record of the fixups in the files themselves
any more, but one centralized record in the shell script.
To my surprise, executing generate_protos.sh used the version of Bash installed with Git for Windows by default.
After a few modifications to detect the most appropriate protoc to use, this worked pretty simply.
This change also:
- adds generation of the address book tutorial proto,
- fixes the addressbook.proto to specify proto2 explicitly (to avoid a warning from protoc; I don't think we want warnings...)
- fixes the addressbook.proto C# namespace (which I thought I'd done before, but apparently hadn't)
- includes the regenerated UnittestCustomOptions.cs apart from the DescriptorProtoFile => Descriptor change
This is the start of establishing a C# namespace of "Google.ProtocolBuffers.TestProtos.Proto3" for proto3-syntax protos.
We could optionally split the directory structure as well into Proto2 and Proto3 for clarity.
This includes the NUnit test adapter which allows NUnit tests to be run under VS without any extra plugins.
Unfortunately the compatibility tests using the abstract test fixture class show up as "external" tests, and aren't well presented - but they do run.
All referring projects are now .NET 4 client rather than .NET 3.5.
This commit also fixes up the ProtoBench app, which I'd neglected in previous commits. (Disentangling the two sets of changes would be time-consuming.)
Move to a single solution file containing all of the C# projects, but no other solution folders - it's easier to edit those files outside VS than keep adding and removing them from the project.
The AddressBook protos have been regenerated (with a change to the example proto which I haven't included in this change - I'll wait for us to decide exactly what we're doing with namespaces before changing protos outside the csharp directory).
Note that now we've got Addressbook.cs which contains AddressBook and Addressbook classes. It's bad enough that we've got a class called AddressBook within a namespace of AddressBook (hard to get away from) but having things vary just by case is nasty.
This is more evidence that an option for renaming the file and descriptor class would be welcome. (A single option can probably handle both.)
This could potentially be added back in later, but its use is limited and it's a pain in terms of support in PCL environments.
One use that has been highlighted is passing objects between AppDomains; we'd recommend passing a byte array explicitly and reparsing on the other side.
1) Project files for different configurations - we're going to look at all this again, ideally to just have a single PCL-compatible build
2) ProtoGen - the C++ generator is now the only one we care about
3) Proto files - these are mostly duplicates (or older versions) of the ones in the common directories
(Having regenerated descriptor.proto relative to src, the earlier commented-out code checking that dependencies match may now be okay to uncomment again. Will experiment in later CLs.)
This commit includes changes to the C#-specific protos, and rebuilt versions of the "stock" protos.
The stock protos have been locally updated to have a specific C# namespace, but this is expected to change soon, so hasn't been committed.
Four areas are currently not tested:
1) Serialization - we may restore this at some point, possibly optionally.
2) Services - currently nothing is generated for this; will need to see how it interacts with GRPC
3) Fields beginning with _{digit} - see https://github.com/google/protobuf/issues/308
4) Fields with names which conflict with the declaring type in nasty ways - see https://github.com/google/protobuf/issues/309
1) Remove CSharpOptions
2) A new version of DescriptorProtoFile (with manual changes from codegen - it would otherwise be Descriptor.cs)
3) Turn off CLS compliance (which we'll remove from the codebase entirely; I don't think it's actually relevant these days)
4) Add "public imports" to FileDescriptor, with code broadly copied from the Java codebase.
Lots more changes to commit before it will build and tests run, but one step at a time...