# How Protobuf supports multiple C++ build systems

This document explains how the Protobuf project supports multiple C++ build
systems.

## Background

Protobuf primarily uses [Bazel](https://bazel.build) to build the Protobuf C++
runtime and Protobuf compiler[^historical_sot]. However, there are several
different build systems in common use for C++, each of which requires
essentially a complete copy of the same build definitions.

[^historical_sot]:
    On a historical note, prior to its [release as Open Source
    Software](https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html),
    the Protobuf project was developed using Google's internal build system, which
    was the predecessor to Bazel (the vast majority of Google's contributions
    continue to be developed this way). The Open Source Protobuf project, however,
    historically used Autoconf to build the C++ implementation.
    Over time, other build systems (including Bazel) have been added, thanks in
    large part to substantial contributions from the Open Source community. Since
    the Protobuf project deals with multiple languages (all of which ultimately
    rely upon C++, for the Protobuf compiler), Bazel is a natural choice for a
    project-wide build system. In fact, Bazel (and its predecessor, Blaze)
    was designed in large part to support exactly this type of rich,
    multi-language build.

Currently, C++ Protobuf can be built with Bazel, Autotools, and CMake. Each of
these build systems has different semantics and structure, but they all share
the list of files needed to build the runtime and compiler.

## Design

### Extracting information from Bazel

Bazel's Starlark API provides [aspects](https://bazel.build/rules/aspects) to
traverse the build graph, inspect build rules, define additional actions, and
expose information through
[providers](https://bazel.build/rules/rules#providers). For example, the
`cc_proto_library` rule uses an aspect to traverse the dependency graph of
`proto_library` rules, and dynamically attaches actions to generate C++ code
using the Protobuf compiler and compile it using the C++ compiler.

In order to support multiple build systems, the overall build structure is
defined once for each system, and frequently changing metadata is exposed
from Bazel in a form that each build definition can include. Primarily,
this means exposing the list of source files.

Two aspects are used to extract this information from the Bazel build
definitions:

* `cc_file_list_aspect` extracts `srcs`, `hdrs`, and `textual_hdrs` from build
  rules like `cc_library`. The sources are exposed through a provider named
  `CcFileList`.
* `proto_file_list_aspect` extracts the `srcs` from a `proto_library`, and
  also computes the names of the files that the Protobuf compiler would
  generate from them. This information is exposed through a provider named
  `ProtoFileList`.

On their own, these aspects have limited utility. However, they can be
instantiated by custom rules, so that an ordinary `BUILD.bazel` target can
produce outputs based on the information gleaned from these aspects.

### (Aside) Distribution libraries

Bazel's native `cc_library` rule is typically used on a "fine-grained" level, so
that, for example, lightweight unit tests can be written with narrow scope.
Although Bazel does build library artifacts (such as `.so` and `.a` files on
Linux), each artifact corresponds to an individual `cc_library` rule.

Since the entire "Protobuf library" includes many constituent `cc_library`
rules, a special rule, `cc_dist_library`, combines several fine-grained
libraries into a single, monolithic library.

For the Protobuf project, these "distribution libraries" are intended to match
the granularity of the Autotools- and CMake-based builds. Since the Bazel-built
distribution library covers the rules with the source files needed by other
builds, the `cc_dist_library` rule invokes the `cc_file_list_aspect` on its
input libraries. The result is that a `cc_dist_library` rule not only produces
composite library artifacts, but also collects and provides the list of sources
that were inputs.

For example:

```
$ cat cc_dist_library_example/BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_library")
load("//pkg:cc_dist_library.bzl", "cc_dist_library")

cc_library(
    name = "a",
    srcs = ["a.cc"],
)

cc_library(
    name = "b",
    srcs = ["b.cc"],
    deps = [":c"],
)

# N.B.: not part of the cc_dist_library, even though it is in the deps of 'b':
cc_library(
    name = "c",
    srcs = ["c.cc"],
)

cc_dist_library(
    name = "lib",
    deps = [
        ":a",
        ":b",
    ],
    visibility = ["//visibility:public"],
)

# Note: the output below has been formatted for clarity:
$ bazel cquery //cc_dist_library_example:lib \
    --output=starlark \
    --starlark:expr='providers(target)["//pkg:cc_dist_library.bzl%CcFileList"]'
struct(
    hdrs = depset([]),
    internal_hdrs = depset([]),
    srcs = depset([
        <source file cc_dist_library_example/a.cc>,
        <source file cc_dist_library_example/b.cc>,
    ]),
    textual_hdrs = depset([]),
)
```

The upshot is that the "coarse-grained" library can be defined by the Bazel
build, which then exports the list of source files needed to reproduce the
library in a different build system.

One major difference from most Bazel rule types is that the file list aspects do
not propagate. In other words, they only expose the sources of immediate
dependencies, not transitive sources. This is for two reasons:

1. Immediate dependencies are conceptually simple, while transitivity requires
   substantially more thought. For example, if transitive dependencies were
   considered, then some way would be needed to exclude dependencies that
   should not be part of the final library (for example, a distribution library
   for `//:protobuf` could be defined not to include all of
   `//:protobuf_lite`). While dependency elision is an interesting design
   problem, the protobuf library is small enough that directly listing
   dependencies should not be problematic.
2. Dealing only with immediate dependencies gives finer-grained control over
   what goes into the composite library. For example, a Starlark `select()`
   could conditionally add fine-grained libraries to some builds, but not
   others.
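
The non-transitive behavior can be modeled outside Bazel. The sketch below is written in Python (whose syntax Starlark closely follows) and mirrors the `:a`, `:b`, `:c` example above. `Library` and `collect_direct_srcs` are hypothetical names invented for this illustration; they are not part of Bazel or of `//pkg:cc_dist_library.bzl`.

```python
# Hypothetical model of the "immediate dependencies only" behavior.
# Library and collect_direct_srcs are illustrative names, not real Bazel APIs.
from dataclasses import dataclass, field


@dataclass
class Library:
    """Stand-in for a fine-grained cc_library target."""
    name: str
    srcs: list
    deps: list = field(default_factory=list)


def collect_direct_srcs(dist_deps):
    """Collect sources from the listed libraries only; their deps are ignored."""
    srcs = []
    for lib in dist_deps:
        for src in lib.srcs:
            if src not in srcs:  # keep the first occurrence, preserve order
                srcs.append(src)
    return srcs


c = Library("c", ["c.cc"])
a = Library("a", ["a.cc"])
b = Library("b", ["b.cc"], deps=[c])

# Mirrors cc_dist_library(name = "lib", deps = [":a", ":b"]).
lib_srcs = collect_direct_srcs([a, b])
print(lib_srcs)  # c.cc is absent because ":c" was not listed directly
```

Under this model, including `c.cc` means listing `c` explicitly (`collect_direct_srcs([a, b, c])`). That explicit listing is the kind of control the second reason describes.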

Another subtlety, specific to tests, is due to Bazel internals. Bazel evaluates
`cc_test` rules under a slightly different configuration than
`cc_dist_library` rules. If `cc_test` targets are included in a
`cc_dist_library` rule, and both are evaluated by Bazel, this can result in a
build-time error: the configuration used for the test contains additional
options (telling Bazel how to execute the test) that the configuration used by
`cc_file_list_aspect` does not. Bazel detects this as two conflicting actions
generating the same outputs. (For `cc_test` rules, the simplest workaround is
to provide sources through a `filegroup` or similar.)

### File list generation

Lists of input files are generated by Bazel in a format that can be imported
into other build systems. Currently, Automake- and CMake-style files can be
generated.

The lists of files are derived from Bazel build targets. The sources can be:

* `cc_dist_library` rules (as described above)
* `proto_library` rules
* individual files
* `filegroup` rules
* `pkg_files` or `pkg_filegroup` rules from
  https://github.com/bazelbuild/rules_pkg

For example:

```
$ cat gen_file_lists_example/BUILD.bazel
load("@rules_proto//proto:defs.bzl", "proto_library")
load("//pkg:build_systems.bzl", "gen_cmake_file_lists")

filegroup(
    name = "doc_files",
    srcs = [
        "README.md",
        "englilsh_paper.md",
    ],
)

proto_library(
    name = "message",
    srcs = ["message.proto"],
)

gen_cmake_file_lists(
    name = "source_lists",
    out = "source_lists.cmake",
    src_libs = {
        ":doc_files": "docs",
        ":message": "buff",
        "//cc_dist_library_example:lib": "distlib",
    },
)

$ bazel build gen_file_lists_example:source_lists
$ cat bazel-bin/gen_file_lists_example/source_lists.cmake
# Auto-generated by //gen_file_lists_example:source_lists
#
# This file contains lists of sources based on Bazel rules. It should
# be included from a hand-written CMake file that defines targets.
#
# Changes to this file will be overwritten based on Bazel definitions.

if(${CMAKE_VERSION} VERSION_GREATER 3.10 OR ${CMAKE_VERSION} VERSION_EQUAL 3.10)
  include_guard()
endif()

# //gen_file_lists_example:doc_files
set(docs_files
  gen_file_lists_example/README.md
  gen_file_lists_example/englilsh_paper.md
)

# //gen_file_lists_example:message
set(buff_proto_srcs
  gen_file_lists_example/message.proto
)

# //gen_file_lists_example:message
set(buff_srcs
  gen_file_lists_example/message.proto.pb.cc
)

# //gen_file_lists_example:message
set(buff_hdrs
  gen_file_lists_example/message.proto.pb.h
)

# //gen_file_lists_example:message
set(buff_files
  gen_file_lists_example/message-descriptor-set.proto.bin
)

# //cc_dist_library_example:lib
set(distlib_srcs
  cc_dist_library_example/a.cc
  cc_dist_library_example/b.cc
)

# //cc_dist_library_example:lib
set(distlib_hdrs

)
```

A hand-written CMake build rule could then use the generated file to define
libraries, such as:

```
include(source_lists.cmake)
add_library(distlib ${distlib_srcs} ${buff_srcs})
```

In addition to `gen_cmake_file_lists`, there is also a `gen_automake_file_lists`
rule. The two rules share most of their implementation, but define
different file headers and different Starlark "fragment generator" functions,
which format the generated list variables.
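
The split between shared logic and per-build-system formatting can be sketched as follows. The sketch is in Python (whose syntax Starlark closely follows), and it rests on assumptions: the function names, the header strings, the `//example:lib` label, and in particular the Automake output layout are hypothetical, chosen for illustration only. It is not the code in `//pkg:build_systems.bzl`.

```python
# Hypothetical sketch: function names, headers, and the Automake layout are
# assumptions for illustration, not the real //pkg:build_systems.bzl code.

def cmake_fragment(label, var_name, files):
    """Format one list variable as a CMake set() command."""
    lines = ["# " + label, "set(" + var_name]
    lines += ["  " + f for f in files]
    lines.append(")")
    return "\n".join(lines) + "\n"


def automake_fragment(label, var_name, files):
    """Format the same list as a Make variable with line continuations."""
    if not files:
        return "# {}\n{} =\n".format(label, var_name)
    lines = ["# " + label, var_name + " = \\"]
    lines += ["  {} \\".format(f) for f in files[:-1]]
    lines.append("  " + files[-1])  # last entry has no continuation
    return "\n".join(lines) + "\n"


def generate_file_lists(header, fragment_generator, lists):
    """Shared driver: only the header and the fragment generator differ."""
    parts = [header]
    for label, var_name, files in lists:
        parts.append(fragment_generator(label, var_name, files))
    return "\n".join(parts)


lists = [("//example:lib", "distlib_srcs", ["example/a.cc", "example/b.cc"])]
cmake_text = generate_file_lists("# Auto-generated (CMake)\n", cmake_fragment, lists)
automake_text = generate_file_lists("# Auto-generated (Automake)\n", automake_fragment, lists)
print(cmake_text)
print(automake_text)
```

Passing the formatter as a parameter keeps the traversal and ordering logic in one place, so supporting another build system would mostly mean writing one more fragment function.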

### Protobuf usage

The main C++ runtimes (lite and full) and the Protobuf compiler use their
corresponding `cc_dist_library` rules to generate file lists. For
`proto_library` targets, the file list generation can extract the source files
directly. For other targets, notably `cc_test` targets, the file list generators
use `filegroup` rules.

In general, adding new targets to a non-Bazel build system in Protobuf (or
adding a new build system altogether) requires some one-time setup:

1. The overall structure of the new build system has to be defined. It should
   import lists of files and refer to them by variable, instead of listing
   files directly.
2. (Only if the build system is new) A new rule type has to be added to
   `//pkg:build_systems.bzl`. Most of the implementation is shared, but a
   "fragment generator" is needed to declare a file list variable, and the rule
   type itself has to be defined and call the shared implementation.

When files are added or deleted, or when the Protobuf Bazel structure is
changed, these changes may need to be reflected in the file list logic. These
are some example scenarios:

* Files are added to (or removed from) the `srcs` of an existing `cc_library`:
  no changes needed. If the `cc_library` is already part of a
  `cc_dist_library`, then regenerating the source lists will reflect the
  change.
* A `cc_library` is added: the new target may need to be added to the Protobuf
  `cc_dist_library` targets, as appropriate.
* A `cc_library` is deleted: if a `cc_dist_library` depends upon the deleted
  target, then a build-time error will result. The library needs to be removed
  from the `cc_dist_library`.
* A `cc_test` is added or deleted: test sources are handled by `filegroup`
  rules defined in the same package as the `cc_test` rule. The `filegroup`s
  are usually given a name like `"test_srcs"`, and often use `glob()` to find
  sources. This means that adding or removing a test may not require any extra
  work, but this can be verified within the same package as the test rule.
* Test-only proto files are added: the `proto_library` might need to be added
  to the file list map in `//pkg:BUILD.bazel`, and then the file added to
  various build systems. However, most test-only protos are already exposed
  through libraries like `//src/google/protobuf:test_protos`.

If there are changes, then the regenerated file lists need to be copied back
into the repo. That way, the corresponding build systems can be used with a git
checkout, without needing to run Bazel first.
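
Because the checked-in lists are copies, they can drift from what Bazel would generate. This document does not say how that is detected, so the following Python sketch is purely hypothetical: it assumes both versions are available as text and uses the standard `difflib` module to report differences. The function name and the `file_lists.cmake` file name are invented for illustration.

```python
# Hypothetical staleness check; not part of the Protobuf repository.
import difflib


def stale_file_list_diff(checked_in_text, generated_text, name="file_lists.cmake"):
    """Return a unified diff; an empty string means the checked-in copy is current."""
    diff = difflib.unified_diff(
        checked_in_text.splitlines(keepends=True),
        generated_text.splitlines(keepends=True),
        fromfile="checked-in/" + name,
        tofile="generated/" + name,
    )
    return "".join(diff)


checked_in = "set(distlib_srcs\n  a.cc\n)\n"
generated = "set(distlib_srcs\n  a.cc\n  b.cc\n)\n"
print(stale_file_list_diff(checked_in, generated) or "up to date")
```

A non-empty result would indicate that the regenerated list should be copied back into the repo.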

### (Aside) Distribution archives

A very similar set of rules is defined in `//pkg` to build source distribution
archives for releases. In addition to the full sources, Protobuf releases also
include source archives sliced by language, so that, for example, a Ruby-based
project can get just the sources needed to build the Ruby runtime. (The
per-language slices also include sources needed to build the protobuf compiler,
so they all effectively include the C++ runtime.)

These archives are defined using rules from the
[rules_pkg](https://github.com/bazelbuild/rules_pkg) project. Although they are
similar to `cc_dist_library` and the file list generation rules, the goals are
different: the build system file lists described above only apply to C++, and
are organized according to what should or should not be included in different
parts of the build (e.g., no tests are included in the main library). On the
other hand, the distribution archives deal with languages other than C++, and
contain all the files that need to be distributed as part of a release (even for
C++, this is more than just the C++ sources).

While it might be possible to use information from the `CcFileList` and
`ProtoFileList` providers to define the distribution files, additional files
(such as the various `BUILD.bazel` files) are also needed in the distribution
archive. The lists of distribution files can usually be generated by `glob()`,
anyhow, so sharing logic with the file list aspects may not be beneficial.

Currently, all of the file lists are checked in. However, it would be possible
to build the file lists on-the-fly and include them in the distribution
archives, rather than checking them in.