Text Formatting

2016-08-19

Victor Zverovich, victor.zverovich@gmail.com

Introduction
Design
    Format String Syntax
    Extensibility
    Safety
    Locale Support
    Positional Arguments
    Performance
    Binary Footprint
Proposed Wording
Implementation
References

Introduction

This paper proposes a new text formatting facility that can be used as a safe and extensible alternative to the printf family of functions. It is intended to complement the existing C++ I/O streams library and to reuse some of its infrastructure, such as overloaded insertion operators for user-defined types.

Example:

std::string message = std::format("The answer is {}.", 42);

Design

Format String Syntax

Variations of the printf format string syntax are arguably the most popular across programming languages, and C++ itself inherits printf from C [1]. The advantage of the printf syntax is that many programmers are familiar with it. However, in its current form it has a number of issues:

Although it is possible to address these issues, doing so would break compatibility and could be more confusing to users than introducing a different syntax.

Therefore we propose a new syntax based on the ones used in Python [3], the .NET family of languages [4], and Rust [5]. This syntax employs '{' and '}' as replacement field delimiters instead of '%' and is described in detail in TODO:link. Here are some of the advantages:

The syntax is expressive enough to enable translation, possibly automated, of most printf format strings. The correspondence between printf and the new syntax is given in the following table.

printf    new
-         <
+         +
space     space
#         #
0         0
hh        unused
h         unused
l         unused
ll        unused
j         unused
z         unused
t         unused
L         unused
c         c (optional)
s         s (optional)
d         d (optional)
i         d (optional)
o         o
x         x
X         X
u         d (optional)
f         f
F         F
e         e
E         E
a         a
A         A
g         g (optional)
G         G
n         unused
p         p (optional)

Width and precision are represented similarly in printf and the proposed syntax, the only difference being that a runtime value is specified by * in the former and by {} in the latter, possibly with the index of the argument inside the braces.
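To make the printf side of this comparison concrete, the sketch below passes a field width at run time through '*' using std::snprintf. The helper name pad_printf is hypothetical and not part of the proposal. Going by the grammar in Proposed Wording, the corresponding format string in the proposed syntax would presumably be "{0:{1}}", with the value as argument 0 and the width as argument 1.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the proposal): right-aligns `value` in a
// field whose width is only known at run time, using printf's '*' width.
std::string pad_printf(int value, int width) {
  char buffer[64];
  // '*' tells printf to read the field width from the next int argument.
  int n = std::snprintf(buffer, sizeof buffer, "%*d", width, value);
  if (n < 0) return std::string();           // encoding error
  if (n >= static_cast<int>(sizeof buffer))  // output was truncated
    n = static_cast<int>(sizeof buffer) - 1;
  return std::string(buffer, static_cast<std::size_t>(n));
}
```

For instance, pad_printf(42, 6) yields the value preceded by four spaces.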

As can be seen from the table above, most of the specifiers remain the same, which simplifies migration from printf. A notable difference is in the alignment specification. The proposed syntax allows left, center, and right alignment, represented by '<', '^', and '>' respectively, which is more expressive than the corresponding printf syntax. The latter only supports left and right (the default) alignment.
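The automated translation mentioned above could look roughly like the following sketch, which applies the table to simple printf format strings. The function name translate_printf is hypothetical. The sketch rests on several assumptions that this paper does not settle: runtime width and precision ('*'), the 'n' conversion, and literal braces in the input are not handled, optional type specifiers are always written out, and default alignment is assumed to behave as in printf.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: rewrites a printf format string in the proposed
// syntax, following the correspondence table. Not part of the proposal.
std::string translate_printf(const std::string& fmt) {
  std::string out;
  std::size_t i = 0;
  while (i < fmt.size()) {
    if (fmt[i] != '%') { out += fmt[i++]; continue; }
    ++i;  // skip '%'
    if (i < fmt.size() && fmt[i] == '%') { out += '%'; ++i; continue; }

    // printf accepts flags in any order; collect them before emitting.
    bool left = false, plus = false, space = false, alt = false, zero = false;
    for (; i < fmt.size(); ++i) {
      char c = fmt[i];
      if (c == '-') left = true;
      else if (c == '+') plus = true;
      else if (c == ' ') space = true;
      else if (c == '#') alt = true;
      else if (c == '0') zero = true;
      else break;
    }

    // Width and precision digits are copied unchanged.
    std::string width, precision;
    while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9') width += fmt[i++];
    if (i < fmt.size() && fmt[i] == '.') {
      precision = ".";
      ++i;
      while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9')
        precision += fmt[i++];
      if (precision == ".") precision = ".0";  // "%.f" means precision 0
    }

    // Length modifiers (hh, h, l, ll, j, z, t, L) are unused in the new syntax.
    while (i < fmt.size() && std::string("hljztL").find(fmt[i]) != std::string::npos)
      ++i;

    if (i == fmt.size()) throw std::invalid_argument("incomplete conversion");
    char type = fmt[i++];
    if (std::string("csdiouxXfFeEaAgGp").find(type) == std::string::npos)
      throw std::invalid_argument("unsupported conversion");
    if (type == 'i' || type == 'u') type = 'd';  // both map to 'd'

    // Emit in the order required by format-spec:
    // [align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
    out += "{:";
    if (left) out += '<';
    if (plus) out += '+';         // '+' overrides ' ' in printf
    else if (space) out += ' ';
    if (alt) out += '#';
    if (zero && !left) out += '0';  // '-' overrides '0' in printf
    out += width;
    out += precision;
    out += type;
    out += '}';
  }
  return out;
}
```

With these assumptions, "%-10s|%08.3f|%u%%" becomes "{:<10s}|{:08.3f}|{:d}%".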

The following example uses center alignment and '*' as a fill character:

std::format("{:*^30}", "centered");

resulting in "***********centered***********". The same formatting cannot be easily achieved with printf.
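For comparison, here is a minimal sketch of the padding computation that center alignment implies. The function name center is hypothetical, and placing any odd leftover fill character on the right is an assumption borrowed from Python's behavior rather than something this paper specifies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: centers `text` in a field of `width` characters,
// padding with `fill`. Text wider than the field is returned unchanged.
std::string center(const std::string& text, std::size_t width, char fill) {
  if (text.size() >= width) return text;
  std::size_t padding = width - text.size();
  std::size_t left = padding / 2;       // rounded down
  std::size_t right = padding - left;   // gets the extra character, if any
  return std::string(left, fill) + text + std::string(right, fill);
}
```

Calling center("centered", 30, '*') reproduces the string shown above: 22 fill characters split evenly around the 8-character argument.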

Extensibility

Both the format string syntax and the API are designed with extensibility in mind. The mini-language can be extended for user-defined types and users can provide functions that do parsing and formatting for such types.

The general syntax of a replacement field in a format string is

replacement-field ::=  '{' [arg-id] [':' format-spec] '}'

where format-spec is predefined for built-in types, but can be customized for user-defined types. For example, the syntax can be extended for put_time-like date and time formatting

std::time_t t = std::time(nullptr);
std::string date = std::format("The date is {0:%Y-%m-%d}.", *std::localtime(&t));

by providing an overload of std::format_arg for std::tm:

TODO: example
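The std::format_arg overload itself is left open above and is not reconstructed here. As a rough illustration of what such an extension would have to do with the format-spec it receives, the following sketch forwards the text after ':' to std::strftime. The helper names format_tm and make_date are hypothetical.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: formats `time` according to `spec`, the text that
// follows ':' in a replacement field such as {0:%Y-%m-%d}.
std::string format_tm(const std::string& spec, const std::tm& time) {
  char buffer[128];
  // strftime returns the number of characters written, or 0 if the result
  // does not fit in the buffer.
  std::size_t n = std::strftime(buffer, sizeof buffer, spec.c_str(), &time);
  return std::string(buffer, n);
}

// Hypothetical helper for building a calendar date without consulting the
// system clock.
std::tm make_date(int year, int month, int day) {
  std::tm time = std::tm();    // zero-initialize all fields
  time.tm_year = year - 1900;  // years since 1900
  time.tm_mon = month - 1;     // months since January
  time.tm_mday = day;
  return time;
}
```

Under this sketch, format_tm("%Y-%m-%d", make_date(2016, 8, 19)) produces "2016-08-19".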

Safety

Formatting functions rely on variadic templates instead of the mechanism provided by <cstdarg>. The type information is captured automatically and passed to formatters, guaranteeing type safety and making many of the printf specifiers redundant (see Format String Syntax). Buffer management is also automatic, which prevents the buffer overflow errors common with printf.
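The role of variadic templates can be illustrated with a deliberately small sketch. The name mini_format is hypothetical, only the empty replacement field {} is recognized, and output goes through operator<< on a string stream, which is an assumption made for brevity rather than the proposed implementation strategy.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left, so the rest of the format string is copied
// verbatim (including any unmatched "{}").
inline void mini_format_impl(std::ostringstream& out, const char* fmt) {
  out << fmt;
}

// Recursive case: the static type of `value` is known here, so the right
// operator<< is chosen at compile time without any type specifier.
template <class T, class... Rest>
void mini_format_impl(std::ostringstream& out, const char* fmt,
                      const T& value, const Rest&... rest) {
  for (; *fmt; ++fmt) {
    if (fmt[0] == '{' && fmt[1] == '}') {
      out << value;
      mini_format_impl(out, fmt + 2, rest...);
      return;
    }
    out << *fmt;
  }
  // Surplus arguments are silently ignored in this sketch.
}

// Hypothetical entry point: the result grows as needed inside the stream,
// so the caller never supplies a fixed-size buffer.
template <class... Args>
std::string mini_format(const char* fmt, const Args&... args) {
  std::ostringstream out;
  mini_format_impl(out, fmt, args...);
  return out.str();
}
```

Passing an int, a double, and a std::string in one call works without %d, %f, or %s, because each argument carries its own type.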

Locale Support

As pointed out in P0067R1: Elementary string conversions, there are a number of use cases that do not require internationalization support but do require high throughput, for example when output is produced by a server. These include various text-based interchange formats such as JSON or XML. The need for locale-independent functions for conversions between integers and strings and between floating-point numbers and strings has also been highlighted in N4412: Shortcomings of iostreams. Therefore a user should be able to easily control whether or not locales are used during formatting.

We follow Python's approach [3] and designate a separate format specifier 'n' for locale-aware numeric formatting. It applies to all integral and floating-point types. All other specifiers produce output unaffected by locale settings. This can also have a positive effect on performance because locale-independent formatting can be implemented more efficiently.
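The distinction between the two modes can be sketched with existing iostreams facilities. The helpers below are hypothetical and only mimic the intended split: one path always uses the classic "C" locale, the other uses a caller-supplied locale, as 'n' would. The comma_grouping facet stands in for a real user locale so that the example does not depend on which locales are installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a user's locale: groups digits in
// threes separated by ','.
struct comma_grouping : std::numpunct<char> {
 protected:
  char do_thousands_sep() const override { return ','; }
  std::string do_grouping() const override { return "\3"; }
};

// Builds a locale that differs from the classic one only in digit grouping.
// The locale takes ownership of the facet.
std::locale grouped_locale() {
  return std::locale(std::locale::classic(), new comma_grouping);
}

// Locale-independent path: output never depends on global locale settings.
std::string to_string_classic(long value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}

// Locale-aware path, analogous to what the 'n' specifier requests.
std::string to_string_localized(long value, const std::locale& loc) {
  std::ostringstream out;
  out.imbue(loc);
  out << value;
  return out.str();
}
```

With these helpers, 1234567 is rendered as "1234567" on the first path and as "1,234,567" on the second when grouped_locale() is supplied.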

Positional Arguments

An important feature for localization is the ability to rearrange formatting arguments because the word order may vary in different languages [7]. For example:

printf("String `%s' has %d characters\n", string, length(string));

A possible German translation of the format string might be:

"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"

using POSIX positional arguments [2]. Unfortunately these positional specifiers are not portable [6]. C++ I/O streams do not support positional arguments by design because formatting arguments are interleaved with portions of the literal string:

std::cout << "String `" << string << "' has " << length(string) << " characters\n";

The current proposal allows both positional and automatically numbered arguments, for example:

std::format("String `{}' has {} characters\n", string, length(string));

with the German translation of the format string:

"{1} Zeichen lang ist die Zeichenkette `{0}'\n"
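How explicit indices and automatic numbering select arguments can be sketched as follows. The function name substitute is hypothetical, and for simplicity the arguments are assumed to be already converted to strings. Format specs after ':' are not handled, and the sketch does not address mixing numbered and unnumbered fields, which this paper does not discuss.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: replaces each {} or {N} in `fmt` with the matching
// entry of `args`. Empty fields consume arguments in order, starting at 0.
std::string substitute(const std::string& fmt,
                       const std::vector<std::string>& args) {
  std::string out;
  std::size_t next_auto = 0;  // index used by the next "{}" field
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '{') { out += fmt[i]; continue; }
    std::size_t close = fmt.find('}', i);
    if (close == std::string::npos)
      throw std::invalid_argument("unmatched '{'");
    std::size_t index = 0;
    if (close == i + 1) {
      index = next_auto++;  // automatic numbering
    } else {
      for (std::size_t j = i + 1; j < close; ++j) {  // explicit arg-id
        if (fmt[j] < '0' || fmt[j] > '9')
          throw std::invalid_argument("invalid argument index");
        index = index * 10 + static_cast<std::size_t>(fmt[j] - '0');
      }
    }
    if (index >= args.size())
      throw std::out_of_range("argument index out of range");
    out += args[index];
    i = close;  // continue after '}'
  }
  return out;
}
```

The same argument list then serves both the English and the German format string; only the format string changes between translations.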

Performance

TODO

Binary Footprint

TODO

Proposed Wording

The header <format> defines the overloaded function template format, which formats its arguments and returns the result as a string. TODO: rephrase and mention format_args

Header <format> synopsis

namespace std {
  class format_args;

  template <class Char>
  basic_string<Char> format(const Char *fmt, format_args args);

  template <class Char, class ...Args>
  basic_string<Char> format(const Char *fmt, const Args&... args);
}

Format string syntax

replacement-field ::=  '{' [arg-id] [':' format-spec] '}'
arg-id            ::=  integer
integer           ::=  digit+
digit             ::=  '0'...'9'
format-spec       ::=  [[fill] align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
fill              ::=  <a character other than '{' or '}'>
align             ::=  '<' | '>' | '=' | '^'
sign              ::=  '+' | '-' | ' '
width             ::=  integer | '{' arg-id '}'
precision         ::=  integer | '{' arg-id '}'
type              ::=  int-type | 'a' | 'A' | 'c' | 'e' | 'E' | 'f' | 'F' | 'g' | 'G' | 'p' | 's'
int-type          ::=  'b' | 'B' | 'd' | 'o' | 'x' | 'X'
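To show how the format-spec production reads in practice, the sketch below parses the part of a replacement field that follows ':'. The names format_spec and parse_format_spec are hypothetical. The sketch assumes that width and precision are literal integers, so the '{' arg-id '}' alternatives are not handled, and it reports errors through a flag rather than through whatever mechanism the final wording would specify.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; '\0' and -1 mean "not specified".
struct format_spec {
  char fill = ' ';
  char align = '\0';
  char sign = '\0';
  bool alt = false;   // '#'
  bool zero = false;  // '0'
  int width = -1;
  int precision = -1;
  char type = '\0';
  bool valid = true;
};

inline bool is_align(char c) { return c == '<' || c == '>' || c == '=' || c == '^'; }
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Reads digit+ starting at `pos` and advances `pos` past the digits.
inline int parse_int(const std::string& s, std::size_t& pos) {
  int value = 0;
  while (pos < s.size() && is_digit(s[pos])) value = value * 10 + (s[pos++] - '0');
  return value;
}

format_spec parse_format_spec(const std::string& s) {
  format_spec spec;
  std::size_t pos = 0;
  // [[fill] align]: a fill character is only present if an align follows it.
  if (s.size() >= 2 && is_align(s[1]) && s[0] != '{' && s[0] != '}') {
    spec.fill = s[0];
    spec.align = s[1];
    pos = 2;
  } else if (!s.empty() && is_align(s[0])) {
    spec.align = s[0];
    pos = 1;
  }
  // [sign] ['#'] ['0']
  if (pos < s.size() && (s[pos] == '+' || s[pos] == '-' || s[pos] == ' '))
    spec.sign = s[pos++];
  if (pos < s.size() && s[pos] == '#') { spec.alt = true; ++pos; }
  if (pos < s.size() && s[pos] == '0') { spec.zero = true; ++pos; }
  // [width] ['.' precision]
  if (pos < s.size() && is_digit(s[pos])) spec.width = parse_int(s, pos);
  if (pos < s.size() && s[pos] == '.') {
    ++pos;
    if (pos < s.size() && is_digit(s[pos])) spec.precision = parse_int(s, pos);
    else spec.valid = false;  // '.' must be followed by an integer
  }
  // [type]: int-type plus the remaining presentation types.
  if (pos < s.size() &&
      std::string("aAbBcdeEfFgGopsxX").find(s[pos]) != std::string::npos)
    spec.type = s[pos++];
  if (pos != s.size()) spec.valid = false;  // trailing characters
  return spec;
}
```

Parsing "*^30" gives fill '*', align '^', and width 30, while "+08.3f" gives sign '+', the zero flag, width 8, precision 3, and type 'f'.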

Implementation

The ideas proposed in this paper have been implemented in the open-source fmt library. TODO: link and mention other implementations (Boost Format, FastFormat)

References

[1] The fprintf function. ISO/IEC 9899:2011. 7.21.6.1.
[2] fprintf, printf, snprintf, sprintf - print formatted output. The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition.
[3] 6.1.3. Format String Syntax. Python 3.5.2 documentation.
[4] String.Format Method. .NET Framework Class Library.
[5] Module std::fmt. The Rust Standard Library.
[6] Format Specification Syntax: printf and wprintf Functions. C++ Language and Standard Libraries.
[7] 10.4.2 Rearranging printf Arguments. The GNU Awk User's Guide.