This page describes the user-level view of the FLAC format (for a more detailed explanation see the <A HREF="format.html">format page</A>). It also contains the user documentation for <B><TT>flac</TT></B>, which is the command-line file encoder/decoder, <B><TT>metaflac</TT></B>, the FLAC metadata editor, and the <A HREF="#plugins">input plugins</A>.
Keep in mind that the online version of this document will always apply to the latest release. For older releases, check the documentation included with the release package.
<B><TT>flac</TT></B> has been tuned so that the default options yield a good speed vs. compression tradeoff for many kinds of input. However, if you are looking to maximize the compression rate or speed, or want to use the full power of FLAC's metadata system, this section is for you. If not, just skip to the <A HREF="#flac">next section</A>.
The first four bytes identify the FLAC stream. The metadata that follows contains all the information about the stream except for the audio data itself. After the metadata comes the encoded audio data.
</P>
<P>
<B>METADATA</B>
</P>
<P>
FLAC defines several types of metadata blocks (see the <A HREF="format.html">format</A> page for the complete list). Metadata blocks can be any length and new ones can be defined. A decoder is allowed to skip any metadata types it does not understand. Only one is mandatory: the STREAMINFO block. This block has information like the sample rate, number of channels, etc., and data that can help the decoder manage its buffers, like the minimum and maximum data rate and minimum and maximum block size. Also included in the STREAMINFO block is the MD5 signature of the <I>unencoded</I> audio data. This is useful for checking an entire stream for transmission errors.
</P>
<P>
Other blocks allow for padding, seek tables, and application-specific data. You can see <B><TT>flac</TT></B> options below for adding PADDING blocks or specifying seek points. FLAC does not require seek points for seeking but they can speed up seeks, or be used for cueing in editing applications.
</P>
<P>
Also, if you need a custom metadata block, you can define your own and request an ID <A HREF="id.html">here</A>. Then you can reserve a PADDING block of the correct size when encoding, and overwrite the padding block with your APPLICATION block after encoding. The resulting stream will be FLAC compatible; decoders that are aware of your metadata can use it and the rest will safely ignore it.
</P>
<P>
<B>AUDIO DATA</B>
</P>
<P>
After the metadata comes the encoded audio data. Audio data and metadata are not interleaved. Like most audio codecs, FLAC splits the unencoded audio data into blocks, and encodes each block separately. The encoded block is packed into a frame and appended to the stream. The reference encoder uses a single block size for the whole stream but the FLAC format does not require it.
</P>
<P>
<B>BLOCKING</B>
</P>
<P>
The block size is an important encoding parameter. If it is too small, the frame overhead will lower the compression. If it is too large, the modeling stage of the compressor will not be able to generate an efficient model. Understanding FLAC's modeling will help you to improve compression for some kinds of input by varying the block size. In the most general case, using linear prediction on 44.1kHz audio, the optimal block size will be between 2 and 6 ksamples. <B><TT>flac</TT></B> defaults to a block size of 4608 in this case. Using the fast fixed predictors, a smaller block size is usually preferable because of the smaller frame header.
</P>
<P>
<B>INTER-CHANNEL DECORRELATION</B>
</P>
<P>
In the case of stereo input, once the data is blocked it is optionally passed through an inter-channel decorrelation stage. The left and right channels are converted to mid and side channels through the following transformation: mid = (left + right) / 2, side = left - right. This is a lossless process, unlike joint stereo. For normal CD audio this can result in significant extra compression. <B><TT>flac</TT></B> has two options for this: <TT>-m</TT>, which always compresses both the left-right and mid-side versions of the block and takes the smallest frame, and <TT>-M</TT>, which adaptively switches between left-right and mid-side.
</P>
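<P>
With integer samples, dividing by two would discard the low bit of left + right. The sketch below shows one way the transformation can still be inverted exactly: the side channel has the same parity as the sum, so the dropped bit can be recovered from it. The function names are illustrative only and are not taken from the reference encoder.
</P>

```python
def to_mid_side(left, right):
    """Convert one stereo sample pair to (mid, side) using integer math."""
    mid = (left + right) >> 1   # floor of the average; drops the low bit of the sum
    side = left - right         # always has the same parity as left + right
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side exactly."""
    total = (mid << 1) | (side & 1)   # restore the dropped low bit from side
    left = (total + side) >> 1
    right = (total - side) >> 1
    return left, right


if __name__ == "__main__":
    for pair in [(3, -2), (-3, 2), (1000, 999), (0, 0)]:
        assert from_mid_side(*to_mid_side(*pair)) == pair
```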
<P>
<B>MODELING</B>
</P>
<P>
In the next stage, the encoder tries to approximate the signal with a function in such a way that when the approximation is subtracted, the result (called the <I>residual</I>, <I>residue</I>, or <I>error</I>) requires fewer bits-per-sample to encode. The function's parameters also have to be transmitted so they should not be so complex as to eat up the savings. FLAC has two methods of forming approximations: 1) fitting a simple polynomial to the signal; and 2) general linear predictive coding (LPC). I will not go into the details here, only some generalities that involve the encoding options.
</P>
<P>
First, fixed polynomial prediction (specified with <TT>-l 0</TT>) is much faster, but less accurate than LPC. The higher the maximum LPC order, the slower, but more accurate, the model will be. However, there are diminishing returns with increasing orders. Also, at some point (around order 9) the part of the encoder that guesses the best order to use will start to get it wrong and the compression will actually decrease slightly; at that point you will have to use the exhaustive search option <TT>-e</TT> to overcome this, which is significantly slower.
</P>
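<P>
To give a feel for why fixed polynomial prediction is cheap, here is a minimal sketch that treats the order-<I>k</I> fixed predictor as <I>k</I> rounds of differencing, assuming orders 0 through 4 as described on the format page. The helper names and the sum-of-magnitudes cost estimate are assumptions made for illustration, not the reference encoder's method.
</P>

```python
def fixed_residual(samples, order):
    """Residual left by a fixed polynomial predictor of the given order.

    Order k is computed as k rounds of first differences, so the first
    k samples (the warm-up samples) produce no residual value.
    """
    if not 0 <= order <= 4:
        raise ValueError("this sketch assumes fixed orders 0..4")
    residual = list(samples)
    for _ in range(order):
        residual = [residual[i] - residual[i - 1] for i in range(1, len(residual))]
    return residual


def best_fixed_order(samples):
    """Pick the order whose residual has the smallest total magnitude.

    This is only a crude stand-in for a real bit-cost estimate.
    """
    return min(range(5), key=lambda k: sum(abs(e) for e in fixed_residual(samples, k)))


if __name__ == "__main__":
    ramp = [n * n for n in range(8)]   # a quadratic signal
    print(fixed_residual(ramp, 2))     # constant second difference
    print(best_fixed_order(ramp))
```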
<P>
Second, the parameters for the fixed predictors can be transmitted in 3 bits whereas the parameters for the LPC model depend on the bits-per-sample and LPC order. This means the frame header length varies depending on the method and order you choose and can affect the optimal block size.
</P>
<P>
<B>RESIDUAL CODING</B>
</P>
<P>
Once the model is generated, the encoder subtracts the approximation from the original signal to get the residual (error) signal. The error signal is then losslessly coded. To do this, FLAC takes advantage of the fact that the error signal generally has a Laplacian (two-sided geometric) distribution, and that there is a set of special Huffman codes called Rice codes that can be used to efficiently encode this kind of signal quickly and without needing a dictionary.
</P>
<P>
Rice coding involves finding a single parameter that matches a signal's distribution, then using that parameter to generate the codes. As the distribution changes, the optimal parameter changes, so FLAC supports a method that allows the parameter to change as needed. The residual can be broken into several <I>contexts</I> or <I>partitions</I>, each with its own Rice parameter. <B><TT>flac</TT></B> allows you to specify how the partitioning is done with the <TT>-r</TT> option. The residual can be broken into 2^<I>n</I> partitions, by using the option <TT>-r n,n</TT>. The parameter <I>n</I> is called the <I>partition order</I>. Furthermore, the encoder can be made to search through <I>m</I> to <I>n</I> partition orders, taking the best one, by specifying <TT>-r m,n</TT>. Generally, the choice of n does not affect encoding speed but m,n does. The larger the difference between m and n, the more time it will take the encoder to search for the best order. The block size will also affect the optimal order.
</P>
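<P>
The following sketch illustrates the idea of per-partition Rice parameters by counting coded bits only. It uses a textbook Rice code (signed values folded to unsigned, unary quotient, <I>k</I>-bit remainder), assumes equal-sized partitions, and ignores the cost of storing each parameter, so it should be read as an illustration rather than FLAC's exact bitstream layout. All names are hypothetical.
</P>

```python
def zigzag(value):
    """Fold a signed residual to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)


def rice_bits(residual, k):
    """Bits needed to Rice-code the residual with parameter k.

    Each value costs its unary quotient, one stop bit, and k remainder bits.
    """
    return sum((zigzag(v) >> k) + 1 + k for v in residual)


def best_parameter(residual, max_k=14):
    """Rice parameter in 0..max_k that minimizes the coded size."""
    return min(range(max_k + 1), key=lambda k: rice_bits(residual, k))


def partitioned_bits(residual, order, max_k=14):
    """Total bits when the residual is split into 2**order equal partitions,
    each coded with its own best parameter (parameter overhead not counted)."""
    parts = 2 ** order
    if len(residual) % parts:
        raise ValueError("residual length must be divisible by 2**order")
    size = len(residual) // parts
    return sum(
        rice_bits(chunk, best_parameter(chunk, max_k))
        for chunk in (residual[p * size:(p + 1) * size] for p in range(parts))
    )


if __name__ == "__main__":
    # A quiet first half followed by a loud second half.
    residual = [0] * 8 + [100, -100] * 4
    for order in range(3):
        print(order, partitioned_bits(residual, order))
```

<P>
On a residual whose statistics change partway through the block, the partitioned total can only be equal to or smaller than the single-parameter total in this model, which is why a nonzero partition order tends to help until the per-partition overhead outweighs the gain.
</P>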
<P>
<B>FRAMING</B>
</P>
<P>
An audio frame is preceded by a frame header and trailed by a frame footer. The header starts with a sync code, and contains the minimum information necessary for a decoder to play the stream, like sample rate, bits per sample, etc. It also contains the block or sample number and an 8-bit CRC of the frame header. The sync code, frame header CRC, and block/sample number allow resynchronization and seeking even in the absence of seek points. The frame footer contains a 16-bit CRC of the entire encoded frame for error detection. If the reference decoder detects a CRC error it will generate a silent block.
</P>
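<P>
As a rough illustration of the frame header check, here is a bitwise CRC-8 sketch. It assumes the polynomial x^8 + x^2 + x + 1 with a zero initial value, and that the CRC byte is the last byte of the header; confirm both against the format page before relying on them. The function names are hypothetical.
</P>

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8, most significant bit first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def header_crc_ok(header):
    """True if the final byte of `header` is the CRC-8 of the bytes before it."""
    return len(header) >= 2 and crc8(header[:-1]) == header[-1]


if __name__ == "__main__":
    body = bytes([0xFF, 0xF8, 0xC9, 0x18, 0x00])   # made-up header bytes
    good = body + bytes([crc8(body)])
    bad = bytes([good[0] ^ 0x01]) + good[1:]        # flip one bit
    print(header_crc_ok(good), header_crc_ok(bad))
```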
<P>
<B>MISCELLANEOUS</B>
</P>
<P>
In order to support some common types of metadata, the reference decoder knows how to skip ID3V1 and ID3V2 tags so it is safe to tag FLAC files in this way. ID3V2 tags must come at the beginning of the file (before the "fLaC" marker) and ID3V1 tags must come at the end of the file.
</P>
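<P>
A minimal sketch of how a reader might locate the "fLaC" marker when a leading ID3V2 tag is present. It relies on the ID3V2 header layout (a 10-byte header whose last four bytes hold a size with 7 significant bits per byte, plus an optional 10-byte footer) and is not the reference decoder's code; the function name is hypothetical.
</P>

```python
def flac_stream_offset(data):
    """Return the byte offset of the "fLaC" marker, skipping a leading
    ID3V2 tag if there is one, or None if the marker is not found there."""
    offset = 0
    if len(data) >= 10 and data[:3] == b"ID3":
        flags = data[5]
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)   # 7 significant bits per byte
        offset = 10 + size
        if flags & 0x10:                      # optional footer
            offset += 10
    return offset if data[offset:offset + 4] == b"fLaC" else None


if __name__ == "__main__":
    tag = b"ID3\x03\x00\x00\x00\x00\x00\x05" + b"x" * 5   # tag with a 5-byte body
    print(flac_stream_offset(b"fLaC" + b"\x00" * 4))
    print(flac_stream_offset(tag + b"fLaC"))
    print(flac_stream_offset(b"RIFF"))
```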
<P>
<B><TT>flac</TT></B> has a verify option <TT>-V</TT> that verifies the output while encoding. With this option, a decoder is run in parallel to the encoder and its output is compared against the original input. If a difference is found <B><TT>flac</TT></B> will stop with an error.
<B><TT>flac</TT></B> is the command-line file encoder/decoder. The input to the encoder and the output of the decoder must be either RIFF WAVE format or raw interleaved sample data. <B><TT>flac</TT></B> only supports linear PCM samples (in other words, no A-LAW, uLAW, etc.). Another restriction (hopefully short-term) is that the input must be 8, 16, or 24 bits per sample. This is not a limitation of the FLAC format, just the reference encoder/decoder.
<B><TT>flac</TT></B> assumes that RIFF WAVE files will have the extension ".wav"; this may be overridden with a command-line option. For piped-in data, <B><TT>flac</TT></B> tries to determine the type by looking at the beginning of the file. Other than this, <B><TT>flac</TT></B> makes no assumptions about file extensions, though the convention is that FLAC files have the extension ".flac" (or ".fla" on ancient file systems like FAT-16).
Before going into the full command-line description, a few other things help to sort it out: 1) <B><TT>flac</TT></B> encodes by default, so you must use <B><TT>-d</TT></B> to decode; 2) the options <B><TT>-0</TT></B> .. <B><TT>-9</TT></B> that control the compression level are just synonyms for different groups of specific encoding options (described later), and you can get the same effect by specifying those options directly; 3) <B><TT>flac</TT></B> behaves similarly to gzip in the way it handles input and output files.
In any case, if no <TT>inputfile</TT> is specified, stdin is assumed. If only one inputfile is specified, it may be "-" for stdin. When stdin is used as input, <B><TT>flac</TT></B> will write to stdout. Otherwise <B><TT>flac</TT></B> will perform the desired operation on each input file to similarly named output files (meaning for encoding, the extension will be replaced with ".flac", or appended with ".flac" if the input file has no extension, and for decoding, the extension will be ".wav" for WAVE output and ".raw" for raw output). The original file is not deleted unless --delete-input-file is specified.
since the former allows flac to seek backwards to write the STREAMINFO or RIFF WAVE header contents when necessary.
</P>
<P>
Also, you can force output data to go to stdout using <TT>-c</TT>.
</P>
<P>The encoding options affect the compression ratio and encoding speed. The format options are used to tell <B><TT>flac</TT></B> the arrangement of samples if the input file (or output file when decoding) is a raw file. If it is a RIFF WAVE file the format options are not needed since they are read from the WAVE header.
In test mode, <B><TT>flac</TT></B> acts just like in decode mode, except no output file is written. Both decode and test modes detect errors in the stream; they also detect when the MD5 signature of the decoded audio does not match the stored MD5 signature, even when the bitstream is valid.
Decode (<B><TT>flac</TT></B> encodes by default). <B><TT>flac</TT></B> will exit with an exit code of <TT>1</TT> (and print a message, even in silent mode) if there were any errors during decoding, including when the MD5 checksum does not match the decoded output. Otherwise the exit code will be <TT>0</TT>.
Analyze (same as <B><TT>-d</TT></B> except an analysis file is written). The exit codes are the same as in decode mode. This option is mainly for developers; the output will be a text file that has data about each frame and subframe.
Automatically delete the input file after a successful encode or decode. If there was an error (including a verify error) the input file is left intact.
Allow the encoder to generate non-Subset files. The resulting FLAC file may not be streamable, so you should only use this option in combination with custom encoding options meant for archival. File decoders will still be able to play (and seek in) such files.
NOTE: if you use -S # and # is >= samples in the input, there will be either no seek point entered (if the input size is determinable before encoding starts) or a placeholder point (if input size is not determinable).<BR>
Tell the encoder to write a <TT>PADDING</TT> metadata block of the given length (in bytes) after the <TT>STREAMINFO</TT> block. <TT>-P 0</TT> implies no <TT>PADDING</TT> block, which is the default. This is useful if you plan to tag the file later with an <TT>APPLICATION</TT> block; instead of having to rewrite the entire file later just to insert your block, you can write directly over the <TT>PADDING</TT> block.
Specify the block size in samples. The default is 1152 for -l 0, otherwise 4608. Subset streams must use one of 192/576/1152/2304/4608/256/512/1024/2048/4096/8192/16384/32768. The reference encoder uses the same block size for the entire stream.
Enable mid-side coding (only for stereo streams). Tends to increase compression by a few percent on average. For each block both the stereo pair and mid-side versions of the block will be encoded, and the smallest resulting frame will be stored. Currently mid-side encoding is only available when bits-per-sample <= 16.
Enable loose mid-side coding (only for stereo streams). Like <TT>-m</TT> but the encoder adaptively switches between independent and mid-side coding, which is faster but yields less compression than <TT>-m</TT> (which does an exhaustive search).
Synonymous with -l 32 -b 4608 -m -e -r 16 -p. This is painfully slow but gives you the maximum compression <B><TT>flac</TT></B> can do for the given block size.
Exhaustive model search (expensive!). Normally the encoder estimates the best model to use and encodes once based on the estimate. With an exhaustive model search, the encoder will generate subframes for every order and use the smallest. If the max LPC order is high this can significantly increase the encode time but can shave off another 0.5%.
Specifies the maximum LPC order. This number must be <= 32. If 0, the encoder will not attempt generic linear prediction, and use only fixed predictors. Using fixed predictors is faster but usually results in files being 5-10% larger.
Specifies the precision of the quantized LP coefficients, in bits. The default is <B><TT>-q 0</TT></B>, which means let the encoder decide based on the signal. Unless you really know your input file it's best to leave this up to the encoder.
Do exhaustive LP coefficient quantization optimization. This option overrides any <B><TT>-q</TT></B> option. It is expensive and typically will only improve the compression a tiny fraction of a percent. <B><TT>-q</TT></B> has no effect when <B><TT>-l 0</TT></B> is used.
By default the encoder uses a single Rice parameter for the subframe's entire residual. With this option, the residual is iteratively partitioned into 2^min# .. 2^max# pieces, each with its own Rice parameter. Higher values of max# yield diminishing returns. The most bang for the buck is usually with <B><TT>-r 2,2</TT></B> (more for higher block sizes). This usually shaves off about 1.5%. The technique tends to peak out about when blocksize/(2^n)=128. Use <B><TT>-r 0,16</TT></B> to force the highest degree of optimization.
Set the Rice parameter search distance. Defaults to 0. The residual coder will search for the best Rice parameter +/- this number for each residual partition. This option is expensive (run time for -R n will typically be (2n)*30% over that of -R 0) and doesn't give much of a gain. As a matter of fact, none of the -0..-9 options currently use it since -R > 1 is not consistently better, as it should be.
Verify the encoding process. With this option, <B><TT>flac</TT></B> will create a parallel decoder that decodes the output of the encoder and compares the result against the original. It will abort immediately with an error if a mismatch occurs. <B><TT>-V</TT></B> increases the total encoding time but is guaranteed to catch any unforeseen bug in the encoding process.
<TT>-S-</TT>, <TT>-m-</TT>, <TT>-e-</TT>, <TT>-p-</TT>, <TT>-V-</TT>, <TT>--delete-input-file-</TT>, <TT>--lax-</TT> can all be used to turn off a particular option.
<B><TT>metaflac</TT></B> is the command-line <TT>.flac</TT> file metadata editor. Right now it just lists the contents of all metadata blocks in a .flac file, but soon it will allow you to insert, delete, and edit blocks.
</P>
<P>
Currently <B><TT>metaflac</TT></B> can be invoked only one way:
<UL>
<LI>
Listing: metaflac [-v] inputfile
</LI>
</UL>
</P>
<P>
<TT>inputfile</TT> may be "-" for stdin. If <TT>-v</TT> is used, you will get verbose output.
All that is necessary is to copy <B><TT>libxmms-flac.so</TT></B> to the directory where XMMS looks for input plugins (usually <B><TT>/usr/lib/xmms/Input</TT></B>). There is nothing else to configure. Make sure to restart XMMS before trying to play any <TT>.flac</TT> files.
All that is necessary is to copy <B><TT>in_flac.dll</TT></B> to the <B><TT>Plugins/</TT></B> directory of your Winamp installation. There is nothing else to configure. Make sure to restart Winamp before trying to play any <TT>.flac</TT> files.
Bug tracking is done on the SourceForge project page <A HREF="http://sourceforge.net/bugs/?group_id=13478">here</A>. If you submit a bug, please provide an email contact and/or use the Monitor feature.