From e6abb4709f069b534e8bb51956bdfbe03a0689af Mon Sep 17 00:00:00 2001
From: Owen Taylor <otaylor@redhat.com>
Date: Fri, 5 May 2000 11:44:15 +0000
Subject: [PATCH] Add beginnings of file with detailed information about the
 structure and

Fri May  5 12:16:32 2000  Owen Taylor  <otaylor@redhat.com>

	* gdk-pixbuf/pixops/DETAILS: Add beginnings of file with
	detailed information about the structure and algorithms
	of pixops so people can fix it instead of breaking it.

---
 gdk-pixbuf/ChangeLog      |   6 +
 gdk-pixbuf/pixops/DETAILS | 355 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 361 insertions(+)
 create mode 100644 gdk-pixbuf/pixops/DETAILS

diff --git a/gdk-pixbuf/ChangeLog b/gdk-pixbuf/ChangeLog
index 5f7ef08437..e339b5e8a9 100644
--- a/gdk-pixbuf/ChangeLog
+++ b/gdk-pixbuf/ChangeLog
@@ -1,3 +1,9 @@
+Fri May  5 12:16:32 2000  Owen Taylor  <otaylor@redhat.com>
+
+	* gdk-pixbuf/pixops/DETAILS: Add beginnings of file with 
+	detailed information about the structure and algorithms
+	of pixops so people can fix it instead of breaking it.
+
 2000-05-04  Darin Adler  <darin@eazel.com>
 
 	* gdk-pixbuf/pixops/pixops.c: (pixops_composite_nearest),
diff --git a/gdk-pixbuf/pixops/DETAILS b/gdk-pixbuf/pixops/DETAILS
new file mode 100644
index 0000000000..acf16f57e7
--- /dev/null
+++ b/gdk-pixbuf/pixops/DETAILS
@@ -0,0 +1,355 @@
+General ideas of Pixops
+=======================
+
+ - Gain speed by special-casing the common case, and using
+   generic code to handle the uncommon case.
+
+ - Most of the time in scaling an image is spent in the center;
+   however, code that can handle edges properly is slow
+   because it needs to deal with the possibility of running
+   off the edge. So make the fast-case code handle only
+   the centers, and use generic, slow code for the edges.
+
+Structure of Pixops
+===================
+
+The code of pixops can roughly be grouped into four parts:
+
+ - Filter computation functions
+
+ - Functions for scaling or compositing lines and pixels
+   using precomputed filters
+
+ - pixops_process, the central driver that iterates through
+   the image calling pixel or line functions as necessary
+
+ - Wrapper functions (pixops_scale/composite/composite_color)
+   that compute the filter, choose the line and pixel functions,
+   and then call pixops_process with the filter, line,
+   and pixel functions.
+
+
+pixops_process is a pretty scary-looking function:
+
+static void
+pixops_process (guchar         *dest_buf,
+		int             render_x0,
+		int             render_y0,
+		int             render_x1,
+		int             render_y1,
+		int             dest_rowstride,
+		int             dest_channels,
+		gboolean        dest_has_alpha,
+		const guchar   *src_buf,
+		int             src_width,
+		int             src_height,
+		int             src_rowstride,
+		int             src_channels,
+		gboolean        src_has_alpha,
+		double          scale_x,
+		double          scale_y,
+		int             check_x,
+		int             check_y,
+		int             check_size,
+		guint32         color1,
+		guint32         color2,
+		PixopsFilter   *filter,
+		PixopsLineFunc  line_func,
+		PixopsPixelFunc pixel_func)
+
+(Some of the arguments should be moved into structures. It's basically
+"all the arguments to pixops_composite_color plus three more".) The
+arguments can be divided up into:
+
+
+Information about the destination buffer
+
+   guchar *dest_buf, int dest_rowstride, int dest_channels, gboolean dest_has_alpha,
+
+Information about the source buffer
+
+   guchar *src_buf,  int src_rowstride,  int src_channels,  gboolean src_has_alpha,
+   int src_width, int src_height,
+
+Information on how to scale the source buf and the region of the scaled source
+to render onto the destination buffer
+
+   int render_x0, int render_y0, int render_x1, int render_y1
+   double scale_x, double scale_y
+
+Information about a constant color or check pattern onto which to composite
+
+   int check_x,	int check_y, int check_size, guint32 color1, guint32 color2
+
+Information precomputed to use during the scale operation
+
+   PixopsFilter *filter, PixopsLineFunc line_func, PixopsPixelFunc pixel_func
+
+
+Filter computation
+==================
+
+The PixopsFilter structure looks like:
+
+struct _PixopsFilter
+{
+  int *weights;
+  int n_x;
+  int n_y;
+  double x_offset;
+  double y_offset;
+}; 
+
+
+'weights' is an array of size:
+
+ weights[SUBSAMPLE][SUBSAMPLE][n_x][n_y]
+
+SUBSAMPLE is a constant - currently 16 in pixops.c.
+
+
+In order to compute a scaled destination pixel we convolve
+an array of n_x by n_y source pixels with one of
+the SUBSAMPLE * SUBSAMPLE filter matrices stored
+in weights. The choice of filter matrix is determined
+by the fractional part of the source location.
+
+To compute dest[i,j] we do the following:
+
+ x = i * scale_x + x_offset;
+ y = j * scale_y + y_offset;
+ x_int = floor(x)
+ y_int = floor(y)
+
+ C = weights[SUBSAMPLE*(x - x_int)][SUBSAMPLE*(y - y_int)]
+ total  = sum[l=0..n_x-1, m=0..n_y-1] (C[l,m] * src[x_int + l, y_int + m])
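+
+As an illustration (numbers chosen arbitrarily, and assuming the
+fractional part is truncated when it is turned into an index):
+
+ x = 3.40, y = 7.75
+ x_int = 3, y_int = 7
+ SUBSAMPLE*(x - x_int) = 16 * 0.40 = 6.4  -> index 6
+ SUBSAMPLE*(y - y_int) = 16 * 0.75 = 12   -> index 12
+
+so C = weights[6][12], and it is applied to the n_x by n_y block
+of source pixels whose lowest-indexed corner is src[3,7].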
+
+The filter weights are integers scaled so that the total of the
+weights in each filter matrix C is equal to 65536.
+
+When the source does not have alpha, we simply compute each channel
+as above, so total is in the range [0,255*65536], and
+
+ dest = total / 65536
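+
+A small worked example, with made-up numbers: take n_x = n_y = 2
+and a filter matrix with four equal weights of 16384 (summing
+to 65536). For source values 10, 20, 30 and 40 in one channel:
+
+ total = 16384 * (10 + 20 + 30 + 40) = 1638400
+ dest  = 1638400 / 65536 = 25
+
+which is the plain average of the four values, as expected.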
+
+When the source does have alpha, then we need to compute using
+"pre-multiplied alpha":
+
+ a_total = sum (C[l,m] * src_a[x_int + l, y_int + m])
+ c_total = sum (C[l,m] * src_a[x_int + l, y_int + m] * src_c[x_int + l, y_int + m])
+ 
+This gives us a result for c_total in the range of [0,255*a_total]
+ 
+ c_dest = c_total / a_total
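+
+To see why the premultiplication matters, here is a sketch with
+made-up numbers: two source pixels with equal weights of 32768,
+one opaque (a = 255, c = 200) and one fully transparent
+(a = 0, c = 50):
+
+ a_total = 32768 * 255 + 32768 * 0 = 8355840
+ c_total = 32768 * 255 * 200 + 32768 * 0 * 50 = 1671168000
+ c_dest  = 1671168000 / 8355840 = 200
+
+The color of the transparent pixel does not bleed into the
+result; it only lowers the resulting alpha (assuming the output
+alpha is taken as a_total / 65536, integer division gives 127).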
+ 
+
+Mathematical aside:
+
+The process of producing a destination image consists
+of:
+
+ - Producing a continuous approximation to the source
+   image via interpolation.
+
+ - Sampling that continuous approximation with a filter.
+
+This is representable as:
+
+ S(x,y) = sum[i=-inf,inf; j=-inf,inf] A(frac(x),frac(y))[i,j] * S[floor(x)+i,floor(y)+j]
+
+ D[i,j] = Integral(s=-inf,inf; t=-inf,inf) B(s,t) S((i+s)/scale_x,(j+t)/scale_y)
+ 
+By reordering the sums and integrals, you get something of the form:
+
+ D[i,j] = sum[l=-inf,inf; m=-inf,inf] C[l,m] S[i+l,j+m]
+
+The arrays in weights are the C[l,m] above, and are thus
+determined by the interpolating algorithm in use and the
+sampling filter:
+
+                                   INTERPOLATE           SAMPLE
+ ART_FILTER_NEAREST                nearest neighbour     point
+ ART_FILTER_TILES                  nearest neighbour     box
+ ART_FILTER_BILINEAR (scale < 1)   nearest neighbour     box
+ ART_FILTER_BILINEAR (scale > 1)   bilinear              point
+ ART_FILTER_HYPER                  bilinear              box
+ 
+
+Pixel Functions
+===============
+
+typedef void (*PixopsPixelFunc) (guchar *dest, int dest_x, int dest_channels, int dest_has_alpha,
+				 int src_has_alpha, 
+                                 int check_size, guint32 color1, guint32 color2,
+				 int r, int g, int b, int a);
+
+The arguments here are:
+
+ dest: location to store the output pixel
+ dest_x: x coordinate of destination (for handling checks)
+ dest_has_alpha, dest_channels: Information about the destination pixbuf
+ src_has_alpha: Information about the source pixbuf
+
+ check_size, color1, color2: Information for color background for composite_color variant
+ 
+ r,g,b,a: scaled red, green, blue and alpha
+
+r,g,b are premultiplied by alpha.
+
+ a is in [0,65536*255]
+ r is in [0,255*a]
+ g is in [0,255*a]
+ b is in [0,255*a]
+
+If src_has_alpha is false, then a will be 65536*255, allowing optimization.
+
+
+Line functions
+==============
+
+typedef guchar *(*PixopsLineFunc) (int *weights, int n_x, int n_y,
+				   guchar *dest, int dest_x, guchar *dest_end, int dest_channels, int dest_has_alpha,
+				   guchar **src, int src_channels, gboolean src_has_alpha,
+				   int x_init, int x_step, int src_width,
+				   int check_size, guint32 color1, guint32 color2);
+
+The arguments are:
+
+ weights, n_x, n_y
+
+   Filter weights for this row - dimensions weights[SUBSAMPLE][n_x][n_y]
+
+ dest, dest_x, dest_end, dest_channels, dest_has_alpha
+
+   The destination buffer. The function will start writing into *dest and
+   increment by dest_channels until dest == dest_end. Reading from
+   src for these pixels is guaranteed not to go outside of the
+   buffer bounds.
+
+ src, src_channels, src_has_alpha
+ 
+   src[n_y] - an array of pointers to the start of the source rows
+   for each filter coordinate.
+
+ x_init, x_step
+
+   Information about x positions in source image.
+
+ src_width - unused
+
+ check_size, color1, color2: Information for color background for composite_color variant
+
+ The total for the destination pixel at dest + i is given by
+
+   SUM (l=0..n_x - 1, m=0..n_y - 1)
+     src[m][((x_init + i * x_step) >> SCALE_SHIFT) + l] * weights[m][l]
+
+
+Algorithms for compositing
+==========================
+
+Compositing alpha on non-alpha:
+
+ R = As * Rs + (1 - As) * Rd
+ G = As * Gs + (1 - As) * Gd
+ B = As * Bs + (1 - As) * Bd
+
+This can be regrouped, for each channel C, as:
+
+ C = Cd + As * (Cs - Cd)
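+
+For example (values chosen for illustration), with As = 0.25,
+Cs = 200 and Cd = 100:
+
+ As * Cs + (1 - As) * Cd = 50 + 75          = 125
+ Cd + As * (Cs - Cd)     = 100 + 0.25 * 100 = 125
+
+Both forms agree; the second needs one multiplication per channel
+instead of two.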
+
+Compositing alpha on alpha:
+
+ A = As + (1 - As) * Ad
+ R = (As * Rs + (1 - As) * Rd * Ad)  / A
+ G = (As * Gs + (1 - As) * Gd * Ad)  / A
+ B = (As * Bs + (1 - As) * Bd * Ad)  / A
+
+The way to think of this is in terms of the "area":
+
+The final pixel is composed of area As of the source pixel
+and (1 - As) * Ad of the target pixel. So the final pixel
+is a weighted average with those weights.
+
+Note that the weights do not add up to one - hence the
+non-constant division.
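+
+For example (values chosen for illustration), with As = 0.5,
+Rs = 200, Ad = 0.5 and Rd = 100:
+
+ A = 0.5 + (1 - 0.5) * 0.5 = 0.75
+ R = (0.5 * 200 + (1 - 0.5) * 100 * 0.5) / 0.75
+   = (100 + 25) / 0.75
+   = 166.67 (approximately)
+
+The weights here are 0.5 and 0.25; they sum to A = 0.75 rather
+than to one, which is why the division by A is needed.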
+
+
+Integer tricks for compositing
+==============================
+
+
+
+MMX Code
+========
+
+MMX versions of the line functions are provided for a few special
+cases:
+
+ n_x = n_y = 2
+
+   src_channels = 3 dest_channels = 3    op = scale
+   src_channels = 4 with alpha dest_channels = 4 no alpha  op = composite
+   src_channels = 4 with alpha dest_channels = 4 no alpha  op = composite_color
+
+For the case n_x = n_y = 2 (primarily hit when scaling up with bilinear
+scaling), we can take advantage of the fact that multiple destination
+pixels will be composed from the same source pixels.
+
+That is, a destination pixel is a linear combination of the source
+pixels around it:
+
+
+  S0                     S1
+
+
+
+
+
+       D  D' D'' ...
+
+
+
+
+  S2                     S3
+
+Each MMX register is 64 bits wide, so we can unpack the four 8-bit
+channels of a source pixel into the low 8 bits of four 16-bit words
+and store them in one MMX register.
+
+For each destination pixel, we first make sure that we have pixels S0
+... S3 loaded into registers mm0 ... mm3. (This will often involve not
+doing anything, or moving mm1 and mm3 into mm0 and mm2 and then reloading
+mm1 and mm3 with new values.)
+
+Then we load up the appropriate weights for the 4 corner pixels
+based on the offsets of the destination pixel within the source
+pixels.
+
+We have preexpanded the weights to 64 bits wide and truncated the
+range to 8 bits, so an original filter value of 
+
+ 0x5321 would be expanded to
+
+ 0x0053005300530053
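+
+A sketch of that expansion in C-like notation (the names w, w8 and
+w64 are only for illustration and are not taken from pixops.c):
+
+ w   = 0x5321
+ w8  = w >> 8                     /* 0x53 */
+ w64 = w8 * 0x0001000100010001    /* 0x0053005300530053 */
+
+Each of the four 16-bit words then lines up with one channel of
+an unpacked source pixel.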
+
+For source buffers without alpha, we simply do a multiply-add
+of the weights, giving us a 16 bit quantity for the result
+that we shift right by 8 and store in the destination buffer.
+
+When the source buffer has alpha, then things become more
+complicated - when we load up mm0 ... mm3, we premultiply
+the alpha, so they contain:
+
+ (a*0xff >> 8) (r*a >> 8) (g*a >> 8) (b*a >> 8)
+
+Then when we multiply by the weights and add, we end up
+with premultiplied a,r,g,b in the range of 0 .. 0xff * 0xff;
+call them A,R,G,B.
+
+We then need to composite with the dest pixels - which 
+we do by:
+
+ r_dest = (R + ((0xff * 0xff - A) >> 8) * r_dest) >> 8
+
+(0xff * 0xff)