First, the weights and inverse Hilbert coordinates for each object are determined. If the objects do not have weights, unit weights are assigned. If the objects have multiple weights, only the first weight is currently used. The smallest axis-aligned box is found that contains all of the objects using their two or three dimensional spatial coordinates. This bounding box is slightly expanded to ensure that all objects are strictly interior to the boundary surface. The bounding box is necessary in order to calculate the inverse Hilbert Space Filling curve coordinate. The bounding box is used to scale the problem coordinates into the [0,1]^d unit volume (d represents the number of dimensions in the problem space.) The inverse Hilbert coordinate is calculated and stored as a double precision floating point value for each object. This code works on problems with one, two or three dimensions (the 1-D Inverse Hilbert coordinate is simply the problem coordinate itself, after the bounding box scaling.)
The algorithm seeks to cut the unit interval into P segments containing equal weights of objects associated to the segments by their inverse Hilbert coordinates. The code allows a user vector to specify the desired fraction of the total weight to be assigned to each interval. Note, a zero weight fraction prevents any object being assigned to the corresponding interval. The unit interval is divided into N bins, N=k(P-1)+1, where k is a small positive constant.) Each bin has an left and right endpoint specifying the half-open interval [l,r) associated with the bin. The bins form a non-overlapping cover of [0,1] with the right endpoint of the last bin forced to include 1. The bins are of equal size on the first loop. (Hence each interval or part of the partition is a collection of bins.)
For each loop, an MPI_Allreduce call is made to globally sum the weights in each bin. This call also determines the maximum and minimum (inverse Hilbert) coordinate found in each bin. A greedy algorithm sums the weights of the bins from left to right until the next bin would cause an overflow for the current part. This results in new partition of P intervals. The location of each cut (just before an "overflowing" bin) and the size of its "overflowing" bin are saved. The "overflowing" bin's maximum and minimum are compared to determine if the bin can be practically subdivided. (If the bin's maximum and minimum coordinates are too close relative to double precision resolution, the bin can not be practically subdivided.) If at least one bin can be further refined, then looping will continue. In order to prevent a systematic bias, the greedy algorithm is assumed to exactly satisfy the weight required by each part.
Before starting the next loop, the P intervals are again divided into N bins. The P-1 "overflow" bins are each subdivided into k-1 equal bins. The intervals before and after these new bins determine the remaining bins. This process maintains a fixed number of bins. No bin is "privileged." Specifically, any bin is subject to later refinement, as necessary, on future loops.
The loop terminates when there is no need to further divide any "overflow" bin. A slightly different greedy algorithm is used to determine the final partition of P intervals from the N bins. In this case, when the next bin would cause an overflow, the tolerance is computed for both underfilling (excluding this last bin) and overfilling (including the last bin). The tolerance closest to the target tolerance is used to select the dividing point. The tolerance obtained at each dividing point is compared to the user's specified tolerance. An error is returned if the user's tolerance is not satisfied at any cut. After each cut is made, a correction is calculated as the ratio of the actual weight to the target weight used up to this point. This correction is made to the target weight for the next part. This correction fixes the subsequent parts when a "massive" weight object is on the border of a cut and its assignment creates an excessive imbalance.
Generally, the number of loops is small (proportional to log(number of objects)). A maximum of MAX_LOOPS is used to prevent an infinite looping condition. A user-defined function is used in the MPI_Allreduce call in order to simultaneously determine the sum, maximum, and minimum of each bin. The message length in the MPI_Allreduce is proportional to the P, the number of parts.
Note, when a bin is encountered that satisfies more than two parts, that bin is refined
into a multiple of k-1 intervals which maintains a total of N bins.
 
The inverse transformation is computed by taking the highest order bit from each spatial coordinate and packing them together as 2 or 3 bits (as appropriate to the dimensionality) in the order xyz (or xy) where x is the highest bit in the word. The initial state is 0. The data table lookup finds the value at the column indexed by the xyz word and the row 0 (corresponding to the initial state value.) This data are the 3 (or 2) starting bits of the Hilbert coordinate. The next state value is found by looking up the corresponding element of the state table (xyz column and row 0.)
The table procedure continues to loop (using loop counter i, for example) until the required precision is reached. At loop i, the ith bits from each spatial dimension are packed together as the xyz column index. The data table lookup finds the element at column xyz and the row determined by the last state table value. This is appended to the Hilbert coordinate. The state table is used to find the next state value at the element corresponding to the xyz column and row equal to the last state value.
The inverse transformation is analogous.  Here the 3 (or 2 in the 2-d case) bits of the
Hilbert coordinate are extracted into a word.  This word is the column index into the
data table and the state value is the row.  This word found in the data table is
interpreted as the packed xyz bits for the spatial coordinates.  These bits are
extracted for each dimension and appended to that dimension's coordinate. The corresponding
state table is used to find the next row (state) used in the next loop.
 
The Point Assign function now works for any point in space, even if the point is
outside the original bounding box. If the point is outside the bounding box, it is first
scaled using the same equations that scale the interior points into the unit volume.
The point is projected onto the unit volume. For each spatial dimension, if the scaled
coordinate is less than zero, it is replace by zero. If it is greater than one, it is
replaced by one.  Otherwise the scaled coordinate is directly used.
 
The query functions decompose the unit square (or cube) level by level like the Octree method. Each level divides the remaining region into quadrants (or octets in 3d). At each level, the quadrant with the smallest inverse Hilbert coordinate (that is, occurring first along the Hilbert curve) whose inverse Hilbert coordinate is equal or larger than the starting inverse Hilbert coordinate and which intersects with query region is selected. Thus, each level calculates the next 2 bits (3 bits in 3d) of the inverse Hilbert coordinate of the next point to enter the query region. No more than once per call to the query function, the function may backtrack to a nearest previous level that has another quadrant that intersects the query region and has a higher Hilbert coordinate.
In order to determine the intersection with the query region, the next 2 bits (3 in 3 dimensions) of the Hilbert transformation are also computed (by table lookup) at each level for the quadrant being tested. These bits are compared to the the bits resulting from the intersection of the query region with the region determined by the spatial coordinates computed to the precision of the previous levels.
If the user query box has any side (edge) that is "too small" (effectively degenerate in some dimension), it is replaced by a minimum value and the corresponding vertex coordinates are symmetrically expanded. This is refered to as a "fuzzy" region.
This function requires the KEEP_CUTS parameter to be set by the user. The Box Assign function now works for any box in space, even if it has regions outside the original bounding box. The box vertices are scaled and projected exactly like the points in the Point Assign function described above. However, to allow the search to use a proper volumn, projected points, lines, and planes are converted to a usable volume by the fuzzy region process described above.
This algorithm will work for any space filling curve.  All that is necessary is to
provide the tables (derieved from the curve's state transition diagram) in place of
the Hilbert Space Filling Curve tables.
 
The parameters used by HSFC and their default values are described in the HSFC section of the Zoltan User's Guide. These can be set by use of the Zoltan_HSFC_Set_Param subroutine in the file hsfc/hsfc.c.
When the parameter REDUCE_DIMENSIONS 
is specified, the HSFC algorithm will perform  lower dimensional
partitioning if the geometry is found to be degenerate.  More information
on detecting degenerate
geometries may be found in another 
section.
 
The main routine for HSFC is Zoltan_HSFC in the file hsfc/hsfc.c.