The traditional method of doing structural analysis and design uses Â¯nite element based
numerical analysis programs. While this approach works well for many problems, it does
not parallelize eÂ±ciently on massively parallel processors (MPPs), thus limiting the size and
complexity of the designs that can be analyzed and optimized. A new approach is needed
that works well on MPPs. This method need not be faster than those currently used for
serial machines on problems that do not exhaust the machines' resources; rather it needs
to allow each processor of a MPP enough useful work such that large problems beyond the
resources of serial or moderately parallel machines can be solved in acceptable times.
Cellular automata (CA) were used at least as early as 1946 by Weiner and Rosenblunth
 to describe the operation of heart muscle, even though their use was not computationally
feasible at the time. CA tiles a problem domain into cells of equal size. Each cell has
the same set of simple rules that dictate how it behaves and interacts with its neighboring
cells. The principle is that an overall global behavior can be computed by a group of cells
that only know local conditions . If each cell only needs to know local conditions, then
this minimizes the communication requirements and therefore the problem scales well on a
MPP. A CA is the archetypical algorithm for the SIMD parallel architecture .
A cellular automaton is a discrete dynamical system . It is discrete in the sense
that space and time are discrete. Each cell is a Â¯xed point in a regular lattice. The state of
each cell is updated at discrete time steps, based upon conditions in previous time steps.
All of the cells are updated every time step, thus the state of the entire lattice is updated
every time step.
In general, CA are used to simulate the dynamic behavior of physical systems, and
have been used successfully to represent a variety of phenomena such as diÂ®usion of gaseous
systems, solidiÂ¯cation and crystal growth in solids, and hydrodynamic Â°ow and turbulence
. CA has also recently been used in conjunction with genetic algorithms to derive the
rules required at each cell to perform structural analysis . CA rules have recently been
devised for the simultaneous analysis and design of simple two-dimensional structures [4,
11]. That work is the basis for this thesis. In the case of structural design, the intention
is to describe a static equilibrium of a structure under a system of forces acting on it.
In this sense, time is not being simulated, rather each step of the automaton is used to
propagate (local) stresses and strains through a structure to allow it to reach equilibrium
state while simultaneously determining the shape and/or dimensions of the cells associated
with this equilibrium state. This is continued until the entire process converges (ideally)
to a global state where there is no signiÂ¯cant change in the structure for every subsequent
iteration, corresponding to a static equilibrium state. Note that analysis and design are
done simultaneously and locally by each cell.
This thesis will begin by describing the CA used to do structural design on ground
trusses in Chapter 2 and give an example. Chapter 3 will discuss the merits of two diÂ®erent
iteration methods. Chapter 4 will explain how the CA is parallelized, and Â¯nally Chapter
5 will discuss some preliminary results.
2. Method Description
The basic elements of the structural design CA consist of the division of the problem
into cells and the three types of rules that can operate on those cells. Each of the rules
operates on the cells using the information in a Moore neighborhood, which consists of the
surrounding nine cells. The Â¯rst set of rules are used to do analysis only, determining the
stresses and strains in each cell. The second set of rules does the design work, changing
the areas of the connecting beams to withstand the stresses. The Â¯nal set of rules performs
simultaneous analysis and design.
2.1 Domain DeÂ¯nition
Each cell of this CA is an eight-beam truss where each beam starts at the center of
the cell and connects to its opposite member in an adjacent cell as illustrated in Figure 1.
Figure 1. Example CA domain with a single cell denoted by the dashed line.
This type of structure is known as a ground truss. Those cells which fall on the border
of the rectangular domain are not partial cells requiring special rules, but are complete
cells with the area of the beams that fall outside the computational domain set to zero. In
addition, they are connected to an invisible set of surrounding cells that are turned oÂ® and
that also have all of their beam areas set to zero. Cells that are turned oÂ® are not part
of the computation, being used only to make the rules for the border and non-border cells
consistent, since the stress analysis rules require the displacements of all eight surrounding
The actual border of the computational domain of the CA need not be rectangular.
Any shape can be deÂ¯ned for the truss by turning oÂ® any cells that are not within the
computational domain, as illustrated in Figure 1. A simple method to deÂ¯ne a shape for
the truss is to deÂ¯ne an enclosing polygon, and then turn oÂ® every cell that does not fall
within the polygon. A more sophisticated method could be used to allow for holes, circular
regions, or other, nonstraight boundaries.
The \edge crossings" algorithm  to determine those points within the polygon can
be used; it is simple and parallelizes well. From each point, a ray is cast, usually in the
horizontal direction, and the number of edges that it crosses are counted. If there are an
odd number of edges, then the point is within the polygon. If there are an even number of
edges, then the point is outside the polygon. Special care must be taken for those points
that are on the edge, or where the ray happens to intersect a vertex. A Fortran 90 module
implementing this is included in Appendix A.3.
As seen in Figure 2, only those polygons that are composed of lines with slopes 0,
1, Â¡1, or 1 will be represented exactly. This is the same aliasing problem that bitmaps
face. A better resolution can be obtained by decreasing the cell size in the domain, thereby
increasing the number of cells that form the shape. This is the same as smoothing the
outline of a bitmap by increasing the number of pixels that form the bitmap. The amount
by which the original cells have been subdivided to increase the resolution is known as the
cell density factor (CDF).
Figure 2. Example CA domain with a non-rectangular computational domain.
2.2 CA Rules
There are two sets of rules used to compute an optimal solution to a given structural
problem. Optimal solution in this sense means a set of truss beams with the minimum size
required to withstand the applied forces.
The Â¯rst set of rules is (normally) executed at every iteration to determine the strains
in each cell. The cell attempts to reach equilibrium with the surrounding cells by displacing
itself to minimize the potential energy.
Within a cell, each truss member (indexed relative to the cell by k = 1; : : : ; 8) has
an elastic modulus E, length Lk, a cross-sectional area Ak, and an orientation angle Âµk
from the cell center. Denote the displacement of the kth truss member's near end from
the original cell center by (u; v), and the displacement of the far end from the neighboring
cell's center by (uk; vk). These neighboring displacements (uk; vk) are taken as Â¯xed when
the CA calculates the displacements (u; v) for each cell. The extension Â¢k, strain "k, and
force Fk within each member are calculated from these properties and displacements by
Â¢k = (uk Â¡ u) cos Âµk + (vk Â¡ v) sin Âµk; (1)
"k = Â¢k=Lk; (2)
Fk = EAk"k: (3)
Taking into account the applied external force (Fx; Fy), the total (internal strain plus
external) potential energy V for a cell is given by
Â¡ Fxu Â¡ Fyv: (4)
Setting the partial derivatives of the potential energy with respect to the cell displacements
to zero gives the equilibrium equations
= 0: (5)
In general this is a system of two equations with two unknowns. If there is an (externally)
applied displacement along a single axis, then (5) reduces to a single equation with one
unknown, and if there are (externally) applied displacements along both axes, then there is
nothing to solve. The forces acting upon the cell may be computed for reference, but this
is not needed for the overall computation.
2.2.2 Beam Sizing
Designing the structure requires resizing the beams in the cells. If displacements have
already been calculated, as in Chapter 2.2.1 for example, then some scheme for changing
the cross sectional areas Ak is required. In terms of allowable stress Â¾allow, which is chosen
by the user as the maximum stress that any given beam should endure, one scheme for
computing a new cross sectional area Anew
k , based upon the previous cross sectional area
k , is
k : (6)
If the displacement calculation and sizing are done sequentially, the sizing period
(how often sizing is done) depends on many factors: the number of cells in the domain,
the locations and relative placements of the applied forces and displacements, the iteration
method (Jacobi vs. Gauss-Seidel), and for Gauss-Seidel implemented in parallel, the number
of processors used.
The last two items will be discussed in Chapters 3 and 5.
3. Iteration Methods
Each cell depends on the displacements of the surrounding cells to calculate its own
displacement, thereby propagating the stresses and strains. This is repeated until the
structure no longer changes appreciatively, at which point it is said to have converged.
The convergence criterion for the displacement of a cell is deÂ¯ned by the condition that
the current change in displacement is a small fraction or percentage (usually 10Â¡6) of the
maximum displacement within the structure. The sizing convergence criterion is analogous.
For an entire structure to be considered converged with respect to displacement or sizing,
every cell within that structure must meet the convergence criterion for that update rule.
One method of implementing a CA is to keep two copies of the array of cells, one to
represent time t, and the other to represent time t + 1. The values for the cells at t + 1
are calculated from the cells at t. At the end of this iteration, the labels of the arrays are
swapped, and the process is repeated. This is a Jacobi iteration, where all of the new values
are calculated from the old values.
For any process that converges, using a Jacobi iteration method can be ineÂ±cient .
By using a Gauss-Seidel iteration method, where new values are calculated using updated
values, the process should converge using fewer iterations. This means that only one copy
of the array is kept. When a new displacement is calculated for one cell, then the next
adjacent cell will use that updated value when calculating its own displacement. Note that
this does not apply to the sizing rules since, as deÂ¯ned, their application is independent of
the information in the surrounding cells.
Consider the problem of a simple bridge truss. The Â¯rst image in Figure 3 shows a
CA with six cells. The bottom two corner cells have an applied displacement of (0,0) so
they are Â¯xed in place. The bottom middle cell has an applied force of 100kN downward.
The width of the bridge is 50 meters and the height is 25 meters. The bars are composed
Figure 3. Simple bridge truss, before and after running CA.
Figure 4. Bridge truss with same dimensions, smaller cells, converged.
of medium steel (E = 200GPa and Â¾allow = 250MPa). Each beam has an initial area of
Running the CA on the bridge problem using the Gauss-Seidel iteration method for
displacements and applying the sizing rules every sixth iteration until it converges at iteration
253, the result shown in the second image of Figure 3 is obtained. Since the bridge is 50m
across and the steel beams are no more than a few cm thick, the areas in this view are
exaggerated by a factor of 3000 to show the diÂ®erences in the beam sizes.
The bridge in Figure 3 is only composed of eleven trusses, and the solution could easily
have been computed by hand. But if each beam is required to be less than 25m long, the
complexity of the problem rises. Figure 4 shows a problem of the exact same dimensions,
where each cell is 40 times smaller than previously. Each horizontal and vertical beam is
0:625m long, rather than 25m.
3.2 Convergence Analysis
To analyze the eÂ±cacy of the iteration method used, it is useful to transform the CA
into an equivalent system of linear equations. Recall from Chapter 2.2.1 that each cell is
computing its position (u; v) based upon the position of the surrounding cells. If each cell
were assigned unique variables for its position, such that cell 1 has u1 and v1, cell 2 has
u2 and v2, and so forth, then the equations for each cell can be expressed in terms of the
variables for the surrounding cells. For a CA structure composed of 6 cells, this will form
a linear system of 12 equations and 12 unknowns.
This standard system of linear equations,
Ax = b; (7)
can be solved by the Jacobi and Gauss-Seidel Â¯xed-point iteration methods or block versions
thereof, which are the exact mathematical formulations of the local cell calculations. For
the Jacobi, A is split into its strictly 2£2 block lower triangular (L), 2£2 block diagonal
(D), and strictly 2 £ 2 block upper triangular (U) parts,
A = L + D + U: (8)
The system is then rewritten as a Â¯xed point iteration where the next iterate x(n+1) is
computed from the previous iterate x(n) via
x(n+1) = Bx(n) + C; (9)
B = Â¡DÂ¡1(L + U); C = DÂ¡1b: (10)
CDF n Jacobi Gauss-Seidel
1 8 0.794104 0.611503
2 26 0.949748 0.8982
3 52 0.979368 0.959006
4 86 0.989496 0.978893
5 128 0.993801 0.987605
6 178 0.995959 0.991925
7 236 0.997181 0.994365
8 302 0.997933 0.995865
9 376 0.998426 0.996853
10 458 0.998765 0.997532
Table 1. Spectral radius of the bridge truss for various CDFs.
Note that Ax = b if and only if x = Bx + C, assuming DÂ¡1 exists. For Gauss-Seidel
the iteration x(n+1) = Bx(n) + C has
B = Â¡(D + L)Â¡1U; C = (D + L)Â¡1b; (11)
assuming (D + L)Â¡1 exists.
The Â¯xed point iteration (9) converges for any starting point x(0) if and only if all of
the eigenvalues of B are less than one in absolute value . The maximum absolute value
of the eigenvalues for a matrix is called the spectral radius. The spectral radius for the
bridge structure at various CDFs is shown in Table 1. The n column denotes the number
of equations and number of unknown variables being solved for a given CDF.
This table shows that the Jacobi or Gauss-Seidel CA iteration for analysis does converge,
but extremely slowly. For larger CDFs, the improvement of Gauss-Seidel over Jacobi is
marginal. Even with massive parallelism, any competitive advantage of CA (over solving
the linear system with standard iterative numerical methods) must come by combining
analysis with sizing.
Aitken's Â±2 method  was also explored. This method uses the (scalar) values of three
successive iterations to extrapolate a value (hopefully) closer to the Â¯xed-point value. This
method requires that (for scalar values xn)
Â¼ A; (12)
where jAj < 1. However, this requirement was not met by the components x(n)
i of the vector
iteration (9), and therefore Aitken's Â±2 acceleration method was not applicable.
4. Parallel Implementation
The code for this CA was implemented in Fortran 90 using the Message Passing Interface
(MPI) library as its parallel communication mechanism. It has been tested on both an
Origin 2000 with 64 processors and a Beowulf cluster with 32 processors.
A parallel decomposition was performed by dividing the computational domain into
vertical strips and assigning each strip of contiguous cells to a single processor. Each strip
has an additional column of border cells on either side that are turned oÂ®. These border
cells represent the connected cells located on the adjacent processors. At every iteration, a
processor computes the updated values for its cells, and then exchanges its left and right
columns with its neighbors. These updated values are stored in the border cells and used
for the next iteration. Therefore the natural communication topology is a ring topology,
which easily maps into most other communication topologies.
Stripped partitioning works well for a rectangular shaped domain because it provides a
good balance of computation to communication. It does have some limitations, for example,
given a domain size of N rows £ M cols, there can be at most M processors; if M < N then
the lattice can be trivially rotated. For those problems with an irregular shape, there will
not be the same balance of computation to communication at every processor. For these
cases, a more eÂ±cient partitioning method (e.g., graph-based partitioning) should be used.
When implementing the Gauss-Seidel iteration method in parallel, no attempt is made to
keep the order in which the cells are updated the same as in the non-parallel implementation.
Instead, each processor iterates over its collection of cells as if it were the sole processor
operating on the domain, where the domain consists of its assigned cells, plus a group of
surrounding \dead" cells that are not computed. The \dead" cells are used to contain the
updated values from the adjacent processors that are sent at the end of every iteration.
Since the Gauss-Seidel iteration is contained solely within each processor, the rate of
convergence diÂ®ers depending upon the number of processors used. This has an eÂ®ect upon
the stability of the calculation as will be seen in Chapter 5.1. On the other hand, the
programming task is much easier since, except for the initial setup and the communication
at the end of every iteration, the program is exactly the same as the Gauss-Seidel iteration
for a single processor.
5.1 Jacobi vs. Gauss-Seidel
When comparing the performance of Jacobi and Gauss-Seidel iteration methods it is
useful to look at the number of iterations it takes the displacements (without sizing) to
converge using a single processor. Figure 5 compares the number of iterations to the vertical
displacement of the mid-span of the bridge truss with a CDF (cell density factor) of 16. The
mid-span will have the largest displacement for any correct solution to the problem because
it has the only externally applied force. Figure 5 shows that it takes 12,808 iterations to
converge with the Jacobi method and only 6,723 iterations using the Gauss-Seidel.
0 2500 5000 7500 10000 12500
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Verticle Displacements at Mid-Span
Figure 5. Comparison between Jacobi and Gauss-Seidel iteration
methods on the bridge structure with a CDF of 16.
The speed of convergence for a structural analysis CA is aÂ®ected by more than just the
iteration method if sizing rules are used as well. Figure 5 also has an expanded view of the
Â¯rst 20 iterations that includes sizing rules applied every four iterations. Note that when
using the Jacobi method with sizing, the maximum displacement diverges away quickly
0 5 10 15 20 25 30
CDF = 16
Number of Processors
Figure 6. Comparison between Jacobi and Gauss-Seidel iteration methods for
analysis only on the bridge structure with a cell density factor of 16.
from the maximum displacement using analysis only. In fact, for this sizing period, the CA
is non-convergent. For the Gauss-Seidel method with sizing applied every n iterations, the
CA is usually more stable and converges quicker.
As mentioned previously, when using a Gauss-Seidel iteration, the rate of convergence
diÂ®ers depending upon the number of processors used. As shown in Figure 6 for analysis
with no sizing, the number of iterations needed to converge increases as the number of
processors increases. As the number of processors increases, the smaller the number of cells
each processor contains, and therefore the less area each stress and strain can propagate each
iteration. Intuitively, this continues until each processor has exactly one cell and (parallel)
Gauss-Seidel iteration is exactly the same as Jacobi.
5.2 Sizing Period Using Gauss-Seidel
As seen in the previous section, the choice of how often to apply sizing rules is very
important to the speed and the stability of the CA. Since the design equations allow the
areas of the bars to adjust fully to the surrounding stresses and strains, it is possible that
0 10 20 30 40 50
Figure 7. Convergence data using undamped sizing with Gauss-Seidel iteration.
0 10 20 30 40 50
Figure 8. Convergence data using damped sizing with Gauss-Seidel iteration.
if a sizing is performed before all the stresses and strains have propagated, bars that are
important to the Â¯nal structure could be allowed to disappear too soon. In this case, the
CA will not converge within a reasonable time, or even oscillate and never converge.
Figure 7 shows the bridge structure CA convergence rates in a density plot as a function
of the sizing period and the cell density factor. The gray tones represent the number of
iterations the CA with sizing needed to converge normalized with respect to the number of
iterations needed for convergence with analysis only. The range goes from one (white) to
three (dark grey), with black squares indicating those combinations that did not converge.
Therefore the lighter the gray, the faster the CA converged.
Since the CA tends to converge faster with smaller sizing periods it is better to choose
a period as small as possible, but if the period is too small then convergence might never
be achieved. To allow smaller sizing periods to be chosen, damping (in terms of the rate in
which the cross-sectional areas can change) was applied to the design equations. Damping
the sizing function limits the area to be within a certain percentage of the current value
of the bar area. Figure 8 shows the results for the same problem as in Figure 7 with 10%
In this case, far more combinations of CDFs and sizing periods converged than previously,
especially when using smaller sizing periods. However, more iterations were needed for each
computation to converge than for sizing without damping. Thus there is a tradeoÂ® between
speed of convergence and robustness of the code.
5.3 Parallel Speedup
Execution time for the Fortran 90 code was measured for the Gauss-Seidel version of
the code since it seemed to be the most stable version. Timing runs were done using a Â¯xed
number of iterations that are below the number of iterations required for full convergence.
This was to eliminate the vagaries of convergence using Gauss-Seidel on diÂ®erent numbers
of processors. Parallel overhead was measured by taking a time stamp at the beginning
and end of every communication between processors at a single processor. The cumulative
time diÂ®erence is the parallel overhead needed to pack the information, send it to adjacent
processors, wait for the updates from the other processors, and then Â¯nally unpack the
The communication is using blocking MPI calls to aid in measurement, therefore the
computation time is the overall time minus the parallel overhead. The design easily allows
for a more eÂ±cient implementation overlapping the computation and communication by
using non-blocking MPI calls.
Figure 9 shows the timing results for an Origin 2000 using shared memory for the
communication channels. The data shown in Figure 9 is the average of 5 runs, and the
standard deviations ranged between 0.11 and 1.19. In this case, communication overhead
does not begin to dominate until about 64 processors are used.
Figure 10 shows the timing results for a Linux Beowulf system using 100Mbit Ethernet
between processor nodes. The data shown is the average of 5 runs, and the standard
deviations ranged between 0.29 and 0.46. Here the communication overhead dominates
from the beginning. The theoretical curve is the ideal speedup, serial time divided by the
number of processors.
8 16 24 32 40 48 56 64
Number of Processors
Figure 9. Timings on an Origin 2000 machine for the bridge structure
with a CDF of 40 for 20,000 iterations.
8 16 24 32
Number of Processors
Figure 10. Timings on a Beowulf cluster for the bridge structure
with a CDF of 40 for 20,000 iterations.
Cellular automata techniques can be applied to structural design and allow eÂ±cient
use of MPPs. This potentially allows problems of far greater complexity to be solved in a
reasonable time. The technique is also easy to implement and is versatile in design of truss
topologies. Slow convergence and divergence of the CA is mathematically explained by the
spectral radius of the iteration matrix, for analysis only, but adding sizing makes the Â¯xed
point iteration x(n+1) = F(x(n)) nonlinear. A topic for future work is the mathematical
analysis of the full nonlinear iteration. Future work could also include extending the method
to three dimensions and to creating more types of structures other than ground trusses.