# Powerful Low Resolution Structure Determination Method Biology Essay

Knowledge of the three-dimensional structures of macromolecules is essential for understanding how they function, and it plays a key role in molecular biology [1]. However, genome sequencing projects have caused a sharp increase in the number of known protein sequences, widening the gap between known sequences and known structures [2]. Traditional high-resolution structure determination methods such as X-ray crystallography and solution NMR have long been applied to solve the structures of many proteins at the atomic level, but both have limitations. In X-ray crystallography, obtaining sufficiently large, high-quality single crystals has become a bottleneck, and costly, time-consuming screening still seems to be the only way to find suitable crystallization conditions. NMR has a bottleneck of its own: the larger the molecule, the lower the probability of success and the lower the accuracy.

Small-angle scattering (SAS) of X-rays (SAXS) and neutrons (SANS) is a powerful method for the structural analysis of condensed matter [3]. In molecular biology, it provides low-resolution (1-2 nm) information about the overall structure and structural transitions of native biological macromolecules in solution. The data are not sufficient to resolve the secondary structure or the backbone of a protein, but large-scale features such as the quaternary structure can be determined with high precision. The frequently used structure determination methods are compared in Table 1.

The first successful SAXS experiments date back to the late 1930s and the work of Guinier and Fournet [4]. They showed that the method yields information not only on the overall sizes and shapes of particles but also on the internal structure of disordered and partially ordered systems. The method became increasingly important in molecular biology in the 1960s because it could provide information on overall shape and internal structure in the absence of crystals, though at low resolution. In the 1970s, the development of synchrotron radiation and neutron sources brought a breakthrough in SAS: experiments required less time and effort, and it became possible to investigate intermolecular interactions, including assembly and large-scale conformational changes. In the 1980s, however, interest in SAS for studying biomolecules declined as other structural methods developed. One encouraging exception was the time-resolved measurement of polymers with synchrotron radiation, which had a great impact [5] because SAS results seemed sufficient to solve most structural problems in polymer systems. Another breakthrough came in the 1990s, when growing computing power enabled advanced SAS data analysis methods, including efficient ab initio data interpretation based on spherical harmonics, global minimization algorithms and rigid body refinement. These methods also benefited from advances in instrumentation, especially third-generation synchrotron radiation sources, which allowed time-resolved studies of protein and nucleic acid folding.

Table 1 Comparison among frequently used structure determination methods [3]

| Method | Sample | Advantages | Limitations |
|---|---|---|---|
| X-ray crystallography | Crystals | Very high resolution (up to 0.1 nm); reveals detail at the atomic level | High-quality crystals required; flexible structures are not seen; structure may be influenced by crystal packing |
| NMR | Dilute solutions (5-10 mg ml⁻¹) | High resolution (0.2-0.3 nm) in solution | The larger the molecule, the lower the probability of success and the accuracy |
| Small-angle scattering | Dilute and semi-dilute solutions (1-100 mg ml⁻¹) | Analysis of structure, kinetics and interactions in nearly native conditions; study of mixtures and non-equilibrium systems; wide MM range (few kDa to hundreds of MDa) | Low resolution (1-2 nm); additional information required to resolve ambiguity in model building |
| Cryo-EM | Frozen, very dilute solutions (<1 mg ml⁻¹) | Low amount of material; direct visualization of particle shape and symmetry | Low resolution (about 1 nm); only for MM larger than 200 kDa |
| Static and dynamic light scattering, ultracentrifugation | Very dilute solutions (<1 mg ml⁻¹) | Non-destructive; low amount of material; simplicity of the experiment | Yield overall parameters only |

## Basic principles of SAS

Differences between SAXS and SANS [6]

According to electromagnetic theory, charged particles such as electrons emit electromagnetic radiation when they are accelerated. If the acceleration is caused by an incident electromagnetic wave, the emitted radiation is regarded as elastic scattering. For an observer located at r, the electric field E(r, t) at time t is given by Eq. (2.1), where Ψ is the angle between the direction of polarization and the observer's line of sight.

(2.1)

From the above equation, the electric field is determined by three factors: (i) the Thomson radius of the electron, r0 = e²/mc² = 2.82 × 10⁻¹⁵ m, where e and m are the charge and mass of the electron and c is the velocity of light in vacuum; (ii) a geometrical factor that depends on the location r and the angle Ψ; and (iii) a frequency factor determined by the natural frequency ω0 of the electron and the frequency ω of the incident radiation. For X-rays, ω0 ≪ ω, so the frequency factor equals -1 and Eq. (2.1) simplifies to Eq. (2.2).

(2.2)

In practice, what an experiment detects is the intensity I as a function of the scattering angle 2θ. For an incident beam of intensity I0, the scattered intensity is:

$$I(2\theta) = I_0\, r_0^2\, \frac{1 + \cos^2 2\theta}{2}\, \frac{1}{r^2} \tag{2.3}$$

From Eq. (2.3), when the scattering angle is small (less than 5 degrees), cos(2θ) is very close to 1, so the angular term is practically constant and the intensity is simply inversely proportional to r². This is one reason for measuring at small angles.
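To see how weak the angular dependence is at small angles, the sketch below evaluates the angular term (1 + cos²2θ)/2. It assumes that Eq. (2.3) has the standard Thomson form for an unpolarized incident beam; the function name `polarization_factor` is chosen here for illustration only.

```python
import math


def polarization_factor(two_theta_deg: float) -> float:
    """Return (1 + cos^2(2*theta)) / 2 for a scattering angle given in degrees.

    Assumption: unpolarized incident beam (standard Thomson formula).
    """
    two_theta = math.radians(two_theta_deg)
    return (1.0 + math.cos(two_theta) ** 2) / 2.0


# Compare small scattering angles with wide angles.
for angle in (0.0, 1.0, 5.0, 30.0, 90.0):
    print(f"2theta = {angle:5.1f} deg   factor = {polarization_factor(angle):.4f}")
```

At 2θ = 5° the factor differs from 1 by less than 0.4%, whereas at 90° it drops to 0.5.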

The physical mechanisms of elastic X-ray and neutron scattering by matter are fundamentally different, but they can be treated with the same mathematical formalism. There are nevertheless some differences. First, the properties of the radiation differ. X-rays are electromagnetic waves consisting of massless photons, with a relatively short wavelength λ (about 0.10-0.15 nm). Neutrons are particles whose wavelength is given by the de Broglie relationship (wave-particle duality), λ = h/(mv), and is longer (about 0.20-1.0 nm). Second, the objects they interact with differ. X-rays interact with electrons. If the amplitude of the scattered wave is described by the scattering length f, the scattering length for hard X-rays is fx = Ne r0, where Ne is the number of electrons; it depends only on the number of electrons, not on the wavelength. Neutrons, on the other hand, interact with nuclei, and the scattering length consists of two parts, fn = fp + fs. The term fs arises from the neutron spin and can generally be treated as background. The term fp does not increase with atomic number but is sensitive to the isotopic content, which makes deuteration of the molecules an effective way to gain additional information. As Table 2 shows, neutrons are relatively more sensitive to light atoms, whereas X-rays are scattered most strongly by heavy atoms.

Table 2 X-ray and neutron scattering lengths of some elements [3]

| Atom | H | D | C | O | P | Au |
|---|---|---|---|---|---|---|
| Atomic mass | 1 | 2 | 12 | 16 | 30 | 197 |
| N electrons | 1 | 1 | 6 | 8 | 15 | 79 |
| fX, 10⁻¹² cm | 0.282 | 0.282 | 1.69 | 2.16 | 3.23 | 22.3 |
| fN, 10⁻¹² cm | -0.374 | 0.667 | 0.665 | 0.580 | 0.510 | 0.760 |
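The two relations used in this comparison, the de Broglie wavelength of a neutron and the X-ray scattering length fx = Ne r0, are easy to evaluate numerically. The following is a minimal sketch; the function names and the example neutron speed are illustrative assumptions, and the physical constants are standard values rather than numbers taken from the essay's references.

```python
H_PLANCK = 6.62607015e-34       # Planck constant, J s
M_NEUTRON = 1.67492749804e-27   # neutron rest mass, kg
R0_CM = 0.282e-12               # Thomson radius of the electron, cm


def neutron_wavelength_nm(speed_m_per_s: float) -> float:
    """de Broglie wavelength lambda = h / (m v) of a neutron, in nm."""
    return H_PLANCK / (M_NEUTRON * speed_m_per_s) * 1e9


def xray_scattering_length_cm(n_electrons: int) -> float:
    """X-ray scattering length f_x = N_e * r_0, in cm."""
    return n_electrons * R0_CM


# A neutron moving at 1000 m/s (illustrative value) has a wavelength
# of roughly 0.4 nm, inside the range quoted in the text.
print(f"lambda = {neutron_wavelength_nm(1000.0):.3f} nm")

# Scattering lengths for H, C and Au in units of 1e-12 cm.
for name, n_e in (("H", 1), ("C", 6), ("Au", 79)):
    print(f"{name}: f_x = {xray_scattering_length_cm(n_e) / 1e-12:.3f}")
```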

Scattering in biomolecule solutions

In order to describe the scattering, it is convenient to introduce the scattering length density distribution ρ(r) equal to the total scattering length of the atoms per unit volume. It is represented at any point r inside the solute particles by Eq. (2.4).

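$$\rho(\mathbf{r}) - \rho_b = (\rho_p - \rho_b)\,\rho_c(\mathbf{r}) + \rho_s(\mathbf{r}) \tag{2.4}$$

In this equation, the density is expressed relative to the uniform scattering density of the solvent, ρb. The function ρc(r) represents the shape of the particle and takes the value 1 inside the particle and 0 outside, and ρs(r) describes the fluctuation of the scattering density around its average value ρp inside the particle. The difference Δρ = ρp − ρb is called the contrast. The scattering amplitude A(s) is the Fourier integral of this excess density over the whole particle volume.

$$A(\mathbf{s}) = \int_V \left[\rho(\mathbf{r}) - \rho_b\right] e^{i\mathbf{s}\cdot\mathbf{r}}\, d\mathbf{r} = \Delta\rho\, A_c(\mathbf{s}) + A_s(\mathbf{s}) \tag{2.5}$$

From the above equation, the amplitude contains independent contributions from the shape, Ac(s), and from the internal structure, As(s). Because the measurements are made in solution, the positions and orientations of the solute particles are random. The measured intensity is therefore a spherical average, and the intensity at momentum transfer s is given by Eq. (2.6).

$$I(s) = \left\langle |A(\mathbf{s})|^2 \right\rangle_\Omega = (\Delta\rho)^2 I_c(s) + 2\,\Delta\rho\, I_{cs}(s) + I_s(s) \tag{2.6}$$

The first two terms depend on the contrast: the first represents the shape of the solute and dominates at small angles, and the second is the cross term between the shape and the internal structure. The last term is contrast-independent and corresponds to the internal structure.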

Everything discussed so far applies to a dilute solution without interactions between solute particles. In a semi-dilute solution, correlations between particles have to be taken into account, and the scattering intensity can be written as IS(s) = I(s) × S(s), where S(s) represents the interparticle interactions. SAS can therefore be used not only to determine the overall shape but also to investigate interactions.

Monodisperse systems [7]

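Using Eq. (2.6), the scattering intensity I(s) can be rewritten in integral form,

$$I(s) = \left\langle A(\mathbf{s}) A^*(\mathbf{s}) \right\rangle_\Omega = \left\langle \int_V \int_V \Delta\rho(\mathbf{r})\, \Delta\rho(\mathbf{r}')\, e^{i\mathbf{s}\cdot(\mathbf{r} - \mathbf{r}')}\, d\mathbf{r}\, d\mathbf{r}' \right\rangle_\Omega \tag{2.7}$$

where Δρ(r) = ρ(r) − ρb is the excess scattering density. Taking into account that ⟨exp(is·r)⟩Ω = sin(sr)/sr and integrating in spherical coordinates,

$$I(s) = 4\pi \int_0^{D_{max}} r^2 \gamma(r)\, \frac{\sin sr}{sr}\, dr \tag{2.8}$$

where Dmax is the maximum dimension of the particle and

$$\gamma(r) = \left\langle \int \Delta\rho(\mathbf{u})\, \Delta\rho(\mathbf{u} + \mathbf{r})\, d\mathbf{u} \right\rangle_\omega \tag{2.9}$$

is the spherically averaged autocorrelation function of the excess scattering density.

The function p(r) = r²γ(r) is the distance distribution function. It can be calculated from Eq. (2.8) by the inverse Fourier transformation.

$$p(r) = \frac{r^2}{2\pi^2} \int_0^\infty s^2 I(s)\, \frac{\sin sr}{sr}\, ds \tag{2.10}$$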
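The orientational average ⟨exp(is·r)⟩ = sin(sr)/sr used above has a discrete counterpart, the Debye formula, in which the integral over the distance distribution becomes a sum over all pairs of scattering centres. The sketch below applies it to identical point scatterers; the function name and the test coordinates are illustrative assumptions, not part of the original derivation.

```python
import math
from itertools import combinations


def debye_intensity(points, s):
    """Orientationally averaged I(s) for identical point scatterers.

    I(s) = sum_i sum_j sin(s * r_ij) / (s * r_ij), with the i == j terms equal to 1.
    `points` is a sequence of (x, y, z) tuples; s and the coordinates must use
    reciprocal units (for example nm^-1 and nm).
    """
    total = float(len(points))  # self terms: sin(x)/x -> 1 as x -> 0
    for a, b in combinations(points, 2):
        x = s * math.dist(a, b)
        # Each distinct pair appears twice in the double sum.
        total += 2.0 * (math.sin(x) / x if x > 1e-12 else 1.0)
    return total


# Two scatterers 1 nm apart (hypothetical geometry).
dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
for s in (0.0, 1.0, math.pi):
    print(f"s = {s:.3f}  I(s) = {debye_intensity(dimer, s):.4f}")
```

At s = 0 every pair contributes fully, so the intensity equals the square of the number of scatterers, which mirrors the role of the forward scattering I(0).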

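From the Maclaurin expansion, sin(sr)/sr = 1 − (sr)²/3! + ···, so near s = 0 Eq. (2.8) becomes

$$I(s) \cong I(0)\left[1 - \tfrac{1}{3} R_g^2 s^2\right] \cong I(0)\exp\left(-\tfrac{1}{3} R_g^2 s^2\right) \tag{2.11}$$

where I(0) is the forward scattering

$$I(0) = \int_V \int_V \Delta\rho(\mathbf{r})\, \Delta\rho(\mathbf{r}')\, d\mathbf{r}\, d\mathbf{r}' = 4\pi \int_0^{D_{max}} p(r)\, dr = (\Delta\rho)^2 V^2 \tag{2.12}$$

(with Δρ the contrast and V the particle volume) and Rg is the radius of gyration

$$R_g^2 = \frac{\int_0^{D_{max}} r^2 p(r)\, dr}{2 \int_0^{D_{max}} p(r)\, dr} \tag{2.13}$$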

Eq. (2.11) was first derived by Guinier and is the most useful tool at the first stage of data analysis. For an ideal sample, the Guinier plot (ln[I(s)] versus s²) is linear, and I(0) and Rg can be extracted from the y-axis intercept and the slope of the linear region, respectively. As mentioned above, Eq. (2.11) is derived near s = 0. This Guinier approximation is valid only for s < 1.3/Rg, a limit established empirically, which is another reason why the scattering must be measured at small angles.
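A Guinier analysis amounts to a straight-line fit of ln I(s) against s². The following minimal sketch assumes the Guinier law in its usual form, I(s) ≈ I(0) exp(−Rg² s²/3), and uses synthetic, noise-free data; the function name, units and numbers are illustrative assumptions rather than values from the essay.

```python
import math


def guinier_fit(s_values, intensities):
    """Least-squares line through ln I versus s^2; returns (I0, Rg).

    The slope of the Guinier plot is -Rg^2 / 3 and the intercept is ln I(0).
    """
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0.0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.sqrt(-3.0 * slope)


# Synthetic curve: I(0) = 100 (arbitrary units), Rg = 2 nm.
true_i0, true_rg = 100.0, 2.0
s = [0.05 * k for k in range(1, 13)]  # nm^-1, up to s * Rg = 1.2 < 1.3
curve = [true_i0 * math.exp(-((true_rg * sk) ** 2) / 3.0) for sk in s]

i0, rg = guinier_fit(s, curve)
print(f"I(0) = {i0:.3f}, Rg = {rg:.3f} nm, s_max * Rg = {s[-1] * rg:.2f}")
```

With real data the fit range is usually shortened until the largest s used satisfies s·Rg < 1.3, and the residuals are inspected for the upturn or downturn at low s that signals aggregation or repulsion.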

Linearity of the Guinier plot can be used as a test of sample homogeneity: a non-linear Guinier plot is a strong indicator of attractive or repulsive interparticle interactions leading to interference effects. An example is shown in Fig. 1A and 1B. Samples that contain a significant proportion of non-specific aggregates yield scattering curves and Guinier plots with a sharp increase in intensity at very small values of s (1B(1)), while samples with significant interparticle repulsion yield curves and Guinier plots that show a decrease in intensity at small values of s (1B(3)). A linear Guinier plot does not by itself guarantee monodispersity, however, and other methods such as dynamic light scattering (DLS) should be used to confirm it.

Figure 1 Standard plots for characterization by SAXS (A and B), (1) aggregation, (2) good data and (3) inter-particle repulsion [8]

Eq. (2.11) is valid for particles of arbitrary shape. For rod-like particles, the radius of gyration of the cross-section, Rc, is obtained from the slope of the plot of ln[sI(s)] versus s², while for flattened particles the slope of the plot of ln[s²I(s)] versus s² gives the radius of gyration of the thickness, Rt. The expressions are given in Eq. (2.14).

$$sI(s) \cong I_C(0)\exp\left(-\tfrac{1}{2} R_c^2 s^2\right), \qquad s^2 I(s) \cong I_T(0)\exp\left(-R_t^2 s^2\right) \tag{2.14}$$

In particular, for some biological structures such as filaments (actin, myosin, etc.), which are hundreds of nm long, it is possible that no reliable data are available in the Guinier region (s < 1.3/Rg).

The molecular mass (MM) can be estimated by the forward scattering intensity I(0). From Eq. (2.12), the experimentally obtained value of I(0) is proportional to the squared contrast of the particle. If the measurements are made on an absolute scale, the MM can be directly calculated by:

. (2.15)

In practice, the MM can often be readily estimated by comparison with a well-characterized reference sample (for proteins, a lysozyme or bovine serum albumin (BSA) solution). One should keep in mind that the accuracy of the MM estimate is limited by the accuracy of the solute concentrations, because the intensities must be normalized by concentration.
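The comparison with a reference sample can be sketched as a simple ratio of concentration-normalized forward intensities. This assumes that sample and reference are measured on the same instrument scale and have similar contrast and partial specific volume, which is a common working assumption for proteins; the function name and the numbers below are hypothetical.

```python
def mass_from_reference(i0_sample, c_sample, i0_ref, c_ref, mm_ref_kda):
    """Estimate the molecular mass (kDa) from I(0)/c relative to a standard.

    Assumptions: both curves are on the same intensity scale, and both
    solutes have the same contrast and partial specific volume.
    """
    return mm_ref_kda * (i0_sample / c_sample) / (i0_ref / c_ref)


# Hypothetical measurement: reference protein of 66 kDa at 5 mg/ml,
# unknown protein at 4 mg/ml (all intensities in arbitrary units).
estimate = mass_from_reference(
    i0_sample=0.24, c_sample=4.0, i0_ref=0.20, c_ref=5.0, mm_ref_kda=66.0
)
print(f"estimated MM = {estimate:.1f} kDa")
```

Because the concentrations enter the ratio directly, a 10% error in either concentration translates into roughly a 10% error in the estimated mass.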

Polydisperse systems

The previous discussion focused on ideal monodisperse systems. In practice, however, one has to deal with systems that are not ideal, and different data interpretation tools are required. A monodisperse system must satisfy two requirements: the particles are identical in size, and they do not interact with each other. Now consider a system consisting of particles of different sizes and structures that do not interact with each other. The total scattering can then be written as a linear combination

$$I(s) = \sum_{k=1}^{K} v_k I_k(s) \tag{2.16}$$

where vk and Ik(s) are the volume fraction and the scattering intensity of the kth type of particle, and K is the number of components. It is impossible to reconstruct the structure of every individual component from a single SAS experiment. But if the scattering patterns of all components are known from other methods, the volume fractions in the linear combination can be determined simply by linear least squares. This is useful in well-defined systems such as oligomeric equilibrium mixtures of proteins.
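When the component curves are known, the volume fractions follow from an ordinary linear least-squares problem. The sketch below solves the normal equations with a small Gaussian elimination so that it needs no external libraries. It is unconstrained (the fractions are not forced to be non-negative or to sum to one), and the curves are made-up numbers for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_volume_fractions(component_curves, mixture_curve):
    """Least-squares v_k minimizing sum_j (I_mix(s_j) - sum_k v_k I_k(s_j))^2."""
    k = len(component_curves)
    normal = [
        [sum(p * q for p, q in zip(component_curves[a], component_curves[b]))
         for b in range(k)]
        for a in range(k)
    ]
    rhs = [
        sum(p * q for p, q in zip(component_curves[a], mixture_curve))
        for a in range(k)
    ]
    return solve_linear(normal, rhs)


# Hypothetical monomer and dimer curves sampled at the same four s values.
monomer = [10.0, 8.0, 5.0, 2.0]
dimer = [20.0, 12.0, 4.0, 1.0]
mixture = [0.7 * m + 0.3 * d for m, d in zip(monomer, dimer)]

print(fit_volume_fractions([monomer, dimer], mixture))
```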

If neither the number of components nor their scattering patterns are known, there is still a way to approach the problem: singular value decomposition (SVD) [9], first introduced in the analysis of SAXS data in the early 1980s. This method is particularly useful in titrations and time-resolved experiments. The number of components it yields is the minimum number required to describe the data, which may be smaller than the actual number.

Interacting systems

When the concentration of the solution increases, interparticle interactions can no longer be ignored. The interactions are of two kinds, specific and non-specific. Specific interactions lead to the formation of complexes and can be treated by the methods discussed for polydisperse systems. This section considers non-specific interactions, such as the mutual impenetrability of macromolecules, electrostatic forces between charged surfaces and long-range van der Waals interactions. SAS can thus probe behavior at larger distances, which is quite different from the contacts formed during crystallization in X-ray crystallography.

As mentioned in Section 2.2, the scattering intensity of an interacting system can be written as IS(s) = I(s) × S(s), where S(s) represents the interparticle interactions. S(s) is called the structure factor, while I(s) is called the form factor. The structure factor can be determined from the ratio of the experimental intensity at a concentration c to that at an extremely low concentration c0, each normalized by its concentration.

$$S(s) = \frac{c_0\, I_S(s, c)}{c\, I(s, c_0)} \tag{2.17}$$
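The experimental route to the structure factor can be illustrated point by point. This sketch assumes that both curves are sampled at the same s values and that the low-concentration curve is free of interparticle effects; the function name and data are hypothetical.

```python
def structure_factor(i_conc, c, i_dilute, c_dilute):
    """Return S(s) as the ratio of concentration-normalized intensities.

    i_conc   : intensities measured at concentration c
    i_dilute : intensities measured at a very low concentration c_dilute,
               assumed to represent the pure form factor
    """
    return [(ic / c) / (idl / c_dilute) for ic, idl in zip(i_conc, i_dilute)]


# Made-up data: the concentrated sample is suppressed at low s,
# as expected for repulsive interactions.
dilute = [10.0, 8.0, 4.0]        # measured at 1 mg/ml
concentrated = [40.0, 38.0, 20.0]  # measured at 5 mg/ml

print(structure_factor(concentrated, 5.0, dilute, 1.0))
```

Values below 1 at low s indicate repulsion and values above 1 indicate attraction, while S(s) tends to 1 at higher s where the form factor dominates.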

The above method is the experimental determination of the structure factor, but in practice computational methods are used more frequently. From thermodynamic and physico-chemical theory, the structure factor at zero angle is related to the osmotic pressure Π by:

$$S(0) = \frac{RT}{M}\left(\frac{\partial \Pi}{\partial c}\right)^{-1} \tag{2.18}$$

where R is the gas constant, T the absolute temperature and M the molecular mass of the solute. At sufficiently low concentration the interactions are weak, and the osmotic pressure can be approximated by a series expansion:

$$\frac{\Pi}{cRT} = \frac{1}{M} + A_2 c + O(c^2) \tag{2.19}$$

so that

$$\frac{1}{S(0)} = 1 + 2 M A_2 c \tag{2.20}$$

A2 is the second virial coefficient. A2 > 0 when the interactions are repulsive and A2 < 0 when they are attractive.
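The second virial coefficient can be estimated from a concentration series. The sketch below assumes the low-concentration relation 1/S(0) = 1 + 2 M A2 c and fits a line through the fixed intercept 1; the function name, units and data are illustrative assumptions.

```python
def second_virial_coefficient(concentrations, s0_values, molar_mass):
    """Estimate A2 from S(0) measured at several concentrations.

    Fits 1/S(0) - 1 = (2 * M * A2) * c by least squares through the origin.
    With c in g/ml and M in g/mol, A2 comes out in mol ml g^-2.
    """
    num = sum(c * (1.0 / s0 - 1.0) for c, s0 in zip(concentrations, s0_values))
    den = sum(c * c for c in concentrations)
    slope = num / den  # slope = 2 * M * A2
    return slope / (2.0 * molar_mass)


# Synthetic series generated with A2 = 2e-4 mol ml g^-2 for a 14.3 kDa protein.
m = 14300.0
true_a2 = 2.0e-4
conc = [0.005, 0.010, 0.020]  # g/ml
s0 = [1.0 / (1.0 + 2.0 * m * true_a2 * c) for c in conc]

a2 = second_virial_coefficient(conc, s0, m)
print(f"A2 = {a2:.2e} ({'repulsive' if a2 > 0 else 'attractive'})")
```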

Modeling

ab initio methods

Reconstructing low-resolution 3D models from 1D SAS data may seem difficult, but it is now a standard procedure and a rapid characterization tool. The spherical harmonics representation introduced by Stuhrmann is an effective way to solve this problem.

First of all, the scattering density can be expressed as

(2.21)

where (r, ω) = (r, θ, φ) are spherical coordinates, and

(2.22)

are radial functions. So the amplitude can be written as

(2.23)

Combining the above equation with Eq. (2.7), we can have

(2.24)

The Alm(s) are computed from a series of shape coefficients, and the criterion for choosing these coefficients is the discrepancy χ between the experimental and the calculated scattering curves.

(2.25)
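A discrepancy of this kind is commonly evaluated as a reduced chi-square with a free scale factor between the calculated and experimental curves. The sketch below assumes that common form, χ² = (1/(N−1)) Σ [(I_exp − c·I_calc)/σ]², which may differ in detail from the expression used in the cited work; `sigma` (the experimental errors) and the scale factor `c` are symbols introduced here.

```python
import math


def chi_discrepancy(i_exp, i_calc, sigma):
    """Return (chi, scale) for experimental and calculated curves.

    The scale factor is the least-squares optimum for multiplying i_calc,
    and chi is the square root of the reduced chi-square over N - 1 points.
    """
    num = sum(e * m / (s * s) for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / (s * s) for m, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(
        ((e - scale * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)
    ) / (len(i_exp) - 1)
    return math.sqrt(chi2), scale


# A model that matches the data up to a constant factor gives chi = 0.
chi, scale = chi_discrepancy([20.0, 10.0, 4.0], [10.0, 5.0, 2.0], [1.0, 1.0, 1.0])
print(f"chi = {chi:.3f}, scale = {scale:.3f}")
```

A value of χ close to 1 indicates that the model agrees with the data within the experimental errors, provided those errors are estimated realistically.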

Figure 2 Accuracy of shape representation using spherical harmonics [6]

The truncation value L defines the accuracy of the expansion. Fig. 2 shows an example of the accuracy of the shape representation for different values of L; as L→∞, the model approaches the real shape. However, L also defines the number of independent parameters, Np = (L+1)² − 6, so the larger L is, the more complex the calculation becomes. Some knowledge of the geometry can simplify the calculation, and the most effective way is to make use of symmetry: the higher the symmetry, the more coefficients can be eliminated, so that a larger L can be used and a more accurate model obtained.
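The growth of the parameter count with the truncation value can be tabulated directly from Np = (L+1)² − 6. The short sketch below does only that; the function name is illustrative.

```python
def independent_parameters(l_max: int) -> int:
    """Number of independent shape parameters, Np = (L + 1)^2 - 6."""
    return (l_max + 1) ** 2 - 6


for l_max in (2, 4, 6, 8):
    print(f"L = {l_max}: Np = {independent_parameters(l_max)}")
```

Doubling L from 4 to 8 roughly quadruples the number of parameters (19 to 75), which is why symmetry restrictions are so valuable.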

Rigid body refinement [11, 12]

One application of SAS data is to construct structural models of complex particles from known high-resolution models of the individual subunits. The method used is rigid body refinement. For a complex of two subunits A and B, the scattering intensity is

(2.26)

Ia(s) and Ib(s) are the scattering intensities of A and B. The Alm(s) are partial amplitudes of the fixed subunit A, and the Clm(s) those of subunit B rotated by the Euler angles α, β, γ and translated by a vector u. These six rotation and translation parameters are to be iteratively refined to fit the experimental data. Similarly, information on the symmetry can reduce the number of parameters and will speed up the refinement.

Ensemble optimization method (EOM) in flexible systems [13]

SAXS is considered a powerful technique for studying flexible systems, and the ensemble optimization method was developed for this purpose. For a flexible structure with N different conformations, the overall scattering intensity I(s) is given by:

$$I(s) = \frac{1}{N}\sum_{n=1}^{N} I_n(s) \tag{2.27}$$

Here, In(s) is the scattering intensity of the nth conformer. A large number of possible conformations is generated to form a pool, and a genetic algorithm (GA) selects from it sub-ensembles that fit the experimental data. The flexibility of the system is then evaluated by comparing the Rg distribution of the selected ensembles with that of the pool. If the Rg distribution of the models in the selected ensembles is as broad as that of the initial random pool, the protein is likely to be flexible; a narrow Rg peak suggests that the system may be rigid.
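Two ingredients of this analysis are simple to sketch: the averaging of conformer curves and the comparison of Rg spreads between the selected ensemble and the pool. The code below assumes equal weights for all conformers and does not implement the genetic algorithm itself; the function names and numbers are hypothetical.

```python
import statistics


def ensemble_intensity(conformer_curves):
    """Average I_n(s) over N equally weighted conformers, point by point."""
    n = len(conformer_curves)
    return [sum(values) / n for values in zip(*conformer_curves)]


def rg_spread_ratio(selected_rg, pool_rg):
    """Ratio of Rg standard deviations: selected ensemble versus random pool.

    A ratio near 1 suggests a flexible system; a ratio well below 1
    suggests a comparatively rigid one.
    """
    return statistics.pstdev(selected_rg) / statistics.pstdev(pool_rg)


# Two made-up conformer curves sampled at the same s values.
print(ensemble_intensity([[4.0, 2.0, 1.0], [2.0, 1.0, 0.5]]))

# Made-up Rg values (nm) for a pool and for a selected ensemble.
pool = [1.5, 2.0, 2.5, 3.0, 3.5]
selected = [2.4, 2.5, 2.6]
print(f"spread ratio = {rg_spread_ratio(selected, pool):.2f}")
```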

However, EOM analysis cannot be applied to polydisperse systems, because aggregation or oligomer formation would lead to misestimated conformer weights.

## Application and future prospects

Analysis of macromolecular shapes

Since the ab initio method was first introduced for SAS analysis, it has become a major tool, especially during the last few years. Several programs for shape determination, such as DAMMIN and GASBOR, are available on the Web, each with its own characteristics.

Figure 3 Scattering curves and ab initio low-resolution models of Z1Z2 and its complexes with telethonin.

The following example [14] uses the ab initio method to study the shape of a giant protein complex. The muscle protein titin used to be the largest known protein. Within it, telethonin (MM = 18 kDa) interacts with two Z-disk Ig-like domains (Z1Z2, MM = 22 kDa). The structures of these two domains had already been predicted; the open question was how they form a complex, and it was addressed with SAXS measurements and the ab initio method. The scattering patterns (Figure 3A) and the models reconstructed by DAMMIN and GASBOR (Figures 3B, 3C and 3D) are shown in Figure 3. Figure 3B shows that the models from five independent runs differ slightly, which means that an ab initio model is not unique and reflects the flexibility of the protein in solution. To reduce this uncertainty, several independent runs are preferably averaged to generate a single model (Figures 3C and 3D).

Quaternary structure of complex particles

Figure 4 Scattering curves and rigid body models of SUR2A NBD1(A), NBD2 (B) and NBD1/NBD2 (C)

Rigid body refinement is the most popular method for determining quaternary structure, because small-angle scattering can reveal domain organization without requiring a crystalline sample. A successful example [15] is the structure determination of the dimeric nucleotide binding domains (NBDs) that distinguish an ATP-binding cassette (ABC) protein, sulfonylurea receptor 2A (SUR2A). After a small-angle scattering experiment with synchrotron radiation, a shape model was obtained using the ab initio method and rigid body refinement. Because no crystallographic structures of SUR2A NBD1 or NBD2 had been solved, homology models (built with ClustalX on hemolysin B (HlyB), which shares 29% sequence identity with SUR2A) were docked into the shape model. The rigid body models of the homodimeric and heterodimeric proteins are shown in Figure 4. The resulting structure clarifies the macromolecular arrangement of the SUR2A regulatory domains of the cardiac ATP-sensitive K+ (KATP) channel.

Equilibrium systems and oligomeric mixtures

As pointed out in Section 2.4, SAS is one of the most useful techniques for studying well-defined systems such as oligomeric equilibrium mixtures of proteins. The volume fractions in mixtures of different macromolecules, or of different conformational or aggregation states of the same macromolecule, can be characterized quantitatively from the scattering curves using Eq. (2.).

Figure 5 Experimental SAXS curves (A) and models of individual subunits (B) and complex (C)

The solution structure of the bacteriophage PRD1 vertex complex is one example [16]. Bacteriophage PRD1 is a prototype of viruses with an internal membrane. The vertex complex, which mediates host cell binding and controls delivery of the double-stranded DNA, consists of monomeric P2 (MM = 66 kDa), trimeric P5₃ (MM = 103 kDa) and pentameric P31₅ (MM = 69 kDa) proteins. Models of these three components were built from the scattering curves (Figure 5A) using the ab initio method. The receptor-binding protein P2 is a long, thin monomer of 15.5 nm that is anchored to the vertex base. Protein P5₃ is a 27 nm long trimer that resembles the adenovirus Ad2 spike protein pIV. P31₅ forms a globular, pentameric base with a maximum diameter of 8.5 nm (Figure 5B). Because the complexes were always present as mixtures in solution, it was difficult to determine their shapes directly. Tentative models of these assemblies were constructed interactively and then fitted to the experimental scattering data as linear combinations of the scattering from the assembled models. The final models are shown in Figure 5C.

This result shows that, by combining SAS with other physico-chemical and biochemical methods, it is possible to characterize quantitatively the structure and composition of mixtures containing different types of particles, which is especially important for the structural analysis of complex and equilibrium systems.

Intermolecular interactions and protein crystallization

Understanding the crystallization of proteins from solution has long been a great challenge, and enormous effort has been devoted to studying it. It is still not well understood, however, because a large number of solution parameters play a crucial role in the process. Among these parameters, many researchers consider the interparticle interactions to be of great significance, and it has been shown that the osmotic second virial coefficient can act as a predictor of protein crystallization [17, 18]. As discussed in Section 2.5, small-angle scattering gives access to the virial coefficient.

The study of hen egg white lysozyme crystallization in solution by small-angle X-ray scattering is an example [19]. As the temperature was raised, the scattering intensity at low angles decreased while that at relatively high angles remained the same. The intensity at low angles reflects the structure factor, whereas that at high angles reflects the form factor, so the change at low angles indicates that the interactions change with temperature. The scattering intensity was also followed as a function of time: the intensity at low angles decreased as time passed, and after more than two hours Bragg peaks appeared, which meant that small crystals had formed.

Figure 6 Small angle scattering intensities versus temperature (A) and time (B)

Other applications

SAS is a good complement to the current high-resolution structure determination methods such as X-ray crystallography and NMR [20, 21]. Very often, flexible parts of a molecule are absent from structures determined by X-ray crystallography or NMR. These missing parts are usually disordered surface regions (loops). X-ray solution scattering offers the possibility of obtaining complementary information and adding missing loops or domains by fixing the known structure and building the unknown regions to fit the experimental scattering data obtained from the entire particle [22].

SAS can be used to study the structures not only of proteins but also of other macromolecules such as RNA [23]. A rapid coarse-grained method has been developed for calculating the SAXS profile of RNA.

The availability of synchrotron radiation allows time-resolved studies of the dynamics of macromolecules in solution. For example, the conformational changes of the highly conserved 90 kDa heat shock protein (Hsp90) during its ATPase cycle have been studied by small-angle X-ray scattering [24].

Future prospects

In the last decade, SAS has become one of the most important structure determination methods for revealing low-resolution structures in solution. Advances in instrumentation and method development have attracted more and more research groups to incorporate the technique into their research programs, and SAS is on its way to maturity.

However, as many novel and exciting biological questions arise, SAS still has a long way to go. First, more advanced computational methods are needed to analyze the data more effectively. Second, instrumentation needs to become more automated, so that automation of data collection and, in particular, of data reduction and analysis makes SAXS more accessible to non-experts. Automated sample changers and pipelines are already being developed to perform the major analysis steps rapidly without user intervention, allowing fast screening in a high-throughput mode. Last but not least, the radiation sources should be developed further. At present, a synchrotron SAXS experiment can be done on samples of a few microlitres with solute concentrations below 1 mg/ml. A further decrease in the amount and concentration of material is expected when nanometre-sized beams are used in a microfluidic environment.

Although SAS cannot reveal a complete high-resolution structure on its own, it is a good complement to other characterization methods. There is no doubt that SAS will play an increasingly important role in structural biology.