# Science Of Relating Measurements Made Biology Essay

Industrial development and production are the main application areas of chemometrics. Chemometrics has contributed methodology to the study of structure-reactivity and structure-activity relationships. Testing chemometric models on validation and independent prediction sets gives a realistic picture of their performance and often shatters illusions about it. Chemometrics is widely used in drug design and other areas of pharmaceutical R&D.

The areas in which chemometrics is most successful are (a) multivariate calibration, (b) structure-(re)activity modelling, (c) pattern recognition, classification and discriminant analysis, and (d) multivariate process modelling and monitoring.

(a) Multivariate calibration:

Multivariate calibration is preferred over the single-wavelength approach because it offers greater precision and selectivity. Although it is not perfectly accurate, it is accurate enough to solve many real problems. The response, which is a physical or chemical quantity, is modelled as a function of the measured quantities (the predictors). Multivariate calibration uses nonspecific predictors, typically physical information from spectra. In the study of biological activity, the predictors are usually computed descriptors of the molecular structure.

The function that computes the response from the predictors is obtained with chemometric tools, which can build it from many nonspecific predictors. Multivariate calibration is used for complex real matrices, where chemical treatments and separation procedures would otherwise be necessary but are time consuming and expensive.

Multivariate calibration is an important analytical tool used in many fields, such as pharmaceutical analysis, food chemistry, agriculture, and industrial, environmental and clinical chemistry. It is used in the quality control of pharmaceutical preparations and in monitoring beer production. It is also used for predicting biological activity, sensory scores and toxicity. In the chemical industry, both physical quantities of interest and chemical species can be determined by multivariate calibration.

The analytical procedure in multivariate calibration is fast and cheap. In many cases, the software is part of the instrument, and data treatment is automatic and blind.

(d) Multivariate process modelling and monitoring:

In molecular modelling, computer pictures are obtained. Multivariate process modelling methods are increasingly used in all types of manufacturing, such as pulp, paper, commodity chemicals, pharmaceuticals, food, beverages and cosmetics. The methods have also been adapted for batch processes such as wafer production in the semiconductor industry and biotechnical fermentation processes.

The different chemometric techniques are:

Principal Component Analysis (PCA): It converts a set of observations of correlated variables into a set of uncorrelated variables called eigenvectors, factors or principal components. The data matrix is decomposed as

$$X = TL$$

where $T$ holds the scores and $L$ the loadings.

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Multivariate Curve Resolution (MCR)
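As an illustration of the PCA decomposition X = TL listed above, the following minimal sketch computes scores and loadings with a singular value decomposition. It assumes NumPy is available; the function name `pca_scores_loadings` and the synthetic data are hypothetical, and the SVD is only one common way to obtain the decomposition.

```python
import numpy as np

def pca_scores_loadings(X, n_components):
    """Return scores T and loadings L so that the centered X is approximately T @ L."""
    X_centered = X - X.mean(axis=0)              # PCA works on mean-centered columns
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores: one row per observation
    L = Vt[:n_components]                        # loadings: one row per component
    return T, L

# Made-up example: 30 observations of 5 correlated variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(30, 5))

T, L = pca_scores_loadings(X, n_components=2)
score_covariance = T.T @ T                       # off-diagonal terms vanish: scores are uncorrelated
reconstruction_error = np.abs((X - X.mean(axis=0)) - T @ L).max()
```

Because the five variables are driven by two latent sources, two components reproduce the centered data almost exactly, and the two score columns are mutually uncorrelated.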

## Advantages of chemometrics:

Real-time information can be obtained from data very quickly by using chemometrics.

High-quality information can be extracted from poorly resolved data.

When applied to second-, third- or higher-order data, it gives clear information resolution and discrimination power.

A methodology for cloning sensors is provided.

Diagnostics for integrity and probability are provided to show that the derived information is accurate.

Measurement quality is improved.

The knowledge of the existing processes is improved.

It is inexpensive.

## Disadvantages of chemometrics:

With the help of a computer, complex numerical solutions can be generated that cannot be interpreted.

Complex mathematical solutions can lead to frequent misinterpretation.

A change in problem-solving approach from univariate to multivariate is required.

Best practices need to be collected and codified into useful standards.

## Importance of chemometrics in Process analytical techniques:

Process Analytical Technology involves the following steps to produce finished products of acceptable quality.

Use of raw material properties

Manufacturing parameters

Monitoring the process and

Using chemometric techniques

The generation of product quality information is a real time process.

The process is monitored by in-line testing using near-infrared, mid-IR, Raman, acoustic emission signals and other physicochemical techniques.

The physicochemical processes involved in manufacturing are multivariate, with subtle interactions between variables. Previously, manufacturing processes were treated in a univariate manner. Chemometrics is designed to handle multivariate chemical data.

Chemometrics is the application of mathematical and statistical methods to improve chemical measurement processes by extracting useful information from physical and chemical measurement data. It is used for multivariate data collection and analysis protocols, process modelling, calibration, pattern recognition and classification, signal correction and compression, and statistical process control. Standard terminology and methods are necessary for verification and validation. The PAT working group has recommended implementing chemometrics in PAT.

Successful application of chemometrics depends on scientifically valid design of experiments, proper application of preprocessing, calibration and diagnostics, and validation of predictions. The design of experiments determines the range over which the models are applicable. Applying chemometrics in PAT requires demonstrating both sound use of the mathematical and statistical methods and a basic understanding of the chemical phenomena under study. Process modelling determines the state of the process; the focus is on chemical understanding rather than on building a statistical tool.

After implementation, the model must be monitored during the process using actual real-time measurements. Along with the monitoring system, the process is controlled either automatically or manually, with real-time feedback provided to individuals who understand the physical and chemical phenomena as well as the limitations of the statistics used to determine the relationships. The result of the analysis is an improved understanding of processes and products, based on the basic science and the product attributes.

Statistics alone do not demonstrate cause and effect. They are useful for exploring the underlying causes of variation, for improving computational approaches to understanding the data, and for assessing the fundamental information about the process, and so they help in understanding the basic science involved. Quality data obtained from designed experiments and an understanding of the physical and chemical phenomena are essential. The limitations of statistical methods must be considered for successful long-term implementation of chemometric models for understanding and efficient control of manufacturing processes.

The advantages of chemometrics are as follows: real-time information can be obtained from data very quickly; high-quality information can be extracted from poorly resolved data; and clear information resolution and discrimination power can be obtained when it is applied to second-, third- and higher-order data. It also provides a methodology for cloning sensors, which makes one sensor take data as another sensor would. Other benefits are diagnostics for the probability and integrity of the information obtained from sensor data, improved measurement quality and improved knowledge of the process. The economic benefits include low capital requirements; safer plant and process operations, because real-time monitoring prevents dangerous process upsets; assurance that the plant environment and processes comply with environmental regulations; and increased plant operability through timely process adjustments based on real-time data. Further benefits include improved product quality by maintaining very tight control limits, less waste through process optimization, lower production cost through tight target limits and more accurate production scheduling, and optimized production capacity resulting from increased process operability and continuous verification of product quality.

Real-time chemometric measurements remove the greatest obstacles to 100% compliance and give analytical accuracy for the measurement of a process. If PAT is managed well, it offers opportunities for new intellectual property. Making the best use of chemometrics requires a paradigm shift: the process model should be understood as close to reality as possible, and one should not hold rigidly to thermodynamic models. Best practices must be collected and codified into standards. The ideal states should be checked against reality using real-time data from inexpensive measurements, followed by chemometric analysis.

The [rewards/(risk + cost)] ratio should be a very large number. Chemometrics satisfies this requirement by providing expertise. It can be applied at minimal cost because data analysis techniques require only small investments, and it helps in understanding and improving a process. Because real-time information flows continuously, risk can be minimized.

The benefits of chemometrics in PAT are as follows:

Chemometrics makes plant and process operations safer through real-time monitoring, which prevents potentially dangerous process upsets.

Chemometrics increases process plant operability through timely process adjustments based on real-time data.

Chemometrics gives assurance that plant environments and processes comply with environmental regulations.

Chemometrics improves product quality by maintaining the control limits.

It minimizes waste products through process optimization.

Continuous product quality verification and increased process operability optimize production capacity.

Chemometrics provides information and technology for real-time feedback, learning and control.

It also provides analytical accuracy for the process.

## One of the chemometric techniques which is used in the present experiment is Partial Least Squares (PLS):

Wold introduced Partial Least Squares (PLS). The method originated in the social sciences but became popular in chemometrics. PLS is a technique that generalises and combines features of principal component analysis and multiple regression. It is particularly useful for predicting a set of dependent variables from a large set of independent variables.

The main goal of the method is to predict Y from X and to describe their common structure. When Y is a vector and X is full rank, this goal can be reached using ordinary multiple regression. When the number of predictors is larger than the number of observations, however, the regression approach is no longer feasible because of multicollinearity. This problem can be solved by exploiting the orthogonality of principal components: the PLS method obtains orthogonal linear combinations of the predictors (known as factors) from the predictor data.
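The multicollinearity problem can be made concrete with a short numerical check. The sketch below assumes NumPy; the matrix sizes and random data are made up for illustration. It shows that when there are more predictors than observations, the matrix that ordinary least squares would need to invert is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_predictors = 10, 200    # few samples, many spectral channels (made-up sizes)
X = rng.normal(size=(n_samples, n_predictors))

gram = X.T @ X                       # the matrix ordinary least squares needs to invert
rank = np.linalg.matrix_rank(gram)   # cannot exceed n_samples, so gram is singular
```

Since the rank of the 200 by 200 matrix cannot exceed the 10 samples, no unique regression solution exists, which is why a small number of orthogonal factors is used instead.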

The steps involved in a PLS analysis are as follows:

Calculating a PLS model with a high number of factors.

Determining the number of factors, either by analysing the information obtained during the process or by calculating prediction accuracy.

Fitting the model by calculating the parameter estimates with the determined number of factors.

To fit a PLS model, a set of predictors and responses is used. A suitable number of factors is used to calculate the parameter estimates and to estimate response values for new predictor data.

In PLS, the spectral data and the concentration data are decomposed simultaneously. After each factor is calculated, it is removed from the data, and the next factor is calculated from the newly reduced data. In this way the desired number of factors is calculated.
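The factor-by-factor procedure and the choice of the number of factors by prediction accuracy can be sketched as follows. This is a minimal single-response (PLS1) illustration in the NIPALS style, assuming NumPy; the function names, the synthetic data and the use of leave-one-out cross-validation to compare factor counts are assumptions for illustration, not the implementation used in the study described later.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # residual predictors and residual response
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f                          # weight: direction of maximum covariance with f
        w = w / np.linalg.norm(w)
        t = E @ w                            # scores of the current factor
        p = E.T @ t / (t @ t)                # predictor loading
        c = f @ t / (t @ t)                  # response loading
        E = E - np.outer(t, p)               # remove the factor from the predictors
        f = f - c * t                        # and from the response
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coefficients = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coefficients

def pls1_predict(model, X_new):
    x_mean, y_mean, coefficients = model
    return y_mean + (X_new - x_mean) @ coefficients

def loo_rmsecv(X, y, n_factors):
    """Root mean squared error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_factors)
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Made-up data: 20 "spectra" of 50 channels driven by two latent sources
rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 2))
loadings = rng.normal(size=(2, 50)) * np.array([[3.0], [1.0]])
X = scores @ loadings + 0.01 * rng.normal(size=(20, 50))
y = scores @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=20)

rmsecv_by_factors = {k: loo_rmsecv(X, y, k) for k in (1, 2, 3)}
best_n_factors = min(rmsecv_by_factors, key=rmsecv_by_factors.get)
```

In this synthetic case the cross-validated error drops sharply from one factor to two, because the response depends on both latent sources; further factors mostly fit noise.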

In PLS, the vectors are directly related to constituents. The left column represents the spectra of the "pure" constituents used to construct the data set. The center column represents the first PLS-1 vector for each constituent calculated from the data set. The right column shows the first two PCA vectors for the same data.

Advantages of PLS are as follows:

Useful for very complex mixtures.

Single step decomposition and regression.

Useful in predicting samples with constituents (contaminants).

Disadvantages of PLS are as follows:

Calculations are slower.

Difficult to understand and interpret the models.

A large number of samples are required for accurate calibration.

## Quantitative analysis of paracetamol polymorphs in powder mixtures by FT-Raman spectroscopy and PLS regression:

Polymorphism is a phenomenon in which the same compound can exhibit two or more forms with different crystal structures. It is widely observed in pharmaceutical compounds. The various polymorphs of a compound can exhibit different physicochemical properties such as solubility, melting point, dissolution rate, bioavailability, chemical reactivity and resistance to degradation. Changes in these properties can change the therapeutic effect and processability of a drug. It is therefore important to establish reliable methods for characterizing the solid-state forms of pharmaceutical products.

Paracetamol is a drug widely used as an analgesic and antipyretic. It has two polymorphic forms, monoclinic (form I) and orthorhombic (form II). The metastable orthorhombic polymorph, form II, is of interest for tablet manufacturing because the well-defined slip planes in its crystal lattice make it suitable for direct compression. Depending on harvesting time and drying conditions, form II is usually contaminated with crystals of the monoclinic polymorph (form I), so quantitative analysis of forms I and II in crystalline powders is needed.

FT-Raman spectroscopy is used for the quantitative analysis of paracetamol. The biggest hurdle in Raman spectroscopy is the interpretation of the data. To overcome this, a chemometric technique, Partial Least Squares, is used.

The aim of the experiment is to carry out quantitative analysis of paracetamol form I and form II by applying Raman spectroscopy and Partial Least Squares (PLS), a chemometric technique.

## Experiment:

Compounds: Paracetamol (form I) was obtained from Apoka (Apoka Pharma Produktions und Handelgesellschaft m.b.H., Austria). Pure orthorhombic paracetamol (form II) was prepared by melt crystallization. The crystal form and purity of both polymorphs (I and II) were checked by powder X-ray diffraction.

## Powder X-ray diffraction

Powder X-ray diffraction was carried out using a Siemens D-5000 diffractometer (Siemens AG, Karlsruhe, Germany), equipped with a theta/theta goniometer, a Cu Kα radiation source, a Goebel mirror (Bruker AXS, Karlsruhe, Germany), a 0.15° Soller slit collimator and a scintillation counter. The powder samples were scanned over the angular range 2–40° at a scan rate of 0.005° 2θ/s, with a tube voltage of 40 kV and a tube current of 35 mA.

Raman Spectroscopy:

Raman spectroscopy is an analytical technique that is extensively used in the pharmaceutical industry. Raman spectrometers employ one of two techniques: dispersive Raman or Fourier transform Raman.

Fourier Transform Raman spectroscopy:

http://www1.chm.colostate.edu/Files/FTIR-Raman/FTIR-Raman.pdf

An FT-Raman instrument consists of

a laser for excitation of the sample,

one or more filters to block the Rayleigh scattering,

an interferometer,

a sensitive detector,

the capability to perform a fast Fourier transform on the interferogram.

The advantages of this instrument are

Wavelengths of light are simultaneously detected.

Because of longer wavelength, no loss in scattering efficiency.

Wavenumber values are accurate in a spectrum.

## FT-Raman spectroscopy

FT-Raman spectra were recorded on a Bruker RFS 100 FT-Raman spectrometer equipped with a diode-pumped Nd:YAG laser (1064 nm) as the excitation source and a liquid-nitrogen-cooled, high-sensitivity Ge detector (Bruker Optik GmbH, Ettlingen, Germany). A small amount (a few milligrams) of each sample was placed in a small aluminum sample cup and packed lightly. For each spectrum, 64 scans were performed at a resolution of 4 cm−1 over the range 0–4000 cm−1. A Blackman-Harris 4-term function was used as the apodization function. The spectral data obtained from each sample were saved in electronic format. The Know-It-All Informatics System v.5.0 (Bio-Rad Laboratories, Inc.) was used for analysis and for peak attribution.

## Preparation of sample mixtures:

Eighteen binary mixtures were prepared by geometrically mixing pure polymorphs I and II. The mixing scheme emphasizes low concentrations of form I while still covering the entire concentration range. Together with the two pure polymorphs, this gave 20 samples. The concentration of form I in the samples was: 100, 97.75, 95.5, 91, 82, 73, 64, 48, 32, 24, 16, 12, 8, 6, 4, 3, 2, 1.5, 1 and 0% (w/w).

## Multivariate calibration

Three preprocessing algorithms, namely orthogonal signal correction (OSC), standard normal variate transformation (SNV) and multiplicative scatter correction (MSC), were used to eliminate sources of non-linearity or to remove features uncorrelated with the concentration of the analyte. The complete spectral range, 0–4000 cm−1, was used. The OSC algorithm calculates the parts of the spectrum that are uncorrelated with the concentration of the analyte and removes them. The OSC-preprocessed spectra are then subjected to mean-centering (subtraction of the average spectrum from each spectrum). The SNV transformation centers each spectrum separately by subtracting its mean and then scales it by its own standard deviation. MSC eliminates light scattering or path length effects for each sample by shifting and rotating each spectrum so that it fits the average spectrum of the dataset as closely as possible. The algorithm operates on a segment that acts as the baseline of the spectra, and the fit to the average spectrum is performed by least squares.
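The SNV and MSC transformations described above are simple enough to sketch directly. The following is an illustrative NumPy version, not the Simca-P implementation; the function names and the synthetic spectra are hypothetical, and for simplicity the MSC fit here uses the whole spectrum rather than a baseline segment.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra):
    """Multiplicative scatter correction against the average spectrum."""
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, spectrum in enumerate(spectra):
        # Least-squares line: spectrum ~ intercept + slope * reference
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope   # undo the shift and the rotation
    return corrected

# Made-up example: one band seen with three different gains and offsets
wavenumbers = np.linspace(0.0, 4000.0, 200)
base = np.exp(-((wavenumbers - 1600.0) / 150.0) ** 2)
gains = np.array([[0.5], [1.0], [2.0]])
offsets = np.array([[0.1], [-0.3], [0.7]])
spectra = gains * base + offsets

snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
```

For spectra that differ only by a gain and an offset, both transformations map all rows onto a common shape, which is consistent with the two methods giving similar results in practice.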

Cross-validation by the leave-one-out method was applied to evaluate the models. In addition, the data were split into homogeneous training and test subsets of 10 samples each, using the Kennard-Stone design. This algorithm starts from the point closest to the average spectrum of the data set and adds subsequent points on the basis of the maximum squared distance to the already selected points. This ensures that the selected data points are uniformly distributed within the original data set. The training set selected contained the 100, 82, 73, 64, 48, 24, 8, 4, 3 and 0% (w/w) mixtures. The rest of the mixtures (97.75, 95.5, 91, 32, 16, 12, 6, 2, 1.5 and 1%, w/w) were assigned to the test set.
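A Kennard-Stone style selection can be sketched in a few lines. The version below assumes NumPy and the common "maximin" reading of the rule, in which each new point is the one whose squared distance to its nearest already selected point is largest; the function name and the toy one-dimensional data are hypothetical, and the sketch is not the ChemoAC routine used in the study.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X chosen by a Kennard-Stone style rule."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Start from the point closest to the average of the data set
    start = int(np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1)))
    selected = [start]
    nearest = sq_dist[start].copy()          # squared distance to the closest selected point
    for _ in range(n_select - 1):
        nearest[selected] = -np.inf          # never pick a point twice
        next_point = int(np.argmax(nearest))
        selected.append(next_point)
        nearest = np.minimum(nearest, sq_dist[next_point])
    return selected

# Toy data: five samples described by a single variable
points = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
training_idx = kennard_stone(points, 3)      # [3, 4, 0]
test_idx = [i for i in range(len(points)) if i not in training_idx]
```

In the toy example the point nearest the mean (3.0) is chosen first, followed by the two extremes (10.0, then 0.0), leaving the interior points for the test subset.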

All calculations were performed on a PC. Simca-P v.9 (Umetrics AB) was used for the OSC, SNV and MSC transformations and for fitting the PLS models. The Kennard-Stone routine of the ChemoAC toolbox for Matlab (Matlab 6.5, Mathworks Inc.) was used to separate the data into uniform training and test subsets. The predictive performance was assessed by the root mean squared error of cross-validation (RMSECV), of calibration (RMSEC) and of prediction (RMSEP), calculated by the formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$$

where $y_i$ is the observed and $\hat{y}_i$ the predicted concentration of form I in sample $i$, and $n$ is the number of samples.

The experimental data below confirm the polymorphic purity of the samples used to prepare the mixtures. In the PXRD pattern of form II, the intensity of the 24.03° 2θ reflection was unusually high, which is a strong indication of preferred orientation of the crystalline particles. This is also evident from the relative intensities listed in Table 1: all other relative intensities in the melt-crystallized sample are much lower than in the solution-crystallized orthorhombic form. The 24.03° reflection corresponds to the 0 0 2 Miller plane, a well-defined slip plane in the form II structure. This slip plane is responsible for the reduced elasticity of this polymorph. During sample preparation, grinding of the melt-grown crystalline material may cause extensive fracture along the 0 0 2 plane and a consequent increase of the corresponding reflection intensity in the diffractograms. The two monoclinic samples also differ in the relative intensities of the 0 2 2 and 1 1 1 reflections. In the literature data, the highest-intensity reflection is that of the 0 2 2 plane (Table 1), whereas in the commercial product used in this study it is that of the 1 1 1 plane. This observation is consistent with preferred orientation effects. This is one reason why spectroscopic methods, which are not sensitive to particle orientation, are advantageous over PXRD-based methods for the quantitative analysis of paracetamol polymorphs.

Figure 1 represents the powder X-ray diffractograms of paracetamol forms I and II.

Kyriakos Kachrimanis, Doris E. Braun & Ulrich J. Griesser ( January 2007), "Quantitative analysis of paracetamol polymorphs in powder mixtures by FT-Raman spectroscopy and PLS regression", Journal of Pharmaceutical and Biomedical Analysis, vol. 43, no. 2, pp. 407-412.

Table 1 represents the PXRD h k l planes, corresponding peak positions and relative intensities of the monoclinic (form I) and orthorhombic (form II) polymorphs of paracetamol reported in the literature [4] and of the starting materials, i.e. commercial form I and melt-grown form II.

Monoclinic form (I):

| h k l | 2θ (°), Nichols and Frampton [4] | I/Imax (%), Nichols and Frampton [4] | 2θ (°), commercial | I/Imax (%), commercial |
| --- | --- | --- | --- | --- |
| 0 1 1 | 12.11 | 26 | 12.08 | 26 |
| 1 0 −1 | 13.83 | 18 | 13.78 | 12 |
| 0 0 2 | 15.24 | 3 | 14.99 | 9 |
| 1 0 1 | 15.51 | 72 | 15.51 | 47 |
| 1 1 0 | 15.70 | 4 | 16.40 | 3 |
| 1 1 −1 | 16.73 | 11 | 16.71 | 5 |
| 1 1 1 | 18.18 | 68 | 18.17 | 100 |
| 0 2 0 | 18.91 | 13 | 18.84 | 6 |
| 0 2 1 | 20.38 | 39 | 20.35 | 21 |
| 1 1 −2 | 20.76 | 7 | 20.73 | 5 |
| 1 1 2 | 23.09 | 9 | 23.09 | 16 |
| 1 2 −1 | 23.48 | 62 | 23.46 | 47 |
| 0 2 2 | 24.37 | 100 | 24.35 | 93 |
| 1 0 −3 | 24.74 | 5 | – | – |
| 1 2 −2 | 26.55 | 62 | 26.53 | 51 |
| 2 1 −1 | 27.17 | 11 | 27.16 | 14 |
| 2 0 −2 | 27.85 | 4 | 27.86 | 3 |
| 1 2 2 | 28.40 | 2 | – | – |
| 2 1 1 | 29.01 | 7 | 29.00 | 4 |
| 1 1 3 | 29.27 | 6 | 29.28 | 27 |
| 0 2 3 | 29.89 | 3 | – | – |
| 1 2 −3 | 31.27 | 5 | 31.31 | 3 |
| 2 2 −1 | 31.88 | 3 | – | – |
| 0 3 2/1 3 1 | 32.58 | 17 | 32.48 | 10 |
| – | – | – | 32.789 | 32 |
| 1 3 −2 | 34.17 | 2 | 34.19 | 2 |
| 1 3 2 | 35.71 | 3 | – | – |
| 1 1 4 | 36.19 | 10 | 36.19 | 7 |
| 0 3 3 | 36.90 | 18 | 36.90 | 11 |
| 2 0 −4/2 2 −3 | 37.47 | 7 | 37.45 | 2 |
| 2 1 3 | 37.90 | 4 | – | – |
| – | – | – | 38.48 | 9 |

Orthorhombic form (II):

| h k l | 2θ (°), Nichols and Frampton [4] | I/Imax (%), Nichols and Frampton [4] | 2θ (°), melt-grown |
| --- | --- | --- | --- |
| 2 0 0 | 10.32 | 4 | 10.29 |
| 2 1 0 | 12.76 | 5 | 12.73 |
| 0 2 0 | 14.99 | 14 | 14.96 |
| 2 1 1 | 17.51 | 26 | 17.49 |
| 2 2 0 | 18.23 | 22 | 18.21 |
| 0 2 1 | 19.20 | 49 | 19.19 |
| 1 2 1 | 19.88 | 3 | – |
| 4 0 0 | 20.72 | 9 | 20.70 |
| 2 2 1 | 21.84 | 22 | 21.82 |
| 0 0 2 | 24.04 | 100 | 24.03 |
| 1 0 2 | 24.60 | 14 | 24.58 |
| 2 3 0 | 24.86 | 9 | – |
| 4 2 0/1 1 2 | 25.70 | 13 | 25.71 |
| 1 3 1/2 0 2 | 26.18 | 5 | 26.18 |
| – | – | – | 27.29 |
| 2 3 1 | 27.66 | 12 | 27.65 |
| 0 2 2 | 28.41 | 3 | – |
| 1 2 2 | 28.93 | 21 | 28.90 |
| 3 1 2 | 29.70 | 5 | 29.70 |
| 2 2 2 | 30.33 | 27 | 30.31 |
| 6 0 0 | 31.28 | 5 | 31.31 |
| 2 4 0 | 32.03 | 7 | 32.08 |
| 3 2 2 | 32.57 | 6 | 32.55 |
| 1 4 1/4 3 1 | 33.08 | 4 | – |
| 1 3 2 | 33.64 | 2 | – |
| 6 1 1 | 34.46 | 5 | – |
| 2 3 2 | 34.84 | 7 | 34.84 |
| 4 2 2 | 35.45 | 3 | – |
| 5 1 2 | 36.49 | 3 | – |
| 4 4 0/6 2 1 | 36.96 | 17 | 36.95 |
| 6 3 0 | 38.89 | 7 | 38.73 |
| – | – | – | 39.54 |

Figure 2 shows the FT-Raman spectra of pure polymorphs I and II together with the difference spectrum (form II − form I). Figure 3 shows the FT-Raman spectra of pure form II after preprocessing by the OSC, SNV and MSC algorithms. In Figure 2, the most distinct difference lies in the lattice vibration region, where the orthorhombic polymorph has a very intense band at 122 cm−1 that is absent from the spectrum of form I. Another difference is found in the region between 1540 and 1680 cm−1, where the observed Raman bands are attributed to vibrations of the amide carbonyl group and the aromatic hydrogens. The stretching vibrations of the aromatic CO and CN appear in the region around 1200 cm−1. There is also a difference in the 450–470 cm−1 region, which was exploited for the development of a univariate calibration method. Regarding the effect of preprocessing, Figure 3 shows that the OSC algorithm distorts the spectra, whereas the SNV and MSC algorithms produce results that are similar to each other and show less pronounced differences from the original spectrum, related to the position and scale of the spectrum.

Figure 2 represents the FT-Raman spectra of paracetamol forms I and II and the corresponding difference spectrum (form I - form II).

Figure 3 represents the FT-Raman spectra of the orthorhombic form II after preprocessing by the OSC, SNV and MSC algorithms.

Table 2 lists the observed versus predicted concentrations of form I. Table 3 summarizes the root mean squared error of cross-validation in the complete data set, of calibration in the training subset and of prediction in the test subset, for the PLS models fitted to the OSC, SNV and MSC preprocessed spectra. Plots of observed versus predicted form I concentrations showed that the PLS model fitted by leave-one-out cross-validation on the complete data preprocessed by the OSC algorithm performs better than the rest over the whole concentration range, even though this model contains a single latent variable. With SNV and MSC preprocessing, the PLS models failed, especially in the low form I concentration range, even though they contain three PLS components; this agrees with the corresponding results for the separate test subset. The RMSECV of the model trained on OSC preprocessed data (0.500%) is lower than that of the models trained on the SNV (2.394%) and MSC (2.764%) preprocessed data. Although leave-one-out cross-validation is suitable for small datasets, its error estimates are considered optimistic. Therefore, to minimize the risk of over-fitting, the performance of the PLS models was also validated using a separate test subset with the same number of data points as the training subset. Table 3 shows that the PLS model fitted to the OSC preprocessed data performs much better, with lower RMSEC (0.842%) and RMSEP (0.538%) and with linear regression parameters of observed versus predicted concentration of form I close to the ideal values. The RMSEP is of the same magnitude as the RMSECV and slightly lower than the RMSEC, which indicates that the model is not over-fitted. SNV and MSC preprocessing lead to models of comparable predictive performance, in line with reports that these methods give fairly equivalent results.

Table 2 represents the concentration of paracetamol form I (%, w/w) predicted by PLS regression after preprocessing by each algorithm.

| Observed | Predicted, OSC (1) | Predicted, SNV (3) |
| --- | --- | --- |
| 1 | 0.85 | 0.07 |
| 1.5 | 1.47 | 3.37 |
| 2 | 1.85 | 1.59 |
| 6 | 5.95 | 6.91 |
| 12 | 11.74 | 7.60 |
| 16 | 15.50 | 12.26 |
| 32 | 31.82 | 29.20 |
| 91 | 90.47 | 90.21 |
| 95.5 | 95.61 | 79.29 |
| 97.75 | 96.26 | 83.42 |

Table 3 represents the root mean squared error of cross-validation (RMSECV) for the data set, of calibration (RMSEC) for the training subset and of prediction (RMSEP) for the test subset, and the linear regression parameters of observed versus predicted concentrations of form I in the test subset.

| Preprocessing algorithm | OSC (1) | SNV (3) |
| --- | --- | --- |
| Root mean squared error | | |
| RMSECV (%) | 0.500 | 2.398 |
| RMSEC (%) | 0.842 | 0.911 |
| RMSEP (%) | 0.538 | 7.177 |
| Regression coefficients | | |
| R | 0.9999 | 0.9940 |
| Slope ± S.E. | 1.0053 ± 0.0033 | 1.1122 ± 0.0432 |
| Intercept ± S.E. | 0.1332 ± 0.1776 | 0.5610 ± 2.0518 |
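The RMSEP values in Table 3 can be cross-checked against the test-set predictions in Table 2. The short sketch below uses only the Python standard library and assumes the usual root-mean-square definition with division by the number of samples; the variable and function names are illustrative.

```python
import math

# Observed and predicted form I concentrations (%, w/w) copied from Table 2
observed = [1, 1.5, 2, 6, 12, 16, 32, 91, 95.5, 97.75]
predicted_osc = [0.85, 1.47, 1.85, 5.95, 11.74, 15.50, 31.82, 90.47, 95.61, 96.26]
predicted_snv = [0.07, 3.37, 1.59, 6.91, 7.60, 12.26, 29.20, 90.21, 79.29, 83.42]

def rmse(y_obs, y_pred):
    """Root mean squared error, dividing by the number of samples."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

rmsep_osc = rmse(observed, predicted_osc)   # about 0.54, close to the 0.538 reported in Table 3
rmsep_snv = rmse(observed, predicted_snv)   # about 7.18, close to the 7.177 reported in Table 3
```

The small discrepancies in the third decimal place are what one would expect from the predictions in Table 2 being rounded to two decimals.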

## Conclusion:

In this experiment, quantitative analysis of paracetamol polymorphs I and II was carried out using an FT-Raman method developed by applying PLS regression to the spectra. Excellent performance was obtained when the spectra were preprocessed by the OSC algorithm. Although the emphasis was on lower concentrations, excellent performance was also obtained at high concentrations of form I.
