Chemometrics And Its Application In Pharmaceutical Prospects Biology Essay


Process analytical technologies (PAT) are used for in-process control and encompass weighing, temperature control, near-infrared spectroscopy (NIRS), Raman spectroscopy, acoustic measurements, imaging and similar tools. The FDA framework defines PAT as a system for designing, analysing and controlling manufacturing processes through timely (i.e. in-process) measurements of critical quality parameters and performance attributes of raw materials and in-process products, in order to assure acceptable end-product quality at the completion of the process. The PAT framework also states that quality should be built into a product by design; it cannot be tested into a product. PAT measurements can be made on-line, at-line or in-line: on-line means that the sample is diverted from the process, at-line means that the sample is removed from the process and analysed close to it, and in-line means that the measurement is taken directly in the process stream.

Chemometrics is designed to handle large amounts of data, and it has been defined in several ways. Kowalski (1980) defined chemometrics as the "application of mathematical and statistical methods to chemical measurements". Delaney (1984) added that raw data are converted into information, information into knowledge and knowledge finally into intelligence, a chain made possible by the development of computer software. Lavine (1998) stated that "Chemometrics is an approach to analytical and measurement science based on the idea of indirect observation. Measurements related to the chemical composition of a substance are taken, and the value of a property of interest is inferred from them through some mathematical relation." So, once a product has been processed and measured and the data have been collected, chemometrics is used to extract information and to gain real knowledge about the product. The combination of measurement hardware and chemometrics makes up process analytical technology.

Importance of Chemometrics in PAT environment:

The role of chemometrics in process analytical technologies is broad: it provides comprehensive information and knowledge about the chemical system. Chemometrics uses mathematical and statistical tools to gain knowledge from chemical data. An understanding of both chemistry and statistics is therefore necessary, because chemometrics combines the two for analytical purposes. Chemometrics provides several advantages, which include:

speed in obtaining real information from the data

extraction of high-quality information from less-resolved data

accuracy of the information derived through integration and probabilistic treatment of the data

consistency of data collection from one sensor to another

clear information resolved from data of any order

Figure : PAT for continuous process control and improvements (Workman, 2005)

Analytical measurements have to pass information on to other systems for PAT to be practical. Raw data from sensors are processed into information that is used to control the process (see Figure 1). As the flow chart shows, a new or existing process is measured for one or more key parameters by one or more real-time sensors. A computational system collects the raw data from the sensors and turns them into useful process parameter information; for example, different graphical data are converted to physical or chemical properties. From the processed information, deterministic process models are produced that define the mathematical relationship between defined control factors (e.g. solvent concentration, mixing rate, drying temperature, pressure) and the analytical information. The derived and validated process model is then included in a process information management system (PIMS) as part of a broader informatics control scheme, and it is used to control the process throughout production. A sensor network is installed to measure the process, so the analysed sensor data provide the information inputs for controlling the manufacturing process in real time.

According to Workman (2005), the successful application of chemometrics involves systematic design of experiments (DOE), proper application of preprocessing, calibration and diagnostics, and rigorous validation of predictions. The intended chemometric application also determines the initial extent of the DOE. The PAT environment benefits greatly from chemometrics, which is what makes chemometrics so important in PAT. The following benefits of combining chemometrics with PAT are described by Workman (2005) in the FDA context.

Monitoring: Combined with PAT, chemometrics supports safer plant and process operation through real-time monitoring and the avoidance of potentially hazardous process upsets.

Compliance: It helps assure that the process and plant environments comply with environmental regulations.

Modelling: Plant operability increases through timely adjustments based on real-time data. Modelling also identifies and derives the state of the process.

Control: Product quality is improved by maintaining tighter control limits. Control means actively influencing the process to maintain or reach a required state. Product quality is raised by following stricter production guidelines derived from the design of experiments. DOE is one of the important factors for successful PAT, and chemometrics is essential for control within PAT.

Minimization of Waste: Process optimization minimizes process waste. Conventional methods generate some waste through in-process testing, whereas chemometrics reduces it by using design of experiments (DOE) together with on-line, at-line and in-line techniques.

Cost Minimization: A more accurate production schedule and tighter target limits reduce production cost. With design of experiments, the production schedule is strictly maintained and all testing follows the PAT approach, which ultimately reduces the cost of production and processing.

Eliminate sampling error: The aim is to eliminate sampling error by achieving 100% compliance and analytical accuracy, although attaining 100% compliance is one of the greatest challenges. Sampling error is one of the errors that occur in conventional systems; merging PAT with chemometrics reduces it and provides accurate sampling and precise analysis.

Spectroscopic data inherently consist of heavily overlapping signals from various chemical compounds, together with a great deal of related information. These characteristics and this information can be difficult to extract with traditional, and often too simple, univariate procedures. Pattern recognition techniques have been used to handle such data for decades. More recently, new methods dedicated specifically to data obtained from IR, NIR, ATR and Raman spectroscopy have been developed.

With typical chemometric methods such as Principal Component Analysis (PCA) and multivariate calibration methods such as Partial Least Squares (PLS), useful multivariate patterns can be extracted from the data. Relevant chemical information can be deduced from these patterns, provided the model factors are used correctly. Eliminating outliers from very small data sets, interpreting the model, choosing significant wavelength regions, placing too much trust in poorly validated data and classifying noisy data are just a few examples of things that can cause trouble for the user of chemometrics.
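As a minimal sketch of how PCA extracts a multivariate pattern, the following pure-Python example finds the first principal component of a small, invented set of "spectra" by power iteration. The data, the function names and the choice of power iteration are illustrative assumptions and are not taken from the cited works; a real application would rely on a tested numerical library.

```python
import math


def mean_center(X):
    """Subtract each column mean from the rows of X (a list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def first_principal_component(X, n_iter=500, tol=1e-12):
    """Return (loading, scores) of the first PC by power iteration on X^T X."""
    Xc = mean_center(X)
    n, p = len(Xc), len(Xc[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(n_iter):
        t = [sum(row[j] * v[j] for j in range(p)) for row in Xc]       # t = Xc v
        w = [sum(Xc[i][j] * t[i] for i in range(n)) for j in range(p)]  # w = Xc^T t
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # start vector carries no variance; give up
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
    return v, scores


# Hypothetical data: four samples measured at three wavelengths, where each
# spectrum is a multiple (1x to 4x) of the same pure spectrum [1, 2, 3].
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
loading, scores = first_principal_component(spectra)
print("loading:", loading)
print("scores:", scores)
```

Because every invented spectrum is a multiple of one pure spectrum, the loading should be proportional to that pure spectrum and the scores should order the samples by concentration, which is the kind of pattern the paragraph above refers to.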

The above benefits are obtained by combining chemometrics with process analytical technology (PAT).

Techniques involved in Chemometrics:

Most applications of chemometric techniques in the PAT environment fall within the general framework of pattern recognition, in which classification is based on different measurements. Marini (2008) described classification in terms of discriminating among similar and dissimilar groups.

Pure classification techniques focus mainly on discriminating among dissimilar groups. These are:

Linear and Quadratic Discriminant Analysis (LDA & QDA)

K-Nearest Neighbors (KNN)

Partial Least Squares-Discriminant Analysis (PLS-DA)

Back-Propagation and Counter-propagation Artificial Neural Networks (BP- & CP-ANN)

Support Vector Machines


Class-modeling techniques represent a different approach to pattern recognition. This classification is based on modeling the analogies among the elements of a class rather than on discriminating among the different categories. The most commonly used chemometric class-modeling techniques are



Other useful chemometric techniques used in the PAT environment, i.e. in pharmaceutical or chemical measurement, are:

Principal Component Analysis (PCA)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Multivariate Curve Resolution (MCR)

Neural networks.

Partial Least Squares (PLS):

The calibration model (i.e. the correlation with standards) known as partial least squares regression is a technique that was produced, developed and popularised in the analytical sciences (Adams, 2004). It is a quantitative spectral decomposition method that is closely related to principal component regression (PCR). The difference between PCR and PLS is that PLS includes the dependent variable in the data compression and decomposition operations. PCR works differently: it first decomposes the spectral matrix into a set of eigenvectors and scores, and regressing these against the concentrations is a separate step. PLS, by contrast, uses the concentration information during the decomposition itself. As a result, spectra containing higher constituent concentrations are weighted more heavily than those with low concentrations, so the scores and eigenvectors determined by PLS differ from those of PCR. The principle of PLS is to capture as much concentration information as possible in the first few loading vectors.

In the PLS technique the regressions are calculated with least squares algorithms. The aim of PLS is to establish a linear relationship between two matrices, the spectral data X and the reference values Y. The method models both X and Y in order to find the variables in the X matrix that best describe the Y matrix. This can be pictured by representing the spectra in the space of wavelengths and looking for directions, called factors, that are linear combinations of wavelengths and that best describe the studied property (Roggo et al., 2007).

PLS is similar to PCA and PCR. In PLS, however, the concentration data of the constituents are included in the decomposition process. The scores obtained from the spectral data and from the concentration data are exchanged as each new factor is added to the model.

Figure : Schematic of PLS (Sammon, 2011)

Unlike PCR, PLS is a one-step process: there is no separate regression step. Instead, PLS decomposes the spectral and the concentration data simultaneously. As each new factor is calculated for the model, the scores are exchanged and the contribution of that factor is then removed from the raw data. The reduced data matrices are used to calculate the next factor, and the process is repeated until the desired number of factors has been calculated. This step is what makes PLS more complicated than PCR. The main benefit of PLS is that the resulting spectral vectors are directly related to the constituents of interest. This is quite unlike PCR, where the vectors merely represent the most common spectral variations in the data and ignore their relationship to the constituents of interest until the final regression step.
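The factor-by-factor procedure described above can be sketched in a few lines. The example below is a minimal PLS-1 implementation in the NIPALS style, written in pure Python; the essay does not say which algorithm the cited works use, so the algorithm choice, the function names and the noise-free two-constituent data are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_factors):
    """Fit PLS-1 for one constituent; X is a list of spectra, y the concentrations."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # spectral residuals
    f = [v - y_mean for v in y]                                # concentration residuals
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in spectral space most covariant with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt              # regression of concentration on the scores
        # Remove this factor's contribution before computing the next one.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "factors": factors}


def pls1_predict(model, x):
    """Predict the concentration for one spectrum by replaying the factors."""
    e = [xj - m for xj, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in model["factors"]:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Hypothetical pure spectra of two constituents at four wavelengths.
pure_1 = [1.0, 0.5, 0.1, 0.0]
pure_2 = [0.0, 0.2, 0.8, 1.0]
concentrations = [(1.0, 3.0), (2.0, 1.0), (3.0, 2.0), (4.0, 4.0), (5.0, 1.0)]
X = [[c1 * a + c2 * b for a, b in zip(pure_1, pure_2)] for c1, c2 in concentrations]
y = [c1 for c1, _ in concentrations]  # calibrate for constituent 1 only

model = pls1_fit(X, y, n_factors=2)
unknown = [2.5 * a + 2.0 * b for a, b in zip(pure_1, pure_2)]
predicted = pls1_predict(model, unknown)
print("predicted concentration of constituent 1:", predicted)
```

Since the invented spectra are noise-free mixtures of two constituents, two factors are enough and the prediction for the unknown should come out close to its true value of 2.5. Real spectra are noisy, so the number of factors would normally be chosen by validation.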

The vectors generated by PLS are more directly related to the constituents of interest than those from PCA. In the figure below, the left column shows the spectra of the pure constituents used to construct the data set. The middle column shows the first PLS vector (PLS-1) for each constituent calculated from the data set, and the right column shows the PCA vectors from the same data set.

Figure : PLS and PCA vectors with respect to the original constituents (Sammon 2011)

The difference between the PLS-1 and PLS-2 methods is important because it affects the results. Like PCR, PLS-2 calibrates for all constituents at once: the spectral decomposition yields one set of scores and one set of eigenvectors for the calibration, so the calculated vectors are not optimized for each individual constituent. This can reduce the accuracy of the predicted constituent concentrations, especially for complex sample mixtures. In PLS-1, a separate set of loading vectors and scores is determined for each constituent of interest. Because these eigenvectors and scores are tuned specifically to each constituent, PLS-1 predicts the constituents of interest more accurately than PCR or PLS-2. The price is calculation time, which can increase significantly for training sets with a large number of samples and constituents. PLS-1 shows its greatest advantage when the constituent concentrations in the system vary widely, for example when one constituent is present at a high concentration and another at a very low one. The concentration ranges of the constituents are therefore an important criterion when choosing between PLS-1 and PLS-2. If the concentration ranges do not differ significantly, PLS-1 offers little advantage and takes longer to calculate, so PLS-2 is the method of choice.

Advantages of PLS:

PLS combines the full spectral coverage of classical least squares (CLS) with the partial composition regression of inverse least squares (ILS). CLS, also known as the K-matrix method, extends ordinary least squares as applied to a single independent variable. ILS, also known as the P-matrix method, transforms the calibration model so that the component concentrations are defined as a function of the recorded response values (Adams, 2004).

Decomposition and regression are carried out in a single step. The eigenvectors are related to the constituents of interest rather than to the largest common spectral variation.

The calibration is generally more robust and accurate for predicting unknown samples, which makes PLS well suited to the determination of unknowns. The calibration is built from known samples to make the method more reliable.

Very complex mixtures can also be analysed by PLS, since only the constituents of interest need to be identified before analysis.

PLS can sometimes also be used to predict samples that contain constituents not present in the original calibration mixtures.

PLS is often superior in predicting the constituents of interest compared with other methods that have been applied successfully to quantitative spectral analysis.

Partial least squares regression generally gives better results than principal component regression, and PLS-1 is more accurate, robust and reliable than PLS-2.

Disadvantages of PLS:

Despite its wide variety of advantages, partial least squares regression (PLS) also has some disadvantages.

PLS-1 is relatively slow to calculate and may be slower than some classical methods.

The method is difficult to understand and its results are difficult to interpret, which makes it more complicated to use.

A large number of samples is required for calibration, because the method shows its advantages when the samples are complex in component concentration.

Good calibration samples are difficult to obtain, because collinear constituent concentrations must be avoided.

Chemometrics coupled with Spectroscopy:

Near-infrared spectroscopy (NIRS) is a non-destructive technique with a wide range of applications in the pharmaceutical industry as well as in chemical laboratories. Combined with chemometrics, NIRS provides a powerful tool for the pharmaceutical industry (Roggo et al., 2007), and the combination opens up many possibilities for quantitative and qualitative analysis (Reich, 2003). NIRS plays a vital role during the production of pharmaceutical products, and an even more important one in maintaining quality control.

Near-infrared (NIR) Spectroscopy:

Near-infrared spectroscopy is a fast, non-destructive technique that provides multi-component analysis. It covers the wavelength region between the visible and the mid-infrared regions. NIR spectroscopy has recently gained wide acceptance within the pharmaceutical industry for raw material testing, product quality control (QC) and process monitoring. Its major benefits over other analytical techniques have made NIRS popular for testing pharmaceutical products, and pharmaceutical companies are increasingly interested in it.

The NIR region of the electromagnetic spectrum covers wavelengths from 700 to 2500 nm (Sammon, 2011). Overtones and combinations of the fundamental vibrations of -CH, -NH, -OH (and -SH) functional groups occur in the NIR region and give rise to the absorption bands. The main factors that determine the occurrence and spectral properties, i.e. frequency and intensity, of NIR absorption bands are anharmonicity and Fermi resonance (Roggo et al., 2007).
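NIR band positions are often quoted in wavenumbers rather than wavelengths. As a small worked example, the conversion below uses the standard relation wavenumber (cm^-1) = 10^7 / wavelength (nm) to express the 700 to 2500 nm range given above; the function name is a hypothetical helper introduced only for illustration.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


# Limits of the NIR region as stated in the text (700 to 2500 nm).
upper_limit = nm_to_wavenumber(700.0)
lower_limit = nm_to_wavenumber(2500.0)
print(round(upper_limit), "cm^-1 to", round(lower_limit), "cm^-1")
```

On this scale the stated NIR range runs from roughly 14286 cm^-1 down to 4000 cm^-1.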

Figure : Flow chart of the Near-infrared Spectroscopy (Reich, 2005)

An NIR spectrometer is composed of a light source, a monochromator, a sample holder, and detectors for diffuse reflectance and for transmittance (see the flow chart above). A tungsten halogen lamp is used as the light source. Silicon, lead sulphide and indium gallium arsenide are the common detectors used in NIR spectroscopy. The silicon detector is much faster than the others; it is also small, has low noise and is highly sensitive in the visible range of wavelengths. Lead sulphide detectors are slower than silicon but are popular because of their sensitivity. Indium gallium arsenide is the most expensive detector type in NIR spectroscopy. Discrete filter photometers and light-emitting diodes are used to narrow the range of wavelengths to selected frequencies, whereas diffraction gratings, interferometers and diode arrays allow a wide range of frequencies to pass (Reich, 2005; Skoog et al., 2007; Watson, 2005).

Figure : NIR measuring modes (Reich, 2005)

The optical properties of the sample determine the appropriate measuring mode in NIR spectroscopy (see the figure above). Transparent materials are measured in transmittance (A), whereas turbid liquids, semi-solids and solids are measured in diffuse transmittance (B), diffuse reflectance (C) or diffuse transflectance (D/E), depending on their absorption and scattering properties (Reich, 2005).

The chemometric methods commonly used to analyse NIR spectral data (Roggo et al., 2007) are shown below:

Mathematical parameters

Classification methods, which group samples together according to their spectra.

Regression methods, which are used to quantify properties of the sample.

Regression methods:

The regression line is constructed on the basis of the Beer-Lambert law. The regression methods commonly used in the pharmaceutical field to analyse NIR spectral data are listed below:

Multiple linear regression (MLR)

Principal component regression

Partial least squares regression

Artificial neural network

Support vector machine
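The simplest case of the regression line mentioned above is a univariate calibration: under the Beer-Lambert law, absorbance is proportional to concentration, so a straight line fitted to standards can be inverted to estimate an unknown. The sketch below uses ordinary least squares in pure Python; the concentrations, absorbances and function names are invented for illustration and do not come from the cited studies.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_concentration(absorbance, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (absorbance - intercept) / slope


# Hypothetical standards: concentration (arbitrary units) and absorbance
# at a single wavelength, following A = 0.02 + 0.15 * c exactly.
standard_conc = [0.0, 1.0, 2.0, 3.0, 4.0]
standard_abs = [0.02, 0.17, 0.32, 0.47, 0.62]

slope, intercept = fit_line(standard_conc, standard_abs)
estimate = predict_concentration(0.395, slope, intercept)
print("slope:", slope, "intercept:", intercept, "estimate:", estimate)
```

With these idealised standards the fitted slope plays the role of the product of absorptivity and path length. NIR bands overlap heavily, which is why the multivariate methods in the list above are normally preferred over a single-wavelength line.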

The NIR spectrometer must be calibrated with multivariate methods before quantitative analysis can start (Reich, 2005). The calibration process includes:

Selection of a representative calibration sample set.

Spectra acquisition and determination of reference values.

Multivariate modeling to relate the spectral variations to the reference values of the analytical target property.

Validation of the model by cross validation, set validation or external validation.
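The cross-validation step listed above can be illustrated with a leave-one-out scheme: each calibration sample is left out in turn, the model is rebuilt from the rest, and the left-out sample is predicted. The sketch below does this for a simple straight-line model and reports the root mean square error of cross-validation (RMSECV). The data, the function names and the use of a univariate model instead of a multivariate one are simplifying assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsecv_leave_one_out(x, y):
    """Leave each sample out once, refit, and pool the prediction errors."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        error = y[i] - (slope * x[i] + intercept)
        squared_errors.append(error * error)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical calibration set: spectral response and reference value.
response = [0.10, 0.21, 0.29, 0.41, 0.50]
reference = [1.0, 2.0, 3.0, 4.0, 5.0]

rmsecv = rmsecv_leave_one_out(response, reference)
print("RMSECV:", rmsecv)
```

A small RMSECV relative to the range of the reference values suggests that the model predicts samples it has not seen; set validation and external validation follow the same idea with a separate group of samples held back.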

Pharmaceutical Applications:

Physical Parameters:

NIR spectra carry information on both the physical and the chemical properties of a sample, so different pharmaceutical properties can be analysed with them, including hardness, solubility, dissolution, particle size and compaction force. In the pharmaceutical environment, NIR spectroscopy can be used to determine physical properties of powders and tablets. PLS and MLR are well-established methods for determining tablet hardness. The tablet formulation strongly influences the accuracy of the results (Morisseau and Rhodes, 1997). Morisseau and Rhodes observed that an increase in tablet hardness produces an upward shift of the NIR spectrum (see the figure below), and that the NIR hardness predictions were as precise as the laboratory test.

Figure : NIR spectra of CTM 6% tablets (Morisseau and Rhodes, 1997)

Figure : NIR spectra for a production of tablet and laboratory sample (Blanco and Alcala, 2006)

Blanco and Alcala (2006) showed that the compaction pressure of a laboratory sample can be predicted with a PLS model. They observed that the baseline of the spectra of laboratory samples is displaced by the tableting compaction pressure, which increases the absorbance of each tablet (see the figure above). They used the PLS model to analyse tablet compaction pressure with NIR as the spectroscopic technique.

Various regression methods (linear, quadratic, cubic and PLS) make it possible to follow the percentage of drug released into the medium by a theophylline tablet. The dissolution profile was determined by NIRS at six different times between 15 and 120 min (see the figure below) (Donoso and Ghaly, 2004).

Figure : NIR spectra of theophylline tablet in seven dissolution peak (Donoso and Ghaly, 2004)

Figure : Calibration of percentage drug dissolved from tablet at 15 minutes Vs. NIR spectra by using partial least squares regression analysis. (Donoso and Ghaly 2004)

Figure 9 shows the validation curve obtained from the NIR spectra by partial least squares regression analysis. Donoso and Ghaly (2004) concluded that drug dissolution is related to the compression force of the tablet, which modifies the intensity of the reflected light.

Polymorphs determination:

Any change in the polymorphic form of the drug substance may alter the dissolution of the final product, so the active drug must be in the correct polymorphic form for the drug to be successful, and the detection of polymorphs is important. X-ray crystallography is the technique most widely used to determine the ratio of amorphous to crystalline forms of a product. NIR spectroscopy is also used to detect amorphous forms within crystalline material, and its limit of detection is lower than that of X-ray crystallography (Roggo et al., 2007). Combinations of different regression methods with NIR spectroscopy have been used in several polymorphism and crystallization applications on different products. Seyer and Luner (2001) used PLS to determine the crystallinity of indomethacin by reflectance near-infrared spectroscopy and observed improved quantification with PLS regression. Berntsson et al. (2000) used QPLS regression with reflectance NIR spectroscopy to quantify binary powder mixtures. They concluded that the quantitative analysis of binary powder mixtures can be simplified by using only a limited number of calibration samples. A linear multivariate calibration model can be used in most cases where the powder particle sizes are relatively uniform and fine (<300 μm), but non-linear calibration is sometimes necessary, in particular if there is a large difference in particle size between the two powder components or if the content range is large.

Moisture content:

Moisture content is very important for pharmaceutical products. The right amount of moisture must be present for a drug product to be successful, because water content is key to product stability. The determination of moisture content is therefore one of the most important applications of NIR spectroscopy. The water signal appears in the NIR spectrum at around 1450 nm and around 1940 nm. NIR spectroscopy is used to determine the water content of granules, tablets and capsules (Roggo et al., 2007). Several multivariate regression methods are used to analyse the water content of pharmaceutical dosage forms such as tablets, capsules and powders, and partial least squares regression has been used by several researchers for this purpose. Combining these regression methods with NIR spectroscopy thus gives an accurate determination of moisture content.

Content determination:

The determination of the content of chemical compounds in pharmaceuticals is a matter of concern because of its importance: the correct content of the compound is necessary to obtain the pharmacological response of the drug. Determining it has always been challenging. NIR spectroscopy combined with multivariate regression is the technique of choice for analysing compound content. Chalus et al. (2005) found that partial least squares regression predicts the active substance content of low-dosage tablets from NIR spectra better than principal component regression.


Lyophilisation is a widely used technique for formulating a large range of pharmaceutical products that are especially vulnerable to degradation in aqueous solution, such as peptides, proteins or complex organic molecules. The objective of lyophilisation is to manufacture materials that have superior shelf stability and are unchanged after reconstitution with water. The determination of residual moisture content through the glass vial was the first NIRS application in this field. The benefit of this technique over conventional methods such as Karl Fischer titration (KF), thermogravimetry (TG) or gas chromatography (GC) is that it is quick, non-invasive and non-destructive. It avoids opening the vials and the risk of contamination by atmospheric moisture, which can lead to errors in the determination of the residual water content. NIR thus represents an alternative approach to the quality control of lyophilized pharmaceuticals (Roggo et al., 2007). NIR is suitable for determining the residual moisture content of lyophilized sucrose (Kamat and DeLuca, 1989). NIR is used not only to determine residual moisture content but also to investigate how changes in the product, such as cake porosity, cake dimensions and excipient-to-protein ratio, may influence the accuracy of the NIRS residual moisture determination (Lin and Hsu, 2001). Curve-fitting analysis and PLS regression models have been developed to quantify both hydrate and surface water content in lyophilized products.

Powder blending:

Powder blending is a crucial step in the manufacture of pharmaceutical products. The blending is carried out mainly between the API powders and the excipients needed to prepare a pharmaceutical dosage form, so it plays a major role from the point of view of pharmaceutical analysis. Without a homogeneous blend of API and excipients it is impossible to obtain a uniform dosage form, but determining blend homogeneity is problematic. Conventional methods such as HPLC and UV-VIS are widely used to assess blending, but they are destructive, costly and time-consuming. Fast, non-destructive methods were therefore introduced for blend analysis, and NIR spectroscopy is the first choice. It has the further advantage of observing all components of the powder mixture, not only the API, so it offers a wide variety of analytical benefits. Some research shows NIR spectroscopy applied to a complete manufacturing process to allow real-time release of the products. Partial least squares regression gives better predictions than principal component analysis for these analytical purposes.


Drying is also a critical step in pharmaceutical manufacturing. It is employed in processes such as granulation or lyophilisation. Conventionally, product quality is determined from an in-process check in which a small, representative portion of the sample stands for the whole: if the representative sample passes, the entire batch is assumed to have the same quality. This does not guarantee that the entire product is under control, so some uncertainty remains about the product. To minimize the uncertainty, a new approach has been established in which the quality control check follows from the design of experiments. Here chemometrics is combined with NIR spectroscopy, with both PCA and PLS used to analyse the product after drying: PCA is used to identify the product and PLS to quantify it, which allows 100% control of product quality.


The very last step of pharmaceutical manufacturing is the packaging of the product. NIR spectroscopy is also used to check 100% of the tablets at packaging. A principal component regression model is built to sort the product by quality.


Near-infrared spectroscopy is one of the most widely used techniques in pharmaceutical, biotechnological and chemical laboratories. Its advantage over other techniques is due to its wavelength range, which lies between 700 nm and 2500 nm, a region in which many pharmaceutical chemicals absorb and scatter. Speed, accuracy, reliability, precision and cost-effectiveness are the most important parameters to consider when selecting a spectroscopic technique for analysing pharmaceutical products. NIR spectroscopy does not involve hazardous reagents, which is another important factor, and its ease of use is a major reason for selecting it. The combination with multivariate regression methods, i.e. chemometrics, makes the technique superior to others. PCA, PLS and PCR are the methods most commonly combined with NIR spectroscopy. These methods are similar, with slight differences, but partial least squares regression is the most beneficial of them, and PLS-1 is more accurate than PLS-2.