An Introduction to Chromatography



Although genome sequencing provides the most complete and widely accepted blueprint of the genetic make-up of any given organism, this information (i.e. genomics) is insufficient in itself. Gene expression, and hence the phenotype, is the product of an organism's genetic make-up and its interaction with the surrounding environment. One of the prime and most widely accepted techniques for analysing the phenotypic expression of an organism is chromatography. Chromatography, in its most basic form, is defined as the technique by which the components of a mixture are separated by their distribution between two phases: one stationary, the other mobile [1].

Proteins are made up of one or more long chains of amino acids (AAs). Protein expression is a sub-category of gene expression: DNA (deoxyribonucleic acid) is transcribed into mRNA (messenger ribonucleic acid), which is translated into a polypeptide chain that then folds into a protein. Protein expression is central to proteomics, in which the expression of proteins at the cellular level is analysed. Chromatography finds active use here, since cells, in a very basic sense, can be regarded as solid-liquid mixtures. Various forms of chromatography are thus used to identify and analyse the expression of genes after interaction with environmental factors, at the proteomic level. The related field in which the metabolites present in an individual cell or a mass of cells (a tissue) at a given point in time are analysed quantitatively and qualitatively is termed metabolomics [2].

Thus, chromatography is one of the most vital links in the three-tier analysis of a living system's response to conditional perturbation at the levels of the transcriptome, proteome and metabolome [3]. Chromatographic techniques such as gas chromatography (GC), high performance liquid chromatography (HPLC) and liquid chromatography-mass spectrometry (LC-MS) are employed to separate the metabolite components of the desired sample, and the results are then analysed using dedicated software.


In the early 1900s, M. S. Tswett, who also coined the terms "chromatography" and "chromatogram", used a chromatographic method for the first time to separate the various pigments in a leaf [4]. "Chromatography" literally means colour writing [4]. The technique attracted rapid attention, and early applications included the separation of carotenoid pigments, among other uses. Chromatography, being a physico-chemical method of separating components based on their structural and molecular differences, has found a wide range of applications in biotechnology as well. These wide applications, however, only became practical with the development of sensitive detectors and analysis software between the middle and late twentieth century [5].

The basic working mechanism of chromatography involves two phases: (1) stationary and (2) mobile. The stationary phase is chemically bonded to a surface and fixed in a column, whereas the mobile phase, so termed because it is not bound to anything, is either a gas or a liquid. The separation of the various components of the system depends predominantly on differences in their distribution coefficients [5]. The mobile phase (carrying the mixture of components to be separated) is fed continuously through the set-up, and it is the difference in the amount of time each compound spends in the column (the stationary phase) that is then read in the form of a chromatogram. Chromatograms are graphs of intensity versus retention time (in minutes). There are a number of features of the graph which need to be analysed by the reader, and this is where software comes into play.
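The kind of analysis the software performs on a chromatogram can be sketched very simply: given intensity readings against retention time, find the peaks. The following is a minimal illustrative sketch, not any particular vendor's algorithm, and the data points are invented.

```python
# Minimal sketch: a chromatogram as (retention_time, intensity) pairs,
# with peaks picked out as local maxima above a noise threshold.
# The numbers below are made up purely for illustration.

def find_peaks(times, intensities, threshold=10.0):
    """Return (retention_time, intensity) for each local maximum above threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        if (intensities[i] > threshold
                and intensities[i] > intensities[i - 1]
                and intensities[i] >= intensities[i + 1]):
            peaks.append((times[i], intensities[i]))
    return peaks

times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # minutes
intensities = [1.0, 2.0, 55.0, 3.0, 1.5, 80.0, 4.0, 1.0, 0.5]
print(find_peaks(times, intensities))  # two peaks, near 1.0 min and 2.5 min
```

In practice the software must also handle baseline drift, noise smoothing and overlapping peaks, but the retention time of each detected peak is the quantity that is compared against standards.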


High performance liquid chromatography (HPLC), gas chromatography (GC) and capillary electrophoresis (CE) are all important and well-established methods for separating complex mixtures of metabolites [5]. The working methodology of gas and liquid chromatography is very similar, as both comprise three basic components:

an injector (to introduce and supply the mobile phase),

a column (made up of the stationary phase), and

a detector (to show the results in the form of chromatogram)

The entire mechanism of the instrument is controlled by software integrated in computer systems, but the analyses are often carried out separately. Sugars, AAs, low-molecular-weight organic acids and other polar metabolites are protected before injection into the system by chemical alterations such as methylation or silylation, which increase their volatility.

Fig 1: Main components of a gas chromatograph [5]

The injector, or injection unit, is one of the most critical components of the entire chromatograph. It is at this point that the mixture (either liquid or gas) to be separated is introduced. If problems such as the entry of foreign substances or improper evaporation of the mixture occur at this stage, the entire run will be unsuccessful. Factors such as the proper evaporation of all components of the mixture, irrespective of their volatility, are of the utmost importance, and injector design has seen radical change in recent times [6].

The column comprises the stationary phase of the system. It is usually made of materials which aid the easy transfer of the components of the mixture to be separated. The temperature of the column is monitored by a computer-aided heating mechanism. It is very important that the proper temperature is maintained inside the column; otherwise the components of the mixture will not be separated properly, and the entire unit can even be damaged.

The third component of a chromatograph is the detector, which may be either sensitive or selective. Detectors have undergone three major generations of development: simple optical detectors, using ultraviolet (UV) light, photodiode arrays or laser-induced fluorescence [7, 8]; nuclear magnetic resonance (NMR), an alternative which permits structural elucidation of the various compounds present in the mixture but requires a high volume of sample to obtain results [9]; and finally mass spectrometry (MS) detector units, which are considered to be of the utmost importance from the metabolomics point of view [10].


Mass spectrometry took shape in the early years of the twentieth century, when the British physicist J. J. Thomson demonstrated the presence of two isotopes of nominal mass 20 and 22 in neon, a noble gas. The basic idea behind mass spectrometry is that ions, once generated from organic or inorganic compounds, can be separated based on their mass-to-charge ratio (m/z), where "m" stands for the mass number and "z" for the elementary charge.

Fig. 2: Steps in Mass Spectrometry

MS is much used in the study of metabolomics owing to its high sensitivity and selectivity. The method is capable, in itself, of confirming the identity of the components it detects.


A mass spectrometer, just like a chromatograph, has a very basic and simple set-up, with three major components: (1) ion source, (2) mass analyser, and (3) detector. Ionization of the compounds is an essential precursor to the entire MS process, and is achieved thermally, by electromagnetic fields, or by the impact of electrons or ions on the analyte at high speed. The generally positive ions are singly ionized atoms, clusters, molecules, or their fragments or adducts. The segregation of ions is usually carried out by static or dynamic electromagnetic fields, or by time-of-flight analysers [11].


The time-of-flight (TOF) mass analyser is based on simple Newtonian mechanics: if ions accelerated with the same amount of energy are made to travel a fixed distance, the time taken depends on their m/z ratio.
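This relation can be made concrete. An ion of charge z·e accelerated through a potential U gains kinetic energy z·e·U = ½mv², so its flight time over a drift length L is t = L·√(m / (2·z·e·U)). The sketch below uses illustrative values (a 1 m tube, 20 kV) that are not taken from the essay.

```python
import math

# Time-of-flight relation: t = L * sqrt(m / (2*z*e*U)).
# Tube length and accelerating voltage below are illustrative assumptions.

E = 1.602176634e-19      # elementary charge, C
AMU = 1.66053906660e-27  # atomic mass unit, kg

def flight_time(mz, L=1.0, U=20000.0):
    """Flight time in seconds for a singly charged ion of mass-to-charge ratio mz."""
    m_over_z = mz * AMU / E  # kilograms of mass per coulomb of charge
    return L * math.sqrt(m_over_z / (2.0 * U))

# Heavier ions arrive later, and t scales as sqrt(m/z):
t100 = flight_time(100)
t400 = flight_time(400)
print(t100 < t400, t400 / t100)  # ratio is sqrt(400/100) = 2
```

Because t grows only as the square root of m/z, ions across a very wide mass range arrive within a few tens of microseconds, which is what gives the TOF analyser its speed.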

Fig. 3: Schematic representation of a TOF-MS with reflectron [12]

Its high scan speed makes the TOF-MS the fastest MS analyser available on the market today [13]. It is also credited with the highest practical mass range of all available analysers [13]. Modern TOF instruments are known for their tremendously high mass accuracy (down to around 3-5 ppm), making them very suitable for compound identification [14]. The more specific the identification, the better the performance of the tests, so modern TOF instruments find many takers [14]. Accurate quantification of isotope peaks is possible predominantly owing to the wide linear range [14]. Together, the accurate mass and the correct quantification of the isotope distribution allow a theoretical molecular formula to be derived, which is itself used in the analysis of unknown components [15]. However, these advantages are diminished by the fact that the instrument is very sensitive to changes in temperature and needs frequent re-calibration with a standard solution on a regular basis in order to achieve the highest mass accuracies [15].
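The value of ~5 ppm mass accuracy for formula assignment can be illustrated with a toy matcher: compute the monoisotopic mass of a candidate formula from standard atomic masses and test whether a measured mass falls within the ppm tolerance. This is a sketch of the principle only, not a real formula-generation algorithm.

```python
# Monoisotopic masses of the most abundant isotopes (standard values, Da).
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}

def mono_mass(formula):
    """Monoisotopic mass of a formula given as element counts, e.g. {'C': 6, 'H': 12, 'O': 6}."""
    return sum(MONO[el] * n for el, n in formula.items())

def matches(measured, formula, ppm=5.0):
    """True if the measured mass is within ppm of the formula's theoretical mass."""
    theo = mono_mass(formula)
    return abs(measured - theo) / theo * 1e6 <= ppm

glucose = {"C": 6, "H": 12, "O": 6}    # monoisotopic mass ~180.0634 Da
print(round(mono_mass(glucose), 4))
print(matches(180.0634, glucose))      # within 5 ppm
print(matches(180.0650, glucose))      # roughly 9 ppm off, rejected
```

At unit (nominal) mass resolution many formulas share a mass of 180; at 5 ppm only a handful remain, which is why high mass accuracy is so valuable for identifying unknowns.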

Thus, the TOF is an ideal instrument for the quick scanning and analysis of large biomolecules [16] such as proteins, polysaccharides and even peptides, as its processing cycle involves bundling, accelerating and detecting ions in an evacuated flight tube [16]. This entire procedure takes only a few milliseconds. Another of its major disadvantages is that it consumes the sample molecules completely.


The quadrupole detector, or QUAD, is an instrument with four rods or poles, hence the name. It is used to detect and quantify ions in a given test sample. The quadrupole ion filter is made up of four parallel rods operated electronically by the application of a fixed DC (direct current) voltage and an alternating RF (radio-frequency) voltage. Depending on the electric field produced, only ions of a particular m/z will pass through, while all other ions are deflected into the four parallel rods [13]. The ions passing through the filter at each of the successively monitored masses are recorded and counted by the detector to generate mass spectra. Since the quadrupole ion filter has tremendous reproducibility and is highly stable, it finds use in routine analysis in laboratories and industry alike [13].

However, the major drawback of this instrument is its unit mass resolution [17]. For this very reason, it is considered unfit for the detection of unknown or undocumented compounds [17]. Its mass range is very limited too: the instrument can only analyse ions up to an m/z of about 3,000 to 4,000 [17]. It nonetheless finds immense application in metabolomics, where multiple analyses need to be carried out on every sample in order to cover the total mass range with adequate sensitivity [17].

A variant of this instrument, the triple quadrupole detector, is also available. In it, three quadrupoles are aligned in an array, enabling a pair of filtering steps which remove all unwanted components from a complex mix so that the required metabolites can be easily extracted [18].

Fig. 4: Scan modes in a Triple Quadrupole Instrument [18]

As is evident from the figure, the second quadrupole (Q2) works as a collision cell in which the analytes selected in the first quadrupole (Q1) are fragmented. These fragments are then selected and analysed in the third quadrupole (Q3).
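The Q1 → Q2 → Q3 pipeline can be sketched as a toy simulation: Q1 passes only a chosen precursor m/z, Q2 "fragments" it (here via a made-up lookup table), and Q3 passes only a chosen product m/z. All masses and fragments below are invented for illustration.

```python
# Hypothetical fragmentation table: precursor m/z -> product ion m/z values.
FRAGMENTS = {
    300.2: [155.1, 120.0, 91.0],
    412.3: [250.1, 180.0],
}

def mrm(ions, q1_mz, q3_mz, tol=0.1):
    """Toy selected-reaction-monitoring scan: Q1 precursor filter,
    Q2 collision cell, Q3 product filter. Returns surviving fragments."""
    survivors = []
    for mz in ions:
        if abs(mz - q1_mz) <= tol:                 # Q1: pass one precursor
            for frag in FRAGMENTS.get(mz, []):     # Q2: fragment it
                if abs(frag - q3_mz) <= tol:       # Q3: pass one product
                    survivors.append(frag)
    return survivors

mixture = [300.2, 412.3, 500.0]
print(mrm(mixture, q1_mz=300.2, q3_mz=155.1))  # [155.1]
print(mrm(mixture, q1_mz=300.2, q3_mz=250.1))  # [] - wrong precursor/product pair
```

The double filter is what gives the triple quadrupole its selectivity: a signal survives only if both the precursor and one of its products match the monitored pair.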

The Triple Quadrupole is an automatic choice while dealing with large molecules or in cases where there is a requirement of sensitive quantification [19].


A ring electrode and two end-cap electrodes make up the ion trap (IT). A 3D quadrupolar potential field is generated by applying an alternating current of variable amplitude to the ring electrode [20]. This field is produced within the trapping cavity, thereby holding ions in a steady trajectory of oscillatory motion [20]. During detection, the potentials of the electrode system are varied to generate instabilities in the trajectories of the ions in the axial direction, on the basis of their m/z ratio [20].

The accumulation of a large number of ions within the trap, and the possibility of carrying out multiple sequential MSn or MS/MS experiments facilitating the structural elucidation of large molecules, make this instrument very sensitive and highly advantageous [20]. Another unique feature is its capability of analysing secondary fragments arising from the collected and stored primary ions.

However, the quantification of ions is not carried out very well by this instrument [20]. The data it produces are also not easy to analyse, as the trapped ions are subject to space-charge effects and ion-molecule reactions [20].


The linear ion trap (LIT) has a design similar to a quadrupole mass analyser; however, the LIT applies an additional static electrical potential to the end electrodes in order to confine the ions axially [21]. It is superior to 3D ion traps, as it has much higher injection efficiencies and is capable of storing more ions [21].

The LIT finds ample use in applications which demand low detection limits along with high robustness, and also for MSn or MS/MS characterization [22, 23]. Belov et al., in 2001 [24], reported a dynamic range of as many as five orders of magnitude, with a detection limit of ~10 zmol (~6,000 molecules).


In the Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometer, ions are trapped in a stable orbit in a particular electromagnetic field [25]. The orbiting ions are excited by resonant radio-frequency signals, producing an image current which can easily be detected [25]. Ions of different m/z ratios are all excited in tandem by a short RF impulse, and the evolution of the resultant signal is then acquired and stored for analysis [25]. Since frequencies can be measured very precisely, high mass resolutions are also very comfortably achieved [25], which in turn helps in the identification of compounds not yet documented [25]. However, Kind and Fiehn in 2006 [14] showed that for molecules of large mass, additional information on isotope patterns is required to accomplish proper identification [14]. In addition, ion excitation can also be used for MSn or MS/MS experiments and even for selective ion ejection [26].

The limited dynamic range, the complexity of the mass spectrum, the difficult handling, and the overall cost and high maintenance of the FT-ICR make this instrument very demanding to use.


This is a combination of two kinds of mass analysers. The triple quadrupole-linear ion trap (QTRAP) is basically a triple quadrupole in which the third quadrupole can also be used as a linear ion trap. Owing to its capability of trapping and accumulating ions, the QTRAP is highly sensitive in its scanning modes compared with a triple quadrupole [27]. With the trap function activated, MSn can also be carried out by this mass analyser [27].


The quadrupole time-of-flight (QTOF) was created by combining two quadrupoles with a time-of-flight mass analyser capable of scanning all the ions coming out of the collision cell. This kind of analyser not only gives a very accurate measurement of mass, but also provides structural information [19, 25].


An important aspect of the entire study of mass spectrometry is its use in the field of metabolomics. Conversely, metabolomics is widely appreciated as an important field of study because of the role played by mass spectrometry in the identification and quantification methods it uses. In this chapter, we discuss the role of mass spectrometry in the study of metabolomics in detail, citing examples of the various kinds of techniques involved.


Metabolomics is the field of study which involves the analysis of the various chemical processes involving metabolites. Daviss in 2005 [28] defined metabolomics as the systematic study of the unique chemical fingerprints that specific cellular processes leave behind [28], i.e. the study of small-molecule metabolite profiles. Even when very specific and elaborate separation techniques are employed, detectors still play a major role: they provide information about the compounds identified in a given test mixture. Mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy are the main methods of detection used in metabolomics.

Simple chromatographic techniques are incapable of providing all the necessary information about the compounds present in a given test mixture. The retention time of an unknown compound, even if it matches the retention time of a particular standard, is not in itself sufficient data to identify the compound correctly, because many different compounds can share almost the same retention time. It is thus mandatory to use additional information, and MS plays a major role in providing it.

Mass spectrometry is used for the identification as well as the quantification of various metabolites after they have been separated (by gas chromatography, high performance liquid chromatography or capillary electrophoresis). MS is a very sensitive and very specific method of detecting and identifying unknown compounds.

The first approach of this kind was reported as early as 1958, with the coupling of gas chromatography to mass spectrometry (GC-MS). Chromatographic detectors provide the researcher with a chromatogram representing the flow of mass eluting from a particular chromatographic column. A mass spectrometer used as a detector repeatedly scans the m/z range of interest during a chromatographic run, thus establishing a relationship between the chromatogram and the mass spectra of the eluting components.
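Repeated scanning during the run yields a two-dimensional data set (retention time × m/z), from which a trace for any single ion, an extracted-ion chromatogram, can be pulled. The sketch below invents three tiny scans to show the idea; real instruments record thousands of scans per run.

```python
# Toy GC-MS data: each scan is (retention_time_min, {m/z: intensity}).
scans = [
    (1.0, {91.0: 5.0, 120.0: 2.0}),
    (1.1, {91.0: 50.0, 120.0: 3.0}),
    (1.2, {91.0: 8.0, 120.0: 40.0}),
]

def xic(scans, target_mz, tol=0.5):
    """Extracted-ion chromatogram: intensity of target_mz (within tol)
    at each retention time."""
    trace = []
    for rt, spectrum in scans:
        inten = sum(v for mz, v in spectrum.items() if abs(mz - target_mz) <= tol)
        trace.append((rt, inten))
    return trace

print(xic(scans, 91.0))   # this ion peaks at 1.1 min
print(xic(scans, 120.0))  # this ion peaks at 1.2 min
```

Two compounds whose total-intensity peaks overlap in the chromatogram can still be resolved here, because each leaves its own trace at its characteristic m/z.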

Problems with the results may be expected if the separation is improper or only partial, so that the analyte to be identified still elutes as part of a mixture. However, the resulting mass spectrum carries ions from all the co-eluting components, and this still allows identification of unknown compounds: many compounds with the same or nearly the same retention time will have very distinct mass spectra and can thus be easily distinguished.


Gas chromatography in combination with mass spectrometry finds ample use in the analysis of metabolites because of its very high efficiency in separating the various components of biological mixtures [29]. GC-MS is a cost-efficient and easy-to-operate method for identifying and analysing a wide range of metabolites present in a mix. Not only a wide array of volatile compounds but also many semi-volatile metabolites can be analysed using GC-MS [29].

GC-MS analysis is usually carried out on a quadrupole MS, which provides nominal-mass information [29]. TOF instruments, having the potential to profile complex mixes in a very short time, support higher scan speeds and are compatible even with ultrafast GC-MS [30].

One of the major limitations of this method, however, is that samples are best analysed if volatile. Since a large number of metabolites are not volatile enough to be analysed directly by a GC instrument, an intermediate step becomes involved in the process. This derivatization of the non-volatile metabolites not only lengthens the total experiment time but also increases the cost of the procedure [29, 31]. Derivatization is carried out by silylation or acetylation [31]. This decreases the polarity of the molecules, which improves their volatility. Owing to the high sensitivity of GC-MS, even trace amounts can be analysed, making it well suited to metabolomics.


In GC-MS, almost all compounds capable of passing through the column are ionisable, which makes them easy to analyse with the mass spectrometer. Liquid chromatography is a basic technique for separating the components of a given biological or chemical mixture. When this technique is combined with mass spectrometry, mass spectral data are obtained, and LC-MS (liquid chromatography-mass spectrometry) thus gives valuable information such as the molecular weight, structure, quantity and even purity of a given sample. This technique therefore finds ample use in both qualitative and quantitative analysis of samples [32].

The main advantage of LC-MS over a GC-MS system is that the compounds need not be chemically altered before analysis, nor is the technique case-specific. Compounds ranging from highly polar to thermally labile, and even those of high molecular weight, can all be processed and their constituents detected by LC-MS [32]. Moreover, a number of columns and various elution methods are readily available for easy, case-specific separation and identification of the components of each mixture.


Capillary electrophoresis is the best option for separating polar and charged species, as this process separates the components of a mix simply on the basis of their charge-to-mass ratio [33]. Capillary electrophoresis also provides supporting information about the make-up of biological samples [33]. The technique is faster and considered a more efficient separation technique than LC. Costly and time-consuming sample pre-treatment is avoided. Simple fused-silica capillaries are used instead of the costly columns required for LC, further reducing the cost of analysis.

The major drawback of this technique, however, is that the concentration sensitivity of capillary electrophoresis is very low, owing to the limited amount of sample that can be introduced into the capillary for analysis. This limitation can be overcome by combining capillary electrophoresis with mass spectrometry. CE-MS (capillary electrophoresis-mass spectrometry) is widely known as a very useful complementary technique for the study of metabolomics; its true potential has only been recognised in the twenty-first century. Monton and Soga, 2007 [34] published a well-documented review of the development of CE-MS and its role in the study of metabolomics [34].


In direct injection (or infusion) mass spectrometry (DIMS), the sample is injected directly into the mass analyser without any treatment. This direct infusion provides very fast, high-throughput analysis of unprocessed (crude) samples. The sample is usually delivered to the mass spectrometer at a very low flow rate. Since an analyte's molecular features are identified using only mass-related data, high-resolution mass detectors such as the time-of-flight mass spectrometer (TOF-MS) or the Fourier-transform ion cyclotron resonance mass spectrometer (FT-ICR-MS) are usually employed [35].

Compounds analysed by DIMS cannot be discriminated from isobaric metabolites, and matrix effects hamper quantitative and qualitative identification [36]. However, since it enables high-throughput analysis of biological samples, DIMS is a very significant tool in modern medical screening. Rapid and accurate secondary-metabolite profiling of various fungal and plant species adds to its advantages. Moreover, inborn metabolic errors in neonates can also be screened very easily using DIMS [29].


Matrix-assisted laser desorption/ionization in combination with mass spectrometry (MALDI-MS) is one of the most commonly used techniques in proteomics. In recent times, it has found active use even in the analysis of small molecules. The conventional (basic) matrices are found to be much more suitable for ionization in positive mode [37, 38]. This was rectified by the use of a 9-aminoacridine (9-AA) matrix, which yields a low background and high sensitivity even in negative mode [37, 38].

MALDI-MS, however, has a major drawback: it cannot readily be used for the quantitative analysis of samples. A change in the concentration of even one analyte is seen to negatively influence the quantification of the other metabolites. This is a major limitation of MALDI-MS in metabolomics [37, 38].


To gain knowledge about the overall behaviour of a given network, analysis of the metabolite composition of a biological system is not enough. In vivo reaction rates are calculated using metabolic flux analysis, which indirectly measures the rates of the various metabolic activities taking place in the network [39]. A number of strategies have been employed to quantitatively analyse intracellular fluxes, and the various metabolic networks, using mass spectrometry [40].

Feeding cells a 13C-labelled substrate such as glucose, followed by analysis of the label distribution using GC-MS, is a common approach to quantifying intracellular fluxes [40]. A few reports also describe LC-MS-based approaches to flux analysis. With LC-MS, the derivatization step is avoided, making the analysis more cost-efficient.
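A minimal sketch of the label-distribution idea: from the relative abundances of the mass isotopomers (M+0, M+1, ..., M+n) of a metabolite fragment measured by MS, one can compute the average fractional 13C enrichment of that fragment. The intensities below are invented; real flux analysis fits many such measurements to a network model.

```python
def fractional_labeling(intensities):
    """Mean fractional 13C enrichment of a fragment from its mass
    isotopomer abundances, where intensities[i] is the abundance of
    the M+i isotopomer and len(intensities)-1 is the carbon count."""
    n = len(intensities) - 1           # number of labelable carbons
    total = sum(intensities)
    weighted = sum(i * x for i, x in enumerate(intensities))
    return weighted / (n * total)

# A hypothetical 3-carbon fragment in which, on average,
# half of the carbon positions carry 13C:
print(fractional_labeling([0.125, 0.375, 0.375, 0.125]))  # 0.5
```

Differences in such enrichment values between metabolites downstream of a branch point are what constrain the relative fluxes through the competing pathways.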


Metabolomics, or metabolic profiling, is a key area of research which is helping scientists carry out work in the fields of functional genomics, toxicology and nutrigenomics, to name a few [41]. Some of the major applications of metabolomics are enumerated as follows:


One of the most widely used areas of study under metabolomics is toxicity assessment, or toxicology, and drug development [41]. Metabolic profiling provides a very rapid and non-invasive toxicological analysis which is easily reproducible and requires only technical data that are already available [42]. The observed changes are usually related to organ-specific syndromes, such as specific lesions in the kidney or liver [43], which is of much interest to pharmaceutical companies wanting to assess the toxicity of various candidate compounds. If the toxicity of a particular compound is known in advance, the costs incurred by a pharmaceutical company in carrying out clinical trials can be cut drastically. Metabolomics may also be useful in the near future for producing tailor-made medicines, or pharmacogenomics [42]. This will help start personalized therapy on a large scale.


One of the most unpredictable factors in the study of genetics has been the ability of genes to mutate, for example through codon insertion or deletion [41]. Metabolomics can be very instrumental in detecting the consequences of such gene manipulations [41]. This analysis of gene manipulation is quite an achievement in itself; moreover, modern life-science researchers are using the technique to predict unknown mutations by comparing the metabolic perturbations caused by the deletion or insertion of known genes, which alters overall functionality [41, 44].


Nutrigenomics is a general term which relates nutrition to genomics, transcriptomics, proteomics and metabolomics [41]. Diet is seen not only to influence the well-being of an individual but also to maintain healthy status. Metabolomics is used to determine the balance of age, gender, body composition and even genetics, which in turn is reflected in the metabolic fingerprint of the individual's metabolism [45, 46]. This aids in deciding upon factors such as the quality and quantity of food taken during one's lifetime.


Metabolomics is bound to play a very strong role in the study of aging, or gerontology [41]. This will not only help address the aging process at the clinical level but also support research at the scientific level [47]. Metabolomics will help characterize the interaction of an organism with its surrounding environment, and the health concerns arising with aging will also be addressed using metabolomics [41, 47].


Morris and Watkins (2005) [48] published a paper on the role of lipid profiling and the impact of metabolomics on the process of drug development. The entire drug, or pharma, industry depends heavily on advances in metabolomics, as they improve decision-making and the quality of the drugs produced [48].


MS is commonly hyphenated with the separation technique of chromatography in metabolic studies in order to provide a completely new dimension of sample separation and also to improve the signal-to-noise ratio. Metabolomics involves the qualitative and/or quantitative analysis of a large number of metabolites in a complex system [49]. Since most metabolites are polar by nature, their separation by chromatography and subsequent analysis has presented a challenge; however, hydrophilic interaction chromatography (HILIC) has become a very useful and widely acclaimed tool for the analysis of such polar molecules [49].

LC-MS is a very widely used configuration in which liquid chromatography carries out the separation and MS then detects and analyses the results. LC-MS finds active use and edges out GC-MS because it can analyse even non-volatile metabolites without a derivatization step, and thus has wider metabolome coverage. LC-MS has therefore long enjoyed the position of a standard platform for metabolomics research, owing to its high throughput, soft ionization and good coverage of metabolites. An LC-MS-based metabolomics study depends on multiple experimental, analytical and computational steps.

After a sample is introduced (injected) into the LC-MS instrument, it first undergoes separation by a liquid chromatography method. During separation, the sample is first dissolved in a fluid (the "mobile phase" of chromatography), and the sample-carrying fluid is then forced, under high pressure, through a column packed with small particles, for example a porous monolithic layer or a porous membrane (the "stationary phase" of chromatography). Different constituents of the unknown mix elute from the column at different times. The specific time at which a compound elutes from the column is called the retention time of that compound. The retention time of almost every metabolite is unique, since the strength of interaction between each metabolite and the stationary phase differs.

Although all mass spectrometers consist of three basic modules, (1) ion source, (2) mass analyser and (3) detector, there are a number of different types of mass spectrometer, depending on the particular technology involved. The ion source converts the electrically neutral compounds in the sample into charged molecular ions. This conversion is usually achieved by electrospray ionization (ESI). ESI is also termed a "soft ionization" approach, in which intact molecular ions are formed; this helps in the initial identification of the metabolites present [50]. Since metabolites have very diverse chemical properties, biological samples often need to be analysed in both positive (+ve) and negative (-ve) ionization modes in order to maximize coverage [51].
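The effect of the two ionization modes on the observed m/z can be shown with a tiny calculation: in positive mode a singly charged ion is typically detected as [M+H]+, and in negative mode as [M-H]-, so the observed m/z is shifted from the neutral monoisotopic mass by one proton mass in either direction.

```python
PROTON = 1.007276  # mass of a proton, Da

def esi_mz(neutral_mass, mode="+"):
    """Expected m/z of the singly charged ESI ion for a neutral
    monoisotopic mass: [M+H]+ in positive mode, [M-H]- in negative mode."""
    if mode == "+":
        return neutral_mass + PROTON
    return neutral_mass - PROTON

glucose = 180.06339  # monoisotopic mass of glucose, Da
print(round(esi_mz(glucose, "+"), 4))  # 181.0707
print(round(esi_mz(glucose, "-"), 4))  # 179.0561
```

Other adducts ([M+Na]+, [M+formate]- and so on) shift the m/z by different amounts, which is one reason the same metabolite can give several peaks in an LC-MS run.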

Mass analyzers separate ions according to their m/z value by applying electric or magnetic fields to them. The most commonly used mass analyzers are the quadrupole, ion trap, time-of-flight (TOF) and Fourier transform ion cyclotron resonance (FT-ICR) analyzers. All of them separate ions by m/z, although the underlying physical principles differ. There are also hybrid instruments, such as the Quadrupole Time-of-Flight (QTOF) and the triple quadrupole, in which analyzers are operated in tandem.

Detectors convert the abundances of ions from the mass analyzer to electrical signals. This is done by recording the charge induced when an ion passes through the detector module.

Since the sample is separated in two dimensions, by liquid chromatography in time and by mass spectrometry in m/z, the result produced by an LC-MS based metabolomics run is a three-dimensional signal: intensity as a function of retention time and m/z.


The analysis of LC-MS based metabolomics data involves three main steps: (1) pre-processing, (2) statistical analysis, and (3) metabolite identification. Pre-processing converts the raw data obtained from the LC-MS measurement into a documented list of peaks which can be compared across multiple samples and standards. This is followed by statistical analysis, in which a subset of peaks is identified that represents phenotypic alterations of the given biological sample. Metabolite identification then matches the observed peaks by side-by-side comparative analysis of the biological sample and verifies and authenticates the compounds.


In order to convert the raw data obtained from LC-MS into a peak list, the following pre-processing steps need to be performed:

Outlier Screening: This step is carried out in order to eliminate LC-MS runs or peaks which show an unexplainable deviation from the bulk of the replicates (both analytical and biological). The variability of LC-MS is not totally avoidable, yet outlier runs or peaks (those with an excessive amount of bias) need to be removed from further analysis.

Filtering: This is used for the removal of unwanted noise and contaminants from the LC-MS data. One of the most important requirements of filtering is to minimise the noise while preserving the peaks in the data. A Savitzky-Golay filter can be used to minimise the noise in LC-MS signals while keeping the peaks undisturbed [52].
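As an illustrative sketch of this kind of smoothing, the following Python snippet (assuming SciPy is available; the chromatogram values are synthetic, invented for illustration) applies a Savitzky-Golay filter to a noisy trace:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic chromatogram (illustrative values): one Gaussian peak plus noise.
rng = np.random.default_rng(0)
rt = np.linspace(0, 10, 501)                    # retention time axis (min)
true_signal = np.exp(-((rt - 5.0) ** 2) / 0.1)  # the underlying peak
noisy = true_signal + rng.normal(0.0, 0.02, rt.size)

# Savitzky-Golay: fit a low-order polynomial within a sliding window.
# window_length must be odd and polyorder must be < window_length.
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)

residual_before = np.std(noisy - true_signal)    # noise level before filtering
residual_after = np.std(smoothed - true_signal)  # noise level after filtering
```

Because the filter fits a low-order polynomial locally rather than averaging, sharp peak apexes survive much better than with a plain moving average.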

Baseline correction: A low-frequency baseline often disturbs the signal and causes miscalculations. In this step, the low-frequency baseline is estimated and then subtracted from the raw LC-MS data. Baseline drift is often observed because the baseline intensity rises with increasing retention time; an elevated baseline leads to over-estimation of the intensities of late-eluting analytes. A low-order Savitzky-Golay filter can be employed to remove the baseline from LC-MS signals [52].
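One simple way to realise this is sketched below in Python with SciPy (the trace is synthetic; a very wide window with a low polynomial order is used so the fit tracks only the slow drift, not the peak):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic trace (illustrative values): a peak sitting on a baseline that
# rises with retention time, as described above.
rt = np.linspace(0, 10, 1001)
peak = np.exp(-((rt - 5.0) ** 2) / 0.05)
baseline = 0.05 * rt                      # slow upward drift
measured = peak + baseline

# Estimate the low-frequency baseline with a low-order fit over a very wide
# window, then subtract it from the measured signal.
estimated_baseline = savgol_filter(measured, window_length=501, polyorder=1)
corrected = measured - estimated_baseline
```

After subtraction the late-eluting region no longer carries the inflated intensities that the drifting baseline would otherwise cause.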

Peak detection: This is the transformation of the continuous raw data obtained from LC-MS into centroided, discrete data in which each ion is represented by a peak. The transformation offers two very important advantages: (1) a portion of the noise in the continuous data is removed, and (2) the data dimension is reduced without much loss of important information. In general, peak detection is carried out as a two-step process: first the centroids of peaks are calculated over the m/z range, and then the retention time range is searched for chromatographic peaks.
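The chromatographic-peak search in the second step can be sketched with SciPy's `find_peaks` on a synthetic single-m/z trace (all values are invented for illustration):

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic single-m/z chromatogram (illustrative values): two peaks.
rt = np.linspace(0, 10, 1001)
trace = (np.exp(-((rt - 3.0) ** 2) / 0.02)
         + 0.6 * np.exp(-((rt - 7.0) ** 2) / 0.02))

# Search the retention time range for chromatographic peaks above a threshold.
idx, props = find_peaks(trace, height=0.1, prominence=0.1)
peak_rts = rt[idx]                    # retention times of the detected peaks
peak_heights = props["peak_heights"]  # apex intensities
```

The `height` and `prominence` thresholds are the knobs that decide how much residual noise is discarded along with the continuous data.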

Peak alignment: This step allows comparative analysis of LC-MS based metabolomics data across samples. The retention time of an ion varies even between replicates of the same sample; this variation is not uniform and cannot be completely controlled during the experiments. It therefore has to be corrected computationally, by matching corresponding peaks across runs and adjusting their retention times.
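A minimal sketch of such a correction, assuming for simplicity a single global shift between two runs (production aligners use more flexible retention-time warping), estimates the lag from the cross-correlation maximum:

```python
import numpy as np

# Two runs of the same sample (illustrative values): run B elutes
# about 0.2 min later than run A.
rt = np.linspace(0, 10, 1001)                  # 0.01 min per sample
run_a = np.exp(-((rt - 5.0) ** 2) / 0.02)
run_b = np.exp(-((rt - 5.2) ** 2) / 0.02)

# Estimate the retention-time lag from the cross-correlation maximum,
# then shift run B back onto run A's time axis.
corr = np.correlate(run_b, run_a, mode="full")
lag = np.argmax(corr) - (run_a.size - 1)       # lag in samples; positive = late
aligned_b = np.roll(run_b, -lag)
```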

Normalization of peak intensities: Normalization of peak intensities helps reduce systematic variations in the raw LC-MS data. A simple, yet costly, method of normalization is to add the same amount of internal standards to every sample. For normalization without the use of internal standards, normalization to osmolality and normalization to the "total useful MS signal" have been suggested by Warrack et al. (2009) [53].
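Normalization to the total MS signal can be sketched as follows (the intensity matrix is invented for illustration; the second sample carries the same relative profile at a larger injected amount):

```python
import numpy as np

# Hypothetical peak-intensity table: rows are samples, columns are peaks.
# The second sample has the same profile at twice the injected amount.
intensities = np.array([
    [100.0,  50.0, 250.0],
    [200.0, 100.0, 500.0],
])

# Divide every sample by its total signal so the profiles become comparable.
totals = intensities.sum(axis=1, keepdims=True)
normalized = intensities / totals
```

After normalization the two rows are identical, i.e. the systematic difference in injected amount has been removed while the relative metabolite profile is preserved.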

Transformation of LC-MS data: A transformation is sometimes needed to alter the distribution of the raw data so that it is more suitable for the statistical analyses carried out subsequently. Transformations leading to a more normally distributed dataset, for example, are often preferred. The dynamic range of the raw data should also be compressed before statistical analysis is carried out.
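A log transform is a common choice for both goals, since it compresses the dynamic range and makes right-skewed intensity data more nearly normal. A minimal sketch with invented intensity values:

```python
import numpy as np

# Hypothetical peak intensities spanning several orders of magnitude.
intensities = np.array([12.0, 150.0, 3200.0, 45000.0, 910000.0])

# A log transform compresses the dynamic range and reduces right skew;
# the +1 guards against taking the log of zero.
log2_intensities = np.log2(intensities + 1.0)

dynamic_range_raw = intensities.max() / intensities.min()
dynamic_range_log = log2_intensities.max() / log2_intensities.min()
```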


After the pre-processing step, the raw LC-MS data is summarized by a peak list. The statistical analysis step is carried out to identify those peaks whose intensity levels differ significantly between distinct groups of biological systems. The analytical methods used in this analysis can be divided into (1) univariate and (2) multivariate approaches. These are discussed further below.

Univariate Analysis: The most common test of statistical significance for the distinction between two groups is the t-test, which yields a P-value indicating whether the difference between the group means is statistically significant. If the number of samples for the t-test is not very large, many false negative and false positive results may arise [54]. Sometimes a number of metabolites give no statistically significant result individually, but in combination, i.e. using multivariate statistical analysis, significance can be observed. The fold change between groups is examined by plotting the fold change against the P-value from the t-test [54]. This plot is known as a "volcano plot" and is used to compare the size of the fold change to the level of statistical significance [54].
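The t-test and the two volcano-plot coordinates can be sketched as follows (SciPy; the two groups are simulated, with the case group up-regulated roughly two-fold relative to control):

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulated intensities of one metabolite in two groups (illustrative):
# the case group is up-regulated about two-fold relative to control.
rng = np.random.default_rng(1)
control = rng.normal(100.0, 10.0, size=20)
case = rng.normal(200.0, 10.0, size=20)

# Two-sample t-test for a difference between the group means.
t_stat, p_value = ttest_ind(control, case)

# Volcano-plot coordinates: log2 fold change vs -log10(P-value).
log2_fold_change = np.log2(case.mean() / control.mean())
neg_log10_p = -np.log10(p_value)
```

Each metabolite contributes one point (log2 fold change, -log10 P) to the volcano plot; points that are both far from zero horizontally and high vertically are the interesting ones.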

Multivariate Data Analysis: Multivariate data analysis combines the effects of multiple variables when deriving results. It is divided into two categories, namely (1) unsupervised and (2) supervised techniques. Unsupervised techniques identify hidden structures in the data without knowledge of class labels; one of the most commonly used unsupervised techniques in the multivariate analysis of LC-MS based metabolomics data is Principal Component Analysis (PCA) [54]. Supervised techniques, in contrast, use class label information to construct a statistical model for interpreting the raw LC-MS data; partial least squares-discriminant analysis (PLS-DA) is a widely used supervised technique for such data. These are explained further below:

Principal Component Analysis: Principal Component Analysis is a data transformation method [54] employed to reduce multidimensional data sets to a much smaller number of dimensions for further analysis [54]. In PCA, a data set of inter-related variables is transformed into a new set of variables, called principal components [54], in such a manner that they are uncorrelated and the first few principal components retain most of the variation present in the whole data set [55]. PCA is an unsupervised technique, so no prior group information is necessary, which makes it useful for exploring potential groupings of samples in a given experiment [54, 55].
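A minimal PCA sketch on a simulated peak table (scikit-learn; the two-group structure is invented so that the first component should separate the groups without labels ever being supplied):

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated peak table (illustrative): 20 samples x 50 peaks, with two
# hidden groups that differ in the first ten peaks.
rng = np.random.default_rng(2)
group_a = rng.normal(0.0, 1.0, size=(10, 50))
group_b = rng.normal(0.0, 1.0, size=(10, 50))
group_b[:, :10] += 4.0
X = np.vstack([group_a, group_b])

# Unsupervised projection onto the first two principal components;
# no class labels are given to the model.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)              # score plot coordinates
explained = pca.explained_variance_ratio_  # variance retained per component
```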

Linear Discriminant Analysis: Linear Discriminant Analysis (LDA) is a classical method for predicting the group membership of samples [54]. Unlike PCA, it is a supervised technique and requires prior information about the groups [54]. LDA is well suited to non-targeted metabolic profiling data [54], which is usually grouped. Unlike PCA, this technique maximises the ratio of between-class variance to within-class variance [54]. Like PCA, it reduces the dimensions of the data, but LDA gives better separation between the groups of experimental data [55]. LDA thus not only reduces the dimensionality of the raw data obtained in LC-MS based metabolomics studies but also separates the sample classes [54].
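A corresponding LDA sketch (scikit-learn; simulated two-group data, with labels supplied because the method is supervised):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Simulated two-group peak table (illustrative); class labels are required
# because LDA is supervised.
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 5)),
    rng.normal(2.0, 1.0, size=(10, 5)),
])
y = np.array([0] * 10 + [1] * 10)

# With two classes, LDA projects onto a single discriminant axis chosen to
# maximise between-class relative to within-class variance.
lda = LinearDiscriminantAnalysis(n_components=1)
scores = lda.fit_transform(X, y)
train_accuracy = lda.score(X, y)
```

With two classes the projection is one-dimensional, which is exactly the dimensionality reduction plus class separation described above.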

Hierarchical Cluster Analysis: Hierarchical Cluster Analysis (HCA) is a statistical method in which samples that behave similarly and/or show similar characteristics are grouped together, thereby quantifying the structure in the variables [54]. The method constructs a hierarchy of tree-like structures [54]. There are two ways of constructing such a structure: (1) agglomerative, merging samples bottom-up, and (2) divisive, splitting them top-down [54]. HCA is used for the statistical analysis of both biochemical samples and metabolites [54]. The results produced by HCA can be shown either as a clustering dendrogram or as a heat map [54].
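An agglomerative HCA sketch (SciPy; the samples are simulated; `Z` encodes the merge tree and could be passed to `scipy.cluster.hierarchy.dendrogram` for plotting):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Simulated peak table (illustrative) with two well-separated sample groups.
rng = np.random.default_rng(4)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(5, 8)),
    rng.normal(5.0, 0.5, size=(5, 8)),
])

# Agglomerative clustering: Z encodes the tree of merges (the dendrogram).
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
```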

Partial-least Squares Discriminant Analysis: Partial-least squares discriminant analysis (PLS-DA) is a powerful and widely used supervised classification method [54]. This statistical method is capable of handling high-dimensional data [54]. Its most important application is the analysis of data from studies of diagnosis, prognosis or treatment outcome [54]. Like PCA, PLS-DA produces both a classification view, i.e. a score plot, and a feature selection view, i.e. a loading plot [54]. Because it is a supervised technique, there is a risk of over-fitting the model, but most of the available software for this technique offers options to cross-validate the models [54].


The most important reason metabolomics is of such significance is its key role in identifying metabolites correctly; this step is therefore of the utmost importance and also the most challenging one. Owing to the large number of metabolites in the body, it requires the most attention. In untargeted metabolic analysis, metabolite identification is basically achieved by mass-based searches, followed by manual verification [56-59]. This may still not give the best results: unambiguous identification of metabolites with extremely similar molecular weights is not easy, mass-based identification is incapable of distinguishing between isomers, and, most importantly, database coverage is restricted. Several steps are being taken to improve upon this technique.


Tandem mass spectrometry (MS/MS) carries out more than one stage of MS analysis, with molecular fragmentation occurring between the stages. It is implemented either by using multiple mass analyzers separated in space or by using a single mass analyzer operated sequentially in time [60]. High-quality analyzers produce informative MS/MS spectra for the validation of metabolites. However, the relatively poor reproducibility of MS/MS spectra is a major drawback of this technique [60].


Many factors contribute to the variability of data obtained from LC-MS [60]. Biological entities vary tremendously, and so do the results pertaining to them [60]. Analytical variation also contributes significantly to the variation in the results [60].

The variations noted in LC-MS data carry over into the results obtained by MS/MS [60]. If the data is generated on different types of instruments, the MS/MS spectra are also observed to vary [60].