# Classifying Crop and Weed from Digital Images


Conventional cropping systems tend to rely heavily on herbicide applications to reduce the abundance of weeds. Although these approaches have been successful in increasing crop productivity and farm labour efficiency, concerns regarding the environmental and economic impacts of these weed control practices and the development of herbicide resistance have generated interest in identifying alternative weed control strategies. Rather than applying herbicide uniformly across the field, an automated machine vision system can be an economically feasible alternative. The objective is to reduce herbicide use by applying a normal dose where weeds are dense and a smaller dose where there are few or no weeds. Automated machine vision systems, whether they control a sprayer or perform weeding, must therefore be able to analyse images and detect weeds. This paper deals with the application of a Bayesian classifier and a support vector machine (SVM) as pattern recognition models for the detection of crop and weed species. The two techniques are compared to find which is more accurate and efficient for crop and weed classification, and both were tested for robustness and accuracy. A total of 22 features were investigated to find the best combination of features for each classifier. Analysis of the classification results over 224 sample images shows that SVM provides better classification accuracy than the Bayesian classifier.

Index terms: SVM, Bayesian classifier, Herbicide, Weed control, Machine vision systems.

## I. INTRODUCTION

Weeds can be defined as unwanted plants whose adaptive characteristics allow them to survive and reproduce in cropping systems. Increasing productivity and upgrading plantation systems are major concerns in accelerating agricultural development. Weeds hamper this development by competing with crops for water, light, soil nutrients and space. Uncontrolled weeds commonly reduce crop yields by 10 to 95 percent [1], so better weed control strategies are required to sustain crop productivity and quality. Weed control strategies include manual labor, mechanical cultivation and the use of agricultural chemicals known as herbicides. Herbicide use is the most common method, but it has negative impacts on the environment and human health. There are also economic concerns: in the United States, the total cost of herbicides was about $16 billion in 2005 [2]. A major source of inefficiency is that herbicides are usually applied uniformly across the crop field, so portions of the field with few or no weeds are sprayed as well. Manual application of herbicides is also time consuming, inefficient and costly. Moreover, repeated use of the same herbicide in a field tends to promote the emergence of herbicide-tolerant weeds [3]. Over 290 biotypes of herbicide-tolerant weeds have been reported in agricultural fields and gardens worldwide [3].

The economy of Bangladesh is primarily supported by agriculture. The performance of this sector has an overwhelming impact on poverty alleviation, economic development and food security. The total cultivable land of Bangladesh is 8.44 million hectares [4], which is not sufficient for its large population, and population pressure continues to place a severe burden on productive capacity. For cost-effective land use, crop production and quality must be maximized and the cost of weed control must be minimized. The most common technique for applying herbicides in Bangladesh is to spray the herbicide solution with a knapsack sprayer. This technique is considered inefficient and time consuming, and the recommended safety measures are rarely followed. A machine vision system that can detect crops and weeds and apply herbicide only where there are weeds is therefore an attractive approach that could enhance profitability and lessen environmental degradation. In this approach, images are taken of the crop field, and crops and weeds are identified by an automated system.

Much research has investigated strategies for building a robust weed control system, and a few real-time field systems have been developed. The photo-sensor-based plant detection systems developed by Shearer and Jones (1991) and Hanks (1996) were able to detect all green plants and spray only the plants. Islam et al. (2005) used a PDA as the processing device and measured the Weed Coverage Rate (WCR) to discriminate between narrow and broad leaves. Ahmad I. et al. (2007) developed an algorithm that classifies images into broad and narrow classes based on histogram maxima with a threshold, for selective herbicide application, with an accuracy of 95%. Ahmed et al. (2007) developed a real-time weed recognition system using statistical methods based on leaves, with an accuracy over 90%. Ghazali et al. (2008) developed an intelligent real-time system for automatic weeding in oil palm plantations using the statistical approach GLCM and the structural approaches FFT and SIFT, with a success rate above 80%.

The main objective of this work is to use a Bayesian classifier and an SVM classifier as classification models to classify crops and weeds from digital images and to determine which one performs better. SVM was chosen as a pattern recognition model because of its significant advantages, such as good generalization performance, the absence of local minima and the sparse representation of the solution [5]. The advantages of the Bayesian classifier, on the other hand, are ease of implementation and computational efficiency.

## II. MATERIALS AND METHODS

## 2.1 Image Acquisition

The images used for this study were taken from a chilli field. Five weed species that are common in the chilli fields of Bangladesh were also chosen. TABLE I lists the English and scientific names of chilli and the selected weed species.

Table I: Selected species.

| Class Label | English Name | Scientific Name |
|---|---|---|
| 1 | Chilli | Capsicum frutescens |
| 2 | Pigweed | Amaranthus viridis |
| 3 | Marsh herb | Enhydra fluctuans |
| 4 | Lamb's quarters | Chenopodium album |
| 5 | Cogongrass | Imperata cylindrica |
| 6 | Burcucumber | Sicyos angulatus |

The images were taken with a digital camera equipped with a 4.65 to 18.6 mm lens. The camera was pointed directly towards the ground, with the lens 40 cm above ground level. An image taken with these settings covers a 30 cm by 30 cm ground area. The image resolution of the camera was set to 1200x768, and all images were taken in color. Fig. 1 shows sample images of chilli and the five weed species.

Figure 1: Sample images of different plants; (a) chilli (b) pigweed (c) marsh herb (d) lamb's quarters (e) cogongrass (f) burcucumber.

## 2.2 Pre-processing

Segmentation based on thresholding was used to separate the plants from the soil in the images, exploiting the fact that plants are greener than soil. Let 'G' denote the green color component of an RGB image. A gray-scale image was obtained from the original image by considering only the 'G' value. A threshold value of 'G' was then calculated; let 'T' denote this threshold. Pixels with a 'G' value greater than 'T' were treated as plant pixels and the remaining pixels as soil pixels. For each image, segmentation thus produced a binary image in which pixels with value '0' represent soil and pixels with value '1' represent plant.
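The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_by_green`, the tiny example image and the value `T = 128` are assumptions, since the text does not state how 'T' was calculated.

```python
def segment_by_green(image, threshold):
    """Return a binary mask: 1 where G > threshold (plant), else 0 (soil).

    `image` is a list of rows; each pixel is an (R, G, B) tuple.
    """
    return [[1 if g > threshold else 0 for (_r, g, _b) in row] for row in image]


# Tiny 2x3 example: greenish "plant" pixels and brownish "soil" pixels.
image = [
    [(60, 180, 50), (120, 90, 70), (55, 200, 40)],
    [(130, 100, 80), (70, 170, 60), (125, 95, 75)],
]
T = 128  # hypothetical threshold; the source does not say how 'T' was computed
mask = segment_by_green(image, T)
print(mask)  # [[1, 0, 1], [0, 1, 0]]
```

In practice the threshold would be derived from the image itself (for example from its histogram), but that choice is outside what the text specifies.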

For removing noise from the images, an opening operation was first applied to the binary images. In opening, an erosion operation is followed by a dilation operation. It has the effect of removing small pixel regions [6]. Then a closing operation was applied. In closing, a dilation operation is followed by an erosion operation. It will fill small holes in an object [6]. Fig. 2 shows the overall pre-processing steps of a sample image.

Figure 2: Images of a pigweed; (a) RGB image (b) gray-scale image (c) segmented binary image (d) binary image after noise removal.
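The opening and closing operations described above can be illustrated with a small pure-Python sketch. It assumes a 3x3 square structuring element and a particular border convention (outside pixels count as 1 for erosion and 0 for dilation); neither is stated in the text, and the helper names are illustrative only.

```python
def _window(mask, y, x, outside):
    """Values of the 3x3 window centred on (y, x); `outside` pads the image border."""
    h, w = len(mask), len(mask[0])
    return [
        mask[j][i] if 0 <= j < h and 0 <= i < w else outside
        for j in (y - 1, y, y + 1)
        for i in (x - 1, x, x + 1)
    ]


def erode(mask):
    """Keep a pixel only if every pixel in its 3x3 window is 1."""
    return [[int(all(_window(mask, y, x, 1))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is 1."""
    return [[int(any(_window(mask, y, x, 0))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation: removes small isolated regions."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(mask))


def block(size, rows, cols):
    """Demo helper: a size x size mask containing one filled rectangle."""
    return [[int(y in rows and x in cols) for x in range(size)] for y in range(size)]


solid = block(7, range(2, 5), range(2, 5))  # 3x3 "plant" region

noisy = [row[:] for row in solid]
noisy[0][6] = 1                             # isolated noise pixel
holed = [row[:] for row in solid]
holed[3][3] = 0                             # one-pixel hole in the plant

print(opening(noisy) == solid)  # True: the noise pixel is removed
print(closing(holed) == solid)  # True: the hole is filled
```

The two effects are shown on separate masks because opening a region this small with a hole in it would erase the region entirely; real plant regions are much larger than the structuring element.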

## 2.3 Feature Extraction

A total number of 22 features were extracted from each image. These features can be divided into four categories: color features, size dependent object descriptors, size independent shape features and moment invariants.

## 2.3.1 Color Features

Let 'R', 'G' and 'B' denote the red, green and blue color components respectively. Every component was divided by the sum of all three components to make the color features independent of different light conditions [7].

$$r = \frac{R}{R + G + B} \tag{1}$$

$$g = \frac{G}{R + G + B} \tag{2}$$

$$b = \frac{B}{R + G + B} \tag{3}$$

Only plant pixels were used when calculating the color features, so the features are only based on plant color not soil color. The color features used were: mean value of 'r', mean value of 'g', mean value of 'b', standard deviation of 'r', standard deviation of 'g' and standard deviation of 'b'.
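A minimal sketch of the six color features follows. It assumes the population standard deviation (`statistics.pstdev`), since the text does not say whether the population or sample form was used, and the function name and example pixels are illustrative.

```python
from statistics import mean, pstdev


def color_features(image, mask):
    """Mean and standard deviation of normalized r, g, b over plant pixels only."""
    r_vals, g_vals, b_vals = [], [], []
    for image_row, mask_row in zip(image, mask):
        for (R, G, B), is_plant in zip(image_row, mask_row):
            total = R + G + B
            if is_plant and total > 0:  # skip soil pixels and pure black pixels
                r_vals.append(R / total)
                g_vals.append(G / total)
                b_vals.append(B / total)
    return {
        "mean_r": mean(r_vals), "mean_g": mean(g_vals), "mean_b": mean(b_vals),
        "std_r": pstdev(r_vals), "std_g": pstdev(g_vals), "std_b": pstdev(b_vals),
    }


# Left column: plant pixels; right column: soil pixels (ignored via the mask).
image = [[(50, 100, 50), (120, 90, 70)],
         [(20, 60, 20), (125, 95, 75)]]
mask = [[1, 0],
        [1, 0]]
features = color_features(image, mask)
print(features["mean_r"], features["mean_g"])  # approximately 0.225 and 0.55
```

Because each pixel is divided by its own R + G + B, doubling the brightness of a pixel leaves r, g and b unchanged, which is the illumination independence the normalization is meant to provide.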

## 2.3.2 Size Dependent Object Descriptors

The size dependent descriptors were calculated on the segmented binary images. These features are dependent on plant shape and size. The size dependent object descriptors used were:

Area, defined as the number of pixels valued '1' in a binary image, which means the number of plant pixels.

Perimeter, defined as the number of pixels that form the border of a plant. A pixel with value '1' that has at least one neighboring soil pixel was considered a border pixel.

Convex area, defined as the area of the smallest convex hull that covers all the plant pixels.

Convex perimeter, defined as the perimeter of the smallest convex hull that covers all the plant pixels.
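The four descriptors can be sketched as below. Several details are assumptions rather than the authors' choices: border pixels are found with 4-connectivity, positions outside the image count as soil, and the convex hull is computed over pixel corners (Andrew's monotone chain) so that the hull area is comparable to the pixel count.

```python
import math


def area(mask):
    """Number of plant pixels (value 1)."""
    return sum(sum(row) for row in mask)


def perimeter(mask):
    """Number of plant pixels with at least one 4-connected soil neighbour."""
    h, w = len(mask), len(mask[0])

    def is_soil(y, x):
        return not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0

    return sum(
        1
        for y in range(h) for x in range(w)
        if mask[y][x] == 1
        and any(is_soil(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_descriptors(mask):
    """Return (convex area, convex perimeter) of the hull around all plant pixels."""
    corners = [(x + dx, y + dy)
               for y, row in enumerate(mask) for x, v in enumerate(row) if v
               for dx in (0, 1) for dy in (0, 1)]
    hull = convex_hull(corners)
    edges = [(hull[i], hull[(i + 1) % len(hull)]) for i in range(len(hull))]
    hull_area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    hull_perimeter = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges)
    return hull_area, hull_perimeter


# L-shaped "plant" in a 3x3 image.
l_shape = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]
print(area(l_shape), perimeter(l_shape))  # 5 5
print(convex_descriptors(l_shape)[0])     # 7.0
```

For the L-shape, the hull covers the full 3x3 square minus one corner triangle of area 2, which is why the convex area (7) exceeds the pixel area (5).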

## 2.3.3 Size Independent Shape Features

Some size independent shape features can be calculated from the size dependent object descriptors. The size independent features used for this study were:

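$$\text{Formfactor} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2} \tag{4}$$

$$\text{Elongatedness} = \frac{\text{Area}}{\text{Thickness}^2} \tag{5}$$

$$\text{Convexity} = \frac{\text{Convex perimeter}}{\text{Perimeter}} \tag{6}$$

$$\text{Solidity} = \frac{\text{Area}}{\text{Convex area}} \tag{7}$$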

All these features are dimensionless. Thickness of an object is defined as twice the number of steps required to eliminate border pixels one layer per step to make the object disappear [8].
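The sketch below computes thickness by peeling border pixels layer by layer, as defined above, and then derives the four shape features. It assumes the commonly used definitions Formfactor = 4π·Area / Perimeter², Elongatedness = Area / Thickness², Convexity = Convex perimeter / Perimeter and Solidity = Area / Convex area, with 4-connectivity for the border test; the numeric descriptor values in the example are hypothetical.

```python
import math


def _is_border(grid, y, x):
    """True if the plant pixel at (y, x) touches soil or the image edge (4-connectivity)."""
    h, w = len(grid), len(grid[0])
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if not (0 <= ny < h and 0 <= nx < w) or grid[ny][nx] == 0:
            return True
    return False


def thickness(mask):
    """Twice the number of border-peeling steps needed to make the object disappear."""
    grid = [row[:] for row in mask]
    steps = 0
    while any(any(row) for row in grid):
        grid = [[1 if grid[y][x] and not _is_border(grid, y, x) else 0
                 for x in range(len(grid[0]))]
                for y in range(len(grid))]
        steps += 1
    return 2 * steps


def shape_features(area, perimeter, convex_area, convex_perimeter, thick):
    """Size-independent shape features from the size-dependent descriptors."""
    return {
        "formfactor": 4 * math.pi * area / perimeter ** 2,
        "elongatedness": area / thick ** 2,
        "convexity": convex_perimeter / perimeter,
        "solidity": area / convex_area,
    }


square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
print(thickness(square))  # 4: two peeling steps erase a 3x3 block

# Hypothetical descriptor values for one plant image.
features = shape_features(area=1200, perimeter=180, convex_area=1500,
                          convex_perimeter=150, thick=20)
print(features["elongatedness"], features["solidity"])  # 3.0 0.8
```

Each feature is a ratio in which the length units cancel, which is why scaling the object leaves the values (ideally) unchanged.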

## 2.3.4 Moment Invariant Features

The moments determine how spread an object's area is [7]. The following moment invariants were considered for this study:

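$$\Phi_1 = \eta_{2,0} + \eta_{0,2} \tag{8}$$

$$\Phi_2 = (\eta_{2,0} - \eta_{0,2})^2 + 4\eta_{1,1}^2 \tag{9}$$

$$\Phi_3 = (\eta_{3,0} - 3\eta_{1,2})^2 + (\eta_{0,3} - 3\eta_{2,1})^2 \tag{10}$$

$$\Phi_4 = (\eta_{3,0} + \eta_{1,2})^2 + (\eta_{0,3} + \eta_{2,1})^2 \tag{11}$$

where

$$\eta_{p,q} = \frac{\mu_{p,q}}{\mu_{0,0}^{\gamma}} \tag{12}$$

Here, 'γ' and 'µp,q' are defined as:

$$\gamma = \frac{p + q}{2} + 1 \tag{13}$$

and

$$\mu_{p,q} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{14}$$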

f(x,y) is '1' for those pairs (x,y) that correspond to plant pixels and '0' for soil pixels, and x̄ and ȳ are the coordinates of the object's centroid. The moment features are invariant to rotation and reflection. The natural logarithm was applied to make the moment invariants more linear. The moment invariant features used were: ln(Φ1) of area, ln(Φ2) of area, ln(Φ3) of area, ln(Φ4) of area, ln(Φ1) of perimeter, ln(Φ2) of perimeter, ln(Φ3) of perimeter and ln(Φ4) of perimeter.
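The moment computation can be sketched in a few lines. The code follows Hu's standard form of the invariants (in which Φ2 uses the difference η2,0 − η0,2) and takes any set of pixel coordinates, so the same function could be applied to all plant pixels ("of area") or only to border pixels ("of perimeter"); that reading of the two variants is an assumption, as are the function names.

```python
import math


def central_moment(pixels, p, q):
    """mu_{p,q}: sum over pixels of (x - x_bar)^p * (y - y_bar)^q, with f(x, y) = 1."""
    n = len(pixels)
    x_bar = sum(x for x, _ in pixels) / n
    y_bar = sum(y for _, y in pixels) / n
    return sum((x - x_bar) ** p * (y - y_bar) ** q for x, y in pixels)


def normalized_moment(pixels, p, q):
    """eta_{p,q} = mu_{p,q} / mu_{0,0}^gamma with gamma = (p + q) / 2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(pixels, p, q) / central_moment(pixels, 0, 0) ** gamma


def moment_invariants(pixels):
    """Return (phi1, phi2, phi3, phi4) for a list of (x, y) pixel coordinates."""
    def eta(p, q):
        return normalized_moment(pixels, p, q)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    phi3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (eta(0, 3) - 3 * eta(2, 1)) ** 2
    phi4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(0, 3) + eta(2, 1)) ** 2
    return phi1, phi2, phi3, phi4


# L-shaped object and the same object rotated by 90 degrees: (x, y) -> (-y, x).
shape = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
rotated = [(-y, x) for x, y in shape]

original_phi = moment_invariants(shape)
rotated_phi = moment_invariants(rotated)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(original_phi, rotated_phi)))  # True

# The feature itself is the natural logarithm of each invariant.
print(math.log(original_phi[0]))
```

Note that ln(Φ) is undefined when an invariant is exactly zero, which can happen for highly symmetric shapes; how such cases were handled is not stated in the text.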

## 2.4 Classification Using Support Vector Machine

An SVM maps feature vectors into an N-dimensional space and uses an (N-1)-dimensional hyperplane as a decision surface to classify data. The task of SVM modeling is to find the optimal hyperplane that separates the classes. A classification task usually involves four steps: vectorizing each instance into an array of features; mapping the instances into a different space via a kernel function; training on the labeled data to find the separating hyperplane with maximal margin; and classifying each new object according to its position relative to the hyperplane. Some training errors are allowed, so training seeks to maximize the margin while minimizing errors.

In this study, the collected data were separated into training and testing sets. Each instance in the training set contains one class label and a set of features. Because feature values can vary over very different ranges, the dataset needs to be normalized so that attributes with larger numeric ranges do not dominate those with smaller ranges. The goal of SVM is to produce a model (based on the training data) which predicts the target values of the testing data given only the testing data attributes [9]. In the training set, each tuple was represented by an n-dimensional feature vector,

$$X = (x_1, x_2, \ldots, x_n), \quad n = 22$$

Here, 'X' depicts 'n' measurements made on the tuple from 'n' features. There are six classes labeled 1 to 6 as listed in TABLE I.
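The normalization mentioned above can be sketched as min-max scaling of every feature to [0, 1], the range used in this study. The function names and the two-feature example rows are illustrative; reusing the training-set minima and maxima for new samples is a common recommendation rather than something the text states.

```python
def fit_min_max(rows):
    """Per-feature minima and maxima computed from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, mins, maxs):
    """Map each feature to [0, 1]; constant features are mapped to 0.0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)]


# Two features on very different scales (e.g. an area and a shape ratio).
training = [[1200, 2.0],
            [800, 4.0],
            [1000, 3.0]]
mins, maxs = fit_min_max(training)
scaled = [scale(row, mins, maxs) for row in training]
print(scaled)  # [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```

Without scaling, the first feature would dominate any distance-based computation simply because its numeric range is hundreds of times larger.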

LIBSVM 2.91 was used for SVM classification. Each feature value of the dataset was scaled to the range [0, 1]. The RBF (Radial-Basis Function) kernel was used for SVM training and testing. Because this kernel nonlinearly maps samples into a higher dimensional space, it can handle the case where the relation between class labels and features is nonlinear [9]. A commonly used radial basis function is:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0 \tag{15}$$

where

$$\lVert x_i - x_j \rVert^2 = (x_i - x_j)^t (x_i - x_j) \tag{16}$$

This RBF kernel requires two parameters: 'γ' and a penalization parameter, 'C'. Appropriate values of 'C' and 'γ' should be calculated to achieve high accuracy rate in classification. For the purpose of this study, selected values of these two parameters were C = 1.00 and γ = 1 / total number of features.
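As a small worked example of the kernel with the stated setting γ = 1 / (number of features), the sketch below evaluates it for 22-dimensional vectors. It only illustrates the kernel value itself, not LIBSVM's training procedure; the function name is illustrative.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


n_features = 22
gamma = 1 / n_features

same = [0.5] * n_features
corner_a = [0.0] * n_features  # all features at the bottom of the [0, 1] range
corner_b = [1.0] * n_features  # all features at the top of the [0, 1] range

print(rbf_kernel(same, same, gamma))          # 1.0: identical vectors
print(rbf_kernel(corner_a, corner_b, gamma))  # about 0.368, i.e. exp(-1)
```

With features scaled to [0, 1], the squared distance between two samples is at most the number of features, so γ = 1/22 keeps the kernel value between exp(-1) and 1 for all 22 features.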

## 2.5 Classification Using Bayesian Classifier

The Bayesian classifier is a fundamental and computationally efficient statistical method. It can be represented in terms of a set of discriminant functions $g_i(x)$, $i = 1, \ldots, c$. The classifier assigns a d-component column vector 'x' to class $w_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$

Minimum-error-rate classification can be achieved with:

$$g_i(x) = \ln P(x \mid w_i) + \ln P(w_i) \tag{17}$$

Here $P(x \mid w_i)$ is the state-conditional probability density function for 'x', that is, the probability density of 'x' given that the class is $w_i$, and $P(w_i)$ is the prior probability of class $w_i$. If the densities $P(x \mid w_i)$ are normal, then $g_i(x)$ can be calculated as:

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(w_i) \tag{18}$$

Here $\mu_i$ is the d-component mean vector and $\Sigma_i$ is the d-by-d covariance matrix of class $w_i$; $|\Sigma_i|$ and $\Sigma_i^{-1}$ are the determinant and inverse of the covariance matrix respectively.
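The Gaussian discriminant function can be evaluated without forming the inverse covariance matrix explicitly. The sketch below solves the linear system by Gaussian elimination and picks the class with the largest discriminant; the two classes, their means, covariances and priors are hypothetical values chosen for illustration, and the function names are not from the source.

```python
import math


def solve_and_logdet(matrix, vector):
    """Solve matrix @ z = vector by Gaussian elimination; also return ln|det(matrix)|."""
    n = len(vector)
    a = [row[:] + [v] for row, v in zip(matrix, vector)]  # augmented matrix
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        z[r] = (a[r][n] - sum(a[r][c] * z[c] for c in range(r + 1, n))) / a[r][r]
    return z, log_det


def discriminant(x, mean, cov, prior):
    """Gaussian discriminant g_i(x) for one class."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z, log_det = solve_and_logdet(cov, diff)           # z = cov^-1 (x - mean)
    mahalanobis = sum(di * zi for di, zi in zip(diff, z))
    return (-0.5 * mahalanobis - 0.5 * d * math.log(2 * math.pi)
            - 0.5 * log_det + math.log(prior))


def classify(x, classes):
    """`classes` maps label -> (mean, cov, prior); returns the label with largest g_i(x)."""
    return max(classes, key=lambda label: discriminant(x, *classes[label]))


# Two hypothetical classes in a 2-feature space.
classes = {
    "crop": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "weed": ([3.0, 3.0], [[1.0, 0.5], [0.5, 1.0]], 0.5),
}
print(classify([0.5, 0.2], classes))  # crop
print(classify([2.8, 3.1], classes))  # weed
```

In a real setting the mean vector and covariance matrix of each class would be estimated from the training samples of that class, and the prior from the class frequencies.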

## III. RESULT AND DISCUSSION

Cross-validation is a common testing procedure. It guards against overfitting because every sample is evaluated by a classifier that was not trained on it. Ten-fold cross-validation was selected for testing. In ten-fold cross-validation, the data set is divided into ten subsets of (approximately) equal size. Each subset in turn is tested using the classifier trained on the remaining nine subsets. Thus each instance of the whole set is predicted exactly once, and the cross-validation accuracy is the percentage of instances that are classified correctly.
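The procedure can be sketched generically as follows. The fold assignment (a seeded shuffle followed by striding) and the toy nearest-mean classifier used as a stand-in are assumptions for illustration; the study itself used the SVM and Bayesian classifiers.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices and split them into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Percentage of samples predicted correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
    return 100.0 * correct / len(samples)


# Stand-in classifier: assign a 1-D sample to the class with the nearest mean.
def train_nearest_mean(xs, ys):
    return {label: sum(x for x, y in zip(xs, ys) if y == label) / ys.count(label)
            for label in set(ys)}


def predict_nearest_mean(model, x):
    return min(model, key=lambda label: abs(model[label] - x))


samples = [0.01 * i for i in range(20)] + [10 + 0.01 * i for i in range(20)]
labels = ["crop"] * 20 + ["weed"] * 20
print(cross_val_accuracy(samples, labels, train_nearest_mean, predict_nearest_mean))  # 100.0

# With 224 samples, ten folds cannot be exactly equal: four folds of 23 and six of 22.
print(sorted(len(fold) for fold in k_fold_indices(224)))
```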

The ten-fold cross-validation accuracies of the support vector machine and the Bayesian classifier using all features were 93.75% and 89.23% respectively over 224 samples. The SVM classifier misclassified no crop image as weed, whereas the Bayesian classifier misclassified two chilli plants as weed. The classification accuracy of the Bayesian classifier is thus lower than that of SVM in this case. The overall classification results are shown in TABLE II and TABLE III.

Table II: Classification result using all features.

| English Name of Samples | Number of Samples | SVM: Number of Misclassified Samples | SVM: Accuracy Rate | Bayesian: Number of Misclassified Samples | Bayesian: Accuracy Rate |
|---|---|---|---|---|---|
| Chilli | 40 | 0 | 100% | 2 | 95% |
| Pigweed | 40 | 5 | 87.5% | 15 | 62.5% |
| Marsh herb | 31 | 2 | 93.5% | 0 | 100% |
| Lamb's quarters | 33 | 5 | 84.84% | 5 | 84.85% |
| Cogongrass | 45 | 0 | 100% | 2 | 95.56% |
| Burcucumber | 35 | 2 | 94.3% | 0 | 100% |

Table III: Success rate comparison using all features.

| Method | Total Number of Samples | Total Number of Misclassifications | Average Success Rate |
|---|---|---|---|
| SVM Classifier | 224 | 14 | 93.75% |
| Bayesian Classifier | 224 | 24 | 89.23% |

Feature reduction is necessary for reducing computational complexity and improving performance by eliminating noisy features. There may be cases when two features carry good classification information when treated separately, but there is little gain if they are combined together in a feature vector because of high mutual correlation [10]. Thus complexity increases without much gain. The main objective of feature reduction is to select features leading to large between-class distances and small within-class variance in the feature vector space [10]. To select the set of features which gives the best classification result, both forward-selection and backward-elimination methods were used. In forward-selection, selection process starts with one feature and other features are added one at a time. At each step, each feature that is not already in the set is tested for inclusion in the set. This process continues until no significant improvement in classification result is observed. In backward-elimination, the process starts with all features included. At each stage, the least significant feature is eliminated from the set. This process continues until a certain criterion is met. These two processes can be combined to find an optimal set of features. This method is called stepwise selection. In stepwise selection, features are added as in forward selection, but after a feature is added, all the features in the set are candidates for backward-elimination.
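The forward-selection half of this procedure can be sketched as a greedy loop. The scoring function here is a toy stand-in with made-up per-feature gains and a small penalty per feature; in the setting described above, the score would be the cross-validated accuracy of the classifier on the candidate feature set. The names `forward_selection`, `toy_score` and `min_gain` are illustrative.

```python
def forward_selection(features, score_fn, min_gain=0.0):
    """Greedily add the feature that most improves score_fn until the gain stops."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Test every feature not yet in the set for inclusion.
        score, feature = max(
            ((score_fn(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if score - best_score <= min_gain:  # no significant improvement: stop
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy score: hypothetical usefulness of each feature minus a cost per feature used.
HYPOTHETICAL_GAIN = {"solidity": 5.0, "convexity": 3.0, "area": 0.2}


def toy_score(subset):
    return sum(HYPOTHETICAL_GAIN[f] for f in subset) - 0.5 * len(subset)


chosen, score = forward_selection(list(HYPOTHETICAL_GAIN), toy_score)
print(chosen, score)  # ['solidity', 'convexity'] 7.0
```

Backward elimination mirrors this loop by starting from the full set and removing the least useful feature at each stage; stepwise selection interleaves the two.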

After feature reduction, a set of eleven features was found for SVM classifier which provides the best classification rate. The best features were: convexity, solidity, mean value of 'r', mean value of 'b', standard deviation of 'r', standard deviation of 'b', ln(Φ1) of area, ln(Φ2) of area, ln(Φ3) of area, ln(Φ4) of area, ln(Φ2) of perimeter.

The ten-fold cross-validation accuracy of the SVM classifier using these eleven features was 98.22%, a significant increase over the accuracy obtained with all features. Only four weed images were misclassified. The classification result using these eleven features is given in TABLE IV.

Table IV: Classification result of SVM using set of best features.

| English Name of Samples | Number of Samples | Number of Misclassified Samples | Success Rate |
|---|---|---|---|
| Chilli | 40 | 0 | 100% |
| Pigweed | 40 | 2 | 95% |
| Marsh herb | 31 | 0 | 100% |
| Lamb's quarters | 33 | 0 | 100% |
| Cogongrass | 45 | 0 | 100% |
| Burcucumber | 35 | 2 | 94.3% |
| Average Success Rate | | | 98.22% |
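The reported average success rate appears consistent with the unweighted mean of the per-class success rates in Table IV, which the following short check illustrates. That this is how the average was computed is an inference from the numbers, not something the text states.

```python
# Per-class success rates (percent) as reported in Table IV.
per_class_rate = {
    "Chilli": 100.0,
    "Pigweed": 95.0,
    "Marsh herb": 100.0,
    "Lamb's quarters": 100.0,
    "Cogongrass": 100.0,
    "Burcucumber": 94.3,
}

average = sum(per_class_rate.values()) / len(per_class_rate)
print(round(average, 2))  # 98.22
```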

For the Bayesian classifier, a different set of eleven features was found to provide the best accuracy. The best features were: perimeter, convex perimeter, formfactor, elongatedness, solidity, mean value of 'r', mean value of 'g', mean value of 'b', standard deviation of 'g', ln(Φ2) of area and ln(Φ3) of perimeter.

The ten-fold cross-validation accuracy of the Bayesian classifier using these eleven features was 95.79%. The accuracy rate of the Bayesian classifier also increases significantly with this set of features, though it remains lower than the accuracy rate obtained using the SVM classifier. The classification result of the Bayesian classifier using these eleven features is given in TABLE V.

Table V: Classification result of Bayesian classifier using set of best features.

| English Name of Samples | Number of Samples | Number of Misclassified Samples | Success Rate |
|---|---|---|---|
| Chilli | 40 | 0 | 100% |
| Pigweed | 40 | 3 | 92.5% |
| Marsh herb | 31 | 2 | 93.55% |
| Lamb's quarters | 33 | 3 | 90.91% |
| Cogongrass | 45 | 1 | 97.78% |
| Burcucumber | 35 | 0 | 100% |
| Average Success Rate | | | 95.79% |

Fig. 3 is a graph of accuracy vs. number of features for Bayesian classifier and SVM classifier. For both classifiers, set of best features was used to determine the corresponding accuracy rate.

Figure 3: Accuracy Rate vs. Number of Features graph for SVM and Bayesian classifier.

It can be seen that the Bayesian classifier provides a higher accuracy rate than SVM when the number of features is 3 or fewer. For 4 to 22 features, however, the SVM accuracy rate is always higher than that of the Bayesian classifier. From this analysis, it can be concluded that the SVM classifier performs better than the Bayesian classifier for crop and weed classification.

## IV. CONCLUSION

A machine vision system based on digital image processing is considered an efficient sensing technique for weed detection. For real-time implementation of such a system, an efficient classification model is required which can classify crops and weeds with high accuracy. The goal of this paper was to test the feasibility of the SVM classifier and the Bayesian classifier for crop and weed classification. The results show that SVM provides the higher accuracy (98.22% versus 95.79% with the best feature sets). The Bayesian classifier, although easy to implement and computationally inexpensive, is less accurate than SVM.