Document Type : Research Paper
Authors
Department of Forestry, Faculty of Natural Resources, University of Guilan, Sowmeh sara, Guilan, Iran
Abstract
Keywords
[Research]
Remote sensing-assisted mapping of quantitative attributes in Zagros open forests of Iran
Soleimannejad L.*, Bonyad A.E., Naghdi R.
Department of Forestry, Faculty of Natural Resources, University of Guilan, Sowmeh sara, Guilan,Iran
* Corresponding author’s E-mail: leilisoleimannejad@yahoo.com (Received: Feb. 12. 2018 Accepted: June 25. 2018)
ABSTRACT
The Zagros forests come as one of the most valuable ecosystems in western Iran. Therefore, accurate and up-to-date information on basal area, canopy cover, and stem number per hectare of these forests are the important factors in the context of forest management and conservation. The main objective of this study was to estimate quantitative forest attributes using Landsat 8-OLI image data and Random Forest, a well-known machine learning technique. The results were shown the lowest out of bag error with the combination of 800 trees and 8 variables in each node as the optimal model parameters to classify forest canopy cover with overall accuracy and Kappa coefficient of 83% and 0.73 respectively, while those of classified mapping of basal area were 78% and 0.72, and also those of stem number per hectare were 75% and 0.69 respectively. All in all, the Random Forest classifier algorithm provided comparatively successful mapping results of quantitative attributes in Zagros open forests of Iran from Landsat 8-OLI image data.
Key words: Random forest classifiers; landsat 8-OLI data; Zagros forests, Iran.
INTRODUCTION
Forests classification and mapping from satellite imagery is an important research topic and has been extensively studied using remote sensing techniques. However, accurate classification and mapping of forest inventory variables from satellite imagery data, across semi-arid areas in particular, is still a challenge. Various classification techniques currently exist, with main categories being supervised or unsupervised, parametric or non-parametric algorithms. Machine learning classifiers are non-parametric supervised classification algorithms out of which Support Vector Machines (SVM) and Random Forest (RF) algorithms are commonly used. Other techniques are also known but rarely used in remote sensing through Iranian literatures.
Compared to other non-parametric classifiers, the RF algorithm renders equally accurate results and is easier to apply (Pal 2005), insensitive to noise or overtraining (Gislason et al. 2006). The RF algorithm is a technique that has become efficient and popular for remote sensing applications such as land cover and image classification (Pal 2005, Prasad et al. 2006) and produces more accurate results compared to other techniques (Li et al. 2014). This method is low cost, requires fewer parameters than other methods, and is therefore easier to operate (Pal 2005). Naghavi et al. (2014) classified forest canopy cover in semi-Mediterranean Zagros forests using two non-parametric algorithms and reported RF regression model outperformed the SVM. Immitzer et al. (2016) and Ng et al. (2017) used RF algorithm and Sentinel2 imagery data for tree species classifications in central Europe and Kenya, respectively.
With respect to forest stands classification and mapping, the Landsat 8-OLI images are typically the most widely used remotely sensed data, on which a number of studies have been recently reported in the context of forest mapping. Poursanidis et al. (2015) compared the Landsat 8-OLI classification results with Landsat 7 ETM+ and Landsat 5 TM, and reported that classification using Landsat 8 provided better results.
Information about the basal area (BA), canopy cover (CC), and stem number per hectare (SPH) are important in the context of forest management across Zagros forests of Iran, which are semi-Mediterranean forests and woodlands affected by several natural and anthropogenic factors. Combining field data with remote sensing data using appropriate classification methods enable the entire geographical area of forest to be classified and mapped. The main purpose of this study was to estimate quantitative forest attributes in the Zagros open forests using Landsat 8-OLI image data and Random Forest, a well-known machine learning classifier.
MATERIALS AND METHODS
Study area
The RF classification algorithm was implemented in Zagros open forest of Iran which includes north of Ilam Province. The study area is located between 46° 27´00’’E and 33° 40´00’’N to 46° 30´00’’ E and 33° 44´00’’N (Fig. 1). The region covers about 2000 ha. Annual mean temperature and total precipitation in the study area are 16.7 °C and 571 mm, respectively. This area is semi-Mediterranean and the dominant tree species is Quercus brantii. Besides, the other tree species such as: Pistacia atlantica, Crataegus speciesandAcer monspessulanum rarely occur.
Fig. 1. The location of the study area in Ilam Province, Iran.
Field measurements
The field survey focused on oak dominance species. 100 sample plots (60 m × 60 m) were totally collected in a 400 × 500 m regular grid in June and July 2014 (Fig. 2).
In each sample plot, CC, BA, and SPH were measured. Tables 1 shows statistical information of the BA (m2 ha-1), CC (percent per hectare), and SPH (stem number per hectare) in study area.
Table 1. Statistical analysis of the estimation of CC, BA and SPH in sample plots.
Sample plots = 100 |
Min |
Max |
Mean |
Std. deviation |
BA(m2 ha-1) |
0* |
18.14 |
3.08 |
3.12 |
CC (%) |
0* |
57.06 |
20.50 |
12.90 |
SPH (stem ha-1) |
0* |
133.00 |
43.26 |
32.09 |
Fig. 2. Position map of sample plots using systematic random sampling in the study area.
Table 2. Forest attributes (Canopy Cover, Basal area, Stem per hectare) classes.
Class number |
|
CC (%) |
BA (m2 ha-1) |
SPH (stems ha-1) |
1 |
|
0-10 (very sparse) |
0-1 |
0-20 |
2 |
|
10-25 (sparse) |
1-3 |
20-40 |
3 |
|
25-50 (semi dense) |
3-5 |
40-60 |
4 |
|
50-75 (dense) |
5-7 |
60-80 |
5 |
|
- |
7-9 |
80-100 |
6 |
|
- |
>9 |
>100 |
Remote sensing data acquisition and pre processing
Landsat 8-OLI Level 1 terrain-corrected (L1T) product has been used. Used Scene was acquired from the United States Geological Survey (USGS) website (path/row: 167/37, acquired on 2014-06-20) for the classification of forest attributes. The on-board sensor for Landsat 8-OLI is composed of 11 bands with 30 m spatial resolution for the bands 1–7 while 9, 15 m for the panchromatic band 8, and also 100 m for the bands 10 and 11. The used scene of Landsat 8 was geometrically corrected by Level 1 product generation system (LPGS) (USGS). Landsat 8-OLI Digital Number (DN) values were converted into the top of atmospheric (TOA) reflectance. We used top-of-atmosphere values to derive synthetic dataset.
Deriving the explanatory variables (Vegetation indices)
A total of 22 explanatory variables were derived from the pre-processed Landsat 8-OLI sensor for the classification of forest attribute. 6
out of 22 were derived from the original bands (i.e. band 2 to band 7) and 16 were derived from the artificial bands (i.e., principal component analysis, and Tasselled cap transformation). The vegetation indices (VIs) are broadly classified into slope-based and distance-based. The slope-based VIs are the Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), and Renormalized Difference Vegetation Index (RDVI). The slope- based VIs were computed using the red and near infrared bands (Mroz & Sobieraj 2004). The distance-based indices are the Soil Adjusted Vegetation Index (SAVI), Difference Vegetation Index (DVI), Modified Second Adjusted Vegetation Index2 (MSAVI2), Perpendicular Vegetation Index (PVI), Perpendicular Vegetation Index 1 (PVI1) and Weighted Difference Vegetation Index (WDVI). All the mentioned distance-based indices were derived using the regression between the infrared band as an independent variable and the red band as a dependent variable as shown in Fig. 2a. The red band was used as the independent variable when computing the distance-based VIs such as the Perpendicular Vegetation Index2 (PVI2) (Ramachandra 2007; Silleos et al. 2006; Mroz & Sobieraj 2004) [Fig. 3(b) and Table 3].
The soil line was obtained via a linear regression between the infrared band and red band for bare soil pixels (Richardson & Weigand 1977; Baret et al. 1993). The more detailed explanation of VIs is summarized in Table 3.
In addition to the VIs as described above, principal component analysis (PCA) including PCA2-4 (2: Blue, 3: Green, 4: Red), PCA5-7, (5: NIR, 6: SWIR1, 7: SWIR2), PCA2-7 were used to provide a standard measure of dimensionality reduction. Moreover, the Tasselled Cap transformation (TCT) was applied to increase the optimal visibility of vegetation cover (Crist & Cicone 1984). In this transformation, the multiple bands in a multispectral image can be visualized by defining an N-dimensional space, where N stands for the number of bands. As such, the yielded three axes can be interpreted as the degree of brightness, greenness, and wetness as calculated by the Tasselled Cap coefficients (Baig et al. 2014). Furthermore, the mean decrease in Gini index was used to find the most important explanatory variables. The mean decrease in Gini index was rendered as an output of RF in R software.
(a) (b)
Fig. 3. The calculated soil line with infrared (a) and red (b) bands as response.
Table 3. Summary of the vegetation indices.
Reference |
Equation vegetation index |
Vegetation index |
Rouse et al. (1974) |
NDVI = (NIR - R) / (NIR + R) |
NDVI |
Richardson & Wiegand (1977) |
DVI = a*NIR - R |
DVI |
Richardson & Wiegand (1977) |
PVI = (NIR – β* R- α) / √ (1+ β2) |
PVI |
Perry & Lautenschlager (1984) |
PVI1 = α* NIR – R + β/ √ (1+α 2) |
PVI1 |
Bannari et al. (1995) |
PVI2 = NIR – α* R + β/ √ (1+ α2) |
PVI2 |
Huete (1988) |
SAVI = [(NIR - R) / (NIR + R + L)]*(1 + L) |
SAVI |
Roujean & Breon (1995) |
RDVI = (NIR –red)/ √ ( NIR + red) |
RDVI |
Qi et al. (1994) |
MSAVI2 = (1/2)(2*(NIR+1)- √ ((2*NIR+1) 2-8*(NIR-R))) |
MSAVI2 |
Richardson & Wiegand (1977) |
RVI = R / NIR |
RVI |
Clevers (1989) |
WDVI = NIR – α*R |
WDVI |
Random forest classification algorithm
The RF classifier, suggested by Breiman (2001) is a supervised classification technique that is based on decision trees and uses improved
bagging and bootstrapping techniques. RF is one of the top supervised classification models that can handle numerous predictor variables and is thus able to cope with complexities in data dimensionality (Cutler et al. 2007). The algorithm is generally a mixture of randomized decision trees and aggregation of average predictions (Nandhini & Porkodi 2017) with a variety of usages in remote sensing-based methods. RF classifier is formed by a large number of trees, each is grown within randomly-selected training pixels, which are then used to train the classifier. On the other hand, the remaining samples called out-of-bag (oob) samples are used for the accuracy assessment. In the current implementation used in this study, there are two parameters needed to be defined, namely: 1) ntree: number of trees to grow and 2) mtry: number of variables to split each node. An optimization of the classifier using these parameters can result in high classification performance.
After the optimization of the final model by 70% data set, the quality of the RF prediction was further checked using 30% of data as test set to estimate overall accuracy (OA), sensitivity, and specificity. The OA is the proportion of observations correctly classified by the predictive model, sensitivity is the rate of observations correctly classified in a target class and specificity is the rate of correctly classified observations of the non-target class (Golino & Gomes 2014). Further the model performance for each class using receiver operating characteristic (ROC) analysis was examined (Fawcett 2006). Since ROC provides a highly visual account of a model performance, the ROC and Area under a ROC curve, a two-dimensional depiction of classifier performance were calculated (Hanley & McNeil 1982; Bradley 1997). With respect to the fact that the AUC is a portion of the area of the square unit, its value ranges between 0 and 1.
In this study, RF was used to classify forest attributes. All the above-mentioned extracted indices were used as the explanatory variables for the classification of the BA, CC and SPH. The error of RF was measured by cross-validations using out of bag samples (Breiman 2001). The optimum values of ntree and mtry were finally selected based on the lowest oob error.
The simplest way to find the optimum value for mtry is the calculation of the square root of the sum of explanatory variables. According to Mohammadi et al. (2017) who suggested increasing a number of mtry to ± 3 for each of the attributes, a range of mtry between 2 and 8 was examined. Also, the parameter of the number of trees across a broad range of values from 100 to 1000 was varied. The optimized RF classification model was employed using a combination of the caret (Kuhn 2015), random forest (Liaw & Wiener 2002), and e1071 (Meyer & Wien 2015) R packages. RF model Loops established using "For" command. In this study, a total of 70 models were tested using different combinations of ntree and mtry values for the selection of final three models based on the lowest oob error (optimized parameter values). The optimum ntree = 300 and mtry = 8 were used to classify BA. Similarly, the optimum ntree = 800 and mtry = 8 were used to classify CC, while the optimum ntree = 200 and mtry = 6 were used for classification of SPH (Fig. 4).
RESULTS
Basal area classification by RF classifiers
The accuracy assessment results indicated the success of RF classification and mapping of BA in the study area from Landsat 8-OLI imagery with OA and Kappa coefficient (KC) values 78% and 0.72, respectively (Table 4). The results indicated that the oob had lowest error in classification and mapping (Fig. 4a) of BA, using RF algorithm by the combination of mtry = 8 and ntree = 300. Furthermore, statistical parameters were calculated and analysed including sensitivity, specificity, OA and KC of each class. The results showed that the highest overall accuracies calculated in the class 5 with 7-9 m2 ha-1 and class 6 with >9 m2 ha-1 (Table 4).
In this study, the ROC and AUC of BA were calculated (Fig. 5). The results indicated that the AUC values in all classes of BA were higher than 0.90. This result indicated that the RF classifiers had accurate prediction in 6 BA classes of forest stands (Fig. 5).
(a)
(b)
(c)
Fig. 4. Effect of the ntree and mtry on the out of bag error for the basal area RF classification (a), canopy cover RF classification (b), stem number per hectare RF classification (c) using the 22 explanatory variables.
Table 4. Per class accuracies of BA using Random Forest classifiers.
Overall accuracy = 78% Kappa coefficient = 0.72 |
||||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||||
|
0-1 |
1-3 |
3-5 |
5-7 |
7-9 |
>9 |
|
|
0-1 |
5 |
1 |
0 |
0 |
0 |
0 |
0.62 |
0.95 |
1-3 |
3 |
4 |
1 |
0 |
0 |
0 |
0.80 |
0.82 |
3-5 |
0 |
0 |
5 |
1 |
0 |
0 |
0.83 |
0.95 |
5-7 |
0 |
0 |
0 |
2 |
0 |
0 |
0.66 |
1.00 |
7-9 |
0 |
0 |
0 |
0 |
3 |
0 |
1.00 |
1.00 |
>9 |
0 |
0 |
0 |
0 |
0 |
2 |
1.00 |
1.00 |
(a) (b)
(c) (d)
(e) (f)
Fig. 5. ROC and AUC curve in six BA classes.
According to mean decrease Gini, values of MSAVI2 and SAVI that used additional bands in the classification of BA, had larger importance than other original Landsat 8-OLI image and additional bands (Fig. 6(. The final results of classification and mapping of BA in Zagros open forest stands in six classes from Landsat 8-OLI image data using RF classifiers were shown in Fig. 7.
Canopy cover classification by RF classifiers
The information about forest canopy cover is important in the context of forest management in this ecosystem. The obtained accuracies indicated OA and KC values of 82% and 0.73, respectively for RF classification of forest canopy cover (Table 5). The lowest amount of oob validation for forest canopy cover classification with the RF algorithm was achieved by the combination of mtry = 8 and ntree = 800 (Fig. 4b). Statistical parameters including sensitivity, specificity, OA and KC indicated that the highest accuracy in term of sensitivity and specificity values was found in CC class 2 with sparse (10-25%) canopy cover (Table 5). The ROC curve and AUC of forest canopy cover were calculated (Fig. 8).
The results indicated that the AUC values in all classes of CC were higher than 0.93. This result indicated that the RF classifiers had well prediction in 4 CC classes of forest stands (Fig. 8).
According to mean decrease Gini, the highest importance values were achieved by RVI and NDVI used as artificial bands in the classification of forest canopy cover (Fig. 9). The results indicated that the PVI had the lowest variable importance for the classification of forest canopy cover in the study area.
The final map of forest canopy cover classification in 4 classes is shown in Fig. 10.
Fig. 6. Statistical analysis of the 22 bands in the classification of BA from mean decrease Gini values.
Fig. 7. Classification map of BA from Landsat 8-OLI imagery using Random Forest classifiers.
Table 5. Per class accuracies of CC using RF classifiers.
Overall accuracy = 82% Kappa coefficient = 0.73 |
||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||
|
very sparse |
sparse |
semi dense |
dense |
|
|
very sparse |
2 |
0 |
0 |
0 |
0.67 |
1.00 |
sparse |
1 |
3 |
1 |
0 |
1.00 |
0.96 |
semi dense |
0 |
0 |
9 |
2 |
0.82 |
0.88 |
dense |
0 |
0 |
2 |
9 |
0.82 |
0.88 |
(a) (b)
(c) (d)
Fig. 8. ROC and area under an ROC curve (AUC) in four CC classes.
Fig. 9. Statistical analysis of the 22 bands in the classification of CC from mean decrease Gini values.
Fig. 10. Classification map of CC from Landsat 8-OLI imagery using Random Forest classifiers.
Stem per hectare classification by RF classifiers
In this study, SPH was accurately classified across the study site in Zagros forests. Information on SPH is crucially important in the context of forest management, particularly across open forests exposed to many anthropogenic and natural hazards.
This classification map will help to assess the status of the Zagros open forest stands for subsequent comparison evaluations to determine the rate of change in SPH in the study area.
The results indicated the success of RF classification and mapping of SPH from Landsat 8-OLI images with OA and KC values as 75% and 0.69, respectively (Table 6). The results have shown that the oob with 13.89% had lowest error (Fig. 4c) of SPH using RF
algorithm with combination of mtry = 6 and ntree = 200. The results have shown the highest overall accuracies calculated in class 5 with 80-100 stems ha-1 and class 6 with >100 stems ha-1 (Table 6). The ROC curve and AUC of SPH were calculated (Fig. 11). The results indicated that the AUC values in all classes of stem per hectare were higher than 0.95. This result indicated the RF classifiers had well prediction in 6 of SPH classes of forest stands (Fig. 11).
The results of this study indicated that the mean decrease Gini values of SAVI and MSAVI2 used in classification of SPH as artificial bands had the highest importance than other bands (Fig. 12). The final results of classification and mapping of forest SPH in Zagros open forest stands in six classes from Landsat 8-OLI image data using RF classifiers are shown in Fig. 13.
Table 6. Per class accuracies of SPH using RF classifiers.
Overall accuracy = 75% Kappa coefficient = 0.69 |
||||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||||
|
0-20 |
20-40 |
40-60 |
60-80 |
80-100 |
>100 |
|
|
0-20 |
4 |
2 |
1 |
0 |
0 |
0 |
0.67 |
0.86 |
20-40 |
2 |
5 |
0 |
0 |
0 |
0 |
0.71 |
0.90 |
40-60 |
0 |
0 |
4 |
2 |
0 |
0 |
0.80 |
0.91 |
60-80 |
0 |
0 |
0 |
3 |
0 |
0 |
0.60 |
1.00 |
80-100 |
0 |
0 |
0 |
0 |
2 |
0 |
1.00 |
1.00 |
>100 |
0 |
0 |
0 |
0 |
0 |
3 |
1.00 |
1.00 |
(a) (b)
(c) (d)
(e) (f)
Fig. 11. ROC and Area under an ROC curve (AUC) in six SPH classes.
Fig. 12. Statistical analysis of 22 bands in the classification of SPH from mean decrease Gini values.
Fig. 13. Classified map of SPH using Landsat 8-OLI by RF classification model.
DISCUSSION AND CONCLUSION
The main objective of this study was mapping of a number of forest attributes on the OLI imagery data using RF algorithm across a portion of open Zagros forest in Iran. The highest accuracy was achieved for CC classification followed by BA and SPH.
The results of classifying CC exceeded those achieved in other comparable studies. Sarouei
(1999) and Naseri (2003) reported that the CC classification on Landsat 7 data in Zagros forests resulted in OA ranging from 66%–70%, KC from 0.3-0.45 using parametric algorithms. Abodlahi et al. (2010) reported OA = 53.68% and 64.54%, as well as KC = 0.21 and 0.43 from Landsat7-ETM+ and IRS-Liss3 using parametric algorithm in Zagros forest respectively. Similarly, Joshi et al. (2006) evaluated CC by ETM+ resulting in OA = 34.4% and KC = 0.23 using Maximum Likelihood Classification and OA = 59% and KC = 0.52 using Artificial Neural Network.
The results of classifying BA and SPH are roughly comparable with Abedi & bonyad (2015) used non parametric k Nearest Neighbor (KNN) algorithm on IRS-P6 LISS III satellite image data in a very dense forest to estimate BA and SPH in the Caspian forests of Iran. They reported 80% and 0.57 of OA and KC values for KNN classification map of BA and OA = 94%, KC=0.79 for that of SPH.
According to the results from statistical analysis such as mean decrease Gini criterion, the RVI and NDVI additional bands were the most important variables for the classification and mapping of CC from Landsat 8-OLI image data in open forest stands of the study area. In other word, the slope based VIs had obviously the largest amount of explanatory power in RF model concerning to CC classification. For BA and SPH, two distance-based vegetation indices were the most important variables. Naghavi et al. (2014) reported that the slope- and distance-based VIs showed higher accuracy for RF model concerning to CC estimation in Zagros open forest in comparison with the applied set of bands including only distance based VIs as explanatory variables. The results showed that soil reflectance has more influence on estimating of BA and SPH. The MSAVI2 and SAVI additional bands were used in the classification of BA and SPH, because the mean decrease Gini index analysis (Figs. 6 and 12) indicated that these two additional bands were the most important variables in the classification and mapping of BA and SPH from Landsat 8-OLI image data in open forest stands of the study area. As a result, distance based VIs have to be used to minimize the effects caused by soil background reflectance. Sivanpillai et al. (2006) reported that transformed bands of ETM+ including RVI, N
[Research]
Remote sensing-assisted mapping of quantitative attributes in Zagros open forests of Iran
Soleimannejad L.*, Bonyad A.E., Naghdi R.
Department of Forestry, Faculty of Natural Resources, University of Guilan, Sowmeh sara, Guilan,Iran
* Corresponding author’s E-mail: leilisoleimannejad@yahoo.com (Received: Feb. 12. 2018 Accepted: June 25. 2018)
ABSTRACT
The Zagros forests come as one of the most valuable ecosystems in western Iran. Therefore, accurate and up-to-date information on basal area, canopy cover, and stem number per hectare of these forests are the important factors in the context of forest management and conservation. The main objective of this study was to estimate quantitative forest attributes using Landsat 8-OLI image data and Random Forest, a well-known machine learning technique. The results were shown the lowest out of bag error with the combination of 800 trees and 8 variables in each node as the optimal model parameters to classify forest canopy cover with overall accuracy and Kappa coefficient of 83% and 0.73 respectively, while those of classified mapping of basal area were 78% and 0.72, and also those of stem number per hectare were 75% and 0.69 respectively. All in all, the Random Forest classifier algorithm provided comparatively successful mapping results of quantitative attributes in Zagros open forests of Iran from Landsat 8-OLI image data.
Key words: Random forest classifiers; landsat 8-OLI data; Zagros forests, Iran.
INTRODUCTION
Forests classification and mapping from satellite imagery is an important research topic and has been extensively studied using remote sensing techniques. However, accurate classification and mapping of forest inventory variables from satellite imagery data, across semi-arid areas in particular, is still a challenge. Various classification techniques currently exist, with main categories being supervised or unsupervised, parametric or non-parametric algorithms. Machine learning classifiers are non-parametric supervised classification algorithms out of which Support Vector Machines (SVM) and Random Forest (RF) algorithms are commonly used. Other techniques are also known but rarely used in remote sensing through Iranian literatures.
Compared to other non-parametric classifiers, the RF algorithm renders equally accurate results and is easier to apply (Pal 2005), insensitive to noise or overtraining (Gislason et al. 2006). The RF algorithm is a technique that has become efficient and popular for remote sensing applications such as land cover and image classification (Pal 2005, Prasad et al. 2006) and produces more accurate results compared to other techniques (Li et al. 2014). This method is low cost, requires fewer parameters than other methods, and is therefore easier to operate (Pal 2005). Naghavi et al. (2014) classified forest canopy cover in semi-Mediterranean Zagros forests using two non-parametric algorithms and reported RF regression model outperformed the SVM. Immitzer et al. (2016) and Ng et al. (2017) used RF algorithm and Sentinel2 imagery data for tree species classifications in central Europe and Kenya, respectively.
With respect to forest stands classification and mapping, the Landsat 8-OLI images are typically the most widely used remotely sensed data, on which a number of studies have been recently reported in the context of forest mapping. Poursanidis et al. (2015) compared the Landsat 8-OLI classification results with Landsat 7 ETM+ and Landsat 5 TM, and reported that classification using Landsat 8 provided better results.
Information about the basal area (BA), canopy cover (CC), and stem number per hectare (SPH) are important in the context of forest management across Zagros forests of Iran, which are semi-Mediterranean forests and woodlands affected by several natural and anthropogenic factors. Combining field data with remote sensing data using appropriate classification methods enable the entire geographical area of forest to be classified and mapped. The main purpose of this study was to estimate quantitative forest attributes in the Zagros open forests using Landsat 8-OLI image data and Random Forest, a well-known machine learning classifier.
MATERIALS AND METHODS
Study area
The RF classification algorithm was implemented in Zagros open forest of Iran which includes north of Ilam Province. The study area is located between 46° 27´00’’E and 33° 40´00’’N to 46° 30´00’’ E and 33° 44´00’’N (Fig. 1). The region covers about 2000 ha. Annual mean temperature and total precipitation in the study area are 16.7 °C and 571 mm, respectively. This area is semi-Mediterranean and the dominant tree species is Quercus brantii. Besides, the other tree species such as: Pistacia atlantica, Crataegus speciesandAcer monspessulanum rarely occur.
Fig. 1. The location of the study area in Ilam Province, Iran.
Field measurements
The field survey focused on oak dominance species. 100 sample plots (60 m × 60 m) were totally collected in a 400 × 500 m regular grid in June and July 2014 (Fig. 2).
In each sample plot, CC, BA, and SPH were measured. Tables 1 shows statistical information of the BA (m2 ha-1), CC (percent per hectare), and SPH (stem number per hectare) in study area.
Table 1. Statistical analysis of the estimation of CC, BA and SPH in sample plots.
Sample plots = 100 |
Min |
Max |
Mean |
Std. deviation |
BA(m2 ha-1) |
0* |
18.14 |
3.08 |
3.12 |
CC (%) |
0* |
57.06 |
20.50 |
12.90 |
SPH (stem ha-1) |
0* |
133.00 |
43.26 |
32.09 |
Fig. 2. Position map of sample plots using systematic random sampling in the study area.
Table 2. Forest attributes (Canopy Cover, Basal area, Stem per hectare) classes.
Class number |
|
CC (%) |
BA (m2 ha-1) |
SPH (stems ha-1) |
1 |
|
0-10 (very sparse) |
0-1 |
0-20 |
2 |
|
10-25 (sparse) |
1-3 |
20-40 |
3 |
|
25-50 (semi dense) |
3-5 |
40-60 |
4 |
|
50-75 (dense) |
5-7 |
60-80 |
5 |
|
- |
7-9 |
80-100 |
6 |
|
- |
>9 |
>100 |
Remote sensing data acquisition and pre processing
Landsat 8-OLI Level 1 terrain-corrected (L1T) product has been used. Used Scene was acquired from the United States Geological Survey (USGS) website (path/row: 167/37, acquired on 2014-06-20) for the classification of forest attributes. The on-board sensor for Landsat 8-OLI is composed of 11 bands with 30 m spatial resolution for the bands 1–7 while 9, 15 m for the panchromatic band 8, and also 100 m for the bands 10 and 11. The used scene of Landsat 8 was geometrically corrected by Level 1 product generation system (LPGS) (USGS). Landsat 8-OLI Digital Number (DN) values were converted into the top of atmospheric (TOA) reflectance. We used top-of-atmosphere values to derive synthetic dataset.
Deriving the explanatory variables (Vegetation indices)
A total of 22 explanatory variables were derived from the pre-processed Landsat 8-OLI sensor for the classification of forest attribute. 6
out of 22 were derived from the original bands (i.e. band 2 to band 7) and 16 were derived from the artificial bands (i.e., principal component analysis, and Tasselled cap transformation). The vegetation indices (VIs) are broadly classified into slope-based and distance-based. The slope-based VIs are the Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), and Renormalized Difference Vegetation Index (RDVI). The slope- based VIs were computed using the red and near infrared bands (Mroz & Sobieraj 2004). The distance-based indices are the Soil Adjusted Vegetation Index (SAVI), Difference Vegetation Index (DVI), Modified Second Adjusted Vegetation Index2 (MSAVI2), Perpendicular Vegetation Index (PVI), Perpendicular Vegetation Index 1 (PVI1) and Weighted Difference Vegetation Index (WDVI). All the mentioned distance-based indices were derived using the regression between the infrared band as an independent variable and the red band as a dependent variable as shown in Fig. 2a. The red band was used as the independent variable when computing the distance-based VIs such as the Perpendicular Vegetation Index2 (PVI2) (Ramachandra 2007; Silleos et al. 2006; Mroz & Sobieraj 2004) [Fig. 3(b) and Table 3].
The soil line was obtained via a linear regression between the infrared band and red band for bare soil pixels (Richardson & Weigand 1977; Baret et al. 1993). The more detailed explanation of VIs is summarized in Table 3.
In addition to the VIs as described above, principal component analysis (PCA) including PCA2-4 (2: Blue, 3: Green, 4: Red), PCA5-7, (5: NIR, 6: SWIR1, 7: SWIR2), PCA2-7 were used to provide a standard measure of dimensionality reduction. Moreover, the Tasselled Cap transformation (TCT) was applied to increase the optimal visibility of vegetation cover (Crist & Cicone 1984). In this transformation, the multiple bands in a multispectral image can be visualized by defining an N-dimensional space, where N stands for the number of bands. As such, the yielded three axes can be interpreted as the degree of brightness, greenness, and wetness as calculated by the Tasselled Cap coefficients (Baig et al. 2014). Furthermore, the mean decrease in Gini index was used to find the most important explanatory variables. The mean decrease in Gini index was rendered as an output of RF in R software.
(a) (b)
Fig. 3. The calculated soil line with infrared (a) and red (b) bands as response.
Table 3. Summary of the vegetation indices.
Reference |
Equation vegetation index |
Vegetation index |
Rouse et al. (1974) |
NDVI = (NIR - R) / (NIR + R) |
NDVI |
Richardson & Wiegand (1977) |
DVI = a*NIR - R |
DVI |
Richardson & Wiegand (1977) |
PVI = (NIR – β* R- α) / √ (1+ β2) |
PVI |
Perry & Lautenschlager (1984) |
PVI1 = α* NIR – R + β/ √ (1+α 2) |
PVI1 |
Bannari et al. (1995) |
PVI2 = NIR – α* R + β/ √ (1+ α2) |
PVI2 |
Huete (1988) |
SAVI = [(NIR - R) / (NIR + R + L)]*(1 + L) |
SAVI |
Roujean & Breon (1995) |
RDVI = (NIR –red)/ √ ( NIR + red) |
RDVI |
Qi et al. (1994) |
MSAVI2 = (1/2)(2*(NIR+1)- √ ((2*NIR+1) 2-8*(NIR-R))) |
MSAVI2 |
Richardson & Wiegand (1977) |
RVI = R / NIR |
RVI |
Clevers (1989) |
WDVI = NIR – α*R |
WDVI |
Random forest classification algorithm
The RF classifier, suggested by Breiman (2001) is a supervised classification technique that is based on decision trees and uses improved
bagging and bootstrapping techniques. RF is one of the top supervised classification models that can handle numerous predictor variables and is thus able to cope with complexities in data dimensionality (Cutler et al. 2007). The algorithm is generally a mixture of randomized decision trees and aggregation of average predictions (Nandhini & Porkodi 2017) with a variety of usages in remote sensing-based methods. RF classifier is formed by a large number of trees, each is grown within randomly-selected training pixels, which are then used to train the classifier. On the other hand, the remaining samples called out-of-bag (oob) samples are used for the accuracy assessment. In the current implementation used in this study, there are two parameters needed to be defined, namely: 1) ntree: number of trees to grow and 2) mtry: number of variables to split each node. An optimization of the classifier using these parameters can result in high classification performance.
After the optimization of the final model by 70% data set, the quality of the RF prediction was further checked using 30% of data as test set to estimate overall accuracy (OA), sensitivity, and specificity. The OA is the proportion of observations correctly classified by the predictive model, sensitivity is the rate of observations correctly classified in a target class and specificity is the rate of correctly classified observations of the non-target class (Golino & Gomes 2014). Further the model performance for each class using receiver operating characteristic (ROC) analysis was examined (Fawcett 2006). Since ROC provides a highly visual account of a model performance, the ROC and Area under a ROC curve, a two-dimensional depiction of classifier performance were calculated (Hanley & McNeil 1982; Bradley 1997). With respect to the fact that the AUC is a portion of the area of the square unit, its value ranges between 0 and 1.
In this study, RF was used to classify forest attributes. All the above-mentioned extracted indices were used as the explanatory variables for the classification of the BA, CC and SPH. The error of RF was measured by cross-validations using out of bag samples (Breiman 2001). The optimum values of ntree and mtry were finally selected based on the lowest oob error.
The simplest way to find the optimum value for mtry is the calculation of the square root of the sum of explanatory variables. According to Mohammadi et al. (2017) who suggested increasing a number of mtry to ± 3 for each of the attributes, a range of mtry between 2 and 8 was examined. Also, the parameter of the number of trees across a broad range of values from 100 to 1000 was varied. The optimized RF classification model was employed using a combination of the caret (Kuhn 2015), random forest (Liaw & Wiener 2002), and e1071 (Meyer & Wien 2015) R packages. RF model Loops established using "For" command. In this study, a total of 70 models were tested using different combinations of ntree and mtry values for the selection of final three models based on the lowest oob error (optimized parameter values). The optimum ntree = 300 and mtry = 8 were used to classify BA. Similarly, the optimum ntree = 800 and mtry = 8 were used to classify CC, while the optimum ntree = 200 and mtry = 6 were used for classification of SPH (Fig. 4).
RESULTS
Basal area classification by RF classifiers
The accuracy assessment results indicated the success of RF classification and mapping of BA in the study area from Landsat 8-OLI imagery with OA and Kappa coefficient (KC) values 78% and 0.72, respectively (Table 4). The results indicated that the oob had lowest error in classification and mapping (Fig. 4a) of BA, using RF algorithm by the combination of mtry = 8 and ntree = 300. Furthermore, statistical parameters were calculated and analysed including sensitivity, specificity, OA and KC of each class. The results showed that the highest overall accuracies calculated in the class 5 with 7-9 m2 ha-1 and class 6 with >9 m2 ha-1 (Table 4).
In this study, the ROC and AUC of BA were calculated (Fig. 5). The results indicated that the AUC values in all classes of BA were higher than 0.90. This result indicated that the RF classifiers had accurate prediction in 6 BA classes of forest stands (Fig. 5).
(a)
(b)
(c)
Fig. 4. Effect of the ntree and mtry on the out of bag error for the basal area RF classification (a), canopy cover RF classification (b), stem number per hectare RF classification (c) using the 22 explanatory variables.
Table 4. Per class accuracies of BA using Random Forest classifiers.
Overall accuracy = 78% Kappa coefficient = 0.72 |
||||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||||
|
0-1 |
1-3 |
3-5 |
5-7 |
7-9 |
>9 |
|
|
0-1 |
5 |
1 |
0 |
0 |
0 |
0 |
0.62 |
0.95 |
1-3 |
3 |
4 |
1 |
0 |
0 |
0 |
0.80 |
0.82 |
3-5 |
0 |
0 |
5 |
1 |
0 |
0 |
0.83 |
0.95 |
5-7 |
0 |
0 |
0 |
2 |
0 |
0 |
0.66 |
1.00 |
7-9 |
0 |
0 |
0 |
0 |
3 |
0 |
1.00 |
1.00 |
>9 |
0 |
0 |
0 |
0 |
0 |
2 |
1.00 |
1.00 |
(a) (b)
(c) (d)
(e) (f)
Fig. 5. ROC and AUC curve in six BA classes.
According to mean decrease Gini, values of MSAVI2 and SAVI that used additional bands in the classification of BA, had larger importance than other original Landsat 8-OLI image and additional bands (Fig. 6(. The final results of classification and mapping of BA in Zagros open forest stands in six classes from Landsat 8-OLI image data using RF classifiers were shown in Fig. 7.
Canopy cover classification by RF classifiers
The information about forest canopy cover is important in the context of forest management in this ecosystem. The obtained accuracies indicated OA and KC values of 82% and 0.73, respectively for RF classification of forest canopy cover (Table 5). The lowest amount of oob validation for forest canopy cover classification with the RF algorithm was achieved by the combination of mtry = 8 and ntree = 800 (Fig. 4b). Statistical parameters including sensitivity, specificity, OA and KC indicated that the highest accuracy in term of sensitivity and specificity values was found in CC class 2 with sparse (10-25%) canopy cover (Table 5). The ROC curve and AUC of forest canopy cover were calculated (Fig. 8).
The results indicated that the AUC values in all classes of CC were higher than 0.93. This result indicated that the RF classifiers had well prediction in 4 CC classes of forest stands (Fig. 8).
According to mean decrease Gini, the highest importance values were achieved by RVI and NDVI used as artificial bands in the classification of forest canopy cover (Fig. 9). The results indicated that the PVI had the lowest variable importance for the classification of forest canopy cover in the study area.
The final map of forest canopy cover classification in 4 classes is shown in Fig. 10.
Fig. 6. Statistical analysis of the 22 bands in the classification of BA from mean decrease Gini values.
Fig. 7. Classification map of BA from Landsat 8-OLI imagery using Random Forest classifiers.
Table 5. Per class accuracies of CC using RF classifiers.
Overall accuracy = 82% Kappa coefficient = 0.73 |
||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||
|
very sparse |
sparse |
semi dense |
dense |
|
|
very sparse |
2 |
0 |
0 |
0 |
0.67 |
1.00 |
sparse |
1 |
3 |
1 |
0 |
1.00 |
0.96 |
semi dense |
0 |
0 |
9 |
2 |
0.82 |
0.88 |
dense |
0 |
0 |
2 |
9 |
0.82 |
0.88 |
(a) (b)
(c) (d)
Fig. 8. ROC and area under an ROC curve (AUC) in four CC classes.
Fig. 9. Statistical analysis of the 22 bands in the classification of CC from mean decrease Gini values.
Fig. 10. Classification map of CC from Landsat 8-OLI imagery using Random Forest classifiers.
Stem per hectare classification by RF classifiers
In this study, SPH was accurately classified across the study site in Zagros forests. Information on SPH is crucially important in the context of forest management, particularly across open forests exposed to many anthropogenic and natural hazards.
This classification map will help to assess the status of the Zagros open forest stands for subsequent comparison evaluations to determine the rate of change in SPH in the study area.
The results indicated the success of RF classification and mapping of SPH from Landsat 8-OLI images with OA and KC values as 75% and 0.69, respectively (Table 6). The results have shown that the oob with 13.89% had lowest error (Fig. 4c) of SPH using RF
algorithm with combination of mtry = 6 and ntree = 200. The results have shown the highest overall accuracies calculated in class 5 with 80-100 stems ha-1 and class 6 with >100 stems ha-1 (Table 6). The ROC curve and AUC of SPH were calculated (Fig. 11). The results indicated that the AUC values in all classes of stem per hectare were higher than 0.95. This result indicated the RF classifiers had well prediction in 6 of SPH classes of forest stands (Fig. 11).
The results of this study indicated that the mean decrease Gini values of SAVI and MSAVI2 used in classification of SPH as artificial bands had the highest importance than other bands (Fig. 12). The final results of classification and mapping of forest SPH in Zagros open forest stands in six classes from Landsat 8-OLI image data using RF classifiers are shown in Fig. 13.
Table 6. Per class accuracies of SPH using RF classifiers.
Overall accuracy = 75% Kappa coefficient = 0.69 |
||||||||
Prediction |
Reference |
Sensitivity |
Specificity |
|||||
|
0-20 |
20-40 |
40-60 |
60-80 |
80-100 |
>100 |
|
|
0-20 |
4 |
2 |
1 |
0 |
0 |
0 |
0.67 |
0.86 |
20-40 |
2 |
5 |
0 |
0 |
0 |
0 |
0.71 |
0.90 |
40-60 |
0 |
0 |
4 |
2 |
0 |
0 |
0.80 |
0.91 |
60-80 |
0 |
0 |
0 |
3 |
0 |
0 |
0.60 |
1.00 |
80-100 |
0 |
0 |
0 |
0 |
2 |
0 |
1.00 |
1.00 |
>100 |
0 |
0 |
0 |
0 |
0 |
3 |
1.00 |
1.00 |
(a) (b)
(c) (d)
(e) (f)
Fig. 11. ROC and Area under an ROC curve (AUC) in six SPH classes.
Fig. 12. Statistical analysis of 22 bands in the classification of SPH from mean decrease Gini values.
Fig. 13. Classified map of SPH using Landsat 8-OLI by RF classification model.
DISCUSSION AND CONCLUSION
The main objective of this study was mapping of a number of forest attributes on the OLI imagery data using RF algorithm across a portion of open Zagros forest in Iran. The highest accuracy was achieved for CC classification followed by BA and SPH.
The results of classifying CC exceeded those achieved in other comparable studies. Sarouei
(1999) and Naseri (2003) reported that the CC classification on Landsat 7 data in Zagros forests resulted in OA ranging from 66%–70%, KC from 0.3-0.45 using parametric algorithms. Abodlahi et al. (2010) reported OA = 53.68% and 64.54%, as well as KC = 0.21 and 0.43 from Landsat7-ETM+ and IRS-Liss3 using parametric algorithm in Zagros forest respectively. Similarly, Joshi et al. (2006) evaluated CC by ETM+ resulting in OA = 34.4% and KC = 0.23 using Maximum Likelihood Classification and OA = 59% and KC = 0.52 using Artificial Neural Network.
The results of classifying BA and SPH are roughly comparable with Abedi & bonyad (2015) used non parametric k Nearest Neighbor (KNN) algorithm on IRS-P6 LISS III satellite image data in a very dense forest to estimate BA and SPH in the Caspian forests of Iran. They reported 80% and 0.57 of OA and KC values for KNN classification map of BA and OA = 94%, KC=0.79 for that of SPH.
According to the results from statistical analysis such as mean decrease Gini criterion, the RVI and NDVI additional bands were the most important variables for the classification and mapping of CC from Landsat 8-OLI image data in open forest stands of the study area. In other word, the slope based VIs had obviously the largest amount of explanatory power in RF model concerning to CC classification. For BA and SPH, two distance-based vegetation indices were the most important variables. Naghavi et al. (2014) reported that the slope- and distance-based VIs showed higher accuracy for RF model concerning to CC estimation in Zagros open forest in comparison with the applied set of bands including only distance based VIs as explanatory variables. The results showed that soil reflectance has more influence on estimating of BA and SPH. The MSAVI2 and SAVI additional bands were used in the classification of BA and SPH, because the mean decrease Gini index analysis (Figs. 6 and 12) indicated that these two additional bands were the most important variables in the classification and mapping of BA and SPH from Landsat 8-OLI image data in open forest stands of the study area. As a result, distance based VIs have to be used to minimize the effects caused by soil background reflectance. Sivanpillai et al. (2006) reported that transformed bands of ETM+ including RVI, NDVI and TCT components did not result in any improvement in estimating SPH in east Texas forest. The present results indicated that the RF classifier algorithm provided comparatively successful mapping results of quantitative attributes in Zagros open forests of Iran from Landsat 8-OLI image data.
DVI and TCT components did not result in any improvement in estimating SPH in east Texas forest. The present results indicated that the RF classifier algorithm provided comparatively successful mapping results of quantitative attributes in Zagros open forests of Iran from Landsat 8-OLI image data.