Application of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams


1 *, and
1- Department of Environmental Sciences, Faculty of Natural Resources, University of Guilan, P.O. Box 1144, Sowmeh Sara, Guilan, Iran
2- Department of Applied Ecology, Ghent University, J. Plateaustraat 22, B-9000 Gent
* Corresponding author's E-mail:

2 O. Rafieyan*1, A. A. Darvishsefat2, S. Babaii1, A. Mataji1


A support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico-chemical variables were used as inputs for model development. The biotic variable consisted solely of abundance data, which were used to predict the presence/absence of roach. A genetic algorithm (GA) was combined with the SVM to select the most important predictors of roach presence/absence at the sampling sites. Before and after variable selection, the SVMs were evaluated and compared using two performance measures: the percentage of Correctly Classified Instances (CCI %) and Cohen's kappa statistic (κ). The results showed that the SVM performed reliably before variable selection and that prediction improved further once the SVM was combined with the GA. According to the attribute weights, habitat variables contributed more than physico-chemical ones to assessing the presence/absence of fish in the streams. The GA likewise indicated that roach depend more on habitat variables than on water quality variables. Although predictive performance increased after variable selection, the SVM attribute weights could serve as an alternative to the GA, since all input variables can be evaluated in terms of their weights.
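
The two performance measures named in the abstract can be illustrated with a minimal sketch. It assumes a 2x2 presence/absence confusion matrix; the function name `cci_and_kappa` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def cci_and_kappa(tp, fp, fn, tn):
    """Return (CCI %, Cohen's kappa) for a 2x2 presence/absence confusion matrix.

    tp: sites predicted present and observed present
    fp: sites predicted present but observed absent
    fn: sites predicted absent but observed present
    tn: sites predicted absent and observed absent
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: the fraction of correctly classified instances.
    observed = (tp + tn) / n

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)

    if expected == 1.0:
        # Degenerate case (all sites in one class): kappa is undefined,
        # so this sketch reports 0.0 by convention.
        kappa = 0.0
    else:
        kappa = (observed - expected) / (1.0 - expected)

    return 100.0 * observed, kappa


# Illustrative counts only (hypothetical, not data from the paper).
cci, kappa = cci_and_kappa(tp=40, fp=10, fn=5, tn=45)
print(f"CCI = {cci:.1f} %, kappa = {kappa:.2f}")
```

Unlike CCI, kappa corrects for the agreement that class prevalence alone would produce, which is presumably why the two measures are reported together.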

