tulika goyal

Comparing a Random Forest to a CART model (Part 2)

Random forest is one of the most commonly used algorithms in Kaggle competitions. Along with good predictive power, random forest models are fairly simple to build. We have previously explained the random forest algorithm ( Introduction to Random Forest ). This article is the second part of a series comparing a random forest with a CART model. In the first article, we used a built-in R dataset to predict the species of a flower. In this article, we will build a random forest model on the same dataset and compare its performance with the previously built CART model. I ran this experiment a week ago and found the results very insightful. I recommend reading the first part of this series (Last article) before reading this one.

Background on Dataset “Iris” 

The “iris” data set gives the measurements, in centimeters, of four variables: sepal length, sepal width, petal length, and petal width, for 50 flowers from each of 3 species of Iris. The dataset has 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species. We intend to predict the species from the 4 flower measurement variables.

We will first load the dataset into R and then look at some key statistics. You can use the following code to do so.

# qplot() comes from the ggplot2 package
library(ggplot2)
# load the built-in dataset
data(iris)
# look at the dataset
summary(iris)
# visually look at the dataset
qplot(Petal.Length, Petal.Width, colour = Species, data = iris)
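
As a quick sanity check, the dataset description above (150 rows, 5 columns, 50 flowers per species) can be confirmed with a few base R calls. This is only a minimal sketch; the variable names `iris_dim` and `species_counts` are introduced here for illustration and are not part of the original walkthrough.

```r
# Load the built-in dataset (base R, no extra packages needed)
data(iris)

# Number of rows and columns
iris_dim <- dim(iris)
print(iris_dim)          # 150 5

# Column names: four measurements plus the Species label
print(names(iris))

# Number of flowers recorded for each species
species_counts <- table(iris$Species)
print(species_counts)    # 50 each for setosa, versicolor, virginica
```

If the counts per species were unequal, a stratified train/test split would matter more; here the three classes are balanced.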

tulika goyal Creator

B-tech 2nd year student of polymer science.
