Kim Dudu's Blog
R - What is boosting.cv?
I looked into it using ?boosting.cv.
Description: The data are divided into v non-overlapping subsets of roughly equal size. Then, boosting is applied on (v-1) of the subsets. Finally, predictions are made for the left-out subset, and the process is repeated for each of the v sets.
So it seems to mean: do v-fold cross-validation, fitting boosting on v-1 of the subsets each time. (Searching around, I found a blog that explains it very well!!)
Boosting and bagging are both ensemble methods, but they differ: with bagging, if you build 100 models, they are all built in parallel, whereas with boosting each model influences the next. (Roughly the difference between independent and dependent, I'd say?)
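To make the "dependent" part concrete, here is a minimal base-R sketch of AdaBoost-style reweighting (not adabag's actual code; the miss vector is made up for illustration): after each model, misclassified observations get more weight, so the next model depends on the previous one, whereas bagging would simply draw a fresh, independent bootstrap sample for each model.

```r
n <- 10
w <- rep(1 / n, n)                            # AdaBoost starts with uniform weights
miss <- c(TRUE, FALSE, TRUE, rep(FALSE, 7))   # pretend model 1 misclassified obs 1 and 3
err <- sum(w[miss])                           # weighted error of model 1 (0.2 here)
alpha <- 0.5 * log((1 - err) / err)           # Breiman-style coefficient
w <- w * exp(alpha * ifelse(miss, 1, -1))     # upweight mistakes, downweight hits
w <- w / sum(w)                               # renormalize to sum to 1
w   # misclassified observations now carry more weight for model 2
```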
Usage:
boosting.cv(formula, data, v = 10, boos = TRUE, mfinal = 100,
coeflearn = "Breiman", control, par=FALSE)
### Arguments ###
formula
a formula, as in the lm function.
data
a data frame in which to interpret the variables named in formula
boos
if TRUE (by default), a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. If FALSE, every observation is used with its weights.
v
An integer, specifying the type of v-fold cross validation. Defaults to 10. If v is set as the number of observations, leave-one-out cross validation is carried out. Besides this, every value between two and the number of observations is valid and means that roughly every v-th observation is left out.
mfinal
an integer, the number of iterations for which boosting is run or the number of trees to use. Defaults to mfinal=100 iterations.
coeflearn
if 'Breiman'(by default), alpha=1/2ln((1-err)/err) is used. If 'Freund' alpha=ln((1-err)/err) is used. In both cases the AdaBoost.M1 algorithm is used and alpha is the weight updating coefficient. On the other hand, if coeflearn is 'Zhu' the SAMME algorithm is implemented with alpha=ln((1-err)/err)+ ln(nclasses-1).
control
options that control details of the rpart algorithm. See rpart.control for more details.
par
if TRUE, the cross validation process is run in parallel. If FALSE (by default), the function runs without parallelization.
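The coeflearn descriptions above boil down to three formulas for the weight-updating coefficient alpha. A quick sketch comparing them at the same error rate (the helper function names are mine for illustration, not part of adabag; err is the weighted error, nclasses the number of classes):

```r
## alpha for each coeflearn option, per the argument description above
alpha_breiman <- function(err) 0.5 * log((1 - err) / err)
alpha_freund  <- function(err) log((1 - err) / err)
alpha_zhu     <- function(err, nclasses) log((1 - err) / err) + log(nclasses - 1)

alpha_breiman(0.1)   # 0.5 * ln(9)   ~ 1.099
alpha_freund(0.1)    # ln(9)         ~ 2.197
alpha_zhu(0.1, 3)    # ln(9) + ln(2) ~ 2.890
```

So 'Freund' updates weights exactly twice as aggressively as 'Breiman', and 'Zhu' (SAMME) adds a ln(nclasses-1) term for the multi-class case.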
The usage looks identical to that of the boosting function.
# Usage of the boosting function
boosting(formula, data, boos = TRUE, mfinal = 100, coeflearn = 'Breiman',
control,...)
Example)
## rpart and adabag libraries should be loaded
library(rpart)
library(adabag)
data(iris)
iris.boostcv <- boosting.cv(Species ~ ., v = 2, data = iris, mfinal = 5,
                            control = rpart.control(cp = 0.01))
iris.boostcv[-1]
Source: https://www.rdocumentation.org/packages/adabag/versions/4.2/topics/boosting.cv (boosting.cv function - RDocumentation)
Alfaro, E., Gamez, M. and Garcia, N. (2013): "adabag: An R Package for Classification with Boosting and Bagging". Journal of Statistical Software, Vol 54, 2, pp. 1-35.
I had only heard about it taking a while, but I never expected it to take this long..;;
> adaboost_cv=boosting.cv(Species~.,data=iris)
i: 1 Sat May 15 16:06:11 2021
i: 2 Sat May 15 16:07:10 2021
i: 3 Sat May 15 16:08:07 2021
i: 4 Sat May 15 16:09:11 2021
The fifth fold is just printing now... how long do I have to wait for you...
> adaboost_cv=boosting.cv(Species~.,data=iris)
i: 1 Sat May 15 16:17:16 2021
i: 2 Sat May 15 16:18:27 2021
i: 3 Sat May 15 16:19:42 2021
i: 4 Sat May 15 16:20:58 2021
i: 5 Sat May 15 16:22:11 2021
i: 6 Sat May 15 16:23:20 2021
i: 7 Sat May 15 16:24:33 2021
i: 8 Sat May 15 16:25:50 2021
i: 9 Sat May 15 16:27:05 2021
i: 10 Sat May 15 16:28:16 2021
Each line (fold) took about a minute. (The par = TRUE argument described above runs the folds in parallel, which might have helped here.)
A nice thing about boosting and boosting.cv is that you can get a confusion matrix through $confusion without having to build a table yourself!
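For example, here is how error rates can be read off a predicted-vs-observed table like the one $confusion returns; the numbers below are made up for illustration, not real boosting.cv output:

```r
## A hypothetical 3-class confusion matrix (predicted rows, observed columns)
conf <- matrix(c(50,  0,  0,
                  0, 47,  3,
                  0,  4, 46), nrow = 3, byrow = TRUE,
               dimnames = list(Predicted = c("setosa", "versicolor", "virginica"),
                               Observed  = c("setosa", "versicolor", "virginica")))
error_rate <- 1 - sum(diag(conf)) / sum(conf)   # overall misclassification rate
per_class  <- diag(conf) / colSums(conf)        # accuracy per observed class
error_rate   # 7 / 150, about 0.047
```

With a real fit you would use model$confusion the same way, and boosting.cv also returns the overall rate directly in model$error.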