scikit-learn - Pruning and Boosting in Decision Trees
How can I use pruning and boosting in a decision-tree-based classification approach?
I have 10 features and 3000 samples.
Here is an example that demonstrates how to use boosting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier
    # Note: sklearn.cross_validation was removed in scikit-learn 0.20; newer versions
    # provide StratifiedShuffleSplit in sklearn.model_selection with a different
    # call signature (n_splits=..., then split(X, y)).
    from sklearn.cross_validation import StratifiedShuffleSplit
    from sklearn.metrics import classification_report

    # Generate artificial data
    X, y = make_classification(n_samples=3000, n_features=10, n_informative=2,
                               flip_y=0.1, weights=[0.15, 0.85], random_state=0)

    # Train/test split
    split = StratifiedShuffleSplit(y, n_iter=1, test_size=0.2, random_state=0)
    train_index, test_index = list(split)[0]
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

    # Boosting: many weak classifiers (max_depth=1) refined sequentially;
    # a tree is the default base classifier
    estimator = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                           max_depth=1, random_state=0)
    estimator.fit(X_train, y_train)
    y_pred = estimator.predict(X_test)
    print(classification_report(y_test, y_pred))

Output:

                 precision    recall  f1-score   support

              0       0.88      0.80      0.84       109
              1       0.96      0.98      0.97       491

    avg / total       0.94      0.94      0.94       600

    # Benchmark: a standard tree, limited to depth 3
    # (class_weight='auto' was later replaced by class_weight='balanced')
    tree_benchmark = DecisionTreeClassifier(max_depth=3, class_weight='auto')
    tree_benchmark.fit(X_train, y_train)
    y_pred_benchmark = tree_benchmark.predict(X_test)
    print(classification_report(y_test, y_pred_benchmark))

Output:

                 precision    recall  f1-score   support

              0       0.63      0.84      0.72       109
              1       0.96      0.89      0.92       491

    avg / total       0.90      0.88      0.89       600
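The example above covers boosting and only limits tree size through max_depth. For the pruning half of the question, here is a minimal sketch using cost-complexity pruning, assuming scikit-learn 0.22 or later (where the ccp_alpha parameter and cost_complexity_pruning_path exist). It reuses the same synthetic data. The extra validation split and the thinning of the candidate alphas to about 30 values are my own choices for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data as in the boosting example
X, y = make_classification(n_samples=3000, n_features=10, n_informative=2,
                           flip_y=0.1, weights=[0.15, 0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Assumption: hold out part of the training data to choose the pruning strength
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0)

# Candidate alphas come from the pruning path of a fully grown tree.
# Clip at zero to guard against tiny negative values from rounding.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_fit, y_fit)
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))
alphas = alphas[::max(1, len(alphas) // 30)]  # thin the list to keep this fast

# Pick the alpha with the best validation accuracy;
# on ties, prefer the larger alpha (the smaller tree)
best_alpha, best_score = 0.0, -1.0
for alpha in alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_fit, y_fit)
    score = tree.score(X_val, y_val)
    if score >= best_score:
        best_alpha, best_score = alpha, score

# Compare an unpruned tree with the pruned one on the full training set
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)

print("nodes (unpruned, pruned):", unpruned.tree_.node_count, pruned.tree_.node_count)
print("test accuracy (unpruned, pruned):",
      unpruned.score(X_test, y_test), pruned.score(X_test, y_test))
```

A larger ccp_alpha removes more subtrees, so the pruned tree never has more nodes than the unpruned one. Cross-validation (for example GridSearchCV over ccp_alpha) would be a more robust way to choose the value than a single validation split.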