random forest - How to get the probability per instance in classification models in spark.mllib


I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. These packages produce classification models that predict a specific class per instance. In Weka, I can get the exact probability of each class for each instance. How can I do this using these packages?

In LogisticRegressionModel I can set the threshold, so I've created a function that checks the results for each point at different thresholds. This cannot be done for RandomForest (see "How to set cutoff while training the data in Random Forest in Spark").
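The threshold check described above can be sketched without Spark at all. This is a minimal illustration, assuming a binary logistic regression whose weights and intercept have been read off the trained model; the numeric values and the helper names (`positive_probability`, `sweep_thresholds`) are hypothetical. As far as I know, calling `clearThreshold()` on an MLlib LogisticRegressionModel makes `predict` return this raw score instead of a 0/1 label, which gives the same number.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def positive_probability(weights, intercept, features):
    """Probability of the positive class for one instance."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    return sigmoid(margin)


def sweep_thresholds(weights, intercept, features, thresholds):
    """Return the probability and the 0/1 label obtained at each threshold."""
    p = positive_probability(weights, intercept, features)
    return p, {t: (1 if p > t else 0) for t in thresholds}


# Hypothetical model parameters and data point, for illustration only.
weights = [0.8, -1.2]
intercept = 0.1
point = [1.0, 0.5]

prob, labels = sweep_thresholds(weights, intercept, point, [0.3, 0.5, 0.7])
print(prob, labels)
```

Here the probability is computed once and only the comparison changes per threshold, so sweeping many thresholds is cheap.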

Unfortunately, MLlib can't provide per-instance probabilities for classification models up to and including version 1.4.1.
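A workaround that is sometimes used for random forests is to take the fraction of trees voting for each class as a rough, uncalibrated probability estimate. The sketch below is my own assumption and not something MLlib provides in these versions: the trees are hypothetical stand-in functions, and `vote_fractions` is an illustrative name. With a real model you would need access to the individual trees (the Scala RandomForestModel exposes them as `trees`, as far as I know).

```python
from collections import Counter


def vote_fractions(trees, features):
    """Fraction of trees that vote for each class label."""
    votes = Counter(tree(features) for tree in trees)
    total = len(trees)
    return {label: count / total for label, count in votes.items()}


# Hypothetical stand-ins for trained decision trees: each maps a
# feature vector to a class label (0 or 1).
trees = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.2 else 0,
    lambda x: 1 if x[0] + x[1] > 1.5 else 0,
    lambda x: 1 if x[0] > 0.6 else 0,
]

fractions = vote_fractions(trees, [0.7, 0.3])
print(fractions)
```

A cutoff other than majority vote can then be applied to the fraction for the positive class, much like the logistic regression threshold.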

There are JIRA issues (SPARK-4362, SPARK-6885) concerning this exact topic, which are in progress as I'm writing this answer. Nevertheless, the issue seems to have been on hold since November 2014:

There is no way to get the posterior probability of a prediction with the Naive Bayes model during prediction. This should be made available along with the label.

And here is a note from @sean-owen on the mailing list on a similar topic regarding the Naive Bayes classification algorithm:

This was discussed on the mailing list. You can't get the probabilities out directly now, but you can hack a bit with the internal data structures of NaiveBayesModel and compute them from there.
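To make the suggested hack concrete, here is a minimal sketch of the computation, assuming a multinomial Naive Bayes model whose log class priors and log conditional probabilities are available as plain lists (in MLlib's NaiveBayesModel these are, as far as I recall, the `pi` and `theta` fields). The numbers and the function name `naive_bayes_posteriors` are hypothetical.

```python
import math


def naive_bayes_posteriors(pi, theta, features):
    """Posterior class probabilities for a multinomial Naive Bayes model.

    pi:    log prior of each class
    theta: per class, the log conditional probability of each feature
    """
    # Unnormalised log posterior per class: log prior + sum_j x_j * log p(j | class).
    log_joint = [
        log_prior + sum(t * x for t, x in zip(log_cond, features))
        for log_prior, log_cond in zip(pi, theta)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_joint)
    unnormalised = [math.exp(v - m) for v in log_joint]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical model with 2 classes and 3 features, for illustration only.
pi = [math.log(0.6), math.log(0.4)]
theta = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.3), math.log(0.6)],
]

posteriors = naive_bayes_posteriors(pi, theta, [2.0, 1.0, 0.0])
print(posteriors)
```

The class with the largest posterior is the label the model would predict; the normalised values are the per-class probabilities the question asks for.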

Reference: source.

Major edit: this issue has been resolved as of Spark 1.5.0. Please refer to the JIRA issue for more details.

