Help on class RandomizedSearchCV in module sklearn.model_selection._search:
class RandomizedSearchCV(BaseSearchCV)
| Randomized search on hyper parameters.
|
| RandomizedSearchCV implements a "fit" and a "score" method.
| It also implements "predict", "predict_proba", "decision_function",
| "transform" and "inverse_transform" if they are implemented in the
| estimator used.
|
| The parameters of the estimator used to apply these methods are optimized
| by cross-validated search over parameter settings.
|
| In contrast to GridSearchCV, not all parameter values are tried out, but
| rather a fixed number of parameter settings is sampled from the specified
| distributions. The number of parameter settings that are tried is
| given by n_iter.
|
| If all parameters are presented as a list,
| sampling without replacement is performed. If at least one parameter
| is given as a distribution, sampling with replacement is used.
| It is highly recommended to use continuous distributions for continuous
| parameters.
|
| Read more in the :ref:`User Guide <randomized_parameter_search>`.
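|
| A minimal usage sketch of the search described above. The iris data,
| the ``SVC`` estimator and the ``expon`` scales are illustrative
| assumptions, not part of this docstring:

```python
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C and gamma are continuous, so they are given as continuous distributions;
# kernel is a list and is sampled uniformly from its entries.
param_distributions = {
    "C": expon(scale=10),
    "gamma": expon(scale=0.1),
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=8,        # number of parameter settings that are sampled
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_   # the sampled setting with the best mean score
best_score = search.best_score_     # its mean cross-validated score
```

| Because one entry is a distribution, the eight settings are drawn with
| replacement; raising ``n_iter`` trades runtime for a denser sample.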
|
| Parameters
| ----------
| estimator : estimator object.
| An object of that type is instantiated for each sampled parameter setting.
| This is assumed to implement the scikit-learn estimator interface.
| Either estimator needs to provide a ``score`` function,
| or ``scoring`` must be passed.
|
| param_distributions : dict
| Dictionary with parameter names (strings) as keys and distributions
| or lists of parameter values to try. Distributions must provide a ``rvs``
| method for sampling (such as those from scipy.stats.distributions).
| If a list is given, it is sampled uniformly.
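|
| The difference between list-only and distribution-based sampling can be
| sketched with ``ParameterSampler`` (listed under See Also). The parameter
| names and ranges here are arbitrary examples:

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# All values are lists: settings are drawn without replacement, so asking
# for all six combinations returns each of them exactly once.
list_only = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
from_lists = list(ParameterSampler(list_only, n_iter=6, random_state=0))

# One value is a distribution (anything with an ``rvs`` method):
# settings are now drawn with replacement.
with_dist = {"kernel": ["linear", "rbf"], "C": uniform(loc=0.1, scale=10)}
from_dist = list(ParameterSampler(with_dist, n_iter=6, random_state=0))
```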
|
| n_iter : int, default=10
| Number of parameter settings that are sampled. n_iter trades
| off runtime vs quality of the solution.
|
| scoring : string, callable, list/tuple, dict or None, default: None
| A single string (see :ref:`scoring_parameter`) or a callable
| (see :ref:`scoring`) to evaluate the predictions on the test set.
|
| For evaluating multiple metrics, either give a list of (unique) strings
| or a dict with names as keys and callables as values.
|
| NOTE that when using custom scorers, each scorer should return a single
| value. Metric functions returning a list/array of values can be wrapped
| into multiple scorers that return one value each.
|
| See :ref:`multimetric_grid_search` for an example.
|
| If None, the estimator's default scorer (if available) is used.
|
| fit_params : dict, optional
| Parameters to pass to the fit method.
|
| .. deprecated:: 0.19
| ``fit_params`` as a constructor argument was deprecated in version
| 0.19 and will be removed in version 0.21. Pass fit parameters to
| the ``fit`` method instead.
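|
| A sketch of passing fit parameters through ``fit`` rather than the
| constructor. The ``sample_weight`` values are made up for illustration,
| and the example assumes the wrapped estimator's ``fit`` accepts
| ``sample_weight``:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative weights: samples of class 2 count twice as much.
sample_weight = np.where(y == 2, 2.0, 1.0)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [1, 2, 3, 4]},
    n_iter=3,
    cv=3,
    random_state=0,
)

# Fit parameters are given to fit(), not to the constructor.
search.fit(X, y, sample_weight=sample_weight)
```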
|
| n_jobs : int, default=1
| Number of jobs to run in parallel.
|
| pre_dispatch : int, or string, optional
| Controls the number of jobs that get dispatched during parallel
| execution. Reducing this number can be useful to avoid an
| explosion of memory consumption when more jobs get dispatched
| than CPUs can process. This parameter can be:
|
| - None, in which case all the jobs are immediately
| created and spawned. Use this for lightweight and
| fast-running jobs, to avoid delays due to on-demand
| spawning of the jobs
|
| - An int, giving the exact number of total jobs that are
| spawned
|
| - A string, giving an expression as a function of n_jobs,
| as in '2*n_jobs'
|
| iid : boolean, default=True
| If True, the data is assumed to be identically distributed across
| the folds, and the loss minimized is the total loss per sample,
| and not the mean loss across the folds.
|
| cv : int, cross-validation generator or an iterable, optional
| Determines the cross-validation splitting strategy.
| Possible inputs for cv are:
| - None, to use the default 3-fold cross validation.
| - An integer, to specify the number of folds in a `(Stratified)KFold`.
| - An object to be used as a cross-validation generator.
| - An iterable yielding train, test splits.
|
| For integer/None inputs, if the estimator is a classifier and ``y`` is
| either binary or multiclass, :class:`StratifiedKFold` is used. In all
| other cases, :class:`KFold` is used.
|
| Refer to the :ref:`User Guide <cross_validation>` for the various
| cross-validation strategies that can be used here.
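|
| A short sketch of two of the accepted ``cv`` inputs. The estimator, the
| search space and the fold counts are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
space = {"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]}

# An integer asks for that many (Stratified)KFold folds.
by_int = RandomizedSearchCV(
    KNeighborsClassifier(), space, n_iter=4, cv=5, random_state=0
).fit(X, y)

# A splitter object gives explicit control, here shuffling before splitting.
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
by_object = RandomizedSearchCV(
    KNeighborsClassifier(), space, n_iter=4, cv=splitter, random_state=0
).fit(X, y)

fold_counts = (by_int.n_splits_, by_object.n_splits_)
```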
|
| refit : boolean, or string, default=True
| Refit an estimator using the best found parameters on the whole
| dataset.
|
| For multiple metric evaluation, this needs to be a string denoting the
| scorer that would be used to find the best parameters for refitting
| the estimator at the end.
|
| The refitted estimator is made available at the ``best_estimator_``
| attribute and permits using ``predict`` directly on this
| ``RandomizedSearchCV`` instance.
|
| Also for multiple metric evaluation, the attributes ``best_index_``,
| ``best_score_`` and ``best_params_`` will only be available if
| ``refit`` is set and all of them will be determined w.r.t this specific
| scorer.
|
| See ``scoring`` parameter to know more about multiple metric
| evaluation.
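|
| A sketch of multiple metric evaluation with ``refit`` set to a scorer
| name. The scorer names ``"accuracy"`` and ``"f1_macro"`` are keys chosen
| for this example; the estimator and search space are likewise assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each scorer returns a single value; the dict keys name the metrics.
scoring = {
    "accuracy": make_scorer(accuracy_score),
    "f1_macro": make_scorer(f1_score, average="macro"),
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [1, 2, 3, 4, 5], "min_samples_leaf": [1, 2, 4, 8]},
    n_iter=6,
    scoring=scoring,
    refit="f1_macro",  # best_index_, best_score_, best_params_ follow this scorer
    cv=3,
    random_state=0,
)
search.fit(X, y)

results = search.cv_results_
best_f1 = results["mean_test_f1_macro"][search.best_index_]
```

| The per-metric columns replace the single ``'_score'`` suffix, so
| ``mean_test_accuracy`` and ``mean_test_f1_macro`` appear side by side.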
|
| verbose : integer
| Controls the verbosity: the higher, the more messages.
|
| random_state : int, RandomState instance or None, optional, default=None
| Pseudo random number generator state used for random uniform sampling
| from lists of possible values instead of scipy.stats distributions.
| If int, random_state is the seed used by the random number generator;
| If RandomState instance, random_state is the random number generator;
| If None, the random number generator is the RandomState instance used
| by `np.random`.
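|
| A sketch showing that an integer ``random_state`` makes the sampled
| settings reproducible. The helper name ``sampled_settings`` and the
| search space are hypothetical, introduced only for this example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
space = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 2, 5, 10]}


def sampled_settings(seed):
    """Run a small search and return the list of sampled parameter dicts."""
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        space,
        n_iter=5,
        cv=3,
        random_state=seed,  # int: used as the seed of the sampler
    )
    return search.fit(X, y).cv_results_["params"]


first = sampled_settings(42)
second = sampled_settings(42)  # same seed, same sampled settings
```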
|
| error_score : 'raise' (default) or numeric
| Value to assign to the score if an error occurs in estimator fitting.
| If set to 'raise', the error is raised. If a numeric value is given,
| FitFailedWarning is raised. This parameter does not affect the refit
| step, which will always raise the error.
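|
| A sketch of a numeric ``error_score``. The value ``"not_a_strategy"`` is
| deliberately invalid so that fitting that candidate fails; using
| ``DummyClassifier`` for this is an assumption made for brevity:

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Two candidates: one valid, one that raises an error during fit.
space = {"strategy": ["most_frequent", "not_a_strategy"]}

search = RandomizedSearchCV(
    DummyClassifier(),
    space,
    n_iter=2,
    cv=3,
    error_score=0.0,  # failed fits are scored 0.0 instead of raising
    random_state=0,
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the FitFailedWarning in this demo
    search.fit(X, y)

# Map each candidate's strategy to its mean test score.
scores = {
    params["strategy"]: score
    for params, score in zip(
        search.cv_results_["params"], search.cv_results_["mean_test_score"]
    )
}
```

| With ``error_score='raise'`` the same search would stop at the first
| failing fit instead of recording a score for it.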
|
| return_train_score : boolean, optional
| If ``False``, the ``cv_results_`` attribute will not include training
| scores.
|
| Current default is ``'warn'``, which behaves as ``True`` in addition
| to raising a warning when a training score is looked up.
| That default will be changed to ``False`` in 0.21.
| Computing training scores is used to get insights on how different
| parameter settings impact the overfitting/underfitting trade-off.
| However computing the scores on the training set can be computationally
| expensive and is not strictly required to select the parameters that
| yield the best generalization performance.
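|
| A sketch of the effect of ``return_train_score`` on ``cv_results_``. The
| decision tree and the ``max_depth`` values are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
space = {"max_depth": [1, 2, 3, 5, 8]}


def run(return_train_score):
    """Fit the same search with or without training scores."""
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),
        space,
        n_iter=5,
        cv=3,
        return_train_score=return_train_score,
        random_state=0,
    )
    return search.fit(X, y)


with_train = run(True)
without_train = run(False)

# A large train/test gap for a setting hints at overfitting.
results = with_train.cv_results_
gaps = results["mean_train_score"] - results["mean_test_score"]
```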
|
| Attributes
| ----------
| cv_results_ : dict of numpy (masked) ndarrays
| A dict with keys as column headers and values as columns, that can be
| imported into a pandas ``DataFrame``.
|
| For instance, the table below
|
| +--------------+-------------+-------------------+---+---------------+
| | param_kernel | param_gamma | split0_test_score |...|rank_test_score|
| +==============+=============+===================+===+===============+
| |    'rbf'     |     0.1     |        0.8        |...|       1       |
| +--------------+-------------+-------------------+---+---------------+
| |    'rbf'     |     0.2     |        0.9        |...|       2       |
| +--------------+-------------+-------------------+---+---------------+
| |    'rbf'     |     0.3     |        0.7        |...|       2       |
| +--------------+-------------+-------------------+---+---------------+
|
| will be represented by a ``cv_results_`` dict of::
|
| {
| 'param_kernel' : masked_array(data = ['rbf', 'rbf', 'rbf'],
| mask = False),
| 'param_gamma' : masked_array(data = [0.1 0.2 0.3], mask = False),
| 'split0_test_score' : [0.8, 0.9, 0.7],
| 'split1_test_score' : [0.82, 0.5, 0.7],
| 'mean_test_score' : [0.81, 0.7, 0.7],
| 'std_test_score' : [0.01, 0.2, 0.],
| 'rank_test_score' : [1, 2, 2],
| 'split0_train_score' : [0.8, 0.9, 0.7],
| 'split1_train_score' : [0.82, 0.5, 0.7],
| 'mean_train_score' : [0.81, 0.7, 0.7],
| 'std_train_score' : [0.01, 0.2, 0.],
| 'mean_fit_time' : [0.73, 0.63, 0.43],
| 'std_fit_time' : [0.01, 0.02, 0.01],
| 'mean_score_time' : [0.007, 0.06, 0.04],
| 'std_score_time' : [0.001, 0.002, 0.003],
| 'params' : [{'kernel' : 'rbf', 'gamma' : 0.1}, ...],
| }
|
| NOTE
|
| The key ``'params'`` is used to store a list of parameter
| settings dicts for all the parameter candidates.
|
| The ``mean_fit_time``, ``std_fit_time``, ``mean_score_time`` and
| ``std_score_time`` are all in seconds.
|
| For multi-metric evaluation, the scores for all the scorers are
| available in the ``cv_results_`` dict at the keys ending with that
| scorer's name (``'_<scorer_name>'``) instead of ``'_score'`` shown
| above. ('split0_test_precision', 'mean_train_precision' etc.)
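|
| A sketch of reading ``cv_results_`` after a search, listing candidates
| from best to worst rank. The estimator, the ``gamma`` distribution and
| the ``report`` variable are assumptions made for this example:

```python
import numpy as np
from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    {"gamma": expon(scale=0.1)},
    n_iter=5,
    cv=3,
    random_state=0,
).fit(X, y)

results = search.cv_results_

# Every value is a column with one entry per sampled candidate.
order = np.argsort(results["rank_test_score"])
report = [
    (
        int(results["rank_test_score"][i]),
        results["params"][i],
        float(results["mean_test_score"][i]),
        float(results["mean_fit_time"][i]),  # seconds
    )
    for i in order
]
```

| The same dict can be passed to ``pandas.DataFrame`` when pandas is
| available, which gives one row per candidate.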
|
| best_estimator_ : estimator or dict
| Estimator that was chosen by the search, i.e. estimator
| which gave highest score (or smallest loss if specified)
| on the left out data. Not available if ``refit=False``.
|
| For multi-metric evaluation, this attribute is present only if
| ``refit`` is specified.
|
| See ``refit`` parameter for more information on allowed values.
|
| best_score_ : float
| Mean cross-validated score of the best_estimator.
|
| For multi-metric evaluation, this is not available if ``refit`` is
| ``False``. See ``refit`` parameter for more information.
|
| best_params_ : dict
| Parameter setting that gave the best results on the hold out data.
|
| For multi-metric evaluation, this is not available if ``refit`` is
| ``False``. See ``refit`` parameter for more information.
|
| best_index_ : int
| The index (of the ``cv_results_`` arrays) which corresponds to the best
| candidate parameter setting.
|
| The dict at ``search.cv_results_['params'][search.best_index_]`` gives
| the parameter setting for the best model, that gives the highest
| mean score (``search.best_score_``).
|
| For multi-metric evaluation, this is not available if ``refit`` is
| ``False``. See ``refit`` parameter for more information.
|
| scorer_ : function or a dict
| Scorer function used on the held out data to choose the best
| parameters for the model.
|
| For multi-metric evaluation, this attribute holds the validated
| ``scoring`` dict which maps the scorer key to the scorer callable.
|
| n_splits_ : int
| The number of cross-validation splits (folds/iterations).
|
| Notes
| -----
| The parameters selected are those that maximize the score of the held-out
| data, according to the scoring parameter.
|
| If `n_jobs` was set to a value higher than one, the data is copied for each
| parameter setting (and not `n_jobs` times). This is done for efficiency
| reasons if individual jobs take very little time, but may raise errors if
| the dataset is large and not enough memory is available. A workaround in
| this case is to set `pre_dispatch`. Then, the data is copied only
| `pre_dispatch` many times. A reasonable value for `pre_dispatch` is
| `2 * n_jobs`.
|
| See Also
| --------
| :class:`GridSearchCV`:
| Does exhaustive search over a grid of parameters.
|
| :class:`ParameterSampler`:
| A generator over parameter settings, constructed from
| param_distributions.
|
| Method resolution order:
| RandomizedSearchCV
| BaseSearchCV
| abc.NewBase
| sklearn.base.BaseEstimator
| sklearn.base.MetaEstimatorMixin
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score='warn')
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __abstractmethods__ = frozenset([])
|
| ----------------------------------------------------------------------
| Methods inherited from BaseSearchCV:
|
| decision_function(*args, **kwargs)
| Call decision_function on the estimator with the best found parameters.
|
| Only available if ``refit=True`` and the underlying estimator supports
| ``decision_function``.
|
| Parameters
| ----------
| X : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| fit(self, X, y=None, groups=None, **fit_params)
| Run fit with all sets of parameters.
|
| Parameters
| ----------
|
| X : array-like, shape = [n_samples, n_features]
| Training vector, where n_samples is the number of samples and
| n_features is the number of features.
|
| y : array-like, shape = [n_samples] or [n_samples, n_output], optional
| Target relative to X for classification or regression;
| None for unsupervised learning.
|
| groups : array-like, with shape (n_samples,), optional
| Group labels for the samples used while splitting the dataset into
| train/test set.
|
| **fit_params : dict of string -> object
| Parameters passed to the ``fit`` method of the estimator
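|
| A sketch of passing ``groups`` to ``fit`` together with a group-aware
| splitter. The group labels below are invented (five pretend subjects);
| ``GroupKFold`` is one possible splitter, chosen here as an assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GroupKFold, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical grouping: assign the samples to five subjects in turn.
groups = np.arange(len(y)) % 5

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [1, 3, 5, 7]},
    n_iter=3,
    cv=GroupKFold(n_splits=5),  # keeps each group within a single fold
    random_state=0,
)

# groups is forwarded to the splitter when the data is split.
search.fit(X, y, groups=groups)
```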
|
| inverse_transform(*args, **kwargs)
| Call inverse_transform on the estimator with the best found params.
|
| Only available if the underlying estimator implements
| ``inverse_transform`` and ``refit=True``.
|
| Parameters
| ----------
| Xt : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| predict(*args, **kwargs)
| Call predict on the estimator with the best found parameters.
|
| Only available if ``refit=True`` and the underlying estimator supports
| ``predict``.
|
| Parameters
| ----------
| X : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| predict_log_proba(*args, **kwargs)
| Call predict_log_proba on the estimator with the best found parameters.
|
| Only available if ``refit=True`` and the underlying estimator supports
| ``predict_log_proba``.
|
| Parameters
| ----------
| X : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| predict_proba(*args, **kwargs)
| Call predict_proba on the estimator with the best found parameters.
|
| Only available if ``refit=True`` and the underlying estimator supports
| ``predict_proba``.
|
| Parameters
| ----------
| X : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| score(self, X, y=None)
| Returns the score on the given data, if the estimator has been refit.
|
| This uses the score defined by ``scoring`` where provided, and the
| ``best_estimator_.score`` method otherwise.
|
| Parameters
| ----------
| X : array-like, shape = [n_samples, n_features]
| Input data, where n_samples is the number of samples and
| n_features is the number of features.
|
| y : array-like, shape = [n_samples] or [n_samples, n_output], optional
| Target relative to X for classification or regression;
| None for unsupervised learning.
|
| Returns
| -------
| score : float
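|
| A sketch of ``score`` and ``predict`` on held-out data after a refit.
| The train/test split and the estimator are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [1, 3, 5, 7, 9]},
    n_iter=4,
    scoring="accuracy",
    cv=3,
    random_state=0,
).fit(X_train, y_train)

# score() applies the ``scoring`` metric to the refitted best estimator.
held_out = search.score(X_test, y_test)

# predict() is delegated to best_estimator_, so both routes agree.
manual = accuracy_score(y_test, search.best_estimator_.predict(X_test))
same_predictions = np.array_equal(
    search.predict(X_test), search.best_estimator_.predict(X_test)
)
```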
|
| transform(*args, **kwargs)
| Call transform on the estimator with the best found parameters.
|
| Only available if the underlying estimator supports ``transform`` and
| ``refit=True``.
|
| Parameters
| ----------
| X : indexable, length n_samples
| Must fulfill the input assumptions of the
| underlying estimator.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseSearchCV:
|
| classes_
|
| grid_scores_
|
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.BaseEstimator:
|
| __getstate__(self)
|
| __repr__(self)
|
| __setstate__(self, state)
|
| get_params(self, deep=True)
| Get parameters for this estimator.
|
| Parameters
| ----------
| deep : boolean, optional
| If True, will return the parameters for this estimator and
| contained subobjects that are estimators.
|
| Returns
| -------
| params : mapping of string to any
| Parameter names mapped to their values.
|
| set_params(self, **params)
| Set the parameters of this estimator.
|
| The method works on simple estimators as well as on nested objects
| (such as pipelines). The latter have parameters of the form
| ``<component>__<parameter>`` so that it's possible to update each
| component of a nested object.
|
| Returns
| -------
| self
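|
| A sketch of ``get_params`` and ``set_params`` on a search object, where
| the wrapped estimator is a nested component reachable through the
| ``estimator__<parameter>`` form. The ``SVC`` estimator is an assumption:

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

search = RandomizedSearchCV(SVC(), {"C": [0.1, 1, 10]}, n_iter=2)

# A parameter of the search object itself.
search.set_params(n_iter=3)

# A parameter of the nested estimator, using <component>__<parameter>.
search.set_params(estimator__kernel="linear")

# deep=True also returns the parameters of the nested estimator.
params = search.get_params(deep=True)
```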
|
| ----------------------------------------------------------------------
| Data descriptors inherited from sklearn.base.BaseEstimator:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)