This example shows how to use fitcauto to automatically try a selection of classification model types with different hyperparameter values, given training predictor and response data. By default, the function uses Bayesian optimization to select and assess models. If your training data set contains many observations, you can use an asynchronous successive halving algorithm (ASHA) instead. After the optimization is complete, fitcauto returns the model, trained on the entire data set, that is expected to best classify new data. Check the model performance on test data.
This example uses the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau that can be used to predict whether an individual makes over $50,000 per year.
Load the sample data census1994, which contains the training data adultdata and the test data adulttest. Preview the first few rows of the training data set.
load census1994
head(adultdata)
ans=8×15 table
    age    workClass           fnlwgt        education    education_num    marital_status           occupation           relationship     race     sex       capital_gain    capital_loss    hours_per_week    native_country    salary
    ___    ________________    __________    _________    _____________    _____________________    _________________    _____________    _____    ______    ____________    ____________    ______________    ______________    ______
    39     State-gov           77516         Bachelors    13               Never-married            Adm-clerical         Not-in-family    White    Male      2174            0               40                United-States     <=50K
    50     Self-emp-not-inc    83311         Bachelors    13               Married-civ-spouse       Exec-managerial      Husband          White    Male      0               0               13                United-States     <=50K
    38     Private             2.1565e+05    HS-grad      9                Divorced                 Handlers-cleaners    Not-in-family    White    Male      0               0               40                United-States     <=50K
    53     Private             2.3472e+05    11th         7                Married-civ-spouse       Handlers-cleaners    Husband          Black    Male      0               0               40                United-States     <=50K
    28     Private             3.3841e+05    Bachelors    13               Married-civ-spouse       Prof-specialty       Wife             Black    Female    0               0               40                Cuba              <=50K
    37     Private             2.8458e+05    Masters      14               Married-civ-spouse       Exec-managerial      Wife             White    Female    0               0               40                United-States     <=50K
    49     Private             1.6019e+05    9th          5                Married-spouse-absent    Other-service        Not-in-family    Black    Female    0               0               16                Jamaica           <=50K
    52     Self-emp-not-inc    2.0964e+05    HS-grad      9                Married-civ-spouse       Exec-managerial      Husband          White    Male      0               0               45                United-States     >50K
Each row contains the demographic information for one adult. The last column, salary, shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year.
Find an appropriate classifier for the data in adultdata by using fitcauto. By default, fitcauto uses Bayesian optimization to select models and their hyperparameter values, and computes the cross-validation classification error (Validation loss) for each model. By default, fitcauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.
Specify the observation weights, try all available learner types and hyperparameters, and run the Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Because the timing of parallel execution is not reproducible, parallel Bayesian optimization does not necessarily yield reproducible results. Because of the complexity of the optimization, this process can take some time, especially for larger data sets.
bayesianOptions = struct("UseParallel",true);
[bayesianMdl,bayesianResults] = fitcauto(adultdata,"salary","Weights","fnlwgt", ...
    "Learners","all","OptimizeHyperparameters","all", ...
    "HyperparameterOptimizationOptions",bayesianOptions);
Warning: 'CategoricalPredictors' value must be empty or 'all' for a 'knn' learner. The function omits 'knn' from the list of learners.
Warning: Categorical predictors are not supported for a 'discr' learner. The function omits 'discr' from the list of learners.
Warning: Data set has more than 10000 observations. Because ASHA optimization often finds good solutions faster than Bayesian optimization for data sets with many observations, try specifying the 'Optimizer' field value as 'asha' in the 'HyperparameterOptimizationOptions' value structure.
Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6). Copying objective function to workers... Done copying objective function to workers. Learner types to explore: ensemble, kernel, linear, nb, svm, tree Total iterations (MaxObjectiveEvaluations): 180 Total time (MaxTime): Inf |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 1 | 6 | Best | 0.17115 | 4.1409 | 0.17115 | 0.17115 | tree | MinLeafSize: 1308 | | | | | | | | | | MaxNumSplits: 129 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 9 | | 2 | 6 | Error | NaN | 5.8369 | 0.17115 | 0.17115 | svm | BoxConstraint: 0.0039078 | | | | | | | | | | KernelScale: 0.0041503 | | | | | | | | | | PolynomialOrder: 4 | | | | | | | | | | Standardize: false | | 3 | 6 | Accept | 0.21702 | 2.7086 | 0.17115 | 0.17115 | linear | Lambda: 0.0002108 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: ridge | | 4 | 6 | Accept | 0.18989 | 6.0701 | 0.17115 | 0.17115 | linear | Lambda: 0.051636 | | | | | | | | | | Learner: svm | | | | | | | | | | Regularization: ridge | | 5 | 6 | Accept | 0.24677 | 42.561 | 0.17115 | 0.17115 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 0.11479 | | | | | | | | | | Lambda: 0.00051256 | | | | | | | | | | NumExpansionDimensions:3144 | | 6 | 6 | Best | 0.1622 | 1.6833 | 0.1622 | 0.1622 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 7 | 6 | Accept | 0.23856 | 48.252 | 
0.1622 | 0.1622 | ensemble | Method: RUSBoost | | | | | | | | | | NumLearningCycles: 278 | | | | | | | | | | LearnRate: 0.04363 | | | | | | | | | | MinLeafSize: 9467 | | | | | | | | | | MaxNumSplits: 13 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: NaN | | 8 | 6 | Accept | 0.17169 | 53.236 | 0.1622 | 0.1622 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 190 | | | | | | | | | | LearnRate: 0.009595 | | | | | | | | | | MinLeafSize: 4927 | | | | | | | | | | MaxNumSplits: 1659 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 9 | 6 | Accept | 0.25327 | 22.206 | 0.1622 | 0.1622 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 0.019076 | | | | | | | | | | Lambda: 8.2128e-08 | | | | | | | | | | NumExpansionDimensions: 903 | | 10 | 6 | Best | 0.14083 | 46.045 | 0.14083 | 0.1622 | ensemble | Method: AdaBoostM1 | | | | | | | | | | NumLearningCycles: 51 | | | | | | | | | | LearnRate: 0.010501 | | | | | | | | | | MinLeafSize: 264 | | | | | | | | | | MaxNumSplits: 10796 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: NaN | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 11 | 6 | Accept | 0.24677 | 49.785 | 0.14083 | 0.1622 | kernel | Learner: svm | | | | | | | | | | KernelScale: 0.012471 | | | | | | | | | | Lambda: 6.2848e-08 | | | | | | | | | | NumExpansionDimensions:4735 | | 12 | 6 | Accept | 0.18178 | 112.83 | 0.14083 | 0.17115 | nb | DistributionNames: kernel | | | | | | 
| | | | Width: 531.86 | | | | | | | | | | Kernel: normal | | 13 | 6 | Accept | 0.22245 | 1.5098 | 0.14083 | 0.17115 | linear | Lambda: 5.2257e-08 | | | | | | | | | | Learner: svm | | | | | | | | | | Regularization: lasso | | 14 | 6 | Accept | 0.23856 | 37.43 | 0.14083 | 0.17115 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 257 | | | | | | | | | | LearnRate: 0.0022309 | | | | | | | | | | MinLeafSize: 8893 | | | | | | | | | | MaxNumSplits: 6 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 15 | 6 | Accept | 0.16386 | 6.977 | 0.14083 | 0.17115 | linear | Lambda: 1.2971e-06 | | | | | | | | | | Learner: svm | | | | | | | | | | Regularization: ridge | | 16 | 6 | Accept | 0.148 | 177.2 | 0.14083 | 0.17115 | svm | BoxConstraint: 5.3771 | | | | | | | | | | KernelScale: 13.172 | | | | | | | | | | PolynomialOrder: 2 | | | | | | | | | | Standardize: true | | 17 | 6 | Accept | 0.19972 | 30.992 | 0.14083 | 0.17115 | ensemble | Method: RUSBoost | | | | | | | | | | NumLearningCycles: 61 | | | | | | | | | | LearnRate: 0.0022005 | | | | | | | | | | MinLeafSize: 249 | | | | | | | | | | MaxNumSplits: 658 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: NaN | | 18 | 6 | Accept | 0.1622 | 0.4573 | 0.14083 | 0.1669 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 19 | 6 | Best | 0.13106 | 68.692 | 0.13106 | 0.1669 | ensemble | Method: GentleBoost | | | | | | | | | | NumLearningCycles: 167 | | | | | | | | | | LearnRate: 0.87924 | | | | | | | | | | MinLeafSize: 9 | | | | | | | | | | MaxNumSplits: 5 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 20 | 6 | Error | NaN | 2.415 | 0.13106 | 0.1669 | svm | BoxConstraint: 2.7183 | | | | | | | | | | KernelScale: 0.044663 | | | | | | | | | | PolynomialOrder: 4 | | | | | | | | | | Standardize: false | 
|===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 21 | 6 | Accept | 0.14471 | 1.3765 | 0.13106 | 0.15793 | tree | MinLeafSize: 104 | | | | | | | | | | MaxNumSplits: 5184 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 8 | | 22 | 6 | Accept | 0.19853 | 79.147 | 0.13106 | 0.15793 | nb | DistributionNames: kernel | | | | | | | | | | Width: 45514 | | | | | | | | | | Kernel: box | | 23 | 6 | Accept | 0.2467 | 12.256 | 0.13106 | 0.15793 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 0.14114 | | | | | | | | | | Lambda: 2.8324e-06 | | | | | | | | | | NumExpansionDimensions: 350 | | 24 | 6 | Accept | 0.24677 | 228.2 | 0.13106 | 0.15793 | svm | BoxConstraint: 0.016454 | | | | | | | | | | KernelScale: 195.62 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 25 | 6 | Accept | 0.22584 | 0.81299 | 0.13106 | 0.17105 | tree | MinLeafSize: 3819 | | | | | | | | | | MaxNumSplits: 909 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 2 | | 26 | 6 | Accept | 0.14896 | 54.975 | 0.13106 | 0.17031 | nb | DistributionNames: kernel | | | | | | | | | | Width: 0.50222 | | | | | | | | | | Kernel: triangle | | 27 | 6 | Accept | 0.13859 | 39.098 | 0.13106 | 0.17031 | kernel | Learner: svm | | | | | | | | | | KernelScale: 38.43 | | | | | | | | | | Lambda: 1.4954e-06 | | | | | | | | | | NumExpansionDimensions: 239 | | 28 | 6 | Accept | 0.16354 | 6.2686 | 0.13106 | 0.17031 | linear | Lambda: 2.2757e-05 | | | | | | | | | | Learner: svm | | | | 
| | | | | | Regularization: ridge | | 29 | 6 | Accept | 0.22319 | 1.1041 | 0.13106 | 0.17031 | linear | Lambda: 3.0618e-09 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | | 30 | 6 | Accept | 0.22319 | 0.96159 | 0.13106 | 0.17031 | linear | Lambda: 7.5579e-09 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 31 | 6 | Accept | 0.32931 | 249.56 | 0.13106 | 0.17031 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 0.32143 | | | | | | | | | | Lambda: 6.0283e-07 | | | | | | | | | | NumExpansionDimensions:5258 | | 32 | 6 | Accept | 0.13628 | 29.765 | 0.13106 | 0.15473 | ensemble | Method: Bag | | | | | | | | | | NumLearningCycles: 21 | | | | | | | | | | LearnRate: NaN | | | | | | | | | | MinLeafSize: 9 | | | | | | | | | | MaxNumSplits: 10267 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 12 | | 33 | 6 | Accept | 0.1622 | 0.42231 | 0.13106 | 0.15473 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 34 | 6 | Accept | 0.22245 | 0.81185 | 0.13106 | 0.15473 | linear | Lambda: 0.00036246 | | | | | | | | | | Learner: svm | | | | | | | | | | Regularization: lasso | | 35 | 6 | Accept | 0.1555 | 0.63361 | 0.13106 | 0.15473 | tree | MinLeafSize: 120 | | | | | | | | | | MaxNumSplits: 200 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 3 | | 36 | 6 | Accept | 0.1622 | 0.37218 | 0.13106 | 
0.15473 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 37 | 6 | Accept | 0.23856 | 0.29699 | 0.13106 | 0.15473 | tree | MinLeafSize: 8766 | | | | | | | | | | MaxNumSplits: 9 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 6 | | 38 | 6 | Best | 0.1286 | 87.205 | 0.1286 | 0.1453 | ensemble | Method: GentleBoost | | | | | | | | | | NumLearningCycles: 330 | | | | | | | | | | LearnRate: 0.0098276 | | | | | | | | | | MinLeafSize: 12 | | | | | | | | | | MaxNumSplits: 3 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 39 | 6 | Accept | 0.24685 | 4.3779 | 0.1286 | 0.1453 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 1.7072 | | | | | | | | | | Lambda: 4.9189e-08 | | | | | | | | | | NumExpansionDimensions: 186 | | 40 | 6 | Accept | 0.19477 | 104.34 | 0.1286 | 0.1453 | nb | DistributionNames: kernel | | | | | | | | | | Width: 7640.9 | | | | | | | | | | Kernel: triangle | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 41 | 6 | Accept | 0.15967 | 59.354 | 0.1286 | 0.14561 | ensemble | Method: GentleBoost | | | | | | | | | | NumLearningCycles: 383 | | | | | | | | | | LearnRate: 0.038324 | | | | | | | | | | MinLeafSize: 7227 | | | | | | | | | | MaxNumSplits: 2177 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 42 | 6 | Accept | 0.1622 | 1.0952 | 0.1286 | 0.14561 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | 
| | | | | | | Kernel: <undefined> | | 43 | 6 | Accept | 0.1849 | 441.8 | 0.1286 | 0.14561 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 559.78 | | | | | | | | | | Lambda: 8.2254e-06 | | | | | | | | | | NumExpansionDimensions:7841 | | 44 | 6 | Accept | 0.23856 | 0.90498 | 0.1286 | 0.14561 | tree | MinLeafSize: 11397 | | | | | | | | | | MaxNumSplits: 6 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 2 | | 45 | 6 | Accept | 0.21702 | 1.8862 | 0.1286 | 0.14561 | linear | Lambda: 0.00031752 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: ridge | | 46 | 6 | Accept | 0.15113 | 0.64593 | 0.1286 | 0.14561 | tree | MinLeafSize: 15 | | | | | | | | | | MaxNumSplits: 18 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 10 | | 47 | 6 | Accept | 0.22319 | 1.0114 | 0.1286 | 0.14561 | linear | Lambda: 0.0079609 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | | 48 | 6 | Accept | 0.21702 | 1.183 | 0.1286 | 0.14561 | linear | Lambda: 0.014859 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: ridge | | 49 | 6 | Accept | 0.22319 | 1.0524 | 0.1286 | 0.14561 | linear | Lambda: 9.0307e-10 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | | 50 | 6 | Accept | 0.18098 | 84.088 | 0.1286 | 0.14561 | nb | DistributionNames: kernel | | | | | | | | | | Width: 303.14 | | | | | | | | | | Kernel: epanechnikov | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | 
|===========================================================================================================================================| | 51 | 6 | Accept | 0.18478 | 705.34 | 0.1286 | 0.14561 | kernel | Learner: svm | | | | | | | | | | KernelScale: 685.59 | | | | | | | | | | Lambda: 6.5111e-06 | | | | | | | | | | NumExpansionDimensions:2855 | | 52 | 6 | Accept | 0.18133 | 82.973 | 0.1286 | 0.14561 | nb | DistributionNames: kernel | | | | | | | | | | Width: 358.05 | | | | | | | | | | Kernel: epanechnikov | | 53 | 6 | Accept | 0.14417 | 1.1731 | 0.1286 | 0.14561 | tree | MinLeafSize: 20 | | | | | | | | | | MaxNumSplits: 19194 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 10 | | 54 | 6 | Accept | 0.15351 | 0.48309 | 0.1286 | 0.14561 | tree | MinLeafSize: 31 | | | | | | | | | | MaxNumSplits: 9291 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 3 | | 55 | 6 | Accept | 0.19079 | 7.3404 | 0.1286 | 0.14561 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 9.1721 | | | | | | | | | | Lambda: 7.4579e-06 | | | | | | | | | | NumExpansionDimensions: 137 | | 56 | 6 | Accept | 0.22245 | 0.92055 | 0.1286 | 0.14561 | linear | Lambda: 1.3978e-06 | | | | | | | | | | Learner: svm | | | | | | | | | | Regularization: lasso | | 57 | 6 | Accept | 0.1622 | 0.63712 | 0.1286 | 0.14561 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 58 | 6 | Accept | 0.24677 | 198.23 | 0.1286 | 0.14561 | svm | BoxConstraint: 0.11794 | | | | | | | | | | KernelScale: 604.2 | | | | | | | | | | PolynomialOrder: 2 | | | | | | | | | | Standardize: true | | 59 | 6 | Accept | 0.15076 | 0.37789 | 0.1286 | 0.14561 | tree | MinLeafSize: 5 | | | | | | | | | | MaxNumSplits: 120 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 3 | | 60 | 6 | Accept | 0.15376 | 0.5061 | 0.1286 | 0.14561 | tree | MinLeafSize: 7 | | | | | | | | | | 
MaxNumSplits: 8 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 13 | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 61 | 6 | Accept | 0.24677 | 14.057 | 0.1286 | 0.14561 | kernel | Learner: svm | | | | | | | | | | KernelScale: 0.024719 | | | | | | | | | | Lambda: 7.8849e-06 | | | | | | | | | | NumExpansionDimensions:1387 | | 62 | 6 | Accept | 0.24677 | 12.415 | 0.1286 | 0.14561 | kernel | Learner: svm | | | | | | | | | | KernelScale: 0.31491 | | | | | | | | | | Lambda: 4.0883e-08 | | | | | | | | | | NumExpansionDimensions:1269 | | 63 | 6 | Accept | 0.14477 | 98.657 | 0.1286 | 0.13884 | ensemble | Method: GentleBoost | | | | | | | | | | NumLearningCycles: 170 | | | | | | | | | | LearnRate: 0.099075 | | | | | | | | | | MinLeafSize: 1462 | | | | | | | | | | MaxNumSplits: 6219 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 64 | 6 | Accept | 0.15127 | 45.894 | 0.1286 | 0.14073 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 23 | | | | | | | | | | LearnRate: 0.02095 | | | | | | | | | | MinLeafSize: 5 | | | | | | | | | | MaxNumSplits: 12754 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 65 | 6 | Accept | 0.13527 | 53.795 | 0.1286 | 0.13986 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 66 | | | | | | | | | | LearnRate: 0.090491 | | | | | | | | | | MinLeafSize: 579 | | | | | | | | | | MaxNumSplits: 1206 | | | | | | | | | | SplitCriterion: <undefined> 
| | | | | | | | | | NumVariablesToSample: NaN | | 66 | 6 | Accept | 0.13706 | 33.079 | 0.1286 | 0.13479 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 45 | | | | | | | | | | LearnRate: 0.072253 | | | | | | | | | | MinLeafSize: 42 | | | | | | | | | | MaxNumSplits: 28 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 67 | 6 | Accept | 0.1336 | 180.94 | 0.1286 | 0.13483 | ensemble | Method: LogitBoost | | | | | | | | | | NumLearningCycles: 230 | | | | | | | | | | LearnRate: 0.11405 | | | | | | | | | | MinLeafSize: 1001 | | | | | | | | | | MaxNumSplits: 2860 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | NumVariablesToSample: NaN | | 68 | 6 | Accept | 0.16092 | 183.16 | 0.1286 | 0.13483 | svm | BoxConstraint: 4.9135 | | | | | | | | | | KernelScale: 109.99 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 69 | 6 | Accept | 0.12943 | 760.38 | 0.1286 | 0.13483 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 17.218 | | | | | | | | | | Lambda: 2.9536e-07 | | | | | | | | | | NumExpansionDimensions:5361 | | 70 | 6 | Accept | 0.16712 | 0.49926 | 0.1286 | 0.13483 | tree | MinLeafSize: 631 | | | | | | | | | | MaxNumSplits: 38 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 3 | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 71 | 6 | Accept | 0.15342 | 61.275 | 0.1286 | 0.13483 | nb | DistributionNames: kernel | | | | | | | | | | Width: 3.9984 | | | | | | | | | | 
Kernel: epanechnikov | | 72 | 6 | Accept | 0.22319 | 0.97079 | 0.1286 | 0.13483 | linear | Lambda: 0.57131 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | | 73 | 6 | Accept | 0.26454 | 2391.5 | 0.1286 | 0.13483 | svm | BoxConstraint: 0.010397 | | | | | | | | | | KernelScale: 0.26383 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 74 | 6 | Accept | 0.15591 | 0.42509 | 0.1286 | 0.13483 | tree | MinLeafSize: 147 | | | | | | | | | | MaxNumSplits: 30 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 4 | | 75 | 6 | Accept | 0.30262 | 3188.9 | 0.1286 | 0.13483 | svm | BoxConstraint: 4.5266 | | | | | | | | | | KernelScale: 4.4805 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: false | | 76 | 6 | Accept | 0.38172 | 2354.3 | 0.1286 | 0.13483 | svm | BoxConstraint: 0.34479 | | | | | | | | | | KernelScale: 0.24179 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 77 | 6 | Accept | 0.17001 | 96.28 | 0.1286 | 0.13483 | nb | DistributionNames: kernel | | | | | | | | | | Width: 35 | | | | | | | | | | Kernel: normal | | 78 | 6 | Accept | 0.22319 | 0.97591 | 0.1286 | 0.13483 | linear | Lambda: 0.65692 | | | | | | | | | | Learner: logistic | | | | | | | | | | Regularization: lasso | | 79 | 6 | Accept | 0.13944 | 312.79 | 0.1286 | 0.13483 | kernel | Learner: svm | | | | | | | | | | KernelScale: 109.29 | | | | | | | | | | Lambda: 4.7795e-06 | | | | | | | | | | NumExpansionDimensions:2963 | | 80 | 6 | Accept | 0.1461 | 0.80323 | 0.1286 | 0.13483 | tree | MinLeafSize: 81 | | | | | | | | | | MaxNumSplits: 4868 | | | | | | | | | | SplitCriterion: gdi | | | | | | | | | | NumVariablesToSample: 7 | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | 
Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 81 | 6 | Accept | 0.1622 | 1.0258 | 0.1286 | 0.13483 | nb | DistributionNames: normal | | | | | | | | | | Width: NaN | | | | | | | | | | Kernel: <undefined> | | 82 | 6 | Error | NaN | 2.1441 | 0.1286 | 0.13483 | svm | BoxConstraint: 0.096689 | | | | | | | | | | KernelScale: 0.18876 | | | | | | | | | | PolynomialOrder: 4 | | | | | | | | | | Standardize: false | | 83 | 6 | Accept | 0.35552 | 2693.6 | 0.1286 | 0.13483 | svm | BoxConstraint: 0.033837 | | | | | | | | | | KernelScale: 0.027074 | | | | | | | | | | PolynomialOrder: 2 | | | | | | | | | | Standardize: true | | 84 | 6 | Accept | 0.14904 | 0.51977 | 0.1286 | 0.13483 | tree | MinLeafSize: 29 | | | | | | | | | | MaxNumSplits: 35 | | | | | | | | | | SplitCriterion: deviance | | | | | | | | | | NumVariablesToSample: 7 | | 85 | 6 | Accept | 0.2586 | 24.889 | 0.1286 | 0.13483 | kernel | Learner: logistic | | | | | | | | | | KernelScale: 0.0010015 | | | | | | | | | | Lambda: 1.2767e-05 | | | | | | | | | | NumExpansionDimensions:1350 | | 86 | 6 | Accept | 0.55399 | 3398.7 | 0.1286 | 0.13483 | svm | BoxConstraint: 0.0092124 | | | | | | | | | | KernelScale: 155.18 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: false | | 87 | 6 | Accept | 0.1831 | 199.67 | 0.1286 | 0.13483 | svm | BoxConstraint: 8.6137 | | | | | | | | | | KernelScale: 204.69 | | | | | | | | | | PolynomialOrder: 2 | | | | | | | | | | Standardize: true | | 88 | 6 | Accept | 0.13752 | 148.86 | 0.1286 | 0.13535 | ensemble | Method: GentleBoost | | | | | | | | | | NumLearningCycles: 293 | | | | | | | | | | LearnRate: 0.79452 | | | | | | | | | | MinLeafSize: 14 | | | | | | | | | | MaxNumSplits: 8 | | | | | | | | | | SplitCriterion: <undefined> | | | | | | | | | | 
NumVariablesToSample: NaN | | 89 | 6 | Accept | 0.16247 | 2925.9 | 0.1286 | 0.13535 | svm | BoxConstraint: 290.56 | | | | | | | | | | KernelScale: 4.1671 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 90 | 6 | Accept | 0.16136 | 934.29 | 0.1286 | 0.13535 | kernel | Learner: svm | | | | | | | | | | KernelScale: 5.8589 | | | | | | | | | | Lambda: 1.4894e-06 | | | | | | | | | | NumExpansionDimensions:6076 | |===========================================================================================================================================| | Iter | Active | Eval | Validation | Time for training | Observed min | Estimated min | Learner | Hyperparameter: Value | | | workers | result | loss | & validation (sec)| validation loss | validation loss | | | |===========================================================================================================================================| | 91 | 6 | Error | NaN | 2.1144 | 0.1286 | 0.13535 | svm | BoxConstraint: 5.3542 | | | | | | | | | | KernelScale: 0.0016058 | | | | | | | | | | PolynomialOrder: 3 | | | | | | | | | | Standardize: true | | 92 | 6 | Accept | 0.17756 | 1104.9 | 0.1286 | 0.13535 | kernel | Learner: svm | | | ...
__________________________________________________________
Optimization completed.
Total iterations: 180
Total elapsed time: 17462.344 seconds
Total time for training and validation: 96919.4627 seconds

Best observed learner is an ensemble model with:
    Method:               LogitBoost
    NumLearningCycles:    307
    LearnRate:            0.66866
    MinLeafSize:          8
    MaxNumSplits:         6
    SplitCriterion:       <undefined>
    NumVariablesToSample: NaN
Observed validation loss: 0.12617
Time for training and validation: 114.7304 seconds

Best estimated learner (returned model) is an ensemble model with:
    Method:               GentleBoost
    NumLearningCycles:    235
    LearnRate:            0.0035794
    MinLeafSize:          68
    MaxNumSplits:         3
    SplitCriterion:       <undefined>
    NumVariablesToSample: NaN
Estimated validation loss: 0.12893
Estimated time for training and validation: 61.6123 seconds
The Total elapsed time value shows that the Bayesian optimization took a long time to run (about 4.9 hours).
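The arithmetic behind that figure can be checked directly from the numbers in the display. The sketch below also divides the total training and validation time by the elapsed time; treating that ratio as a rough indicator of how busy the six workers were is an interpretation added here, not something the display reports.

```matlab
% Values copied from the optimization summary in the display.
elapsedSeconds  = 17462.344;     % "Total elapsed time"
trainingSeconds = 96919.4627;    % "Total time for training and validation"
numWorkers      = 6;             % workers in the parallel pool

% Convert the elapsed time to hours (about 4.85, rounded to 4.9 in the text).
elapsedHours = elapsedSeconds / 3600;

% Ratio of summed training time to wall-clock time (about 5.55 here).
workerRatio = trainingSeconds / elapsedSeconds;
fprintf("%.2f hours elapsed, ratio %.2f with %d workers\n", ...
    elapsedHours, workerRatio, numWorkers);
```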
The final model returned by fitcauto corresponds to the best estimated learner. Before returning the model, the function retrains it using the entire training data set (adultdata), the listed Learner (or model) type, and the displayed hyperparameter values.
When fitcauto with Bayesian optimization takes a long time to run because of the number of observations in your training set, consider using fitcauto with ASHA optimization instead. Given that adultdata contains over 10,000 observations, try using fitcauto with ASHA optimization to automatically find an appropriate classifier. When you use fitcauto with ASHA optimization, the function randomly chooses several models with different hyperparameter values and trains them on a small subset of the training data. If the cross-validation classification error (Validation loss) of a particular model is promising, the model is promoted and trained on a larger amount of the training data. This process repeats, and successful models are trained on progressively larger amounts of data. By default, fitcauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.
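The promotion idea can be illustrated with a toy successive halving loop. This is a simplified, synchronous sketch with a simulated loss, not the algorithm fitcauto runs: the number of models, the promotion factor eta, the starting subset size, and the loss formula are all assumptions chosen for illustration (a factor of 4 is used only because it matches the training set sizes 102 and 408 that appear in the iterative display).

```matlab
% Toy synchronous successive halving (illustrative only).
numConfigs = 16;                  % hypothetical number of sampled models
eta        = 4;                   % keep the best 1/eta at each stage (assumed)
trainSize  = 102;                 % starting subset size (assumed)
quality    = linspace(0.12, 0.30, numConfigs);  % simulated "true" loss per model
active     = 1:numConfigs;        % indices of models still being considered

while true
    % Simulated validation loss: the noise shrinks as the subset grows.
    noise   = 0.5 * cos(7 * active) / sqrt(trainSize);
    valLoss = quality(active) + noise;
    fprintf("size %5d: %2d models, best loss %.4f\n", ...
        trainSize, numel(active), min(valLoss));
    if numel(active) == 1
        break
    end
    % Promote the most promising models to a training subset eta times larger.
    numKeep    = max(1, floor(numel(active) / eta));
    [~, order] = sort(valLoss);
    active     = active(order(1:numKeep));
    trainSize  = trainSize * eta;
end
bestConfig = active;              % index of the single surviving model
```

With these settings, 16 models are evaluated on 102 observations, 4 are promoted to 408, and 1 reaches 1632. An asynchronous variant promotes a model as soon as a worker is free instead of waiting for a whole stage to finish, which is what makes the approach suit parallel runs.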
Set the observation weights, try all available learner types and hyperparameters, and run the ASHA optimization in parallel. Note that ASHA optimization often has more iterations than Bayesian optimization by default. If you have a time constraint, you can specify the MaxTime field of the HyperparameterOptimizationOptions structure to limit the number of seconds fitcauto runs.
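As a minimal sketch of the time limit mentioned above, the options structure can carry a MaxTime field alongside the optimizer settings. The variable name timedOptions and the 1800-second budget are arbitrary choices for illustration. The actual run time can exceed this value, because the limit is not expected to interrupt an evaluation that is already in progress.

```matlab
% ASHA options with a time limit added.
% timedOptions and the 1800-second budget are illustrative assumptions.
timedOptions = struct("Optimizer","asha","UseParallel",true,"MaxTime",1800);

% The structure would then be passed in the same way as any other
% options structure, for example:
% Mdl = fitcauto(adultdata,"salary", ...
%     "HyperparameterOptimizationOptions",timedOptions);
```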
ashaOptions = struct("Optimizer","asha","UseParallel",true);
[ashaMdl,ashaResults] = fitcauto(adultdata,"salary","Weights","fnlwgt", ...
    "Learners","all","OptimizeHyperparameters","all", ...
    "HyperparameterOptimizationOptions",ashaOptions);
Warning: 'CategoricalPredictors' value must be empty or 'all' for a 'knn' learner. The function omits 'knn' from the list of learners.
Warning: Categorical predictors are not supported for a 'discr' learner. The function omits 'discr' from the list of learners.
Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached see the 'AttachedFiles' property of the parallel pool.
Done copying objective function to workers.
Learner types to explore: ensemble, kernel, linear, nb, svm, tree
Total iterations (MaxObjectiveEvaluations): 2387
Total time (MaxTime): Inf

|========================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training | Observed min    | Training set | Learner      | Hyperparameter:      Value  |
|      | workers | result | loss       | & validation (sec)| validation loss | size         |              |                             |
|========================================================================================================================================|
|    1 |       6 | Best   |     0.1833 |           0.52781 |          0.1833 |          102 | nb           | DistributionNames:   normal |
|      |         |        |            |                   |                 |              |              | Width:                  NaN |
|      |         |        |            |                   |                 |              |              | Kernel:         <undefined> |
|    2 |       6 | Accept |    0.23405 |           0.57763 |          0.1833 |          102 | linear       | Lambda:          7.6648e-07 |
|      |         |        |            |                   |                 |              |              | Learner:           logistic |
|      |         |        |            |                   |                 |              |              | Regularization:       lasso |
|    3 |       6 | Accept |    0.22888 |           0.49975 |          0.1833 |          102 | linear       | Lambda:          0.00011257 |
|      |         |        |            |                   |                 |              |              | Learner:                svm |
|      |         |        |            |                   |                 |              |              | Regularization:       lasso |
|    4 |       6 | Accept |    0.19604 |            0.3835 |          0.1833 |          102 | nb           | DistributionNames:   normal |
|      |         |        |            |                   |                 |              |              | Width:                  NaN |
|      |         |        |            |                   |                 |              |              | Kernel:         <undefined> |
|    5 |       6 | Accept |    0.21784 |           0.48395 |          0.1833 |          102 | linear       | Lambda:          1.8794e-07 |
|      |         |        |            |                   |                 |              |              | Learner:           logistic |
|      |         |        |            |                   |                 |              |              | Regularization:       ridge |
|    6 |       6 | Best   |    0.17277 |           0.38569 |         0.17277 |          408 | nb           | DistributionNames:   normal |
|      |         |        |            |                   |                 |              |              | Width:                  NaN |
|      |         |        |            |                   |                 |              |              | Kernel:         <undefined> |
|    7 |       6 | Accept |    0.23373 |            0.5185 |         0.17277 |          102 | linear       | Lambda:          1.2608e-05 |
|      |         |        |            |                   |                 |              |              | Learner:           logistic |
|      |         |        |            |                   |                 |              |              | Regularization:       lasso |
|    8 |       6 | Accept |    0.25612 |            2.4452 |         0.17277 |          102 | kernel       | Learner:           logistic |
|      |         |        |            |                   |                 |              |              | KernelScale:         11.254 |
|      |         |        |            |                   |                 |              |              | Lambda:            0.015724 |
|      |         |        |            |                   |                 |              |              | NumExpansionDimensions: 326 |
|    9 |       6 | Accept |    0.21958 |            7.1669 |         0.17277 |          102 | svm          | BoxConstraint:       99.195 |
|      |         |        |            |                   |                 |              |              | KernelScale:         36.647 |
|      |         |        |            |                   |                 |              |              | PolynomialOrder:          2 |
|      |         |        |            |                   |                 |              |              | Standardize:          false |
|   10 |       6 | Accept |    0.25909 |            8.1615 |         0.17277 |          102 | kernel       | Learner:           logistic |
|      |         |        |            |                   |                 |              |              | KernelScale:      0.0034442 |
|      |         |        |            |                   |                 |              |              | Lambda:          7.2286e-07 |
|      |         |        |            |                   |                 |              |              | NumExpansionDimensions:1549 |
...
__________________________________________________________
Optimization completed.
Total iterations: 2387
Total elapsed time: 2628.053 seconds
Total time for training and validation: 14609.748 seconds

Best observed learner is an ensemble model with:
    Method: LogitBoost
    NumLearningCycles: 114
    LearnRate: 0.55256
    MinLeafSize: 5
    MaxNumSplits: 13
    SplitCriterion: <undefined>
    NumVariablesToSample: NaN
Observed validation loss: 0.12429
Time for training and validation: 53.3034 seconds
The Total elapsed time value shows that the ASHA optimization took less time to run than the Bayesian optimization (about 0.7 hours).
The final model returned by fitcauto corresponds to the best observed learner. Before returning the model, the function retrains it using the entire training data set (adultdata), the listed Learner (or model) type, and the displayed hyperparameter values.
Evaluate the performance of the returned bayesianMdl and ashaMdl models on the test set adulttest by using confusion matrices and receiver operating characteristic (ROC) curves.
For each model, find the predicted labels and score values for the test set.
[bayesianLabels,bayesianScores] = predict(bayesianMdl,adulttest);
[ashaLabels,ashaScores] = predict(ashaMdl,adulttest);
Create confusion matrices from the test set results. The diagonal elements indicate the number of correctly classified instances of a given class. The off-diagonal elements are instances of misclassified observations. Use a 1-by-2 tiled layout to compare the results.
tiledlayout(1,2)
nexttile
confusionchart(adulttest.salary,bayesianLabels)
title("Confusion Matrix (Bayesian Optimization)")
nexttile
confusionchart(adulttest.salary,ashaLabels)
title("Confusion Matrix (ASHA Optimization)")
Compute the test set classification accuracy for each model, where accuracy is the percentage of correctly classified test set observations.
bayesianAccuracy = (1-loss(bayesianMdl,adulttest,"salary"))*100
bayesianAccuracy = 85.1825
ashaAccuracy = (1-loss(ashaMdl,adulttest,"salary"))*100
ashaAccuracy = 85.7155
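To make the link between a confusion matrix and accuracy concrete, here is a minimal sketch on toy labels. The data and the variable names trueLabels, predLabels, and toyAccuracy are hypothetical and are not taken from adulttest. Accuracy is the sum of the diagonal of the confusion matrix divided by the total number of observations. Note that loss may weight observations, so a value derived from it need not match this unweighted count exactly in every setting.

```matlab
% Toy labels (hypothetical data, not taken from adulttest)
trueLabels = categorical(["<=50K";"<=50K";">50K";"<=50K";">50K";">50K"]);
predLabels = categorical(["<=50K";">50K";">50K";"<=50K";">50K";"<=50K"]);

% Rows correspond to true classes, columns to predicted classes
C = confusionmat(trueLabels,predLabels);

% Unweighted accuracy: correctly classified (diagonal) over all observations
toyAccuracy = 100*trace(C)/sum(C,"all");
```

In this toy case four of the six predictions match the true labels, so toyAccuracy is about 66.7 percent.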
Based on the confusion matrices and the accuracy values, ashaMdl slightly outperforms bayesianMdl on the test set. However, both models perform well.
For each model, plot the ROC curve for the score values that correspond to the label '<=50K'. First, find the column of scores that corresponds to that label. The order of the score columns matches the order of the classes in the trained model.
bayesianMdl.ClassNames
ans = 2×1 categorical
<=50K
>50K
ashaMdl.ClassNames
ans = 2×1 categorical
<=50K
>50K
Because '<=50K' is listed first for both models, the first column of scores corresponds to that label.
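Instead of reading the column index off the display, it can also be looked up programmatically. The sketch below uses a hypothetical classNames array that mirrors the ClassNames output shown above; scoreColumn is likewise an illustrative name.

```matlab
% Hypothetical class list mirroring the ClassNames output above
classNames = categorical(["<=50K";">50K"]);

% Index of the score column that corresponds to the label of interest
scoreColumn = find(classNames == "<=50K");
```

With a trained model, the same comparison could be applied to its ClassNames property, for example find(bayesianMdl.ClassNames == "<=50K").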
Plot the ROC curves and compute the area under the curve (AUC). A ROC curve shows the true positive rate versus the false positive rate for different thresholds of the classifier output. For a perfect classifier, whose true positive rate is always 1 regardless of the threshold, AUC = 1. For a binary classifier that randomly assigns observations to classes, AUC = 0.5. A large AUC value (close to 1) indicates good classifier performance.
[bayesianX,bayesianY,~,bayesianAUC] = perfcurve(adulttest.salary,bayesianScores(:,1),"<=50K");
[ashaX,ashaY,~,ashaAUC] = perfcurve(adulttest.salary,ashaScores(:,1),"<=50K");
figure
plot(bayesianX,bayesianY)
hold on
plot(ashaX,ashaY)
title("ROC Curves")
xlabel("False Positive Rate")
ylabel("True Positive Rate")
legend(["Bayesian Optimization","ASHA Optimization"])
bayesianAUC
bayesianAUC = 0.9026
ashaAUC
ashaAUC = 0.9084
Based on the AUC values, both classifiers perform well on the test data.
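As a small sanity check on what AUC measures, the following sketch computes it for toy data in two ways: from perfcurve directly, and as the trapezoidal area under the returned ROC coordinates. The labels, scores, and variable names are hypothetical. AUC can also be read as the fraction of positive-negative pairs in which the positive observation receives the higher score.

```matlab
% Toy scores and labels (hypothetical data)
labels = categorical(["pos";"pos";"neg";"pos";"neg";"neg"]);
scores = [0.9; 0.8; 0.7; 0.6; 0.3; 0.1];

% ROC coordinates and AUC from perfcurve
[fpr,tpr,~,auc] = perfcurve(labels,scores,"pos");

% Trapezoidal area under the same ROC coordinates, for comparison
aucTrapz = trapz(fpr,tpr);

% Pairwise view: 8 of the 9 positive-negative pairs are ranked correctly
aucPairs = 8/9;
```

Here only the pair (0.6, 0.7) is ranked in the wrong order, so all three quantities agree at 8/9.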
See also: fitcauto | confusionchart | perfcurve | BayesianOptimization