This example shows how to compare a new Logistic model for lifetime PD against a "champion" model.
Load the portfolio data, which includes loan and macroeconomic information.
load RetailCreditPanelData.mat
data = join(data,dataMacro);
disp(head(data))
ID ScoreGroup YOB Default Year GDP Market
__ __________ ___ _______ ____ _____ ______
1 Low Risk 1 0 1997 2.72 7.61
1 Low Risk 2 0 1998 3.57 26.24
1 Low Risk 3 0 1999 2.86 18.1
1 Low Risk 4 0 2000 2.43 3.19
1 Low Risk 5 0 2001 1.26 -10.51
1 Low Risk 6 0 2002 -0.59 -22.95
1 Low Risk 7 0 2003 0.63 2.78
1 Low Risk 8 0 2004 1.85 9.48
nIDs = max(data.ID);
uniqueIDs = unique(data.ID);
rng('default'); % for reproducibility
c = cvpartition(nIDs,'HoldOut',0.4);
TrainIDInd = training(c);
TestIDInd = test(c);
TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));
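The partition above is made over loan IDs rather than over rows, so every yearly observation of a given loan lands in the same set. The following is a minimal sketch of that idea on toy data. The variable names (`loanID`, `testIDs`, and so on) are hypothetical, and `randperm` from base MATLAB stands in for `cvpartition` so that the sketch has no toolbox dependency.

```matlab
% Toy panel: 4 loans, each observed over several yearly rows (hypothetical data).
loanID = [1 1 1 2 2 3 3 3 4 4]';

rng('default');                        % for reproducibility
ids      = unique(loanID);
shuffled = ids(randperm(numel(ids)));  % random order of the loan IDs
nTest    = round(0.4*numel(ids));      % hold out about 40% of the loans
testIDs  = shuffled(1:nTest);

testRows  = ismember(loanID,testIDs);  % every row of a held-out loan
trainRows = ~testRows;

% No loan contributes rows to both sets.
assert(isempty(intersect(loanID(trainRows),loanID(testRows))))
```

Splitting by row instead would let observations of the same loan appear in both training and testing data, which tends to make test metrics look better than they are.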
For this example, fit a new model using only score group information, with no age information. First, this model can be validated on its own. For more information, see Basic Lifetime PD Model Validation.
In this data set, age information is important. The new model does not perform as well as the champion model (which includes age, score group, and macro variables).
Fit the new Logistic model using fitLifetimePDModel.
ModelType = "logistic";
pdModel = fitLifetimePDModel(data(TrainDataInd,:),ModelType,...
    'ModelID','LogisticNoAge',...
    'IDVar','ID',...
    'LoanVars','ScoreGroup',...
    'MacroVars',{'GDP','Market'},...
    'ResponseVar','Default');
disp(pdModel)
Logistic with properties:
ModelID: "LogisticNoAge"
Description: ""
Model: [1x1 classreg.regr.CompactGeneralizedLinearModel]
IDVar: "ID"
AgeVar: ""
LoanVars: "ScoreGroup"
MacroVars: ["GDP" "Market"]
ResponseVar: "Default"
To compare the new Logistic model against the champion model, you need access to the champion model's predictions. The champion model may even use different predictors, so mapping the available data to the exact inputs of the champion model may require an intermediate preprocessing step. This example assumes that a "black-box" tool is available to obtain predictions from the champion model.
Compare the discrimination performance of both models using modelDiscrimination.
DataSetChoice = "Testing";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end
ChampionPD = getChampionModelPDs(data(Ind,:));
[DiscMeasure,DiscData] = modelDiscrimination(pdModel,data(Ind,:),'DataID',DataSetChoice,...
    'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(DiscMeasure)
AUROC
_______
LogisticNoAge, Testing 0.66503
Champion, Testing 0.70018
disp(head(DiscData))
ModelID X Y T
_______________ ________ ________ ________
"LogisticNoAge" 0 0 0.02287
"LogisticNoAge" 0.04673 0.090978 0.02287
"LogisticNoAge" 0.064656 0.14922 0.022711
"LogisticNoAge" 0.10982 0.22764 0.020553
"LogisticNoAge" 0.14421 0.311 0.018483
"LogisticNoAge" 0.19237 0.41454 0.01722
"LogisticNoAge" 0.23558 0.43738 0.014125
"LogisticNoAge" 0.27979 0.52037 0.012812
disp(tail(DiscData))
ModelID X Y T
__________ _______ _______ __________
"Champion" 0.88743 0.98021 0.0032242
"Champion" 0.90293 0.98477 0.0025583
"Champion" 0.91884 0.98896 0.0023801
"Champion" 0.93303 0.99239 0.0018756
"Champion" 0.94995 0.99391 0.0017711
"Champion" 0.96705 0.99695 0.0016436
"Champion" 0.98295 0.99886 0.0012847
"Champion" 1 1 0.00086887
IndModel = DiscData.ModelID=="LogisticNoAge";
plot(DiscData.X(IndModel),DiscData.Y(IndModel))
hold on
IndModel = DiscData.ModelID=="Champion";
plot(DiscData.X(IndModel),DiscData.Y(IndModel),':')
hold off
title(strcat("ROC ",pdModel.ModelID))
xlabel('Fraction of non-defaulters')
ylabel('Fraction of defaulters')
legend(strcat(DiscMeasure.Properties.RowNames,", AUROC = ",num2str(DiscMeasure.AUROC)),'Location','southeast')
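The X and Y columns plotted above are the cumulative fractions of non-defaulters and defaulters as the PD threshold is lowered, and the AUROC is the area under that curve. The sketch below illustrates the relationship on hypothetical toy data with no tied scores. It is not the toolbox implementation, only a hand computation using `sort`, `cumsum`, and `trapz` from base MATLAB.

```matlab
% Hypothetical toy data: predicted PDs and observed default flags (no ties).
pd      = [0.05 0.20 0.10 0.40 0.30 0.02]';
default = [0    1    0    1    0    0   ]';

% Sweep the threshold from the highest predicted PD downward.
[~,order] = sort(pd,'descend');
d = default(order);

% X: cumulative fraction of non-defaulters; Y: cumulative fraction of defaulters.
X = [0; cumsum(1-d)/sum(1-d)];
Y = [0; cumsum(d)/sum(d)];

auroc = trapz(X,Y);   % area under the ROC curve, 0.875 for this toy data
```

The result matches the pairwise interpretation of the AUROC: of the 8 defaulter and non-defaulter pairs in the toy data, the defaulter has the higher predicted PD in 7.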

[DiscMeasure,DiscData] = modelDiscrimination(pdModel,data(Ind,:),'SegmentBy','YOB','DataID',DataSetChoice,...
    'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(DiscMeasure)
AUROC
_______
LogisticNoAge, YOB=1, Testing 0.64879
Champion, YOB=1, Testing 0.64972
LogisticNoAge, YOB=2, Testing 0.65699
Champion, YOB=2, Testing 0.66496
LogisticNoAge, YOB=3, Testing 0.63508
Champion, YOB=3, Testing 0.64774
LogisticNoAge, YOB=4, Testing 0.62656
Champion, YOB=4, Testing 0.66204
LogisticNoAge, YOB=5, Testing 0.6205
Champion, YOB=5, Testing 0.65439
LogisticNoAge, YOB=6, Testing 0.61739
Champion, YOB=6, Testing 0.63156
LogisticNoAge, YOB=7, Testing 0.64016
Champion, YOB=7, Testing 0.63117
LogisticNoAge, YOB=8, Testing 0.63339
Champion, YOB=8, Testing 0.63339
disp(head(DiscData))
ModelID YOB X Y T
_______________ ___ _______ _______ _________
"LogisticNoAge" 1 0 0 0.022711
"LogisticNoAge" 1 0.12062 0.22401 0.022711
"LogisticNoAge" 1 0.23459 0.41435 0.018483
"LogisticNoAge" 1 0.33329 0.59151 0.01722
"LogisticNoAge" 1 0.45578 0.69107 0.01151
"LogisticNoAge" 1 0.5683 0.77452 0.009347
"LogisticNoAge" 1 0.67031 0.84919 0.0087028
"LogisticNoAge" 1 0.78943 0.9063 0.0064814
disp(tail(DiscData))
ModelID YOB X Y T
_______________ ___ _______ ______ __________
"LogisticNoAge" 8 0 0 0.014125
"LogisticNoAge" 8 0.31762 0.5625 0.014125
"LogisticNoAge" 8 0.65751 0.8125 0.0071273
"LogisticNoAge" 8 1 1 0.0040058
"Champion" 8 0 0 0.0040291
"Champion" 8 0.31762 0.5625 0.0040291
"Champion" 8 0.65751 0.8125 0.0017711
"Champion" 8 1 1 0.00086887
Compare the accuracy of the two models using modelAccuracy.
GroupingVar = "YOB";
[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),GroupingVar,'DataID',DataSetChoice,...
    'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(AccMeasure)
RMSE
__________
LogisticNoAge, grouped by YOB, Testing 0.0031021
Champion, grouped by YOB, Testing 0.00046476
disp(head(AccData))
ModelID YOB PD
__________ ___ _________
"Observed" 1 0.017636
"Observed" 2 0.013303
"Observed" 3 0.010846
"Observed" 4 0.010709
"Observed" 5 0.0093528
"Observed" 6 0.0060197
"Observed" 7 0.0034776
"Observed" 8 0.0012535
disp(tail(AccData))
ModelID YOB PD
__________ ___ _________
"Champion" 1 0.017244
"Champion" 2 0.012999
"Champion" 3 0.011428
"Champion" 4 0.010693
"Champion" 5 0.0085574
"Champion" 6 0.005937
"Champion" 7 0.0035193
"Champion" 8 0.0021802
AccDataUnstacked = unstack(AccData,"PD","ModelID");
figure;
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.(pdModel.ModelID),'-o')
hold on
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.Observed,'*')
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.("Champion"),':s')
hold off
title(strcat(AccMeasure.Properties.RowNames,", RMSE = ",num2str(AccMeasure.RMSE)))
xlabel(GroupingVar)
ylabel('PD')
legend(pdModel.ModelID,"Observed","Champion")
grid on
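To make the RMSE metric more concrete, the sketch below recomputes a group-level error by hand from the Observed and Champion PD values printed in the AccData output above. This is an illustration, not the toolbox's exact formula: the plain unweighted RMSE over the eight YOB groups comes out near 5.1e-4, which differs from the reported 0.00046476, so the function apparently does not weight all groups equally. Weighting each group by its share of observations is one possibility, shown here as a hypothetical variant. The weights `w` are placeholders because the group counts are not shown in this example.

```matlab
% Group-level PDs by YOB, copied from the AccData output shown above.
observed = [0.017636 0.013303 0.010846 0.010709 0.0093528 0.0060197 0.0034776 0.0012535];
champion = [0.017244 0.012999 0.011428 0.010693 0.0085574 0.005937  0.0035193 0.0021802];

% Unweighted root mean squared error across the eight YOB groups.
rmseUnweighted = sqrt(mean((observed - champion).^2));

% Hypothetical weighted variant: w would hold each group's share of the
% observations. The counts are not available here, so equal weights are
% used as a placeholder (an assumption, not the toolbox's definition).
w = ones(size(observed))/numel(observed);
rmseWeighted = sqrt(sum(w.*(observed - champion).^2));
```

With the placeholder weights the two values coincide. Replacing `w` with actual per-group observation shares would show whether count weighting accounts for the gap to the reported value.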

[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice,...
    'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(AccMeasure)
RMSE
_________
LogisticNoAge, grouped by YOB, ScoreGroup, Testing 0.0036974
Champion, grouped by YOB, ScoreGroup, Testing 0.0010716
disp(head(AccData))
ModelID YOB ScoreGroup PD
__________ ___ ___________ _________
"Observed" 1 High Risk 0.030877
"Observed" 1 Medium Risk 0.013541
"Observed" 1 Low Risk 0.0081449
"Observed" 2 High Risk 0.022838
"Observed" 2 Medium Risk 0.012376
"Observed" 2 Low Risk 0.0046482
"Observed" 3 High Risk 0.017651
"Observed" 3 Medium Risk 0.0092652
unstack(AccData,'PD','ModelID')
ans=24×5 table
YOB ScoreGroup Champion LogisticNoAge Observed
___ ___________ _________ _____________ _________
1 High Risk 0.028165 0.019641 0.030877
1 Medium Risk 0.014833 0.0099388 0.013541
1 Low Risk 0.008422 0.0055911 0.0081449
2 High Risk 0.02167 0.019337 0.022838
2 Medium Risk 0.011123 0.0098141 0.012376
2 Low Risk 0.0061856 0.0055194 0.0046482
3 High Risk 0.019285 0.020139 0.017651
3 Medium Risk 0.0098085 0.010179 0.0092652
3 Low Risk 0.0054096 0.0057356 0.005813
4 High Risk 0.018136 0.019175 0.018562
4 Medium Risk 0.0091921 0.0096563 0.0094929
4 Low Risk 0.0050562 0.0054292 0.004392
5 High Risk 0.014818 0.014806 0.016288
5 Medium Risk 0.0072853 0.007454 0.0080033
5 Low Risk 0.0039358 0.0041822 0.0041745
6 High Risk 0.01049 0.012153 0.0096889
⋮
You can also compare two new models that are under development.
pdModelTTC = fitLifetimePDModel(data(TrainDataInd,:),"probit",...
    'ModelID','ProbitTTC',...
    'AgeVar','YOB',...
    'IDVar','ID',...
    'LoanVars','ScoreGroup',...
    'ResponseVar','Default',...
    'Description',"TTC model, no macro variables, probit.");
disp(pdModelTTC)
Probit with properties:
ModelID: "ProbitTTC"
Description: "TTC model, no macro variables, probit."
Model: [1x1 classreg.regr.CompactGeneralizedLinearModel]
IDVar: "ID"
AgeVar: "YOB"
LoanVars: "ScoreGroup"
MacroVars: ""
ResponseVar: "Default"
Compare the accuracy of the two models.
[AccMeasureTTC,AccDataTTC] = modelAccuracy(pdModelTTC,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice,...
    'ReferencePD',predict(pdModel,data(Ind,:)),'ReferenceID',pdModel.ModelID);
disp(AccMeasureTTC)
RMSE
_________
ProbitTTC, grouped by YOB, ScoreGroup, Testing 0.0016726
LogisticNoAge, grouped by YOB, ScoreGroup, Testing 0.0036974
unstack(AccDataTTC,'PD','ModelID')
ans=24×5 table
YOB ScoreGroup LogisticNoAge Observed ProbitTTC
___ ___________ _____________ _________ _________
1 High Risk 0.019641 0.030877 0.028114
1 Medium Risk 0.0099388 0.013541 0.014865
1 Low Risk 0.0055911 0.0081449 0.0087364
2 High Risk 0.019337 0.022838 0.023239
2 Medium Risk 0.0098141 0.012376 0.012053
2 Low Risk 0.0055194 0.0046482 0.0069786
3 High Risk 0.020139 0.017651 0.019096
3 Medium Risk 0.010179 0.0092652 0.0097145
3 Low Risk 0.0057356 0.005813 0.0055406
4 High Risk 0.019175 0.018562 0.015599
4 Medium Risk 0.0096563 0.0094929 0.0077825
4 Low Risk 0.0054292 0.004392 0.0043722
5 High Risk 0.014806 0.016288 0.012666
5 Medium Risk 0.007454 0.0080033 0.0061971
5 Low Risk 0.0041822 0.0041745 0.0034292
6 High Risk 0.012153 0.0096889 0.010223
⋮
function PD = getChampionModelPDs(data)
m = load('LifetimeChampionModel.mat');
PD = predict(m.pdModel,data);
end
fitLifetimePDModel | Logistic | modelAccuracy | modelDiscrimination | predict | predictLifetime | Probit