10
UMLClass
30
50
470
60
halign=left
Data mining goal:
Feature selection and classification project plan
UMLSpecialState
30
170
20
20
type=initial
Relation
40
170
90
30
lt=<-
70.0;10.0;10.0;10.0
UMLState
110
150
200
60
Obtain a dataset of interest
from the SQL project plan
UMLState
370
140
140
90
Prepare data
for modelling
with the
mulset R package
UMLSpecialState
690
130
160
100
type=decision
All datasets used
for modelling?
Relation
300
170
90
30
lt=<-
70.0;10.0;10.0;10.0
UMLState
560
140
100
90
Multiple
datasets are
generated
Relation
500
170
80
30
lt=<-
60.0;10.0;10.0;10.0
Relation
650
170
60
30
lt=<-
40.0;10.0;10.0;10.0
Relation
100
220
690
140
lt=<-
[NO]
50.0;120.0;10.0;120.0;10.0;30.0;670.0;30.0;670.0;10.0
UMLState
540
290
150
110
Train and validate
a selection of
classifiers on training
set, using
cross-validation
UMLState
720
300
150
80
Filter out models with
poor
AUROC, specificity,
or sensitivity
Relation
680
330
60
30
lt=<-
40.0;10.0;10.0;10.0
UMLState
150
300
150
70
Split data into training
and test sets
Relation
290
320
70
30
lt=<-
50.0;10.0;10.0;10.0
UMLState
340
310
150
60
Set random seed
Relation
480
330
80
30
lt=<-
60.0;10.0;10.0;10.0
UMLState
900
310
180
60
Compare models in
terms of
training and test AUROC
Relation
860
330
60
30
lt=<-
40.0;10.0;10.0;10.0
Relation
760
220
250
110
lt=<-
10.0;10.0;230.0;10.0;230.0;90.0
Relation
840
80
90
120
lt=<-
[YES]
70.0;10.0;10.0;100.0
UMLState
910
40
150
90
Compute
variable
importance
for models
using the caret package
Relation
980
120
80
70
lt=<-
60.0;50.0;10.0;50.0;10.0;10.0
UMLState
1040
140
140
70
Perform
correlation analysis
and visualise
results
Relation
1100
200
80
80
lt=<-
60.0;60.0;10.0;60.0;10.0;10.0
UMLSpecialState
1160
250
20
20
type=final