Data Mining

Census
Data Analysis & Data Mining
María del Rosario Bruera
IBM Scholars Program

Questions and answers
Questions:
· What is our customers' value?
· Which customers are most likely to leave?
· Which products are sold together?…
Answers:
· They are in the user's own data
· Special tools are needed to find them

Business Intelligence
· “An umbrella that covers a set of concepts and methodologies whose mission is to improve business decision-making based on facts and on systems that work with facts.”
Howard Dresner, Gartner Group

B.I.: resources and tools
· Data sources (warehouses, data marts, etc.)
· Data management tools
· Extraction and query tools
· Modeling tools (Data Mining)

What is Data Mining? (1997)
· Data Mining is the process of exploring and analyzing data, automatically or semi-automatically, in order to obtain meaningful patterns and business rules.
Michael Berry, Gordon Linoff. Data Mining Techniques for Marketing, Sales and Customer Support. Wiley, USA.

Reflections (2000)
· …we like the notion that the patterns must be meaningful…
· If there is anything we reject, it is the phrase “by automatic or semi-automatic means”. This is not because it is untrue (without automation it is impossible to mine large amounts of data). It is because we believe too much emphasis has been placed on automation and not enough on the exploration and analysis stages.
· Data Mining is a process.

What Data Mining is NOT
· It is not an off-the-shelf product you buy, but a discipline that must be mastered
· It is not an instant solution to business problems
· It is not an end in itself, but a process that helps find solutions to business problems

Pillars of the Data Mining process
· Data
· Algorithms and techniques
· Modeling practices

Integrated disciplines
· Artificial Intelligence
· Statistics
· Decision support technologies (OLTP)
· Hardware and software technologies

Historical perspective
Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

Stages in the Data Mining process
· Identify the business problem
· Transform the data into information
· Act on the results
· Measure the results of the actions

The Mining Process
Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

The Data Analyst
· Is the link between the information technology areas and the business areas
· Translates information requirements into questions suited to analysis with the mining tools
· Feeds new data cleaning and data validation criteria back into the company's Data Warehouse

The Data Analyst
[Diagram: Information technology; Business users]

The Data Analyst
[Diagram]

Required skills
· Data manipulation (SQL)
· Knowledge of mining techniques and exploratory analysis
· Communication skills and the ability to interpret business problems
· Creativity

Data Mining Team
Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

Project costs
Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

Where the data come from
· Relational databases
· Data Warehouses
· Data Marts and OLAP
· Other formats (Excel, ASCII files, surveys, census data, etc.)

Types of data sources
· Transactional (e.g., credit card transactions)
· Relational (e.g., the structure of the products the Bank offers)
· Demographic (e.g., household characteristics)

The shape of data for Data Mining
· Data are organized as a flat table made up of rows and columns
· Rows: the unit of analysis, for example an account or a ticket
· Columns: the attributes of each unit of analysis, for example how often the credit card is used

Characteristics of data tables for Data Mining
· All the data must be in a single database table or “view”
· Each row must correspond to an instance relevant to the business
· Columns with no variability should be ignored
· Columns with a unique value for every case should be ignored (e.g., the account number)
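The following is a minimal plain-Python sketch of the last two rules. It assumes the flat table is held as a list of dicts, one per account. The column names and values are hypothetical. A column with a distinct value in every row is dropped only when it is non-numeric, because a continuous measure such as a balance can legitimately differ in every row.

```python
def is_numeric(value):
    """True for int/float values (bool excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def usable_columns(rows):
    """Return the columns worth keeping for mining.

    Drops columns with no variability (a single distinct value) and
    identifier-like columns (a distinct non-numeric value in every row).
    """
    n_rows = len(rows)
    keep = []
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) == 1:
            continue  # constant column: carries no information
        if len(distinct) == n_rows and not all(is_numeric(v) for v in distinct):
            continue  # e.g. the account number: unique per case, no pattern to learn
        keep.append(column)
    return keep


# Hypothetical flat table: one row per account (the unit of analysis).
rows = [
    {"account_no": "A-001", "country": "AR", "freq_days": 3.5, "balance": 1200.0, "segment": "Heavy"},
    {"account_no": "A-002", "country": "AR", "freq_days": 12.0, "balance": 85.5, "segment": "Light"},
    {"account_no": "A-003", "country": "AR", "freq_days": 3.5, "balance": 640.0, "segment": "Heavy"},
]

print(usable_columns(rows))  # ['freq_days', 'balance', 'segment']
```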

Data quality
· The success of Data Mining activities is directly related to the QUALITY of the data
· Missing data (“missings”) and out-of-range data (“outliers”) must be identified
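The sketch below implements both checks, assuming missing values are stored as None. The ages are made up, and two other parts are also assumptions: the valid range of 18 to 100 years, and the 1.5 × IQR fence (Tukey's rule, one common outlier convention).

```python
import statistics


def missing_count(values):
    """Number of missing entries (represented here as None)."""
    return sum(1 for v in values if v is None)


def out_of_range(values, low, high):
    """Present values that fall outside the valid range [low, high]."""
    return [v for v in values if v is not None and not low <= v <= high]


def iqr_outliers(values, k=1.5):
    """Present values beyond Tukey's fences: Q1 - k*IQR and Q3 + k*IQR."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Hypothetical customer ages with two missings and two impossible values.
ages = [34, 41, None, 29, 38, 250, 45, None, 52, -3]
print(missing_count(ages))          # 2
print(out_of_range(ages, 18, 100))  # [250, -3]
print(iqr_outliers(ages))           # [-3, 250]
```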

Data quality
· It is often necessary to preprocess the data before passing them to the analysis model. Preprocessing may include transformations, reductions or combinations of the data
· The semantics of the data should guide the choice of a suitable representation, and the merits of the chosen representation bear directly on the quality of the model and of the subsequent results

Problems with the data
· Too much data
  - corrupt or noisy data
  - redundant data (factorization required)
  - irrelevant data
  - excessive volume of data (sampling)
· Too little data
  - missing attributes (missings)
  - missing values
  - small amount of data
· Fractured data
  - incompatible data
  - multiple data sources

Data preparation
[Figure: Select, Transform, Mine and Assimilate stages, producing Transformed Data, Extracted Information and Assimilated Information]

Data Warehouse
· “A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
Bill Inmon
· “A copy of transaction data specifically structured for query and analysis.”
Ralph Kimball

Data Marts
· Technically, a subset of the DW oriented to a specific business purpose (marketing, finance, production, etc.)
· The term is also used for alternatives to a corporate DW that are smaller, cheaper and faster to implement

Data warehouse architecture
[Figure: operational and external data; DW; Metadata; Report/Query; EIS; OLAP; Data Mining]

Tools for exploiting the DW
· Visualization tools
· Reporting
· OLAP
· Data Mining

OLAP
· On-Line Analytical Processing
· Allows building multidimensional views of the DW to optimize performance
· Supported by DW management engines that allow these “cubes” to be built
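This plain-Python sketch illustrates what a "cube" precomputes. It aggregates a hypothetical fact table over every combination of dimensions, so any roll-up can then be read without rescanning the data. Real OLAP engines store and index these aggregates far more efficiently. The dimension names and figures are invented.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical fact rows: one value per dimension, then the measure (amount).
facts = [
    ("North", "Card", "Q1", 100.0),
    ("North", "Loan", "Q1", 250.0),
    ("South", "Card", "Q1", 80.0),
    ("South", "Card", "Q2", 120.0),
    ("North", "Card", "Q2", 60.0),
]


def build_cube(facts):
    """Pre-aggregate the summed amount for every subset of dimensions.

    Returns {dimension names: {dimension values: total}}; the empty
    subset holds the grand total.
    """
    cube = {}
    n_dims = len(DIMENSIONS)
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cells = defaultdict(float)
            for *keys, amount in facts:
                cells[tuple(keys[i] for i in dims)] += amount
            cube[tuple(DIMENSIONS[i] for i in dims)] = dict(cells)
    return cube


cube = build_cube(facts)
print(cube[("region",)])                             # {('North',): 410.0, ('South',): 200.0}
print(cube[("region", "quarter")][("North", "Q1")])  # 350.0
print(cube[()])                                      # {(): 610.0}
```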

OLAP
· Useful, powerful tools for accessing databases and data warehouses and obtaining information “reports”
· OLAP technology complements Data Mining activities and goes beyond what SQL can do

Data Mining and OLAP
· Reporting, OLAP and query tools are effective for building descriptive, retrospective models that confirm or reject the user's prior hypotheses

Data Mining and OLAP
· Data Mining tools make it possible to find non-obvious patterns in the large volumes of information in the DW and to propose predictive models

What is Statistics?
· The discipline that extracts general information from specific data
· The study of stability within variation
· The art of examining and summarizing data and drawing conclusions from them

Data Mining and Statistics
· Statistical methods are at the heart of many data mining techniques
· Many of these techniques were originally designed for confirmatory purposes
· Exploratory statistics emerged in the … with the contributions of J. Tukey

Data Mining and Statistics
· Data Mining makes no a priori assumptions about the nature of the variables or the relationships among them (normality, linearity, etc.)
· For Data Mining, statistical algorithms are adapted to process large volumes of data
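A classic way to adapt a statistical computation to large volumes is to update the mean and variance in one pass. The sketch below does this with Welford's algorithm. That algorithm is chosen here only as an illustration; the slides do not name one.

```python
import math


class RunningStats:
    """Mean and variance in a single pass over the data (Welford's algorithm).

    Each value is seen once and never stored, so memory use stays constant
    however many records are streamed through.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN with fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def stdev(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for amount in (2, 4, 4, 4, 5, 5, 7, 9):  # in practice: a cursor over millions of rows
    stats.update(amount)
print(stats.n, round(stats.mean, 4), round(stats.variance, 4))  # 8 5.0 4.5714
```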

Data Mining and AI
· Artificial Intelligence enters Data Mining through artificial neural networks
· These are used to build nonlinear predictive models that learn through training and that resemble models of biological neural networks

Neural networks
· Neural networks are suited to predictive problems
· A problem suited to a neural network has three characteristics:
  - The INPUTS are clearly understood
  - The OUTPUT is clearly understood
  - There are enough examples (experience) to train the network
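To make the three conditions concrete, here is a minimal sketch of a single artificial neuron (a logistic unit) trained by gradient descent. A real neural model would have hidden layers and far more examples. The toy target (logical AND), the learning rate and the number of epochs are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, inputs):
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(examples, epochs=2000, learning_rate=0.5, seed=0):
    """Train a single sigmoid neuron with stochastic gradient descent on log-loss."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    model = ([rng.uniform(-0.5, 0.5) for _ in range(n_inputs)], 0.0)
    for _ in range(epochs):
        for inputs, target in examples:
            error = predict(model, inputs) - target  # d(log-loss)/d(pre-activation)
            weights, bias = model
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            model = (weights, bias - learning_rate * error)
    return model


# INPUTS: two clearly defined 0/1 flags; OUTPUT: a clearly defined 0/1 target;
# and (for this toy case) examples covering every input combination.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_neuron(examples)
for inputs, target in examples:
    print(inputs, target, round(predict(model, inputs), 3))
```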

Neural models
· A neural network does not produce explicit rules that describe the model
· A neural model is only as good as the data set used to train the network
· The model is static: it must be explicitly updated, by adding recent examples and retraining the network, to keep it current and useful

Neural models
· Neural models can tackle a wide variety of problems and produce good results even in complex domains with continuous and categorical variables
· They are appropriate for classification and prediction tasks when the model's results matter more than understanding how the model works

Customer Relationship Management
· The process that manages the relationship between the company and its customers
· To be successful, it requires identifying customers' consumption and behavior patterns

Data Mining - CRM
· Data Mining is used to systematize the search for predictors of customer behavior during campaign design
· It is also applied to measure campaign results and feed them back into the CRM

Typical Data Mining problems
· Classification
· Estimation
· Prediction
· Grouping based on association rules
· Clustering
· Description and visualization, etc.

Clustering problem
· Group customers by their indicators (R: Recency, F: Frequency, M: Monetary amount, etc.) into segments of homogeneous behavior
· Result: Heavy, Medium, Light customers, etc.
· …% of billing is concentrated in the …% of customers that make up the Heavy cluster
· Heavy customers are married with children, self-employed, with an income above …
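Below is a minimal sketch of this kind of segmentation. It assumes k-means on standardized RFM indicators, with a deterministic farthest-point start; the slides do not say which algorithm was used. The customer figures are invented. The last lines compute the share of billing that falls in the "Heavy" cluster, which is the kind of figure quoted above.

```python
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each customer joins the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([statistics.fmean(col) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Hypothetical RFM indicators per customer:
# (recency: days since last purchase, frequency: purchases/year, monetary: $/year)
customers = [
    (3, 52, 5400.0), (5, 58, 6100.0), (2, 49, 4800.0),
    (75, 4, 210.0), (90, 2, 120.0), (60, 5, 300.0), (82, 3, 150.0),
]
labels, centroids = kmeans(standardize(customers), k=2)
heavy = max(range(2), key=lambda c: centroids[c][2])  # highest monetary centroid
billing_share = (sum(m for (_, _, m), lab in zip(customers, labels) if lab == heavy)
                 / sum(m for _, _, m in customers))
print(labels, round(billing_share, 3))
```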

Classification problem
· Classify a new customer, based on his or her sociodemographic profile, as a potential Heavy, Medium or Light customer

Estimation problem
· Estimate a group of customers' spending on a given category of items in the next quarter
· Estimate the potential LTV (Lifetime Value) of a new customer

Prediction problem
· Predict customer defection (churning, attrition)
  - For a mobile phone company
  - For an AFJP (Argentine private pension fund administrator)
  - For a credit card

Association problem
· Find the rules that determine cross-traffic between products for a Bank's customers. For example:
  “When a customer becomes active in a Savings Account, the next product in which he or she becomes active is Personal Loans. This pattern occurs in …% of cases.”

Visualization problem
· Use geolocation software (GIS) to show how customers are distributed across the catchment areas of a retailer's branches

Common problems
· Characterizing customer profiles to define up-selling and cross-selling actions
· Campaign tracking and response/non-response prediction
· Credit card spending baskets and fraud prevention
· Churn prediction models

Common problems
· Mileage and loyalty programs
· Consolidating in-house databases with external sources
· Web mining and analysis of traffic and use of e-business resources
· Defining sampling frames for market research and customer satisfaction surveys

Choosing the model for Data Mining
· Main goals of the Data Mining process:
  - prediction
  - description
· The method to use depends on the goals of the analysis, but also on the quality and quantity of the available data

Source: Mining your Own Business Data Using DB2 Intelligent Miner for Data

How to select a potential DM application
Practical considerations:
· Potentially significant impact (cost/benefit ratio)
· There is no other alternative
· There is institutional support
· There are no legal impediments to using the information

How to select a potential DM application
Technical considerations:
· Sufficient data available
· Relevant attributes
· Low levels of noise in the data
· The confidence level required of the results is specified
· Existing prior knowledge

Evaluating the models
· How well does the model fit?
· Does it describe the observed data correctly?
· How much confidence can we have in its predictions?
· How understandable is the model?

The measures
· How well a predictive model agrees with reality is measured by its error rate, that is, the percentage of cases whose classification or prediction was incorrect
· For this purpose, validation and test data are kept aside, and the model should be applied to them periodically as a check
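Here is a minimal sketch of measuring the error rate on data held out for validation. The 30% test share, the seed and the one-rule "model" are hypothetical stand-ins. In practice the model would be trained only on the training part and re-checked on fresh data periodically.

```python
import random


def holdout_split(cases, test_fraction=0.3, seed=42):
    """Shuffle the cases and set aside a fraction for validation/testing."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)


def error_rate(model, cases):
    """Share of cases whose predicted class differs from the actual class."""
    wrong = sum(1 for features, actual in cases if model(features) != actual)
    return wrong / len(cases)


# Hypothetical model: a single rule on the number of automatic debits.
def model(features):
    return "Heavy" if features["debits"] >= 4 else "Other"


# Hypothetical labelled cases: (features, actual class).
cases = [
    ({"debits": 5}, "Heavy"), ({"debits": 4}, "Heavy"), ({"debits": 7}, "Heavy"),
    ({"debits": 1}, "Other"), ({"debits": 0}, "Other"), ({"debits": 2}, "Other"),
    ({"debits": 3}, "Other"), ({"debits": 6}, "Other"), ({"debits": 2}, "Heavy"),
    ({"debits": 0}, "Other"),
]
train, test = holdout_split(cases)
print(len(train), len(test))              # 7 3
print(error_rate(model, cases))           # 0.2
print(round(error_rate(model, test), 2))  # depends on which cases were drawn
```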

The measures
· For descriptive models, a good rule is one that provides the most understandable information with the shortest rule “length”
· Ultimately, the most important measure of effectiveness is the return on investment

A successful project
· A single project leader
· A multidisciplinary team with people from the IT and business areas
· Business units are involved from the start
· The IT area is involved from the start
· A small pilot project that shows the advantages of Data Mining

The new technologies

Web Mining
· The discovery of meaningful patterns by analyzing the structure, content and usage of the Web

Web Mining Taxonomy
Web Mining
· Web content
· Web structure
· Web usage

Web mining results
· …% of the visitors who access www.ibm.com/redbooks also access www.ibm.com/software/data/iminer/fordata
· Entry and exit points

Web mining results
· Link analysis and sequential patterns of page links
· E-commerce customer segmentation
· Product basket
· etc., etc., etc.

Text Mining
· New tools designed to extract information from “unstructured” documents, and to organize, segment and index them
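Indexing is one of the tasks listed above. The following is a minimal sketch of an inverted index (word to documents) with AND-queries. It assumes simple lower-cased word tokens. Real text-mining tools add stemming, stop words, language detection and ranking. The documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased word tokens; accented letters count as word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: word -> set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Ids of the documents that contain every word of the query."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Credit card statement dispute",
    2: "Request to increase the credit limit",
    3: "Card lost: please block the card",
}
index = build_index(docs)
print(search(index, "credit card"))  # {1}
print(search(index, "card"))         # {1, 3}
```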

Text Mining problems
· Automatic routing of e-mails according to their content
· Automatic classification of intranet documents
· Simultaneous search for information in documents in different languages

Text Mining problems
· Content analysis of Web pages
· Organization of Web search services
· Extraction of summary concepts from documents on the same subject

Conclusions

Why Data Mining?
· Data Mining is an effective tool for answering complex Business Intelligence questions
· The available tools can automate part of the task of finding the behavior patterns hidden in the data
· But…

What cannot be automated (yet)
· Choosing the business problems that are candidates for Data Mining
· Identifying and collecting the data that contain the information sought
· Massaging and processing the data so that patterns can be searched for
· Designing and computing derived variables

What cannot be automated (yet)
· The action plan that, building on the model's results, delivers the ROI
· Measuring the success of the actions taken on the basis of the results provided by Data Mining

Conclusions
· Make Data Mining part of your business project
· Include Data Mining in your organization's “culture”

Examples with DB2 Intelligent Miner for Data

Techniques used
· Clustering (segmentation)
· Product basket
· Decision tree
· Neural network as a predictive model

What is “clustering”?
· Partitioning the set of individuals into subsets that are as homogeneous as possible
· The goal is to maximize the similarity of the individuals within a cluster and maximize the differences between clusters

Applications of the technique
· Database segmentation
· Fraud detection
· Defect detection

Objectives
· Determine the optimal number of clusters
· Assign each individual to a single cluster
· Evaluate the impact of the variables on cluster formation
· Understand the “profile” of each cluster

Similarity measures
· Categorical variables (nominal and ordinal scales): two values are similar if they are equal
· Numeric variables (metric scales): the algorithm computes their difference, expressed in units of standard deviation

Similarity example
Attribute        Juan      Maria   Comparison
Name             Juan      Maria   Not evaluated
Sex              M         F       Different
Marital status   C         C       Same
Place            Cap.Fed   GBA     Different
Similarity                         0.33 (1 of the 3 evaluated attributes is the same)
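The sketch below reproduces the example. The slides only say that numeric differences are expressed in standard-deviation units. How such a difference becomes a similarity score is an assumption here: a linear decay that reaches 0 at two standard deviations. Juan's and Maria's ages and the standard deviation are also hypothetical.

```python
def similarity(a, b, categorical, numeric_sd=None, tolerance=2.0):
    """Share of evaluated attributes on which two individuals agree.

    Categorical attributes score 1 if equal, else 0. Numeric attributes
    (assumed rule) score 1 - |difference in SD units| / tolerance, floored at 0.
    """
    numeric_sd = numeric_sd or {}
    scores = [1.0 if a[attr] == b[attr] else 0.0 for attr in categorical]
    for attr, sd in numeric_sd.items():
        distance_in_sd = abs(a[attr] - b[attr]) / sd
        scores.append(max(0.0, 1.0 - distance_in_sd / tolerance))
    return sum(scores) / len(scores)


juan = {"name": "Juan", "sex": "M", "marital_status": "C", "place": "Cap.Fed", "age": 40}
maria = {"name": "Maria", "sex": "F", "marital_status": "C", "place": "GBA", "age": 44}

# The name is an identifier, so it is not evaluated.
print(round(similarity(juan, maria, ["sex", "marital_status", "place"]), 2))  # 0.33
# Adding age, assuming a standard deviation of 8 years:
print(similarity(juan, maria, ["sex", "marital_status", "place"], {"age": 8.0}))  # 0.4375
```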

Condorcet criterion
· A similarity measure whose value ranges between … and …
· Value …: the individuals are placed in the clusters at random
· Value …: all the individuals in each cluster are identical, and there are no individuals with those characteristics outside each cluster
· Usual minimum Condorcet value: …
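The criterion can be sketched with the usual pairwise formulation: similar pairs should share a cluster, and dissimilar pairs should not. Here the score is normalized by the number of pairs. This illustrates the idea only; it is not Intelligent Miner's exact computation. The individuals are hypothetical.

```python
from itertools import combinations


def match_share(a, b):
    """Similarity of two individuals: share of categorical attributes that are equal."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def condorcet(individuals, labels, similarity=match_share):
    """Normalized Condorcet-style score in [0, 1] (assumed formulation).

    A pair in the same cluster contributes its similarity; a pair in
    different clusters contributes its dissimilarity (1 - similarity).
    """
    pairs = list(combinations(range(len(individuals)), 2))
    total = 0.0
    for i, j in pairs:
        s = similarity(individuals[i], individuals[j])
        total += s if labels[i] == labels[j] else 1.0 - s
    return total / len(pairs)


# Hypothetical individuals described by (sex, marital status).
people = [("M", "C"), ("M", "C"), ("F", "S"), ("F", "S")]
print(condorcet(people, [0, 0, 1, 1]))            # 1.0: identical inside, nothing alike outside
print(round(condorcet(people, [0, 1, 0, 1]), 3))  # 0.333: a poor partition
```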

The problem
The task is to segment a credit card's customer database by the customers' spending indicators in order to identify the highest-value segment

The available data
· From the database of the customers' transactions over the last year, the following variables are derived:
  - Card usage frequency: computed as the mean number of days between transactions
  - Average monthly transaction balance (in $)
  - Average amount per transaction
  - Number of services paid by automatic debit
  - Sociodemographic data (sex, age, marital status, occupation, children)
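Here is a plain-Python sketch of how these variables could be derived for one account. The following are assumptions:

- the transaction layout, dates and amounts;
- skipping months without activity when averaging;
- optionally excluding automatic debits from the frequency, one of the open questions raised in data preparation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def derive_features(transactions, auto_debits_count_for_frequency=True):
    """Build one analysis row for an account from its raw transactions.

    Each transaction is (date, amount, is_automatic_debit, service_id).
    """
    txs = sorted(transactions, key=lambda t: t[0])
    freq_dates = [d for d, _, auto, _ in txs if auto_debits_count_for_frequency or not auto]
    gaps = [(later - earlier).days for earlier, later in zip(freq_dates, freq_dates[1:])]
    monthly = defaultdict(float)
    for d, amount, _, _ in txs:
        monthly[(d.year, d.month)] += amount  # months with no activity are not counted here
    return {
        "freq_days": mean(gaps) if gaps else None,  # mean days between transactions
        "avg_monthly_amount": mean(monthly.values()),
        "avg_ticket": mean(amount for _, amount, _, _ in txs),
        "auto_debit_services": len({svc for _, _, auto, svc in txs if auto}),
    }


# Hypothetical transactions for one account.
transactions = [
    (date(2002, 1, 3), 120.0, False, None),
    (date(2002, 1, 10), 45.0, True, "electricity"),
    (date(2002, 1, 24), 300.0, False, None),
    (date(2002, 2, 10), 47.5, True, "electricity"),
    (date(2002, 2, 12), 80.0, True, "phone"),
]
print(derive_features(transactions))
print(derive_features(transactions, auto_debits_count_for_frequency=False)["freq_days"])  # 21
```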

Data preparation
· Define the unit of analysis: account or card?
· Define what a transaction is: e.g., how are adjustments and negative amounts treated?
· Define derived variables: how do automatic debits enter into the frequency?

Data preparation
· Describe the variables to be included in the model in order to:
  - Compute measures of location and dispersion
  - Identify skewed distributions
  - Identify missing values (missings)
  - Identify incorrect or out-of-range values
  - Identify outliers
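The descriptive checks listed above can be sketched with Python's statistics module. The amounts are hypothetical. The moment coefficient used here is one of several measures of skewness. A mean far above the median, as in this data, is itself a quick sign of an asymmetric distribution.

```python
import statistics


def describe(values):
    """Location, dispersion and shape summary of a numeric variable (None = missing)."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)
    # Moment coefficient of skewness: > 0 means a long right tail.
    skewness = sum((v - mean) ** 3 for v in present) / len(present) / sd ** 3
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "median": statistics.median(present),
        "mean": mean,
        "max": max(present),
        "sd": sd,
        "skewness": skewness,
    }


# Hypothetical average ticket amounts: mostly small, one very large, one missing.
tickets = [10, 12, 11, 13, 12, 95, None]
summary = describe(tickets)
print(summary["missing"], summary["median"], summary["mean"], round(summary["skewness"], 2))
```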

Statistics
General descriptives
[Figure: Cluster 0, 100.00% of population: distributions of sexo, edad, estado_civil, ocup, hijos, avgtckt, frecu and pesos]

Segmentation criteria
· The variables describing spending behavior are taken as “active” variables
· The sociodemographic attributes are taken as supplementary variables

Credit Card
[Figure: cluster profiles over sexo, estado_civil, ocup, hijos, frecu, pesos, avgtckt and edad. Cluster 2 (27.21% of population) is annotated: frequent use; self-employed; balance >>>; men; married; 4 or more automatic debits; with children; age 40-45; ticket >>>]

Pareto
[Figure: Pareto chart of % of accounts and % of total balance for Cluster 0, Cluster 1 and Cluster 2]

Decision trees
· Techniques used for prediction and classification
· The result is a set of “rules” that explain the behavior of a TARGET variable in terms of other PREDICTOR variables
· In this example they are used to “explain” the clusters

Algorithms
· CHAID (Chi-squared Automatic Interaction Detection)
· C&RT (Classification and Regression Tree)
· C…, QUEST and others
· Intelligent Miner uses a variant of C&RT
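Below is a sketch of the core step in CART-style trees: choosing the binary split that most reduces Gini impurity. A full tree repeats this step on each branch and then prunes the result. Intelligent Miner's variant is not reproduced here, and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two random picks from `labels` differ in class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Best single threshold split over all numeric predictors.

    Returns (attribute, threshold, weighted impurity of the two children).
    """
    best = (None, None, gini([r[target] for r in rows]))
    for attr in rows[0]:
        if attr == target:
            continue
        values = sorted({r[attr] for r in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[attr] <= threshold]
            right = [r[target] for r in rows if r[attr] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if impurity < best[2]:
                best = (attr, threshold, impurity)
    return best


# Hypothetical accounts: predictors and the cluster they were assigned to (TARGET).
rows = [
    {"debits": 5, "balance": 900.0, "cluster": 2},
    {"debits": 4, "balance": 800.0, "cluster": 2},
    {"debits": 6, "balance": 300.0, "cluster": 2},
    {"debits": 1, "balance": 950.0, "cluster": 0},
    {"debits": 2, "balance": 200.0, "cluster": 0},
    {"debits": 0, "balance": 100.0, "cluster": 1},
]
attr, threshold, impurity = best_split(rows, "cluster")
print(attr, threshold, round(impurity, 3))  # debits 3.0 0.222
```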

Behavior tree
If a customer has 4 or more automatic debits and a balance > $727, then his or her probability of belonging to cluster 2 is 99%

Sociodemographic tree

Market Basket Analysis
· The problem: find the association rules that govern orders of extra “toppings” at a pizzeria by analyzing a set of sales tickets

The Data Mining table
· Ticket ID
· Product code:
  - Mushrooms
  - Pepperoni
  - Cheese
  - Beer
  - Soda
  - Other drink

Purpose of MBA
· Generate rules of the form:
  - IF condition THEN result
· Example:
  - IF product A and product C THEN product B

Types of rules
· Useful (actionable): rules containing good-quality information that can be translated into business actions
· Trivial: rules already known to the business because they occur frequently
· Inexplicable: arbitrary curiosities with no practical application

MBA problems
· Having many items in the analysis set makes computation time grow exponentially
· Criteria must be defined to select the best rules
· Building a product taxonomy is important

How good is a rule?
· Measures that rate a rule:
  - Support
  - Confidence
  - Lift (Improvement)

Support
· The number of transactions in which the rule is found, usually expressed as a share of all transactions
  - Example: “If A then B” is present in 4000 of 10000 transactions.
  - Support (A --> B) = 4000 / 10000 = 40%

Confidence
· The number of transactions that contain the whole rule, relative to the number of transactions that contain the condition (the IF clause)
  - Example: in the previous case, suppose A is present in 6000 transactions (60%)
  - Confidence (A --> B) = 40% / 60% ≈ 66.7%

Improvement (lift)
· The predictive power of the rule
  - Improvement = p(A and B) / (p(A) × p(B))
  - Example: p(A and B) = 40%; p(A) = 60%; p(B) = 30%
    Improvement (A --> B) = 40% / (60% × 30%) = 2.22
  - Greater than 1: the rule has predictive value

Calculation example

Basic data
Mushrooms   Pepperoni   Cheese   Count
Yes         Yes         Yes        100
Yes         Yes         No         400
Yes         No          Yes        300
Yes         No          No         100
No          Yes         Yes        200
No          Yes         No         150
No          No          Yes        200
No          No          No         550
TOTAL                             2000

Rules
Rule                                Count   Support   Confidence   Lift
Mushrooms                             900      0.45
Pepperoni                             850      0.43
Cheese                                800      0.40
Mushrooms --> Pepperoni               500      0.25         0.56   1.31
Mushrooms --> Cheese                  400      0.20         0.44   1.11
Cheese --> Pepperoni                  300      0.15         0.38   0.88
Mushrooms + Pepperoni --> Cheese      100      0.05         0.20   0.50
Mushrooms + Cheese --> Pepperoni      100      0.05         0.25   0.59
Cheese + Pepperoni --> Mushrooms      100      0.05         0.33   0.74
· The three-item rules (support 0.05) can be discarded for low support
· Significant rules: Mushrooms --> Pepperoni and Mushrooms --> Cheese (lift above 1)
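As a check, the support, confidence and lift columns can be recomputed directly from the basic data. The sketch below encodes the counts from the basic-data table, with the item names translated. For Mushrooms --> Cheese it gives a confidence of 400/900 ≈ 0.44 and a lift of about 1.11. For Mushrooms + Pepperoni --> Cheese it gives a lift of 0.50.

```python
# Basic data: (mushrooms, pepperoni, cheese) flags -> number of tickets.
COUNTS = {
    (1, 1, 1): 100, (1, 1, 0): 400, (1, 0, 1): 300, (1, 0, 0): 100,
    (0, 1, 1): 200, (0, 1, 0): 150, (0, 0, 1): 200, (0, 0, 0): 550,
}
ITEMS = ("Mushrooms", "Pepperoni", "Cheese")
TOTAL = sum(COUNTS.values())  # 2000 tickets


def count(itemset):
    """Number of tickets that contain every item in `itemset`."""
    idx = [ITEMS.index(item) for item in itemset]
    return sum(n for flags, n in COUNTS.items() if all(flags[i] for i in idx))


def rule_stats(body, head):
    """Support, confidence and lift of the rule body --> head."""
    both = count(body + head)
    support = both / TOTAL
    confidence = both / count(body)
    lift = confidence / (count(head) / TOTAL)
    return support, confidence, lift


RULES = [
    (("Mushrooms",), ("Pepperoni",)),
    (("Mushrooms",), ("Cheese",)),
    (("Cheese",), ("Pepperoni",)),
    (("Mushrooms", "Pepperoni"), ("Cheese",)),
    (("Mushrooms", "Cheese"), ("Pepperoni",)),
    (("Cheese", "Pepperoni"), ("Mushrooms",)),
]
for body, head in RULES:
    sup, conf, lft = rule_stats(body, head)
    print(f"{' + '.join(body)} --> {head[0]}: support {sup:.2f}, confidence {conf:.2f}, lift {lft:.2f}")
```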

Another MBA example
· The association is sought between pizza toppings and drinks
· Rule graphs make it possible to identify visually the rules with good support, confidence and lift

Rules
Support (%)  Confidence (%)  Type  Lift    Rule body                          ==>  Rule head
 3.1746      80.0000         +     1.7800  [Mushrooms]+[Other drink]          ==>  [Pepperoni]
16.6667      81.8200         +     1.7200  [Beer]+[Pepperoni]                 ==>  [Mushrooms]
13.0688      78.4100         +     1.6500  [Beer]+[Cheese]                    ==>  [Mushrooms]
16.6667      63.0000         .     1.5400  [Mushrooms]+[Pepperoni]            ==>  [Beer]
29.8413      72.8700         +     1.5300  [Beer]                             ==>  [Mushrooms]
29.8413      62.6700         +     1.5300  [Mushrooms]                        ==>  [Beer]
13.0688      61.7500         .     1.5100  [Mushrooms]+[Cheese]               ==>  [Beer]
 9.0476      57.0000         +     1.4000  [Pepperoni]+[Cheese]               ==>  [Soda]
 3.0159      57.0000         .     1.3900  [Mushrooms]+[Pepperoni]+[Cheese]   ==>  [Beer]
 6.9312      56.9600         .     1.3500  [Mushrooms]+[Soda]                 ==>  [Cheese]
 9.0476      56.4400         .     1.3300  [Soda]+[Pepperoni]                 ==>  [Cheese]

Web Mining
· The problem: analyze the transactions and the user profiles of an online retailer's Web site

Models applied
· Association of visited pages (product basket)
· User profiles (demographic clustering)
· Potential buyers (decision tree)

Page association
[Figure: page association rules]

Clustering result: Business cluster
· Low COMMUNICATION
· High revenue
· Most are male
· High AGE
· Low FUN
· High rate in REGION 6 = Frankfurt

Clustering result: Fun cluster
· 10% of all users
· Most are female
· High COMMUNICATION
· Low revenue
· High rate in REGION 5 = Cologne
· High FUN
· Low AGE

Classification result
IF the interest in INFORMATION is very low (nearly 0) AND the interest in COMMUNICATION is high (an access rate of at least 5) THEN the visitor will probably not buy (95.5%).

Click sequence
· In 17.2% of all transactions, the user goes to GOURMET.html and then sends two e-mails.
· In 56.9% of all transactions, the user goes first to SPORTS.html, then uses the chat as a communication medium, and finally turns his attention to Fashion.
· In 25.9% of all transactions, the user goes first to womens-fashion.html, then sends a postcard, and goes back to womens-fashion.html.

Early detection of delinquency
· The problem: identify in advance the customers most likely to fall into arrears, so that preventive collection and recovery actions can be taken earlier

Possible solutions
· Rules to identify the customer segments most prone to delinquency
· Delinquency risk scoring

Applicable models
· For the rules: classification tree
· For the scoring: neural model

Tree on a 50/50 sample

Delinquent customers
[Figure: 60-day delinquency (MORA 60), region 90-98, 9.39% of population; variables shown: VIP CUSTOMER, LATE FEES PAID, 30 DAYS, OVER CREDIT LIMIT, CREDIT SCORE, CUSTOMER AGE, CREDIT LIMIT, INCOME, MEMBER (MONTHS), # PURCHASES / WEEK, CASH LIMIT]

Non-delinquent customers
[Figure: 60-day delinquency (MORA 60), region 0-2, 6.81% of population; variables shown: LATE FEES PAID, 30 DAYS, VIP CUSTOMER, OVER CREDIT LIMIT, CUSTOMER AGE, CREDIT SCORE, MEMBER (MONTHS), INCOME, CASH LIMIT, CREDIT LIMIT, # PURCHASES / WEEK]

Verification
[Figure: predicted score (box plots) by actual delinquency: NO (N = 2947), YES (N = 258)]
The score predicted by the network is clearly different for delinquent customers and for those who pay.
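One way to put a number on "clearly different" is the area under the ROC curve (AUC), computed below from its pairwise definition. AUC is not part of the original analysis. The scores are hypothetical. With N = 2947 and N = 258, the quadratic loop still needs only about 760,000 comparisons.

```python
def auc(scores_delinquent, scores_paying):
    """Probability that a random delinquent customer scores higher than a random payer.

    Ties count one half. This equals the area under the ROC curve (AUC):
    0.5 means no separation, 1.0 means perfect separation.
    """
    wins = 0.0
    for p in scores_delinquent:
        for q in scores_paying:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_delinquent) * len(scores_paying))


# Hypothetical predicted scores for a handful of customers.
delinquent = [0.9, 0.8, 0.4]
paying = [0.1, 0.3, 0.4, 0.2]
print(round(auc(delinquent, paying), 3))  # 0.958
```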

References
· Data Mining Techniques for Marketing, Sales and Customer Support. Michael Berry, Gordon Linoff. Wiley, USA.
· Data Mining with Neural Networks. Joseph Bigus. McGraw-Hill, USA.
· Data Mining: A Hands-On Approach for Business Professionals. Robert Groth. Prentice Hall, USA.
· Mastering Data Mining. Michael Berry, Gordon Linoff. Wiley, USA.
· Data Preparation for Data Mining. Dorian Pyle. Morgan Kaufmann Publishers Inc., San Francisco, USA.
· Análisis Multivariante. Hair, Anderson, Tatham, Black. Prentice Hall, Madrid.
· Building Data Mining Applications for CRM. A. Berson, S. Smith, K. Thearling. McGraw-Hill.
· IBM
  - www.ibm.com/software/data/iminer/fordata
  - www.dmg.org
  - www.ibm.com/redbooks
· The Data Mine: www.the-data-mine.com
· KDD Mine: www.kdnuggets.com
· chb@census.com.ar