The 10th Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD 2006) is
pleased to host a data mining competition, co-organized by the
Singapore Institute of Statistics (SIS) and the Pattern Recognition
& Machine Intelligence Association (PREMIA) of Singapore.
An Asian telco operator that has
successfully launched a third-generation (3G) mobile
telecommunications network would like to use existing
customer usage and demographic data to identify which customers are
likely to switch to its 3G network.
An original sample dataset of 20,000
2G network customers and 4,000 3G network customers has been provided
with more than 200 data fields.
The target categorical variable is “Customer_Type” (2G/3G). A
3G customer is defined as a customer who has a 3G Subscriber Identity
Module (SIM) card and is currently using a 3G network-compatible
mobile phone.
Three-quarters of the dataset (15,000
2G and 3,000 3G customers) will have the target field available and is
meant to be used for training/testing. The remaining quarter (5,000 2G
and 1,000 3G customers) will be made available with the target field
missing and is meant to be used for prediction.
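The arithmetic behind this split can be checked with a short sketch. It assumes the split keeps the same 2G/3G ratio in both portions, which the stated counts imply but the announcement does not spell out. The names `TOTAL`, `TRAIN_FRACTION` and `split_counts` are illustrative, not part of the competition materials.

```python
# Class counts taken from the announcement.
TOTAL = {"2G": 20_000, "3G": 4_000}
TRAIN_FRACTION = 0.75  # "three-quarters" of the dataset keeps the target field


def split_counts(total, train_fraction):
    """Return (labelled, holdout) per-class counts.

    Assumption: the same fraction is applied to each class.
    """
    labelled = {k: int(round(n * train_fraction)) for k, n in total.items()}
    holdout = {k: total[k] - labelled[k] for k in total}
    return labelled, holdout


labelled, holdout = split_counts(TOTAL, TRAIN_FRACTION)
print(labelled)  # {'2G': 15000, '3G': 3000}
print(holdout)   # {'2G': 5000, '3G': 1000}
```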
The data mining task is a
classification problem for which the objective is to accurately
predict as many current 3G customers as possible (i.e. true
positives) from the “holdout” sample provided.
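As a rough illustration of this objective, the sketch below counts true positives for the 3G class. The announcement does not describe the actual scoring procedure, so the function names and the toy labels are made up for illustration only and are not competition data.

```python
def count_true_positives(actual, predicted, positive="3G"):
    """Count records that are actually `positive` and were predicted as `positive`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)


def true_positive_rate(actual, predicted, positive="3G"):
    """Share of actual `positive` records that were predicted as `positive`."""
    positives = sum(1 for a in actual if a == positive)
    if positives == 0:
        return 0.0
    return count_true_positives(actual, predicted, positive) / positives


# Toy labels, invented for this example only.
actual = ["3G", "2G", "3G", "2G", "3G"]
predicted = ["3G", "3G", "2G", "2G", "3G"]

print(count_true_positives(actual, predicted))  # 2
print(true_positive_rate(actual, predicted))
```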