This is a place for the instructor and students to record difficulties encountered during the course.

Resources

Tasks

  • pipibjc is going to write a matlab2libsvm.m
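For reference, the LIBSVM text format that such a converter has to produce is one example per line, "label index:value ...", with 1-based feature indices and zero-valued features left out. Below is a minimal sketch of that conversion, written in Python rather than MATLAB; the function names and the toy data are made up for illustration and are not pipibjc's matlab2libsvm.m.

```python
def to_libsvm_line(label, features):
    """Format one example as a LIBSVM sparse line: 'label index:value ...'.

    Indices are 1-based and zero-valued features are omitted.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def to_libsvm(labels, rows):
    """Convert a dense matrix (list of rows) plus labels to LIBSVM text."""
    if len(labels) != len(rows):
        raise ValueError("labels and rows must have the same length")
    return "\n".join(to_libsvm_line(y, x) for y, x in zip(labels, rows)) + "\n"


if __name__ == "__main__":
    # Toy data, not taken from the course datasets.
    print(to_libsvm([1, -1], [[0.5, 0, 2], [0, 0, 1]]), end="")
```

Skipping the zeros is what keeps the file small for sparse datasets.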

Progress

Week 13

Week 12

  • One group starts to study others' approaches

Week 11

  • Ma submits the overall average

I have submitted the overall average (3.60429, similar to the 3.605 in the subset) to Netflix. The leaderboard will only show results that outperform Cinematch (RMSE < 0.9514), but the RMSE on the quiz subset will be sent by email.

The RMSE of the overall-average prediction is 1.1309.
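A minimal sketch of how such a constant baseline is scored: predict the training mean for every test rating and compute the RMSE. The function names and the toy ratings are illustrative only; this is not the rmse.pl script and not the Netflix data.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def constant_baseline(train_ratings, test_ratings):
    """Predict the training mean everywhere; return (mean, RMSE on test)."""
    mean = sum(train_ratings) / len(train_ratings)
    return mean, rmse([mean] * len(test_ratings), test_ratings)


if __name__ == "__main__":
    # Toy ratings on a 1-5 scale (made up).
    mean, score = constant_baseline([1, 2, 3, 4, 5], [3, 5])
    print(mean, score)
```

Any real model should beat this number, so it is a useful sanity check on the submission pipeline.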

Week 10

  • zao studies KDD Cup 2007 rules

Week 9

  • pipibjc studies how to use rmse.pl
  • two groups study the two proposed approaches

Discussions

  • zao's group proposes k-means to group users, then tries NMF
    • take movie quality into consideration
  • ma's group proposes the following
    • movie classification to generate missing values in the matrix, so we can calculate the distance measure
    • use people's preferences to refine movie classification results (active learning) (optional)
    • kNN
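A rough sketch of the ma group's idea: fill the missing entries, then run kNN over users. The fill rule here is an assumption, not something the group specified: a missing rating becomes the user's mean rating over movies of the same class, falling back to the user's overall mean. All names and the data layout are hypothetical.

```python
import math


def fill_missing(ratings, movie_class, movies):
    """Return a dense rating row per user.

    Assumed fill rule: a missing rating is replaced by the user's mean rating
    within the movie's class, or by the user's overall mean if the user rated
    nothing in that class. Every user must have at least one rating.
    """
    filled = {}
    for user, rated in ratings.items():
        overall = sum(rated.values()) / len(rated)
        by_class = {}
        for movie, r in rated.items():
            by_class.setdefault(movie_class[movie], []).append(r)
        row = []
        for movie in movies:
            if movie in rated:
                row.append(rated[movie])
            else:
                same = by_class.get(movie_class[movie])
                row.append(sum(same) / len(same) if same else overall)
        filled[user] = row
    return filled


def knn_predict(ratings, movie_class, movies, user, movie, k):
    """Average the ratings of the k nearest users who actually rated `movie`."""
    filled = fill_missing(ratings, movie_class, movies)
    candidates = [u for u in ratings if u != user and movie in ratings[u]]
    # Euclidean distance on the filled rows.
    candidates.sort(key=lambda u: math.dist(filled[user], filled[u]))
    nearest = candidates[:k]
    if not nearest:
        return None
    return sum(ratings[u][movie] for u in nearest) / len(nearest)
```

Usage on made-up data: with ratings {"a": {"m1": 5, "m2": 4}, "b": {"m1": 5, "m2": 4, "m3": 2}, "c": {"m1": 1, "m2": 2, "m3": 5}} and k=1, user a's prediction for m3 comes from user b, the closest user in the filled matrix.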

Discussion of Feature Selection Competition

  • DOROTHEA: setting weighted penalty term helps

Our Results on Feature Selection Competition

Result Matrix

student  ARCENE  DEXTER  DOROTHEA  GISETTE  MADELON
ma       9.45    3.65    10.51     1.32     6.28
oyster   13.16   4.1     8.82      1.32     6.72
pipibjc  10.22   3.80    12.05     1.43     6.83
zao      10.12   3.65    12.04     1.09     6.89

Approach

  • ARCENE
student approach
ma my_svc=svc({'coef0=6.069413', 'degree=3', 'gamma=0', 'shrinkage=0.1'}); my_model=chain({relief('f_max=1378'),normalize,my_svc});
oyster my_svc=svc({'coef0=2', 'degree=0', 'gamma=1.0', 'shrinkage=0.1'}); my_model=chain({normalize, pc_extract('f_max=130'), my_svc});
pipibjc my_svc=svc({'coef0=2','degree=1','gamma=3','shrinkage=0.1'}); relief_chain=chain({normalize,relief({'f_max=1400'})}); s2n_chain=chain({standardize,s2n({'f_max=900'})}); feat_sel=combine_feat({relief_chain,s2n_chain}); my_model=chain({feat_sel,normalize,my_svc});
zao my_svc=svc({'coef0=2', 'degree=3', 'gamma=0', 'shrinkage=0.1'}); my_model=chain({relief('f_max=1600'),normalize,relief('f_max=1400'),normalize,my_svc});
  • DEXTER
student approach
ma my_svc=svc({'coef0=1.702212', 'degree=2', 'gamma=0', 'shrinkage=0.5'}); my_model=chain({s2n('f_max=2185'),normalize,my_svc});
oyster my_svc=svc({'coef0=0', 'degree=0', 'gamma=0.9', 'shrinkage=0.1'}); my_model=chain({relief('f_max=3425'), normalize, my_svc });
pipibjc my_svc=svc({'coef0=1','degree=1','gamma=0','shrinkage=0.5'}); relief_chain=relief({'f_max=2500'}); s2n_chain=s2n({'f_max=4500'}); feat_sel=combine_feat({relief_chain,s2n_chain}); my_model=chain({normalize,feat_sel,my_svc});
zao my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'}); my_model=chain({s2n('f_max=6000'), normalize, s2n('f_max=4500'), normalize, s2n('f_max=2500'), normalize, my_classif});
  • DOROTHEA
student approach
ma my_svc=svc({'coef0=0', 'degree=0', 'gamma=4.2', 'shrinkage=0.05', 'C=1'}); my_model=chain({TP('f_max=15000'), normalize, relief({'f_max=715','k_num=28'}),my_svc, bias});
oyster my_classif=svc({'coef0=1', 'degree=0', 'gamma=2.6', 'C=0.25', 'w1=10'}); my_model=chain({TP('f_max=15000'), normalize, relief('f_max=700'), my_classif, bias});
pipibjc s2n_chain=chain({normalize,s2n({'f_max=1500'})}); my_model=chain({s2n_chain,naive,bias});
zao my_model=chain({normalize,TP('f_max=15000'),normalize,relief('f_max=700'),naive, bias});
  • GISETTE
student approach
ma my_svc=svc({'coef0=0', 'degree=0', 'gamma=0.87', 'shrinkage=0.7'}); my_model=chain({s2n('f_max=1000'), pc_extract('f_max=50'), normalize, my_svc});
oyster my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'}); my_model=chain({ relief('f_max=1940'), normalize, relief('f_max=1000'), normalize, my_classif});
pipibjc my_model=chain({relief('f_max=1000'),normalize,knn({'k=5'})});
zao my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'}); my_model=chain({relief('f_max=2000'), s2n('f_max=1000'), normalize, my_classif});
  • MADELON
student approach
ma my_svc=svc({'coef0=0', 'degree=0', 'gamma=0.2091', 'shrinkage=0.5741'}); my_model=chain({relief({'f_max=20'}), standardize, my_svc});
oyster my_classif=svc({'coef0=1', 'degree=0', 'gamma=0.3', 'shrinkage=0.5'}); my_model=chain({relief('f_max=210'), normalize, relief('f_max=20'), standardize, my_classif});
pipibjc my_svc=svc({'coef0=1','degree=2','gamma=0.3','shrinkage=0.3'}); relief_chain=chain({normalize,relief({'f_max=20'})});
zao my_classif=svc({'C=1', 'gamma=0.35', 'degree=0', 'shrinkage=1'}); my_model=chain({relief('f_max=100'), normalize, relief('f_max=20'), standardize, my_classif});

Week 8

Problems

  • oyster claims using only training data is better (first two sets)
  • ma claims that for DOROTHEA, you need bias + TP
  • For DOROTHEA, only the training data is used

Week 7

Tasks

  • refine your submission
  • submit the remaining three datasets

Problems

  • Why does the RBF kernel tend to predict the majority class when gamma goes to infinity?
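One way to look at the question above, as a sketch under stated assumptions (hard-margin SVM, distinct training points, labels ±1): with K(x, y) = exp(-gamma * ||x - y||^2), every kernel value between different points goes to 0 as gamma grows, so the Gram matrix approaches the identity. Solving the dual in that limit gives alpha_i = 1 - y_i * b with b = mean(y), and for any test point that is not exactly a training point the decision value collapses to b, whose sign is that of the majority class. The helper names and toy points are made up.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * squared Euclidean distance)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def limiting_bias(labels):
    """Bias of a hard-margin SVM when the Gram matrix is the identity.

    In that idealised limit b equals the mean of the +1/-1 labels.
    """
    return sum(labels) / len(labels)


def decision_value(alphas, labels, train, x, gamma, b):
    """SVM decision function: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * rbf(t, x, gamma)
               for a, y, t in zip(alphas, labels, train)) + b


if __name__ == "__main__":
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    labels = [1, 1, -1]                     # positives are the majority
    b = limiting_bias(labels)
    alphas = [1 - y * b for y in labels]    # only valid in the large-gamma limit
    x = (0.1, 0.9)                          # very close to the negative point
    for gamma in (1, 10, 1000):
        print(gamma, rbf(train[2], x, gamma))   # kernel to nearest neighbour
    print(decision_value(alphas, labels, train, x, 1000, b))
```

Even though the test point sits next to the only negative example, the decision value at gamma = 1000 is essentially b, which is positive here.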

Week 5

Tasks

  • randomNN
  • What exactly does probe do?
  • Each one submits one prediction for arcene+dexter

Problems

  • RandomForest is Windows only
  • gridsel issues

Week 4

Tasks

  • Why can we use full trees in a random forest?
  • Discuss your strategy

Problems

Week 3

Tasks

  • study five preprocessors (zao), five additional feature selection methods (oyster), five additional feature selection methods (Ma, pipibjc)
  • how they come up with the best entry, for example, whether higher validation BER ⇒ higher test BER
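As a reminder of what BER measures: the balanced error rate is the average of the error rate on the positive class and the error rate on the negative class. A minimal sketch, assuming labels are coded as +1/-1; the function name and the toy labels are illustrative only.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the per-class error rates for +1/-1 labels."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    pos_err = sum(p != 1 for p in pos) / len(pos)
    neg_err = sum(p != -1 for p in neg) / len(neg)
    return 0.5 * (pos_err + neg_err)


if __name__ == "__main__":
    # Predicting the majority class everywhere: 80% accuracy, but BER = 0.5.
    y_true = [1, 1, 1, 1, -1]
    y_pred = [1, 1, 1, 1, 1]
    print(balanced_error_rate(y_true, y_pred))
```

This is why a classifier that always outputs the majority class scores no better than chance under BER, which matters for unbalanced sets.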

Problems

  • missing range.m (from Statistics toolbox)
function y = range(x)
  % Stand-in for range() from the Statistics toolbox: max minus min.
  y = max(x)-min(x);
  • We need DataNCode.zip from the link on their page
  • Some BER values in Table 3 differ from Score/Simple-22-Apr-2006.score (in DataNCode.zip), for example, for DEXTER or GISETTE.

Week 2

Tasks

  • able to reproduce their baseline and best entries

Problems

--- Makefile_orig.orig  2005-05-12 20:28:32.000000000 +0800
+++ Makefile_orig       2007-03-03 20:07:22.000000000 +0800
@@ -1,11 +1,12 @@
 # This Makefile is used under Linux

 MATLABDIR ?= /usr/local/matlab
-CXX = g++
+CXX = g++-3.3
 CFLAGS = -Wall -O3 -fPIC -I$(MATLABDIR)/extern/include

 MEX = $(MATLABDIR)/bin/mex
 MEX_OPTION = CC\#$(CXX) CFLAGS\#"-Wall -O3 -fPIC"
+MEX_OPTION += -largeArrayDims

 all:    svmpredict svmtrain
--- svm_model_matlab.c.orig     2005-04-25 19:54:28.000000000 +0800
+++ svm_model_matlab.c  2007-03-03 20:06:44.000000000 +0800
@@ -4,6 +4,10 @@

 #include "mex.h"

+#if MX_API_VER < 0x07030000
+typedef int mwIndex;
+#endif
+
 #define NUM_OF_RETURN_FIELD 10

 static const char *field_names[] = {
@@ -112,7 +116,8 @@

        // SVs
        {
-               int *ir, *jc, ir_index, nonzero_element;
+               int ir_index, nonzero_element;
+               mwIndex *ir, *jc;
                mxArray *pprhs[1], *pplhs[1];

                nonzero_element = 0;
@@ -256,7 +261,7 @@
        {
                int sr, sc, elements;
                int num_samples;
-               int *ir, *jc;
+               mwIndex *ir, *jc;
                mxArray *pprhs[1], *pplhs[1];

                // transpose SV

> diff -u svmtrain.c.orig svmtrain.c

--- svmtrain.c.orig     2005-09-23 13:56:22.000000000 +0800
+++ svmtrain.c  2007-03-03 20:07:47.000000000 +0800
@@ -7,6 +7,10 @@
 #include "mex.h"
 #include "svm_model_matlab.h"

+#if MX_API_VER < 0x07030000
+typedef int mwIndex;
+#endif
+
 #define CMD_LEN 2048
 #define Malloc(type,n) (type *)malloc((n)*sizeof(type))

@@ -258,7 +262,7 @@
 void read_problem_sparse(const mxArray *label_vec, const mxArray *instance_mat)
 {
     int i, j, k, low, high;
-    int *ir, *jc;
+    mwIndex *ir, *jc;
     int elements, max_index, num_samples;
     double *samples, *labels;
     mxArray *instance_mat_tr; // transposed instance sparse matrix
  • sample_code/main.m: remove all calls to upper()
 
dmcase2007.txt · Last modified: 2007/05/24 10:17 (external edit)