The Gene Dataset

The Gene problem concerns the classification of nucleotide sequences of 60 elements each. Each nucleotide is represented by a four-valued nominal attribute coded with 2 bits, so every input vector has 120 bits in total. The classification task is to decide whether the middle of the sequence is an intron/exon boundary, an exon/intron boundary, or neither. There are 3 outputs, one for each class.
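The input coding described above can be sketched as follows. This is only an illustration: the text does not say which 2-bit pattern belongs to which nucleotide, so the `CODE` table and the function name `encode_sequence` are assumptions.

```python
# Hypothetical 2-bit code per nucleotide. The actual bit assignment used in
# the dataset is not stated in the text, so this table is an assumption.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode_sequence(seq):
    """Map a string of 60 nucleotides to a flat list of 120 bits."""
    if len(seq) != 60:
        raise ValueError("expected 60 nucleotides, got %d" % len(seq))
    bits = []
    for base in seq:
        bits.extend(CODE[base])  # 2 bits per nucleotide
    return bits


example = encode_sequence("ACGT" * 15)
print(len(example))  # 120
```

With this layout, bits 2i and 2i+1 of the input vector belong to the nucleotide at position i, so the boundary in question lies between input bits 59 and 60.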

We use a fully interconnected MFN (including shortcuts) with 10 hidden units. The topology of the net can be described as 120-10-3: 120 units in the input layer, 10 units in the hidden layer, and 3 units in the output layer.
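A minimal sketch of a forward pass through such a 120-10-3 net with shortcut connections (direct input-to-output weights) may help to make the topology concrete. The logistic activation function, the bias terms, and all names used here are assumptions; the text specifies only the layer sizes and the presence of shortcuts.

```python
import math

N_IN, N_HID, N_OUT = 120, 10, 3


def sigmoid(x):
    # Logistic activation; assumed here, not stated in the text.
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ih, b_h, w_ho, w_io, b_o):
    """Forward pass.

    x:    120 input values
    w_ih: 10 x 120 input-to-hidden weights
    w_ho: 3 x 10 hidden-to-output weights
    w_io: 3 x 120 shortcut (input-to-output) weights
    b_h, b_o: bias terms (assumed)
    """
    hidden = [
        sigmoid(b_h[j] + sum(w_ih[j][i] * x[i] for i in range(N_IN)))
        for j in range(N_HID)
    ]
    outputs = []
    for k in range(N_OUT):
        net = b_o[k]
        net += sum(w_ho[k][j] * hidden[j] for j in range(N_HID))
        net += sum(w_io[k][i] * x[i] for i in range(N_IN))  # shortcut part
        outputs.append(sigmoid(net))
    return outputs


def count_weights():
    # Connection weights only, biases not counted.
    return N_IN * N_HID + N_HID * N_OUT + N_IN * N_OUT


print(count_weights())  # 1590
```

The shortcuts add 360 weights to the 1230 of a plain layered 120-10-3 net.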

The training set consists of 1588 patterns and the testing set of 1587 patterns.

For each training algorithm we performed 10 runs. Before training, the network weights were initialized with random numbers drawn from a normal distribution with zero mean and a standard deviation of 0.01. Thus, each algorithm had equal starting conditions.
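The initialization could look like the sketch below. It draws every weight from a normal distribution with mean 0 and standard deviation 0.01, as described. Deriving the 10 starting points from fixed seeds, so that every algorithm could be given the same initial weights, is one possible reading and an assumption, as are the function and key names.

```python
import random


def init_weights(n_rows, n_cols, rng, std=0.01):
    """Matrix of weights drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(n_cols)] for _ in range(n_rows)]


def init_network(seed):
    """Initial weights for a 120-10-3 net with shortcuts (hypothetical layout)."""
    rng = random.Random(seed)  # fixed seed makes the starting point reproducible
    return {
        "w_ih": init_weights(10, 120, rng),  # input -> hidden
        "w_ho": init_weights(3, 10, rng),    # hidden -> output
        "w_io": init_weights(3, 120, rng),   # input -> output shortcuts
    }


# One starting point per run. Reusing seeds 0..9 for every training
# algorithm is an assumption about how equal conditions could be arranged.
initial_nets = [init_network(seed) for seed in range(10)]
```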

We divide our competition into two parts according to the training scheme.



Authors: Merten Joost and Wolfram Schiffmann

last change: 20.3.97
