Method used for the "self-paced 1s" task of the 2003 BCI competition

Radford Neal, 28 April 2003

My approach was to use Bayesian logistic regression on a feature set chosen by exploratory data analysis, with some prior information incorporated. I used the 1000 Hz data, sometimes with a low-pass filter applied (cutting off at about 40 Hz, which eliminates the 50 Hz line contamination), and sometimes with a band-pass filter applied (about 4 Hz to 40 Hz).

On the basis of exploratory data analysis, looking primarily at correlations with the class labels, I produced 188 features that appeared as if they could be useful in predicting the class. Most of these features consisted of "reduced" numbers associated with each of the 28 channels, though sometimes some channels were omitted. The features fall into three groups. Time-domain features consisted of the average of the last 100 ms of each channel (note that 100 ms is a multiple of the 20 ms period of the line contamination), the average of the first 100 ms, and the slope of a least-squares line fit to the low-pass filtered data. As frequency-domain features, I used the estimated spectral density at 20 Hz for each channel. Finally, I used several forms of correlational features, which looked at the correlations between the band-pass filtered data in different channels. One such feature set consisted of the squares of the components of the normalized eigenvector associated with the largest eigenvalue of the correlation matrix. Another feature set was the correlations of all except the "z" channels with the Fz channel. Finally, I looked at the correlations of the FC and C channels (other than C3 and C4 themselves) with C3 and C4.

These features were transformed by taking sums and differences of opposite channels (data for the midline "z" channels was left unchanged). This does not change the information present, but it allows more meaningful prior information to be included.

This prior information took the form of varying scaling for the features. All were shifted to have mean zero, and then scaled to have a standard deviation related to how relevant I thought they were likely to be. I made differences more relevant than sums and "z" data, the "F" and "O" channels less relevant than the "FC", "C", and "CP" channels, and C3/C4 most relevant of all. The exact scaling factors were chosen partly on the basis of validation runs (with 216 training and 100 validation trials), since a fully Bayesian approach with a complex hierarchical prior would have required considerable programming to implement.

The final model was simple Bayesian logistic regression on these 188 features, with a single hyperparameter controlling the variance of the regression coefficients, the same for all features (though the scaling above produces the effect of different variances for different features). Computation was done by Markov chain Monte Carlo. Other, more complex models were tried, but were not found to be better on the validation set, nor in terms of their internal predictions of accuracy.

Based on the predictive probabilities produced by the model, my best guess is that I will achieve 87% accuracy on the test set. However, with only 100 test cases, luck will play a big role. I would not be surprised if the accuracy were as low as 81% or as high as 91%.
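
To make the feature computations above concrete, here is a minimal sketch in Python/NumPy of three of the feature types, assuming each trial arrives as an array of shape (n_samples, n_channels) sampled at 1000 Hz. All function names, the filter order, and the Welch segment length are illustrative choices, not taken from the actual analysis.

    import numpy as np
    from scipy.signal import butter, filtfilt, welch

    FS = 1000  # sampling rate in Hz

    def bandpass(x, lo=4.0, hi=40.0, fs=FS, order=4):
        # Zero-phase band-pass filter (about 4-40 Hz), applied along time.
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x, axis=0)

    def time_domain_features(trial, fs=FS):
        # Mean of the last and first 100 ms of each channel. 100 ms is a
        # multiple of the 20 ms period of the 50 Hz line noise, so the
        # line contamination averages out.
        n = int(0.100 * fs)
        return np.concatenate([trial[-n:].mean(axis=0),
                               trial[:n].mean(axis=0)])

    def slope_features(trial_lowpassed, fs=FS):
        # Slope of a least-squares line fit to each (low-pass filtered)
        # channel; polyfit fits every column at once when y is 2-D.
        t = np.arange(trial_lowpassed.shape[0]) / fs
        return np.polyfit(t, trial_lowpassed, deg=1)[0]

    def spectral_feature_20hz(trial, fs=FS):
        # Estimated spectral density at (nearest bin to) 20 Hz per channel.
        f, pxx = welch(trial, fs=fs, nperseg=256, axis=0)
        return pxx[np.argmin(np.abs(f - 20.0))]

    def eigvec_feature(trial_bandpassed):
        # Squares of the components of the unit eigenvector belonging to
        # the largest eigenvalue of the between-channel correlation matrix.
        corr = np.corrcoef(trial_bandpassed, rowvar=False)
        w, v = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
        return v[:, -1] ** 2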
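
The sum/difference transformation and the relevance scaling can be sketched in the same way. The channel index pairs and per-feature relevance factors would be supplied by hand; nothing here reproduces the actual scale factors used.

    import numpy as np

    def sum_diff(feat, left_idx, right_idx):
        # Replace each left/right channel pair by its sum and difference;
        # midline "z" channels are simply not listed in the index arrays.
        # This is an invertible linear map, so no information is lost.
        out = feat.copy()
        out[:, left_idx] = feat[:, left_idx] + feat[:, right_idx]
        out[:, right_idx] = feat[:, left_idx] - feat[:, right_idx]
        return out

    def scale_features(X_train, X, relevance):
        # Centre each column using training-set means, then scale so its
        # standard deviation equals the hand-chosen relevance factor.
        mu = X_train.mean(axis=0)
        sd = X_train.std(axis=0)
        return (X - mu) / sd * relevance

Because the prior on the regression coefficients is the same for every feature, rescaling a column by a factor is equivalent to rescaling the prior standard deviation of its coefficient by that factor, which is how the scaling stands in for a hierarchical prior here.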
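
Finally, an illustrative stand-in for the model fitting itself: Bayesian logistic regression with a single prior-variance hyperparameter, sampled here by a simple random-walk Metropolis scheme. This is only a self-contained sketch of the idea, not the MCMC software actually used; y is assumed coded as 0/1, and the priors shown are placeholder choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_posterior(beta, log_sigma, X, y):
        # Bernoulli log-likelihood for y in {0,1} under logistic regression,
        # plus a N(0, sigma^2) prior on every coefficient and a weak
        # lognormal prior on sigma itself (the single hyperparameter).
        sigma = np.exp(log_sigma)
        logits = X @ beta
        loglik = np.sum(y * logits - np.logaddexp(0.0, logits))
        logprior = (-0.5 * np.sum(beta ** 2) / sigma ** 2
                    - beta.size * log_sigma)
        return loglik + logprior - 0.5 * (log_sigma / 2.0) ** 2

    def sample(X, y, n_iter=20000, step=0.02):
        # Random-walk Metropolis over (beta, log sigma); crude for a
        # 188-dimensional posterior, but enough to show the mechanics.
        beta, log_sigma = np.zeros(X.shape[1]), 0.0
        lp = log_posterior(beta, log_sigma, X, y)
        draws = []
        for i in range(n_iter):
            beta_new = beta + step * rng.standard_normal(beta.size)
            ls_new = log_sigma + step * rng.standard_normal()
            lp_new = log_posterior(beta_new, ls_new, X, y)
            if np.log(rng.random()) < lp_new - lp:    # accept/reject
                beta, log_sigma, lp = beta_new, ls_new, lp_new
            if i >= n_iter // 2 and i % 10 == 0:      # keep latter half
                draws.append(beta.copy())
        return np.array(draws)

    def predict(draws, X_new):
        # Posterior predictive probability of class 1 per test trial,
        # averaged over the retained posterior draws.
        return np.mean(1.0 / (1.0 + np.exp(-(X_new @ draws.T))), axis=1)

Averaging the predictive probabilities over posterior draws in this way is the kind of quantity from which an internal accuracy estimate, like the 87% figure above, can be read off.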