Method used for the "self-paced 1s" task of the 2003 BCI competition

Radford Neal, 28 April 2003

My approach was to use Bayesian logistic regression on a feature set chosen by exploratory data analysis, with some prior information incorporated. I used the 1000 Hz data, sometimes with a low-pass filter applied (cutting off at about 40 Hz, which eliminates the 50 Hz line contamination), and sometimes with a band-pass filter applied (about 4 Hz to 40 Hz).

On the basis of exploratory data analysis, looking primarily at correlations with the class labels, I produced 188 features that appeared as if they could be useful in predicting the class. Most of these features consisted of "reduced" numbers associated with each of the 28 channels, though sometimes some channels were omitted. The features fall into three groups. Time-domain features consisted of the average of the last 100 ms of each channel (note that 100 ms is a multiple of the 20 ms period of the line contamination), the average of the first 100 ms, and the slope of a least-squares line fit to the low-pass filtered data. As frequency-domain features, I used the estimated spectral density at 20 Hz for each channel. Finally, I used several forms of correlational features, which looked at the correlations between the band-pass filtered data in different channels. One such feature set consisted of the squares of the components of the normalized eigenvector associated with the largest eigenvalue of the correlation matrix. Another feature set was the correlations of all except the "z" channels with the Fz channel. Finally, I looked at the correlations of the FC and C channels (other than C3 and C4 themselves) with C3 and C4.

These features were transformed by taking sums and differences of opposite channels (data for the midline "z" channels was left unchanged). This does not change the information present, but it allows more meaningful prior information to be included.

This prior information took the form of varying scaling for the features. All were shifted to have mean zero, and then scaled to have a standard deviation related to how relevant I thought they were likely to be. I made differences more relevant than sums and "z" data, the "F" and "O" channels less relevant than the "FC", "C", and "CP" channels, and C3/C4 most relevant of all. The exact scaling factors were chosen partly on the basis of validation runs (with 216 training and 100 validation trials), since a fully Bayesian approach with a complex hierarchical prior would have required considerable programming to implement.

The final model was simple Bayesian logistic regression on these 188 features, with a single hyperparameter controlling the variance of the regression coefficients, the same for all features (though the scaling above produces the effect of different variances for different features). Computation was done by Markov chain Monte Carlo. Other, more complex models were tried, but were not found to be better on the validation set, nor in terms of their internal predictions of accuracy.

Based on the predictive probabilities produced by the model, my best guess is that I will achieve 87% accuracy on the test set. However, with only 100 test cases, luck will play a big role. I would not be surprised if the accuracy were as low as 81% or as high as 91%.
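
To make the feature computations above concrete, here is a minimal sketch in Python/NumPy of three of the feature types, assuming each trial arrives as an array of shape (n_samples, n_channels) sampled at 1000 Hz. All function names, the filter order, and the Welch segment length are illustrative choices, not taken from the actual analysis.

    import numpy as np
    from scipy.signal import butter, filtfilt, welch

    FS = 1000  # sampling rate in Hz

    def bandpass(x, lo=4.0, hi=40.0, fs=FS, order=4):
        # Zero-phase band-pass filter (about 4-40 Hz), applied along time.
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x, axis=0)

    def time_domain_features(trial, fs=FS):
        # Mean of the last and first 100 ms of each channel. 100 ms is a
        # multiple of the 20 ms period of the 50 Hz line noise, so the
        # line contamination averages out.
        n = int(0.100 * fs)
        return np.concatenate([trial[-n:].mean(axis=0),
                               trial[:n].mean(axis=0)])

    def slope_features(trial_lowpassed, fs=FS):
        # Slope of a least-squares line fit to each (low-pass filtered)
        # channel; polyfit fits every column at once when y is 2-D.
        t = np.arange(trial_lowpassed.shape[0]) / fs
        return np.polyfit(t, trial_lowpassed, deg=1)[0]

    def spectral_feature_20hz(trial, fs=FS):
        # Estimated spectral density at (nearest bin to) 20 Hz per channel.
        f, pxx = welch(trial, fs=fs, nperseg=256, axis=0)
        return pxx[np.argmin(np.abs(f - 20.0))]

    def eigvec_feature(trial_bandpassed):
        # Squares of the components of the unit eigenvector belonging to
        # the largest eigenvalue of the between-channel correlation matrix.
        corr = np.corrcoef(trial_bandpassed, rowvar=False)
        w, v = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
        return v[:, -1] ** 2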
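
The sum/difference transformation and the relevance scaling can be sketched in the same way. The channel index pairs and per-feature relevance factors would be supplied by hand; nothing here reproduces the actual scale factors used.

    import numpy as np

    def sum_diff(feat, left_idx, right_idx):
        # Replace each left/right channel pair by its sum and difference;
        # midline "z" channels are simply not listed in the index arrays.
        # This is an invertible linear map, so no information is lost.
        out = feat.copy()
        out[:, left_idx] = feat[:, left_idx] + feat[:, right_idx]
        out[:, right_idx] = feat[:, left_idx] - feat[:, right_idx]
        return out

    def scale_features(X_train, X, relevance):
        # Centre each column using training-set means, then scale so its
        # standard deviation equals the hand-chosen relevance factor.
        mu = X_train.mean(axis=0)
        sd = X_train.std(axis=0)
        return (X - mu) / sd * relevance

Because the prior on the regression coefficients is the same for every feature, rescaling a column by a factor is equivalent to rescaling the prior standard deviation of its coefficient by that factor, which is how the scaling stands in for a hierarchical prior here.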
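
Finally, an illustrative stand-in for the model fitting itself: Bayesian logistic regression with a single prior-variance hyperparameter, sampled here by a simple random-walk Metropolis scheme. This is only a self-contained sketch of the idea, not the MCMC software actually used; y is assumed coded as 0/1, and the priors shown are placeholder choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_posterior(beta, log_sigma, X, y):
        # Bernoulli log-likelihood for y in {0,1} under logistic regression,
        # plus a N(0, sigma^2) prior on every coefficient and a weak
        # lognormal prior on sigma itself (the single hyperparameter).
        sigma = np.exp(log_sigma)
        logits = X @ beta
        loglik = np.sum(y * logits - np.logaddexp(0.0, logits))
        logprior = (-0.5 * np.sum(beta ** 2) / sigma ** 2
                    - beta.size * log_sigma)
        return loglik + logprior - 0.5 * (log_sigma / 2.0) ** 2

    def sample(X, y, n_iter=20000, step=0.02):
        # Random-walk Metropolis over (beta, log sigma); crude for a
        # 188-dimensional posterior, but enough to show the mechanics.
        beta, log_sigma = np.zeros(X.shape[1]), 0.0
        lp = log_posterior(beta, log_sigma, X, y)
        draws = []
        for i in range(n_iter):
            beta_new = beta + step * rng.standard_normal(beta.size)
            ls_new = log_sigma + step * rng.standard_normal()
            lp_new = log_posterior(beta_new, ls_new, X, y)
            if np.log(rng.random()) < lp_new - lp:    # accept/reject
                beta, log_sigma, lp = beta_new, ls_new, lp_new
            if i >= n_iter // 2 and i % 10 == 0:      # keep latter half
                draws.append(beta.copy())
        return np.array(draws)

    def predict(draws, X_new):
        # Posterior predictive probability of class 1 per test trial,
        # averaged over the retained posterior draws.
        return np.mean(1.0 / (1.0 + np.exp(-(X_new @ draws.T))), axis=1)

Averaging the predictive probabilities over posterior draws in this way is the kind of quantity from which an internal accuracy estimate, like the 87% figure above, can be read off.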