Data Set IVa for the BCI Competition III

Data set IVa ‹motor imagery, small training sets›

Data set provided by Fraunhofer FIRST, Intelligent Data Analysis Group (Klaus-Robert Müller, Benjamin Blankertz), and Campus Benjamin Franklin of the Charité - University Medicine Berlin, Department of Neurology, Neurophysics Group (Gabriel Curio)

Correspondence to Benjamin Blankertz ⟨benjamin.blankertz@tu-berlin.de⟩

The Thrill

When taking a machine learning approach to Brain-Computer Interfacing, one needs labelled training data to teach the classifier. To this end, the user usually performs a boring calibration measurement before starting with BCI feedback applications. One important objective in BCI research is to reduce the time needed for this initial measurement. This data set poses the challenge of getting along with only a small amount of training data. One approach to the problem is to use information from other subjects' measurements to reduce the amount of training data needed for a new subject. Of course, competitors may also try algorithms that work on small training sets without using the information from other subjects.

Experimental Setup

This data set was recorded from five healthy subjects. Subjects sat in a comfortable chair with their arms resting on armrests. This data set contains only data from the 4 initial sessions without feedback. Visual cues indicated for 3.5 s which of the following 3 motor imageries the subject should perform: (L) left hand, (R) right hand, (F) right foot. The presentations of target cues were separated by periods of random length, 1.75 to 2.25 s, in which the subject could relax.
There were two types of visual stimulation: (1) where targets were indicated by letters appearing behind a fixation cross (which might nevertheless induce small target-correlated eye movements), and (2) where a randomly moving object indicated targets (inducing target-uncorrelated eye movements). From subjects al and aw, 2 sessions of each type were recorded, while from the other subjects 3 sessions of type (2) and 1 session of type (1) were recorded.

Format of the Data

Given are continuous signals of 118 EEG channels and markers that indicate the time points of 280 cues for each of the 5 subjects (aa, al, av, aw, ay). For the purposes of the competition, no target class information is provided for some markers (value NaN). Only cues for the classes 'right' and 'foot' are provided for the competition. The following table shows the respective number of training (labelled) trials "#tr" and test (unlabelled) trials "#te" for each subject.

	#tr	#te
aa	168	112
al	224	56
av	84	196
aw	56	224
ay	28	252
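
The counts in the table can be cross-checked with a few lines of Python: for every subject, the labelled and unlabelled trials should add up to the 280 cues mentioned above. This is only a sanity-check sketch; the variable names are illustrative and not part of the data set.

```python
# Trial counts copied from the table above: subject -> (#tr, #te).
trials = {
    "aa": (168, 112),
    "al": (224, 56),
    "av": (84, 196),
    "aw": (56, 224),
    "ay": (28, 252),
}

# Fraction of each subject's cues that comes with a class label.
labelled_fraction = {}
for subject, (n_train, n_test) in trials.items():
    total = n_train + n_test
    labelled_fraction[subject] = n_train / total
    print(f"{subject}: {n_train} labelled of {total} cues "
          f"({labelled_fraction[subject]:.0%})")
```

The labelled share ranges from 80% of the cues for subject al down to 10% for subject ay, which is where the "small training sets" challenge is hardest.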

Data are provided in Matlab format (*.mat) containing variables:

cnt: the continuous EEG signals, size [time x channels]. The array is stored in datatype INT16. To convert it to uV values, use cnt= 0.1*double(cnt); in Matlab.
mrk: structure of target cue information with fields
- pos: vector of positions of the cue in the EEG signals given in unit sample, length #cues
- y: vector of target classes (1, 2, or NaN), length #cues
- className: cell array of class names.
info: structure providing additional information with fields
- name: name of the data set,
- fs: sampling rate,
- clab: cell array of channel labels,
- xpos: x-position of electrodes in a 2d-projection,
- ypos: y-position of electrodes in a 2d-projection.
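
As a rough illustration of how these variables fit together, the following Python sketch scales raw integer samples to microvolts and cuts out one window per cue. It assumes that cnt has already been loaded as a list of rows (one row per time point) and that the positions in pos are 1-based, as Matlab indices usually are; the description does not state this, so it is exposed as a parameter. The helper names to_microvolts and extract_epochs are hypothetical, and the default window length is simply the 3.5 s cue duration.

```python
def to_microvolts(cnt_raw):
    """Scale raw INT16 samples to microvolts (factor 0.1, as stated above)."""
    return [[0.1 * value for value in row] for row in cnt_raw]


def extract_epochs(cnt, pos, fs, duration_s=3.5, index_base=1):
    """Cut one [time x channels] window per cue, starting at the cue position.

    index_base=1 assumes Matlab-style 1-based positions (an assumption).
    Returns (epochs, kept), where kept lists the indices of the cues whose
    window fits inside the recording, so labels can be aligned afterwards.
    """
    n_samples = int(round(duration_s * fs))
    epochs, kept = [], []
    for i, p in enumerate(pos):
        start = p - index_base
        stop = start + n_samples
        if 0 <= start and stop <= len(cnt):
            epochs.append(cnt[start:stop])
            kept.append(i)
    return epochs, kept


# Tiny synthetic example: 2 channels, 1000 time points, fs = 100 Hz.
fs = 100
cnt_raw = [[t, -t] for t in range(1000)]
cnt = to_microvolts(cnt_raw)
epochs, kept = extract_epochs(cnt, pos=[1, 501, 900], fs=fs)
print(len(epochs), len(epochs[0]), len(epochs[0][0]), kept)
```

In the synthetic example the third cue is dropped because its 350-sample window would run past the end of the recording; the kept indices make it possible to select the matching entries of y.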

As an alternative, the data are also provided in zipped ASCII format (split into three files for each subject):

*_cnt.txt: the continuous EEG signals, where each row holds the values for all channels at a specific time point
*_mrk.txt: target cue information, each row represents one cue where the first value defines the time point (given in unit sample)
*_nfo.txt: contains other information as described for the Matlab format.
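
The text files could be read along the following lines. This is a sketch under assumptions: the field separator is not specified above, so the code accepts whitespace or commas, and only the first marker field (the cue position) is interpreted; any further fields are passed through unparsed. The function names read_cnt and read_mrk are hypothetical, and the sample values are made up.

```python
import io


def read_cnt(lines):
    """Parse rows of channel values; returns one list of floats per time point."""
    rows = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if fields:  # skip blank lines
            rows.append([float(f) for f in fields])
    return rows


def read_mrk(lines):
    """Parse cue rows; the first field is the cue position in samples.

    Remaining fields are returned as strings, since their meaning is not
    spelled out in the description above.
    """
    cues = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if fields:
            cues.append((int(float(fields[0])), fields[1:]))
    return cues


# In-memory stand-ins for a *_cnt.txt and a *_mrk.txt file (made-up values).
cnt_file = io.StringIO("12 -3 7\n15 -1 4\n")
mrk_file = io.StringIO("401 1\n1021 2\n")
cnt = read_cnt(cnt_file)
cues = read_mrk(mrk_file)
print(len(cnt), "time points,", len(cnt[0]), "channels")
print("cue positions:", [position for position, _ in cues])
```

Both functions take any iterable of lines, so an open file handle can be passed in place of the StringIO objects.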

Requirements and Evaluation

Please provide for each subject an ASCII file (named 'result_IVa_aa.txt', 'result_IVa_al.txt', ...) containing 280 lines of your estimated class labels (1 or 2), one for every cue. (For training trials this should be the respective value of mrk.y, and for test trials the output of your algorithm.)
You also have to provide a description of the algorithm used (ASCII, HTML or PDF format) for publication on the results web page.
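
One way to assemble such a result file is sketched below in Python. It merges the known labels with the algorithm's output for the unlabelled cues and refuses to write a file of the wrong length. The function names and the dictionary layout for predictions are assumptions made for illustration; only the file naming scheme and the 280-line requirement come from the text above.

```python
import math


def build_result_lines(y, predictions):
    """Merge known labels with predictions into one label per cue.

    y           -- labels per cue: 1, 2, or float('nan') for test trials
    predictions -- dict mapping cue index -> predicted label (1 or 2)
    """
    lines = []
    for i, label in enumerate(y):
        if isinstance(label, float) and math.isnan(label):
            label = predictions[i]  # test trial: use the algorithm's output
        label = int(label)
        if label not in (1, 2):
            raise ValueError(f"cue {i}: label must be 1 or 2, got {label}")
        lines.append(str(label))
    return lines


def write_result(path, lines, expected_cues=280):
    """Write one label per line; refuse to write a file of the wrong length."""
    if len(lines) != expected_cues:
        raise ValueError(f"expected {expected_cues} lines, got {len(lines)}")
    with open(path, "w", encoding="ascii") as f:
        f.write("\n".join(lines) + "\n")


# Toy example with 4 cues instead of 280; cues 1 and 2 are test trials.
y = [1, float("nan"), float("nan"), 2]
predictions = {1: 2, 2: 1}
lines = build_result_lines(y, predictions)
print(lines)
```

For a real submission, the 280 lines for subject aa would then be written with write_result('result_IVa_aa.txt', lines).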

The performance measure is the overall classification accuracy (number of correctly classified test trials divided by the total number of test trials).
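
Expressed in Python, this measure is a short function. The sketch assumes that both lists cover the test trials only and are in the same order; the function name is illustrative.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of test trials whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one test trial is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Three of four test trials are classified correctly.
print(accuracy([1, 2, 2, 1], [1, 2, 1, 1]))  # 0.75
```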

Technical Information

The recording was made using BrainAmp amplifiers and a 128-channel Ag/AgCl electrode cap from ECI. 118 EEG channels were measured at positions of the extended international 10/20-system. Signals were band-pass filtered between 0.05 and 200 Hz and then digitized at 1000 Hz with 16-bit (0.1 uV) accuracy. We also provide a version of the data that is downsampled to 100 Hz (by picking every 10th sample), which we typically use for analysis.
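
Downsampling by sample picking amounts to a strided slice over the time axis. The Python sketch below shows the idea on synthetic data; which sample the picking starts from is not stated above, so starting with the first one is an assumption, and the function name is hypothetical.

```python
def downsample_by_picking(cnt, factor=10):
    """Keep every `factor`-th time point, starting with the first row.

    No filtering is applied here; the function only picks samples.
    """
    return cnt[::factor]


# One second of synthetic single-channel data at 1000 Hz.
cnt_1000hz = [[t] for t in range(1000)]
cnt_100hz = downsample_by_picking(cnt_1000hz)
print(len(cnt_100hz), cnt_100hz[:3])
```

One second of data shrinks from 1000 to 100 rows, and the retained rows are the original samples 0, 10, 20, and so on.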

References

Guido Dornhege, Benjamin Blankertz, Gabriel Curio, and Klaus-Robert Müller. Boosting bit rates in non-invasive EEG single-trial classifications by feature combination and multi-class paradigms. IEEE Trans. Biomed. Eng., 51(6):993-1002, June 2004.

Note that the above reference describes an older experimental setup. A new paper analyzing the data sets as provided in this competition and presenting the feedback results will appear soon.
