from PhysioNet, the research resource for complex physiologic signals


Creating PhysioBank (WFDB-compatible) Records

If you have digital recordings of signals or time series, perhaps with annotations, that you would like to study using PhysioToolkit software such as that in the WFDB software package, or that you would like to contribute to PhysioBank, the information on this page should get you started on creating PhysioBank-compatible records from your data.

Many data formats are WFDB-compatible; there is no single "WFDB format". This tutorial will help you determine whether your data are already in a WFDB-compatible format and, if they are not, choose a suitable one.

If you haven't done so already, install the WFDB software package before continuing.

Terminology

The basic component of a PhysioBank data set is a record, which consists of data that describe a single subject, simulation, or experimental run. Typically, a record contains one or more signals and one or more sets of annotations, together with header information (described below).

In this context, a signal is a time series of measured or calculated samples separated by uniform time intervals (sampling intervals). In PhysioBank-compatible records, samples are represented as 8-, 10-, 12-, 16-, 24-, or 32-bit integers.* The accompanying header information provides, among much else, the parameters needed to convert the dimensionless integer samples into calibrated physical quantities (such as blood pressures in mmHg). The sampling frequency of a signal is the number of sampling intervals per second (which may be less than one for infrequently-sampled signals). In most cases, all signals belonging to a record are sampled at the same sampling frequency. If this is not true, then a frame interval must be defined, generally as the least common multiple of the various sampling intervals used in a record, and the frame frequency is the number of frame intervals per second.
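As a rough illustration of the conversion mentioned above, WFDB headers give each signal a gain (integer units per physical unit) and a baseline (the sample value that corresponds to zero physical units), so a physical value is (sample - baseline) / gain. The sketch below uses made-up calibration values and samples; they are assumptions for illustration, not taken from any real record.

```shell
# Illustrative calibration values (assumptions, not from a real record):
gain=200       # integer (ADC) units per physical unit, e.g. per mV
baseline=1024  # sample value corresponding to 0 physical units

# Convert three raw integer samples to physical units:
#   physical = (sample - baseline) / gain
physical=$(printf '%s\n' 1024 1224 924 |
  awk -v g="$gain" -v b="$baseline" '{ printf "%.3f\n", ($1 - b) / g }')

echo "$physical"
```

See header(5) for where the actual gain and baseline of each signal are recorded.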

Also in this context, an annotation is a label that "points" to a specific sampling interval (or frame interval) in the record (and optionally to a specific signal as well). Each annotation can have a small number of numeric attributes, as well as either a string or a URL, associated with it. Annotations are commonly used in PhysioBank databases to label heart beats, and to record observations and events that do not take place at uniform intervals.

Are your data already in PhysioBank-compatible format?

Many medical device manufacturers have either adopted PhysioBank-compatible formats natively, or provide a means of exporting their proprietary data into a PhysioBank-compatible format. Sometimes, for historical reasons, the term "MIT format" is used to describe a PhysioBank-compatible format. European Data Format (EDF), used widely to store unannotated data, differs from the formats most often used in PhysioBank in that it does not make use of external header files, but it is fully PhysioBank-compatible. The newer EDF+, which is a variant of EDF that incorporates a limited capability for storing annotations, is mostly PhysioBank-compatible (but see "About EDF+ annotations" below). BDF and BDF+ (variants of EDF and EDF+ for 24-bit data) are also PhysioBank-compatible.

If you have records that include .hea or .edf files, verify that they are PhysioBank-compatible by trying to read them with wfdbdesc, rdsamp, and (if you have annotation files) rdann.
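A check along these lines might look like the following sketch. The record name 100 is only a placeholder (it assumes 100.hea, its signal file, and 100.atr are in the current directory), and the block skips the check if the WFDB tools or the header are missing.

```shell
record=100   # placeholder record name; expects 100.hea in the current directory

if command -v wfdbdesc >/dev/null 2>&1 && [ -f "$record.hea" ]; then
  wfdbdesc "$record"                 # describe the record's signals
  rdsamp -r "$record" -t 1           # print samples from the first second
  if [ -f "$record.atr" ]; then
    rdann -r "$record" -a atr -t 1   # print annotations from the first second
  fi
  status=checked
else
  status=skipped                     # WFDB tools or header file not found
fi

echo "compatibility check: $status"
```

If all three programs print sensible output without error messages, the record is most likely readable by other WFDB applications as well.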

Most files in common binary formats that use fixed-length samples for storing digitized signals (including many that, like EDF and EDF+, contain embedded metadata at the beginning of the file) are also PhysioBank-compatible signal files. If your data are already in such a format, it may be sufficient to create a header file and, if applicable, an annotation file for each record. See signal(5) for details of supported signal file formats, and header(5) for complete specifications of header file format, with examples.

What's in a record?

Unlike records in relational databases, each PhysioBank-compatible record is stored in its own files. The files belonging to any given record share a record name (usually the initial part of the file names), and are distinguished by suffixes. Record names (needed by WFDB applications to specify their inputs) never include the .hea suffix, but they do include .edf (or other suffixes, if used) when reading EDF (EDF+, BDF, BDF+) files.

For example, record 100 of the MIT-BIH Arrhythmia Database consists of three files, named 100.hea, 100.dat, and 100.atr.

WFDB-compatible records generally contain three types of files:

Header file
This is normally the only required element of a record; it consists of a short text file, named with the suffix .hea. Header files specify the sampling frequency, duration, and (optionally) the starting time of the record; the name, storage format, sampling frequency, and calibration parameters of each signal; and the name of the signal file in which each signal is stored. Additional information, which usually includes the age, gender, diagnoses, and medications taken by the subject, can be included in the header file if available. As noted above, header files are not necessary for EDF, EDF+, BDF, and BDF+ files.
Signal file(s)
These binary files generally contain samples only; conventionally, they have names ending in .dat, but this is not required. Records can include a signal file for each available signal, but usually all of the available signals are stored in a single signal file, in which frames containing samples taken from each signal simultaneously are always arranged in the same order and written in sequence. These characteristics permit efficient random access within signal files, since the position of a sample at any given time can be readily calculated. Records consisting entirely of non-periodic observations may lack signal files.
Annotation file(s)
These binary files contain annotations in a highly compact format that requires slightly more than 16 bits per beat label annotation (more for annotations containing strings or URLs). Many records are multiply annotated, either by different observers or with respect to different attributes, and in such cases, each set of annotations is generally stored in its own file. Files with names ending in .atr are conventionally used to store reference annotations that have been manually checked for accuracy. Since annotations are often created long after the respective signal files, having external annotation files permits them to be added to existing records without a need to replace the usually lengthy signal files. Unannotated records don't include annotation files.
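
To make the random-access point about signal files concrete, here is a minimal sketch of the offset arithmetic. It assumes a signal file that stores every sample as a 2-byte (16-bit) integer, with frames written back to back and no extra bytes before the first frame; the sampling frequency and signal count are illustrative. Other storage formats pack samples differently, so see signal(5) before relying on this.

```shell
fs=360     # sampling frequency in Hz (illustrative)
nsig=2     # number of signals interleaved in each frame
bytes=2    # bytes per sample (assumes 16-bit storage)
t=10       # time of interest, in seconds
sig=1      # zero-based index of the signal within a frame

frame=$(( t * fs ))                          # frame number at time t
offset=$(( (frame * nsig + sig) * bytes ))   # byte position of that sample

echo "frame $frame starts the sample at byte offset $offset"
```

Because every frame has the same size, no index is needed: the position follows directly from the time, the number of signals, and the sample size.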

Creating signal and header files

If you don't already have PhysioBank-compatible records, an easy way to make them from the data you have is to begin by creating a CSV file containing one sample of each signal per line, as in this example consisting of samples of two ECG signals:

927,998
927,1017
939,1034
958,1048
980,1064
1010,1086
1048,1111
1099,1131
1148,1140
1180,1119
1192,1066
1177,1007
1128,978
1058,974
991,981
951,988
937,987
939,992
950,994
958,994

If you have written your data in this format to a CSV file named foo.csv, create foo.hea and foo.dat using this command:

wrsamp -F freq -i foo.csv -o foo -s, 0 1

replacing freq by the sampling frequency of your signals. The final command line arguments (0 and 1 in the example) specify the columns of the input file that should be written as signals to the output; column 0 is the leftmost, 1 the next, etc. Columns can be omitted, reordered, or duplicated as desired. See wrsamp for details and additional options that can be used if your samples are not 16-bit integers.
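
The whole step can be sketched end to end as follows. The sampling frequency of 360 Hz is an assumption made only for this example, and the wrsamp call is skipped if the WFDB software is not installed.

```shell
# Write the first five sample pairs from the example above to foo.csv.
cat > foo.csv <<'EOF'
927,998
927,1017
939,1034
958,1048
980,1064
EOF

freq=360   # assumed sampling frequency in Hz; substitute your own

if command -v wrsamp >/dev/null 2>&1; then
  wrsamp -F "$freq" -i foo.csv -o foo -s, 0 1   # creates foo.hea and foo.dat
  wfdbdesc foo                                   # confirm the new record is readable
else
  echo "wrsamp not found; install the WFDB software package first"
fi
```

Running wfdbdesc on the new record is a quick way to confirm that the header and signal file agree before editing the header by hand.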

* Support for 24- and 32-bit integer samples was introduced in WFDB version 10.5.0 (March 2010). Previous versions were limited to resolutions of 16 bits or fewer.

Edit the .hea file using any text editor to insert signal names, physical units, and calibration parameters. For records to be contributed to PhysioBank, please add, at the end of the file, an info string (a comment line beginning with '#') that describes (at a minimum) the age, gender, diagnoses, and medications of the subject (other information that does not identify the subject is also welcome). Example:

# <age>: 35  <sex>: M  <diagnoses>: (none)  <medications>: (none)

Please use this format to permit indexing software to parse this information reliably. If necessary, the string may extend over multiple lines, each beginning with '#'.
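
If you prefer to script this step rather than use a text editor, a sketch such as the following builds the info string in the format shown and appends it to an existing header. The file name foo.hea and the field values are placeholders.

```shell
# Placeholder values; replace them with the subject's details.
age=35
sex=M
diagnoses='(none)'
medications='(none)'

info=$(printf '# <age>: %s  <sex>: %s  <diagnoses>: %s  <medications>: %s' \
  "$age" "$sex" "$diagnoses" "$medications")
echo "$info"

# Append to the header only if it already exists, so that an
# incomplete header is never created by accident.
if [ -f foo.hea ]; then
  printf '%s\n' "$info" >> foo.hea
fi
```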

Creating annotation files

If your records include beat labels or other non-periodic observations, they can be stored in annotation files. The easiest way to do this is to put your non-periodic information into the text format produced by rdann; text in this format can be converted into PhysioBank-compatible annotation files using wrann.
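
One way to see the text format that wrann expects is to round-trip an existing annotation file, as in the sketch below. The record name 100 and the output annotator name new are placeholders; the block does nothing unless the WFDB tools and the files 100.hea and 100.atr are present.

```shell
record=100   # placeholder record with an existing annotation file 100.atr

if command -v rdann >/dev/null 2>&1 &&
   [ -f "$record.hea" ] && [ -f "$record.atr" ]; then
  rdann -r "$record" -a atr > "$record.txt"   # annotations as text
  wrann -r "$record" -a new < "$record.txt"   # text back to binary (100.new)
  rdann -r "$record" -a new | head -n 5       # inspect the result
  converted=yes
else
  converted=no
fi

echo "annotations converted: $converted"
```

The intermediate text file shows the layout to imitate when writing your own annotations for wrann.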

About EDF+ annotations

EDF+ files are, as noted above, mostly compatible with WFDB applications. Current versions of the WFDB library do not read EDF+ annotations directly, however; it is necessary to extract them from the EDF+ file and rewrite them into a conventional PhysioBank-compatible annotation file in order to read them with WFDB applications. This can be done easily using rdedfann(1) and wrann(1).
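
A sketch of that extraction follows. The file name foo.edf and the annotator name edf are placeholders, and the -i option for naming the input file is given here from memory of the rdedfann(1) synopsis, so treat it as an assumption and check the manual page for your installed version.

```shell
edf=foo.edf   # placeholder EDF+ file name

if command -v rdedfann >/dev/null 2>&1 && [ -f "$edf" ]; then
  # Extract the EDF+ annotations as text and rewrite them as a
  # conventional annotation file (annotator name "edf" is arbitrary).
  rdedfann -i "$edf" | wrann -r "$edf" -a edf
  rdann -r "$edf" -a edf | head -n 5   # confirm they can now be read
  extracted=yes
else
  extracted=no
fi

echo "EDF+ annotations extracted: $extracted"
```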