Core CCDEF Groups¶

These are the principal groups in a CCDEF HDF5 file. The root group is obviously mandatory but valid files can be created with physiologic data (Numerics and/or Waveforms) only or with clinical data only.

The Research and Reference groups are strictly optional and are not parsed by any of the viewer or analysis tools at this time.

Root (/)¶

The root group is the top level of the file, it contains a series of other high level groups with physiologic and clinical data.

Root Group Metadata¶

The .meta attribute contains information about the file.

/.meta
    {
        "title": "...",
        "author": "...",
        "organization": "...",
        "time_origin": "2020-41-17 15:41:22.306880 EST",
        "ccdef_version": 1.0
    }

Root Group Datasets¶

mapping¶

One of the key issues in data sharing is the ability to seamlessly ingest data from multiple sites into the end users’ application without having to develop site specific pipelines. This is complicated by the fact that collect sites will have various local factors that influence the way they store data, based on equipment, etc. The data model is self-describing so it is fairly straight forward for end users to locate and extract the data that they need, but to add an additional level of interoperability we provide a mapping table for common key signal names that allow a user to directly (or using library functions) search and access the information they require.

signal	dataset	local_name	column	category	loinc	loinc_name
HR	/numerics/vitals	HR	1	numerics	8867-4	Heart rate
NIBP-S	/numerics/vitals	NBP Sys	3	numerics	76534-7	BP sys by Noninvasive
NIBP-D	/numerics/vitals	NBP Dias	4	numerics	76535-4	BP dias by Noninvasive
NIBP-M	/numerics/vitals	NBP Mean	5	numerics	76536-2	BP mean by Noninvasive
RR	/numerics/vitals	RESP	6	numerics	76174-2	Resp rate PulseOx.pleth
SPO2	/numerics/vitals	SpO2	7	numerics	76522-2	Transmission % BldA PulseOx
ECG-II	/waveforms/hemodynamics	II	1	waveforms	None	ECG Lead II
ECG-III	/waveforms/hemodynamics	III	2	waveforms	None	ECG Lead III
ECG-V	/waveforms/hemodynamics	V	4	waveforms	None	ECG Lead V
RR	/waveforms/hemodynamics	RESP	5	waveforms	76174-2	Resp rate PulseOx.pleth

Additonal examples of the mapping functionality are shown here.

Mapping Table Field Descriptions¶

mapping dataset

The mapping table describes parameters in terms of LOINC (link) to provide further standardization and clarity as to the nature of the information. The mapping table also provides the group, dataset and column (if the dataset is tabular)

Parameters

signal (str) – The CCDEF standard signal name (see signal names)
dataset (str) – The name of the dataset containing the signal (can be accessed directly as f[‘/Group/’+dataset])
local_name (str) – The original name of the signal in the datafile. This will generally be the dataset name if multiple datasets are used or it will be the column name in a tabular dataset.
column (int) – The number of the column in dataset containing the signal (default is 1 for a single column dataset)
category (str) – {waveform, numeric}
LOINC (str) – The LOINC for the signal of interest
loinc_name (str) – The LOINC short name (if it exists) for the signal

Note

A number of waveform signals do not currently have assigned LOINC identifiers but additions are being proposed to address this.

Future Mapping Possibilities¶

Future versions of CCDEF may include additional ontologies such as OMOP in the mapping table.

Numerics¶

The numerics group contains signals at a sample rate of less than 50 Hz. In cases where the numeric data consist primarily of vitals signs as recorded simultaneously from the bedside monitor, these data can be stored in a single tabular dataset called Vitals. In cases where there are multiple datasources (eg a ventilator, telemetry modules, etc), some combination of tabular and single datasets may be more apppropriate. In all cases, the mapping dataset is the recommended way to ensure that key parameters are easily located by end users.

Numerics Datasets¶

These can be tabular, single channel or a combination of both as described in detail here.

Typical parameters include:

Invasive BP (ABP)
Non-invasive BP (NIBP)
SpO2
HR
ICP
CVP
Temperature
Respiratory Rate

Waveforms¶

The Waveforms group contains data that is recorded at frequencies typically 50-500 Hz. There is generally more variabiltiy in the sample rates for different waveform signals, particularly if they are derived from different sources (eg bedside monitor, ventilator, etc).

Waveform Datasets¶

The most common datasets will be cardiorespiratory measurements conisting of:

ECG leads
SpO2
ABP

Once again, these can be tabular, single channel or a combination as described in detail here.

Clinical¶

The clinical group contains a variety of information extracted from the EMR and other sources, generally excluding monitor data.

As there are a wide range of EMR data extraction pipelines, it is difficulty to completely standardize this group but we provide some high level guidance. Perhaps the greatest challenge within the clinical data is mapping concepts such as interventions and clinical observations.

Common ontologies for clinical concepts is an active area of research and is one of the goals of the OMOP-CDM and we aim to support this within the ccdef standard as it evolves.

Demographics¶

Demographic information about the patient is stored as an optional attribute to the clinical group.

demographics attribute (/.demographics)

Parameters

age (float) – Patient age in years (fractional years allowed)
gender (str) – patient gender {M,F}
expired (int) – value: {0,1} 0 indicating that the patient did not die during the period covered by the file
admit_dx (str) – admission ICD 9 code (note that a full list of diagnostic codes can also be specified in /clinical/diagnosis

The resulting JSON formatted attribute looks like this:

/.demographics
    {
        "age": 40.1,
        "gender": "M",
        "expired": 0
    }

Clinical Timestamps¶

Clinical data tend to be much sparser than physiologic data and therefore timestamps will typically be included in these datasets. The prefered method is a time column with seconds from the time_orgin.

Note

If no base_datetime is specified in the clinical datasets, the time orgin for the file in the root group metadata will be used (/.meta).

Clinical Datasets¶

Suggested Clinical Datasets Include:

labs
micro
notes (EMR notes)
diagnosis

Imaging if available would be in a separate group /Clinical/Imaging

labs dataset

The labs dataset contains time stamped laboratory data such as chemistry, hematology, etc

Parameters

time (int) – seconds elapsed from base_datetime
test_id (int) – the test identifier (this may link to the .test_info attribute)
value (str) – the value of the test as a string
test_name (str ,optional) – the name of the test

micro dataset

The micro dataset contains time stamped microbiolgy data from a variety of sources (eg blood, urine, CSF, tissue) Note that there may be multiple time fields with relevant information as the time from sample collection to result can be clinicaly relevant. Caution is advised however in that these values may not always be entirely accurate as they often result from manual data entry.

Parameters

time (int) – seconds elapsed from base_datetime
test_id (int) – the test identifier (this may link to the .test_info attribute)
value (str) – the value of the test as a string
test_name (str ,optional) – the name of the test

notes dataset

The notes dataset includes clinical notes from the EMR.

Parameters

time (int) – seconds elapsed from base_datetime
test_id (int) – the test identifier (this may link to the .test_info attribute)
value (str) – the value of the test as a string
test_name (str ,optional) – the name of the test

diagnosis dataset

The diagnosis dataset is a list of diagnostic codes applicable to the patient stay described by the file.

Parameters

dxcode (str) – diagnostic code
dxname (str) – diagnosis text (optional)

Note

The default coding scheme is ICD 9 but this will be specified in the meta data for the diagnostic dataset as shown here

/clinical/diagnosis/coding = "ICD 9"

Clinical Dataset Metadata¶

Information about tests can be stored in .test_info,

.test_info metadata attribute

Parameters

label (str) – name of the test
category (str) – type of test (eg chemistry, blood gas)
fluid (str) – fluid used for test (eg: blood, urine, CSF)
valueuom (str) – units of measurement for the test
loinc_code (str) – the loinc for the test (eg ‘718-8’)

Files converted from MIMIC III will have a JSON formatted string like this:

/clinical/labs.test_info
    {'50809': {
        'label': 'Glucose',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': 'mg/dL',
        'loinc_code': '2339-0'},
    '50810': {
        'label': 'Hematocrit, Calculated',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': '%',
        'loinc_code': '20570-8'},
    '50811': {
        'label': 'Hemoglobin',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': 'g/dL',
        'loinc_code': '718-7'},
    '50813': {
        'label': 'Lactate',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': 'mmol/L',
        'loinc_code': '32693-4'},
    '50816': {
        'label': 'Oxygen',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': '%',
        'loinc_code': '19994-3'},
    '50817': {
        'label': 'Oxygen Saturation',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': '%',
        'loinc_code': '20564-1'},
    '50818': {
        'label': 'pCO2',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': 'mm Hg',
        'loinc_code': '11557-6'},
    '50819': {
        'label': 'PEEP',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': None,
        'loinc_code': '20077-4'},
    '50820': {
        'label': 'pH',
        'category': 'Blood Gas',
        'fluid': 'Blood',
        'valueuom': 'units',
        'loinc_code': '11558-4'},
    }

Research¶

The research group is an optional group with no specific format. It is intended primarily to support files used in trials and can contain trial specific information such as randomization, group assignment, etc.

References¶

The reference group is also optional. The main purpose of this group is to include links (refered to as references in HDF5) to regions of interest within files or external links to other files.