MIMIC is a publicly available dataset developed by the Laboratory for Computational Physiology that comprises deidentified health data associated with thousands of intensive care unit admissions. The dataset is widely used by investigators and engineers around the world, helping to drive research in clinical informatics, epidemiology, and machine learning.

In the early 2000s, with colleagues from academia (Massachusetts Institute of Technology), industry (Philips Medical Systems), and clinical medicine (Beth Israel Deaconess Medical Center, BIDMC), we received NIH (National Institutes of Health) funding to launch the project “Integrating Signals, Models and Reasoning in Critical Care”, a major goal of which was to build a massive critical care research database. The study was approved by the Institutional Review Boards of BIDMC (Boston, MA) and MIT (Cambridge, MA).

We set out to collect comprehensive clinical and physiologic data from all patients admitted to the adult medical and surgical ICUs of our hospital (BIDMC). Each patient record began at ICU admission and ended at final discharge from the hospital. The data acquisition process was continuous and invisible to staff, and it did not affect patient care or methods of monitoring.

Three categories of data were collected: clinical data, which were aggregated from ICU information systems and hospital archives; high-resolution physiological data (waveforms and time series of vital signs and alarms obtained from bedside monitors); and death data from Social Security Administration Death Master Files.

These data were de-identified in compliance with Health Insurance Portability and Accountability Act (HIPAA) standards and restructured to facilitate reuse in scientific research. The assembled dataset, MIMIC, is shared via PhysioNet. To restrict users to legitimate medical researchers, access to the clinical database requires completion of a simple data use agreement (DUA) and proof that the researcher has completed human subjects training.

The MIMIC database is a powerful and flexible research resource, but the generalizability of MIMIC-based studies is somewhat limited because the data are collected from a single institution. Multi-center data would capture wider variability in practice and, of course, a larger number of cases. Data from international institutions would strengthen the database further, owing to the even larger variations in practice and patient populations.

Our long-term goal is to expand our work to incorporate data from multiple institutions, so that the database can support research on cohorts of critically ill patients from around the world. In addition, we seek to expand the data to new modalities, including x-ray images and echocardiograms.

The development and maintenance of MIMIC have been funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) and the National Institute of General Medical Sciences (NIGMS) from 2003 to the present, under grants R01EB1659, R01EB017205, R01GM104987, and U01EB008577.