Introduction

Our team has developed an automated deep-learning model that can identify multiple end-systolic (ES) and end-diastolic (ED) frames in echocardiographic videos of arbitrary length, with performance comparable to that of human experts in a fraction of the time.

Echocardiogram Example
Example echocardiographic video used for model training and evaluation.


Dataset

We used three datasets in this study: one for training/testing, and two for testing only.
Our own UNITY EchoPhase dataset is publicly available and serves as a benchmark for future studies.

It contains 1,000 anonymised echocardiographic videos, annotated independently by two cardiology experts, acquired between 2010 and 2020 from the Imperial College Healthcare NHS Trust archives.

Access UNITY EchoPhase Dataset →


Dataset Summary

UNITY EchoPhase

  • Source: NHS Trust PACS Archives, Imperial College Healthcare
  • Ultrasound machines: Philips iE33 xMATRIX, GE Vivid series
  • Videos: 1,000 (2010–2020)
  • Length: 1–3 heartbeats
  • Annotations: 2 independent expert annotations (ED & ES)
  • Resolution: downsampled to 112×112 pixels
  • Frame rate: 23–102 fps
  • Format: DICOM → AVI
  • Use: training & testing
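
For readers who want to work with the released videos, the sketch below shows one way to load a DICOM-derived AVI clip and downsample each frame to 112×112 pixels, matching the resolution listed above. It assumes OpenCV (cv2) is available; the file name and helper name are illustrative and not part of the published pipeline.

```python
# Illustrative sketch (not the authors' exact pipeline) of loading an AVI clip
# and downsampling each frame to 112x112, as listed in the dataset summary.
import cv2
import numpy as np

def load_video(path, size=112):
    """Read an AVI file and return a (num_frames, size, size, 3) uint8 array."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    return np.stack(frames)

video = load_video("unity_echophase_example.avi")   # hypothetical file name
print(video.shape)                                  # e.g. (75, 112, 112, 3)
```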

Network Architecture

We treat echocardiographic videos as visual time-series and employ a CNN–LSTM hybrid model to capture both spatial and temporal dependencies.

Architecture Overview
Overview of the CNN-LSTM framework used for multibeat phase detection.

Key Components

  1. CNN Encoder – Extracts spatial features from individual frames using backbones such as ResNet50, InceptionV3, DenseNet, and InceptionResNetV2.
  2. LSTM Decoder – Captures temporal relationships across sequential frames to understand cardiac motion.
  3. Regression Head – Predicts the probability of ED and ES events for each frame.
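
As a concrete illustration, the sketch below wires these three components together using the TensorFlow 2 Keras API. The layer widths, the ImageNet-initialised ResNet50 options, the loss, and the two-unit sigmoid output (one probability each for ED and ES per frame) are assumptions made for this example, not the exact published configuration.

```python
# Minimal CNN-LSTM sketch; layer sizes and loss are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW_SIZE = 30      # frames per sliding window (see Implementation)
IMG_SIZE = 112        # frames are downsampled to 112x112 pixels

def build_phase_detector():
    frames = layers.Input(shape=(WINDOW_SIZE, IMG_SIZE, IMG_SIZE, 3))

    # 1. CNN encoder: a ResNet50 backbone applied independently to each frame.
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(IMG_SIZE, IMG_SIZE, 3))
    features = layers.TimeDistributed(backbone)(frames)

    # 2. LSTM decoder: a two-layer stack models temporal cardiac motion.
    x = layers.LSTM(128, return_sequences=True)(features)
    x = layers.LSTM(128, return_sequences=True)(x)

    # 3. Regression head: per-frame ED and ES probabilities.
    outputs = layers.TimeDistributed(layers.Dense(2, activation="sigmoid"))(x)
    return models.Model(frames, outputs)

model = build_phase_detector()
model.compile(optimizer="adam", loss="mse")   # loss choice is illustrative
model.summary()
```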

Implementation

The models were implemented in TensorFlow 2.0 and trained on an NVIDIA GeForce GTX 1080 Ti GPU.

Training configuration:

  • Epochs: 70
  • Batch size: 2
  • Data split: 60% training, 20% validation, 20% testing
  • Augmentation: Random rotations (±10°), spatial cropping (≤10 px)
  • Windowing: 30-frame sliding window, stride 1

A two-layer LSTM stack was found to be optimal across all experiments.
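
The sketch below illustrates the 30-frame sliding window (stride 1) and the light augmentation listed above. The rotation and crop limits follow the bullet points; the helper names and the pad-back-after-crop choice are illustrative assumptions rather than the released training code.

```python
# Sliding-window sampling and augmentation sketch (assumed implementation).
import numpy as np
from scipy.ndimage import rotate

def sliding_windows(video, window=30, stride=1):
    """Yield overlapping (window, H, W, C) clips from a (num_frames, H, W, C) video."""
    for start in range(0, video.shape[0] - window + 1, stride):
        yield video[start:start + window]

def augment(clip, max_angle=10, max_crop=10):
    """Apply one random rotation (<=10 degrees) and crop (<=10 px) to a clip."""
    angle = np.random.uniform(-max_angle, max_angle)
    dy, dx = np.random.randint(0, max_crop + 1, size=2)
    out = []
    for frame in clip:
        f = rotate(frame, angle, reshape=False, mode="nearest")
        f = np.pad(f[dy:, dx:], ((0, dy), (0, dx), (0, 0)))   # crop, pad back
        out.append(f)
    return np.stack(out)

# Example: build training windows from one 112x112 video.
video = np.zeros((90, 112, 112, 3), dtype=np.uint8)           # placeholder clip
windows = [augment(w) for w in sliding_windows(video)]
print(len(windows))                                           # 61 windows
```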

View Code on GitHub →


Evaluation Metrics

Model performance was measured using the Average Absolute Frame Difference (aaFD): the mean, over all detected events, of the absolute difference in frame index between each predicted ED/ES event and its expert-annotated counterpart.

aaFD Metric
Average Absolute Frame Difference (aaFD) used for temporal error evaluation.
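
A minimal sketch of the aaFD computation, following the definition above; the function name and example values are illustrative.

```python
# aaFD: mean absolute frame-index difference between matched events.
import numpy as np

def aafd(predicted_frames, annotated_frames):
    """Average absolute frame difference for paired ED (or ES) events."""
    predicted = np.asarray(predicted_frames, dtype=float)
    annotated = np.asarray(annotated_frames, dtype=float)
    return float(np.mean(np.abs(predicted - annotated)))

# Example with three hypothetical ED events.
print(aafd([12, 45, 78], [11, 47, 78]))   # (1 + 2 + 0) / 3 = 1.0
```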


Results

PACS Dataset

Manual ED/ES annotation by human experts took an average of 26 ± 11 seconds per event.
Our model reduced this to < 1.5 seconds, while maintaining expert-level precision.

Example Predictions
Predicted cardiac phases (ED/ES) compared to expert annotations.

PACS Results Table
Comparison between model predictions and operator annotations.


Comparison with Existing Studies

Reported Results
Comparison of performance with previously published deep learning approaches.

Our model outperformed most existing methods while requiring no manual pre-processing and, uniquely among published approaches, detecting multiple heartbeats in a single video of arbitrary length.


MultiBeat Dataset

Five experts annotated the MultiBeat dataset; one repeated the process for intra-observer variability evaluation.

MultiBeat Results
Model versus human operator variability.

Results (signed frame differences, mean ± SD):

  • Intra-observer variability:
    • ED: −0.22 ± 2.76
    • ES: 0.25 ± 3.75
  • Inter-observer variability (range across observers):
    • ED: means −0.87 to −5.51, SDs 2.29 to 4.26
    • ES: means −0.97 to −3.46, SDs 3.67 to 4.68

Model discrepancies fell within the range of human inter-observer variability, indicating reliability comparable to that of expert annotators.
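
For context, agreement figures of the mean ± standard deviation form reported above can be obtained from the signed frame differences between two sets of event annotations. The sketch below illustrates that summary statistic under the assumption of paired ED (or ES) event lists; it is not the authors' analysis code.

```python
# Assumed summary statistic: bias and spread of signed frame differences
# between two annotation sets (two observers, or model versus observer).
import numpy as np

def frame_difference_stats(frames_a, frames_b):
    """Return (mean, std) of signed frame differences for paired events."""
    diff = np.asarray(frames_a, dtype=float) - np.asarray(frames_b, dtype=float)
    return float(diff.mean()), float(diff.std())

# Example with hypothetical ED annotations from two observers.
bias, spread = frame_difference_stats([10, 42, 75], [11, 44, 74])
print(f"{bias:+.2f} ± {spread:.2f}")
```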


EchoNet Dataset

To validate generalisation, we tested the trained model on the public EchoNet dataset (Stanford University) without retraining.

Results (9,190 videos):

  • aaFD: 2.30 (ED), 3.49 (ES)
  • Mean frame difference: 0.16 ± 3.56 (ED), 2.64 ± 3.59 (ES)

Performance was consistent across unseen data, confirming strong generalisation.


Contact

For research queries or dataset access, please contact:
📧 Elisabeth.Lane@uwl.ac.uk


License

Creative Commons License
This work is licensed under a Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International License.

If you use this dataset or cite the paper, please reference:

Lane, Elisabeth S., Neda Azarmehr, Jevgeni Jevsikov, James P. Howard, Matthew J. Shun-Shin, Graham D. Cole, Darrel P. Francis, and Massoud Zolgharni. “Multibeat echocardiographic phase detection using deep neural networks.” Computers in Biology and Medicine 133 (2021): 104373.