
Video Collection

The dataset consists of 50 videos of cataract surgery performed at Brest University Hospital between January 22, 2015 and September 10, 2015. Indications for surgery included age-related cataract, traumatic cataract and refractive errors. Patients were 61 years old on average (minimum: 23, maximum: 83, standard deviation: 10); 38 were female and 12 male. Informed consent was obtained from all patients. Surgeries were performed by three surgeons: a renowned expert (48 surgeries), a surgeon with one year of experience (1 surgery) and an intern (1 surgery). All surgeries were performed under an OPMI Lumera T microscope (Carl Zeiss Meditec, Jena, Germany). Videos were recorded with a 180I camera (Toshiba, Tokyo, Japan) and a MediCap USB200 recorder (MediCapture, Plymouth Meeting, USA). The frame definition was 1920×1080 pixels and the frame rate was approximately 30 frames per second. Videos lasted 10 minutes 56 s on average (minimum: 6 minutes 23 s, maximum: 40 minutes 34 s, standard deviation: 6 minutes 5 s). In total, more than nine hours of surgery were recorded.

Reference Standard

Tool Usage Annotation

All surgical tools visible in the microscope videos were first listed and labeled by the surgeons (see Fig. 1). The usage of each tool was then annotated independently by two non-M.D. experts. A tool was considered in use whenever it was in contact with the eyeball; both experts therefore recorded a timestamp whenever a tool came into contact with the eyeball, and another when it stopped touching it. Up to three tools may be used simultaneously: two by the surgeon (one per hand) and, occasionally, one by an assistant. Annotations were performed at the frame level, using a web interface connected to an SQL database.
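Frame-level labels follow directly from the recorded contact timestamps: every frame between a "contact start" and the matching "contact end" is labeled as in use. A minimal sketch of that expansion, assuming timestamps in seconds and a nominal 30 fps video (the function name and interval format are illustrative, not the dataset's actual storage format):

```python
FPS = 30  # nominal frame rate of the recordings

def frame_labels(intervals, n_frames, fps=FPS):
    """Expand (start_s, end_s) contact intervals for one tool into a
    0/1 usage label per frame."""
    labels = [0] * n_frames
    for start_s, end_s in intervals:
        first = int(start_s * fps)
        last = min(int(end_s * fps), n_frames - 1)
        for f in range(first, last + 1):
            labels[f] = 1
    return labels

# e.g. a cannula touching the eyeball from t=1.0 s to t=2.0 s in a 4 s clip
labels = frame_labels([(1.0, 2.0)], n_frames=120)
```

Running the same expansion per tool and per expert yields one binary timeline per (tool, expert) pair, which is the input to the adjudication step described next.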

Fig. 1. Surgical tools to be annotated in the videos
1. biomarker 2. Charleux cannula 3. hydrodissection cannula 4. Rycroft cannula 5. viscoelastic cannula 6. cotton 7. capsulorhexis cystotome
8. Bonn forceps 9. capsulorhexis forceps 10. Troutman forceps 11. needle holder 12. irrigation / aspiration handpiece 13. phacoemulsifier handpiece 14. vitrectomy handpiece
15. implant injector 16. primary incision knife 17. secondary incision knife 18. micromanipulator 19. suture needle 20. Mendez ring 21. Vannas scissors


Finally, the two experts' annotations were adjudicated: whenever expert 1 annotated that tool A was in use while expert 2 annotated tool B instead, the experts watched the video together and jointly determined the actual tool usage. The precise timing of tool/eyeball contacts, however, was not adjudicated. The result is a probabilistic reference standard:

  • 0: both experts agree that the tool is not being used,
  • 1: both experts agree that the tool is being used,
  • 0.5: experts disagree.
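Given the two experts' frame-level binary timelines for a tool, the three-valued reference above is simply their average per frame. A minimal sketch (the input format is an assumption for illustration):

```python
def merge_annotations(expert1, expert2):
    """Combine two binary (0/1) per-frame annotations into the
    probabilistic reference: 0 = both unused, 1 = both used,
    0.5 = disagreement."""
    return [(a + b) / 2 for a, b in zip(expert1, expert2)]

merged = merge_annotations([0, 1, 1, 0], [0, 1, 0, 0])
```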

Inter-rater agreement, before and after adjudication, is reported in Table 1.

Table 1. Inter-rater agreement (Cohen's kappa)
Tool Before adjudication After adjudication
biomarker 0.835 0.835
Charleux cannula 0.949 0.963
hydrodissection cannula 0.868 0.982
Rycroft cannula 0.882 0.919
viscoelastic cannula 0.860 0.975
cotton 0.947 0.947
capsulorhexis cystotome 0.994 0.995
Bonn forceps 0.793 0.798
capsulorhexis forceps 0.836 0.849
Troutman forceps 0.764 0.764
needle holder 0.630 0.630
irrigation/aspiration handpiece 0.995 0.995
phacoemulsifier handpiece 0.996 0.997
vitrectomy handpiece 0.998 0.998
implant injector 0.980 0.980
primary incision knife 0.959 0.961
secondary incision knife 0.846 0.852
micromanipulator 0.990 0.995
suture needle 0.893 0.893
Mendez ring 0.941 0.953
Vannas scissors 0.823 0.823
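Kappa values like those in Table 1 can be computed from two raters' binary (used / not used) frame labels using the standard definition, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A minimal sketch (assumes the labels are non-degenerate, i.e. p_e < 1):

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two binary rating sequences of equal length."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    p1 = sum(r1) / n                               # rater 1 "used" rate
    p2 = sum(r2) / n                               # rater 2 "used" rate
    p_e = p1 * p2 + (1 - p1) * (1 - p2)            # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

In practice a library routine such as scikit-learn's cohen_kappa_score would be used instead; the sketch only makes the computation behind Table 1 explicit.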

Example Result

Tool usage, during a typical surgery without any complications, is illustrated in Fig. 2.

Fig. 2. Tool usage during a typical surgery


Training and Test Sets

The dataset was divided into a training set (25 videos) and a test set (25 videos). The split was made so that 1) each tool appears in the same number of videos in both subsets (plus or minus one) and 2) the test set contains only videos of surgeries performed by the renowned expert. Apart from these constraints, the split was random. In total, the training set contains 4 hours and 42 minutes of video and the test set 4 hours and 24 minutes.
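The two split constraints can be stated precisely as a validity check on a candidate partition. A minimal sketch, assuming a mapping from each video to the set of tools appearing in it and the set of expert-performed videos (all names and data below are illustrative, not the dataset's actual identifiers):

```python
def split_is_valid(train, test, tools_in, expert_videos):
    """Check the two split constraints: every test video is expert-performed,
    and each tool appears in the same number of videos (+/- 1) per subset."""
    if not all(v in expert_videos for v in test):
        return False
    tools = {t for ts in tools_in.values() for t in ts}
    for t in tools:
        n_train = sum(t in tools_in[v] for v in train)
        n_test = sum(t in tools_in[v] for v in test)
        if abs(n_train - n_test) > 1:
            return False
    return True

# toy example with 4 videos and 2 tools
tools_in = {"A": {"knife"}, "B": {"knife", "cannula"},
            "C": {"knife"}, "D": {"cannula"}}
experts = {"A", "B", "C", "D"}
ok = split_is_valid(["A", "B"], ["C", "D"], tools_in, experts)
```

The actual split was then drawn at random among the partitions satisfying this check.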
