CheXpert Dataset
Aleco Kastanos edited this page May 19, 2020 · 1 revision
- 224,316 chest x-rays
- 65,240 patients
- 14 observations from radiology reports
- Validation set: 200 studies, each annotated by 3 experts
- Test set: 500 studies, each annotated by 5 experts
The labels are extracted from the free-text radiology reports in three stages:
- Mention extraction
- Mention classification: each mention is classified as negative (no evidence for the observation), uncertain, or positive (the observation is mentioned as present)
- Mention aggregation: the mentions of each observation are combined into one label: positive (1) if at least one mention is positive; uncertain (-1) if no mention is positive and at least one is uncertain; negative (0) if the observation is actively reported as not present; blank if the observation is not mentioned at all
There is also a "No Finding" column, which is set to 1 if no observation is labeled positive or uncertain.
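The aggregation rule and the "No Finding" column can be sketched in a few lines of Python. This is an illustration of the rules as described on this page, not the labeler's actual code: the function names are made up here, `None` stands in for a blank, and returning a blank when "No Finding" does not apply is an assumption.

```python
from typing import Dict, Iterable, Optional

POSITIVE, NEGATIVE, UNCERTAIN = 1, 0, -1


def aggregate_mentions(mentions: Iterable[int]) -> Optional[int]:
    """Collapse the classified mentions of one observation into one label.

    Returns None for a blank (the observation was never mentioned).
    """
    mentions = list(mentions)
    if POSITIVE in mentions:
        return POSITIVE      # at least one positive mention
    if UNCERTAIN in mentions:
        return UNCERTAIN     # no positive, at least one uncertain
    if NEGATIVE in mentions:
        return NEGATIVE      # only negative mentions
    return None              # no mention at all


def no_finding(labels: Dict[str, Optional[int]]) -> Optional[int]:
    """1 if nothing is positive or uncertain; otherwise blank (assumption)."""
    flagged = any(v in (POSITIVE, UNCERTAIN) for v in labels.values())
    return None if flagged else POSITIVE
```

For example, `aggregate_mentions([0, -1])` yields `-1`, because a single uncertain mention outranks any number of negative ones.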
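The uncertain labels (u, stored as -1) can be handled in several ways during training:
U-ignore: Mask the uncertain labels so that they are ignored when the loss is computed. This is analogous to dropping all rows with missing values.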
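One way to picture the masking is a binary cross-entropy that skips uncertain entries. The sketch below is plain Python with a hypothetical `masked_bce` helper; it assumes labels are encoded as 1, 0, and -1 and that blanks have already been resolved. A real training loop would do the same with tensors.

```python
import math
from typing import Sequence


def masked_bce(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over entries whose label is 0 or 1.

    Entries labeled -1 (uncertain) are skipped and add nothing to the loss.
    """
    eps = 1e-7  # keeps log() away from 0
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == -1:
            continue  # masked: uncertain label
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0
```

With this helper, `masked_bce([0.9, 0.2, 0.5], [1, 0, -1])` gives the same value as `masked_bce([0.9, 0.2], [1, 0])`, since the third entry is masked out.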
Binary Mapping: Map all uncertain labels to 0 (U-Zeros) or all to 1 (U-Ones)
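As a small sketch, the mapping is a one-line substitution. The `map_uncertain` name and the -1 encoding of uncertain labels are assumptions made for this example.

```python
from typing import List


def map_uncertain(labels: List[int], fill: int) -> List[int]:
    """Replace every uncertain label (-1) with `fill`, which must be 0 or 1."""
    if fill not in (0, 1):
        raise ValueError("fill must be 0 or 1")
    return [fill if y == -1 else y for y in labels]
```

Calling `map_uncertain(labels, 0)` treats every uncertain label as negative, and `map_uncertain(labels, 1)` treats every one as positive.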
U-self-trained: Train a model with the U-ignore scheme, then use it to predict a label for each uncertain value. Either the predicted class or the logit prediction can be used as the new label.
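The relabeling step might look like the sketch below. Here `predict_proba` is a hypothetical stand-in for the model trained with U-ignore, and the soft target is taken to be the predicted probability (the sigmoid of the logit), which is one reading of "logit prediction".

```python
from typing import Any, Callable, List, Sequence


def relabel_uncertain(
    features: Sequence[Any],
    labels: Sequence[int],
    predict_proba: Callable[[Any], float],
    hard: bool = False,
) -> List[float]:
    """Replace each uncertain label (-1) with the model's own prediction.

    hard=False keeps the predicted probability as a soft target;
    hard=True thresholds it at 0.5 to get a class.
    """
    relabeled = []
    for x, y in zip(features, labels):
        if y == -1:
            p = predict_proba(x)
            relabeled.append(float(p >= 0.5) if hard else float(p))
        else:
            relabeled.append(float(y))  # known labels are left untouched
    return relabeled
```

A second model can then be trained on the relabeled targets.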
3-class classification: Train the model to predict 1, 0, or u. At prediction time, discard the u output and apply a softmax over the 1 and 0 outputs only. The more likely of the two can then be taken as the class.
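The prediction step can be sketched as a softmax restricted to two of the three logits. The function name and the argument order are assumptions for illustration.

```python
import math


def positive_probability(logit_neg: float, logit_pos: float, logit_unc: float) -> float:
    """Probability of the positive class after dropping the uncertain output.

    logit_unc is accepted only to show that it is ignored.
    """
    m = max(logit_neg, logit_pos)  # subtract the max for numerical stability
    e_neg = math.exp(logit_neg - m)
    e_pos = math.exp(logit_pos - m)
    return e_pos / (e_pos + e_neg)
```

A result above 0.5 means the positive class is the more likely of the two, however large the uncertain logit was.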