Predicting Suicide, part 1
This posting is about suicide. If you, or someone you know, is having suicidal thoughts,
please call the National Suicide Prevention Lifeline at (800) 273-8255.
“I’m sorry for the demon I’ve become”
Recently, the press has been awash with articles on how AI can predict … well … everything. This is unlikely to be true. Or, to be more precise, unlikely to be useful in critical domains. Like suicide.
Quartz surfaced a study from April 2017 that attempts to predict risk of suicide attempts using ML.
The paper is rich in ML detail, but the authors committed one of the cardinal ML sins: using a leaky variable.
At the highest level, they compared three groups: people who attempted suicide, people whose self-harm was not judged to be a suicide attempt, and a random cohort of hospital patients.
Then they used a set of metrics to characterize each person over time leading up to the event, building different models for 7, 14, 30, 60, 90, 180, 365, and 720 days before the incident.
Building models with this nested approach is error-prone: the 7 day window, for example, is a subset of the 14 day window for the same person, so the models are not independent. Ignoring this “repeated measures” design makes results look more significant than they actually are.
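To make the nesting concrete, here is a toy sketch (my own illustration, with invented data, not the paper’s) showing that the records feeding a 7 day model are a strict subset of the records feeding the 14 day model for the same person:

```python
# Toy illustration of nested observation windows (invented data, not the study's).
import pandas as pd

# One patient's event history, indexed by days before the suicide attempt.
events = pd.DataFrame({
    "days_before_event": [3, 5, 12, 45, 200],
    "code": ["ER visit", "poisoning", "depression dx", "ER visit", "rx change"],
})

window_7 = events[events["days_before_event"] <= 7]    # feeds the 7-day model
window_14 = events[events["days_before_event"] <= 14]  # feeds the 14-day model

# Every record in the 7-day window also appears in the 14-day window,
# so the two models are trained on overlapping, non-independent data.
print(set(window_7.index) <= set(window_14.index))  # True
```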
They report AUC to assess the quality of their predictive models, and also give both Precision and Recall metrics. AUC measures how well a model ranks positive cases above negative ones across all thresholds. Precision measures how many of your “true” predictions were correct, while Recall measures the percentage of actual “trues” that you predicted as “true”.
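For concreteness, here is what those three numbers mean in code -- a minimal scikit-learn sketch with made-up labels and scores, not anything from the paper:

```python
# Toy labels and scores; 1 = attempted suicide, 0 = did not.
from sklearn.metrics import roc_auc_score, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_score = [0.1, 0.6, 0.35, 0.8, 0.4, 0.9, 0.2, 0.7]       # model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]           # thresholded predictions

print(roc_auc_score(y_true, y_score))    # how well the model ranks positives above negatives
print(precision_score(y_true, y_pred))   # fraction of predicted "true" that really were true
print(recall_score(y_true, y_pred))      # fraction of actual "true" that the model caught
```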
Their AUCs are really high. 7 days out, their AUC is .84, with a precision of .79 and a recall of .95. These numbers are far higher than you’d expect in a social science study -- people are really complex, and finding models that predict that well is unusual. Even weirder, 720 days out, the AUC is .8, with a precision of .74 and a recall of .95. These are virtually the same as at 7 days out; for that to be true, people’s circumstances would have to be essentially unchanged over two years. This seems unlikely.
These kinds of results are usually due to “leaky variables”. Leaky variables are input signals that include some measure that is, in fact, the output. With a leaky variable, your inputs already contain your output measure, so your model looks really good without actually predicting anything.
When I looked at their signal importance table, the first thing that jumped out was that “self-inflicted poisoning” was a powerful signal in all their models. Given that self-inflicted poisoning is likely a suicide attempt, having it in the input signals means that you are directly including the outcome measure. This is a classic leaky variable.
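Here is a toy demonstration of how strong that effect is -- entirely synthetic data, not the study’s, with one column constructed to be a near-copy of the outcome the way a “self-inflicted poisoning” flag would be:

```python
# Synthetic demonstration of a leaky variable inflating AUC (invented data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                      # outcome: attempt vs. no attempt
noise = rng.normal(size=(n, 5))                # genuinely uninformative signals
leak = (y + rng.normal(0, 0.1, n))[:, None]    # a column that is basically the outcome

for name, X in [("with leaky column", np.hstack([noise, leak])),
                ("noise only", noise)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    proba = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_te, proba):.2f}")  # ~1.0 vs ~0.5
```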
There are other issues, but leakage in the models will swamp any real findings, so it’s not worth trying to fix the others. A note to all data scientists: Check your models for leaky variables. This is easy to do in simple models, and xgboost-based models (like this one) report feature importances that make the check straightforward. For other types of mathematical models, additional software is required.
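For xgboost-style models, the check can be as simple as printing the feature importances and looking for anything suspiciously outcome-like at the top of the list. A sketch, again on invented data with illustrative feature names:

```python
# Sketch of the leaky-variable check for an xgboost model (invented data and names).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
X = np.column_stack([
    rng.normal(size=1000),            # "age" -- uninformative here
    rng.normal(size=1000),            # "prior_er_visits" -- uninformative here
    y + rng.normal(0, 0.1, 1000),     # "self_inflicted_poisoning" -- near-copy of the outcome
])
names = ["age", "prior_er_visits", "self_inflicted_poisoning"]

model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)
for name, score in sorted(zip(names, model.feature_importances_),
                          key=lambda t: t[1], reverse=True):
    print(f"{name:28s} {score:.3f}")
# The near-copy of the outcome dominates the list: a leaky variable, not a great model.
```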