Privacy Algorithm: have you cheated?

A short blog post about a cute algorithm we came across whilst reading up on Bayesian methods, a theme we may develop here as we build upon our machine learning skills.

We want to know the level of cheating in a population. I think it's safe to say that, regardless of any assurances made by our intrepid interviewer, fewer cheats are going to 'fess up than really exist.

One approach to getting at the truth might be to estimate this admission rate, but this - I think you'll agree - is fraught with complication and variation.

Protect, don't assure
What about protecting anonymity (and sparing embarrassment) instead of assuring it? The basic idea is to dilute the truth-telling responses (our signal) with a whole load of noise (randomly generated responses). That way the signaller is kept safe.

I don't know who came up with this particular algorithm - and can claim no credit - but can at least point you to Cam Davidson-Pilon. Cam references it in his excellent online book Probabilistic Programming and Bayesian Methods for Hackers.

The humble pound coin
Without letting the interviewer see the results, flip a coin:
    Heads: tell the truth (whatever it might be!)
    Go away happy and safe in the knowledge of anonymity (see below).

    Tails: flip again
        Heads: say you're a cheater;
        Tails: say you're not.
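
The two-flip rule above can be sketched in a few lines of Python. This is only an illustration, not code from Cam's book: the function names `randomized_response` and `run_survey`, the fair-coin assumption and the 10% true cheat rate are all made up for the example. Schemes of this kind are commonly known as randomized response.

```python
import random

def randomized_response(is_cheater, rng=random):
    """Return the answer one respondent gives under the two-flip rule.

    True means "I'm a cheater", False means "I'm not".
    Assumes a fair coin; rng is anything with a random() method.
    """
    first_flip_heads = rng.random() < 0.5
    if first_flip_heads:
        return is_cheater            # heads: tell the truth
    second_flip_heads = rng.random() < 0.5
    return second_flip_heads         # tails: the second flip decides the answer

def run_survey(true_cheat_rate, n, seed=0):
    """Simulate n respondents and count the "I'm a cheater" answers."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        is_cheater = rng.random() < true_cheat_rate  # hypothetical ground truth
        if randomized_response(is_cheater, rng):
            yes_count += 1
    return yes_count

if __name__ == "__main__":
    n = 100_000
    yes = run_survey(true_cheat_rate=0.10, n=n)
    print(f"{yes} of {n} respondents said they cheated")
```

With a true rate of 10%, roughly 30% of answers should come back as "cheater": half of the 10% who tell the truth, plus the quarter of everyone who flips tails then heads.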

Long story short, if we have 100 responses we expect them, on average, to be made up of two distinct populations: the one we care about, a 'true population' of 50 responses (with a cheat/no-cheat mix); and mixed in with it a second, false population of 50 made up of 25 cheats / 25 no-cheats. Simply strike that false population out. So, for example, if the survey says 30 cheats / 100 population, we end up with a more considered view: 5 cheats in 50, or about 10%.
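
The "strike out the false population" arithmetic can be written as a tiny function. Again a sketch under assumptions: `estimate_cheat_rate` is a name invented here, and it relies on the coin being fair, so that half the answers are noise and half of that noise says "cheater".

```python
def estimate_cheat_rate(yes_count, n):
    """Estimate the true cheat rate from randomized-response answers.

    Assumes a fair coin: on average n/2 answers are coin-generated,
    and n/4 of all answers are coin-generated "cheater" answers.
    """
    expected_noise_total = n / 2            # the false population
    expected_noise_yes = n / 4              # false "cheater" answers
    truthful_total = n - expected_noise_total
    truthful_yes = yes_count - expected_noise_yes
    return truthful_yes / truthful_total

print(estimate_cheat_rate(30, 100))  # 0.1, i.e. 5 cheats in 50
```

One caveat: the coin only splits 50/25/25 on average, so in a small survey the estimate can land below zero (for instance 20 "cheater" answers out of 100 gives -0.1). The more responses, the less this matters.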

Simples. Our cheating friends can answer in full knowledge that the interviewer has no idea whether a response was truthful or was generated by a tails-then-heads coin toss. Even better, the truth-teller knows that the interviewer will likely be forgiving and presume that the response was coin-toss generated.
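
To put a rough number on how forgiving the interviewer can afford to be, we can ask what share of "cheater" answers came from the second coin toss rather than from an honest admission. The function name `share_of_yes_from_coin` is hypothetical, the coin is assumed fair, and the 10% rate is just the figure from the 30-in-100 example.

```python
def share_of_yes_from_coin(true_cheat_rate):
    """Fraction of "cheater" answers produced by tails-then-heads.

    Assumes a fair coin for both flips.
    """
    p_yes_truthful = 0.5 * true_cheat_rate  # first flip heads, respondent really cheated
    p_yes_coin = 0.5 * 0.5                  # first flip tails, second flip heads
    return p_yes_coin / (p_yes_truthful + p_yes_coin)

if __name__ == "__main__":
    share = share_of_yes_from_coin(0.10)
    print(f"Share of 'cheater' answers that are coin-generated: {share:.2f}")
```

In the example, 25 of the 30 "cheater" answers are expected to be coin-generated, so presuming the coin did it is the sensible default. The higher the true cheat rate, the weaker that cover becomes.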

The best thing about this game? Everyone knows the rules and no cheating's required!

Bayesian Inference
We can actually take the above idea and develop it much further with Bayesian analysis, but that is for another day.

Posted in fintech.