How much do humans herd?

We can measure this. Choosing Ice-Creams: can we measure a lack of uniformity? There are 16 ice-cream flavours at the parlour. If we witness the first 16 customers all choosing vanilla, predicting the choice of customer 17 is pretty straightforward; not so if those first 16 customers each choose a different flavour. This, in (vanilla) essence, is the problem […]
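One common way to put a number on that lack of uniformity is Shannon entropy. The excerpt does not say which measure the post itself uses, so treat this as a minimal sketch under that assumption; the function name `choice_entropy` and the flavour labels are ours, purely for illustration.

```python
from collections import Counter
from math import log2


def choice_entropy(choices):
    """Shannon entropy, in bits, of a list of observed choices.

    0 bits means everyone chose the same thing (full herding);
    log2(number of options) bits means the choices are spread evenly.
    """
    counts = Counter(choices)
    total = len(choices)
    # p * log2(1/p), written as (n/total) * log2(total/n) for each flavour
    return sum((n / total) * log2(total / n) for n in counts.values())


# The two extremes from the ice-cream parlour (hypothetical flavour labels)
all_vanilla = ["vanilla"] * 16
all_different = [f"flavour_{i}" for i in range(16)]

herded = choice_entropy(all_vanilla)    # 0.0 bits: customer 17 is easy to predict
spread = choice_entropy(all_different)  # 4.0 bits: log2(16), no help at all
```

Real queues will land somewhere between those two values, and the closer to zero, the more herding there is.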

Another day, another hack.

Here’s a quick reminder for anyone who thinks it won’t happen to them: there are just over 3bn records below (and only 7bn or so of us on this planet). HCK = Hacked (2,200 mm records); LKD = Leaked (60 mm); INS = Inside Job (350 mm); PUB = Accidentally Published (72 mm); SCR = Poor […]

What is your password? A sneak preview.

So what do passwords look like? The short (overly crude) answer: they all look the same. RockYou: The 2009 RockYou password database is perhaps the most famous, or rather infamous, of breaches. It was substantial (32 mm records), but what made it most distinctive, and extensively studied, was that the passwords were stored in cleartext (or plaintext). Let’s […]

Close, but no cigar

Barclays & Digital Catapult National Business Challenge Event: At very short notice – and that was our fault entirely – we found out about the Barclays and Digital Catapult Challenge. As described by them, the challenge was designed to allow Barclays to reach out to innovative tech companies and help design and build solutions. The link […]

What does personal data look like?

When thinking about personal data it can be useful to look to its physical storage on the database to build a good mental map. Give or take, for every account that you have, the online provider will hold a database record that looks something like the one represented below. Companies used to have usernames, but these were […]

Nobody likes passwords…

…so why do they exist? For each of your online accounts there is a record in a database (just like a row in Excel) at the company that reads like this:

Database Row | Email | passwordHash | Account Id
100 | mary@email.com | 7a2ccf251ecb20b2b84ce0e3c3f72a29 | #1000

No self-respecting company ever actually knows or stores your password. It hides that password – […]
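To make the idea of storing a hash rather than the password concrete, here is a minimal sketch. The excerpt does not say which hashing scheme sits behind the example row, so the salted PBKDF2 used here, the iteration count, and the names `hash_password` and `verify_password` are all our assumptions for illustration, not the method the post describes.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the login attempt and compare it with the stored digest."""
    _, candidate = hash_password(password, salt)
    # Constant-time comparison, so timing does not leak how close a guess was
    return hmac.compare_digest(candidate, stored_digest)


# A hypothetical record, shaped like the database row in the excerpt
salt, stored = hash_password("correct horse")
record = {
    "row": 100,
    "email": "mary@email.com",
    "salt": salt,
    "passwordHash": stored,
    "accountId": "#1000",
}
```

At login the company only ever checks `verify_password(attempt, record["salt"], record["passwordHash"])`; the password itself is never written to the row.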

Know thy Neighbour

Sometimes it can be hard to visualise what text analytics can really mean – a lot of the time our brains seem to stop at keyword counting. Here’s one way of taking things a little further. Evolving Relationships: A client had an interest in determining if relationships existed between various corporate entities. We can easily […]

Financial Disclosure: scanning for risk

We’re obviously very interested in financial disclosure. I say ‘obviously’ well aware that our purist technical trader friends are interested in nothing but price; however, let us assume that the disclosure information has a function, and leave ideas about predictive capabilities for another time. Roughly stated, US companies accessing the capital markets need to file regularly with […]

Privacy Algorithm: have you cheated?

A short blog about a cute algorithm we came across whilst reading on Bayesian Methods, a theme we may develop here as we build upon our machine learning skills. We want to know the level of cheating in the population. I think it’s safe to say that fewer people admit to cheating than do so in reality – regardless of any […]

Glorious Ambiguity: context is all

English of course not only has ambiguity, but is all the richer for it. I’m sure there’s good evolutionary social science making the case for ambiguity being absolutely essential for the success of a language, but we all instinctively know why this is so. We all value occasionally not quite meaning what we say. Latent […]