I was reading some interesting articles in IEEE Spectrum Magazine before going to sleep, and I came across one that serves as an excuse to talk about something I have been meaning to post about for a while.

This interview transcript describes how companies (Facebook, in this example) are able to obtain data about you without ever asking you for it:

We all have things we don’t want to put on Facebook, and for some, the loss of privacy is so large that they stay off the social network entirely. But it turns out that, to quote heavyweight champion Joe Louis, “You can run, but you can’t hide.”

To quote a research paper published last month on PLoS One, “With the help of machine learning, social network operators can make predictions regarding the acquaintance or lack thereof between two nonmembers with a high rate of success.”

It’s been known for a while that Facebook makes a shadow profile of people it learns about who aren’t on Facebook. What the researchers here found was that they could predict, with a surprising degree of accuracy, whether two such nonmembers were acquainted with each other.

As the article describes, machine learning and data mining can extract information from data that never explicitly contained it, to the point that Facebook can learn a lot about people who do not even have a Facebook account. Some of my projects at work have involved heavy doses of data mining, and I have had a crash course in it over the last few months, which has sparked my interest in this area.
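To make the idea a bit more concrete, here is a toy sketch of one simple way acquaintance between two nonmembers could be guessed. This is not the method used in the PLoS One paper; it is the classic "common neighbors" heuristic from link prediction, and it assumes (hypothetically) that members upload address books that mention people who are not on the network. All names and data below are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each member's uploaded address book,
# restricted to contacts who are NOT members of the network.
address_books = {
    "alice": {"xavier", "yolanda", "zach"},
    "bob": {"xavier", "yolanda"},
    "carol": {"yolanda", "zach"},
    "dave": {"xavier", "yolanda"},
}


def common_neighbor_scores(books):
    """For every pair of nonmembers, count how many members have both in their contacts."""
    scores = Counter()
    for contacts in books.values():
        # Sort so each pair is always stored in the same (alphabetical) order.
        for pair in combinations(sorted(contacts), 2):
            scores[pair] += 1
    return scores


def predict_acquainted(books, threshold=2):
    """Guess that two nonmembers know each other if enough members know both of them."""
    scores = common_neighbor_scores(books)
    return {pair for pair, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    for pair, score in common_neighbor_scores(address_books).most_common():
        print(pair, score)
    print(predict_acquainted(address_books))
```

Even this naive counting suggests that Xavier and Yolanda probably know each other, although neither of them ever joined the network. Real systems would use many more signals and a trained classifier, and the threshold here is an arbitrary assumption.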

There are great teams of researchers at AT&T working on some of the most interesting applications and ideas based on data mining and machine learning, and I am lucky to have some of them as colleagues at the AT&T Security Research Center. Based on their work, and on that of other researchers at AT&T Labs, I can share a couple of examples of how much information one can extract from other sources of data.