Pete Warden has managed to gather a lot of data from public Facebook profiles, and the news has been making the rounds of the blogosphere, especially since he is generously going to make the data available to researchers.
Sarah and I had a lengthy discussion about his data collection methods because we were trying to figure out just how much information he could have collected from public FB profiles. We went through the profiles of people we didn’t know just to see what could be found on those pages. Very often, it wasn’t a lot: the person’s name, a photo, pages (I’m guessing those are fan pages) and friends. Not even the city where they were living was listed. Of course, this method has its limits: we looked at only a handful of profiles and made our way through friends’ lists in order to move from person to person, so it wasn’t a truly random sample. A different sample might have given us completely different results.
On the other hand, poking around in FB’s privacy settings, we came across these paragraphs:
When your friend visits a Facebook-enhanced application or website, they may want to share certain information to make the experience more social. For example, a greeting card application may use your birthday information to prompt your friend to send a card.
If your friend uses an application that you do not use, you can control what types of information the application can access. Please note that applications will always be able to access your publicly available information (Name, Profile Picture, Gender, Current City, Networks, Friend List, and Pages) and information that is visible to Everyone.
In case you’re wondering, you can find this under Privacy Settings > Applications and Websites.
So maybe Pete was able to collect his data that way, which would explain how he could establish maps of relationships in the US.
The reason I’m writing about this, though, is more than just a question of methodology. It’s how people are taking his results and blatantly applying them to the population as a whole. That is just wrong.
The data that Pete collected may be a good source of information about people who use Facebook, but it doesn’t follow that it can be applied to the population in general. danah boyd has already shown that there is a difference in the type of people who use FB and those who use MySpace. Does that mean that one is more representative of the general citizenry than the other? Nope. Neither is. They are each sub-cultures of the digital citizenry, and the digital citizenry is a sub-culture of the population in general. I know it’s hard to believe, but not everybody is on Facebook.
That map of his may be an interesting look at how FB users are interconnected, but that doesn’t mean the same pattern holds for people in the US in general.
Now I’m going to get all scientific-y and note that it’s quite possible that the data collected could turn out to be a good approximation of how the general population behaves, but we can’t assume that it is without checking it against the general population.
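The caution above can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the population, the `trait` value, and the membership probabilities are all invented for illustration and have nothing to do with Pete’s data or with real Facebook statistics. It simply shows that when joining a site is correlated with the thing you are measuring, the members’ average drifts away from the population’s average.

```python
import random


def make_population(n=100_000, seed=0):
    """Build a synthetic population of (trait, is_member) pairs.

    Everything here is hypothetical: 'trait' stands in for any measurable
    characteristic, and the membership odds are invented for illustration.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        trait = rng.randint(15, 80)
        # Assumption: people with a low trait value join far more often.
        p_member = 0.8 if trait < 35 else 0.3
        people.append((trait, rng.random() < p_member))
    return people


def mean_trait(people):
    """Average trait value over a list of (trait, is_member) pairs."""
    return sum(trait for trait, _ in people) / len(people)


population = make_population()
members = [person for person in population if person[1]]

print(f"whole population: {mean_trait(population):.1f}")
print(f"site members only: {mean_trait(members):.1f}")
```

With these invented numbers the members-only average comes out noticeably lower than the population average, even though the members sample is huge. A big sample doesn’t fix a skewed one.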
So in conclusion: Pete Warden’s data? Great for people who want to do research on Facebook users. Not so great for people who want to understand society in general.