Personal data, from private to public
July 2019 (perspective of an assistant professor)
I list the spectrum of personal data from private to semi-public to fully public, to get us thinking about what it really means to share our words and images with others.
For a longgg time I've been thinking about the spectrum of private to public as it relates to personal data, probably ever since 1997 when I posted my middle-school ramblings to my first website. I doubt anyone actually read what I wrote back then, even though it was technically public.
Fast forward 22 years. I just read Hidden cities by Nadia Eghbal, along with her June 2019 newsletter, which re-sparked this long-standing thought in my mind that just because something is online doesn't mean it's necessarily public. I also recently wrote Communicating, Fast and Slow, which classifies modern communication on a spectrum from fast to slow. That gave me the idea to try to classify personal data on a spectrum from private to public. Here goes!
The first category includes data that we expect to be private:
Even though you expect all of this data to be private, anything in the cloud is less private since others can more easily access it.
The next category includes data that we intentionally communicate to an audience but expect to be kept somewhat private. Note how this is starting to relate to Communicating, Fast and Slow. Let's go from the least to the most public here:
This final category includes data that we intentionally make public, but in practice not everything is equally public (see related links in Appendix). Again from the least to the most public ...
Note that just because you post something publicly doesn't mean that you're necessarily expecting a ton of people to view and react to it; people can get caught off-guard when content from their niche personal website gets featured on, say, major news outlets.
Another note: the longer your content is (whether it's writing, audio, or video), the less public it feels since it takes more effort for someone to sift through it to find something to react to. For instance, a short Twitter or Facebook post feels very public since everyone can instantly see and knee-jerk react to it within seconds. But a longform article, podcast, or book requires a lot more dedicated attention before someone spreads or reacts to it.
Also, the more well-known you are, the more of a multiplier there is on the “publicness” of anything you post. (Think of the difference between a random unknown person posting something and the CEO of a major company posting that exact same thing.)
1. What if someone secretly records you and posts publicly without your permission? Or gossips with ill intent? This simple one-dimensional spectrum doesn't take deception into account.
2. Encrypted data is more private than unencrypted.
3. Using pseudonyms, throwaway accounts, or posting anonymously can make these more private.
4. I've personally found audio and video formats to be much more expressive than writing because they feel less public. Since audio/video requires more dedication from my audience to consume, it's less likely that my words will get amplified out of context. For details, jump to 7:57 of my VidCon 2019 recap video. (In contrast, it's all too easy for someone to screenshot or copy/paste an excerpt of a written article and re-post it out of context or with ill-intentioned commentary.)
5. Often these people aren't purposely looking to incite outrage for personal gain. They're just unaware that their views are unpopular with certain audiences that they're not even targeting. But it's so easy for anyone to pick up on those words and re-share them with communities that the post wasn't originally meant to reach.
Hopefully this spectrum has made you think more deeply about what it means for our personal data to be private, public, or somewhere in between. Modern life isn't as simple as “Well, if you don't want the entire world to see something, then don't post it online!” That's almost like telling people not to ever go out in public. Many of us communicate online publicly or semi-publicly since that's the most convenient way to reach our audience, which might be only dozens or hundreds of like-minded people. And some of us write online to think out loud or to keep ourselves accountable/motivated, even if (almost) nobody is reading. But we're not prepared for the context collapse and unintended attention when our words spread beyond our intended bounds.
Another important facet I didn't have room to fit into this one-dimensional spectrum is time. What if you excitedly posted something super-publicly ten years ago, but now you wish it weren't online anymore since your viewpoints or identity might have changed during those years? Sometimes you can't even take your old content offline, since you don't control the website where it's posted (e.g., a discussion forum or online publication). And even if you could take it offline (e.g., from your own website), it's probably archived in the Internet Archive, so someone will always be able to dig it up and re-post it with their own commentary.
Finally, what this article also didn't cover is that other people, such as our parents, friends, classmates, teachers, or coworkers, often post information about us. They often use our names, words, and images in their own posts, which we can't fully control. Like it or not, our data is always going to be less private than we expect, so we all need to learn to cope with this reality.
Last modified: 2019-07-15