As I did with Instagram and Twitter, I’ve spent the last couple of days importing all of my posts from Facebook to this blog. Similar to those projects, I requested my archive in JSON format and then used an Apple Shortcut to parse it and upload via the WordPress API. The Shortcut is very similar to the one I used for Instagram, but with a few more edge cases and IF statements since FB allows for more post types than just images. (I’m not going to bother sharing it. If you were clever enough to follow the other two Shortcuts, you can figure it out.)
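For anyone attempting the same thing outside of Shortcuts, here is a minimal Python sketch of the "parse, then upload" step. It is not the Shortcut itself. The item shape (a `timestamp` in Unix seconds plus a `data` list holding a `post` string) is an assumption about the archive, and `to_wordpress_payload` is a hypothetical helper name. The output fields (`status`, `date_gmt`, `content`) are ones the WordPress REST API accepts when creating a post.

```python
from datetime import datetime, timezone


def to_wordpress_payload(item):
    """Build a request body for POST /wp-json/wp/v2/posts from one archive item."""
    # ASSUMPTION: each item looks like
    # {"timestamp": <unix seconds>, "data": [{"post": "<status text>"}]}
    text = ""
    for entry in item.get("data", []):
        if "post" in entry:
            text = entry["post"]
            break

    # Convert the Unix timestamp to the ISO 8601 form WordPress expects.
    when = datetime.fromtimestamp(item["timestamp"], tz=timezone.utc)

    return {
        "status": "publish",
        "date_gmt": when.strftime("%Y-%m-%dT%H:%M:%S"),
        "content": text,
    }
```

The sketch only builds the payload. Sending it, authenticating, and uploading any attached images are left out.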
My earliest post was from 2007, and altogether I had 4,083 days’ worth of posts to import. I only synced images to WordPress; I haven’t touched any videos yet (but I never really uploaded many of those to FB). It took me just short of 29 hours spaced out over the course of a week, not counting the time I spent manually reviewing and cleaning things up. (I deliberately slow down the API requests to avoid DDoSing my own site.)
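Slowing down the requests can be as simple as pausing between items. A rough Python sketch of that idea follows. The `paced` helper and the delay value are hypothetical, since the actual pause used in the Shortcut isn't stated here.

```python
import time


def paced(items, delay_seconds, sleep=time.sleep):
    """Yield items one at a time, pausing between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, only between items.
            sleep(delay_seconds)
        yield item
```

A loop such as `for post in paced(posts, 5): ...` would then send at most one request every five seconds. The `sleep` parameter exists only so the pause can be swapped out when testing.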
And it bears repeating: Facebook’s data archive sucks. A brief list of problems I encountered:
- Blank status updates. This happened a lot more in the older data.
- Missing data when I “shared a Page/post/photo/link/video/event” from Facebook itself. This happened a lot more in the older data.
- Missing data when posting from other sites/apps, like Eventbrite, Foursquare, Tweetdeck, Spotify, Meetup, Runkeeper, etc. This happened a lot more in the older data.
- Duplicated content – there would be a “Kris Howard shared a link” item with a URL, and then a matching status update where I actually shared the URL. This happened a lot more with the data in recent years.
- URLs that I’m fairly certain I shared in comments on posts, but that were included as top-level items with zero context. This happened exclusively with data from the past couple of years.
- Inconsistent links to FB users – most of the time when I tagged someone, their name would appear like this in the data: “Hey @[1108218380:2048:Rodd Snook]”. But then in recent years, that format disappeared.
- Dead links – not Facebook’s fault, but there are so, so many.
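A couple of these are easy to handle in code. The sketch below, in Python rather than Shortcuts, rewrites the older `@[id:number:Name]` tag format to just the name and flags blank status updates. The helper names are hypothetical, and the meaning of the middle number in the tag is unknown, so it is simply discarded.

```python
import re

# Matches the older tag format, e.g. "@[1108218380:2048:Rodd Snook]".
# Groups: (user id, unknown number, display name).
MENTION = re.compile(r"@\[(\d+):(\d+):([^\]]+)\]")


def plain_mentions(text):
    """Replace each tagged mention with the display name only."""
    return MENTION.sub(lambda match: match.group(3), text)


def is_blank(text):
    """True for missing, empty, or whitespace-only status updates."""
    return text is None or text.strip() == ""
```

For example, `plain_mentions("Hey @[1108218380:2048:Rodd Snook]")` gives `"Hey Rodd Snook"`. Text without the old tag format passes through unchanged.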
As soon as these errors started cropping up, I had to make the call whether to stop and adjust my Shortcut to handle them, or to clean them up manually. In most cases, I decided that I’d just manually review and fix. After importing every couple of months’ worth of posts, I’d pause and page through them on the site to see if any looked weird. I’d then manually edit and tidy up any issues.
There were other oddities I noticed in the data that aren’t really errors. For example, my earliest status updates are all sentence fragments that start with a verb. This is because back in the aughts Facebook had an explicit “What are you doing right now?” prompt. Kinda funny.
The archive also included posts that I made on other people’s profiles, mostly just “Happy birthday” wishes. The data does include the name of the person I was writing to, but I couldn’t be arsed creating a special case in my Shortcut to handle that. I ended up deleting most of those and just keeping the ones that amused me or that were written to a family member.
The archive didn’t include posts that I made in Groups. That may have been an option when I downloaded my archive, but I decided it wasn’t worth the effort. I’ve never been a big Group user. It also doesn’t include the comments on any of my posts. Again, that may have been an option, but I figure discussions should be ephemeral. I’m okay with not having those.
Ultimately you could argue that this import had minimal value. Most of the content is actually already on this blog, either posted natively or included already in the Twitter or Instagram imports. But there are occasional gems in there that I didn’t post anywhere else, and I’m happy I preserved those. I don’t expect anyone to ever read them, but it’s an important part of my personal data archive and I’m glad I have it.
And now I just need to finish deleting all the content over on FB…