geoffrey gauchet

How I Leverage ifttt and tumblr for My New Blog

I recently relaunched this site and went with something new and a little unconventional. The old site was completely custom-built, with a really simple back end: a giant plain-text area and a field for a title. That was it. I used my own bastardization of Markdown to create HTML in my posts. It worked, but it was goofy and not very efficient.

I soon realized that the majority of my content creation on the web wasn’t done through blogging; it was done through social media, like Twitter and Facebook. This makes sense, especially with mobile apps being so accessible. I can send a short message or a photo from practically anywhere. It’s far easier than downloading the photo from my phone or camera to my computer, uploading it to my web server, then manually typing the “Markdown” image display code.

So, with this realization, I decided my personal site should be an aggregation of my entire web presence. I had thought of a series of boxes, one for each network, but that was too difficult to follow. So, taking a cue from one of my apps, incredible!, I decided to make it a consolidated stream in chronological order. Now, the big issue: how do I grab content from Facebook, Twitter, Instagram, YouTube, Last.fm, foursquare (photo check-ins) AND normal blog posts, and order it all properly?

The way incredible! does it is by using each network’s own API. This works, but it requires handling a different authentication method for each site: OAuth, OAuth 2, basic HTTP, etc. That could be insanely difficult to manage and keep track of.

I like Tumblr a lot. It’s a nice and easy system for creating a variety of posts for a blog. I use it for my web/app design blog. I wanted to leverage this system for my new site, especially with the added bonus of there being a mobile app.

So here’s how I merged 5 social networks and Tumblr blog posts into one cohesive stream.

Since Tumblr uses a variety of post types, I leveraged their system for the initial storage of all my posts. I then used my favorite service, IFTTT (If This, Then That), to create some recipes to move data from my social networks into Tumblr. IFTTT allows you to authorize multiple websites, and then create tasks based on your activity on those sites. For instance, every time you post a link to Facebook, it can post that link on Tumblr. Very simple stuff, with a slick interface.

So, I created some recipes on IFTTT to pipe my posts from other sites into Tumblr. This automatically gave me a chronological list of everything I post from all these accounts, usually no more than 15 minutes behind when I posted it. Each IFTTT task tags the Tumblr post with the name of the network it’s pulling from (“twitter”, “youtube”, etc.) so I can track that and act on it.

But therein was a problem: I was getting duplicates. Not because of IFTTT, but because I sometimes post the same thing to Facebook and Twitter. IFTTT doesn’t handle that, and neither does Tumblr. So I couldn’t just use a straight-up Tumblr site on my domain.

So I wrote a PHP script that connects to Tumblr and pulls every post added to my Tumblr blog since the last pull. It runs as a cron job every 15 minutes.
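
Here is a minimal sketch of that pull step, written in Python rather than the PHP the real script uses. It assumes Tumblr's v2 API posts endpoint and that each post carries a Unix `timestamp` field; the blog name, API key, and function names are placeholders for illustration, not the original code.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.tumblr.com/v2/blog"  # Tumblr API v2 (an assumption here)


def posts_url(blog, api_key, limit=20):
    """Build the URL for a blog's most recent posts."""
    query = urllib.parse.urlencode({"api_key": api_key, "limit": limit})
    return f"{API_ROOT}/{blog}/posts?{query}"


def newer_than(posts, last_pull):
    """Keep only posts published after the previous pull, oldest first."""
    fresh = [p for p in posts if p["timestamp"] > last_pull]
    return sorted(fresh, key=lambda p: p["timestamp"])


def pull_new_posts(blog, api_key, last_pull):
    """Fetch recent posts over HTTP and drop the ones already seen."""
    with urllib.request.urlopen(posts_url(blog, api_key)) as resp:
        payload = json.load(resp)
    return newer_than(payload["response"]["posts"], last_pull)
```

A cron job would call something like `pull_new_posts` with the timestamp of the newest post already stored, then save whatever comes back.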

When it pulls the posts from Tumblr, it normalizes the data and slaps it into my own MySQL database, duplicates and all.
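
The normalize-and-store step might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the table layout, column names, and the idea that the first tag is the network name set by the IFTTT recipe are all assumptions for illustration.

```python
import sqlite3


def normalize(post):
    """Flatten a Tumblr-style post dict into one common row shape."""
    # Assumption: the first tag is the source network added by the IFTTT recipe.
    tags = post.get("tags") or []
    source = tags[0] if tags else "tumblr"
    # Different post types keep their text in different fields.
    text = post.get("body") or post.get("caption") or post.get("description") or ""
    return (post["id"], source, post.get("type", "text"), text, post["timestamp"])


def store(conn, posts):
    """Insert normalized rows; re-pulling the same post id is a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, source TEXT, type TEXT, body TEXT, posted_at INTEGER)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)",
        [normalize(p) for p in posts],
    )
    conn.commit()
```

Note that only a repeated Tumblr post id is ignored here. The same content arriving from two networks has two different ids, so both rows land in the table, duplicates and all.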

Now, here’s where it gets fun, and where I used some algorithms from incredible!. When a user visits my site, it pulls all the posts from my database, which is faster than pulling from Tumblr every time. As it loops through the posts, it compares each one with the one before it to see if they are exactly the same, or similar. I test for similarity by stripping links and hashtags out of both posts. Then, I explode the content of each post into an array of its words. Then, I loop through the arrays of both posts and count the percentage of words they share. If they are at least 80% similar, I assume it’s the same post, so I skip over the new one and keep the previous one. Here’s an example:

I have an Instagram post of a picture of me and Rhea’s homebrew. It has the caption “Our first six pack of our first homebrew!” and I tagged it with the name of our “brewery” on foursquare (Raise Your Glasses Brewing).

I also sent that Instagram photo to Twitter, so I have a tweet that says “Our first six pack of our first homebrew! @ Raise Your Glasses Brewing” followed by the URL of the image.

Now, a human can tell that these are about the same thing. My algorithm determines that they are probably about the same thing too, so it hides the duplicate and the visitor only sees one of them.
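
The comparison described above can be sketched in Python like this. It is an illustration under assumptions, not the code from incredible!: the regular expressions are my own, and since the exact direction of the word count isn't spelled out, this version measures the share of the shorter post's words that also appear in the longer one.

```python
import re

LINK = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")


def words(text):
    """Strip links and hashtags, then split into lowercase words."""
    text = HASHTAG.sub(" ", LINK.sub(" ", text))
    return text.lower().split()


def similarity(a, b):
    """Share of the shorter post's words that also appear in the other."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    short, long_ = (wa, wb) if len(wa) <= len(wb) else (wb, wa)
    pool = set(long_)
    return sum(w in pool for w in short) / len(short)


def dedupe(posts, threshold=0.8):
    """Walk posts in order, skipping any that match the last one kept."""
    kept = []
    for text in posts:
        if kept and similarity(kept[-1], text) >= threshold:
            continue  # treat it as a duplicate of the post before it
        kept.append(text)
    return kept
```

With the homebrew caption and tweet from the example, every word of the caption also appears in the tweet, so this sketch scores the pair at 1.0 and keeps only the first of the two.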

I also expand all the shortened URLs from tweets to their full length URLs. This allows me to look at the URL and act on it. For instance, if I tweet a URL to an image (ending in, say, “.png”), I can change that text post to a photo post on the fly and display the image, instead of it just being some words with a “t.co” link.
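
A rough Python sketch of that URL handling follows. The extension list and function names are illustrative assumptions, and the expansion step simply asks the short link where it redirects to, which may differ from how the original script does it.

```python
import urllib.request
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")  # illustrative list


def expand_url(short_url, timeout=10):
    """Follow redirects from a shortened link and return the final URL."""
    request = urllib.request.Request(short_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def is_image_url(url):
    """True when the URL path ends in a known image extension."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


def post_type_for(url):
    """Pick the display type for a post whose content is a single link."""
    return "photo" if is_image_url(url) else "text"
```

Checking the parsed path rather than the raw string means a query string after the file name does not hide the extension.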

But what about deleting and editing? Easy!

I created a login function; once I’m logged in, viewing a post gives me three extra buttons: Delete, Refresh, and Edit. Edit just opens Tumblr directly to the post editor for that post. Delete removes the post from my own database, but it remains on Tumblr. Refresh connects to Tumblr, downloads only that specific post, and refreshes the data in my database with the new stuff from Tumblr. I could probably automate this, but in all honesty, I won’t be editing posts that often, and when I do, it’ll be one here and there.

In fact, with the 15-minute cron job that pulls from Tumblr to my database, I can usually fix a typo or a dead link in Tumblr before my site even knows that post exists.

So, it’s a little convoluted, but it works and keeps me from having to create a full back end for my site. It also grants me more exposure, as my stuff is still searchable on Tumblr.

