Storing data from the Twitter Streaming API within a Django project
For some reason I was afraid of the Twitter Streaming API, but it turns out to be super simple. If you have ever wanted to implement a Twitter search bot, or just wanted to play around with a large amount of Twitter search data or statuses, the Streaming API is the way to go.
The Streaming API takes a little getting used to, but it is ultimately easy. You are just issuing a request to Twitter that doesn’t close for a long time. For the length of the request, Twitter will continue to pass data down the pipeline.
This is a little tricky to understand at first, because it’s quite counter to how a lot of programming is done. It’s event-loop based rather than sequential. If you are used to how things work in JavaScript, that experience might come in handy.
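To make the event-style idea concrete, here is a minimal Python 3 sketch of the pattern: read a long-lived stream line by line and hand each decoded message to a callback. It assumes the stream delivers one JSON message per line with blank keep-alive lines in between. The names `dispatch`, `fake_stream` and `seen` are made up for illustration, and an in-memory `io.StringIO` stands in for the open HTTP response.

```python
import io
import json


def dispatch(stream, on_status):
    """Call on_status once for every JSON message read from stream.

    Returns the number of messages delivered.
    """
    delivered = 0
    for line in stream:      # on a real connection this waits for the next line
        line = line.strip()
        if not line:         # blank keep-alive line, nothing to decode
            continue
        on_status(json.loads(line))
        delivered += 1
    return delivered


# Stand-in for the open HTTP response: two statuses with a keep-alive between.
fake_stream = io.StringIO(
    '{"user": {"screen_name": "alice"}, "text": "first"}\r\n'
    '\r\n'
    '{"user": {"screen_name": "bob"}, "text": "second"}\r\n'
)

seen = []
count = dispatch(fake_stream, seen.append)
```

The loop never asks for "the next page" of results. It simply reacts whenever another line shows up, which is the same shape as registering a listener.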
As a matter of fact, an environment like Node.js does a really great job of handling the Streaming API. In less than 10 minutes I was able to understand how the Streaming API works because of twitter-node. With code like the following you get notified of every new status from a group of people.
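var TwitterNode = require('twitter-node').TwitterNode,
    sys = require('sys');

// Follow a fixed list of user IDs
var twit = new TwitterNode({
  user: 'username',
  password: 'password',
  follow: [8038312, 40289924, 68938254]
});

// Print each status as it arrives
twit.addListener('tweet', function(tweet) {
  sys.puts("@" + tweet.user.screen_name + ": " + tweet.text);
});

// Open the connection; no tweets arrive until stream() is called
twit.stream();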
Once you run this code and wait for a while, you will start to see tweets pop up on the screen as they come down the pipeline.
I am really interested in using Node more, but I had already created a project in Django and I wasn’t interested in writing SQL to talk to my MySQL database to store all the new data I was going to get from the Streaming API. This is a simplified version of what I did, but it’s close enough to give you a handle on what’s going on. Tweepy has support for the Streaming API, but it doesn’t have docs for it yet, so I had to dig through their code.
from tweepy.streaming import StreamListener, Stream
import simplejson
from datetime import datetime
import time
import locale
# Parses twitter dates stored in json
# from twitter-python
def parse_datetime(string):
    # Set locale for date parsing
    locale.setlocale(locale.LC_TIME, 'C')
    # We must parse datetime this way to work in python 2.4
    date = datetime(*(time.strptime(string, '%a %b %d %H:%M:%S +0000 %Y')[0:6]))
    # Reset locale back to the default setting
    locale.setlocale(locale.LC_TIME, '')
    return date
# You need to subclass the StreamListener
class MyStreamListener(StreamListener):
    """Receives raw messages from the stream."""

    def on_data(self, data):
        print("starting on data call")
        # Decode the JSON payload; this is where the tweet would be saved
        data = simplejson.loads(data)
        # Returning True keeps the stream open
        return True

    def on_timeout(self):
        print("we got a time out")

    def on_error(self, status_code):
        print("we got an error %s" % (status_code))
        # Returning False disconnects the stream
        return False
mylisten = MyStreamListener()
# Username, password, and listener, with a 30 second timeout
mystream = Stream("voidfiles", "hacker", mylisten, timeout=30)
# Pass the user IDs to follow
mystream.filter(follow=[])
With something like that code I was able to save tweets from a group of Twitter users. Because it’s Python, I was able to use Django’s ORM to store the tweets as they came in.
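The saving step is the part left out of the listing, so here is a rough Python 3 sketch of how one raw message could be turned into keyword arguments for a model. This is not the exact code I ran: the `tweet_to_fields` helper, the field names and the `Tweet` model mentioned in the closing comment are all hypothetical, and it uses the standard library `json` module rather than `simplejson`. It also skips messages that are not statuses, such as deletion notices.

```python
import json
from datetime import datetime

# Same date layout that parse_datetime handles
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def tweet_to_fields(raw):
    """Turn one raw stream message into keyword arguments for a model.

    Returns None for messages that are not statuses (e.g. deletion notices).
    """
    message = json.loads(raw)
    if "text" not in message or "user" not in message:
        return None
    return {
        "tweet_id": message["id"],
        "screen_name": message["user"]["screen_name"],
        "text": message["text"],
        "created_at": datetime.strptime(message["created_at"], TWITTER_DATE_FORMAT),
    }


# A hand-written sample message, not real API output
raw = json.dumps({
    "id": 12345,
    "created_at": "Wed Aug 27 13:08:45 +0000 2008",
    "text": "hello stream",
    "user": {"screen_name": "example_user"},
})
fields = tweet_to_fields(raw)

# Inside on_data, with a hypothetical Tweet model whose fields match the keys:
#     if fields is not None:
#         Tweet.objects.create(**fields)
```

Keeping the mapping in a plain function means it can be tried out without a database, and `on_data` only has to make the one ORM call.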