"Twitter [provides] a rich data source for inducing the demographics of that language community" #wymhm

In April 2010, Twitter had approximately 106M registered users. The volume of data that flows through the Twitter pipe dwarfs any other publicly available linguistic corpus in existence (except the web itself), and unlike fixed corpora, it still flows. Such a huge dataset has proven to be a fertile resource for a number of natural language processing tasks (such as trend detection and sentiment analysis), but its value as a collection of colloquial language begs to be used for lexicography as well. If the purpose of a dictionary is to record actual usage, then Twitter data allows us to broaden the scope of our corpus beyond newswire, literary works, and other forms of privileged publication to include the unedited language of everyday folks.