Improving classification of tweets using word-word co-occurrence information from a large external corpus

Author(s)

Publication date

2016

Publisher

Association for Computing Machinery (ACM)

Document type

Abstract

Classifying tweets is an intrinsically hard task: tweets are short messages, which makes traditional bag-of-words approaches inefficient. In fact, bag-of-words approaches ignore relationships between important terms that do not co-occur literally. In this paper we resort to word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Our results show that we are able to reduce the number of erroneous classifications by 14% using co-occurrence information.
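
The general idea of expanding a short message with co-occurrence information can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the co-occurrence window, the weighting, or the number of terms added, so the document-level window, the raw counts, the `top_k` parameter, the function names, and the toy corpus below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often two distinct words appear in the same document.

    Assumption: the co-occurrence window is the whole document and the
    score is a raw count; the paper may use a different definition.
    """
    cooc = defaultdict(Counter)
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_tweet(tweet, cooc, top_k=2):
    """Append to the tweet's tokens the top_k words that co-occur most
    often with each token in the external corpus (hypothetical scheme)."""
    tokens = tweet.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        # Tokens unseen in the external corpus contribute no expansion terms.
        for neighbour, _count in cooc.get(tok, Counter()).most_common(top_k):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


# Toy stand-in for the large external corpus (invented for this example).
external_corpus = [
    "football match goal",
    "football goal striker",
    "election vote parliament",
]
cooc = build_cooccurrence(external_corpus)

print(expand_tweet("great goal", cooc, top_k=1))
# ['great', 'goal', 'football']
```

The expanded token list can then be fed to an ordinary bag-of-words classifier, so that two tweets sharing no literal term can still overlap through the terms added from the external corpus.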

Keywords

Version

acceptedVersion

Permanent URL (for citation purposes)

  • http://hdl.handle.net/10642/3723