- Cultural differences mentioned above. For example, "rocking" and "sick" are positive adjectives to members of some demographics and cultures but not others.
- English is a hard and ambiguous language. For instance, consider Chomsky's example: "old men and women". Does that mean [old men] and [old women], or [old men] and [women]? Another example is the book title "Eats, Shoots & Leaves", where a single comma changes the meaning.
- Sarcasm, innuendo, and double entendres. These are among the current challenges in NLP. For a fun example, take a look at the problem of detecting "that's what she said" jokes.
- The bag-of-words model. People often use it for sentiment analysis, at least as a first pass. That is, we analyze a document as an unordered set of words rather than as a sequence of phrases. Thus, we will miss that the "not" in "not good" negates "good". In general, we will miss double negatives and other qualifiers. I love this illustrative example:
During a lecture the Oxford linguistic philosopher J.L. Austin made the claim that although a double negative in English implies a positive meaning, there is no language in which a double positive implies a negative. To which Morgenbesser responded in a dismissive tone, "Yeah, yeah."
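To make the bag-of-words limitation concrete, here is a minimal sketch in plain Python. The tiny word lists and the function names (`bag_of_words_score`, `negation_aware_score`) are made up for illustration and are not part of NLTK or of the code discussed in this post. The second function applies one simple heuristic, flipping a sentiment word that directly follows a negator. It is only one of several possible fixes.

```python
import re

# Hypothetical toy lexicons, invented for this illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}
NEGATORS = {"not", "never", "no"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words_score(text):
    """Score a document as an unordered set of words (word order is lost)."""
    words = set(tokenize(text))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def negation_aware_score(text):
    """Flip the polarity of a sentiment word directly preceded by a negator."""
    tokens = tokenize(text)
    score = 0
    # Pair every token with the token before it ("" for the first token).
    for prev, word in zip([""] + tokens, tokens):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        score += -polarity if prev in NEGATORS else polarity
    return score

sentence = "The movie was not good"
print(bag_of_words_score(sentence))    # sees "good" only, so the score is positive
print(negation_aware_score(sentence))  # "not good" is flipped, so the score is negative
```

Neither function does anything useful with "Yeah, yeah": sarcasm is out of reach for this kind of word counting.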
The proper way to assess the performance is to examine precision, recall and F-scores:
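import collections
refsets = collections.defaultdict(set)   # label -> indices whose true label it is
testsets = collections.defaultdict(set)  # label -> indices predicted as that label
for i, (feats, label) in enumerate(test_set):
    refsets[label].add(i)
    observed = classifier.classify(feats)
    testsets[observed].add(i)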
These are all great numbers.
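For readers who want to see what precision, recall and F-score compute from per-label index sets like the ones built above, here is a small plain-Python sketch. NLTK ships comparable helpers in `nltk.metrics`; the versions below are simplified stand-ins so the example runs without NLTK, and the index sets are invented toy data, not results from this post.

```python
def precision(reference, test):
    """Share of predicted items that are actually correct."""
    return len(reference & test) / len(test) if test else None

def recall(reference, test):
    """Share of true items that were found by the classifier."""
    return len(reference & test) / len(reference) if reference else None

def f_measure(reference, test):
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    p, r = precision(reference, test), recall(reference, test)
    if p is None or r is None:
        return None
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

# Hypothetical toy data: indices of documents truly / predicted "pos".
ref_pos = {0, 1, 2, 3}
test_pos = {0, 1, 4}

print(precision(ref_pos, test_pos))  # 2 of the 3 predictions are correct
print(recall(ref_pos, test_pos))     # 2 of the 4 true positives were found
print(f_measure(ref_pos, test_pos))
```

Reporting these per label is more informative than a single accuracy figure, because a classifier can score well overall while doing poorly on one class.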
You can see that returning a continuous value in the interval (-1, 1) means that we could easily define neutral documents as those whose score falls close to zero, say within (-0.1, 0.1).
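As a rough sketch of that idea, a continuous score can be mapped to three labels with a small band around zero. The function name `label_from_score` and the default band width of 0.1 are assumptions chosen to match the example interval above. They are not taken from any library.

```python
def label_from_score(score, neutral_band=0.1):
    """Map a sentiment score in (-1, 1) to a label.

    Scores strictly inside (-neutral_band, neutral_band) count as neutral.
    """
    if abs(score) < neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"

for s in (-0.6, -0.05, 0.0, 0.08, 0.4):
    print(s, label_from_score(s))
```

The band width is a tuning knob: widening it trades fewer wrong positive or negative calls for more documents left unclassified as neutral.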
>>> from nltk import bigrams
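Bigrams are one way to recover some of the word order that a bag of words throws away. The sketch below uses a plain-Python stand-in, the hypothetical helper `adjacent_pairs`, which returns adjacent token pairs in a list (`nltk.bigrams` yields the same kind of pairs), so it runs without NLTK installed.

```python
def adjacent_pairs(tokens):
    """Return the list of adjacent token pairs (bigrams) in order."""
    tokens = list(tokens)
    return list(zip(tokens, tokens[1:]))

tokens = "this movie was not good".split()
pairs = adjacent_pairs(tokens)
print(pairs)

# Unlike the single word "good", the pair ("not", "good") keeps the negation.
print(("not", "good") in pairs)
```

Used as features, pairs like ("not", "good") let a classifier learn that the phrase is negative even though "good" on its own is positive.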
Here is the source code: p-value.info github