Given vectors that represent words, how do we construct sentences? Do we add the vectors? Do we find centroids? Do we normalize before, after or not at all?
In fact, can we even say we are dealing with a vector space?
Remember, a vector space has the following 8 properties:
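- Identity elements (for addition and for scalar multiplication)
- Distributivity (of scalar multiplication over vector addition and over scalar addition)
- Commutativity and associativity of addition, plus an additive inverse
- Compatibility of scalar multiplication: (ab)v = a(bv)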
"The commutativity property of vector addition does not always hold in semantics. Therefore, this property shouldn't (always) hold in the embedding space either. Thus, the embedding space should not be called a vector space.
E.g. attempt to treat semantic composition as vector addition in the vector space:
v_{rescue dog} = v_{rescue} + v_{dog} (a dog which is trained to rescue people)
v_{dog rescue} = v_{dog} + v_{rescue} (the operation of saving a dog)
The phrases "rescue dog" and "dog rescue" mean different things, but in our hypothetical vector space, they would (incorrectly) have the same vector representation (due to commutativity).
Similarly for the associativity property."
At the same time, erroneous assumptions are not necessarily unacceptable (as the post points out). It's just a high-bias model.
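The commutativity point in the quote is easy to check by hand. Below is a minimal sketch in plain Python; the two 3-dimensional vectors are made-up values for illustration, not real embeddings, and `add_vectors` is a hypothetical helper.

```python
# Toy word vectors; the numbers are invented for illustration only.
vectors = {
    "rescue": [0.25, -0.5, 1.0],
    "dog": [0.75, 0.125, -0.25],
}

def add_vectors(words, table):
    """Compose a phrase by summing its word vectors element-wise."""
    dims = len(next(iter(table.values())))
    total = [0.0] * dims
    for word in words:
        for i, x in enumerate(table[word]):
            total[i] += x
    return total

rescue_dog = add_vectors(["rescue", "dog"], vectors)
dog_rescue = add_vectors(["dog", "rescue"], vectors)

# Word order is lost: both phrases map to the same vector.
print(rescue_dog == dog_rescue)
```

Any composition built only from addition (including centroids) discards word order in the same way.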
Different Spaces
For fun, I tried the vectors that Word2Vec gave me. Now, I could think of no reason why the word vectors this algorithm produces should combine sensibly into a sentence vector. But the results were surprising.
Word2Vec | |
Description | Accuracy (%) |
Raw vector addition | 81.0 |
Normalized vector addition | 27.9 |
Raw vector centroids | 7 |
Raw vector addition then normalizing | 7 |
[Note: figures are indicative only.]
That is, adding together Word2Vec-generated word vectors to make a sentence vector gave my neural net decent (but not stellar) results.
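For reference, here is a rough sketch of the composition strategies named in the tables. It assumes "normalized" means scaling each word vector to unit L2 length before combining, which is my reading rather than something the tables spell out. The function names and the toy 2-dimensional input are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def add(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]

def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    return [total / len(vectors) for total in add(vectors)]

def compose(word_vectors, strategy):
    """Combine word vectors into one sentence vector using a named strategy."""
    if strategy == "raw_addition":
        return add(word_vectors)
    if strategy == "normalized_addition":
        # Assumption: normalize each word vector first, then sum.
        return add([l2_normalize(v) for v in word_vectors])
    if strategy == "raw_centroid":
        return centroid(word_vectors)
    if strategy == "normalized_centroid":
        return centroid([l2_normalize(v) for v in word_vectors])
    if strategy == "addition_then_normalize":
        return l2_normalize(add(word_vectors))
    raise ValueError(f"unknown strategy: {strategy}")

sentence = [[3.0, 4.0], [0.0, 2.0]]  # two made-up word vectors
for name in ("raw_addition", "normalized_addition",
             "raw_centroid", "addition_then_normalize"):
    print(name, compose(sentence, name))
```

Note that a centroid is just the sum divided by the number of words, so the two differ only in scale, and that scale varies with sentence length.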
More promising was combining vectors from category space. The results looked like this:
Category Space | |
Description | Accuracy (%) |
Normalized vector addition | 94.0 |
Normalized vector centroids | 92.1 |
Adding unnormalized vectors | 6.2 |
Normalized vector addition then normalizing | 5.3 |
[Note: figures are indicative only.]
Finally, concatenating the word vectors for a sentence (truncating if there were more than 10 words per text and padding if there were fewer) and feeding the result into an ANN produced an accuracy of 94.2%. Naive Bayes and Random Forest gave similar results (92.3% and 93.3% respectively).
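The concatenate, truncate and pad step might look something like the sketch below. Padding with zeros is an assumption (the pad value is not stated above), and `concat_fixed_length` and its parameters are hypothetical names.

```python
def concat_fixed_length(word_vectors, dims, max_words=10, pad_value=0.0):
    """Concatenate up to max_words vectors, padding the tail to a fixed length."""
    kept = word_vectors[:max_words]                # truncate long texts
    flat = [x for vector in kept for x in vector]  # flatten word by word
    missing = max_words - len(kept)                # empty word slots to fill
    return flat + [pad_value] * (missing * dims)   # assumption: pad with zeros

# Small max_words so the behaviour is easy to see.
short = concat_fixed_length([[1.0, 2.0], [3.0, 4.0]], dims=2, max_words=3)
long = concat_fixed_length([[1.0, 2.0]] * 5, dims=2, max_words=3)
print(short)
print(len(long))
```

Every text then yields a feature vector of exactly `max_words * dims` values, which is what a fixed-width input layer needs. Unlike addition, this keeps word order.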
Note: multiplying each vector by a factor that was between 0.5 and 1.5 made no difference to the accuracy of a Spark ANN. Weights will simply change accordingly. But "Neural Networks tend to converge faster, and K-Means generally gives better clustering with pre-processed features" (StackOverflow).
However, when I ran the same data on a similar TensorFlow implementation, my accuracy swung wildly between about 50% and 70%. Only when I normalized the data did TF give me an accuracy comparable to Spark's.
Conclusion
I had much better accuracy using category space rather than TF-IDF when classifying documents.
When I asked a Data Science PhD I work with which technique I should use (adding vectors, finding centroids, concatenating vectors, etc.), his answer was: "Yes".