Self-interactions are not included when the feature has value 1 #698
Comments
This is expected behaviour. I'll write an article about the new feature generation algorithm in VW's wiki today. In short, it generates simple combinations of features for interactions within the same namespace instead of permutations (as it was before). The reason behind that is that your model won't benefit from such duplicate interactions. It looks like such "duplicate" features also don't help with hash collisions resolution. These new rules of feature interaction generation are supposed to decrease the number of generated features by filtering out features that won't improve the model. Training should converge faster and results may be slightly better, so this approach is enabled by default. You can still switch back to the old feature generation engine by specifying --permutations.
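As a rough illustration (a sketch only; the exact audit listing depends on the VW build), the difference between combinations and permutations can be seen by counting the interaction features reported with --audit, using the --permutations flag mentioned later in this thread to get the old behaviour back:
echo '1 |f a:2 b:2' | vw -q ff --audit                  # combinations: a*a, a*b, b*b
echo '1 |f a:2 b:2' | vw -q ff --audit --permutations   # permutations: a*a, a*b, b*a, b*b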
Why "It looks like such "duplicate" features also don't help with hash collisions resolution."? |
Wiki: https://github.com/JohnLangford/vowpal_wabbit/wiki/Feature-interactions
I believe that the model accumulates such "unnecessary" features so fast (as the number of features in a self-interacting namespace grows) that very soon they become a source of collisions. I think I've also seen a paper that proves you can't solve collision problems by using 2 different hashes in the hashing trick while storing both hashes in the same address space. At least not for non-toy datasets. Our situation is very similar to using 2 hashes for the same feature in the hashing trick.
Thanks for the answer. I'm referring to my problem: text classification. With a bigram model my dataset generates 800M features, I can use only 28 bits (~300M), and I have a couple of special non-text features. So I can try to replicate these features using the above-mentioned method.
Well, it would be better to find out beforehand how badly collisions affect your model's predictive performance. Do you want to make only the "couple of special non-text features" 100% collision-protected, or all interactions with them too? If the former, and you think this would help and you know some C++, then it's not difficult to do. You just need to replace the "special non-text features" with integer names from some interval [0, 100] in your dataset ("By default VW hashes string features and does not hash integer features."), and then plug a small C++ check into the VW code to make sure that other features never get a hash from this interval. Perhaps there is an easier way, but I prefer adjusting the VW code for my purposes instead of playing with scripts etc. I can point you to the right piece of VW code to do that. Let's continue this in another thread so we don't spam this one, as the discussion is off-topic.
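For illustration only, a small sketch of the first half of that idea (the namespace names and values here are made up, and it assumes VW's default "strings" hashing quoted above, under which integer feature names are used as-is rather than hashed):
# the two special features get fixed integer names 1 and 2, so they keep stable indices;
# everything in |text is still hashed as usual
echo '1 |special 1:0.7 2:3.5 |text some text features here' | vw --audit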
@trufanov-nok , it's great that the interaction terms no longer include both "a * b" and "b * a". I was happy to see that change in your recent refactor. But not including "a * a" is a problem. If the feature "a" always has value 1, it's true that "a * a" is redundant with "a". The problem comes if "a" can sometimes take the value 1 and sometimes take values other than 1. Consider building a model on 1 feature with quadratic interaction terms, and say that you end up with a model that looks like y = 0.5 * x + 1.5 * x^2. If x = 1.001, VW outputs a prediction of 2.0035. If x = 0.999, VW outputs a prediction of 1.9965. However, if x = 1.0, VW outputs a prediction of 0.5. I don't think that these cases would be uncommon. You'd run into it all the time if your features are counts, for example. I found it because I'm using the iris dataset for testing; it has continuous features, but the continuous value is sometimes exactly 1. I think that a discontinuity like this in a linear model is undesirable, and it's certainly very counterintuitive. This discontinuity also exists in the cost function when training, which I find equally undesirable. I'd prefer to not have the duplicate "a * b" and "b * a" terms, but if that means the self-interaction terms disappear when they're equal to 1, I can add on the "--permutations" flag.
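To make the jump concrete, here is a small sketch (plain shell arithmetic, nothing VW-specific) evaluating the model above near x = 1, once with the x^2 term always present and once with it dropped when x is exactly 1:
for x in 0.999 1.0 1.001; do
  awk -v x="$x" 'BEGIN {
    full = 0.5*x + 1.5*x*x;             # continuous model y = 0.5*x + 1.5*x^2
    dropped = (x == 1 ? 0.5*x : full);  # self-interaction term vanishes only at exactly 1
    printf "x=%s  continuous=%.4f  self_interaction_dropped_at_1=%.4f\n", x, full, dropped
  }'
done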
This seems like a convincing argument to me---continuity is certainly desirable. -John
Well, I'm convinced too. The problem is that if we always generate self-interactions for value-1 features, we lose much of the reduction in generated features that motivated the change. I can think of 2 quick fixes:
1. Generate self-interactions only for features whose value is specified explicitly in the dataset, so "a:1" would produce "a*a" while a bare "a" would not (see the sketch after this comment).
2. Add a command line flag controlling this; variant 2b) would be a flag that turns on the behaviour from proposal 1 while keeping the current default unchanged.
My idea: let's let the experienced user decide what kind of dataset they have, so they can reduce the number of generated features even further with one of these flags. What do you guys think?
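A rough sketch of the two input styles proposal 1 distinguishes (this illustrates the proposed behaviour, not what VW currently does):
echo '1 |f a b' | vw -q ff      # implicit value 1: no "a*a" or "b*b" under proposal 1
echo '1 |f a:1 b:1' | vw -q ff  # explicit value 1: "a*a" and "b*b" would be generated under proposal 1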
I actually like your first option, only generating self-interactions for terms with explicit weights. As long as this is well-documented, it should let people distinguish between features which could take on any weight and features which only take on "present" and "not present" values. It does lead to a gotcha if a user mixes explicit and implicit weights, but I do think that it would handle all of my use cases. I think that another good option is your proposal 2b), to have a command line flag which turns on the behavior in your first proposal. That way a new user wouldn't get tripped up by something they didn't expect, but expert users could still benefit from not generating so many unnecessary interaction terms.
Maybe sticking with option 1 will be enough, @JohnLangford?
I like option 1; it seems intuitive to me. I agree with the idea of making reasonable defaults for beginners, while allowing experts to tune it. On the other hand, the number of command line options is also a burden for beginners.
Option 1 seems nonviable because the only thing available at the moment an interaction is generated is the feature's value; whether that value was given explicitly is not recorded. Our baseline approach should probably be to allow self-interactions. -John
But what if I add a boolean flag to the example class which would be set if a weight was initialized here?
There is more than one feature in an example, so that seems inadequate. -John
Nope, this won't work either. I'll take some time and think about this.
It does not seem worth it. This would imply that every feature takes up significantly more space. -John
Ok, I've done the required fix and tested VW with it.
When using quadratic or cubic interaction terms, self-interactions are not created from features which have a value of 1. Example:
echo '1 |f a b' | vw
shows 3 features (correct: "a", "b", and the constant).
echo '1 |f a:2 b:2' | vw -q ff
shows 6 features, which is also correct: "a", "b", constant, "a * a", "b * b", and "a * b".
echo '1 |f a b' | vw -q ff
shows 4 features. If I add "--invert_hash", the model output is missing the "a * a" and "b * b" terms.
This behavior also happens at test time. If I extract model coefficients from an invert hash model file and compare predictions generated from those to predictions generated directly from VW, the predictions match everywhere except where a feature has a value of exactly 1.
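For reference, a minimal sketch of how such a comparison could be set up (file names here are made up):
# train, saving both a binary model and a human-readable one
echo '1 |f a b' | vw -q ff -f model.vw --invert_hash readable_model.txt
# score the same example with the trained model; preds.txt holds VW's own prediction
echo '1 |f a b' | vw -t -i model.vw -p preds.txt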
I found this error in VW version 7.10.2, commit hash dc5532f, running on OS X 10.10.3. I know that the error was not present at commit hash bb68807, but I haven't done a bisection to locate exactly where it was introduced.