Numerical precision problem in the outputted expression #15

Open
folivetti opened this issue Nov 21, 2022 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@folivetti
I ran some experiments with SBP using this dataset:

Pagie.csv

and this script:

import re

import numpy as np
from pyGPGOMEA import GPGOMEARegressor as GPG

def standardNotation(expr):
    expr = (expr.replace("X0", "x0")
            .replace("X1", "x1")
            .replace("X2", "x2")
            .replace("_", "")
            .replace("+-", "-")
            .replace("--", "+")
            .replace("^", "**")
            )
    expr = re.sub(r"/(-\d+\.\d+)", r"/(\1)", expr)
    return re.sub(r"\*(-\d+\.\d+)", r"*(\1)", expr)

est = GPG( popsize=500, generations=200,
    linearscaling=True, functions='+_-_*_div_log_exp', erc=True,
    initmaxtreeheight=6, maxtreeheight=20, maxsize=1000,
    subcross=0.0, sbagx=False,
    sbrdo=0.75, submut=0.25,
    unifdepthvar=True,
    tournament=4,
    sblibtype='p_10_9999_l_n',
    caching=False,
    gomea=False, ims=False, silent=True, parallel=False, seed=1 )

z = np.loadtxt("Pagie.csv", delimiter=",")
x = z[:,:-1]
y = z[:,-1]
x0 = x[:,0]
x1 = x[:,1]

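est.fit(x,y)
# NOTE: `model` is not defined in this snippet; it is presumably a helper that
# returns the fitted expression as a string.
eq = standardNotation(model(est))
# eval relies on x0, x1 and any function names used in the expression being in scope
yhat = eval(eq)
yhat2 = est.predict(x)
# mean squared difference between the output of `predict` and the evaluated symbolic model
print(np.square(yhat-yhat2).mean())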

I get a mean squared error of 5624673608570.937 between the two predictions. As discussed, this is possibly due to truncation of the coefficient values in the printed expression.

@marcovirgolin
Owner

Thank you, @folivetti. This is indeed a rounding problem: the C++ code uses double precision, but the output displays only the first (I think) 5 digits.
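
To illustrate the effect, here is a minimal Python sketch of what happens when a coefficient is written with a fixed number of decimals and then parsed back. The coefficient value, the input value, and the choice of 5 fixed decimals are assumptions made for illustration only; the exact formatting used by the C++ output is not confirmed here.

```python
def roundtrip_fixed(value, decimals=5):
    """Format a float with a fixed number of decimals, then parse it back."""
    return float(f"{value:.{decimals}f}")


# Hypothetical small coefficient and input value (not taken from the actual model)
coef = 4.2e-7
x = 1.0e6

exact = coef * x                      # uses the full double-precision coefficient
printed = roundtrip_fixed(coef) * x   # uses the coefficient as it would be printed

print("printed coefficient:", f"{coef:.5f}")
print("exact term:", exact)
print("term from printed coefficient:", printed)
```

In this sketch the small coefficient is printed as zero, so the whole term vanishes from the re-evaluated expression. When such a term sits inside exp or a division, the difference from est.predict can become very large.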

There are two ways to fix this. One is to restrict the evolution to work up to a certain numerical precision. The other, which you suggested and which I note here so that I remember it, is to use scientific notation for the output.
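
As a rough sketch of the scientific-notation idea, the Python snippet below formats a double with 17 significant digits, which is enough to recover the same value when the text is parsed back. The function names are hypothetical and this is not the library's code. It also shows one consequence for the script above: the regular expressions in standardNotation match only plain decimals, so they would need an optional exponent part to wrap constants such as -4.2e-07 correctly.

```python
import re


def to_scientific(value, significant_digits=17):
    """Format a float in scientific notation (hypothetical helper)."""
    return f"{value:.{significant_digits - 1}e}"


def wrap_negative_constants(expr):
    """Parenthesise negative constants after * or /, including an optional exponent."""
    pattern = r"([*/])(-\d+\.\d+(?:[eE][+-]?\d+)?)"
    return re.sub(pattern, r"\1(\2)", expr)


# Hypothetical coefficient: scientific notation round-trips to the same double
coef = 4.2e-7
text = to_scientific(coef)
recovered = float(text)
print(text, recovered == coef)

# Hypothetical expression fragment using scientific notation
print(wrap_negative_constants("x0*-4.2e-07+x1/-3.5"))
```

Seventeen significant digits is the usual choice for round-tripping IEEE 754 doubles; fewer digits would give shorter expressions at the cost of some residual difference from est.predict.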

Gotta find some time to do that, though. For now, I suggest using est.predict instead of re-interpreting the formula, to get the correct prediction.

@marcovirgolin marcovirgolin added enhancement New feature or request help wanted Extra attention is needed labels Nov 24, 2022