For the NaN values that CloudCompare generates when features are computed with a fixed radius, I see two possible solutions:
Filter these values out before reading the file or interpolate them from neighboring points; alternatively, run the classification without them and interpolate the classification afterward.
Or, if there are no points within radius r, fall back to computing the features from the nearest neighbors.
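To make the "classify without them, then interpolate the classification" idea concrete, here is a minimal sketch. It assumes the point coordinates and features are already loaded as NumPy arrays; the helper names `split_valid` and `propagate_labels` are hypothetical, and nearest-neighbor label copying is only one possible way to interpolate the classification.

```python
import numpy as np
from scipy.spatial import cKDTree


def split_valid(features):
    """Return a boolean mask of rows whose features contain no NaN."""
    return ~np.isnan(features).any(axis=1)


def propagate_labels(xyz, valid_mask, valid_labels):
    """Give every point with NaN features the label of its nearest valid point."""
    labels = np.empty(len(xyz), dtype=valid_labels.dtype)
    labels[valid_mask] = valid_labels
    if (~valid_mask).any():
        tree = cKDTree(xyz[valid_mask])
        _, nearest = tree.query(xyz[~valid_mask], k=1)
        labels[~valid_mask] = valid_labels[nearest]
    return labels


# Toy data: four points, two of which have a NaN feature.
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.9, 0.0, 0.0]])
features = np.array([[0.2, 0.5], [0.8, 0.1], [np.nan, 0.4], [0.7, np.nan]])

valid = split_valid(features)
# Stand-in for the classifier's predictions on the valid rows only.
valid_labels = np.array([2, 6])
labels = propagate_labels(xyz, valid, valid_labels)
```

The classifier itself only ever sees `features[valid]`, so no imputation of the features is needed with this approach.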
I've been looking into how to handle missing values with the RF classifier, and I think there are a few options:
Drop the NaN values completely and train the model (not recommended).
Fill in the missing values with the median, mean, or mode.
Estimate the missing features from the nearest samples.
In scikit-learn, the class sklearn.impute.SimpleImputer replaces missing values using a descriptive statistic (e.g. mean, median, or most frequent) along each column, or using a constant value. There is also sklearn.impute.KNNImputer, which completes missing values using k-Nearest Neighbors.
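As a rough illustration of the two imputers, the sketch below fills a small made-up feature matrix with the column median and with a k-nearest-neighbors estimate. The array `X` and the choice of `n_neighbors=2` are assumptions for the example only.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy feature matrix with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 8.0],
])

# Replace each NaN with the median of its column.
X_median = SimpleImputer(strategy="median").fit_transform(X)

# Replace each NaN with the mean of that feature over the 2 nearest samples.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)
```

In practice the imputer should be fitted on the training data and then applied to new data with `transform`, so that both use the same statistics.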
I'm also working on the memory saturation problem with large datasets. To read the data, I now read the file in chunks, as implemented in laspy. To train the model, I think batch learning could be useful. As explained here, the RandomForestClassifier has a warm_start parameter: "When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest."
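A minimal sketch of chunk-wise training with warm_start could look like the following. The generator `synthetic_chunks` is a hypothetical stand-in for the chunks that would come from the laspy reader, and `trees_per_chunk` is an arbitrary choice. Note that n_estimators has to be increased before each additional call to fit, otherwise no new trees are added.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_chunks(n_chunks, chunk_size, seed=0):
    """Hypothetical stand-in for chunks of (features, labels) read from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 3))
        y = (X[:, 0] > 0).astype(int)
        yield X, y


trees_per_chunk = 10
clf = RandomForestClassifier(
    n_estimators=trees_per_chunk, warm_start=True, random_state=0
)

for i, (X_chunk, y_chunk) in enumerate(synthetic_chunks(n_chunks=3, chunk_size=200)):
    if i > 0:
        # Ask for more trees; the existing ones are kept as they are.
        clf.n_estimators += trees_per_chunk
    clf.fit(X_chunk, y_chunk)

# Evaluate on a fresh chunk generated with a different seed.
X_test, y_test = next(synthetic_chunks(n_chunks=1, chunk_size=200, seed=1))
accuracy = clf.score(X_test, y_test)
```

Two caveats with this approach: each group of trees only ever sees its own chunk, so the result is not the same as a forest trained on the full dataset, and every chunk should contain all classes so that the label encoding stays consistent between calls to fit.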