Modeling
The Modeling page provides tools for generating predictive models from prepared proxy (evidence) data. Currently, four supervised machine learning models and fuzzy logic are included:
- Logistic regression
- Random forests (classifier and regressor)
- Gradient boosting (classifier and regressor)
- MLP (Multilayer perceptron) (classifier and regressor)
- Fuzzy logic (memberships and overlay)
The modeling method is chosen at the top of the window (1). For all model types except Fuzzy, the Modeling page has four tabs (2): Data preparation, Training, Testing and Application.
The Data preparation tab offers data processing tools that are commonly needed to transform the proxy data into a form suitable for modeling. Raster preparation tools (1) unify raster gridding and coordinate systems. Data transformations (2) convert data values into a form that the chosen modeling method can handle well. CoDa transforms (3) are meant for compositional data (such as geochemical concentrations). Additional tools that do not fall into these categories may be included later in the Other section (4).
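As a rough illustration of what a transform for compositional data does, the sketch below computes the centered log-ratio (CLR), one common CoDa transform, for a single composition in plain Python. The function name `clr` and the sample values are assumptions made for this example; this is not the plugin's own implementation.

```python
import math

def clr(composition):
    """Centered log-ratio transform of one composition (all parts must be > 0)."""
    if any(part <= 0 for part in composition):
        raise ValueError("CLR is undefined for zero or negative parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # logarithm of the geometric mean
    return [value - mean_log for value in logs]

# Hypothetical three-part composition, expressed as proportions
sample = [0.2, 0.3, 0.5]
transformed = clr(sample)
print(transformed)
```

The transformed values sum to zero, which removes the constant-sum constraint that makes raw concentrations awkward for many modeling methods.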
To use any of the data processing tools, press the Open button and the corresponding processing algorithm will open. Note that these tools can be accessed directly from the processing toolbox as well.
In the Training tab, you can train the selected machine learning model. The following instructions apply to all models except Fuzzy Overlay. First, at the top of the tab (1), give the model a name and a save path. The model file should be saved with the .joblib extension.
In the Evidence data section (2), you set the layers used as input data. To add or delete rows/layers, press the plus and minus buttons. When you have selected all the evidence data, either set tags for the layers manually, e.g. "Magnetic", "Structures", etc., or press the Generate tags button. Note that tags are not optional and need to be set! They are needed because machine learning models require the order and type of datasets in the application and testing phases to match the training data. The datasets themselves can (and should) differ of course; they just need to represent the same type of evidence.
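The role of the tags can be sketched as follows: the tags fix the order of the inputs at training time, and layers chosen later are arranged in that same order before being passed to the model. The helper `order_layers_by_tags`, the tag list and the file names are all hypothetical and only illustrate the idea.

```python
# Tags recorded at training time, in training order (hypothetical values)
training_tags = ["Magnetic", "Structures", "Geochemistry"]

def order_layers_by_tags(tags, layers_by_tag):
    """Return the layers in the same order as the training tags."""
    missing = [tag for tag in tags if tag not in layers_by_tag]
    if missing:
        raise ValueError(f"No layer given for tags: {missing}")
    return [layers_by_tag[tag] for tag in tags]

# Layers for a new area, listed in arbitrary order (hypothetical file names)
new_area_layers = {
    "Structures": "area_b_structures.tif",
    "Geochemistry": "area_b_geochem.tif",
    "Magnetic": "area_b_magnetic.tif",
}

ordered = order_layers_by_tags(training_tags, new_area_layers)
print(ordered)
```

A missing tag raises an error here, because a model given fewer or differently ordered inputs than it was trained on would produce meaningless predictions.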
To train a model, a raster layer representing deposits needs to be given (3). This raster should ideally have both pixels marking locations of known occurrences (value 1) and known negatives (value 0). In practice, data is often limited and preparing a good deposit raster might require some extra work.
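A minimal sketch of how such a deposit raster could be read as training labels is shown below. It assumes that unlabeled pixels carry a nodata value (here `-9999`, chosen arbitrarily); how the plugin actually treats unlabeled pixels is not stated above, so treat that as an assumption.

```python
NODATA = -9999  # assumed nodata value marking unlabeled pixels

# Tiny hypothetical deposit raster: 1 = known occurrence, 0 = known negative
deposit_raster = [
    [NODATA, 1, NODATA],
    [0, NODATA, NODATA],
    [NODATA, 0, 1],
]

def labeled_pixels(raster, nodata=NODATA):
    """Collect (row, col, label) for every pixel that carries a label."""
    samples = []
    for row_index, row in enumerate(raster):
        for col_index, value in enumerate(row):
            if value != nodata:
                samples.append((row_index, col_index, value))
    return samples

samples = labeled_pixels(deposit_raster)
positives = sum(1 for _, _, label in samples if label == 1)
negatives = sum(1 for _, _, label in samples if label == 0)
print(samples, positives, negatives)
```

Counting positives and negatives this way is a quick check of how balanced the labels are before training.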
In the Model parameters section (4), you can set various parameters depending on the selected model. You can leave most of them as is, but some are obligatory (e.g. Neurons in MLPs). QGIS will notify you if you leave obligatory fields empty.
In the Validation settings section (5), you can select a validation method. A basic train-test data split is available for all models, and for most models cross-validation (CV) methods can be used as well. You can also disable validation by setting Validation method to None.
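The difference between a single train-test split and cross-validation can be sketched in plain Python. Both helper functions below are hypothetical and deliberately simplified: they do not shuffle the samples, which real validation routines usually do.

```python
def train_test_split_indices(n_samples, test_fraction=0.25):
    """Basic split: the last part of the indices is held out for testing."""
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    return indices[:-n_test], indices[-n_test:]

def k_fold_indices(n_samples, n_folds):
    """Yield (train, validation) index lists; each sample is validated once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

train, test = train_test_split_indices(10, test_fraction=0.2)
folds = list(k_fold_indices(10, n_folds=3))
print(train, test)
print(folds)
```

With a single split, only the held-out part is ever used for evaluation. With cross-validation, every sample serves as validation data in exactly one fold, at the cost of training the model once per fold.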
To start the training process, press the Start training button. The Training log box below the button shows information about the training as it progresses, any error messages, and the training result after completion. To stop the training process, press Cancel. To reset Model parameters and Validation settings, press Reset.
When a model is trained, information about it is saved and can be viewed in the History view.
In the Testing tab, you can test a previously trained model. In the first section (1), there are fields for selecting the trained model and setting save paths for the outputs. If you have previously trained a model, the first two fields (Model instance name and Model file) are filled automatically. If you have trained multiple models, remember to select the one you wish to test.
In the Evidence data section (2), you select the input data for testing the model. Set the evidence data so that on each row the layer matches the tag, e.g. select magnetic data if the tag is "Magnetic". Once the input data is selected, set a classification threshold (3) and choose a layer that contains the testing labels (4). Before testing the model, select which metrics (5) you wish to compute. Perform model testing by pressing Run. The metrics and other information about the execution are shown in the field below. The output prediction and probability layers are loaded in QGIS. With the Reset button, you can restore the parameters (3) and metrics (5) to their defaults. You can stop the testing with the Cancel button.
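How a classification threshold and test labels turn into metrics can be sketched as below. Accuracy, precision and recall are used only as common examples; the metrics offered by the plugin are not listed above. Treating a probability equal to the threshold as positive is also an assumption, as are all function names and sample values.

```python
def classify(probabilities, threshold=0.5):
    """Label a sample 1 when its probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def compute_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical model probabilities and matching test labels
probabilities = [0.9, 0.4, 0.7, 0.2, 0.6]
labels = [1, 0, 0, 0, 1]

predictions = classify(probabilities, threshold=0.5)
results = compute_metrics(labels, predictions)
print(predictions, results)
```

Raising the threshold generally trades recall for precision, so it is worth trying a few values when the costs of false positives and false negatives differ.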
In the Application tab, you can apply your trained and tested model to new, unseen data. The structure of the Application tab is very similar to the Testing tab: First, select your trained model and (optionally) set the output save paths (1). Then, select the data (2) on which you wish to make predictions. Set a classification threshold (3) or use the default value, and press Run to apply the trained model to the data. Again, information about the execution is printed in the field below and the output layers are loaded in QGIS.
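The relationship between the probability output and the prediction output can be sketched with a small grid. The default threshold of 0.5 used here is an assumption, since the default value is not given above, and `apply_threshold` is a hypothetical helper rather than a plugin function.

```python
def apply_threshold(probability_grid, threshold=0.5):
    """Turn a grid of probabilities into a grid of 0/1 predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probability_grid]

# Hypothetical probability raster produced by a trained model
probability_grid = [
    [0.10, 0.55, 0.80],
    [0.45, 0.62, 0.30],
]

default_prediction = apply_threshold(probability_grid)
strict_prediction = apply_threshold(probability_grid, threshold=0.7)
print(default_prediction)
print(strict_prediction)
```

A stricter threshold marks fewer pixels as prospective, which narrows the predicted area to the locations where the model is most confident.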
If you selected Fuzzy overlay as the model type, the modeling process differs from that of the ML models described above. First, there are only three tabs: Data preparation (same as for ML models), Fuzzy memberships, and Fuzzy overlay.
In the Fuzzy memberships tab, first select the input raster and (optionally) set a save path for the output raster (1). Then, in the Membership type drop-down menu (2), select the membership type you wish to compute. In the Parameters section (3), you can set the model parameters. You can preview the result using the Preview button. To generate the output raster, press Run. By pressing Reset, you can reset the parameters to their default values.
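To give an idea of what a membership computation produces, the sketch below applies a simple linear membership function to a small grid. The membership types available in the plugin are not listed above, so the linear type, the function name `linear_membership` and its `low`/`high` parameters are illustrative assumptions.

```python
def linear_membership(value, low, high):
    """Map a value linearly to [0, 1]; with low < high, 0 at low and 1 at high."""
    if high == low:
        raise ValueError("low and high must differ")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))  # clamp values outside the low-high range

# Hypothetical input raster values
raster = [
    [0, 25, 50],
    [75, 100, 125],
]

membership = [[linear_membership(v, low=0, high=100) for v in row] for row in raster]
print(membership)
```

Every output value lies between 0 and 1 regardless of the input units, which makes rasters with different value ranges comparable.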
In the Fuzzy overlay tab, you can compute the fuzzy overlay of two or more rasters. Typically, you prepare fuzzy membership rasters first and use them in this step. You can add and delete rasters using the plus and minus buttons. First, select the inputs in the Rasters to overlay section (1). Note that the inputs need to be in range