Modelling nodes
Modelling contains 14 nodes specified for data handling, preprocessing, testing modeling robustness and testing the accuracy of the predictions:
- Create New Molecules
Create New Molecules enables the user to create a list of molecules by combining a series of substituents with a core molecule.
- Domain-APD
Domain APD enables the user to define the domain of applicability of the model using a method based on the Euclidean distances.
- Domain-Leverage
Domain Leverage enables the user to define the domain of applicability of the model using a method based on the extent of extrapolation
- Int 2 Double
Int 2 Double converts integer values of all columns to doubles.
- Kennard and Stone
Kennard-Stone node allows the selection of two representative subsets (as training and test sets) with a uniform distribution over an initial dataset.
- MLR
MLR node performs Multiple Linear Regression in order to model the relationships between a scalar dependent variable y and two or more independent variables denoted as X.
- Model Acceptability Criteria
Model Acceptability Criteria gives information about the Quality of Fit and Predictive Ability of a continuous QSAR Model.
- PLS
PLS node, performs Partial Least Squares (PLS) regression analysis by applying the SIMPLS algorithm.
- PLSLoadings
PLSLoadings node, performs the calculation of the loadings on the given data by applying the SIMPLS algorithm.
- PLSscores
PLSScores node, performs the calculation of the scores on the given data by applying the SIMPLS algorithm.
- Remove Column
Remove Column node removes the selected input columns of the table that contain the same values at a percentage equal or higher than a specified cutoff limit.
- Remove Duplicates
Remove Duplicates enables the user to remove the rows of the input table that contain the same values in selected columns. The filtered table contains all rows that are unique and the first one of each repeated row.
- Sphere Exclusion
Sphere Exclusion node allows the selection of two representative subsets (such as training and test sets). This method attempts to specify compounds which most effectively cover the available data space.
- Y-Randomization
Y Randomization (or Y-scrambling) is a technique, applied to ensure a QSAR model’s robustness.
- EnaloskNN
EnaloskNN node employs the k-nearest neighbors method for classification and regression. The prediction for the unknown endpoint of an instance is the value of the weighted average (in regression) or the majority vote (in classification) of the endpoints of the k nearest neighbors in the feature space.