**Modelling nodes**

Modelling contains 14 nodes for data handling, preprocessing, testing model robustness, and assessing the accuracy of predictions:

**Create New Molecules**

*Create New Molecules* enables the user to create a list of molecules by combining a series of substituents with a core molecule.
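
The combinatorial idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the node's implementation: the core is written as a SMILES-like template with hypothetical `{R1}` and `{R2}` placeholders, the function name `enumerate_molecules` is invented for this example, and no chemical validation is performed.

```python
from itertools import product


def enumerate_molecules(core, substituents):
    """Fill every placeholder in `core` with every combination of substituents.

    `core` is a template string with named placeholders (an assumption of
    this sketch); `substituents` maps each placeholder name to its options.
    """
    names = list(substituents)
    option_lists = [substituents[name] for name in names]
    return [
        core.format(**dict(zip(names, combo)))
        for combo in product(*option_lists)
    ]


# Hypothetical benzene core with two attachment points.
core = "c1cc({R1})ccc1{R2}"
substituents = {"R1": ["F", "Cl"], "R2": ["C", "OC"]}

molecules = enumerate_molecules(core, substituents)
for smiles in molecules:
    print(smiles)
```

The number of generated molecules is the product of the number of options per position, so the list grows quickly as substituents are added.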

**Domain-APD**

*Domain APD* enables the user to define the domain of applicability of the model using a method based on Euclidean distances.
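
One common distance-based formulation sets a threshold APD = <d> + Z·σ, where <d> and σ are the mean and standard deviation of the training-set pair distances that fall below the overall mean distance, and Z is an empirical cutoff. The sketch below follows that formulation. Whether the node uses exactly this rule, the default `z=0.5`, the population standard deviation, and the function names are all assumptions of this example.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def apd_threshold(train, z=0.5):
    """Distance threshold computed from the training descriptors."""
    distances = [dist(a, b) for a, b in combinations(train, 2)]
    overall_mean = mean(distances)
    # Keep only the distances below the overall mean (fall back to all
    # distances if none qualifies, e.g. when every distance is identical).
    close = [d for d in distances if d < overall_mean] or distances
    return mean(close) + z * pstdev(close)


def in_domain(train, query, threshold):
    """A query is inside the domain if its nearest training point is close enough."""
    return min(dist(query, t) for t in train) <= threshold


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = apd_threshold(train)

print(threshold)
print(in_domain(train, (0.5, 0.5), threshold))
print(in_domain(train, (3.0, 3.0), threshold))
```

In practice the descriptors would normally be scaled before distances are computed, otherwise a single large-valued descriptor dominates the result.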

**Domain-Leverage**

*Domain Leverage* enables the user to define the domain of applicability of the model using a method based on the extent of extrapolation.
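
Extent of extrapolation is usually measured by the leverage h = xᵀ(XᵀX)⁻¹x of a query compound x relative to the training matrix X. The sketch below computes it with a small Gaussian-elimination solver. It assumes an intercept column is added to X and uses the commonly quoted warning value 3(p + 1)/n for p descriptors and n training compounds. Both choices, and all function names, are assumptions of this example rather than documented behaviour of the node.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def leverage(train, query):
    """Leverage of `query` with respect to the training descriptors."""
    design = [[1.0] + list(row) for row in train]  # intercept column (assumption)
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    q = [1.0] + list(query)
    z = solve(xtx, q)  # z = (XᵀX)⁻¹ q without forming the inverse
    return sum(qi * zi for qi, zi in zip(q, z))


def warning_leverage(train):
    """Commonly used threshold 3(p + 1)/n (assumption of this sketch)."""
    return 3 * (len(train[0]) + 1) / len(train)


train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
limit = warning_leverage(train)

for query in ([2.0], [4.0], [10.0]):
    h = leverage(train, query)
    print(query, h, "inside" if h <= limit else "outside")
```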

**Int 2 Double**

*Int 2 Double* converts integer values of all columns to doubles.

**Kennard and Stone**

The *Kennard-Stone* node selects two representative subsets (a training set and a test set) that are uniformly distributed over the initial dataset.
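
The classic Kennard-Stone procedure starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. A minimal sketch follows. The function name, the use of Euclidean distance, and the tie-breaking by input order are assumptions of this example, not a description of the node's internals.

```python
from itertools import combinations
from math import dist


def kennard_stone(points, n_train):
    """Return (train_indices, test_indices) using the Kennard-Stone rule."""
    if not 2 <= n_train <= len(points):
        raise ValueError("n_train must be between 2 and the number of points")

    # Seed the training set with the two most distant points.
    first, second = max(
        combinations(range(len(points)), 2),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(len(points)) if i not in selected]

    # Add the point whose closest selected neighbour is farthest away.
    while len(selected) < n_train:
        candidate = max(
            remaining,
            key=lambda i: min(dist(points[i], points[j]) for j in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)

    return selected, remaining


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_idx, test_idx = kennard_stone(points, n_train=3)
print(train_idx, test_idx)
```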

**MLR**

The *MLR* node performs Multiple Linear Regression to model the relationship between a scalar dependent variable y and two or more independent variables, denoted X.
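
As an illustration of what such a fit computes, the sketch below estimates the coefficients from the normal equations (XᵀX)β = Xᵀy using only the standard library. It assumes an intercept term and ordinary least squares. The function names are hypothetical, and production code would normally rely on a numerical library instead of a hand-written solver.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefficients, row):
    """Apply the fitted model to one row of descriptor values."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0]

coefficients = fit_mlr(rows, y)
print(coefficients)
print(predict(coefficients, (3.0, 2.0)))
```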

**Model Acceptability Criteria**

*Model Acceptability Criteria* reports the quality of fit and the predictive ability of a continuous QSAR model.
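
The statistics the node reports are not listed here. As a hedged illustration, the sketch below computes three quantities often used to judge a continuous model from observed and predicted values: the squared correlation coefficient R², the slope k of the regression of observed on predicted values through the origin, and the RMSE. The choice of these three statistics and the function name are assumptions of this example.

```python
from math import sqrt


def fit_statistics(observed, predicted):
    """Return R², slope through the origin (k) and RMSE for two value lists."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)

    r2 = s_op ** 2 / (s_oo * s_pp)
    # Slope of the least-squares line observed = k * predicted (no intercept).
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"r2": r2, "k": k, "rmse": rmse}


observed = [1.0, 2.0, 3.0, 4.0]
perfect = fit_statistics(observed, [1.0, 2.0, 3.0, 4.0])
biased = fit_statistics(observed, [2.0, 4.0, 6.0, 8.0])

print(perfect)
print(biased)
```

The second call shows why R² alone is not enough: predictions that are exactly twice the observed values still give R² = 1, while k and the RMSE expose the systematic bias.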

**PLS**

The *PLS* node performs Partial Least Squares (PLS) regression analysis using the SIMPLS algorithm.

**PLSLoadings**

The *PLSLoadings* node calculates the loadings for the given data using the SIMPLS algorithm.

**PLSscores**

The *PLSScores* node calculates the scores for the given data using the SIMPLS algorithm.

**Remove Column**

The *Remove Column* node removes each selected input column in which a single value accounts for a percentage of the rows equal to or higher than a specified cutoff.
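
A minimal sketch of this kind of filter is shown below. It represents the table as a dictionary of column lists and expresses the cutoff as a fraction between 0 and 1. The data layout and the function name are assumptions made for this example.

```python
from collections import Counter


def remove_near_constant_columns(table, columns, cutoff):
    """Drop selected columns whose most frequent value covers >= cutoff of rows."""
    result = {}
    for name, values in table.items():
        if name in columns and values:
            top_count = Counter(values).most_common(1)[0][1]
            if top_count / len(values) >= cutoff:
                continue  # column is (nearly) constant: leave it out
        result[name] = values
    return result


table = {
    "a": [1, 1, 1, 1],  # 100% identical
    "b": [1, 2, 3, 4],  # 25% identical
    "c": [0, 0, 0, 5],  # 75% identical
    "d": [7, 7, 7, 7],  # constant, but not among the selected columns
}

filtered = remove_near_constant_columns(table, columns={"a", "b", "c"}, cutoff=0.75)
print(list(filtered))
```

Column `d` survives even though it is constant, because only the selected columns are examined.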

**Remove Duplicates**

*Remove Duplicates* enables the user to remove the rows of the input table that contain the same values in selected columns. The filtered table keeps every unique row and the first occurrence of each repeated row.
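
The keep-the-first-occurrence rule can be sketched as follows. Rows are represented as dictionaries, and the function and column names are hypothetical, chosen only for this example.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"id": 1, "smiles": "CCO", "activity": 5.1},
    {"id": 2, "smiles": "CCN", "activity": 4.7},
    {"id": 3, "smiles": "CCO", "activity": 5.3},  # same SMILES as row 1
]

unique_rows = remove_duplicates(rows, key_columns=["smiles"])
print([row["id"] for row in unique_rows])
```

Only the selected columns decide whether two rows are duplicates, so row 3 is dropped even though its activity value differs from row 1.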

**Sphere Exclusion**

The *Sphere Exclusion* node selects two representative subsets (such as training and test sets). The method attempts to pick the compounds that most effectively cover the available data space.
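
Sphere exclusion exists in several variants. The sketch below shows one simple version: take the next unassigned compound as a sphere centre, set aside every unassigned compound within a given radius of it, and repeat until nothing is left. Picking centres in input order, using Euclidean distance, and treating centres as the training set and excluded neighbours as the test set are all assumptions of this example, as is the function name.

```python
from math import dist


def sphere_exclusion(points, radius):
    """Return (centre_indices, excluded_indices) for a fixed sphere radius."""
    remaining = list(range(len(points)))
    centres, excluded = [], []
    while remaining:
        centre = remaining.pop(0)  # next unassigned compound becomes a centre
        centres.append(centre)
        inside = [i for i in remaining if dist(points[i], points[centre]) <= radius]
        excluded.extend(inside)
        remaining = [i for i in remaining if i not in inside]
    return centres, excluded


points = [(0.0,), (0.5,), (1.0,), (3.0,), (3.2,), (8.0,)]
train_idx, test_idx = sphere_exclusion(points, radius=1.0)
print(train_idx, test_idx)
```

A larger radius yields fewer centres and therefore a smaller training set, so the radius controls the split ratio indirectly.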

**Y-Randomization**

*Y Randomization* (or Y-scrambling) is a technique used to test the robustness of a QSAR model.
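
The idea is to shuffle the response values, refit the model on the scrambled data, and check that the scrambled models perform clearly worse than the original one. The sketch below illustrates this with a one-descriptor linear model, whose R² equals the squared Pearson correlation. The model choice, the number of rounds, the seed, and the function names are assumptions of this example.

```python
import random


def r_squared(x, y):
    """R² of a one-descriptor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy * s_xy / (s_xx * s_yy)


def y_randomization(x, y, n_rounds=20, seed=0):
    """Return the original R² and the R² values obtained with shuffled y."""
    rng = random.Random(seed)
    scrambled_scores = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the link between descriptors and response
        scrambled_scores.append(r_squared(x, shuffled))
    return r_squared(x, y), scrambled_scores


x = [float(v) for v in range(10)]
y = [2.0 * v + 1.0 for v in x]

original, scrambled = y_randomization(x, y)
print("original R2:", original)
print("mean scrambled R2:", sum(scrambled) / len(scrambled))
```

If the scrambled models score almost as well as the original, the original fit is likely due to chance correlation rather than a real structure-activity relationship.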