Title: | Generalized Network-Based Dimensionality Reduction and Analysis |
---|---|
Description: | Non-parametric dimensionality reduction function. Reduction with and without feature selection. Plot functions. Automated feature selections. Kosztyan et al. (2024) <doi:10.1016/j.eswa.2023.121779>. |
Authors: | Zsolt T. Kosztyan [aut, cre], Marcell T. Kurbucz [aut], Attila I. Katona [aut], Zahid Khan [aut] |
Maintainer: | Zsolt T. Kosztyan <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.4 |
Built: | 2025-02-16 12:35:18 UTC |
Source: | https://github.com/kzst/nda |
The package of Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona, Zahid Khan
e-mail*: [email protected]
Kosztyan, Z. T., Kurbucz, M. T., & Katona, A. I. (2022). Network-based dimensionality reduction of high-dimensional, low-sample-size datasets. Knowledge-Based Systems, 109180.
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
ndr, ndrlm, plot, biplot, summary, dCor.
Biplot function for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
## S3 method for class 'nda'
biplot(x, main=NULL,...)
x |
an object of class 'NDA'. |
main |
main title of biplot. |
... |
other graphical parameters. |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Biplot function without feature selection
# Generate 200 x 50 random block matrix with 3 blocks and lambda=0 parameter
df<-data_gen(200,50,3,0)
p<-ndr(df)
biplot(p)
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
COVID-19 data of countries (2020), where the data frame has 138 observations of 18 variables.
data("COVID19_2020")
A data frame with 138 observations of 18 variables.
Kurbucz, M. T. (2020). A joint dataset of official COVID-19 reports and the governance, trade and competitiveness indicators of World Bank group platforms. Data in brief, 31, 105881.
data(COVID19_2020)
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
Crimes in USA cities in 1990. Independent variables (X)
data("CrimesUSA1990.X")
A data frame with 1994 observations of 123 variables.
UCI - Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/communities+and+crime
data(CrimesUSA1990.X)
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
Crimes in USA cities in 1990. Dependent variable (Y)
data("CrimesUSA1990.Y")
A data frame with 1994 observations of 1 variable.
UCI - Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/communities+and+crime
data(CrimesUSA1990.Y)
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
CWTS Leiden's 2020 dataset, where the data frame has 1176 observations of 42 variables.
data("CWTS_2020")
A data frame with 1176 observations of 42 variables.
CWTS Leiden Ranking 2020: https://www.leidenranking.com/ranking/2020/list
data(CWTS_2020)
Generate random block matrix for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
data_gen(n,m,nfactors=2,lambda=1)
n |
number of rows |
m |
number of columns |
nfactors |
number of blocks (factors, where the default value is 2) |
lambda |
exponential smoothing, where the default value is 1 |
n, m, and nfactors must be integers not less than 1; lambda should be a positive real number.
M |
a dataframe of a block matrix |
Prof. Zsolt T. Kosztyan, Department of Quantitative Methods, Institute of Management, Faculty of Business and Economics, University of Pannonia, Hungary
e-mail: [email protected]
# Specification 30 by 10 random block matrices with 2 blocks/factors
df<-data_gen(30,10)
library(psych)
scree(df)
biplot(ndr(df))
# Specification 40 by 20 random block matrices with 3 blocks/factors
df<-data_gen(40,20,3)
library(psych)
scree(df)
biplot(ndr(df))
plot(ndr(df))
# Specification 50 by 15 random block matrices with 4 blocks/factors
# lambda=0.1
df<-data_gen(50,15,4,0.1)
scree(df)
biplot(ndr(df))
plot(ndr(df))
Calculating distance correlation of two vectors or columns of a matrix for Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
The calculation is very slow for large matrices!
dCor(x,y=NULL)
x |
a numeric vector, matrix or data frame. |
y |
NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient). |
If x is a numeric vector, y must be specified. If x is a numeric matrix or numeric data frame, y is ignored.
Either a distance correlation coefficient of vectors x and y, or a distance correlation matrix of x if x is a matrix or a data frame.
Prof. Zsolt T. Kosztyan, Department of Quantitative Methods, Institute of Management, Faculty of Business and Economics, University of Pannonia, Hungary
e-mail: [email protected]
Rizzo M, Szekely G (2021). _energy: E-Statistics: Multivariate Inference via the Energy of Data_. R package version 1.7-8, <URL: https://CRAN.R-project.org/package=energy>.
# Specification of distance correlation value of vectors x and y.
x<-rnorm(36)
y<-rnorm(36)
dCor(x,y)
# Specification of distance correlation matrix.
x<-matrix(rnorm(36),nrow=6)
dCor(x)
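As a rough illustration of what a distance correlation measures, the sketch below computes the standard sample distance correlation of two numeric vectors in base R, using double-centered pairwise distance matrices. The helper names `double_center` and `dcor_sketch` are hypothetical and are not part of the package; dCor itself may be implemented differently.

```r
# Hypothetical helper: double-center the pairwise distance matrix of a vector
# (subtract row means and column means, add back the grand mean).
double_center <- function(v) {
  d <- as.matrix(dist(v))
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Hypothetical helper: sample distance correlation of two numeric vectors.
dcor_sketch <- function(x, y) {
  A <- double_center(x)
  B <- double_center(y)
  dcov2_xy <- max(mean(A * B), 0)  # squared distance covariance, guarded against rounding
  dvar2_x <- mean(A * A)           # squared distance variance of x
  dvar2_y <- mean(B * B)           # squared distance variance of y
  sqrt(dcov2_xy / sqrt(dvar2_x * dvar2_y))
}

set.seed(1)
x <- rnorm(36)
y <- rnorm(36)
print(dcor_sketch(x, y))          # independent draws
print(dcor_sketch(x, -3 * x + 2)) # exact linear dependence
```

Both distance matrices are n by n, which is consistent with the warning above that the calculation becomes slow for large inputs.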
Calculating distance covariance of two vectors or columns of a matrix for Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
The calculation is very slow for large matrices!
dCov(x,y=NULL)
x |
a numeric vector, matrix or data frame. |
y |
NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient). |
If x is a numeric vector, y must be specified. If x is a numeric matrix or numeric data frame, y is ignored.
Either a distance covariance value of vectors x and y, or a distance covariance matrix of x if x is a matrix or a data frame.
Prof. Zsolt T. Kosztyan, Department of Quantitative Methods, Institute of Management, Faculty of Business and Economics, University of Pannonia, Hungary
e-mail: [email protected]
Rizzo M, Szekely G (2021). _energy: E-Statistics: Multivariate Inference via the Energy of Data_. R package version 1.7-8, <URL: https://CRAN.R-project.org/package=energy>.
# Specification of distance covariance value of vectors x and y.
x<-rnorm(36)
y<-rnorm(36)
dCov(x,y)
# Specification of distance covariance matrix.
x<-matrix(rnorm(36),nrow=6)
dCov(x)
Calculation of fitted values of Generalized Network-based Dimensionality Reduction and Linear Regression Model (NDRLM)
## S3 method for class 'ndrlm'
fitted(object, ...)
object |
an object of class 'ndrlm'. |
... |
further arguments passed to or from other methods. |
Fitted values (data frame)
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of fitted function of NDRLM without optimization of fittings
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
fitted(NDRLM)
This function drops variables that have low communality values and/or are common indicators (i.e., correlate with more than one latent variable).
fs.dimred(fn,DF,min_comm=0.25,com_comm=0.25)
fn |
It is a list variable of the output of a principal (PCA), a fa (FA), or an ndr (NDA) function. |
DF |
Numeric data frame, or a numeric matrix of the data table |
min_comm |
Scalar between 0 and 1. Minimal communality value that a variable has to achieve. The default value is 0.25. |
com_comm |
Scalar between 0 and 1. The minimal difference value between loadings. The default value is 0.25. |
This function only works with the outputs of the principal, fa, and ndr functions.
This function drops each variable that has a low communality value (under the min_comm value). In other words, such a variable does not fit any latent variable well enough.
This function also drops so-called common indicators, which correlate highly with more than one latent variable: either the difference between the correlations is lower than the com_comm value, or the greatest absolute factor loading is less than twice the second greatest factor loading.
dropped_low |
Numeric data frame or numeric matrix. Set of indicators (i.e. variables), which are dropped by their low communalities. This value is NULL if a correlation matrix is used as an input or there is no dropped indicator. |
dropped_com |
Numeric data frame or numeric matrix. Set of dropped common indicators (i.e. common variables). This value is NULL if a correlation matrix is used as an input or there is no dropped indicator. |
remain_DF |
Numeric data frame or numeric matrix. Set of retained indicators |
... |
Other outputs came from |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Abonyi, J., Czvetkó, T., Kosztyán, Z. T., & Héberger, K. (2022). Factor analysis, sparse PCA, and Sum of Ranking Differences-based improvements of the Promethee-GAIA multicriteria decision support technique. Plos one, 17(2), e0264277. doi:10.1371/journal.pone.0264277
psych::principal, psych::fa, ndr.
data<-I40_2020
library(psych)
# Principal Component Analysis (PCA)
pca<-principal(data,nfactors=2,covar=TRUE)
pca
# Feature selection with default values
PCA<-fs.dimred(pca,data)
PCA
# List of dropped, low communality value indicators
print(colnames(PCA$dropped_low))
# List of dropped, common communality value indicators
print(colnames(PCA$dropped_com))
# List of retained indicators
print(colnames(PCA$retained_DF))
## Not run:
# Principal Component Analysis (PCA) of correlation matrix
pca<-principal(cor(data,method="spearman"),nfactors=2,covar=TRUE)
pca
# Feature selection
min_comm<-0.25 # Minimal communality value
com_comm<-0.20 # Minimal common communality value
PCA<-fs.dimred(pca,cor(data,method="spearman"),min_comm,com_comm)
PCA
## End(Not run)
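The two dropping rules described for fs.dimred can be sketched on a plain loading matrix in base R. This is only one possible reading of the description above, not the package's code: the function name `select_by_loadings` and the small loading matrix are made up for illustration, and the sketch assumes at least two latent variables.

```r
# Hypothetical sketch of the dropping rules, applied to a loading matrix
# with one row per indicator and one column per latent variable.
select_by_loadings <- function(loadings, min_comm = 0.25, com_comm = 0.25) {
  loadings <- as.matrix(loadings)
  # Communality: sum of squared loadings of each indicator
  communality <- rowSums(loadings^2)
  low <- communality < min_comm
  # Absolute loadings of each indicator, sorted from largest to smallest
  ranked <- t(apply(abs(loadings), 1, sort, decreasing = TRUE))
  first <- ranked[, 1]
  second <- ranked[, 2]
  # Common indicator: top two loadings too close, or top less than twice the second
  common <- !low & ((first - second) < com_comm | first < 2 * second)
  vars <- rownames(loadings)
  list(dropped_low = vars[low],
       dropped_com = vars[common],
       retained = vars[!low & !common])
}

# Made-up loadings of four indicators on two latent variables
L <- rbind(a = c(0.90, 0.10),
           b = c(0.30, 0.20),
           c = c(0.60, 0.55),
           d = c(0.10, 0.80))
res <- select_by_loadings(L)
print(res)
```

In this toy input, indicator b has a communality of 0.13 and falls under min_comm, while indicator c loads almost equally on both latent variables and is treated as a common indicator.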
Drops variables whose MSA_i value is lower than a threshold, in order to increase the overall KMO (MSA) value.
fs.KMO(data,min_MSA=0.5,cor.mtx=FALSE)
data |
A numeric data frame |
min_MSA |
A numeric value. Minimal MSA value for variable i |
cor.mtx |
Boolean value. The input is either a correlation matrix (cor.mtx=TRUE), or not (cor.mtx=FALSE) |
A low Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy suggests that principal component or factor analysis should not be used. Therefore, this function drops variables with low KMO/MSA values.
data |
Cleaned data or the cleaned correlation matrix. |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Abonyi, J., Czvetkó, T., Kosztyán, Z. T., & Héberger, K. (2022). Factor analysis, sparse PCA, and Sum of Ranking Differences-based improvements of the Promethee-GAIA multicriteria decision support technique. Plos one, 17(2), e0264277. doi:10.1371/journal.pone.0264277
library(psych)
data(I40_2020)
data<-I40_2020
KMO(fs.KMO(data,min_MSA=0.7,cor.mtx=FALSE))
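To make the MSA-based selection more concrete, the following base R sketch computes the per-variable MSA values from a correlation matrix and its partial (anti-image) correlations, then removes the weakest variable one at a time until every remaining MSA reaches the threshold. The helper names `msa_values` and `drop_low_msa` are hypothetical, the one-at-a-time loop is an assumption (fs.KMO may drop variables differently), and the built-in mtcars data stands in for the package datasets.

```r
# Hypothetical helper: MSA_i for each variable of a correlation matrix R.
msa_values <- function(R) {
  S <- solve(R)                              # inverse correlation matrix
  Q <- -S / sqrt(outer(diag(S), diag(S)))    # partial correlations given all other variables
  diag(R) <- 0                               # keep only off-diagonal entries
  diag(Q) <- 0
  rowSums(R^2) / (rowSums(R^2) + rowSums(Q^2))
}

# Hypothetical helper: drop the lowest-MSA variable until all reach min_MSA.
drop_low_msa <- function(data, min_MSA = 0.5) {
  repeat {
    if (ncol(data) <= 2) break               # MSA is not informative for two variables
    msa <- msa_values(cor(data))
    if (min(msa) >= min_MSA) break
    data <- data[, -which.min(msa), drop = FALSE]
  }
  data
}

kept <- drop_low_msa(mtcars, min_MSA = 0.8)
print(colnames(kept))
print(round(msa_values(cor(kept)), 3))
```

Recomputing the MSA values after each removal matters because dropping one variable changes the partial correlations of all the others.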
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
Governmental and economic data of countries (2020), where the data frame has 138 observations of 2161 variables.
data("GOVDB2020")
A data frame with 138 observations of 2161 variables.
Kurbucz, M. T. (2020). A joint dataset of official COVID-19 reports and the governance, trade and competitiveness indicators of World Bank group platforms. Data in brief, 31, 105881.
data(GOVDB2020)
Sample datasets for Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
NUTS2 regional development data (2020), where the data frame has 414 observations of 101 variables.
data("I40_2020")
A data frame with 414 observations of 101 variables.
Honti, G., Czvetkó, T., & Abonyi, J. (2020). Data describing the regional Industry 4.0 readiness index. Data in Brief, 33, 106464.
data(I40_2020)
The main function of Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
ndr(r,covar=FALSE,cor_method=1,cor_type=1,min_R=0,min_comm=2,Gamma=1,null_model_type=4,
    mod_mode=6,min_evalue=0,min_communality=0,com_communalities=0,use_rotation=FALSE,
    rotation="oblimin",weight=NULL,seed=NULL)
r |
A numeric data frame |
covar |
If this value is FALSE (default), it finds the correlation matrix from the raw data. If this value is TRUE, it uses the matrix r as a correlation/similarity matrix. |
cor_method |
Correlation method (optional). '1' Pearson's correlation (default), '2' Spearman's correlation, '3' Kendall's correlation, '4' Distance correlation |
cor_type |
Correlation type (optional). '1' Bivariate correlation (default), '2' partial correlation, '3' semi-partial correlation |
min_R |
Minimal square correlation between indicators (default: 0). |
min_comm |
Minimal number of indicators per community (default: 2). |
Gamma |
Gamma parameter in multiresolution null model (default: 1). |
null_model_type |
'1' Differential Newman-Girvan's null model, '2' The null model is the mean of square correlations between indicators, '3' The null model is the specified minimal square correlation, '4' Newman-Girvan's model (default) |
mod_mode |
Community-based modularity calculation mode: '1' Louvain modularity, '2' Fast-greedy modularity, '3' Leading Eigen modularity, '4' Infomap modularity, '5' Walktrap modularity, '6' Leiden modularity (default) |
min_evalue |
Minimal eigenvector centrality value (default: 0) |
min_communality |
Minimal communality value of indicators (default: 0) |
com_communalities |
Minimal common communalities (default: 0) |
use_rotation |
FALSE no rotation (default), TRUE the rotation is used. |
rotation |
"none", "varimax", "quartimax", "promax", "oblimin", "simplimax", and "cluster" are possible rotations/transformations of the solution. "oblimin" is the default, if use_rotation is TRUE. |
weight |
The weights of columns. The default is NULL (no weights). |
seed |
default seed value (default=NULL, no seed) |
NDA works on both low and high sample size datasets. If min_evalue=min_communality=com_communalities=0, then there is no feature selection.
communality |
Communality estimates for each item. These are merely the sum of squared factor loadings for that item. It can be interpreted in correlation matrices. |
loadings |
A standard loading matrix of class "loadings". |
uniqueness |
Uniqueness values of indicators. |
factors |
Number of found factors. |
EVCs |
The list of eigenvector centrality values of indicators. |
membership |
The membership value of indicators. |
weight |
The weight of indicators. |
scores |
Estimates of the factor scores are reported (if covar=FALSE). |
centers |
Column means of unstandardized score values. |
n.obs |
Number of observations specified or found. |
use_rotation |
FALSE no rotation (default), TRUE the rotation is used. |
rotation |
"none", "varimax", "quartimax", "promax", "oblimin", "simplimax", and "cluster" are possible rotations/transformations of the solution. "oblimin" is the default, if use_rotation is TRUE. |
fn |
Factor name: NDA |
seed |
applied seed value (default=NULL, no seed) |
Call |
Callback function |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyan, Z. T., Kurbucz, M. T., & Katona, A. I. (2022). Network-based dimensionality reduction of high-dimensional, low-sample-size datasets. Knowledge-Based Systems, 109180. doi:10.1016/j.knosys.2022.109180
# Dimension reduction without using any hyperparameters
data(swiss)
df<-swiss
p<-ndr(df)
summary(p)
plot(p)
biplot(p)
# Dimension reduction with using hyperparameters
# min_R=0.1
# The minimal square correlation must be greater than 0.1
p<-ndr(df,min_R = 0.1)
summary(p)
plot(p)
# min_evalue=0.1
# Minimal eigenvector centralities must be greater than 0.1
p<-ndr(df,min_evalue = 0.1)
summary(p)
plot(p)
# minimal and common communality value must be greater than 0.25
p<-ndr(df,min_communality = 0.25, com_communalities = 0.25)
# Print factor matrix
cor(p$scores)
plot(p)
# Use factor rotation
p<-ndr(df,min_communality = 0.25, com_communalities = 0.25,use_rotation=TRUE)
# Print factor matrix
cor(p$scores)
biplot(p)
# Data reduction - clustering
# Distance is Euclidean's distance
# covar=TRUE means only the distance matrix is considered.
q<-ndr(1-normalize(as.matrix(dist(df))),covar=TRUE)
summary(q)
plot(q)
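The roles of min_R and of the eigenvector centrality values can be illustrated with a small base R sketch: squared correlations serve as edge weights of an indicator network, weak edges under min_R are removed, and the leading eigenvector of the weight matrix gives a centrality score per indicator. This is a simplified illustration under stated assumptions, not the ndr algorithm: the function name `evc_sketch` is hypothetical, the scaling of the scores to a maximum of 1 is an assumption, and the community detection step controlled by mod_mode is ignored entirely.

```r
# Hypothetical sketch: eigenvector centrality on a squared-correlation network.
evc_sketch <- function(data, min_R = 0) {
  R2 <- cor(data)^2          # squared Pearson correlations as edge weights
  diag(R2) <- 0              # no self-loops
  R2[R2 < min_R] <- 0        # remove edges weaker than min_R
  ev <- eigen(R2, symmetric = TRUE)
  evc <- abs(ev$vectors[, 1])   # eigenvector of the largest eigenvalue
  evc <- evc / max(evc)         # scale so that the most central indicator has value 1
  names(evc) <- colnames(data)
  evc
}

data(swiss)
e <- evc_sketch(swiss, min_R = 0.1)
print(round(e, 3))
# Indicators that a threshold such as min_evalue = 0.5 would flag in this sketch
print(names(e)[e < 0.5])
```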
The main function of Generalized Network-based Dimensionality Reduction and Regression (GNDR) for supervised learning.
ndrlm(Y,X,latents="in",dircon=FALSE,optimize=TRUE,
      target="adj.r.square",rel_weight=FALSE,
      cor_method=1, cor_type=1,min_comm=2,Gamma=1,
      null_model_type=4,mod_mode=1,use_rotation=FALSE,
      rotation="oblimin",pareto=FALSE,fit_weights=NULL,
      lower.bounds.x = c(rep(-100,ncol(X))),
      upper.bounds.x = c(rep(100,ncol(X))),
      lower.bounds.latentx = c(0,0,0,0),
      upper.bounds.latentx = c(0.6,0.6,0.6,0.3),
      lower.bounds.y = c(rep(-100,ncol(Y))),
      upper.bounds.y = c(rep(100,ncol(Y))),
      lower.bounds.latenty = c(0,0,0,0),
      upper.bounds.latenty = c(0.6,0.6,0.6,0.3),
      popsize = 20, generations = 30, cprob = 0.7, cdist = 5,
      mprob = 0.2, mdist=10, seed=NULL)
Y |
A numeric data frame of output variables |
X |
A numeric data frame of input variables |
latents |
How latent variables are employed: "in" employs latent-independent variables (default); "out" employs latent-dependent variables; "both" employs both latent-dependent and latent-independent variables; "none" does not employ latent variables (= multiple regression) |
dircon |
Whether to enable or disable direct connection between input and output variables (default=FALSE) |
optimize |
Optimization of fittings (default=TRUE) |
target |
Target performance measures. The possible target measure are "adj.r.square" = adjusted R square (default), "r.sqauare" = R square, "MAE" = mean absolute error, "MAPE" = mean absolute percentage error, "MASE" = mean absolute scaled error ,"MSE"= mean square error,"RMSE" = root mean square error |
rel_weight |
Use relative weights. In this case, all weights should be non-negative. (default=FALSE) |
cor_method |
Correlation method (optional). '1' Pearson's correlation (default), '2' Spearman's correlation, '3' Kendall's correlation, '4' Distance correlation |
cor_type |
Correlation type (optional). '1' Bivariate correlation (default), '2' partial correlation, '3' semi-partial correlation |
min_comm |
Minimal number of indicators per community (default: 2). |
Gamma |
Gamma parameter in multiresolution null model (default: 1). |
null_model_type |
'1' Differential Newman-Girvan's null model, '2' The null model is the mean of square correlations between indicators, '3' The null model is the specified minimal square correlation, '4' Newman-Girvan's model (default) |
mod_mode |
Community-based modularity calculation mode: '1' Louvain modularity (default), '2' Fast-greedy modularity, '3' Leading Eigen modularity, '4' Infomap modularity, '5' Walktrap modularity, '6' Leiden modularity |
use_rotation |
FALSE no rotation (default), TRUE the rotation is used. |
rotation |
"none", "varimax", "quartimax", "promax", "oblimin", "simplimax", and "cluster" are possible rotations/transformations of the solution. "oblimin" is the default, if use_rotation is TRUE. |
pareto |
In the case of multiple objectives, TRUE provides a Pareto-optimal solution, while FALSE (default) provides the weighted mean of objective functions (see fit_weights) |
fit_weights |
weights of fitting the output variables (weights of means of objectives) |
lower.bounds.x |
Lower bounds of weights of independent variables in GNDA |
upper.bounds.x |
Upper bounds of weights of independent variables in GNDA |
lower.bounds.latentx |
Lower bounds of hyperparameters of GNDA for independent variables (values must be positive) |
upper.bounds.latentx |
Upper bounds of hyperparameters of GNDA for independent variables (values must be lower than one) |
lower.bounds.y |
Lower bounds of weights of dependent variables in GNDA |
upper.bounds.y |
Upper bounds of weights of dependent variables in GNDA |
lower.bounds.latenty |
Lower bounds of hyperparameters of GNDA for dependent variables (values must be positive) |
upper.bounds.latenty |
Upper bounds of hyperparameters of GNDA for dependent variables (values must be lower than one) |
popsize |
size of population of NSGA-II for fitting betas (default=20) |
generations |
number of generations to breed of NSGA-II for fitting betas (default=30) |
cprob |
crossover probability of NSGA-II for fitting betas (default=0.7) |
cdist |
crossover distribution index of NSGA-II for fitting betas (default=5) |
mprob |
mutation probability of NSGA-II for fitting betas (default=0.2) |
mdist |
mutation distribution index of NSGA-II for fitting betas (default=10) |
seed |
default seed value (default=NULL, no seed) |
NDRLM performs variable fitting with feature selection: it tunes the GNDA method and uses the NSGA-II algorithm to fit the parameters.
fval |
Objective function for fitting |
target |
Target performance measures. The possible target measure are "adj.r.square" = adjusted R square (default), "r.sqauare" = R square, "MAE" = mean absolute error, "MAPE" = mean absolute percentage error, "MASE" = mean absolute scaled error ,"MSE"= mean square error,"RMSE" = root mean square error |
hyperparams |
optimized hyperparameters |
pareto |
In the case of multiple objectives, TRUE provides a Pareto-optimal solution, while FALSE (default) provides the weighted mean of objective functions (see fit_weights) |
Y |
A numeric data frame of output variables |
X |
A numeric data frame of input variables |
latents |
Latent model: "in", "out", "both", "none" |
NDAin |
GNDA object, which is the result of model reduction and features selection in the case of employing latent-independent variables |
NDAin_weight |
Weights of input variables (used in |
NDAin_min_evalue |
Optimized minimal eigenvector centrality value (used in |
NDAin_min_communality |
Optimized minimal communality value of indicators (used in |
NDAin_com_communalities |
Optimized minimal common communalities (used in |
NDAin_min_R |
Optimized minimal square correlation between indicators (used in |
NDAout |
GNDA object, which is the result of model reduction and features selection in the case of employing latent-dependent variables |
NDAout_weight |
Weights of input variables (used in |
NDAout_min_evalue |
Optimized minimal eigenvector centrality value (used in |
NDAout_min_communality |
Optimized minimal communality value of indicators (used in |
NDAout_com_communalities |
Optimized minimal common communalities (used in |
NDAout_min_R |
Optimized minimal square correlation between indicators (used in |
fits |
List of linear regression models |
otimized |
Whether fittings are optimized or not |
NSGA |
Output structure of NSGA-II optimization (list), if the optimization value is true (see in |
extra_vars.X |
Logical variable. If direct connection (dircon=TRUE) is allowed, not only the latent variables but also the excluded input variables are analyzed in the linear models as extra input variables. |
extra_vars.Y |
Logical variable. If direct connection (dircon=TRUE) is allowed, not only the latent variables but also the excluded output variables are analyzed in the linear models as extra input variables. |
dircon_X |
The list of input variables which are directly connected to output variables. |
dircon_Y |
The list of output variables which are directly connected to output variables. |
seed |
applied seed value (default=NULL, no seed) |
fn |
Function (regression) name: NDRLM |
Call |
Callback function |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyan, Z. T., Kurbucz, M. T., & Katona, A. I. (2022). Network-based dimensionality reduction of high-dimensional, low-sample-size datasets. Knowledge-Based Systems, 109180. doi:10.1016/j.knosys.2022.109180
ndr, plot, summary, mco::nsga2.
# Using NDRLM without fitting optimization
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
summary(NDRLM)
plot(NDRLM)
## Not run:
# Using NDRLM with optimized fitting
NDRLM<-ndrlm(Y,X)
summary(NDRLM)
# Using Leiden's modularity for grouping variables
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,mod_mode=6)
plot(NDRLM)
# Using relative weights
NDRLM<-ndrlm(Y,X,mod_mode=6,rel_weight=TRUE)
plot(NDRLM)
# Using Spearman's correlation
NDRLM<-ndrlm(Y,X,cor_method=2)
summary(NDRLM)
# Using greater population and generations
NDRLM<-ndrlm(Y,X,popsize=52,generations=40)
summary(NDRLM)
# No latent variables
NDRLM<-ndrlm(Y,X,latents="none")
plot(NDRLM)
# In-out model
library(lavaan)
df<-PoliticalDemocracy # Data of Political Democracy
dem<-PoliticalDemocracy[,c(1:8)]
ind60<-PoliticalDemocracy[,-c(1:8)]
NBSEM<-ndrlm(dem,ind60,latents = "both",seed = 2)
plot(NBSEM)
## End(Not run)
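For orientation, most of the target measures listed for ndrlm can be computed by hand from an ordinary linear model. The sketch below uses base R's lm on the same freeny data and the textbook definitions; it does not call ndrlm, the vector name `measures` is hypothetical, and the package may use different conventions (for example, MAPE is given here as a fraction rather than a percentage, and MASE is omitted because its scaling depends on the chosen benchmark).

```r
# Plain multiple regression on the freeny data (no latent variables)
X <- freeny.x
Y <- as.numeric(freeny.y)
fit <- lm(Y ~ X)
res <- residuals(fit)

n <- length(Y)   # number of observations
p <- ncol(X)     # number of predictors

r2 <- 1 - sum(res^2) / sum((Y - mean(Y))^2)

# Textbook definitions of the performance measures
measures <- c(
  r.square     = r2,
  adj.r.square = 1 - (1 - r2) * (n - 1) / (n - p - 1),
  MAE          = mean(abs(res)),
  MAPE         = mean(abs(res / Y)),
  MSE          = mean(res^2),
  RMSE         = sqrt(mean(res^2))
)
print(round(measures, 5))
```

The R square based measures are to be maximized, whereas the error-based measures are to be minimized, which is worth keeping in mind when comparing fits across targets.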
Min-max normalization for data matrices and data frames
normalize(x,type="all")
x |
A data frame or data matrix. |
type |
The type of normalization. "row" normalization row by row, "col" normalization column by column, and "all" normalization for the entire data frame/matrix (default) |
Returns a normalized data.frame/matrix.
Zsolt T. Kosztyan, University of Pannonia
e-mail: [email protected]
mtx<-matrix(rnorm(20),5,4)
n_mtx<-normalize(mtx) # Fully normalized matrix
r_mtx<-normalize(mtx,type="row") # Normalize row by row
c_mtx<-normalize(mtx,type="col") # Normalize col by col
print(n_mtx) # Print fully normalized matrix
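The three normalization types can be illustrated with a minimal base R sketch of min-max scaling. The names `minmax` and `normalize_sketch` are hypothetical; the package's normalize function may handle edge cases (such as constant rows or missing values) differently.

```r
# Hypothetical helper: rescale values to the [0, 1] interval
minmax <- function(v) (v - min(v)) / (max(v) - min(v))

# Hypothetical sketch of min-max normalization by type
normalize_sketch <- function(x, type = "all") {
  x <- as.matrix(x)
  switch(type,
         all = minmax(x),                 # one minimum and maximum for the whole matrix
         row = t(apply(x, 1, minmax)),    # each row scaled separately
         col = apply(x, 2, minmax),       # each column scaled separately
         stop("type must be 'all', 'row', or 'col'"))
}

set.seed(1)
m <- matrix(rnorm(20), 5, 4)
print(range(normalize_sketch(m)))
print(apply(normalize_sketch(m, type = "col"), 2, range))
```

With type "all" the relative differences between columns are preserved, whereas "row" and "col" put every row or column on the same 0 to 1 scale.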
Calculating partial distance correlation of two columns of a matrix for Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
The calculation is very slow for large matrices!
pdCor(x)
x |
a numeric matrix, or a numeric data frame |
Partial distance correlation matrix of x.
Prof. Zsolt T. Kosztyan, Department of Quantitative Methods, Institute of Management, Faculty of Business and Economics, University of Pannonia, Hungary
e-mail: [email protected]
Rizzo M, Szekely G (2021). _energy: E-Statistics: Multivariate Inference via the Energy of Data_. R package version 1.7-8, <URL: https://CRAN.R-project.org/package=energy>.
# Specification of partial distance correlation matrix.
x<-matrix(rnorm(36),nrow=6)
pdCor(x)
Plot variable network graph
## S3 method for class 'nda'
plot(x, cuts=0.3, interactive=TRUE,edgescale=1.0,labeldist=-1.5,show_weights=FALSE,...)
x |
an object of class 'NDA'. |
cuts |
minimal square correlation value for an edge in the correlation network graph (default 0.3). |
interactive |
Plot interactive visNetwork graph or non-interactive igraph plot (default TRUE). |
edgescale |
Proportion scale value of edge width. |
labeldist |
Vertex label distance in non-interactive igraph plot (default value =-1.5). |
show_weights |
Show edge weights (default FALSE). |
... |
other graphical parameters. |
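The effect of the cuts argument can be sketched in base R. This is only an illustration under the assumption that an edge is drawn when the squared correlation between two variables is at least cuts; `count_edges` is a hypothetical helper, and the package's own graph construction may differ in detail.

```r
# Squared correlations between the variables of the built-in swiss data.
r2 <- cor(swiss)^2

# Hypothetical helper: count undirected edges that survive a given threshold.
# Only the upper triangle is used, so each pair is counted once.
count_edges <- function(r2, cuts) sum(r2[upper.tri(r2)] >= cuts)

edges_default <- count_edges(r2, 0.3)  # threshold equal to the documented default
edges_strict  <- count_edges(r2, 0.6)  # a higher threshold keeps fewer edges
c(default = edges_default, strict = edges_strict)
```

Raising cuts can only remove edges, which is why a larger value gives a sparser and usually more readable network.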
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Plot function without feature selection
data("CrimesUSA1990.X")
df<-CrimesUSA1990.X
p<-ndr(df)
biplot(p,main="Biplot of CrimesUSA1990 without feature selection")

# Plot function with feature selection
# minimal eigen values (min_evalue) is 0.0065
# minimal communality value (min_communality) is 0.1
# minimal common communality value (com_communalities) is 0.1
p<-ndr(df,min_evalue = 0.0065,min_communality = 0.1,com_communalities = 0.1)

# Plot with default (cuts=0.3)
plot(p)

# Plot with higher cuts
plot(p,cuts=0.6)

# GNDA is used for clustering, where the similarity function is the 1-Euclidean distance
# Data is the swiss data
SIM<-1-normalize(as.matrix(dist(swiss)))
q<-ndr(SIM,covar = TRUE)
plot(q,interactive = FALSE)
Plot the structural equation model, based on the GNDR
## S3 method for class 'ndrlm'
plot(x, sig=0.05, interactive=FALSE, ...)
x |
An object of class 'NDRLM'. |
sig |
Significance level of relationships |
interactive |
Plot interactive visNetwork graph or non-interactive igraph plot (default FALSE). |
... |
other graphical parameters. |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Plot function for non-optimized SEM
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
plot(NDRLM)
Calculation of predicted values of Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
## S3 method for class 'nda'
predict(object, newdata, ...)
object |
An object of class 'nda'. |
newdata |
A required data frame in which to look for variables with which to predict. |
... |
further arguments passed to or from other methods. |
Predicted values (data frame)
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of prediction function of GNDA
set.seed(1) # Fix the random seed
data(swiss) # Use Swiss dataset
resdata<-swiss
sample <- sample(c(TRUE, FALSE), nrow(resdata), replace=TRUE, prob=c(0.9,0.1))
train <- resdata[sample, ] # Split the dataset to train and test
test <- resdata[!sample, ]
p<-ndr(train) # Use GNDA only on the train dataset
P<-ndr(swiss) # Use GNDA on the entire dataset
res<-predict(p,test) # Calculate the prediction to the test dataset
real<-P$scores[!sample, ]
cor(real,res) # The correlation between original and predicted values
Calculation of predicted values of Generalized Network-based Dimensionality Reduction and Regression with Linear Models (NDRLM)
## S3 method for class 'ndrlm'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
  interval = c("none", "confidence", "prediction"), level = 0.95,
  type = c("response", "terms"), terms = NULL,
  na.action = stats::na.pass, pred.var = 1/weights, weights = 1, ...)
object |
An object of class 'ndrlm'. |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
se.fit |
A switch indicating if standard errors are required. |
scale |
Scale parameter for std.err. calculation. |
df |
Degrees of freedom for scale. |
interval |
Type of interval calculation. Can be abbreviated. |
level |
Tolerance/confidence level. |
type |
Type of prediction (response or model term). Can be abbreviated. |
terms |
If type = "terms", which terms (default is all terms), a character vector. |
na.action |
function determining what should be done with missing values in newdata. The default is to predict NA. |
pred.var |
the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’. |
weights |
variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata. |
... |
further arguments passed to or from other methods. |
predict.ndrlm produces predicted values, obtained by evaluating the multiple regression function and model reduction by GNDA in the frame newdata (which defaults to model.frame(object)). If the logical se.fit is TRUE, standard errors of the predictions are calculated. If the numeric argument scale is set (with optional df), it is used as the residual standard deviation in the computation of the standard errors; otherwise this is extracted from the model fit. Setting interval specifies computation of confidence or prediction (tolerance) intervals at the specified level, sometimes referred to as narrow vs. wide intervals.
If the fit is rank-deficient, some of the columns of the design matrix will have been dropped. Prediction from such a fit only makes sense if newdata is contained in the same subspace as the original data. That cannot be checked accurately, so a warning is issued.
If newdata is omitted the predictions are based on the data used for the fit. In that case how cases with missing values in the original fit are handled is determined by the na.action argument of that fit. If na.action = na.omit omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions, standard errors or interval limits), with value NA. See also napredict.
The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of the error variance σ²: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor. For a weighted fit, if the prediction is for the original data frame, weights defaults to the weights used for the model fit, with a warning since it might not be the intended result. If the fit was weighted and newdata is given, the default is to assume constant prediction variance, with a warning.
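The difference between the narrow (confidence) and wide (prediction) intervals described above can be sketched with a plain lm fit on the built-in cars data. The argument names match the usage shown for predict.ndrlm; using stats::lm as a stand-in is an assumption made so that the example runs without the package, and it does not involve the GNDA reduction step.

```r
# Ordinary linear model as a stand-in for a fitted regression.
fit <- lm(dist ~ speed, data = cars)
new <- data.frame(speed = c(10, 20))

# Interval for the mean response at each new case.
conf_int <- predict(fit, new, interval = "confidence", level = 0.95)

# Interval for a single future observation at each new case.
pred_int <- predict(fit, new, interval = "prediction", level = 0.95)

# Both results are matrices with columns fit, lwr and upr.
width_conf <- conf_int[, "upr"] - conf_int[, "lwr"]
width_pred <- pred_int[, "upr"] - pred_int[, "lwr"]
cbind(width_conf, width_pred)
```

The point estimates in the fit column are identical in both calls; only the bounds differ, because the prediction interval adds the error variance of a new observation.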
predict.ndrlm produces a list containing a vector of predictions, or a matrix of predictions and bounds with column names fit, lwr, and upr if interval is set. For type = "terms" this is a matrix with a column per term and may have an attribute "constant".
The 'prediction' list contains the following elements:
fit |
vector or matrix as above |
se.fit |
standard errors of the predicted means |
residual.scale |
residual standard deviations |
df |
degrees of freedom for residual |
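The four elements listed above have the same names as the list returned by stats::predict.lm when se.fit = TRUE. A minimal sketch with a plain lm fit follows; whether predict.ndrlm wraps this list in exactly the same way is an assumption, so inspect the returned object with str() on a real NDRLM fit.

```r
# Ordinary linear model as a stand-in for a fitted regression.
fit <- lm(dist ~ speed, data = cars)
new <- data.frame(speed = c(10, 20))

# Request standard errors together with the predictions.
out <- predict(fit, new, se.fit = TRUE)

names(out)          # element names of the returned list
out$fit             # predicted means
out$se.fit          # standard errors of the predicted means
out$residual.scale  # residual standard deviation of the fit
out$df              # residual degrees of freedom
```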
Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.
Notice that prediction variances and prediction intervals always refer to future observations, possibly corresponding to the same predictors as used for the fit. The variance of the residuals will be smaller.
Strictly speaking, the formula used for prediction limits assumes that the degrees of freedom for the fit are the same as those for the residual variance. This may not be the case if res.var is not obtained from the fit.
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of prediction function of NDRLM without optimization of fittings
set.seed(1)
X<-as.data.frame(freeny.x)
Y<-as.data.frame(freeny.y)
sample <- sample(c(TRUE, FALSE), nrow(X), replace=TRUE, prob=c(0.9,0.1))
train.X <- X[sample, ] # Split the dataset X to train and test
test.X <- X[!sample, ]
train.Y <- as.data.frame(Y[sample,]) # Split the dataset Y to train and test
colnames(train.Y)<-colnames(Y)
test.Y <- as.data.frame(Y[!sample,])
colnames(test.Y)<-colnames(Y)
train<-cbind(train.Y,train.X)
test<-cbind(test.Y,test.X)
res<-predict(lm(x~.,train),test)
cor(test.Y,res) # The correlation between original and predicted values

# Use NDRLM without optimization
NDRLM<-ndrlm(train.Y,train.X,optimize=FALSE)

# Calculate the prediction to the test dataset
res<-predict(NDRLM,test)
cor(test.Y,res[[1]]) # The correlation between original and predicted values
Print summary of Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
## S3 method for class 'nda'
print(x, digits = getOption("digits"), ...)
x |
an object of class 'nda'. |
digits |
the number of significant digits to use. |
... |
additional arguments affecting the summary produced. |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of summary function of NDA without feature selection
data("CrimesUSA1990.X")
df<-CrimesUSA1990.X
p<-ndr(df)
summary(p)

# Example of summary function of NDA with feature selection
# minimal eigen values (min_evalue) is 0.0065
# minimal communality value (min_communality) is 0.1
# minimal common communality value (com_communalities) is 0.1
p<-ndr(df,min_evalue = 0.0065,min_communality = 0.1,com_communalities = 0.1)
print(p)
Print summary of Generalized Network-based Dimensionality Reduction and Linear Regression Model (NDRLM)
## S3 method for class 'ndrlm'
print(x, digits = getOption("digits"), ...)
x |
an object of class 'ndrlm'. |
digits |
the number of significant digits to use. |
... |
additional arguments affecting the summary produced. |
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of print function of NDRLM without optimization of fittings
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
print(NDRLM)
Calculation of residual values of Generalized Network-based Dimensionality Reduction and Linear Regression Model (NDRLM)
## S3 method for class 'ndrlm'
residuals(object, ...)
object |
an object of class 'ndrlm'. |
... |
further arguments passed to or from other methods. |
Residual values (data frame)
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of residual function of NDRLM without optimization of fittings
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)

# Normality test for residuals
shapiro.test(residuals(NDRLM))
Calculating semi-partial distance correlation of two columns of a matrix for Generalized Network-based Dimensionality Reduction and Analysis (GNDA).
The calculation is very slow for large matrices!
spdCor(x)
x |
a numeric matrix or a numeric data frame |
Semi-partial distance correlation matrix of x.
Prof. Zsolt T. Kosztyan, Department of Quantitative Methods, Institute of Management, Faculty of Business and Economics, University of Pannonia, Hungary
e-mail: [email protected]
Rizzo M, Szekely G (2021). _energy: E-Statistics: Multivariate Inference via the Energy of Data_. R package version 1.7-8, <URL: https://CRAN.R-project.org/package=energy>.
# Specification of semi-partial distance correlation matrix.
x<-matrix(rnorm(36),nrow=6)
spdCor(x)
Print summary of Generalized Network-based Dimensionality Reduction and Analysis (GNDA)
## S3 method for class 'nda'
summary(object, digits = getOption("digits"), ...)
object |
an object of class 'nda'. |
digits |
the number of significant digits to use. |
... |
additional arguments affecting the summary produced. |
communality |
Communality estimates for each item. These are merely the sum of squared factor loadings for that item. It can be interpreted in correlation matrices. |
loadings |
A standard loading matrix of class "loadings". |
uniqueness |
Uniqueness values of indicators. |
factors |
Number of found factors. |
scores |
Estimates of the factor scores are reported (if covar=FALSE). |
n.obs |
Number of observations specified or found. |
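The relation between loadings, communality and uniqueness described above can be checked numerically. The sketch below uses principal component loadings from base R as a stand-in for a GNDA loading matrix, and it assumes the usual convention uniqueness = 1 - communality for standardized (correlation-based) data; neither choice is taken from the package's source code.

```r
# PCA on standardized data as an illustrative source of a loading matrix.
pca <- prcomp(swiss, scale. = TRUE)
k <- 2                                   # number of retained components (arbitrary)

# Loadings: eigenvectors scaled by the component standard deviations.
loadings_k <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])

# Communality of each variable: sum of its squared loadings.
communality <- rowSums(loadings_k^2)

# Uniqueness under the assumed convention for standardized variables.
uniqueness <- 1 - communality

round(cbind(communality, uniqueness), 3)
```

A variable with communality close to 1 is almost fully explained by the retained components, while a low value flags a variable that a communality threshold such as min_communality would remove.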
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of summary function of NDA without feature selection
data("CrimesUSA1990.X")
df<-CrimesUSA1990.X
p<-ndr(df)
summary(p)

# Example of summary function of NDA with feature selection
# minimal eigen values (min_evalue) is 0.0065
# minimal communality value (min_communality) is 0.1
# minimal common communality value (com_communalities) is 0.1
p<-ndr(df,min_evalue = 0.0065,min_communality = 0.1,com_communalities = 0.1)
summary(p)
Print summary of Generalized Network-based Dimensionality Reduction and Linear Regression Model (NDRLM)
## S3 method for class 'ndrlm'
summary(object, digits = getOption("digits"), ...)
object |
an object of class 'ndrlm'. |
digits |
the number of significant digits to use. |
... |
additional arguments affecting the summary produced. |
Call |
Callback function |
fval |
Objective function for fitting |
pareto |
in the case of multiple objectives TRUE (default value) provides pareto-optimal solution, while FALSE provides weighted mean of objective functions (see out_weights) |
X |
A numeric data frame of input variables |
Y |
A numeric data frame of output variables |
NDA |
GNDA object, which is the result of model reduction and features selection |
fits |
List of linear regression models |
NDA_weight |
Weights of input variables. |
NDA_min_evalue |
Optimized minimal eigenvector centrality value. |
NDA_min_communality |
Optimized minimal communality value of indicators. |
NDA_com_communalities |
Optimized minimal common communalities. |
NDA_min_R |
Optimized minimal square correlation between indicators. |
NSGA |
Output structure of NSGA-II optimization (list), if the optimization value is true. |
fn |
Function (regression) name: NDLM |
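The pareto entry above distinguishes a Pareto-optimal solution from a weighted mean of the objective functions. The difference can be sketched in base R with a few made-up candidate solutions; the objective values, the weights `w`, and the helper `is_dominated` are all hypothetical and only illustrate the two selection rules, not the NSGA-II optimization itself.

```r
# Rows are candidate solutions, columns are two objectives to maximize
# (illustrative numbers only).
obj <- rbind(a = c(0.90, 0.40),
             b = c(0.70, 0.70),
             c = c(0.60, 0.60),
             d = c(0.30, 0.95))

# Hypothetical helper: candidate i is dominated if another candidate is at
# least as good on every objective and strictly better on at least one.
is_dominated <- function(i, m) {
  any(vapply(seq_len(nrow(m))[-i], function(j) {
    all(m[j, ] >= m[i, ]) && any(m[j, ] > m[i, ])
  }, logical(1)))
}

# Pareto rule: keep every non-dominated candidate.
dominated <- vapply(seq_len(nrow(obj)), is_dominated, logical(1), m = obj)
pareto_set <- rownames(obj)[!dominated]

# Weighted-mean rule: collapse the objectives into one score and take the best.
w <- c(0.5, 0.5)                          # hypothetical weights
weighted_best <- rownames(obj)[which.max(obj %*% w)]

pareto_set
weighted_best
```

The Pareto rule returns a set of trade-offs, whereas the weighted mean commits to a single candidate whose identity depends on the chosen weights.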
Zsolt T. Kosztyan*, Marcell T. Kurbucz, Attila I. Katona
e-mail*: [email protected]
Kosztyán, Z. T., Katona, A. I., Kurbucz, M. T., & Lantos, Z. (2024). Generalized network-based dimensionality analysis. Expert Systems with Applications, 238, 121779. <URL: https://doi.org/10.1016/j.eswa.2023.121779>.
# Example of summary function of NDRLM without optimization of fittings
X<-freeny.x
Y<-freeny.y
NDRLM<-ndrlm(Y,X,optimize=FALSE)
summary(NDRLM)