Dimensionality Reduction
FACTOR_MODEL
Factor analysis models observed features as linear combinations of a smaller set of latent factors plus feature-specific noise. It is useful when the goal is to explain shared covariance structure rather than simply maximize retained variance.
The model takes the form:
X = F W + \mu + \epsilon
where X is the observed data, F are the latent factors, W is the loading matrix, \mu is the feature mean, and \epsilon is the feature-specific noise.
This wrapper accepts rows as samples and columns as features. It returns transformed factor scores together with factor loadings, estimated noise variances, per-feature means, and the fitted log-likelihood path.
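To make the model concrete, here is a minimal standalone sketch (the toy matrix and variable names are ours, not part of the wrapper) that fits scikit-learn's FactorAnalysis directly and checks that scores, loadings, and means approximately reconstruct the data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy matrix: rows are samples, columns are features, roughly one shared factor.
X = np.array([[2.0, 1.0, 1.0], [2.2, 1.1, 1.2], [1.8, 0.9, 0.8],
              [5.0, 4.2, 4.1], [5.1, 4.1, 4.2], [4.8, 4.0, 3.9]])

fa = FactorAnalysis(n_components=1, random_state=0)
F = fa.fit_transform(X)   # latent scores, shape (n_samples, n_components)
W = fa.components_        # loadings, shape (n_components, n_features)
mu = fa.mean_             # per-feature means

# X = F W + mu + eps, so the reconstruction residual plays the role of eps.
X_hat = F @ W + mu
print(np.abs(X - X_hat).max())  # stays modest on this near-one-factor data
```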
Excel Usage
=FACTOR_MODEL(data, n_components, factor_rotation, fa_svd_method, max_iter, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of latent factors to estimate. Leave blank to keep the estimator default.
factor_rotation(str, optional, default: "none"): Rotation applied to the fitted loading matrix. Valid options: "none", "varimax", "quartimax".
fa_svd_method(str, optional, default: "randomized"): Singular value decomposition method used during fitting. Valid options: "randomized", "lapack".
max_iter(int, optional, default: 1000): Maximum number of fitting iterations.
random_state(int, optional, default: null): Integer seed for randomized fitting paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing latent scores, loadings, noise variances, and log-likelihood summaries.
Example 1: Fit an unrotated two-factor model with randomized SVD
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 0.8 | 2 | none | randomized | 600 | 0 |
| 2.2 | 1.1 | 1.2 | 0.9 | |||||
| 1.8 | 0.9 | 0.8 | 0.7 | |||||
| 5 | 4.2 | 4.1 | 3.9 | |||||
| 5.1 | 4.1 | 4.2 | 4 | |||||
| 4.8 | 4 | 3.9 | 3.8 | |||||
| 3 | 2.4 | 2.2 | 2 | |||||
| 3.1 | 2.5 | 2.3 | 2.1 |
Excel formula:
=FACTOR_MODEL({2,1,1,0.8;2.2,1.1,1.2,0.9;1.8,0.9,0.8,0.7;5,4.2,4.1,3.9;5.1,4.1,4.2,4;4.8,4,3.9,3.8;3,2.4,2.2,2;3.1,2.5,2.3,2.1}, 2, "none", "randomized", 600, 0)
Expected output:
{"type":"Double","basicValue":19.9171,"properties":{"final_loglike":{"type":"Double","basicValue":19.9171},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.08516},{"type":"Double","basicValue":-0.640792}],[{"type":"Double","basicValue":-0.959455},{"type":"Double","basicValue":-1.26449}],[{"type":"Double","basicValue":-1.21086},{"type":"Double","basicValue":-0.0170943}],[{"type":"Double","basicValue":1.24179},{"type":"Double","basicValue":0.00352144}],[{"type":"Double","basicValue":1.27562},{"type":"Double","basicValue":-1.14684}],[{"type":"Double","basicValue":1.09515},{"type":"Double","basicValue":0.060673}],[{"type":"Double","basicValue":-0.216393},{"type":"Double","basicValue":1.51114}],[{"type":"Double","basicValue":-0.14069},{"type":"Double","basicValue":1.49388}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.30317},{"type":"Double","basicValue":1.34059},{"type":"Double","basicValue":1.3394},{"type":"Double","basicValue":1.34994}],[{"type":"Double","basicValue":-0.0587273},{"type":"Double","basicValue":0.108721},{"type":"Double","basicValue":0.0028651},{"type":"Double","basicValue":0.019216}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000186822}],[{"type":"Double","basicValue":0.000374229}],[{"type":"Double","basicValue":0.000848299}],[{"type":"Double","basicValue":0.0016567}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.375}],[{"type":"Double","basicValue":2.525}],[{"type":"Double","basicValue":2.4625}],[{"type":"Double","basicValue":2.275}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-28.3465}],[{"type":"Double","basicValue":-17.406}],[{"type":"Double","basicValue":-6.77301}],[{"type":"Double","basicValue":2.72231}],[{"type":"Double","basicValue":9.52349}],[{"type":"Double","basicValue":13.2404}],[{"type":"Double","basicValue":15.4178}],[{"type":"Double","basicValue":16.7325}],[{"type":"Double","basicValue":17.5815}],[{"type":"Double","basicValue":18.1711}],[{"type":"Double","basicValue":18.5907}],[{"type":"Double","basicValue":18.8887}],[{"type":"Double","basicValue":19.1004}],[{"type":"Double","basicValue":19.2534}],[{"type":"Double","basicValue":19.3678}],[{"type":"Double","basicValue":19.4565}],[{"type":"Double","basicValue":19.5275}],[{"type":"Double","basicValue":19.5856}],[{"type":"Double","basicValue":19.6341}],[{"type":"Double","basicValue":19.6751}],[{"type":"Double","basicValue":19.7101}],[{"type":"Double","basicValue":19.7404}],[{"type":"Double","basicValue":19.7668}],[{"type":"Double","basicValue":19.79}],[{"type":"Double","basicValue":19.8106}],[{"type":"Double","basicValue":19.8289}],[{"type":"Double","basicValue":19.8454}],[{"type":"Double","basicValue":19.8603}],[{"type":"Double","basicValue":19.8737}],[{"type":"Double","basicValue":19.886}],[{"type":"Double","basicValue":19.8972}],[{"type":"Double","basicValue":19.9076}],[{"type":"Double","basicValue":19.9171}]]},"n_iter":{"type":"Double","basicValue":33}}}
Example 2: Apply varimax rotation with the LAPACK SVD path
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 0.8 | 2 | varimax | lapack | 600 | 0 |
| 2.2 | 1.1 | 1.2 | 0.9 | |||||
| 1.8 | 0.9 | 0.8 | 0.7 | |||||
| 5 | 4.2 | 4.1 | 3.9 | |||||
| 5.1 | 4.1 | 4.2 | 4 | |||||
| 4.8 | 4 | 3.9 | 3.8 | |||||
| 3 | 2.4 | 2.2 | 2 | |||||
| 3.1 | 2.5 | 2.3 | 2.1 |
Excel formula:
=FACTOR_MODEL({2,1,1,0.8;2.2,1.1,1.2,0.9;1.8,0.9,0.8,0.7;5,4.2,4.1,3.9;5.1,4.1,4.2,4;4.8,4,3.9,3.8;3,2.4,2.2,2;3.1,2.5,2.3,2.1}, 2, "varimax", "lapack", 600, 0)
Expected output:
{"type":"Double","basicValue":19.9171,"properties":{"final_loglike":{"type":"Double","basicValue":19.9171},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.24431},{"type":"Double","basicValue":-0.199699}],[{"type":"Double","basicValue":-1.54579},{"type":"Double","basicValue":0.360577}],[{"type":"Double","basicValue":-0.942826},{"type":"Double","basicValue":-0.759974}],[{"type":"Double","basicValue":0.957964},{"type":"Double","basicValue":0.790168}],[{"type":"Double","basicValue":0.249498},{"type":"Double","basicValue":1.69711}],[{"type":"Double","basicValue":0.881595},{"type":"Double","basicValue":0.652552}],[{"type":"Double","basicValue":0.798316},{"type":"Double","basicValue":-1.30118}],[{"type":"Double","basicValue":0.845554},{"type":"Double","basicValue":-1.23956}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.965454},{"type":"Double","basicValue":1.10117},{"type":"Double","basicValue":1.03266},{"type":"Double","basicValue":1.05122}],[{"type":"Double","basicValue":0.877264},{"type":"Double","basicValue":0.772289},{"type":"Double","basicValue":0.852994},{"type":"Double","basicValue":0.847144}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000186822}],[{"type":"Double","basicValue":0.000374229}],[{"type":"Double","basicValue":0.000848299}],[{"type":"Double","basicValue":0.0016567}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.375}],[{"type":"Double","basicValue":2.525}],[{"type":"Double","basicValue":2.4625}],[{"type":"Double","basicValue":2.275}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-28.3465}],[{"type":"Double","basicValue":-17.406}],[{"type":"Double","basicValue":-6.77301}],[{"type":"Double","basicValue":2.72231}],[{"type":"Double","basicValue":9.52349}],[{"type":"Double","basicValue":13.2404}],[{"type":"Double","basicValue":15.4178}],[{"type":"Double","basicValue":16.7325}],[{"type":"Double","basicValue":17.5815}],[{"type":"Double","basicValue":18.1711}],[{"type":"Double","basicValue":18.5907}],[{"type":"Double","basicValue":18.8887}],[{"type":"Double","basicValue":19.1004}],[{"type":"Double","basicValue":19.2534}],[{"type":"Double","basicValue":19.3678}],[{"type":"Double","basicValue":19.4565}],[{"type":"Double","basicValue":19.5275}],[{"type":"Double","basicValue":19.5856}],[{"type":"Double","basicValue":19.6341}],[{"type":"Double","basicValue":19.6751}],[{"type":"Double","basicValue":19.7101}],[{"type":"Double","basicValue":19.7404}],[{"type":"Double","basicValue":19.7668}],[{"type":"Double","basicValue":19.79}],[{"type":"Double","basicValue":19.8106}],[{"type":"Double","basicValue":19.8289}],[{"type":"Double","basicValue":19.8454}],[{"type":"Double","basicValue":19.8603}],[{"type":"Double","basicValue":19.8737}],[{"type":"Double","basicValue":19.886}],[{"type":"Double","basicValue":19.8972}],[{"type":"Double","basicValue":19.9076}],[{"type":"Double","basicValue":19.9171}]]},"n_iter":{"type":"Double","basicValue":33}}}
Example 3: Fit a single latent factor across three observed features
Inputs:
| data | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|
| 1 | 1.1 | 0.9 | 1 | none | lapack | 600 | 0 |
| 2 | 2.1 | 1.8 | |||||
| 3 | 3.2 | 2.9 | |||||
| 4 | 4.1 | 4 | |||||
| 5 | 5.2 | 5.1 | |||||
| 6 | 6.1 | 5.9 |
Excel formula:
=FACTOR_MODEL({1,1.1,0.9;2,2.1,1.8;3,3.2,2.9;4,4.1,4;5,5.2,5.1;6,6.1,5.9}, 1, "none", "lapack", 600, 0)
Expected output:
{"type":"Double","basicValue":5.52431,"properties":{"final_loglike":{"type":"Double","basicValue":5.52431},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.46771}],[{"type":"Double","basicValue":-0.89173}],[{"type":"Double","basicValue":-0.270316}],[{"type":"Double","basicValue":0.28352}],[{"type":"Double","basicValue":0.904934}],[{"type":"Double","basicValue":1.44131}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.70741},{"type":"Double","basicValue":1.71799},{"type":"Double","basicValue":1.75716}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00139972}],[{"type":"Double","basicValue":0.000774029}],[{"type":"Double","basicValue":0.00459521}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":3.63333}],[{"type":"Double","basicValue":3.43333}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-26.1316}],[{"type":"Double","basicValue":-19.5905}],[{"type":"Double","basicValue":-13.1569}],[{"type":"Double","basicValue":-7.0199}],[{"type":"Double","basicValue":-1.64193}],[{"type":"Double","basicValue":2.2234}],[{"type":"Double","basicValue":4.21805}],[{"type":"Double","basicValue":5.00012}],[{"type":"Double","basicValue":5.30885}],[{"type":"Double","basicValue":5.43552}],[{"type":"Double","basicValue":5.48545}],[{"type":"Double","basicValue":5.50637}],[{"type":"Double","basicValue":5.51722}],[{"type":"Double","basicValue":5.52431}]]},"n_iter":{"type":"Double","basicValue":14}}}
Example 4: Apply quartimax rotation on a four-feature matrix
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 0.9 | 1.1 | 2 | quartimax | randomized | 800 | 5 |
| 1.2 | 2.1 | 1 | 1.2 | |||||
| 4.8 | 5 | 4.9 | 5.1 | |||||
| 5 | 5.2 | 5.1 | 5.2 | |||||
| 2.5 | 3 | 2.4 | 2.8 | |||||
| 2.7 | 3.2 | 2.6 | 3 | |||||
| 6 | 1 | 5.9 | 1.1 | |||||
| 6.2 | 1.1 | 6 | 1.2 |
Excel formula:
=FACTOR_MODEL({1,2,0.9,1.1;1.2,2.1,1,1.2;4.8,5,4.9,5.1;5,5.2,5.1,5.2;2.5,3,2.4,2.8;2.7,3.2,2.6,3;6,1,5.9,1.1;6.2,1.1,6,1.2}, 2, "quartimax", "randomized", 800, 5)
Expected output:
{"type":"Double","basicValue":-13.288,"properties":{"final_loglike":{"type":"Double","basicValue":-13.288},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.34344},{"type":"Double","basicValue":-0.576798}],[{"type":"Double","basicValue":-1.26736},{"type":"Double","basicValue":-0.529243}],[{"type":"Double","basicValue":0.550676},{"type":"Double","basicValue":1.44898}],[{"type":"Double","basicValue":0.648064},{"type":"Double","basicValue":1.5441}],[{"type":"Double","basicValue":-0.611582},{"type":"Double","basicValue":0.173514}],[{"type":"Double","basicValue":-0.515077},{"type":"Double","basicValue":0.293415}],[{"type":"Double","basicValue":1.23132},{"type":"Double","basicValue":-1.20076}],[{"type":"Double","basicValue":1.3074},{"type":"Double","basicValue":-1.1532}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.94917},{"type":"Double","basicValue":-0.00481887},{"type":"Double","basicValue":1.97864},{"type":"Double","basicValue":0.355679}],[{"type":"Double","basicValue":0.0419101},{"type":"Double","basicValue":1.49992},{"type":"Double","basicValue":0.137768},{"type":"Double","basicValue":1.60137}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000849877}],[{"type":"Double","basicValue":0.00733668}],[{"type":"Double","basicValue":0.000992214}],[{"type":"Double","basicValue":0.0124078}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.675}],[{"type":"Double","basicValue":2.825}],[{"type":"Double","basicValue":3.6}],[{"type":"Double","basicValue":2.5875}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-51.9082}],[{"type":"Double","basicValue":-46.4325}],[{"type":"Double","basicValue":-41.0478}],[{"type":"Double","basicValue":-35.8109}],[{"type":"Double","basicValue":-30.8392}],[{"type":"Double","basicValue":-26.3029}],[{"type":"Double","basicValue":-22.3914}],[{"type":"Double","basicValue":-19.231}],[{"type":"Double","basicValue":-16.8385}],[{"type":"Double","basicValue":-15.1717}],[{"type":"Double","basicValue":-14.1574}],[{"type":"Double","basicValue":-13.6451}],[{"type":"Double","basicValue":-13.4311}],[{"type":"Double","basicValue":-13.3518}],[{"type":"Double","basicValue":-13.3215}],[{"type":"Double","basicValue":-13.3067}],[{"type":"Double","basicValue":-13.2966}],[{"type":"Double","basicValue":-13.288}]]},"n_iter":{"type":"Double","basicValue":18}}}
Python Code
import numpy as np
from sklearn.decomposition import FactorAnalysis as SklearnFactorAnalysis
def factor_model(data, n_components=None, factor_rotation='none', fa_svd_method='randomized', max_iter=1000, random_state=None):
"""
Fit factor analysis and return latent scores with loadings and likelihood summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of latent factors to estimate. Leave blank to keep the estimator default. Default is None.
factor_rotation (str, optional): Optional rotation applied to the fitted loading matrix. Valid options: None, Varimax, Quartimax. Default is 'none'.
fa_svd_method (str, optional): Singular value decomposition method used during fitting. Valid options: Randomized, Lapack. Default is 'randomized'.
max_iter (int, optional): Maximum number of fitting iterations. Default is 1000.
random_state (int, optional): Integer seed for randomized fitting paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing latent scores, loadings, noise variances, and log-likelihood summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
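    # Sign convention: flip each factor so its largest-magnitude loading is positive,
    # keeping scores and loadings consistent across runs and SVD backends.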
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
if component_total is not None and (component_total < 1 or component_total > data_np.shape[1]):
return f"Error: n_components must be between 1 and {data_np.shape[1]}"
rotation_value = str(factor_rotation).strip().lower()
if rotation_value not in {"none", "varimax", "quartimax"}:
return "Error: factor_rotation must be 'none', 'varimax', or 'quartimax'"
svd_method_value = str(fa_svd_method).strip().lower()
if svd_method_value not in {"randomized", "lapack"}:
return "Error: fa_svd_method must be 'randomized' or 'lapack'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
fitted = SklearnFactorAnalysis(
n_components=component_total,
rotation=None if rotation_value == "none" else rotation_value,
svd_method=svd_method_value,
max_iter=int(max_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
noise_variance = np.atleast_1d(np.asarray(fitted.noise_variance_, dtype=float))
feature_means = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
loglike = np.atleast_1d(np.asarray(fitted.loglike_, dtype=float))
final_loglike = float(loglike[-1])
return {
"type": "Double",
"basicValue": final_loglike,
"properties": {
"final_loglike": {"type": "Double", "basicValue": final_loglike},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"noise_variance": {"type": "Array", "elements": col(noise_variance.tolist())},
"feature_means": {"type": "Array", "elements": col(feature_means.tolist())},
"loglike": {"type": "Array", "elements": col(loglike.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
FASTICA
FastICA estimates statistically independent latent sources whose linear mixtures reconstruct the observed features. It is commonly used for source separation, blind signal recovery, and compact independent latent representations.
The linear mixing model is defined as:
X = S A
where X is the observed mixture, S contains the independent source signals, and A is the mixing matrix. The algorithm estimates an unmixing matrix W such that S \approx X W.
This wrapper accepts rows as samples and columns as features. It returns the recovered source matrix together with the fitted unmixing matrix, mixing matrix, feature means, and iteration count from the ICA fit.
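As a brief standalone illustration of the mixing model (the signals and mixing matrix below are made up for this sketch, and recovery is only identifiable up to order, sign, and scale), two independent sources can be mixed and then separated with scikit-learn's FastICA:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 400)
# Two independent sources: a sinusoid and a square wave.
S = np.column_stack([np.sin(2 * t), np.sign(np.cos(3 * t))])
A = np.array([[1.0, 0.5], [0.4, 1.2]])  # mixing matrix (components x features)
X = S @ A                               # observed mixtures, X = S A

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # recovered sources, up to order, sign, and scale
print(S_hat.shape, ica.mixing_.shape)   # (400, 2) and (2, 2)
```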
Excel Usage
=FASTICA(data, n_components, ica_algorithm, ica_fun, max_iter, tol, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of independent components to estimate. Leave blank to keep the estimator default.
ica_algorithm(str, optional, default: "parallel"): Optimization strategy used to estimate the independent components. Valid options: "parallel", "deflation".
ica_fun(str, optional, default: "logcosh"): Nonlinearity used to approximate negentropy during fitting. Valid options: "logcosh", "exp", "cube".
max_iter(int, optional, default: 300): Maximum number of fitting iterations.
tol(float, optional, default: 0.0001): Positive convergence tolerance for the unmixing update.
random_state(int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.
Returns (dict): Excel data type containing recovered sources, unmixing matrices, and iteration summaries.
Example 1: Recover two independent components with the parallel algorithm
Inputs:
| data | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | parallel | logcosh | 500 | 0.0001 | 0 |
| 1 | 0.4 | ||||||
| 2 | 0.8 | ||||||
| 0 | 1 | ||||||
| 0.4 | 1.2 | ||||||
| 0.8 | 1.4 | ||||||
| 2 | 2 | ||||||
| 2.4 | 2.2 |
Excel formula:
=FASTICA({0,0;1,0.4;2,0.8;0,1;0.4,1.2;0.8,1.4;2,2;2.4,2.2}, 2, "parallel", "logcosh", 500, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.23979},{"type":"Double","basicValue":-1.04907}],[{"type":"Double","basicValue":-1.28376},{"type":"Double","basicValue":0.0934344}],[{"type":"Double","basicValue":-1.32773},{"type":"Double","basicValue":1.23593}],[{"type":"Double","basicValue":0.612094},{"type":"Double","basicValue":-1.3068}],[{"type":"Double","basicValue":0.668581},{"type":"Double","basicValue":-0.860105}],[{"type":"Double","basicValue":0.725068},{"type":"Double","basicValue":-0.413413}],[{"type":"Double","basicValue":0.894529},{"type":"Double","basicValue":0.92666}],[{"type":"Double","basicValue":0.951016},{"type":"Double","basicValue":1.37335}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.784725},{"type":"Double","basicValue":1.85189}],[{"type":"Double","basicValue":1.24559},{"type":"Double","basicValue":-0.257729}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.122469},{"type":"Double","basicValue":0.879987}],[{"type":"Double","basicValue":0.591886},{"type":"Double","basicValue":0.372889}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.075}],[{"type":"Double","basicValue":1.125}]]},"n_iter":{"type":"Double","basicValue":3}}}
Example 2: Fit ICA with the deflation algorithm and exponential nonlinearity
Inputs:
| data | | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | deflation | exp | 600 | 0.0001 | 4 |
| 1 | 0.3 | 1.1 | ||||||
| 2 | 0.6 | 1.2 | ||||||
| 0 | 1 | 2 | ||||||
| 0.3 | 1.2 | 2.1 | ||||||
| 0.6 | 1.4 | 2.2 | ||||||
| 2 | 2 | 3 | ||||||
| 2.3 | 2.2 | 3.1 |
Excel formula:
=FASTICA({0,0,1;1,0.3,1.1;2,0.6,1.2;0,1,2;0.3,1.2,2.1;0.6,1.4,2.2;2,2,3;2.3,2.2,3.1}, 2, "deflation", "exp", 600, 0.0001, 4)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.06113},{"type":"Double","basicValue":-1.02565}],[{"type":"Double","basicValue":-1.25726},{"type":"Double","basicValue":0.14239}],[{"type":"Double","basicValue":-1.45339},{"type":"Double","basicValue":1.31042}],[{"type":"Double","basicValue":0.471187},{"type":"Double","basicValue":-1.22815}],[{"type":"Double","basicValue":0.547425},{"type":"Double","basicValue":-0.893195}],[{"type":"Double","basicValue":0.623664},{"type":"Double","basicValue":-0.558241}],[{"type":"Double","basicValue":1.02663},{"type":"Double","basicValue":0.958731}],[{"type":"Double","basicValue":1.10287},{"type":"Double","basicValue":1.29368}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.488436},{"type":"Double","basicValue":0.695373},{"type":"Double","basicValue":0.836946}],[{"type":"Double","basicValue":1.19469},{"type":"Double","basicValue":-0.0320314},{"type":"Double","basicValue":-0.170472}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.120532},{"type":"Double","basicValue":0.881659}],[{"type":"Double","basicValue":0.65395},{"type":"Double","basicValue":0.313877}],[{"type":"Double","basicValue":0.721831},{"type":"Double","basicValue":0.253746}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.025}],[{"type":"Double","basicValue":1.0875}],[{"type":"Double","basicValue":1.9625}]]},"n_iter":{"type":"Double","basicValue":3}}}
Example 3: Estimate a single independent component from two-feature mixtures
Inputs:
| data | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | parallel | cube | 500 | 0.0001 | 2 |
| 1 | 1.5 | ||||||
| 2 | 2 | ||||||
| 3 | 2.5 | ||||||
| 4 | 3 | ||||||
| 5 | 3.5 |
Excel formula:
=FASTICA({0,1;1,1.5;2,2;3,2.5;4,3;5,3.5}, 1, "parallel", "cube", 500, 0.0001, 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.46385}],[{"type":"Double","basicValue":-0.87831}],[{"type":"Double","basicValue":-0.29277}],[{"type":"Double","basicValue":0.29277}],[{"type":"Double","basicValue":0.87831}],[{"type":"Double","basicValue":1.46385}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.468432},{"type":"Double","basicValue":0.234216}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":1.70783}],[{"type":"Double","basicValue":0.853913}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":2.5}],[{"type":"Double","basicValue":2.25}]]},"n_iter":{"type":"Double","basicValue":1}}}
Example 4: Recover two components from a three-feature mixed signal matrix
Inputs:
| data | | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.5 | 2 | parallel | logcosh | 700 | 0.0001 | 6 |
| 2 | 0.5 | 1 | ||||||
| 3 | 1 | 1.5 | ||||||
| 0.5 | 2 | 1 | ||||||
| 1 | 2.5 | 1.5 | ||||||
| 1.5 | 3 | 2 | ||||||
| 3 | 3 | 3 | ||||||
| 3.5 | 3.5 | 3.5 |
Excel formula:
=FASTICA({1,0,0.5;2,0.5,1;3,1,1.5;0.5,2,1;1,2.5,1.5;1.5,3,2;3,3,3;3.5,3.5,3.5}, 2, "parallel", "logcosh", 700, 0.0001, 6)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.40153},{"type":"Double","basicValue":-0.694154}],[{"type":"Double","basicValue":-1.26564},{"type":"Double","basicValue":0.167602}],[{"type":"Double","basicValue":-1.12976},{"type":"Double","basicValue":1.02936}],[{"type":"Double","basicValue":0.382655},{"type":"Double","basicValue":-1.43255}],[{"type":"Double","basicValue":0.693594},{"type":"Double","basicValue":-0.986126}],[{"type":"Double","basicValue":1.00453},{"type":"Double","basicValue":-0.539703}],[{"type":"Double","basicValue":0.702604},{"type":"Double","basicValue":1.00457}],[{"type":"Double","basicValue":1.01354},{"type":"Double","basicValue":1.451}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.350111},{"type":"Double","basicValue":0.748755},{"type":"Double","basicValue":0.223235}],[{"type":"Double","basicValue":0.830667},{"type":"Double","basicValue":-0.2361},{"type":"Double","basicValue":0.298278}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0906056},{"type":"Double","basicValue":1.03867}],[{"type":"Double","basicValue":1.17569},{"type":"Double","basicValue":0.281982}],[{"type":"Double","basicValue":0.678285},{"type":"Double","basicValue":0.683204}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.9375}],[{"type":"Double","basicValue":1.9375}],[{"type":"Double","basicValue":1.75}]]},"n_iter":{"type":"Double","basicValue":5}}}
Python Code
Show Code
import numpy as np
from sklearn.decomposition import FastICA as SklearnFastICA
def fastica(data, n_components=None, ica_algorithm='parallel', ica_fun='logcosh', max_iter=300, tol=0.0001, random_state=None):
"""
Fit independent component analysis and return source signals with unmixing matrices.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of independent components to estimate. Leave blank to keep the estimator default. Default is None.
ica_algorithm (str, optional): Optimization strategy used to estimate the independent components. Valid options: Parallel, Deflation. Default is 'parallel'.
ica_fun (str, optional): Nonlinearity used to approximate negentropy during fitting. Valid options: Logcosh, Exp, Cube. Default is 'logcosh'.
max_iter (int, optional): Maximum number of fitting iterations. Default is 300.
tol (float, optional): Positive convergence tolerance for the unmixing update. Default is 0.0001.
random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing recovered sources, unmixing matrices, and iteration summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
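    # Canonical form for ICA output: fix each component's sign from its largest-magnitude
    # unmixing weight, then sort components so the returned order is deterministic.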
def canonicalize_ica(sources, components, mixing):
source_np = np.array(sources, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
mixing_np = np.array(mixing, dtype=float, copy=True)
limit = min(source_np.shape[1], component_np.shape[0], mixing_np.shape[1])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and source_np.shape[0] > 0:
source_column = source_np[:, index]
pivot_value = source_column[int(np.argmax(np.abs(source_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
source_np[:, index] *= -1.0
mixing_np[:, index] *= -1.0
order = sorted(
range(limit),
key=lambda index: tuple(np.round(component_np[index, :], 12).tolist())
)
source_np = source_np[:, order]
component_np = component_np[order, :]
mixing_np = mixing_np[:, order]
return source_np, component_np, mixing_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total is not None and (component_total < 1 or component_total > max_components):
return f"Error: n_components must be between 1 and {max_components}"
algorithm_value = str(ica_algorithm).strip().lower()
if algorithm_value not in {"parallel", "deflation"}:
return "Error: ica_algorithm must be 'parallel' or 'deflation'"
fun_value = str(ica_fun).strip().lower()
if fun_value not in {"logcosh", "exp", "cube"}:
return "Error: ica_fun must be 'logcosh', 'exp', or 'cube'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
if float(tol) <= 0:
return "Error: tol must be greater than 0"
fitted = SklearnFastICA(
n_components=component_total,
algorithm=algorithm_value,
fun=fun_value,
max_iter=int(max_iter),
tol=float(tol),
random_state=None if random_state in (None, "") else int(random_state)
)
sources_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
mixing_np = np.asarray(fitted.mixing_, dtype=float)
sources_np, components_np, mixing_np = canonicalize_ica(sources_np, components_np, mixing_np)
feature_means = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
return {
"type": "Double",
"basicValue": float(components_np.shape[0]),
"properties": {
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"sources": {"type": "Array", "elements": mat(sources_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"mixing": {"type": "Array", "elements": mat(mixing_np.tolist())},
"feature_means": {"type": "Array", "elements": col(feature_means.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
KBEST_CLASSIF
SelectKBest keeps the features with the strongest univariate relationship to a classification target according to a scoring function. This wrapper fixes the score function to ANOVA F-values, which works well for dense numeric tabular inputs with categorical class labels.
The ANOVA F-value for a feature is calculated as:
F = \frac{\text{between-group variance}}{\text{within-group variance}}
This wrapper accepts rows as samples and columns as features, plus a target supplied as a single row, single column, or scalar when only one sample is present. It returns the filtered feature matrix together with feature scores, p-values, a boolean support mask, and the 1-based indices of the selected columns.
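As a sanity check on the F-ratio, the sketch below reuses the single feature and two-class target from Example 4 further down, computes the one-way ANOVA F-value by hand, and compares it with sklearn's f_classif:

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Same single feature and two-class target as Example 4.
x = np.array([0.1, 0.2, 0.3, 1.2, 1.3, 1.4]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# One-way ANOVA F: between-group mean square over within-group mean square.
groups = [x[y == 0], x[y == 1]]
grand_mean = x.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (len(groups) - 1)) / (ss_within / (len(x) - len(groups)))

f_sklearn, p_values = f_classif(x, y)
print(f_manual, f_sklearn[0])  # both 181.5, the score reported in Example 4
```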
Excel Usage
=KBEST_CLASSIF(data, target, k)
data(list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target(list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
k(int, optional, default: 1): Number of top-scoring features to keep.
Returns (dict): Excel data type containing the selected feature matrix, feature scores, p-values, and support mask.
Example 1: Keep the two strongest features for a binary numeric target
Inputs:
| data | | | target | k |
|---|---|---|---|---|
| 0.1 | 10 | 1 | 0 | 2 |
| 0 | 11 | 1.1 | 0 | |
| 0 | 12 | 0.9 | 0 | |
| 1 | 30 | 5 | 1 | |
| 1 | 31 | 4.9 | 1 | |
| 1 | 29 | 5.1 | 1 |
Excel formula:
=KBEST_CLASSIF({0.1,10,1;0,11,1.1;0,12,0.9;1,30,5;1,31,4.9;1,29,5.1}, {0;0;0;1;1;1}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":841}],[{"type":"Double","basicValue":541.5}],[{"type":"Double","basicValue":2400}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000841636}],[{"type":"Double","basicValue":0.0000202128}],[{"type":"Double","basicValue":0.00000103878}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.9}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":4.9}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":5.1}]]}}}
Example 2: Flatten a single-row string target range for feature ranking
Inputs:
| data | | | target | | | | | | k |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 0 | cold | cold | cold | hot | hot | hot | 2 |
| 1.2 | 4.8 | 0.1 | |||||||
| 0.8 | 5.2 | 0 | |||||||
| 5 | 1 | 3 | |||||||
| 5.1 | 1.2 | 3.1 | |||||||
| 4.9 | 0.8 | 2.9 |
Excel formula:
=KBEST_CLASSIF({1,5,0;1.2,4.8,0.1;0.8,5.2,0;5,1,3;5.1,1.2,3.1;4.9,0.8,2.9}, {"cold","cold","cold","hot","hot","hot"}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":960}],[{"type":"Double","basicValue":600}],[{"type":"Double","basicValue":1980.25}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000646545}],[{"type":"Double","basicValue":0.0000164831}],[{"type":"Double","basicValue":0.00000152493}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.2},{"type":"Double","basicValue":0.1}],[{"type":"Double","basicValue":0.8},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":5},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5.1},{"type":"Double","basicValue":3.1}],[{"type":"Double","basicValue":4.9},{"type":"Double","basicValue":2.9}]]}}}
Example 3: Select two features across three separated classes
Inputs:
| data | | | | target | k |
|---|---|---|---|---|---|
| 0 | 0 | 10 | 1 | left | 2 |
| 0.1 | 0.2 | 9.8 | 1.2 | left | |
| 5 | 5 | 2 | 10 | center | |
| 5.2 | 4.8 | 2.1 | 9.7 | center | |
| 10 | 0 | 5 | 20 | right | |
| 10.1 | 0.2 | 5.2 | 19.8 | right |
Excel formula:
=KBEST_CLASSIF({0,0,10,1;0.1,0.2,9.8,1.2;5,5,2,10;5.2,4.8,2.1,9.7;10,0,5,20;10.1,0.2,5.2,19.8}, {"left";"left";"center";"center";"right";"right"}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":4}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":5000.17}],[{"type":"Double","basicValue":768}],[{"type":"Double","basicValue":2088.11}],[{"type":"Double","basicValue":6247.12}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000519356}],[{"type":"Double","basicValue":0.0000860645}],[{"type":"Double","basicValue":0.0000192326}],[{"type":"Double","basicValue":0.0000037193}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":1.2}],[{"type":"Double","basicValue":5},{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":5.2},{"type":"Double","basicValue":9.7}],[{"type":"Double","basicValue":10},{"type":"Double","basicValue":20}],[{"type":"Double","basicValue":10.1},{"type":"Double","basicValue":19.8}]]}}}
Example 4: Rank a single strong feature against a boolean target
Inputs:
| data | target | k |
|---|---|---|
| 0.1 | false | 1 |
| 0.2 | false | |
| 0.3 | false | |
| 1.2 | true | |
| 1.3 | true | |
| 1.4 | true |
Excel formula:
=KBEST_CLASSIF({0.1;0.2;0.3;1.2;1.3;1.4}, {FALSE;FALSE;FALSE;TRUE;TRUE;TRUE}, 1)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":181.5}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000175635}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0.1}],[{"type":"Double","basicValue":0.2}],[{"type":"Double","basicValue":0.3}],[{"type":"Double","basicValue":1.2}],[{"type":"Double","basicValue":1.3}],[{"type":"Double","basicValue":1.4}]]}}}
Python Code
import numpy as np
from sklearn.feature_selection import SelectKBest as SklearnSelectKBest
from sklearn.feature_selection import f_classif as sklearn_f_classif
def kbest_classif(data, target, k=1):
"""
Select the top-scoring classification features and return the filtered matrix with score summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
k (int, optional): Number of top-scoring features to keep. Default is 1.
Returns:
dict: Excel data type containing the selected feature matrix, feature scores, p-values, and support mask.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
def parse_target(value, sample_count):
if not isinstance(value, list):
labels = [value]
elif not value:
return None, None, "Error: target must be non-empty"
elif all(not isinstance(item, list) for item in value):
labels = value
elif len(value) == 1:
labels = value[0]
elif all(isinstance(row, list) and len(row) == 1 for row in value):
labels = [row[0] for row in value]
else:
return None, None, "Error: target must be a single row or column"
if len(labels) != sample_count:
return None, None, "Error: target length must match sample count"
parsed = []
classes = []
for item in labels:
item = py(item)
if isinstance(item, str):
if not item.strip():
return None, None, "Error: target labels must not be blank"
elif isinstance(item, bool):
item = bool(item)
elif isinstance(item, (int, float)) and not isinstance(item, bool):
if not np.isfinite(float(item)):
return None, None, "Error: target labels must be finite"
item = float(item) if isinstance(item, float) else int(item)
else:
return None, None, "Error: target labels must be scalar string, boolean, or numeric values"
parsed.append(item)
if not any(type(existing) is type(item) and existing == item for existing in classes):
classes.append(item)
if len(classes) < 2:
return None, None, "Error: target must contain at least 2 classes"
return np.asarray(parsed, dtype=object), classes, None
try:
data_np, error = parse_data(data)
if error:
return error
target_np, classes, error = parse_target(target, data_np.shape[0])
if error:
return error
selected_total = int(k)
if selected_total < 1 or selected_total > data_np.shape[1]:
return f"Error: k must be between 1 and {data_np.shape[1]}"
fitted = SklearnSelectKBest(score_func=sklearn_f_classif, k=selected_total)
transformed_np = np.asarray(fitted.fit_transform(data_np, target_np), dtype=float)
support_mask = fitted.get_support().tolist()
selected_indices = [index + 1 for index, keep in enumerate(support_mask) if keep]
scores = np.atleast_1d(np.asarray(fitted.scores_, dtype=float))
p_values = np.atleast_1d(np.asarray(fitted.pvalues_, dtype=float))
return {
"type": "Double",
"basicValue": float(len(selected_indices)),
"properties": {
"selected_feature_count": {"type": "Double", "basicValue": float(len(selected_indices))},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"class_count": {"type": "Double", "basicValue": float(len(classes))},
"classes": {"type": "Array", "elements": col(classes)},
"selected_indices": {"type": "Array", "elements": col(selected_indices)},
"support_mask": {"type": "Array", "elements": col(support_mask)},
"scores": {"type": "Array", "elements": col(scores.tolist())},
"p_values": {"type": "Array", "elements": col(p_values.tolist())},
"selected_data": {"type": "Array", "elements": mat(transformed_np.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
KERNEL_PCA
Kernel PCA performs principal component analysis in an implicit feature space defined by a kernel function. It can capture nonlinear structure that ordinary linear PCA cannot represent directly.
The kernel trick replaces dot products in the feature space with a kernel function evaluation:
K(x_i, x_j) = \phi(x_i)^T \phi(x_j)
where \phi is the implicit nonlinear mapping.
This wrapper accepts rows as samples and columns as features. It returns the nonlinear embedding together with retained eigenvalues and the number of extracted components.
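One quick way to see the relationship to ordinary PCA (a minimal sketch on the same matrix as Example 1; the comparison is ours, not part of the wrapper) is that KernelPCA with a linear kernel reproduces standard PCA scores up to per-component sign:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

# Same matrix as Example 1 below.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0],
              [4.0, 2.0], [5.0, 3.0], [6.0, 5.0]])

pca_scores = PCA(n_components=2).fit_transform(X)
kpca_scores = KernelPCA(n_components=2, kernel="linear").fit_transform(X)

# Agreement up to per-component sign flips.
print(np.allclose(np.abs(pca_scores), np.abs(kpca_scores), atol=1e-6))
```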
Excel Usage
=KERNEL_PCA(data, n_components, kpca_kernel, gamma, degree, coef_zero, kpca_solver, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of nonlinear components to keep.
kpca_kernel(str, optional, default: "linear"): Kernel function used to build the implicit feature space. Valid options: "linear", "rbf", "poly", "sigmoid", "cosine".
gamma(float, optional, default: null): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default.
degree(int, optional, default: 3): Polynomial degree when using the polynomial kernel.
coef_zero(float, optional, default: 1): Independent kernel term for polynomial and sigmoid kernels.
kpca_solver(str, optional, default: "auto"): Eigensolver used to extract kernel principal components. Valid options: "auto", "dense", "arpack", "randomized".
random_state(int, optional, default: null): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
Example 1: Compute a linear kernel PCA embedding on a dense matrix
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 2 | linear | | 3 | 1 | dense | 0 |
| 2 | 1 | |||||||
| 3 | 1 | |||||||
| 4 | 2 | |||||||
| 5 | 3 | |||||||
| 6 | 5 |
Excel formula:
=KERNEL_PCA({1,0;2,1;3,1;4,2;5,3;6,5}, 2, "linear", , 3, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":33.5,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":33.5},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.18939},{"type":"Double","basicValue":0.27894}],[{"type":"Double","basicValue":-1.77556},{"type":"Double","basicValue":0.312058}],[{"type":"Double","basicValue":-1.05209},{"type":"Double","basicValue":-0.378295}],[{"type":"Double","basicValue":0.361736},{"type":"Double","basicValue":-0.345177}],[{"type":"Double","basicValue":1.77556},{"type":"Double","basicValue":-0.312058}],[{"type":"Double","basicValue":3.87974},{"type":"Double","basicValue":0.444532}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":32.7676}],[{"type":"Double","basicValue":0.732432}]]}}}
Example 2: Embed nonlinear structure with an RBF kernel
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | rbf | 0.5 | 3 | 1 | dense | 0 |
| 0 | 1 | |||||||
| 1 | 0 | |||||||
| 1 | 1 | |||||||
| 3 | 3 | |||||||
| 3 | 4 | |||||||
| 4 | 3 | |||||||
| 4 | 4 |
Excel formula:
=KERNEL_PCA({0,0;0,1;1,0;1,1;3,3;3,4;4,3;4,4}, 2, "rbf", 0.5, 3, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":3.21596,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":3.21596},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":0.569339},{"type":"Double","basicValue":-0.397182}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.569339},{"type":"Double","basicValue":-0.397182}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":2.57475}],[{"type":"Double","basicValue":0.641216}]]}}}
Example 3: Use a polynomial kernel to capture curved structure
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| -2 | 4 | 2 | poly | 0.5 | 2 | 1 | dense | 0 |
| -1 | 1 | |||||||
| 0 | 0 | |||||||
| 1 | 1 | |||||||
| 2 | 4 | |||||||
| 3 | 9 |
Excel formula:
=KERNEL_PCA({-2,4;-1,1;0,0;1,1;2,4;3,9}, 2, "poly", 0.5, 2, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":1666.56,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":1666.56},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-5.31672},{"type":"Double","basicValue":7.70236}],[{"type":"Double","basicValue":-10.2249},{"type":"Double","basicValue":-0.491365}],[{"type":"Double","basicValue":-10.4901},{"type":"Double","basicValue":-1.83236}],[{"type":"Double","basicValue":-9.44703},{"type":"Double","basicValue":-2.28291}],[{"type":"Double","basicValue":0.0169504},{"type":"Double","basicValue":-2.96006}],[{"type":"Double","basicValue":35.4618},{"type":"Double","basicValue":-0.135669}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":1589.65}],[{"type":"Double","basicValue":76.9174}]]}}}
Example 4: Use the ARPACK solver for a seeded compact embedding
Inputs:
| data | | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 2 | linear | | 3 | 1 | arpack | 4 |
| 2 | 1 | 1 | |||||||
| 3 | 2 | 1 | |||||||
| 4 | 3 | 2 | |||||||
| 5 | 5 | 3 | |||||||
| 6 | 8 | 5 |
Excel formula:
=KERNEL_PCA({1,1,0;2,1,1;3,2,1;4,3,2;5,5,3;6,8,5}, 2, "linear", , 3, 1, "arpack", 4)
Expected output:
{"type":"Double","basicValue":70.5138,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":70.5138},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.88006},{"type":"Double","basicValue":0.776081}],[{"type":"Double","basicValue":-2.9143},{"type":"Double","basicValue":-0.0346085}],[{"type":"Double","basicValue":-1.69564},{"type":"Double","basicValue":-0.332132}],[{"type":"Double","basicValue":0.000600645},{"type":"Double","basicValue":-0.600607}],[{"type":"Double","basicValue":2.42731},{"type":"Double","basicValue":-0.326868}],[{"type":"Double","basicValue":6.06208},{"type":"Double","basicValue":0.518135}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":69.0639}],[{"type":"Double","basicValue":1.44985}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import KernelPCA as SklearnKernelPCA
def kernel_pca(data, n_components=2, kpca_kernel='linear', gamma=None, degree=3, coef_zero=1, kpca_solver='auto', random_state=None):
"""
Fit kernel PCA and return nonlinear embeddings with retained eigenvalue summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of nonlinear components to keep. Default is 2.
kpca_kernel (str, optional): Kernel function used to build the implicit feature space. Valid options: Linear, RBF, Polynomial, Sigmoid, Cosine. Default is 'linear'.
gamma (float, optional): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default. Default is None.
degree (int, optional): Polynomial degree when using the polynomial kernel. Default is 3.
coef_zero (float, optional): Independent kernel term for polynomial and sigmoid kernels. Default is 1.
kpca_solver (str, optional): Eigensolver used to extract kernel principal components. Valid options: 'auto', 'dense', 'arpack', 'randomized'. Default is 'auto'.
random_state (int, optional): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_embedding(embedding):
embedding_np = np.array(embedding, dtype=float, copy=True)
for index in range(embedding_np.shape[1]):
column = embedding_np[:, index]
pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
if pivot_value < 0:
embedding_np[:, index] *= -1.0
return embedding_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = data_np.shape[0]
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
kernel_value = str(kpca_kernel).strip().lower()
if kernel_value not in {"linear", "rbf", "poly", "sigmoid", "cosine"}:
return "Error: kpca_kernel must be 'linear', 'rbf', 'poly', 'sigmoid', or 'cosine'"
solver_value = str(kpca_solver).strip().lower()
if solver_value not in {"auto", "dense", "arpack", "randomized"}:
return "Error: kpca_solver must be 'auto', 'dense', 'arpack', or 'randomized'"
if solver_value == "arpack" and component_total >= data_np.shape[0]:
return "Error: n_components must be below the sample count when kpca_solver is 'arpack'"
if int(degree) < 1:
return "Error: degree must be at least 1"
gamma_value = None if gamma in (None, "") else float(gamma)
if gamma_value is not None and gamma_value <= 0:
return "Error: gamma must be greater than 0 when provided"
fitted = SklearnKernelPCA(
n_components=component_total,
kernel=kernel_value,
gamma=gamma_value,
degree=int(degree),
coef0=float(coef_zero),
eigen_solver=solver_value,
random_state=None if random_state in (None, "") else int(random_state)
)
embedding_np = orient_embedding(fitted.fit_transform(data_np))
eigenvalues = np.atleast_1d(np.asarray(fitted.eigenvalues_, dtype=float))
retained_sum = float(np.sum(eigenvalues))
return {
"type": "Double",
"basicValue": retained_sum,
"properties": {
"retained_eigenvalue_sum": {"type": "Double", "basicValue": retained_sum},
"component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
"eigenvalues": {"type": "Array", "elements": col(eigenvalues.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
NMF_FACTOR
Nonnegative matrix factorization decomposes a nonnegative data matrix into a product of two smaller nonnegative matrices. It is commonly used for topic-like count matrices, additive parts-based representations, and compact nonnegative latent factors.
The decomposition approximates the data matrix V:
V \approx W H
where W contains the nonnegative factor weights and H contains the nonnegative components.
This wrapper accepts rows as samples and columns as features. It returns the sample-by-component weight matrix together with the learned component matrix, reconstruction error, and iteration count for the fitted factorization.
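Before the Excel-facing details, a minimal scikit-learn sketch of the factorization itself, using the count-like matrix from Example 1 below; with the default Frobenius loss, the reported reconstruction error is the Frobenius norm of the residual V - WH:
import numpy as np
from sklearn.decomposition import NMF

V = np.array([[4.0, 1.0, 0.0, 0.0], [5.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0], [0.0, 0.0, 4.0, 5.0],
              [3.0, 1.0, 0.0, 0.0], [0.0, 0.0, 5.0, 4.0]])
model = NMF(n_components=2, init="nndsvda", solver="cd", max_iter=600, random_state=0)
W = model.fit_transform(V)   # sample-by-component weights
H = model.components_        # component-by-feature matrix
# The residual norm matches the estimator's reported reconstruction error
assert np.isclose(np.linalg.norm(V - W @ H), model.reconstruction_err_)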
Excel Usage
=NMF_FACTOR(data, n_components, nmf_init, nmf_solver, max_iter, random_state)
data(list[list], required): 2D array of nonnegative numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of latent components to learn.
nmf_init(str, optional, default: “nndsvda”): Initialization strategy for the nonnegative factors.
nmf_solver(str, optional, default: “cd”): Numerical solver used to optimize the factorization.
max_iter(int, optional, default: 400): Maximum number of optimization iterations.
random_state(int, optional, default: null): Integer seed used by randomized initialization paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing factor weights, learned components, and reconstruction error.
Example 1: Factor a small count-like matrix into two nonnegative components
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | |||
|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 2 | nndsvda | cd | 600 | 0 |
| 5 | 1 | 0 | 0 | |||||
| 0 | 0 | 3 | 4 | |||||
| 0 | 0 | 4 | 5 | |||||
| 3 | 1 | 0 | 0 | |||||
| 0 | 0 | 5 | 4 |
Excel formula:
=NMF_FACTOR({4,1,0,0;5,1,0,0;0,0,3,4;0,0,4,5;3,1,0,0;0,0,5,4}, 2, "nndsvda", "cd", 600, 0)
Expected output:
{"type":"Double","basicValue":1.22303,"properties":{"reconstruction_error":{"type":"Double","basicValue":1.22303},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.23956}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.53187}],[{"type":"Double","basicValue":0.32396},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.416078},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.947249}],[{"type":"Double","basicValue":0.412981},{"type":"Double","basicValue":3.06249e-8}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":10.4789},{"type":"Double","basicValue":11.2079}],[{"type":"Double","basicValue":3.23389},{"type":"Double","basicValue":0.777898},{"type":"Double","basicValue":4.30763e-9},{"type":"Double","basicValue":0}]]},"n_iter":{"type":"Double","basicValue":7}}}
Example 2: Fit one nonnegative factor on a one-trend matrix
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | |
|---|---|---|---|---|---|---|
| 1 | 2 | 1 | nndsvda | cd | 600 | 0 |
| 2 | 4 | |||||
| 3 | 6 | |||||
| 4 | 8 | |||||
| 5 | 10 |
Excel formula:
=NMF_FACTOR({1,2;2,4;3,6;4,8;5,10}, 1, "nndsvda", "cd", 600, 0)
Expected output:
{"type":"Double","basicValue":2.49492e-15,"properties":{"reconstruction_error":{"type":"Double","basicValue":2.49492e-15},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5491}],[{"type":"Double","basicValue":1.0982}],[{"type":"Double","basicValue":1.6473}],[{"type":"Double","basicValue":2.1964}],[{"type":"Double","basicValue":2.7455}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.82116},{"type":"Double","basicValue":3.64232}]]},"n_iter":{"type":"Double","basicValue":5}}}
Example 3: Use seeded random initialization for two-factor fitting
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | ||
|---|---|---|---|---|---|---|---|
| 2 | 0 | 1 | 2 | random | cd | 600 | 7 |
| 3 | 0 | 1 | |||||
| 0 | 4 | 2 | |||||
| 0 | 5 | 3 | |||||
| 2 | 0 | 2 | |||||
| 0 | 4 | 3 |
Excel formula:
=NMF_FACTOR({2,0,1;3,0,1;0,4,2;0,5,3;2,0,2;0,4,3}, 2, "random", "cd", 600, 7)
Expected output:
{"type":"Double","basicValue":1.06794,"properties":{"reconstruction_error":{"type":"Double","basicValue":1.06794},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.911616},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.27198},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3.66795}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":4.80079}],[{"type":"Double","basicValue":1.06152},{"type":"Double","basicValue":0.337401}],[{"type":"Double","basicValue":0.0791118},{"type":"Double","basicValue":4.06045}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":2.16697},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.14776}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.03418},{"type":"Double","basicValue":0.63678}]]},"n_iter":{"type":"Double","basicValue":10}}}
Example 4: Fit a nonnegative matrix with the multiplicative update solver
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | ||
|---|---|---|---|---|---|---|---|
| 3 | 1 | 0 | 2 | nndsvda | mu | 800 | 0 |
| 4 | 1 | 0 | |||||
| 0 | 2 | 5 | |||||
| 0 | 3 | 6 | |||||
| 3 | 1 | 1 | |||||
| 0 | 2 | 4 |
Excel formula:
=NMF_FACTOR({3,1,0;4,1,0;0,2,5;0,3,6;3,1,1;0,2,4}, 2, "nndsvda", "mu", 800, 0)
Expected output:
{"type":"Double","basicValue":0.471063,"properties":{"reconstruction_error":{"type":"Double","basicValue":0.471063},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0169444},{"type":"Double","basicValue":1.07715}],[{"type":"Double","basicValue":1.2837e-8},{"type":"Double","basicValue":1.41179}],[{"type":"Double","basicValue":1.17558},{"type":"Double","basicValue":0.0000200891}],[{"type":"Double","basicValue":1.46493},{"type":"Double","basicValue":0.0153092}],[{"type":"Double","basicValue":0.218198},{"type":"Double","basicValue":1.04456}],[{"type":"Double","basicValue":0.976617},{"type":"Double","basicValue":0.0102081}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":2.41145e-13},{"type":"Double","basicValue":1.92801},{"type":"Double","basicValue":4.14787}],[{"type":"Double","basicValue":2.82985},{"type":"Double","basicValue":0.721546},{"type":"Double","basicValue":0.0127197}]]},"n_iter":{"type":"Double","basicValue":70}}}
Python Code
import numpy as np
from sklearn.decomposition import NMF as SklearnNMF
def nmf_factor(data, n_components=2, nmf_init='nndsvda', nmf_solver='cd', max_iter=400, random_state=None):
"""
Fit nonnegative matrix factorization and return factor weights with learned components.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of nonnegative numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of latent components to learn. Default is 2.
nmf_init (str, optional): Initialization strategy for the nonnegative factors. Valid options: 'nndsvda', 'nndsvd', 'nndsvdar', 'random'. Default is 'nndsvda'.
nmf_solver (str, optional): Numerical solver used to optimize the factorization. Valid options: 'cd' (coordinate descent), 'mu' (multiplicative update). Default is 'cd'.
max_iter (int, optional): Maximum number of optimization iterations. Default is 400.
random_state (int, optional): Integer seed used by randomized initialization paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing factor weights, learned components, and reconstruction error.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if np.any(data_np < 0):
return None, "Error: data must contain only nonnegative numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
init_value = str(nmf_init).strip().lower()
if init_value not in {"nndsvda", "nndsvd", "nndsvdar", "random"}:
return "Error: nmf_init must be 'nndsvda', 'nndsvd', 'nndsvdar', or 'random'"
solver_value = str(nmf_solver).strip().lower()
if solver_value not in {"cd", "mu"}:
return "Error: nmf_solver must be 'cd' or 'mu'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
fitted = SklearnNMF(
n_components=component_total,
init=init_value,
solver=solver_value,
max_iter=int(max_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
weights_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
reconstruction_error = float(fitted.reconstruction_err_)
return {
"type": "Double",
"basicValue": reconstruction_error,
"properties": {
"reconstruction_error": {"type": "Double", "basicValue": reconstruction_error},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"weights": {"type": "Array", "elements": mat(weights_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
PCA_REDUCE
Principal component analysis projects tabular data onto orthogonal directions that capture the largest possible variance in descending order. It is commonly used to compress correlated features into a smaller set of informative components.
For a centered data matrix X, the projection is given by:
X = T P^T
where T are the principal component scores and P are the orthogonal principal axes (loadings).
This wrapper accepts rows as samples and columns as features. It returns the transformed component scores together with component loadings, explained variance values, explained variance ratios, and singular values for the fitted projection.
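The score/loading relationship above can be verified directly with scikit-learn. A minimal sketch using the matrix from Example 1 below (whitening off, so scores are plain projections of the centered data):
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.0, 1.0, 0.0], [3.0, 1.0, 1.0], [4.0, 2.0, 1.0],
              [5.0, 3.0, 2.0], [6.0, 4.0, 3.0], [7.0, 5.0, 4.0]])
pca = PCA(n_components=2, svd_solver="full")
T = pca.fit_transform(X)
# T = (X - mu) P^T: center, then project onto the principal axes
assert np.allclose(T, (X - pca.mean_) @ pca.components_.T)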
Excel Usage
=PCA_REDUCE(data, n_components, whiten, pca_solver, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of principal components to keep. Leave blank to keep the estimator default.
whiten(bool, optional, default: false): Whether to rescale components to unit variance after projection.
pca_solver(str, optional, default: “auto”): Singular value decomposition strategy used to fit the PCA model.
random_state(int, optional, default: null): Integer seed used when the randomized solver path is active. Leave blank for the estimator default.
Returns (dict): Excel data type containing projected scores, component loadings, and explained variance summaries.
Example 1: Reduce a dense three-feature matrix with the full solver
Inputs:
| data | n_components | whiten | pca_solver | random_state | ||
|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 2 | false | full | 0 |
| 3 | 1 | 1 | ||||
| 4 | 2 | 1 | ||||
| 5 | 3 | 2 | ||||
| 6 | 4 | 3 | ||||
| 7 | 5 | 4 |
Excel formula:
=PCA_REDUCE({2,1,0;3,1,1;4,2,1;5,3,2;6,4,3;7,5,4}, 2, FALSE, "full", 0)
Expected output:
{"type":"Double","basicValue":0.994454,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.994454},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.49841},{"type":"Double","basicValue":0.277463}],[{"type":"Double","basicValue":-2.34023},{"type":"Double","basicValue":-0.495054}],[{"type":"Double","basicValue":-1.12509},{"type":"Double","basicValue":0.142399}],[{"type":"Double","basicValue":0.598075},{"type":"Double","basicValue":0.0837313}],[{"type":"Double","basicValue":2.32124},{"type":"Double","basicValue":0.0250639}],[{"type":"Double","basicValue":4.04441},{"type":"Double","basicValue":-0.0336034}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.650144},{"type":"Double","basicValue":0.564993},{"type":"Double","basicValue":0.508031}],[{"type":"Double","basicValue":-0.0763973},{"type":"Double","basicValue":0.713849},{"type":"Double","basicValue":-0.69612}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.986028}],[{"type":"Double","basicValue":0.00842663}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":8.2169}],[{"type":"Double","basicValue":0.0702219}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":6.40972}],[{"type":"Double","basicValue":0.592545}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":2.66667}],[{"type":"Double","basicValue":1.83333}]]}}}
Example 2: Whiten a centered matrix into two principal components
Inputs:
| data | n_components | whiten | pca_solver | random_state | ||
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 2 | true | full | 0 |
| 2 | 4 | 5 | ||||
| 3 | 6 | 7 | ||||
| 4 | 8 | 9 | ||||
| 5 | 10 | 11 | ||||
| 6 | 12 | 13 |
Excel formula:
=PCA_REDUCE({1,2,3;2,4,5;3,6,7;4,8,9;5,10,11;6,12,13}, 2, TRUE, "full", 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.33631},{"type":"Double","basicValue":-1.78183}],[{"type":"Double","basicValue":-0.801784},{"type":"Double","basicValue":0.503593}],[{"type":"Double","basicValue":-0.267261},{"type":"Double","basicValue":0.0925721}],[{"type":"Double","basicValue":0.267261},{"type":"Double","basicValue":-0.0925721}],[{"type":"Double","basicValue":0.801784},{"type":"Double","basicValue":-0.503593}],[{"type":"Double","basicValue":1.33631},{"type":"Double","basicValue":-1.14049}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.333333},{"type":"Double","basicValue":0.666667},{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":0.942809},{"type":"Double","basicValue":-0.235702},{"type":"Double","basicValue":-0.235702}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.70433e-33}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":31.5}],[{"type":"Double","basicValue":5.36864e-32}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":12.5499}],[{"type":"Double","basicValue":5.18104e-16}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":7}],[{"type":"Double","basicValue":8}]]}}}
Example 3: Reduce one-feature data to a single component
Inputs:
| data | n_components | whiten | pca_solver | random_state |
|---|---|---|---|---|
| 1 | 1 | false | full | 0 |
| 2 | ||||
| 4 | ||||
| 7 | ||||
| 11 |
Excel formula:
=PCA_REDUCE({1;2;4;7;11}, 1, FALSE, "full", 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4}],[{"type":"Double","basicValue":-3}],[{"type":"Double","basicValue":-1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":6}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":16.5}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.12404}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":5}]]}}}
Example 4: Use a seeded randomized PCA fit on a wider matrix
Inputs:
| data | n_components | whiten | pca_solver | random_state | |||
|---|---|---|---|---|---|---|---|
| 1 | 0 | 2 | 1 | 2 | false | randomized | 7 |
| 2 | 1 | 3 | 1 | ||||
| 3 | 1 | 4 | 2 | ||||
| 4 | 2 | 5 | 3 | ||||
| 5 | 3 | 6 | 5 | ||||
| 6 | 5 | 7 | 8 |
Excel formula:
=PCA_REDUCE({1,0,2,1;2,1,3,1;3,1,4,2;4,2,5,3;5,3,6,5;6,5,7,8}, 2, FALSE, "randomized", 7)
Expected output:
{"type":"Double","basicValue":0.996287,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.996287},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4.59092},{"type":"Double","basicValue":0.905985}],[{"type":"Double","basicValue":-3.27712},{"type":"Double","basicValue":-0.00260405}],[{"type":"Double","basicValue":-1.7394},{"type":"Double","basicValue":-0.460947}],[{"type":"Double","basicValue":0.225926},{"type":"Double","basicValue":-0.747519}],[{"type":"Double","basicValue":2.84277},{"type":"Double","basicValue":-0.412074}],[{"type":"Double","basicValue":6.53874},{"type":"Double","basicValue":0.717159}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.427595},{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.651526}],[{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.171771},{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.622017}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.970519}],[{"type":"Double","basicValue":0.0257683}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":17.1458}],[{"type":"Double","basicValue":0.455239}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.259}],[{"type":"Double","basicValue":1.50871}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":3.33333}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import PCA as SklearnPCA
def pca_reduce(data, n_components=None, whiten=False, pca_solver='auto', random_state=None):
"""
Fit principal component analysis and return projected scores with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of principal components to keep. Leave blank to keep the estimator default. Default is None.
whiten (bool, optional): Whether to rescale components to unit variance after projection. Default is False.
pca_solver (str, optional): Singular value decomposition strategy used to fit the PCA model. Valid options: 'auto', 'full', 'randomized', 'arpack'. Default is 'auto'.
random_state (int, optional): Integer seed used when the randomized solver path is active. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing projected scores, component loadings, and explained variance summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total is not None and (component_total < 1 or component_total > max_components):
return f"Error: n_components must be between 1 and {max_components}"
solver_value = str(pca_solver).strip().lower()
if solver_value not in {"auto", "full", "randomized", "arpack"}:
return "Error: pca_solver must be 'auto', 'full', 'randomized', or 'arpack'"
if solver_value == "arpack" and (component_total is None or component_total >= max_components):
return "Error: n_components must be set below min(n_samples, n_features) when pca_solver is 'arpack'"
fitted = SklearnPCA(
n_components=component_total,
whiten=bool(whiten),
svd_solver=solver_value,
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
mean_values = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
variance_ratio_sum = float(np.sum(variance_ratio))
return {
"type": "Double",
"basicValue": variance_ratio_sum,
"properties": {
"explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
"explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
"singular_values": {"type": "Array", "elements": col(singular_values.tolist())},
"feature_means": {"type": "Array", "elements": col(mean_values.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
TSNE_EMBED
t-distributed stochastic neighbor embedding maps high-dimensional samples into a low-dimensional space by preserving local neighborhoods rather than global linear structure. It is most useful for visualization-oriented exploratory analysis.
The algorithm minimizes the Kullback-Leibler divergence between the joint probabilities of the high-dimensional and low-dimensional representations:
KL(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
where P and Q represent the pairwise similarities in the original and embedded spaces respectively.
This wrapper accepts rows as samples and columns as features. It returns the fitted embedding together with the final KL divergence, the number of embedding dimensions, and the iteration count reported by the fitted estimator.
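A minimal scikit-learn sketch mirroring Example 1 below (this assumes a scikit-learn release where the iteration cap is named max_iter, i.e. 1.5 or later, as the wrapper code also does):
import numpy as np
from sklearn.manifold import TSNE

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]])
tsne = TSNE(n_components=2, perplexity=2.5, learning_rate=80,
            max_iter=500, init="pca", method="exact", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)        # (8, 2): one low-dimensional point per sample
print(tsne.kl_divergence_)    # final value of KL(P || Q)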
Excel Usage
=TSNE_EMBED(data, n_components, perplexity, learning_rate, max_iter, tsne_init, tsne_method, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Dimension of the embedded output space.
perplexity(float, optional, default: 30): Effective neighborhood size used in the t-SNE objective.
learning_rate(float, optional, default: 200): Positive optimizer learning rate for the embedding updates.
max_iter(int, optional, default: 1000): Maximum number of optimization iterations.
tsne_init(str, optional, default: “pca”): Initialization scheme for the starting embedding.
tsne_method(str, optional, default: “barnes_hut”): Gradient computation method used during optimization.
random_state(int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.
Returns (dict): Excel data type containing the fitted embedding and final KL divergence.
Example 1: Embed two separated clusters with the exact solver
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 2.5 | 80 | 500 | pca | exact | 0 |
| 0 | 1 | |||||||
| 1 | 0 | |||||||
| 1 | 1 | |||||||
| 5 | 5 | |||||||
| 5 | 6 | |||||||
| 6 | 5 | |||||||
| 6 | 6 |
Excel formula:
=TSNE_EMBED({0,0;0,1;1,0;1,1;5,5;5,6;6,5;6,6}, 2, 2.5, 80, 500, "pca", "exact", 0)
Expected output:
{"type":"Double","basicValue":1.27152,"properties":{"kl_divergence":{"type":"Double","basicValue":1.27152},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":308.174},{"type":"Double","basicValue":-596.145}],[{"type":"Double","basicValue":565.379},{"type":"Double","basicValue":140.845}],[{"type":"Double","basicValue":710.635},{"type":"Double","basicValue":-1128.97}],[{"type":"Double","basicValue":-343.572},{"type":"Double","basicValue":1700.94}],[{"type":"Double","basicValue":1682.58},{"type":"Double","basicValue":-627.124}],[{"type":"Double","basicValue":-1589.38},{"type":"Double","basicValue":723.269}],[{"type":"Double","basicValue":-803.921},{"type":"Double","basicValue":71.7089}],[{"type":"Double","basicValue":-435.913},{"type":"Double","basicValue":644.484}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 2: Use seeded random initialization on three compact groups
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 2 | 60 | 500 | random | exact | 3 |
| 0.2 | 0.1 | 0 | |||||||
| 5 | 5 | 5 | |||||||
| 5.1 | 5.2 | 5 | |||||||
| 10 | 0 | 10 | |||||||
| 10.2 | 0.1 | 10.1 |
Excel formula:
=TSNE_EMBED({0,0,0;0.2,0.1,0;5,5,5;5.1,5.2,5;10,0,10;10.2,0.1,10.1}, 2, 2, 60, 500, "random", "exact", 3)
Expected output:
{"type":"Double","basicValue":1.36008,"properties":{"kl_divergence":{"type":"Double","basicValue":1.36008},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":914.209},{"type":"Double","basicValue":-274.882}],[{"type":"Double","basicValue":-890.978},{"type":"Double","basicValue":295.72}],[{"type":"Double","basicValue":-234.515},{"type":"Double","basicValue":453.258}],[{"type":"Double","basicValue":256.056},{"type":"Double","basicValue":-448.663}],[{"type":"Double","basicValue":156.849},{"type":"Double","basicValue":102.724}],[{"type":"Double","basicValue":-143.79},{"type":"Double","basicValue":-103.507}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 3: Embed one-dimensional samples into two dimensions
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 50 | 500 | random | exact | 1 |
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| 5 |
Excel formula:
=TSNE_EMBED({0;1;2;3;4;5}, 2, 2, 50, 500, "random", "exact", 1)
Expected output:
{"type":"Double","basicValue":0.123383,"properties":{"kl_divergence":{"type":"Double","basicValue":0.123383},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-145.516},{"type":"Double","basicValue":21.1913}],[{"type":"Double","basicValue":-93.6204},{"type":"Double","basicValue":23.9552}],[{"type":"Double","basicValue":-30.9413},{"type":"Double","basicValue":25.6826}],[{"type":"Double","basicValue":37.3494},{"type":"Double","basicValue":26.9504}],[{"type":"Double","basicValue":100.051},{"type":"Double","basicValue":27.51}],[{"type":"Double","basicValue":152.001},{"type":"Double","basicValue":26.1001}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 4: Produce a three-dimensional embedding with Barnes-Hut t-SNE
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 3 | 2.5 | 75 | 500 | pca | barnes_hut | 5 |
| 0.1 | 0.2 | 1.1 | |||||||
| 1 | 1 | 2 | |||||||
| 1.2 | 1.1 | 2.1 | |||||||
| 5 | 5 | 0 | |||||||
| 5.2 | 5.1 | 0.2 | |||||||
| 6 | 6 | 1 | |||||||
| 6.1 | 6.2 | 1.1 |
Excel formula:
=TSNE_EMBED({0,0,1;0.1,0.2,1.1;1,1,2;1.2,1.1,2.1;5,5,0;5.2,5.1,0.2;6,6,1;6.1,6.2,1.1}, 3, 2.5, 75, 500, "pca", "barnes_hut", 5)
Expected output:
{"type":"Double","basicValue":1.79411,"properties":{"kl_divergence":{"type":"Double","basicValue":1.79411},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":180.549},{"type":"Double","basicValue":-89.0916},{"type":"Double","basicValue":19.9081}],[{"type":"Double","basicValue":-64.7},{"type":"Double","basicValue":68.89},{"type":"Double","basicValue":-60.3671}],[{"type":"Double","basicValue":89.9451},{"type":"Double","basicValue":12.4007},{"type":"Double","basicValue":13.6712}],[{"type":"Double","basicValue":52.8532},{"type":"Double","basicValue":3.74339},{"type":"Double","basicValue":-82.2136}],[{"type":"Double","basicValue":-29.3893},{"type":"Double","basicValue":-133.062},{"type":"Double","basicValue":3.69885}],[{"type":"Double","basicValue":-54.7223},{"type":"Double","basicValue":-38.0214},{"type":"Double","basicValue":32.0658}],[{"type":"Double","basicValue":-95.3113},{"type":"Double","basicValue":-87.4605},{"type":"Double","basicValue":133.372}],[{"type":"Double","basicValue":193.254},{"type":"Double","basicValue":649.021},{"type":"Double","basicValue":40.2834}]]},"n_iter":{"type":"Double","basicValue":499}}}
Python Code
import numpy as np
from sklearn.manifold import TSNE as SklearnTSNE
def tsne_embed(data, n_components=2, perplexity=30, learning_rate=200, max_iter=1000, tsne_init='pca', tsne_method='barnes_hut', random_state=None):
"""
Fit t-SNE and return a low-dimensional embedding with the final KL divergence.
See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Dimension of the embedded output space. Default is 2.
perplexity (float, optional): Effective neighborhood size used in the t-SNE objective. Default is 30.
learning_rate (float, optional): Positive optimizer learning rate for the embedding updates. Default is 200.
max_iter (int, optional): Maximum number of optimization iterations. Default is 1000.
tsne_init (str, optional): Initialization scheme for the starting embedding. Valid options: 'pca', 'random'. Default is 'pca'.
tsne_method (str, optional): Gradient computation method used during optimization. Valid options: 'barnes_hut', 'exact'. Default is 'barnes_hut'.
random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing the fitted embedding and final KL divergence.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 3:
return None, "Error: data must contain at least 3 samples"
return data_np, None
def orient_embedding(embedding):
embedding_np = np.array(embedding, dtype=float, copy=True)
for index in range(embedding_np.shape[1]):
column = embedding_np[:, index]
pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
if pivot_value < 0:
embedding_np[:, index] *= -1.0
return embedding_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
if component_total < 1:
return "Error: n_components must be at least 1"
perplexity_value = float(perplexity)
if perplexity_value <= 0 or perplexity_value >= data_np.shape[0]:
return f"Error: perplexity must be greater than 0 and less than {data_np.shape[0]}"
learning_rate_value = float(learning_rate)
if learning_rate_value <= 0:
return "Error: learning_rate must be greater than 0"
if int(max_iter) < 250:
return "Error: max_iter must be at least 250"
init_value = str(tsne_init).strip().lower()
if init_value not in {"pca", "random"}:
return "Error: tsne_init must be 'pca' or 'random'"
method_value = str(tsne_method).strip().lower()
if method_value not in {"barnes_hut", "exact"}:
return "Error: tsne_method must be 'barnes_hut' or 'exact'"
if method_value == "barnes_hut" and component_total > 3:
return "Error: n_components must be 3 or less when tsne_method is 'barnes_hut'"
fitted = SklearnTSNE(
n_components=component_total,
perplexity=perplexity_value,
learning_rate=learning_rate_value,
max_iter=int(max_iter),
init=init_value,
method=method_value,
random_state=None if random_state in (None, "") else int(random_state)
)
embedding_np = orient_embedding(fitted.fit_transform(data_np))
kl_divergence = float(fitted.kl_divergence_)
n_iter_value = float(getattr(fitted, "n_iter_", int(max_iter)))
return {
"type": "Double",
"basicValue": kl_divergence,
"properties": {
"kl_divergence": {"type": "Double", "basicValue": kl_divergence},
"component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
"n_iter": {"type": "Double", "basicValue": n_iter_value}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
TSVD_REDUCE
Truncated singular value decomposition projects a matrix into a lower-dimensional linear subspace without centering the original features first. It is often used for sparse matrices, count matrices, and other high-dimensional tabular inputs where centering would be undesirable.
The decomposition approximates the data matrix X:
X \approx U \Sigma V^T
where U and V are matrices with orthogonal columns and \Sigma is a diagonal matrix containing the singular values.
This wrapper accepts rows as samples and columns as features. It returns the transformed scores together with learned component vectors, explained variance values, explained variance ratios, and singular values from the fitted decomposition.
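Because no centering is applied, the fitted transform is a plain projection onto the retained right singular vectors. A minimal sketch with the dense matrix from Example 2 below:
import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.array([[1.0, 2.0, 0.0], [2.0, 3.0, 1.0], [3.0, 5.0, 2.0],
              [4.0, 6.0, 3.0], [5.0, 8.0, 5.0]])
svd = TruncatedSVD(n_components=2, algorithm="arpack", random_state=0)
scores = svd.fit_transform(X)                       # U @ diag(singular values)
assert np.allclose(scores, X @ svd.components_.T)   # no mean subtraction involved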
Excel Usage
=TSVD_REDUCE(data, n_components, tsvd_algorithm, n_iter, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of singular vectors to retain.
tsvd_algorithm(str, optional, default: “randomized”): Solver used to compute the truncated decomposition.
n_iter(int, optional, default: 5): Number of power iterations used by the randomized solver.
random_state(int, optional, default: null): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default.
Returns (dict): Excel data type containing projected scores, component vectors, and explained variance summaries.
Example 1: Reduce a count-like matrix with randomized truncated SVD
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |||
|---|---|---|---|---|---|---|---|
| 4 | 0 | 1 | 0 | 2 | randomized | 7 | 0 |
| 3 | 1 | 0 | 0 | ||||
| 0 | 2 | 4 | 1 | ||||
| 0 | 1 | 5 | 2 | ||||
| 5 | 0 | 1 | 0 | ||||
| 0 | 3 | 4 | 2 |
Excel formula:
=TSVD_REDUCE({4,0,1,0;3,1,0,0;0,2,4,1;0,1,5,2;5,0,1,0;0,3,4,2}, 2, "randomized", 7, 0)
Expected output:
{"type":"Double","basicValue":0.931194,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.931194},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":1.93405},{"type":"Double","basicValue":3.6262}],[{"type":"Double","basicValue":1.19906},{"type":"Double","basicValue":2.75861}],[{"type":"Double","basicValue":4.37825},{"type":"Double","basicValue":-1.22495}],[{"type":"Double","basicValue":5.14271},{"type":"Double","basicValue":-1.4662}],[{"type":"Double","basicValue":2.20955},{"type":"Double","basicValue":4.58516}],[{"type":"Double","basicValue":5.05578},{"type":"Double","basicValue":-1.49309}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.275501},{"type":"Double","basicValue":0.372556},{"type":"Double","basicValue":0.832042},{"type":"Double","basicValue":0.304972}],[{"type":"Double","basicValue":0.958959},{"type":"Double","basicValue":-0.118264},{"type":"Double","basicValue":-0.209636},{"type":"Double","basicValue":-0.149877}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.51791}],[{"type":"Double","basicValue":6.6647}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.255337}],[{"type":"Double","basicValue":0.675857}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.0132}],[{"type":"Double","basicValue":6.90381}]]}}}
Example 2: Use the ARPACK solver on a compact dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||
|---|---|---|---|---|---|---|
| 1 | 2 | 0 | 2 | arpack | 5 | 0 |
| 2 | 3 | 1 | ||||
| 3 | 5 | 2 | ||||
| 4 | 6 | 3 | ||||
| 5 | 8 | 5 |
Excel formula:
=TSVD_REDUCE({1,2,0;2,3,1;3,5,2;4,6,3;5,8,5}, 2, "arpack", 5, 0)
Expected output:
{"type":"Double","basicValue":0.997991,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997991},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":2.03633},{"type":"Double","basicValue":-0.912029}],[{"type":"Double","basicValue":3.70179},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":6.14111},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":7.80657},{"type":"Double","basicValue":-0.144005}],[{"type":"Double","basicValue":10.6489},{"type":"Double","basicValue":0.768025}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.488612},{"type":"Double","basicValue":0.773861},{"type":"Double","basicValue":0.402986}],[{"type":"Double","basicValue":-0.144005},{"type":"Double","basicValue":-0.384012},{"type":"Double","basicValue":0.912029}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":9.17313}],[{"type":"Double","basicValue":0.327742}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.963564}],[{"type":"Double","basicValue":0.0344267}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":15.1626}],[{"type":"Double","basicValue":1.41421}]]}}}
Example 3: Keep only one singular vector from a two-feature matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |
|---|---|---|---|---|---|
| 1 | 0 | 1 | randomized | 5 | 3 |
| 2 | 1 | ||||
| 3 | 1 | ||||
| 4 | 2 | ||||
| 5 | 3 |
Excel formula:
=TSVD_REDUCE({1,0;2,1;3,1;4,2;5,3}, 1, "randomized", 5, 3)
Expected output:
{"type":"Double","basicValue":0.96652,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.96652},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168}],[{"type":"Double","basicValue":2.23592}],[{"type":"Double","basicValue":3.12508}],[{"type":"Double","basicValue":4.47183}],[{"type":"Double","basicValue":5.81858}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168},{"type":"Double","basicValue":0.457582}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.93822}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.96652}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.33122}]]}}}
Example 4: Retain three components from a wider dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||||
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 3 | 1 | 3 | randomized | 9 | 11 |
| 3 | 1 | 1 | 4 | 1 | ||||
| 4 | 2 | 1 | 5 | 2 | ||||
| 5 | 3 | 2 | 6 | 3 | ||||
| 6 | 5 | 3 | 8 | 5 | ||||
| 7 | 8 | 5 | 11 | 8 |
Excel formula:
=TSVD_REDUCE({2,1,0,3,1;3,1,1,4,1;4,2,1,5,2;5,3,2,6,3;6,5,3,8,5;7,8,5,11,8}, 3, "randomized", 9, 11)
Expected output:
{"type":"Double","basicValue":0.997767,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997767},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":5},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":3.64627},{"type":"Double","basicValue":1.13513},{"type":"Double","basicValue":-0.644274}],[{"type":"Double","basicValue":4.99393},{"type":"Double","basicValue":1.68159},{"type":"Double","basicValue":0.159584}],[{"type":"Double","basicValue":6.88991},{"type":"Double","basicValue":1.58207},{"type":"Double","basicValue":-0.121217}],[{"type":"Double","basicValue":9.02932},{"type":"Double","basicValue":1.14663},{"type":"Double","basicValue":0.357838}],[{"type":"Double","basicValue":12.6065},{"type":"Double","basicValue":0.0815798},{"type":"Double","basicValue":0.104611}],[{"type":"Double","basicValue":17.8649},{"type":"Double","basicValue":-1.949},{"type":"Double","basicValue":-0.121041}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.458187},{"type":"Double","basicValue":0.395869},{"type":"Double","basicValue":0.243427},{"type":"Double","basicValue":0.646052},{"type":"Double","basicValue":0.395869}],[{"type":"Double","basicValue":0.530095},{"type":"Double","basicValue":-0.490949},{"type":"Double","basicValue":-0.335912},{"type":"Double","basicValue":0.352279},{"type":"Double","basicValue":-0.490949}],[{"type":"Double","basicValue":0.45148},{"type":"Double","basicValue":-0.162402},{"type":"Double","basicValue":0.759855},{"type":"Double","basicValue":-0.407477},{"type":"Double","basicValue":-0.162402}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":23.4302}],[{"type":"Double","basicValue":1.58077}],[{"type":"Double","basicValue":0.0995387}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.931}],[{"type":"Double","basicValue":0.0628122}],[{"type":"Double","basicValue":0.00395518}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":25.403}],[{"type":"Double","basicValue":3.42626}],[{"type":"Double","basicValue":0.780315}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import TruncatedSVD as SklearnTruncatedSVD
def tsvd_reduce(data, n_components=2, tsvd_algorithm='randomized', n_iter=5, random_state=None):
"""
Fit truncated singular value decomposition and return projected scores with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of singular vectors to retain. Default is 2.
tsvd_algorithm (str, optional): Solver used to compute the truncated decomposition. Valid options: 'randomized', 'arpack'. Default is 'randomized'.
n_iter (int, optional): Number of power iterations used by the randomized solver. Default is 5.
random_state (int, optional): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing projected scores, component vectors, and explained variance summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
algorithm_value = str(tsvd_algorithm).strip().lower()
if algorithm_value not in {"randomized", "arpack"}:
return "Error: tsvd_algorithm must be 'randomized' or 'arpack'"
if algorithm_value == "arpack" and component_total >= max_components:
return "Error: n_components must be below min(n_samples, n_features) when tsvd_algorithm is 'arpack'"
if int(n_iter) < 1:
return "Error: n_iter must be at least 1"
fitted = SklearnTruncatedSVD(
n_components=component_total,
algorithm=algorithm_value,
n_iter=int(n_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
variance_ratio_sum = float(np.sum(variance_ratio))
return {
"type": "Double",
"basicValue": variance_ratio_sum,
"properties": {
"explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
"explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
"singular_values": {"type": "Array", "elements": col(singular_values.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
VARIANCE_SEL
Variance threshold feature selection removes columns whose training-set variance is at or below a chosen cutoff. It is a simple unsupervised filter for dropping constant or nearly constant features before modeling.
The variance for a feature X with n samples is computed with the population convention (denominator n):
\text{Var}(X) = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2
where \mu is the mean of the feature.
This wrapper accepts rows as samples and columns as features. It returns the filtered matrix together with per-feature variances, a boolean support mask, and the 1-based indices of the selected columns.
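As a sanity check on the formula and the selection rule, the per-feature variances and support mask from Example 1 below can be reproduced directly with NumPy, whose default variance also divides by n; a feature is kept only when its variance strictly exceeds the threshold:
import numpy as np

# Example 1 data: rows are samples, columns are features.
data = np.array([
    [1, 0, 5, 10],
    [1, 1, 5, 11],
    [1, 0, 5, 12],
    [1, 1, 5, 13],
], dtype=float)

variances = data.var(axis=0)   # population variance: [0, 0.25, 0, 1.25]
mask = variances > 0           # strict comparison drops the constant columns
selected = data[:, mask]       # keeps columns 2 and 4 (1-based)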
Excel Usage
=VARIANCE_SEL(data, threshold)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
threshold(float, optional, default: 0): Variance cutoff; features whose variance does not exceed this value are removed.
Returns (dict): Excel data type containing the selected feature matrix, support mask, and per-feature variances.
Example 1: Remove constant columns with the default threshold
Inputs:
| data | threshold | |||
|---|---|---|---|---|
| 1 | 0 | 5 | 10 | 0 |
| 1 | 1 | 5 | 11 | |
| 1 | 0 | 5 | 12 | |
| 1 | 1 | 5 | 13 |
Excel formula:
=VARIANCE_SEL({1,0,5,10;1,1,5,11;1,0,5,12;1,1,5,13}, 0)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":4},"feature_count":{"type":"Double","basicValue":4},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.25}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.25}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":13}]]}}}
Example 2: Drop weakly varying columns with a positive threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 0 | 1 | 10 | 0.01 |
| 0.1 | 1.1 | 11 | |
| 0 | 0.9 | 12 | |
| 0.1 | 1 | 13 | |
| 0 | 1.1 | 14 |
Excel formula:
=VARIANCE_SEL({0,1,10;0.1,1.1,11;0,0.9,12;0.1,1,13;0,1.1,14}, 0.01)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0024}],[{"type":"Double","basicValue":0.0056}],[{"type":"Double","basicValue":2}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":13}],[{"type":"Double","basicValue":14}]]}}}
Example 3: Keep a single varying feature column
Inputs:
| data | threshold |
|---|---|
| 2 | 0 |
| 3 | |
| 5 | |
| 8 | |
| 13 |
Excel formula:
=VARIANCE_SEL({2;3;5;8;13}, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":11}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":13}]]}}}
Example 4: Keep only the highest-variance columns with a larger threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 10 | 0 | 100 | 30 |
| 12 | 0 | 110 | |
| 14 | 0 | 130 | |
| 16 | 0 | 160 | |
| 18 | 0 | 200 |
Excel formula:
=VARIANCE_SEL({10,0,100;12,0,110;14,0,130;16,0,160;18,0,200}, 30)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1320}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":100}],[{"type":"Double","basicValue":110}],[{"type":"Double","basicValue":130}],[{"type":"Double","basicValue":160}],[{"type":"Double","basicValue":200}]]}}}
Python Code
import numpy as np
from sklearn.feature_selection import VarianceThreshold as SklearnVarianceThreshold
def variance_sel(data, threshold=0):
"""
Remove low-variance features and return the selected columns with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        threshold (float, optional): Variance cutoff; features whose variance does not exceed this value are removed. Default is 0.
Returns:
dict: Excel data type containing the selected feature matrix, support mask, and per-feature variances.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
try:
data_np, error = parse_data(data)
if error:
return error
threshold_value = float(threshold)
if threshold_value < 0:
return "Error: threshold must be non-negative"
fitted = SklearnVarianceThreshold(threshold=threshold_value)
transformed_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
variances = np.atleast_1d(np.asarray(fitted.variances_, dtype=float))
support_mask = fitted.get_support().tolist()
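        # Convert boolean mask positions to the 1-based column indices Excel users expect.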
selected_indices = [index + 1 for index, keep in enumerate(support_mask) if keep]
return {
"type": "Double",
"basicValue": float(len(selected_indices)),
"properties": {
"selected_feature_count": {"type": "Double", "basicValue": float(len(selected_indices))},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"selected_indices": {"type": "Array", "elements": col(selected_indices)},
"support_mask": {"type": "Array", "elements": col(support_mask)},
"variances": {"type": "Array", "elements": col(variances.tolist())},
"selected_data": {"type": "Array", "elements": mat(transformed_np.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator