Dimensionality Reduction
FACTOR_MODEL
Factor analysis models observed features as linear combinations of a smaller set of latent factors plus feature-specific noise. It is useful when the goal is to explain shared covariance structure rather than simply maximize retained variance.
The model takes the form:
X = F W + \mu + \epsilon
where X is the observed data, F are the latent factors, W is the loading matrix, \mu is the feature mean, and \epsilon is the feature-specific noise.
This wrapper accepts rows as samples and columns as features. It returns transformed factor scores together with factor loadings, estimated noise variances, per-feature means, and the fitted log-likelihood path.
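To make the model concrete, here is a minimal standalone sketch (the toy matrix and variable names are ours, not part of the wrapper) that fits scikit-learn's FactorAnalysis directly and checks that scores, loadings, and means approximately reconstruct the data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy matrix: rows are samples, columns are features, roughly one shared factor.
X = np.array([[2.0, 1.0, 1.0], [2.2, 1.1, 1.2], [1.8, 0.9, 0.8],
              [5.0, 4.2, 4.1], [5.1, 4.1, 4.2], [4.8, 4.0, 3.9]])

fa = FactorAnalysis(n_components=1, random_state=0)
F = fa.fit_transform(X)   # latent scores, shape (n_samples, n_components)
W = fa.components_        # loadings, shape (n_components, n_features)
mu = fa.mean_             # per-feature means

# X = F W + mu + eps, so the reconstruction residual plays the role of eps.
X_hat = F @ W + mu
print(np.abs(X - X_hat).max())  # stays modest on this near-one-factor data
```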
Excel Usage
=FACTOR_MODEL(data, n_components, factor_rotation, fa_svd_method, max_iter, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of latent factors to estimate. Leave blank to keep the estimator default.
factor_rotation(str, optional, default: "none"): Rotation applied to the fitted loading matrix. Valid options: "none", "varimax", "quartimax".
fa_svd_method(str, optional, default: "randomized"): Singular value decomposition method used during fitting. Valid options: "randomized", "lapack".
max_iter(int, optional, default: 1000): Maximum number of fitting iterations.
random_state(int, optional, default: null): Integer seed for randomized fitting paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing latent scores, loadings, noise variances, and log-likelihood summaries.
Example 1: Fit an unrotated two-factor model with randomized SVD
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 0.8 | 2 | none | randomized | 600 | 0 |
| 2.2 | 1.1 | 1.2 | 0.9 | |||||
| 1.8 | 0.9 | 0.8 | 0.7 | |||||
| 5 | 4.2 | 4.1 | 3.9 | |||||
| 5.1 | 4.1 | 4.2 | 4 | |||||
| 4.8 | 4 | 3.9 | 3.8 | |||||
| 3 | 2.4 | 2.2 | 2 | |||||
| 3.1 | 2.5 | 2.3 | 2.1 |
Excel formula:
=FACTOR_MODEL({2,1,1,0.8;2.2,1.1,1.2,0.9;1.8,0.9,0.8,0.7;5,4.2,4.1,3.9;5.1,4.1,4.2,4;4.8,4,3.9,3.8;3,2.4,2.2,2;3.1,2.5,2.3,2.1}, 2, "none", "randomized", 600, 0)
Expected output:
{"type":"Double","basicValue":19.9171,"properties":{"final_loglike":{"type":"Double","basicValue":19.9171},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.08516},{"type":"Double","basicValue":-0.640792}],[{"type":"Double","basicValue":-0.959455},{"type":"Double","basicValue":-1.26449}],[{"type":"Double","basicValue":-1.21086},{"type":"Double","basicValue":-0.0170943}],[{"type":"Double","basicValue":1.24179},{"type":"Double","basicValue":0.00352144}],[{"type":"Double","basicValue":1.27562},{"type":"Double","basicValue":-1.14684}],[{"type":"Double","basicValue":1.09515},{"type":"Double","basicValue":0.060673}],[{"type":"Double","basicValue":-0.216393},{"type":"Double","basicValue":1.51114}],[{"type":"Double","basicValue":-0.14069},{"type":"Double","basicValue":1.49388}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.30317},{"type":"Double","basicValue":1.34059},{"type":"Double","basicValue":1.3394},{"type":"Double","basicValue":1.34994}],[{"type":"Double","basicValue":-0.0587273},{"type":"Double","basicValue":0.108721},{"type":"Double","basicValue":0.0028651},{"type":"Double","basicValue":0.019216}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000186822}],[{"type":"Double","basicValue":0.000374229}],[{"type":"Double","basicValue":0.000848299}],[{"type":"Double","basicValue":0.0016567}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.375}],[{"type":"Double","basicValue":2.525}],[{"type":"Double","basicValue":2.4625}],[{"type":"Double","basicValue":2.275}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-28.3465}],[{"type":"Double","basicValue":-17.406}],[{"type":"Double","basicValue":-6.77301}],[{"type":"Double","basicValue":2.72231}],[{"type":"Double","basicValue":9.52349}],[{"type":"Double","basicValue":13.2404}],[{"type":"Double","basicValue":15.4178}],[{"type":"Double","basicValue":16.7325}],[{"type":"Double","basicValue":17.5815}],[{"type":"Double","basicValue":18.1711}],[{"type":"Double","basicValue":18.5907}],[{"type":"Double","basicValue":18.8887}],[{"type":"Double","basicValue":19.1004}],[{"type":"Double","basicValue":19.2534}],[{"type":"Double","basicValue":19.3678}],[{"type":"Double","basicValue":19.4565}],[{"type":"Double","basicValue":19.5275}],[{"type":"Double","basicValue":19.5856}],[{"type":"Double","basicValue":19.6341}],[{"type":"Double","basicValue":19.6751}],[{"type":"Double","basicValue":19.7101}],[{"type":"Double","basicValue":19.7404}],[{"type":"Double","basicValue":19.7668}],[{"type":"Double","basicValue":19.79}],[{"type":"Double","basicValue":19.8106}],[{"type":"Double","basicValue":19.8289}],[{"type":"Double","basicValue":19.8454}],[{"type":"Double","basicValue":19.8603}],[{"type":"Double","basicValue":19.8737}],[{"type":"Double","basicValue":19.886}],[{"type":"Double","basicValue":19.8972}],[{"type":"Double","basicValue":19.9076}],[{"type":"Double","basicValue":19.9171}]]},"n_iter":{"type":"Double","basicValue":33}}}
Example 2: Apply varimax rotation with the LAPACK SVD path
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 0.8 | 2 | varimax | lapack | 600 | 0 |
| 2.2 | 1.1 | 1.2 | 0.9 | |||||
| 1.8 | 0.9 | 0.8 | 0.7 | |||||
| 5 | 4.2 | 4.1 | 3.9 | |||||
| 5.1 | 4.1 | 4.2 | 4 | |||||
| 4.8 | 4 | 3.9 | 3.8 | |||||
| 3 | 2.4 | 2.2 | 2 | |||||
| 3.1 | 2.5 | 2.3 | 2.1 |
Excel formula:
=FACTOR_MODEL({2,1,1,0.8;2.2,1.1,1.2,0.9;1.8,0.9,0.8,0.7;5,4.2,4.1,3.9;5.1,4.1,4.2,4;4.8,4,3.9,3.8;3,2.4,2.2,2;3.1,2.5,2.3,2.1}, 2, "varimax", "lapack", 600, 0)
Expected output:
{"type":"Double","basicValue":19.9171,"properties":{"final_loglike":{"type":"Double","basicValue":19.9171},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.24431},{"type":"Double","basicValue":-0.199699}],[{"type":"Double","basicValue":-1.54579},{"type":"Double","basicValue":0.360577}],[{"type":"Double","basicValue":-0.942826},{"type":"Double","basicValue":-0.759974}],[{"type":"Double","basicValue":0.957964},{"type":"Double","basicValue":0.790168}],[{"type":"Double","basicValue":0.249498},{"type":"Double","basicValue":1.69711}],[{"type":"Double","basicValue":0.881595},{"type":"Double","basicValue":0.652552}],[{"type":"Double","basicValue":0.798316},{"type":"Double","basicValue":-1.30118}],[{"type":"Double","basicValue":0.845554},{"type":"Double","basicValue":-1.23956}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.965454},{"type":"Double","basicValue":1.10117},{"type":"Double","basicValue":1.03266},{"type":"Double","basicValue":1.05122}],[{"type":"Double","basicValue":0.877264},{"type":"Double","basicValue":0.772289},{"type":"Double","basicValue":0.852994},{"type":"Double","basicValue":0.847144}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000186822}],[{"type":"Double","basicValue":0.000374229}],[{"type":"Double","basicValue":0.000848299}],[{"type":"Double","basicValue":0.0016567}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.375}],[{"type":"Double","basicValue":2.525}],[{"type":"Double","basicValue":2.4625}],[{"type":"Double","basicValue":2.275}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-28.3465}],[{"type":"Double","basicValue":-17.406}],[{"type":"Double","basicValue":-6.77301}],[{"type":"Double","basicValue":2.72231}],[{"type":"Double","basicValue":9.52349}],[{"type":"Double","basicValue":13.2404}],[{"type":"Double","basicValue":15.4178}],[{"type":"Double","basicValue":16.7325}],[{"type":"Double","basicValue":17.5815}],[{"type":"Double","basicValue":18.1711}],[{"type":"Double","basicValue":18.5907}],[{"type":"Double","basicValue":18.8887}],[{"type":"Double","basicValue":19.1004}],[{"type":"Double","basicValue":19.2534}],[{"type":"Double","basicValue":19.3678}],[{"type":"Double","basicValue":19.4565}],[{"type":"Double","basicValue":19.5275}],[{"type":"Double","basicValue":19.5856}],[{"type":"Double","basicValue":19.6341}],[{"type":"Double","basicValue":19.6751}],[{"type":"Double","basicValue":19.7101}],[{"type":"Double","basicValue":19.7404}],[{"type":"Double","basicValue":19.7668}],[{"type":"Double","basicValue":19.79}],[{"type":"Double","basicValue":19.8106}],[{"type":"Double","basicValue":19.8289}],[{"type":"Double","basicValue":19.8454}],[{"type":"Double","basicValue":19.8603}],[{"type":"Double","basicValue":19.8737}],[{"type":"Double","basicValue":19.886}],[{"type":"Double","basicValue":19.8972}],[{"type":"Double","basicValue":19.9076}],[{"type":"Double","basicValue":19.9171}]]},"n_iter":{"type":"Double","basicValue":33}}}
Example 3: Fit a single latent factor across three observed features
Inputs:
| data | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|
| 1 | 1.1 | 0.9 | 1 | none | lapack | 600 | 0 |
| 2 | 2.1 | 1.8 | |||||
| 3 | 3.2 | 2.9 | |||||
| 4 | 4.1 | 4 | |||||
| 5 | 5.2 | 5.1 | |||||
| 6 | 6.1 | 5.9 |
Excel formula:
=FACTOR_MODEL({1,1.1,0.9;2,2.1,1.8;3,3.2,2.9;4,4.1,4;5,5.2,5.1;6,6.1,5.9}, 1, "none", "lapack", 600, 0)
Expected output:
{"type":"Double","basicValue":5.52431,"properties":{"final_loglike":{"type":"Double","basicValue":5.52431},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.46771}],[{"type":"Double","basicValue":-0.89173}],[{"type":"Double","basicValue":-0.270316}],[{"type":"Double","basicValue":0.28352}],[{"type":"Double","basicValue":0.904934}],[{"type":"Double","basicValue":1.44131}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.70741},{"type":"Double","basicValue":1.71799},{"type":"Double","basicValue":1.75716}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00139972}],[{"type":"Double","basicValue":0.000774029}],[{"type":"Double","basicValue":0.00459521}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":3.63333}],[{"type":"Double","basicValue":3.43333}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-26.1316}],[{"type":"Double","basicValue":-19.5905}],[{"type":"Double","basicValue":-13.1569}],[{"type":"Double","basicValue":-7.0199}],[{"type":"Double","basicValue":-1.64193}],[{"type":"Double","basicValue":2.2234}],[{"type":"Double","basicValue":4.21805}],[{"type":"Double","basicValue":5.00012}],[{"type":"Double","basicValue":5.30885}],[{"type":"Double","basicValue":5.43552}],[{"type":"Double","basicValue":5.48545}],[{"type":"Double","basicValue":5.50637}],[{"type":"Double","basicValue":5.51722}],[{"type":"Double","basicValue":5.52431}]]},"n_iter":{"type":"Double","basicValue":14}}}
Example 4: Apply quartimax rotation on a four-feature matrix
Inputs:
| data | | | | n_components | factor_rotation | fa_svd_method | max_iter | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 0.9 | 1.1 | 2 | quartimax | randomized | 800 | 5 |
| 1.2 | 2.1 | 1 | 1.2 | |||||
| 4.8 | 5 | 4.9 | 5.1 | |||||
| 5 | 5.2 | 5.1 | 5.2 | |||||
| 2.5 | 3 | 2.4 | 2.8 | |||||
| 2.7 | 3.2 | 2.6 | 3 | |||||
| 6 | 1 | 5.9 | 1.1 | |||||
| 6.2 | 1.1 | 6 | 1.2 |
Excel formula:
=FACTOR_MODEL({1,2,0.9,1.1;1.2,2.1,1,1.2;4.8,5,4.9,5.1;5,5.2,5.1,5.2;2.5,3,2.4,2.8;2.7,3.2,2.6,3;6,1,5.9,1.1;6.2,1.1,6,1.2}, 2, "quartimax", "randomized", 800, 5)
Expected output:
{"type":"Double","basicValue":-13.288,"properties":{"final_loglike":{"type":"Double","basicValue":-13.288},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.34344},{"type":"Double","basicValue":-0.576798}],[{"type":"Double","basicValue":-1.26736},{"type":"Double","basicValue":-0.529243}],[{"type":"Double","basicValue":0.550676},{"type":"Double","basicValue":1.44898}],[{"type":"Double","basicValue":0.648064},{"type":"Double","basicValue":1.5441}],[{"type":"Double","basicValue":-0.611582},{"type":"Double","basicValue":0.173514}],[{"type":"Double","basicValue":-0.515077},{"type":"Double","basicValue":0.293415}],[{"type":"Double","basicValue":1.23132},{"type":"Double","basicValue":-1.20076}],[{"type":"Double","basicValue":1.3074},{"type":"Double","basicValue":-1.1532}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.94917},{"type":"Double","basicValue":-0.00481887},{"type":"Double","basicValue":1.97864},{"type":"Double","basicValue":0.355679}],[{"type":"Double","basicValue":0.0419101},{"type":"Double","basicValue":1.49992},{"type":"Double","basicValue":0.137768},{"type":"Double","basicValue":1.60137}]]},"noise_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000849877}],[{"type":"Double","basicValue":0.00733668}],[{"type":"Double","basicValue":0.000992214}],[{"type":"Double","basicValue":0.0124078}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.675}],[{"type":"Double","basicValue":2.825}],[{"type":"Double","basicValue":3.6}],[{"type":"Double","basicValue":2.5875}]]},"loglike":{"type":"Array","elements":[[{"type":"Double","basicValue":-51.9082}],[{"type":"Double","basicValue":-46.4325}],[{"type":"Double","basicValue":-41.0478}],[{"type":"Double","basicValue":-35.8109}],[{"type":"Double","basicValue":-30.8392}],[{"type":"Double","basicValue":-26.3029}],[{"type":"Double","basicValue":-22.3914}],[{"type":"Double","basicValue":-19.231}],[{"type":"Double","basicValue":-16.8385}],[{"type":"Double","basicValue":-15.1717}],[{"type":"Double","basicValue":-14.1574}],[{"type":"Double","basicValue":-13.6451}],[{"type":"Double","basicValue":-13.4311}],[{"type":"Double","basicValue":-13.3518}],[{"type":"Double","basicValue":-13.3215}],[{"type":"Double","basicValue":-13.3067}],[{"type":"Double","basicValue":-13.2966}],[{"type":"Double","basicValue":-13.288}]]},"n_iter":{"type":"Double","basicValue":18}}}
Python Code
import numpy as np
from sklearn.decomposition import FactorAnalysis as SklearnFactorAnalysis
def factor_model(data, n_components=None, factor_rotation='none', fa_svd_method='randomized', max_iter=1000, random_state=None):
"""
Fit factor analysis and return latent scores with loadings and likelihood summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of latent factors to estimate. Leave blank to keep the estimator default. Default is None.
factor_rotation (str, optional): Optional rotation applied to the fitted loading matrix. Valid options: None, Varimax, Quartimax. Default is 'none'.
fa_svd_method (str, optional): Singular value decomposition method used during fitting. Valid options: Randomized, Lapack. Default is 'randomized'.
max_iter (int, optional): Maximum number of fitting iterations. Default is 1000.
random_state (int, optional): Integer seed for randomized fitting paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing latent scores, loadings, noise variances, and log-likelihood summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
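    # Sign convention: flip each factor so its largest-magnitude loading is positive,
    # keeping scores and loadings consistent across runs and SVD backends.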
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
if component_total is not None and (component_total < 1 or component_total > data_np.shape[1]):
return f"Error: n_components must be between 1 and {data_np.shape[1]}"
rotation_value = str(factor_rotation).strip().lower()
if rotation_value not in {"none", "varimax", "quartimax"}:
return "Error: factor_rotation must be 'none', 'varimax', or 'quartimax'"
svd_method_value = str(fa_svd_method).strip().lower()
if svd_method_value not in {"randomized", "lapack"}:
return "Error: fa_svd_method must be 'randomized' or 'lapack'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
fitted = SklearnFactorAnalysis(
n_components=component_total,
rotation=None if rotation_value == "none" else rotation_value,
svd_method=svd_method_value,
max_iter=int(max_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
noise_variance = np.atleast_1d(np.asarray(fitted.noise_variance_, dtype=float))
feature_means = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
loglike = np.atleast_1d(np.asarray(fitted.loglike_, dtype=float))
final_loglike = float(loglike[-1])
return {
"type": "Double",
"basicValue": final_loglike,
"properties": {
"final_loglike": {"type": "Double", "basicValue": final_loglike},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"noise_variance": {"type": "Array", "elements": col(noise_variance.tolist())},
"feature_means": {"type": "Array", "elements": col(feature_means.tolist())},
"loglike": {"type": "Array", "elements": col(loglike.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
FASTICA
FastICA estimates statistically independent latent sources whose linear mixtures reconstruct the observed features. It is commonly used for source separation, blind signal recovery, and compact independent latent representations.
The linear mixing model is defined as:
X = S A
where X is the observed mixture, S contains the independent source signals, and A is the mixing matrix. The algorithm estimates an unmixing matrix W such that S \approx X W.
This wrapper accepts rows as samples and columns as features. It returns the recovered source matrix together with the fitted unmixing matrix, mixing matrix, feature means, and iteration count from the ICA fit.
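As a brief standalone illustration of the mixing model (the signals and mixing matrix below are made up for this sketch, and recovery is only identifiable up to order, sign, and scale), two independent sources can be mixed and then separated with scikit-learn's FastICA:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 400)
# Two independent sources: a sinusoid and a square wave.
S = np.column_stack([np.sin(2 * t), np.sign(np.cos(3 * t))])
A = np.array([[1.0, 0.5], [0.4, 1.2]])  # mixing matrix (components x features)
X = S @ A                               # observed mixtures, X = S A

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # recovered sources, up to order, sign, and scale
print(S_hat.shape, ica.mixing_.shape)   # (400, 2) and (2, 2)
```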
Excel Usage
=FASTICA(data, n_components, ica_algorithm, ica_fun, max_iter, tol, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of independent components to estimate. Leave blank to keep the estimator default.
ica_algorithm(str, optional, default: "parallel"): Optimization strategy used to estimate the independent components. Valid options: "parallel", "deflation".
ica_fun(str, optional, default: "logcosh"): Nonlinearity used to approximate negentropy during fitting. Valid options: "logcosh", "exp", "cube".
max_iter(int, optional, default: 300): Maximum number of fitting iterations.
tol(float, optional, default: 0.0001): Positive convergence tolerance for the unmixing update.
random_state(int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.
Returns (dict): Excel data type containing recovered sources, unmixing matrices, and iteration summaries.
Example 1: Recover two independent components with the parallel algorithm
Inputs:
| data | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | parallel | logcosh | 500 | 0.0001 | 0 |
| 1 | 0.4 | ||||||
| 2 | 0.8 | ||||||
| 0 | 1 | ||||||
| 0.4 | 1.2 | ||||||
| 0.8 | 1.4 | ||||||
| 2 | 2 | ||||||
| 2.4 | 2.2 |
Excel formula:
=FASTICA({0,0;1,0.4;2,0.8;0,1;0.4,1.2;0.8,1.4;2,2;2.4,2.2}, 2, "parallel", "logcosh", 500, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.23979},{"type":"Double","basicValue":-1.04907}],[{"type":"Double","basicValue":-1.28376},{"type":"Double","basicValue":0.0934344}],[{"type":"Double","basicValue":-1.32773},{"type":"Double","basicValue":1.23593}],[{"type":"Double","basicValue":0.612094},{"type":"Double","basicValue":-1.3068}],[{"type":"Double","basicValue":0.668581},{"type":"Double","basicValue":-0.860105}],[{"type":"Double","basicValue":0.725068},{"type":"Double","basicValue":-0.413413}],[{"type":"Double","basicValue":0.894529},{"type":"Double","basicValue":0.92666}],[{"type":"Double","basicValue":0.951016},{"type":"Double","basicValue":1.37335}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.784725},{"type":"Double","basicValue":1.85189}],[{"type":"Double","basicValue":1.24559},{"type":"Double","basicValue":-0.257729}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.122469},{"type":"Double","basicValue":0.879987}],[{"type":"Double","basicValue":0.591886},{"type":"Double","basicValue":0.372889}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.075}],[{"type":"Double","basicValue":1.125}]]},"n_iter":{"type":"Double","basicValue":3}}}
Example 2: Fit ICA with the deflation algorithm and exponential nonlinearity
Inputs:
| data | | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | deflation | exp | 600 | 0.0001 | 4 |
| 1 | 0.3 | 1.1 | ||||||
| 2 | 0.6 | 1.2 | ||||||
| 0 | 1 | 2 | ||||||
| 0.3 | 1.2 | 2.1 | ||||||
| 0.6 | 1.4 | 2.2 | ||||||
| 2 | 2 | 3 | ||||||
| 2.3 | 2.2 | 3.1 |
Excel formula:
=FASTICA({0,0,1;1,0.3,1.1;2,0.6,1.2;0,1,2;0.3,1.2,2.1;0.6,1.4,2.2;2,2,3;2.3,2.2,3.1}, 2, "deflation", "exp", 600, 0.0001, 4)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.06113},{"type":"Double","basicValue":-1.02565}],[{"type":"Double","basicValue":-1.25726},{"type":"Double","basicValue":0.14239}],[{"type":"Double","basicValue":-1.45339},{"type":"Double","basicValue":1.31042}],[{"type":"Double","basicValue":0.471187},{"type":"Double","basicValue":-1.22815}],[{"type":"Double","basicValue":0.547425},{"type":"Double","basicValue":-0.893195}],[{"type":"Double","basicValue":0.623664},{"type":"Double","basicValue":-0.558241}],[{"type":"Double","basicValue":1.02663},{"type":"Double","basicValue":0.958731}],[{"type":"Double","basicValue":1.10287},{"type":"Double","basicValue":1.29368}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.488436},{"type":"Double","basicValue":0.695373},{"type":"Double","basicValue":0.836946}],[{"type":"Double","basicValue":1.19469},{"type":"Double","basicValue":-0.0320314},{"type":"Double","basicValue":-0.170472}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.120532},{"type":"Double","basicValue":0.881659}],[{"type":"Double","basicValue":0.65395},{"type":"Double","basicValue":0.313877}],[{"type":"Double","basicValue":0.721831},{"type":"Double","basicValue":0.253746}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.025}],[{"type":"Double","basicValue":1.0875}],[{"type":"Double","basicValue":1.9625}]]},"n_iter":{"type":"Double","basicValue":3}}}
Example 3: Estimate a single independent component from two-feature mixtures
Inputs:
| data | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | parallel | cube | 500 | 0.0001 | 2 |
| 1 | 1.5 | ||||||
| 2 | 2 | ||||||
| 3 | 2.5 | ||||||
| 4 | 3 | ||||||
| 5 | 3.5 |
Excel formula:
=FASTICA({0,1;1,1.5;2,2;3,2.5;4,3;5,3.5}, 1, "parallel", "cube", 500, 0.0001, 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.46385}],[{"type":"Double","basicValue":-0.87831}],[{"type":"Double","basicValue":-0.29277}],[{"type":"Double","basicValue":0.29277}],[{"type":"Double","basicValue":0.87831}],[{"type":"Double","basicValue":1.46385}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.468432},{"type":"Double","basicValue":0.234216}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":1.70783}],[{"type":"Double","basicValue":0.853913}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":2.5}],[{"type":"Double","basicValue":2.25}]]},"n_iter":{"type":"Double","basicValue":1}}}
Example 4: Recover two components from a three-feature mixed signal matrix
Inputs:
| data | | | n_components | ica_algorithm | ica_fun | max_iter | tol | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.5 | 2 | parallel | logcosh | 700 | 0.0001 | 6 |
| 2 | 0.5 | 1 | ||||||
| 3 | 1 | 1.5 | ||||||
| 0.5 | 2 | 1 | ||||||
| 1 | 2.5 | 1.5 | ||||||
| 1.5 | 3 | 2 | ||||||
| 3 | 3 | 3 | ||||||
| 3.5 | 3.5 | 3.5 |
Excel formula:
=FASTICA({1,0,0.5;2,0.5,1;3,1,1.5;0.5,2,1;1,2.5,1.5;1.5,3,2;3,3,3;3.5,3.5,3.5}, 2, "parallel", "logcosh", 700, 0.0001, 6)
Expected output:
{"type":"Double","basicValue":2,"properties":{"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"sources":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.40153},{"type":"Double","basicValue":-0.694154}],[{"type":"Double","basicValue":-1.26564},{"type":"Double","basicValue":0.167602}],[{"type":"Double","basicValue":-1.12976},{"type":"Double","basicValue":1.02936}],[{"type":"Double","basicValue":0.382655},{"type":"Double","basicValue":-1.43255}],[{"type":"Double","basicValue":0.693594},{"type":"Double","basicValue":-0.986126}],[{"type":"Double","basicValue":1.00453},{"type":"Double","basicValue":-0.539703}],[{"type":"Double","basicValue":0.702604},{"type":"Double","basicValue":1.00457}],[{"type":"Double","basicValue":1.01354},{"type":"Double","basicValue":1.451}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.350111},{"type":"Double","basicValue":0.748755},{"type":"Double","basicValue":0.223235}],[{"type":"Double","basicValue":0.830667},{"type":"Double","basicValue":-0.2361},{"type":"Double","basicValue":0.298278}]]},"mixing":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0906056},{"type":"Double","basicValue":1.03867}],[{"type":"Double","basicValue":1.17569},{"type":"Double","basicValue":0.281982}],[{"type":"Double","basicValue":0.678285},{"type":"Double","basicValue":0.683204}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":1.9375}],[{"type":"Double","basicValue":1.9375}],[{"type":"Double","basicValue":1.75}]]},"n_iter":{"type":"Double","basicValue":5}}}
Python Code
Show Code
import numpy as np
from sklearn.decomposition import FastICA as SklearnFastICA
def fastica(data, n_components=None, ica_algorithm='parallel', ica_fun='logcosh', max_iter=300, tol=0.0001, random_state=None):
"""
Fit independent component analysis and return source signals with unmixing matrices.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of independent components to estimate. Leave blank to keep the estimator default. Default is None.
ica_algorithm (str, optional): Optimization strategy used to estimate the independent components. Valid options: Parallel, Deflation. Default is 'parallel'.
ica_fun (str, optional): Nonlinearity used to approximate negentropy during fitting. Valid options: Logcosh, Exp, Cube. Default is 'logcosh'.
max_iter (int, optional): Maximum number of fitting iterations. Default is 300.
tol (float, optional): Positive convergence tolerance for the unmixing update. Default is 0.0001.
random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing recovered sources, unmixing matrices, and iteration summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
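    # Canonical form for ICA output: fix each component's sign from its largest-magnitude
    # unmixing weight, then sort components so the returned order is deterministic.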
def canonicalize_ica(sources, components, mixing):
source_np = np.array(sources, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
mixing_np = np.array(mixing, dtype=float, copy=True)
limit = min(source_np.shape[1], component_np.shape[0], mixing_np.shape[1])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and source_np.shape[0] > 0:
source_column = source_np[:, index]
pivot_value = source_column[int(np.argmax(np.abs(source_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
source_np[:, index] *= -1.0
mixing_np[:, index] *= -1.0
order = sorted(
range(limit),
key=lambda index: tuple(np.round(component_np[index, :], 12).tolist())
)
source_np = source_np[:, order]
component_np = component_np[order, :]
mixing_np = mixing_np[:, order]
return source_np, component_np, mixing_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total is not None and (component_total < 1 or component_total > max_components):
return f"Error: n_components must be between 1 and {max_components}"
algorithm_value = str(ica_algorithm).strip().lower()
if algorithm_value not in {"parallel", "deflation"}:
return "Error: ica_algorithm must be 'parallel' or 'deflation'"
fun_value = str(ica_fun).strip().lower()
if fun_value not in {"logcosh", "exp", "cube"}:
return "Error: ica_fun must be 'logcosh', 'exp', or 'cube'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
if float(tol) <= 0:
return "Error: tol must be greater than 0"
fitted = SklearnFastICA(
n_components=component_total,
algorithm=algorithm_value,
fun=fun_value,
max_iter=int(max_iter),
tol=float(tol),
random_state=None if random_state in (None, "") else int(random_state)
)
sources_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
mixing_np = np.asarray(fitted.mixing_, dtype=float)
sources_np, components_np, mixing_np = canonicalize_ica(sources_np, components_np, mixing_np)
feature_means = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
return {
"type": "Double",
"basicValue": float(components_np.shape[0]),
"properties": {
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"sources": {"type": "Array", "elements": mat(sources_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"mixing": {"type": "Array", "elements": mat(mixing_np.tolist())},
"feature_means": {"type": "Array", "elements": col(feature_means.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
KBEST_CLASSIF
SelectKBest keeps the features with the strongest univariate relationship to a classification target according to a scoring function. This wrapper fixes the score function to ANOVA F-values, which works well for dense numeric tabular inputs with categorical class labels.
The ANOVA F-value for a feature is calculated as:
F = \frac{\text{between-group variance}}{\text{within-group variance}}
This wrapper accepts rows as samples and columns as features, plus a target supplied as a single row, single column, or scalar when only one sample is present. It returns the filtered feature matrix together with feature scores, p-values, a boolean support mask, and the 1-based indices of the selected columns.
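As a sanity check on the F-ratio, the sketch below reuses the single feature and two-class target from Example 4 further down, computes the one-way ANOVA F-value by hand, and compares it with sklearn's f_classif:

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Same single feature and two-class target as Example 4.
x = np.array([0.1, 0.2, 0.3, 1.2, 1.3, 1.4]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# One-way ANOVA F: between-group mean square over within-group mean square.
groups = [x[y == 0], x[y == 1]]
grand_mean = x.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (len(groups) - 1)) / (ss_within / (len(x) - len(groups)))

f_sklearn, p_values = f_classif(x, y)
print(f_manual, f_sklearn[0])  # both 181.5, the score reported in Example 4
```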
Excel Usage
=KBEST_CLASSIF(data, target, k)
data(list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target(list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
k(int, optional, default: 1): Number of top-scoring features to keep.
Returns (dict): Excel data type containing the selected feature matrix, feature scores, p-values, and support mask.
Example 1: Keep the two strongest features for a binary numeric target
Inputs:
| data | | | target | k |
|---|---|---|---|---|
| 0.1 | 10 | 1 | 0 | 2 |
| 0 | 11 | 1.1 | 0 | |
| 0 | 12 | 0.9 | 0 | |
| 1 | 30 | 5 | 1 | |
| 1 | 31 | 4.9 | 1 | |
| 1 | 29 | 5.1 | 1 |
Excel formula:
=KBEST_CLASSIF({0.1,10,1;0,11,1.1;0,12,0.9;1,30,5;1,31,4.9;1,29,5.1}, {0;0;0;1;1;1}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":841}],[{"type":"Double","basicValue":541.5}],[{"type":"Double","basicValue":2400}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000841636}],[{"type":"Double","basicValue":0.0000202128}],[{"type":"Double","basicValue":0.00000103878}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.9}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":4.9}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":5.1}]]}}}
Example 2: Flatten a single-row string target range for feature ranking
Inputs:
| data | | | target | | | | | | k |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 0 | cold | cold | cold | hot | hot | hot | 2 |
| 1.2 | 4.8 | 0.1 | |||||||
| 0.8 | 5.2 | 0 | |||||||
| 5 | 1 | 3 | |||||||
| 5.1 | 1.2 | 3.1 | |||||||
| 4.9 | 0.8 | 2.9 |
Excel formula:
=KBEST_CLASSIF({1,5,0;1.2,4.8,0.1;0.8,5.2,0;5,1,3;5.1,1.2,3.1;4.9,0.8,2.9}, {"cold","cold","cold","hot","hot","hot"}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":960}],[{"type":"Double","basicValue":600}],[{"type":"Double","basicValue":1980.25}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000646545}],[{"type":"Double","basicValue":0.0000164831}],[{"type":"Double","basicValue":0.00000152493}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.2},{"type":"Double","basicValue":0.1}],[{"type":"Double","basicValue":0.8},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":5},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5.1},{"type":"Double","basicValue":3.1}],[{"type":"Double","basicValue":4.9},{"type":"Double","basicValue":2.9}]]}}}
Example 3: Select two features across three separated classes
Inputs:
| data | | | | target | k |
|---|---|---|---|---|---|
| 0 | 0 | 10 | 1 | left | 2 |
| 0.1 | 0.2 | 9.8 | 1.2 | left | |
| 5 | 5 | 2 | 10 | center | |
| 5.2 | 4.8 | 2.1 | 9.7 | center | |
| 10 | 0 | 5 | 20 | right | |
| 10.1 | 0.2 | 5.2 | 19.8 | right |
Excel formula:
=KBEST_CLASSIF({0,0,10,1;0.1,0.2,9.8,1.2;5,5,2,10;5.2,4.8,2.1,9.7;10,0,5,20;10.1,0.2,5.2,19.8}, {"left";"left";"center";"center";"right";"right"}, 2)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":4}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":5000.17}],[{"type":"Double","basicValue":768}],[{"type":"Double","basicValue":2088.11}],[{"type":"Double","basicValue":6247.12}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00000519356}],[{"type":"Double","basicValue":0.0000860645}],[{"type":"Double","basicValue":0.0000192326}],[{"type":"Double","basicValue":0.0000037193}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":1.2}],[{"type":"Double","basicValue":5},{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":5.2},{"type":"Double","basicValue":9.7}],[{"type":"Double","basicValue":10},{"type":"Double","basicValue":20}],[{"type":"Double","basicValue":10.1},{"type":"Double","basicValue":19.8}]]}}}
Example 4: Rank a single strong feature against a boolean target
Inputs:
| data | target | k |
|---|---|---|
| 0.1 | false | 1 |
| 0.2 | false | |
| 0.3 | false | |
| 1.2 | true | |
| 1.3 | true | |
| 1.4 | true |
Excel formula:
=KBEST_CLASSIF({0.1;0.2;0.3;1.2;1.3;1.4}, {FALSE;FALSE;FALSE;TRUE;TRUE;TRUE}, 1)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}]]},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":181.5}]]},"p_values":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000175635}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0.1}],[{"type":"Double","basicValue":0.2}],[{"type":"Double","basicValue":0.3}],[{"type":"Double","basicValue":1.2}],[{"type":"Double","basicValue":1.3}],[{"type":"Double","basicValue":1.4}]]}}}
Python Code
import numpy as np
from sklearn.feature_selection import SelectKBest as SklearnSelectKBest
from sklearn.feature_selection import f_classif as sklearn_f_classif
def kbest_classif(data, target, k=1):
"""
Select the top-scoring classification features and return the filtered matrix with score summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
k (int, optional): Number of top-scoring features to keep. Default is 1.
Returns:
dict: Excel data type containing the selected feature matrix, feature scores, p-values, and support mask.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
def parse_target(value, sample_count):
if not isinstance(value, list):
labels = [value]
elif not value:
return None, None, "Error: target must be non-empty"
elif all(not isinstance(item, list) for item in value):
labels = value
elif len(value) == 1:
labels = value[0]
elif all(isinstance(row, list) and len(row) == 1 for row in value):
labels = [row[0] for row in value]
else:
return None, None, "Error: target must be a single row or column"
if len(labels) != sample_count:
return None, None, "Error: target length must match sample count"
parsed = []
classes = []
for item in labels:
item = py(item)
if isinstance(item, str):
if not item.strip():
return None, None, "Error: target labels must not be blank"
elif isinstance(item, bool):
item = bool(item)
elif isinstance(item, (int, float)) and not isinstance(item, bool):
if not np.isfinite(float(item)):
return None, None, "Error: target labels must be finite"
item = float(item) if isinstance(item, float) else int(item)
else:
return None, None, "Error: target labels must be scalar string, boolean, or numeric values"
parsed.append(item)
if not any(type(existing) is type(item) and existing == item for existing in classes):
classes.append(item)
if len(classes) < 2:
return None, None, "Error: target must contain at least 2 classes"
return np.asarray(parsed, dtype=object), classes, None
try:
data_np, error = parse_data(data)
if error:
return error
target_np, classes, error = parse_target(target, data_np.shape[0])
if error:
return error
selected_total = int(k)
if selected_total < 1 or selected_total > data_np.shape[1]:
return f"Error: k must be between 1 and {data_np.shape[1]}"
fitted = SklearnSelectKBest(score_func=sklearn_f_classif, k=selected_total)
transformed_np = np.asarray(fitted.fit_transform(data_np, target_np), dtype=float)
support_mask = fitted.get_support().tolist()
selected_indices = [index + 1 for index, keep in enumerate(support_mask) if keep]
scores = np.atleast_1d(np.asarray(fitted.scores_, dtype=float))
p_values = np.atleast_1d(np.asarray(fitted.pvalues_, dtype=float))
return {
"type": "Double",
"basicValue": float(len(selected_indices)),
"properties": {
"selected_feature_count": {"type": "Double", "basicValue": float(len(selected_indices))},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"class_count": {"type": "Double", "basicValue": float(len(classes))},
"classes": {"type": "Array", "elements": col(classes)},
"selected_indices": {"type": "Array", "elements": col(selected_indices)},
"support_mask": {"type": "Array", "elements": col(support_mask)},
"scores": {"type": "Array", "elements": col(scores.tolist())},
"p_values": {"type": "Array", "elements": col(p_values.tolist())},
"selected_data": {"type": "Array", "elements": mat(transformed_np.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
KERNEL_PCA
Kernel PCA performs principal component analysis in an implicit feature space defined by a kernel function. It can capture nonlinear structure that ordinary linear PCA cannot represent directly.
The kernel trick replaces dot products in the feature space with a kernel function evaluation:
K(x_i, x_j) = \phi(x_i)^T \phi(x_j)
where \phi is the implicit nonlinear mapping.
This wrapper accepts rows as samples and columns as features. It returns the nonlinear embedding together with retained eigenvalues and the number of extracted components.
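One quick way to see the relationship to ordinary PCA (a minimal sketch on the same matrix as Example 1; the comparison is ours, not part of the wrapper) is that KernelPCA with a linear kernel reproduces standard PCA scores up to per-component sign:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

# Same matrix as Example 1 below.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0],
              [4.0, 2.0], [5.0, 3.0], [6.0, 5.0]])

pca_scores = PCA(n_components=2).fit_transform(X)
kpca_scores = KernelPCA(n_components=2, kernel="linear").fit_transform(X)

# Agreement up to per-component sign flips.
print(np.allclose(np.abs(pca_scores), np.abs(kpca_scores), atol=1e-6))
```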
Excel Usage
=KERNEL_PCA(data, n_components, kpca_kernel, gamma, degree, coef_zero, kpca_solver, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of nonlinear components to keep.
kpca_kernel(str, optional, default: "linear"): Kernel function used to build the implicit feature space. Valid options: "linear", "rbf", "poly", "sigmoid", "cosine".
gamma(float, optional, default: null): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default.
degree(int, optional, default: 3): Polynomial degree when using the polynomial kernel.
coef_zero(float, optional, default: 1): Independent kernel term for polynomial and sigmoid kernels.
kpca_solver(str, optional, default: "auto"): Eigensolver used to extract kernel principal components. Valid options: "auto", "dense", "arpack", "randomized".
random_state(int, optional, default: null): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
Example 1: Compute a linear kernel PCA embedding on a dense matrix
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 2 | linear | | 3 | 1 | dense | 0 |
| 2 | 1 | |||||||
| 3 | 1 | |||||||
| 4 | 2 | |||||||
| 5 | 3 | |||||||
| 6 | 5 |
Excel formula:
=KERNEL_PCA({1,0;2,1;3,1;4,2;5,3;6,5}, 2, "linear", , 3, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":33.5,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":33.5},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.18939},{"type":"Double","basicValue":0.27894}],[{"type":"Double","basicValue":-1.77556},{"type":"Double","basicValue":0.312058}],[{"type":"Double","basicValue":-1.05209},{"type":"Double","basicValue":-0.378295}],[{"type":"Double","basicValue":0.361736},{"type":"Double","basicValue":-0.345177}],[{"type":"Double","basicValue":1.77556},{"type":"Double","basicValue":-0.312058}],[{"type":"Double","basicValue":3.87974},{"type":"Double","basicValue":0.444532}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":32.7676}],[{"type":"Double","basicValue":0.732432}]]}}}
Example 2: Embed nonlinear structure with an RBF kernel
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | rbf | 0.5 | 3 | 1 | dense | 0 |
| 0 | 1 | |||||||
| 1 | 0 | |||||||
| 1 | 1 | |||||||
| 3 | 3 | |||||||
| 3 | 4 | |||||||
| 4 | 3 | |||||||
| 4 | 4 |
Excel formula:
=KERNEL_PCA({0,0;0,1;1,0;1,1;3,3;3,4;4,3;4,4}, 2, "rbf", 0.5, 3, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":3.21596,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":3.21596},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":0.569339},{"type":"Double","basicValue":-0.397182}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.569339},{"type":"Double","basicValue":-0.397182}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":2.57475}],[{"type":"Double","basicValue":0.641216}]]}}}
Example 3: Use a polynomial kernel to capture curved structure
Inputs:
| data | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|
| -2 | 4 | 2 | poly | 0.5 | 2 | 1 | dense | 0 |
| -1 | 1 | |||||||
| 0 | 0 | |||||||
| 1 | 1 | |||||||
| 2 | 4 | |||||||
| 3 | 9 |
Excel formula:
=KERNEL_PCA({-2,4;-1,1;0,0;1,1;2,4;3,9}, 2, "poly", 0.5, 2, 1, "dense", 0)
Expected output:
{"type":"Double","basicValue":1666.56,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":1666.56},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-5.31672},{"type":"Double","basicValue":7.70236}],[{"type":"Double","basicValue":-10.2249},{"type":"Double","basicValue":-0.491365}],[{"type":"Double","basicValue":-10.4901},{"type":"Double","basicValue":-1.83236}],[{"type":"Double","basicValue":-9.44703},{"type":"Double","basicValue":-2.28291}],[{"type":"Double","basicValue":0.0169504},{"type":"Double","basicValue":-2.96006}],[{"type":"Double","basicValue":35.4618},{"type":"Double","basicValue":-0.135669}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":1589.65}],[{"type":"Double","basicValue":76.9174}]]}}}
Example 4: Use the ARPACK solver for a seeded compact embedding
Inputs:
| data | | | n_components | kpca_kernel | gamma | degree | coef_zero | kpca_solver | random_state |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 2 | linear | | 3 | 1 | arpack | 4 |
| 2 | 1 | 1 | |||||||
| 3 | 2 | 1 | |||||||
| 4 | 3 | 2 | |||||||
| 5 | 5 | 3 | |||||||
| 6 | 8 | 5 |
Excel formula:
=KERNEL_PCA({1,1,0;2,1,1;3,2,1;4,3,2;5,5,3;6,8,5}, 2, "linear", , 3, 1, "arpack", 4)
Expected output:
{"type":"Double","basicValue":70.5138,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":70.5138},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.88006},{"type":"Double","basicValue":0.776081}],[{"type":"Double","basicValue":-2.9143},{"type":"Double","basicValue":-0.0346085}],[{"type":"Double","basicValue":-1.69564},{"type":"Double","basicValue":-0.332132}],[{"type":"Double","basicValue":0.000600645},{"type":"Double","basicValue":-0.600607}],[{"type":"Double","basicValue":2.42731},{"type":"Double","basicValue":-0.326868}],[{"type":"Double","basicValue":6.06208},{"type":"Double","basicValue":0.518135}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":69.0639}],[{"type":"Double","basicValue":1.44985}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import KernelPCA as SklearnKernelPCA
def kernel_pca(data, n_components=2, kpca_kernel='linear', gamma=None, degree=3, coef_zero=1, kpca_solver='auto', random_state=None):
"""
Fit kernel PCA and return nonlinear embeddings with retained eigenvalue summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of nonlinear components to keep. Default is 2.
kpca_kernel (str, optional): Kernel function used to build the implicit feature space. Valid options: Linear, RBF, Polynomial, Sigmoid, Cosine. Default is 'linear'.
gamma (float, optional): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default. Default is None.
degree (int, optional): Polynomial degree when using the polynomial kernel. Default is 3.
coef_zero (float, optional): Independent kernel term for polynomial and sigmoid kernels. Default is 1.
kpca_solver (str, optional): Eigensolver used to extract kernel principal components. Valid options: 'auto', 'dense', 'arpack', 'randomized'. Default is 'auto'.
random_state (int, optional): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_embedding(embedding):
embedding_np = np.array(embedding, dtype=float, copy=True)
for index in range(embedding_np.shape[1]):
column = embedding_np[:, index]
pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
if pivot_value < 0:
embedding_np[:, index] *= -1.0
return embedding_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = data_np.shape[0]
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
kernel_value = str(kpca_kernel).strip().lower()
if kernel_value not in {"linear", "rbf", "poly", "sigmoid", "cosine"}:
return "Error: kpca_kernel must be 'linear', 'rbf', 'poly', 'sigmoid', or 'cosine'"
solver_value = str(kpca_solver).strip().lower()
if solver_value not in {"auto", "dense", "arpack", "randomized"}:
return "Error: kpca_solver must be 'auto', 'dense', 'arpack', or 'randomized'"
if solver_value == "arpack" and component_total >= data_np.shape[0]:
return "Error: n_components must be below the sample count when kpca_solver is 'arpack'"
if int(degree) < 1:
return "Error: degree must be at least 1"
gamma_value = None if gamma in (None, "") else float(gamma)
if gamma_value is not None and gamma_value <= 0:
return "Error: gamma must be greater than 0 when provided"
fitted = SklearnKernelPCA(
n_components=component_total,
kernel=kernel_value,
gamma=gamma_value,
degree=int(degree),
coef0=float(coef_zero),
eigen_solver=solver_value,
random_state=None if random_state in (None, "") else int(random_state)
)
embedding_np = orient_embedding(fitted.fit_transform(data_np))
eigenvalues = np.atleast_1d(np.asarray(fitted.eigenvalues_, dtype=float))
retained_sum = float(np.sum(eigenvalues))
return {
"type": "Double",
"basicValue": retained_sum,
"properties": {
"retained_eigenvalue_sum": {"type": "Double", "basicValue": retained_sum},
"component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
"eigenvalues": {"type": "Array", "elements": col(eigenvalues.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
NMF_FACTOR
Nonnegative matrix factorization decomposes a nonnegative data matrix into a product of two smaller nonnegative matrices. It is commonly used for topic-like count matrices, additive parts-based representations, and compact nonnegative latent factors.
The decomposition approximates the data matrix V:
V \approx W H
where W contains the nonnegative factor weights and H contains the nonnegative components.
This wrapper accepts rows as samples and columns as features. It returns the sample-by-component weight matrix together with the learned component matrix, reconstruction error, and iteration count for the fitted factorization.
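Before the Excel-facing details, a minimal scikit-learn sketch of the factorization itself, using the count-like matrix from Example 1 below; with the default Frobenius loss, the reported reconstruction error is the Frobenius norm of the residual V - WH:
import numpy as np
from sklearn.decomposition import NMF

V = np.array([[4.0, 1.0, 0.0, 0.0], [5.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0], [0.0, 0.0, 4.0, 5.0],
              [3.0, 1.0, 0.0, 0.0], [0.0, 0.0, 5.0, 4.0]])
model = NMF(n_components=2, init="nndsvda", solver="cd", max_iter=600, random_state=0)
W = model.fit_transform(V)   # sample-by-component weights
H = model.components_        # component-by-feature matrix
# The residual norm matches the estimator's reported reconstruction error
assert np.isclose(np.linalg.norm(V - W @ H), model.reconstruction_err_)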
Excel Usage
=NMF_FACTOR(data, n_components, nmf_init, nmf_solver, max_iter, random_state)
data(list[list], required): 2D array of nonnegative numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of latent components to learn.
nmf_init(str, optional, default: “nndsvda”): Initialization strategy for the nonnegative factors.
nmf_solver(str, optional, default: “cd”): Numerical solver used to optimize the factorization.
max_iter(int, optional, default: 400): Maximum number of optimization iterations.
random_state(int, optional, default: null): Integer seed used by randomized initialization paths. Leave blank for the estimator default.
Returns (dict): Excel data type containing factor weights, learned components, and reconstruction error.
Example 1: Factor a small count-like matrix into two nonnegative components
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | |||
|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 2 | nndsvda | cd | 600 | 0 |
| 5 | 1 | 0 | 0 | |||||
| 0 | 0 | 3 | 4 | |||||
| 0 | 0 | 4 | 5 | |||||
| 3 | 1 | 0 | 0 | |||||
| 0 | 0 | 5 | 4 |
Excel formula:
=NMF_FACTOR({4,1,0,0;5,1,0,0;0,0,3,4;0,0,4,5;3,1,0,0;0,0,5,4}, 2, "nndsvda", "cd", 600, 0)
Expected output:
{"type":"Double","basicValue":1.22303,"properties":{"reconstruction_error":{"type":"Double","basicValue":1.22303},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.23956}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.53187}],[{"type":"Double","basicValue":0.32396},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.416078},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.947249}],[{"type":"Double","basicValue":0.412981},{"type":"Double","basicValue":3.06249e-8}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":10.4789},{"type":"Double","basicValue":11.2079}],[{"type":"Double","basicValue":3.23389},{"type":"Double","basicValue":0.777898},{"type":"Double","basicValue":4.30763e-9},{"type":"Double","basicValue":0}]]},"n_iter":{"type":"Double","basicValue":7}}}
Example 2: Fit one nonnegative factor on a one-trend matrix
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | |
|---|---|---|---|---|---|---|
| 1 | 2 | 1 | nndsvda | cd | 600 | 0 |
| 2 | 4 | |||||
| 3 | 6 | |||||
| 4 | 8 | |||||
| 5 | 10 |
Excel formula:
=NMF_FACTOR({1,2;2,4;3,6;4,8;5,10}, 1, "nndsvda", "cd", 600, 0)
Expected output:
{"type":"Double","basicValue":2.49492e-15,"properties":{"reconstruction_error":{"type":"Double","basicValue":2.49492e-15},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5491}],[{"type":"Double","basicValue":1.0982}],[{"type":"Double","basicValue":1.6473}],[{"type":"Double","basicValue":2.1964}],[{"type":"Double","basicValue":2.7455}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1.82116},{"type":"Double","basicValue":3.64232}]]},"n_iter":{"type":"Double","basicValue":5}}}
Example 3: Use seeded random initialization for two-factor fitting
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | ||
|---|---|---|---|---|---|---|---|
| 2 | 0 | 1 | 2 | random | cd | 600 | 7 |
| 3 | 0 | 1 | |||||
| 0 | 4 | 2 | |||||
| 0 | 5 | 3 | |||||
| 2 | 0 | 2 | |||||
| 0 | 4 | 3 |
Excel formula:
=NMF_FACTOR({2,0,1;3,0,1;0,4,2;0,5,3;2,0,2;0,4,3}, 2, "random", "cd", 600, 7)
Expected output:
{"type":"Double","basicValue":1.06794,"properties":{"reconstruction_error":{"type":"Double","basicValue":1.06794},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.911616},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.27198},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3.66795}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":4.80079}],[{"type":"Double","basicValue":1.06152},{"type":"Double","basicValue":0.337401}],[{"type":"Double","basicValue":0.0791118},{"type":"Double","basicValue":4.06045}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":2.16697},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.14776}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1.03418},{"type":"Double","basicValue":0.63678}]]},"n_iter":{"type":"Double","basicValue":10}}}
Example 4: Fit a nonnegative matrix with the multiplicative update solver
Inputs:
| data | n_components | nmf_init | nmf_solver | max_iter | random_state | ||
|---|---|---|---|---|---|---|---|
| 3 | 1 | 0 | 2 | nndsvda | mu | 800 | 0 |
| 4 | 1 | 0 | |||||
| 0 | 2 | 5 | |||||
| 0 | 3 | 6 | |||||
| 3 | 1 | 1 | |||||
| 0 | 2 | 4 |
Excel formula:
=NMF_FACTOR({3,1,0;4,1,0;0,2,5;0,3,6;3,1,1;0,2,4}, 2, "nndsvda", "mu", 800, 0)
Expected output:
{"type":"Double","basicValue":0.471063,"properties":{"reconstruction_error":{"type":"Double","basicValue":0.471063},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"weights":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0169444},{"type":"Double","basicValue":1.07715}],[{"type":"Double","basicValue":1.2837e-8},{"type":"Double","basicValue":1.41179}],[{"type":"Double","basicValue":1.17558},{"type":"Double","basicValue":0.0000200891}],[{"type":"Double","basicValue":1.46493},{"type":"Double","basicValue":0.0153092}],[{"type":"Double","basicValue":0.218198},{"type":"Double","basicValue":1.04456}],[{"type":"Double","basicValue":0.976617},{"type":"Double","basicValue":0.0102081}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":2.41145e-13},{"type":"Double","basicValue":1.92801},{"type":"Double","basicValue":4.14787}],[{"type":"Double","basicValue":2.82985},{"type":"Double","basicValue":0.721546},{"type":"Double","basicValue":0.0127197}]]},"n_iter":{"type":"Double","basicValue":70}}}
Python Code
import numpy as np
from sklearn.decomposition import NMF as SklearnNMF
def nmf_factor(data, n_components=2, nmf_init='nndsvda', nmf_solver='cd', max_iter=400, random_state=None):
"""
Fit nonnegative matrix factorization and return factor weights with learned components.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of nonnegative numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of latent components to learn. Default is 2.
nmf_init (str, optional): Initialization strategy for the nonnegative factors. Valid options: 'nndsvda', 'nndsvd', 'nndsvdar', 'random'. Default is 'nndsvda'.
nmf_solver (str, optional): Numerical solver used to optimize the factorization. Valid options: 'cd' (coordinate descent), 'mu' (multiplicative update). Default is 'cd'.
max_iter (int, optional): Maximum number of optimization iterations. Default is 400.
random_state (int, optional): Integer seed used by randomized initialization paths. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing factor weights, learned components, and reconstruction error.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if np.any(data_np < 0):
return None, "Error: data must contain only nonnegative numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
init_value = str(nmf_init).strip().lower()
if init_value not in {"nndsvda", "nndsvd", "nndsvdar", "random"}:
return "Error: nmf_init must be 'nndsvda', 'nndsvd', 'nndsvdar', or 'random'"
solver_value = str(nmf_solver).strip().lower()
if solver_value not in {"cd", "mu"}:
return "Error: nmf_solver must be 'cd' or 'mu'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
fitted = SklearnNMF(
n_components=component_total,
init=init_value,
solver=solver_value,
max_iter=int(max_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
weights_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
reconstruction_error = float(fitted.reconstruction_err_)
return {
"type": "Double",
"basicValue": reconstruction_error,
"properties": {
"reconstruction_error": {"type": "Double", "basicValue": reconstruction_error},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"weights": {"type": "Array", "elements": mat(weights_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
PCA_REDUCE
Principal component analysis projects tabular data onto orthogonal directions that capture the largest possible variance in descending order. It is commonly used to compress correlated features into a smaller set of informative components.
For a centered data matrix X, the projection is given by:
X = T P^T
where T are the principal component scores and P are the orthogonal principal axes (loadings).
This wrapper accepts rows as samples and columns as features. It returns the transformed component scores together with component loadings, explained variance values, explained variance ratios, and singular values for the fitted projection.
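The score/loading relationship above can be verified directly with scikit-learn. A minimal sketch using the matrix from Example 1 below (whitening off, so scores are plain projections of the centered data):
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.0, 1.0, 0.0], [3.0, 1.0, 1.0], [4.0, 2.0, 1.0],
              [5.0, 3.0, 2.0], [6.0, 4.0, 3.0], [7.0, 5.0, 4.0]])
pca = PCA(n_components=2, svd_solver="full")
T = pca.fit_transform(X)
# T = (X - mu) P^T: center, then project onto the principal axes
assert np.allclose(T, (X - pca.mean_) @ pca.components_.T)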
Excel Usage
=PCA_REDUCE(data, n_components, whiten, pca_solver, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: null): Number of principal components to keep. Leave blank to keep the estimator default.
whiten(bool, optional, default: false): Whether to rescale components to unit variance after projection.
pca_solver(str, optional, default: “auto”): Singular value decomposition strategy used to fit the PCA model.
random_state(int, optional, default: null): Integer seed used when the randomized solver path is active. Leave blank for the estimator default.
Returns (dict): Excel data type containing projected scores, component loadings, and explained variance summaries.
Example 1: Reduce a dense three-feature matrix with the full solver
Inputs:
| data | n_components | whiten | pca_solver | random_state | ||
|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 2 | false | full | 0 |
| 3 | 1 | 1 | ||||
| 4 | 2 | 1 | ||||
| 5 | 3 | 2 | ||||
| 6 | 4 | 3 | ||||
| 7 | 5 | 4 |
Excel formula:
=PCA_REDUCE({2,1,0;3,1,1;4,2,1;5,3,2;6,4,3;7,5,4}, 2, FALSE, "full", 0)
Expected output:
{"type":"Double","basicValue":0.994454,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.994454},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.49841},{"type":"Double","basicValue":0.277463}],[{"type":"Double","basicValue":-2.34023},{"type":"Double","basicValue":-0.495054}],[{"type":"Double","basicValue":-1.12509},{"type":"Double","basicValue":0.142399}],[{"type":"Double","basicValue":0.598075},{"type":"Double","basicValue":0.0837313}],[{"type":"Double","basicValue":2.32124},{"type":"Double","basicValue":0.0250639}],[{"type":"Double","basicValue":4.04441},{"type":"Double","basicValue":-0.0336034}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.650144},{"type":"Double","basicValue":0.564993},{"type":"Double","basicValue":0.508031}],[{"type":"Double","basicValue":-0.0763973},{"type":"Double","basicValue":0.713849},{"type":"Double","basicValue":-0.69612}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.986028}],[{"type":"Double","basicValue":0.00842663}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":8.2169}],[{"type":"Double","basicValue":0.0702219}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":6.40972}],[{"type":"Double","basicValue":0.592545}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":2.66667}],[{"type":"Double","basicValue":1.83333}]]}}}
Example 2: Whiten a centered matrix into two principal components
Inputs:
| data | n_components | whiten | pca_solver | random_state | ||
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 2 | true | full | 0 |
| 2 | 4 | 5 | ||||
| 3 | 6 | 7 | ||||
| 4 | 8 | 9 | ||||
| 5 | 10 | 11 | ||||
| 6 | 12 | 13 |
Excel formula:
=PCA_REDUCE({1,2,3;2,4,5;3,6,7;4,8,9;5,10,11;6,12,13}, 2, TRUE, "full", 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.33631},{"type":"Double","basicValue":-1.78183}],[{"type":"Double","basicValue":-0.801784},{"type":"Double","basicValue":0.503593}],[{"type":"Double","basicValue":-0.267261},{"type":"Double","basicValue":0.0925721}],[{"type":"Double","basicValue":0.267261},{"type":"Double","basicValue":-0.0925721}],[{"type":"Double","basicValue":0.801784},{"type":"Double","basicValue":-0.503593}],[{"type":"Double","basicValue":1.33631},{"type":"Double","basicValue":-1.14049}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.333333},{"type":"Double","basicValue":0.666667},{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":0.942809},{"type":"Double","basicValue":-0.235702},{"type":"Double","basicValue":-0.235702}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.70433e-33}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":31.5}],[{"type":"Double","basicValue":5.36864e-32}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":12.5499}],[{"type":"Double","basicValue":5.18104e-16}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":7}],[{"type":"Double","basicValue":8}]]}}}
Example 3: Reduce one-feature data to a single component
Inputs:
| data | n_components | whiten | pca_solver | random_state |
|---|---|---|---|---|
| 1 | 1 | false | full | 0 |
| 2 | ||||
| 4 | ||||
| 7 | ||||
| 11 |
Excel formula:
=PCA_REDUCE({1;2;4;7;11}, 1, FALSE, "full", 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4}],[{"type":"Double","basicValue":-3}],[{"type":"Double","basicValue":-1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":6}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":16.5}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.12404}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":5}]]}}}
Example 4: Use a seeded randomized PCA fit on a wider matrix
Inputs:
| data | n_components | whiten | pca_solver | random_state | |||
|---|---|---|---|---|---|---|---|
| 1 | 0 | 2 | 1 | 2 | false | randomized | 7 |
| 2 | 1 | 3 | 1 | ||||
| 3 | 1 | 4 | 2 | ||||
| 4 | 2 | 5 | 3 | ||||
| 5 | 3 | 6 | 5 | ||||
| 6 | 5 | 7 | 8 |
Excel formula:
=PCA_REDUCE({1,0,2,1;2,1,3,1;3,1,4,2;4,2,5,3;5,3,6,5;6,5,7,8}, 2, FALSE, "randomized", 7)
Expected output:
{"type":"Double","basicValue":0.996287,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.996287},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4.59092},{"type":"Double","basicValue":0.905985}],[{"type":"Double","basicValue":-3.27712},{"type":"Double","basicValue":-0.00260405}],[{"type":"Double","basicValue":-1.7394},{"type":"Double","basicValue":-0.460947}],[{"type":"Double","basicValue":0.225926},{"type":"Double","basicValue":-0.747519}],[{"type":"Double","basicValue":2.84277},{"type":"Double","basicValue":-0.412074}],[{"type":"Double","basicValue":6.53874},{"type":"Double","basicValue":0.717159}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.427595},{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.651526}],[{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.171771},{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.622017}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.970519}],[{"type":"Double","basicValue":0.0257683}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":17.1458}],[{"type":"Double","basicValue":0.455239}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.259}],[{"type":"Double","basicValue":1.50871}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":3.33333}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import PCA as SklearnPCA
def pca_reduce(data, n_components=None, whiten=False, pca_solver='auto', random_state=None):
"""
Fit principal component analysis and return projected scores with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of principal components to keep. Leave blank to keep the estimator default. Default is None.
whiten (bool, optional): Whether to rescale components to unit variance after projection. Default is False.
pca_solver (str, optional): Singular value decomposition strategy used to fit the PCA model. Valid options: 'auto', 'full', 'randomized', 'arpack'. Default is 'auto'.
random_state (int, optional): Integer seed used when the randomized solver path is active. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing projected scores, component loadings, and explained variance summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = None if n_components in (None, "") else int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total is not None and (component_total < 1 or component_total > max_components):
return f"Error: n_components must be between 1 and {max_components}"
solver_value = str(pca_solver).strip().lower()
if solver_value not in {"auto", "full", "randomized", "arpack"}:
return "Error: pca_solver must be 'auto', 'full', 'randomized', or 'arpack'"
if solver_value == "arpack" and (component_total is None or component_total >= max_components):
return "Error: n_components must be set below min(n_samples, n_features) when pca_solver is 'arpack'"
fitted = SklearnPCA(
n_components=component_total,
whiten=bool(whiten),
svd_solver=solver_value,
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
mean_values = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
variance_ratio_sum = float(np.sum(variance_ratio))
return {
"type": "Double",
"basicValue": variance_ratio_sum,
"properties": {
"explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
"explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
"singular_values": {"type": "Array", "elements": col(singular_values.tolist())},
"feature_means": {"type": "Array", "elements": col(mean_values.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
TSNE_EMBED
t-distributed stochastic neighbor embedding maps high-dimensional samples into a low-dimensional space by preserving local neighborhoods rather than global linear structure. It is most useful for visualization-oriented exploratory analysis.
The algorithm minimizes the Kullback-Leibler divergence between the joint probabilities of the high-dimensional and low-dimensional representations:
KL(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
where P and Q represent the pairwise similarities in the original and embedded spaces respectively.
This wrapper accepts rows as samples and columns as features. It returns the fitted embedding together with the final KL divergence, the number of embedding dimensions, and the iteration count reported by the fitted estimator.
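A minimal scikit-learn sketch mirroring Example 1 below (this assumes a scikit-learn release where the iteration cap is named max_iter, i.e. 1.5 or later, as the wrapper code also does):
import numpy as np
from sklearn.manifold import TSNE

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]])
tsne = TSNE(n_components=2, perplexity=2.5, learning_rate=80,
            max_iter=500, init="pca", method="exact", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)        # (8, 2): one low-dimensional point per sample
print(tsne.kl_divergence_)    # final value of KL(P || Q)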
Excel Usage
=TSNE_EMBED(data, n_components, perplexity, learning_rate, max_iter, tsne_init, tsne_method, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Dimension of the embedded output space.
perplexity(float, optional, default: 30): Effective neighborhood size used in the t-SNE objective.
learning_rate(float, optional, default: 200): Positive optimizer learning rate for the embedding updates.
max_iter(int, optional, default: 1000): Maximum number of optimization iterations.
tsne_init(str, optional, default: “pca”): Initialization scheme for the starting embedding.
tsne_method(str, optional, default: “barnes_hut”): Gradient computation method used during optimization.
random_state(int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.
Returns (dict): Excel data type containing the fitted embedding and final KL divergence.
Example 1: Embed two separated clusters with the exact solver
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 2.5 | 80 | 500 | pca | exact | 0 |
| 0 | 1 | |||||||
| 1 | 0 | |||||||
| 1 | 1 | |||||||
| 5 | 5 | |||||||
| 5 | 6 | |||||||
| 6 | 5 | |||||||
| 6 | 6 |
Excel formula:
=TSNE_EMBED({0,0;0,1;1,0;1,1;5,5;5,6;6,5;6,6}, 2, 2.5, 80, 500, "pca", "exact", 0)
Expected output:
{"type":"Double","basicValue":1.27152,"properties":{"kl_divergence":{"type":"Double","basicValue":1.27152},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":308.174},{"type":"Double","basicValue":-596.145}],[{"type":"Double","basicValue":565.379},{"type":"Double","basicValue":140.845}],[{"type":"Double","basicValue":710.635},{"type":"Double","basicValue":-1128.97}],[{"type":"Double","basicValue":-343.572},{"type":"Double","basicValue":1700.94}],[{"type":"Double","basicValue":1682.58},{"type":"Double","basicValue":-627.124}],[{"type":"Double","basicValue":-1589.38},{"type":"Double","basicValue":723.269}],[{"type":"Double","basicValue":-803.921},{"type":"Double","basicValue":71.7089}],[{"type":"Double","basicValue":-435.913},{"type":"Double","basicValue":644.484}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 2: Use seeded random initialization on three compact groups
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 2 | 60 | 500 | random | exact | 3 |
| 0.2 | 0.1 | 0 | |||||||
| 5 | 5 | 5 | |||||||
| 5.1 | 5.2 | 5 | |||||||
| 10 | 0 | 10 | |||||||
| 10.2 | 0.1 | 10.1 |
Excel formula:
=TSNE_EMBED({0,0,0;0.2,0.1,0;5,5,5;5.1,5.2,5;10,0,10;10.2,0.1,10.1}, 2, 2, 60, 500, "random", "exact", 3)
Expected output:
{"type":"Double","basicValue":1.36008,"properties":{"kl_divergence":{"type":"Double","basicValue":1.36008},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":914.209},{"type":"Double","basicValue":-274.882}],[{"type":"Double","basicValue":-890.978},{"type":"Double","basicValue":295.72}],[{"type":"Double","basicValue":-234.515},{"type":"Double","basicValue":453.258}],[{"type":"Double","basicValue":256.056},{"type":"Double","basicValue":-448.663}],[{"type":"Double","basicValue":156.849},{"type":"Double","basicValue":102.724}],[{"type":"Double","basicValue":-143.79},{"type":"Double","basicValue":-103.507}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 3: Embed one-dimensional samples into two dimensions
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 50 | 500 | random | exact | 1 |
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| 5 |
Excel formula:
=TSNE_EMBED({0;1;2;3;4;5}, 2, 2, 50, 500, "random", "exact", 1)
Expected output:
{"type":"Double","basicValue":0.123383,"properties":{"kl_divergence":{"type":"Double","basicValue":0.123383},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-145.516},{"type":"Double","basicValue":21.1913}],[{"type":"Double","basicValue":-93.6204},{"type":"Double","basicValue":23.9552}],[{"type":"Double","basicValue":-30.9413},{"type":"Double","basicValue":25.6826}],[{"type":"Double","basicValue":37.3494},{"type":"Double","basicValue":26.9504}],[{"type":"Double","basicValue":100.051},{"type":"Double","basicValue":27.51}],[{"type":"Double","basicValue":152.001},{"type":"Double","basicValue":26.1001}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 4: Produce a three-dimensional embedding with Barnes-Hut t-SNE
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 3 | 2.5 | 75 | 500 | pca | barnes_hut | 5 |
| 0.1 | 0.2 | 1.1 | |||||||
| 1 | 1 | 2 | |||||||
| 1.2 | 1.1 | 2.1 | |||||||
| 5 | 5 | 0 | |||||||
| 5.2 | 5.1 | 0.2 | |||||||
| 6 | 6 | 1 | |||||||
| 6.1 | 6.2 | 1.1 |
Excel formula:
=TSNE_EMBED({0,0,1;0.1,0.2,1.1;1,1,2;1.2,1.1,2.1;5,5,0;5.2,5.1,0.2;6,6,1;6.1,6.2,1.1}, 3, 2.5, 75, 500, "pca", "barnes_hut", 5)
Expected output:
{"type":"Double","basicValue":1.79411,"properties":{"kl_divergence":{"type":"Double","basicValue":1.79411},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":180.549},{"type":"Double","basicValue":-89.0916},{"type":"Double","basicValue":19.9081}],[{"type":"Double","basicValue":-64.7},{"type":"Double","basicValue":68.89},{"type":"Double","basicValue":-60.3671}],[{"type":"Double","basicValue":89.9451},{"type":"Double","basicValue":12.4007},{"type":"Double","basicValue":13.6712}],[{"type":"Double","basicValue":52.8532},{"type":"Double","basicValue":3.74339},{"type":"Double","basicValue":-82.2136}],[{"type":"Double","basicValue":-29.3893},{"type":"Double","basicValue":-133.062},{"type":"Double","basicValue":3.69885}],[{"type":"Double","basicValue":-54.7223},{"type":"Double","basicValue":-38.0214},{"type":"Double","basicValue":32.0658}],[{"type":"Double","basicValue":-95.3113},{"type":"Double","basicValue":-87.4605},{"type":"Double","basicValue":133.372}],[{"type":"Double","basicValue":193.254},{"type":"Double","basicValue":649.021},{"type":"Double","basicValue":40.2834}]]},"n_iter":{"type":"Double","basicValue":499}}}
Python Code
import numpy as np
from sklearn.manifold import TSNE as SklearnTSNE
def tsne_embed(data, n_components=2, perplexity=30, learning_rate=200, max_iter=1000, tsne_init='pca', tsne_method='barnes_hut', random_state=None):
"""
Fit t-SNE and return a low-dimensional embedding with the final KL divergence.
See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Dimension of the embedded output space. Default is 2.
perplexity (float, optional): Effective neighborhood size used in the t-SNE objective. Default is 30.
learning_rate (float, optional): Positive optimizer learning rate for the embedding updates. Default is 200.
max_iter (int, optional): Maximum number of optimization iterations. Default is 1000.
tsne_init (str, optional): Initialization scheme for the starting embedding. Valid options: 'pca', 'random'. Default is 'pca'.
tsne_method (str, optional): Gradient computation method used during optimization. Valid options: 'barnes_hut', 'exact'. Default is 'barnes_hut'.
random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing the fitted embedding and final KL divergence.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 3:
return None, "Error: data must contain at least 3 samples"
return data_np, None
def orient_embedding(embedding):
embedding_np = np.array(embedding, dtype=float, copy=True)
for index in range(embedding_np.shape[1]):
column = embedding_np[:, index]
pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
if pivot_value < 0:
embedding_np[:, index] *= -1.0
return embedding_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
if component_total < 1:
return "Error: n_components must be at least 1"
perplexity_value = float(perplexity)
if perplexity_value <= 0 or perplexity_value >= data_np.shape[0]:
return f"Error: perplexity must be greater than 0 and less than {data_np.shape[0]}"
learning_rate_value = float(learning_rate)
if learning_rate_value <= 0:
return "Error: learning_rate must be greater than 0"
if int(max_iter) < 250:
return "Error: max_iter must be at least 250"
init_value = str(tsne_init).strip().lower()
if init_value not in {"pca", "random"}:
return "Error: tsne_init must be 'pca' or 'random'"
method_value = str(tsne_method).strip().lower()
if method_value not in {"barnes_hut", "exact"}:
return "Error: tsne_method must be 'barnes_hut' or 'exact'"
if method_value == "barnes_hut" and component_total > 3:
return "Error: n_components must be 3 or less when tsne_method is 'barnes_hut'"
fitted = SklearnTSNE(
n_components=component_total,
perplexity=perplexity_value,
learning_rate=learning_rate_value,
max_iter=int(max_iter),
init=init_value,
method=method_value,
random_state=None if random_state in (None, "") else int(random_state)
)
embedding_np = orient_embedding(fitted.fit_transform(data_np))
kl_divergence = float(fitted.kl_divergence_)
n_iter_value = float(getattr(fitted, "n_iter_", int(max_iter)))
return {
"type": "Double",
"basicValue": kl_divergence,
"properties": {
"kl_divergence": {"type": "Double", "basicValue": kl_divergence},
"component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
"n_iter": {"type": "Double", "basicValue": n_iter_value}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
TSVD_REDUCE
Truncated singular value decomposition projects a matrix into a lower-dimensional linear subspace without centering the original features first. It is often used for sparse matrices, count matrices, and other high-dimensional tabular inputs where centering would be undesirable.
The decomposition approximates the data matrix X:
X \approx U \Sigma V^T
where U and V are matrices with orthogonal columns and \Sigma is a diagonal matrix containing the singular values.
This wrapper accepts rows as samples and columns as features. It returns the transformed scores together with learned component vectors, explained variance values, explained variance ratios, and singular values from the fitted decomposition.
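Because no centering is applied, the fitted transform is a plain projection onto the retained right singular vectors. A minimal sketch with the dense matrix from Example 2 below:
import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.array([[1.0, 2.0, 0.0], [2.0, 3.0, 1.0], [3.0, 5.0, 2.0],
              [4.0, 6.0, 3.0], [5.0, 8.0, 5.0]])
svd = TruncatedSVD(n_components=2, algorithm="arpack", random_state=0)
scores = svd.fit_transform(X)                       # U @ diag(singular values)
assert np.allclose(scores, X @ svd.components_.T)   # no mean subtraction involved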
Excel Usage
=TSVD_REDUCE(data, n_components, tsvd_algorithm, n_iter, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
n_components(int, optional, default: 2): Number of singular vectors to retain.
tsvd_algorithm(str, optional, default: “randomized”): Solver used to compute the truncated decomposition.
n_iter(int, optional, default: 5): Number of power iterations used by the randomized solver.
random_state(int, optional, default: null): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default.
Returns (dict): Excel data type containing projected scores, component vectors, and explained variance summaries.
Example 1: Reduce a count-like matrix with randomized truncated SVD
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |||
|---|---|---|---|---|---|---|---|
| 4 | 0 | 1 | 0 | 2 | randomized | 7 | 0 |
| 3 | 1 | 0 | 0 | ||||
| 0 | 2 | 4 | 1 | ||||
| 0 | 1 | 5 | 2 | ||||
| 5 | 0 | 1 | 0 | ||||
| 0 | 3 | 4 | 2 |
Excel formula:
=TSVD_REDUCE({4,0,1,0;3,1,0,0;0,2,4,1;0,1,5,2;5,0,1,0;0,3,4,2}, 2, "randomized", 7, 0)
Expected output:
{"type":"Double","basicValue":0.931194,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.931194},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":1.93405},{"type":"Double","basicValue":3.6262}],[{"type":"Double","basicValue":1.19906},{"type":"Double","basicValue":2.75861}],[{"type":"Double","basicValue":4.37825},{"type":"Double","basicValue":-1.22495}],[{"type":"Double","basicValue":5.14271},{"type":"Double","basicValue":-1.4662}],[{"type":"Double","basicValue":2.20955},{"type":"Double","basicValue":4.58516}],[{"type":"Double","basicValue":5.05578},{"type":"Double","basicValue":-1.49309}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.275501},{"type":"Double","basicValue":0.372556},{"type":"Double","basicValue":0.832042},{"type":"Double","basicValue":0.304972}],[{"type":"Double","basicValue":0.958959},{"type":"Double","basicValue":-0.118264},{"type":"Double","basicValue":-0.209636},{"type":"Double","basicValue":-0.149877}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.51791}],[{"type":"Double","basicValue":6.6647}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.255337}],[{"type":"Double","basicValue":0.675857}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.0132}],[{"type":"Double","basicValue":6.90381}]]}}}
Example 2: Use the ARPACK solver on a compact dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||
|---|---|---|---|---|---|---|
| 1 | 2 | 0 | 2 | arpack | 5 | 0 |
| 2 | 3 | 1 | ||||
| 3 | 5 | 2 | ||||
| 4 | 6 | 3 | ||||
| 5 | 8 | 5 |
Excel formula:
=TSVD_REDUCE({1,2,0;2,3,1;3,5,2;4,6,3;5,8,5}, 2, "arpack", 5, 0)
Expected output:
{"type":"Double","basicValue":0.997991,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997991},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":2.03633},{"type":"Double","basicValue":-0.912029}],[{"type":"Double","basicValue":3.70179},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":6.14111},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":7.80657},{"type":"Double","basicValue":-0.144005}],[{"type":"Double","basicValue":10.6489},{"type":"Double","basicValue":0.768025}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.488612},{"type":"Double","basicValue":0.773861},{"type":"Double","basicValue":0.402986}],[{"type":"Double","basicValue":-0.144005},{"type":"Double","basicValue":-0.384012},{"type":"Double","basicValue":0.912029}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":9.17313}],[{"type":"Double","basicValue":0.327742}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.963564}],[{"type":"Double","basicValue":0.0344267}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":15.1626}],[{"type":"Double","basicValue":1.41421}]]}}}
Example 3: Keep only one singular vector from a two-feature matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |
|---|---|---|---|---|---|
| 1 | 0 | 1 | randomized | 5 | 3 |
| 2 | 1 | ||||
| 3 | 1 | ||||
| 4 | 2 | ||||
| 5 | 3 |
Excel formula:
=TSVD_REDUCE({1,0;2,1;3,1;4,2;5,3}, 1, "randomized", 5, 3)
Expected output:
{"type":"Double","basicValue":0.96652,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.96652},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168}],[{"type":"Double","basicValue":2.23592}],[{"type":"Double","basicValue":3.12508}],[{"type":"Double","basicValue":4.47183}],[{"type":"Double","basicValue":5.81858}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168},{"type":"Double","basicValue":0.457582}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.93822}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.96652}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.33122}]]}}}
Example 4: Retain three components from a wider dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||||
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 3 | 1 | 3 | randomized | 9 | 11 |
| 3 | 1 | 1 | 4 | 1 | ||||
| 4 | 2 | 1 | 5 | 2 | ||||
| 5 | 3 | 2 | 6 | 3 | ||||
| 6 | 5 | 3 | 8 | 5 | ||||
| 7 | 8 | 5 | 11 | 8 |
Excel formula:
=TSVD_REDUCE({2,1,0,3,1;3,1,1,4,1;4,2,1,5,2;5,3,2,6,3;6,5,3,8,5;7,8,5,11,8}, 3, "randomized", 9, 11)
Expected output:
{"type":"Double","basicValue":0.997767,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997767},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":5},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":3.64627},{"type":"Double","basicValue":1.13513},{"type":"Double","basicValue":-0.644274}],[{"type":"Double","basicValue":4.99393},{"type":"Double","basicValue":1.68159},{"type":"Double","basicValue":0.159584}],[{"type":"Double","basicValue":6.88991},{"type":"Double","basicValue":1.58207},{"type":"Double","basicValue":-0.121217}],[{"type":"Double","basicValue":9.02932},{"type":"Double","basicValue":1.14663},{"type":"Double","basicValue":0.357838}],[{"type":"Double","basicValue":12.6065},{"type":"Double","basicValue":0.0815798},{"type":"Double","basicValue":0.104611}],[{"type":"Double","basicValue":17.8649},{"type":"Double","basicValue":-1.949},{"type":"Double","basicValue":-0.121041}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.458187},{"type":"Double","basicValue":0.395869},{"type":"Double","basicValue":0.243427},{"type":"Double","basicValue":0.646052},{"type":"Double","basicValue":0.395869}],[{"type":"Double","basicValue":0.530095},{"type":"Double","basicValue":-0.490949},{"type":"Double","basicValue":-0.335912},{"type":"Double","basicValue":0.352279},{"type":"Double","basicValue":-0.490949}],[{"type":"Double","basicValue":0.45148},{"type":"Double","basicValue":-0.162402},{"type":"Double","basicValue":0.759855},{"type":"Double","basicValue":-0.407477},{"type":"Double","basicValue":-0.162402}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":23.4302}],[{"type":"Double","basicValue":1.58077}],[{"type":"Double","basicValue":0.0995387}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.931}],[{"type":"Double","basicValue":0.0628122}],[{"type":"Double","basicValue":0.00395518}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":25.403}],[{"type":"Double","basicValue":3.42626}],[{"type":"Double","basicValue":0.780315}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import TruncatedSVD as SklearnTruncatedSVD
def tsvd_reduce(data, n_components=2, tsvd_algorithm='randomized', n_iter=5, random_state=None):
"""
Fit truncated singular value decomposition and return projected scores with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of singular vectors to retain. Default is 2.
tsvd_algorithm (str, optional): Solver used to compute the truncated decomposition. Valid options: 'randomized', 'arpack'. Default is 'randomized'.
n_iter (int, optional): Number of power iterations used by the randomized solver. Default is 5.
random_state (int, optional): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing projected scores, component vectors, and explained variance summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
algorithm_value = str(tsvd_algorithm).strip().lower()
if algorithm_value not in {"randomized", "arpack"}:
return "Error: tsvd_algorithm must be 'randomized' or 'arpack'"
if algorithm_value == "arpack" and component_total >= max_components:
return "Error: n_components must be below min(n_samples, n_features) when tsvd_algorithm is 'arpack'"
if int(n_iter) < 1:
return "Error: n_iter must be at least 1"
fitted = SklearnTruncatedSVD(
n_components=component_total,
algorithm=algorithm_value,
n_iter=int(n_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
variance_ratio_sum = float(np.sum(variance_ratio))
return {
"type": "Double",
"basicValue": variance_ratio_sum,
"properties": {
"explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
"explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
"singular_values": {"type": "Array", "elements": col(singular_values.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator
VARIANCE_SEL
Variance threshold feature selection removes columns whose training-set variance is at or below a chosen cutoff. It is a simple unsupervised filter for dropping constant or nearly constant features before modeling.
The variance for a feature X with n samples is computed with the population convention (denominator n):
\text{Var}(X) = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2
where \mu is the mean of the feature.
This wrapper accepts rows as samples and columns as features. It returns the filtered matrix together with per-feature variances, a boolean support mask, and the 1-based indices of the selected columns.
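As a sanity check on the formula and the selection rule, the per-feature variances and support mask from Example 1 below can be reproduced directly with NumPy, whose default variance also divides by n; a feature is kept only when its variance strictly exceeds the threshold:
import numpy as np

# Example 1 data: rows are samples, columns are features.
data = np.array([
    [1, 0, 5, 10],
    [1, 1, 5, 11],
    [1, 0, 5, 12],
    [1, 1, 5, 13],
], dtype=float)

variances = data.var(axis=0)   # population variance: [0, 0.25, 0, 1.25]
mask = variances > 0           # strict comparison drops the constant columns
selected = data[:, mask]       # keeps columns 2 and 4 (1-based)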
Excel Usage
=VARIANCE_SEL(data, threshold)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.
threshold(float, optional, default: 0): Variance cutoff; features whose variance does not exceed this value are removed.
Returns (dict): Excel data type containing the selected feature matrix, support mask, and per-feature variances.
Example 1: Remove constant columns with the default threshold
Inputs:
| data | threshold | |||
|---|---|---|---|---|
| 1 | 0 | 5 | 10 | 0 |
| 1 | 1 | 5 | 11 | |
| 1 | 0 | 5 | 12 | |
| 1 | 1 | 5 | 13 |
Excel formula:
=VARIANCE_SEL({1,0,5,10;1,1,5,11;1,0,5,12;1,1,5,13}, 0)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":4},"feature_count":{"type":"Double","basicValue":4},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.25}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.25}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":13}]]}}}
Example 2: Drop weakly varying columns with a positive threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 0 | 1 | 10 | 0.01 |
| 0.1 | 1.1 | 11 | |
| 0 | 0.9 | 12 | |
| 0.1 | 1 | 13 | |
| 0 | 1.1 | 14 |
Excel formula:
=VARIANCE_SEL({0,1,10;0.1,1.1,11;0,0.9,12;0.1,1,13;0,1.1,14}, 0.01)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0024}],[{"type":"Double","basicValue":0.0056}],[{"type":"Double","basicValue":2}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":13}],[{"type":"Double","basicValue":14}]]}}}
Example 3: Keep a single varying feature column
Inputs:
| data | threshold |
|---|---|
| 2 | 0 |
| 3 | |
| 5 | |
| 8 | |
| 13 |
Excel formula:
=VARIANCE_SEL({2;3;5;8;13}, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":11}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":13}]]}}}
Example 4: Keep only the highest-variance columns with a larger threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 10 | 0 | 100 | 30 |
| 12 | 0 | 110 | |
| 14 | 0 | 130 | |
| 16 | 0 | 160 | |
| 18 | 0 | 200 |
Excel formula:
=VARIANCE_SEL({10,0,100;12,0,110;14,0,130;16,0,160;18,0,200}, 30)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1320}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":100}],[{"type":"Double","basicValue":110}],[{"type":"Double","basicValue":130}],[{"type":"Double","basicValue":160}],[{"type":"Double","basicValue":200}]]}}}
Python Code
import numpy as np
from sklearn.feature_selection import VarianceThreshold as SklearnVarianceThreshold
def variance_sel(data, threshold=0):
"""
Remove low-variance features and return the selected columns with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        threshold (float, optional): Variance cutoff; features whose variance does not exceed this value are removed. Default is 0.
Returns:
dict: Excel data type containing the selected feature matrix, support mask, and per-feature variances.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
try:
data_np, error = parse_data(data)
if error:
return error
threshold_value = float(threshold)
if threshold_value < 0:
return "Error: threshold must be non-negative"
fitted = SklearnVarianceThreshold(threshold=threshold_value)
transformed_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
variances = np.atleast_1d(np.asarray(fitted.variances_, dtype=float))
support_mask = fitted.get_support().tolist()
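        # Convert boolean mask positions to the 1-based column indices Excel users expect.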
selected_indices = [index + 1 for index, keep in enumerate(support_mask) if keep]
return {
"type": "Double",
"basicValue": float(len(selected_indices)),
"properties": {
"selected_feature_count": {"type": "Double", "basicValue": float(len(selected_indices))},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"selected_indices": {"type": "Array", "elements": col(selected_indices)},
"support_mask": {"type": "Array", "elements": col(support_mask)},
"variances": {"type": "Array", "elements": col(variances.tolist())},
"selected_data": {"type": "Array", "elements": mat(transformed_np.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"Online Calculator