Show HN: Open-Source Library for Conditional Gaussian Mixture Modelling in Python

2 points by sitmo 2 days ago

I've been working on a compact Python library called cgmm for regression modelling with conditional Gaussian mixture models. It enables flexible, data-driven regression beyond the usual linear and Gaussian assumptions.

It integrates with scikit-learn, comes with documentation and examples, and is available on PyPI.
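
A minimal sketch of the scikit-learn-style workflow (the names ConditionalGMM, fit, predict and sample here are illustrative placeholders, not necessarily the exact API; see the docs below for the real quick-start):

    import numpy as np

    from cgmm import ConditionalGMM  # illustrative placeholder name

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X[:, 0]) + rng.normal(0, 0.1 + 0.1 * np.abs(X[:, 0]))

    model = ConditionalGMM(n_components=5, random_state=0)
    model.fit(X, y)                # learn p(y|x) by EM

    y_mean = model.predict(X[:5])  # point estimates: conditional means
    y_draw = model.sample(X[:5])   # draws from the full predictive p(y|x)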

Key features:

* model non-Gaussian conditional distributions

* capture non-linear dependencies

* handle heteroscedastic noise (variance that changes with inputs)

* provide full predictive distributions, not just point estimates (see the toy example below)
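
To make the last two bullets concrete, here is toy data where p(y|x) is bimodal and the noise grows with x. An ordinary least-squares or single-Gaussian model collapses both effects into one mean and one variance:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    x = rng.uniform(0, 1, n)

    # Two branches -> p(y|x) is bimodal; noise scale grows with x -> heteroscedastic.
    branch = rng.random(n) < 0.5
    y = np.where(branch, 2 * x, -2 * x) + rng.normal(0, 0.05 + 0.3 * x, n)

A conditional mixture fits this directly: the mixture weights, means, and variances all vary with x.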

The current release adds:

* Mixture of Experts (MoE): softmax-gated experts with linear mean functions (Jordan & Jacobs, “Hierarchical Mixtures of Experts and the EM Algorithm”, Neural Computation, 1994); a toy density sketch follows this list

* Direct conditional likelihood optimization: implementing EM from Jaakkola & Haussler, “Expectation-Maximization Algorithms for Conditional Likelihoods”, ICML 2000
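
For intuition, the MoE conditional density from Jordan & Jacobs fits in a dozen lines of numpy (a sketch of the model form, not of cgmm's internals):

    import numpy as np

    def moe_log_density(x, y, V, W, b, sigma):
        """log p(y|x) for K softmax-gated linear-Gaussian experts.

        x: (d,) input, y: scalar target, V: (K, d) gate weights,
        W: (K, d) expert slopes, b: (K,) intercepts, sigma: (K,) noise scales.
        """
        gate_logits = V @ x                       # softmax gate over experts
        log_gates = gate_logits - np.logaddexp.reduce(gate_logits)
        means = W @ x + b                         # each expert is linear in x
        log_experts = (-0.5 * np.log(2 * np.pi * sigma**2)
                       - 0.5 * ((y - means) / sigma) ** 2)
        return np.logaddexp.reduce(log_gates + log_experts)

EM alternates between computing per-expert responsibilities (proportional to exp(log_gates + log_experts)) and maximizing the responsibility-weighted conditional likelihood over the gate and expert parameters.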

Examples now cover a range of applications:

* VIX volatility Monte Carlo simulation (non-linear, non-Gaussian SDEs); sketched after this list

* Multivariate seasonal forecasts (temperature, wind speed, light intensity)

* Iris dataset + scikit-learn benchmarks

* Generative modelling of handwritten digits
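
The Monte Carlo example reduces to iterating one-step conditional draws, x_{t+1} ~ p(y | x_t). With the same illustrative sample API as in the quick-start sketch above:

    import numpy as np

    n_paths, horizon = 100, 250
    paths = np.empty((n_paths, horizon + 1))
    paths[:, 0] = 20.0                    # e.g. a starting VIX level

    for t in range(horizon):
        x_t = paths[:, t].reshape(-1, 1)  # current level conditions the next draw
        paths[:, t + 1] = model.sample(x_t).ravel()  # one draw per path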

Links:

Docs: https://cgmm.readthedocs.io/en/latest/

GitHub: https://github.com/sitmo/cgmm

PyPI: https://pypi.org/project/cgmm/

I'd love feedback from the community, especially on use cases that involve non-Gaussian, non-linear data.

sitmo 2 days ago

A quick note on how cgmm relates to existing tools:

* scikit-learn's GaussianMixture models the unconditional (joint) distribution of the data. cgmm instead models the conditional distribution p(y|x), which makes it better suited to regression and forecasting tasks. (See the sketch after this list.)

* Compared to linear or generalized linear models, cgmm can capture multi-modal outputs, non-Gaussian behavior, and input-dependent variance.

* Compared to Bayesian frameworks (like PyMC or Stan), cgmm is more focused and lightweight: it provides efficient EM-based algorithms and scikit-learn–style APIs rather than full Bayesian inference.
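
To make the first point concrete: p(y|x) can be recovered from a joint GaussianMixture by hand via the standard Gaussian conditioning formulas, and this per-component bookkeeping is essentially what a conditional-GMM library automates. A sketch for 1-D x and y:

    import numpy as np
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, 1000)
    y = np.sin(x) + rng.normal(0, 0.1 + 0.1 * np.abs(x))

    # Fit an *unconditional* GMM on the joint (x, y) samples.
    gmm = GaussianMixture(n_components=5, random_state=0)
    gmm.fit(np.column_stack([x, y]))

    def conditional_params(gmm, x0):
        """Weights, means, and variances of the mixture p(y | x = x0)."""
        mu_x, mu_y = gmm.means_[:, 0], gmm.means_[:, 1]
        s_xx = gmm.covariances_[:, 0, 0]
        s_xy = gmm.covariances_[:, 0, 1]
        s_yy = gmm.covariances_[:, 1, 1]
        # Gaussian conditioning, applied per component:
        cond_mean = mu_y + s_xy / s_xx * (x0 - mu_x)
        cond_var = s_yy - s_xy**2 / s_xx
        # Reweight components by how well each explains the observed x0.
        w = gmm.weights_ * norm.pdf(x0, mu_x, np.sqrt(s_xx))
        return w / w.sum(), cond_mean, cond_var

Because the weights and component means shift with x0, the resulting conditional mean is non-linear in x0 and the conditional variance is input-dependent, even though each component is a plain Gaussian.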

So I see cgmm as complementary: a middle ground between simple regression models and full probabilistic programming frameworks, with a focus on conditional mixture models that are easy to drop into existing Python/ML pipelines.