AI Experiment Logs & Ablation Studies

CMP Architecture Timeline

Early CMP

Initial tests validating the relational binding matrix. High memory footprint, unstable gradients over long horizons.

CMP v1

Introduced the Hash Encoder. Validated the token-savings hypothesis but struggled with out-of-vocabulary (OOV) generalization and morphology.

CMP v1.5

Sparse-gated refinement introduced. Stabilized the recurrent state, allowing for deeper networks.

CMP v1.9

Deprecated the Hash Encoder. Introduced character n-gram features and the LearnedSparseEncoder in the Yudi AI framework. Current focus is on the predictive output head.

PC-CMP Direction

Active experiments testing Predictive Coding (local prediction-error updates) as a replacement for global backpropagation in CMP.
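To make "local prediction-error updates" concrete, here is a minimal sketch on a toy scalar chain x0 -> x1 -> x2 with linear units. It is not the PC-CMP implementation: the function name `train_pc_chain`, the target function y = 2x, and every hyperparameter are assumptions chosen for illustration. The hidden state is first relaxed to reduce the total prediction error, and then each weight is updated using only its own input and its own error term, with no backward pass through the whole chain.

```python
def train_pc_chain(samples, w0=0.5, w1=0.5, lr=0.1,
                   relax_steps=50, relax_rate=0.1, epochs=200):
    """Train a scalar chain x0 -> x1 -> x2 with predictive-coding updates."""
    for _ in range(epochs):
        for x0, target in samples:
            x1 = w0 * x0              # initialise the hidden state with a forward pass
            x2 = target               # clamp the output node to the target
            for _ in range(relax_steps):
                e1 = x1 - w0 * x0     # prediction error at the hidden node
                e2 = x2 - w1 * x1     # prediction error at the output node
                # Gradient descent on F = 0.5 * (e1**2 + e2**2) with respect to x1.
                x1 -= relax_rate * (e1 - w1 * e2)
            e1 = x1 - w0 * x0
            e2 = x2 - w1 * x1
            # Local updates: each weight uses only its own input and error.
            w0 += lr * e1 * x0
            w1 += lr * e2 * x1
    return w0, w1


# Assumed toy task: learn y = 2x from four points.
samples = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w0, w1 = train_pc_chain(samples)
```

After training, the product `w0 * w1` should approach 2, the slope of the assumed target function. Whether this behaviour carries over to nonlinear, recurrent, or larger models is exactly the open question the experiments address.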

Recent Results

Model Version	Params	Dataset	Training Method	Result	Interpretation / Caveat
CMP v1.5	12M	WikiText-103 (Sub)	Global BPTT	Converged faster than dense baseline.	Transformer PPL gap remains high on this specific task.
CMP v1.9	34M	Synthetic Relational	Global BPTT	Perfect recall on bounded entity-role mapping.	Does not prove generalization to unstructured natural language.
PC-CMP (Toy)	1M	MNIST Sequences	Predictive Coding	Local updates reached 92% of BPTT accuracy.	Open questions about predictive coding remain at 10M+ scale.

Failure Modes & Corrections

Honest documentation of what hasn't worked.

Hash Encoder Limitations

In CMP v1, we used a Hash Encoder to sparsify token inputs quickly. It was computationally cheap, but it failed badly on out-of-vocabulary (OOV) terms and morphologically rich languages: hash collisions mapped unrelated tokens to the same code, and later layers could not recover from the resulting semantic smearing. This led to the v1.9 correction, which replaced the Hash Encoder entirely with character n-gram features and a LearnedSparseEncoder.
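The collision problem can be illustrated with a small sketch. It reproduces neither the CMP v1 Hash Encoder nor the LearnedSparseEncoder, whose internals are not described here. `hash_bucket`, `char_ngrams`, and `jaccard` are hypothetical helpers, and the bucket count and n-gram size are arbitrary assumptions.

```python
import hashlib


def hash_bucket(token: str, num_buckets: int) -> int:
    """Map a token to a bucket index with a stable hash (the hashing trick)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def char_ngrams(token: str, n: int = 3) -> set:
    """Return the set of character n-grams of a token, with boundary markers."""
    padded = f"<{token}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed toy setup: 100 distinct tokens squeezed into 64 buckets.
tokens = [f"tok{i}" for i in range(100)]
buckets = {t: hash_bucket(t, 64) for t in tokens}
# With more tokens than buckets, collisions are unavoidable (pigeonhole).
num_collided = len(tokens) - len(set(buckets.values()))

# Related word forms share n-grams; these unrelated words share none.
related = jaccard(char_ngrams("running"), char_ngrams("runner"))
unrelated = jaccard(char_ngrams("running"), char_ngrams("table"))
```

An unseen form such as "runner" still overlaps with "running" in n-gram space, whereas a hash bucket index carries no information about related word forms.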

Transformer Perplexity (PPL) Gap

Despite CMP's theoretical memory advantages, our dense-matching baselines (such as standard LLaMA-architecture Transformers) still drastically outperform current CMP versions on next-token prediction perplexity. We are investigating whether this reflects an architectural flaw in CMP or is simply a symptom of evaluating a non-token architecture on token-optimized datasets.
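For reference, perplexity is the exponential of the mean negative log-likelihood per predicted unit, so its value depends on what counts as a unit. The sketch below computes it alongside bits per character, a unit-independent alternative that is one common way to compare models with different input segmentations. The function names and toy numbers are assumptions for illustration, not part of our evaluation framework.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def bits_per_char(token_log_probs, num_chars):
    """Total negative log-likelihood in bits, divided by the character count."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    return -sum(token_log_probs) / (math.log(2) * num_chars)


# Assumed toy input: 10 predictions, each assigning probability 0.25.
log_probs = [math.log(0.25)] * 10
ppl = perplexity(log_probs)          # uniform choice among 4 options
bpc = bits_per_char(log_probs, 40)   # same text, assumed to span 40 characters
```

Two models that segment the same text differently can report different per-unit perplexities for the same total likelihood, while their bits per character stay directly comparable.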

Need for Matched Baselines

Early experiments compared highly optimized Transformer code against unoptimized PyTorch CMP implementations. We have since corrected our evaluation framework to use FLOPs-matched baselines, which reduced some of the "miraculous" efficiency gains we initially observed.
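One way to set up a FLOPs-matched comparison is to fix a training compute budget and give each model as many tokens as that budget allows. The sketch below is an assumption about how this could be done, not our actual framework. It uses the common rule of thumb of roughly 6 FLOPs per parameter per token for dense training, which may not hold for a sparse or recurrent model such as CMP; the CMP per-token cost shown is a made-up placeholder that would need to be profiled.

```python
def dense_train_flops_per_token(num_params: int) -> int:
    """Rule-of-thumb dense training cost: about 6 FLOPs per parameter per token."""
    return 6 * num_params


def tokens_for_budget(flop_budget: int, flops_per_token: int) -> int:
    """Number of training tokens that fit inside a fixed FLOP budget."""
    if flops_per_token <= 0:
        raise ValueError("flops_per_token must be positive")
    return flop_budget // flops_per_token


# Hypothetical comparison under one shared budget.
budget = 10**15
dense_fpt = dense_train_flops_per_token(12_000_000)  # 12M-parameter dense baseline
cmp_fpt = 40_000_000  # placeholder only: must be measured, not a real CMP figure

dense_tokens = tokens_for_budget(budget, dense_fpt)
cmp_tokens = tokens_for_budget(budget, cmp_fpt)
```

Matching on FLOPs rather than wall-clock time keeps implementation quality (optimized kernels versus unoptimized PyTorch) out of the comparison.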

Open Benchmarks

Areas where we are actively seeking collaboration or developing new evaluation metrics:

Continual Learning

Few-Shot Learning

Cross-Modal Grounding

Interpretability

OOV and Morphology

Memory and State Tracking
