Independently Verified

Quality & Trust

Every record in the BantuNomics corpus is validated, certified, and auditable. Three lines of defense check linguistic accuracy from generation through delivery.

Three Lines of Defense

1

Engine Validation

Pre-Recording

The BTS engine generates every form deterministically from the cartridge. Built-in constraint checks reject impossible morpheme combinations before any data leaves the system.

2

Acoustic Validation

Post-Recording

An automated pipeline measures F₀ contours, duration, and spectral features, then compares them against the engine's tonal predictions. Each recording receives a pass/fail result per thesis.

3

Human Review

Peer Verification

Native speaker reviewers verify naturalness, intelligibility, and correctness. Critical records receive multiple independent reviews before certification.

The 7-Dimension Analysis Matrix

Every recording is analyzed across seven independent acoustic dimensions.

1
F₀ Pitch Needle
Tracks fundamental frequency contour against predicted tonal pattern. Measures H/L tone accuracy at each mora.
2
Prosodic Speedometer
Verifies that phrase-medial syllables are short and that sentence-final penultimate syllables are correctly lengthened (Dᵣ ≈ 0.45).
3
Nasal Bridge Auditor
Spectral analysis verifies that nasal consonant harmony (l→n) is correctly articulated in nasal contexts.
4
Moraic Alignment
Ensures syllable boundaries align with mora-level predictions from the engine's tonal decomposition.
5
Spectral Tilt
Measures the balance of harmonic energy — distinguishes breathy, modal, and pressed phonation across speakers.
6
Formant Mapping
F₁/F₂ analysis verifies vowel identity against the syllable inventory baseline. Critical for vowel harmony validation.
7
LDR Stability
The Linguistic Delta Report (LDR) tracks the difference (Δ) between stress-test metrics and the same metrics in the bulk data. Δ < 1% indicates internal consistency.
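As a rough illustration of the first dimension, the sketch below scores how well a measured F₀ track agrees with a predicted H/L pattern, one value per mora. It is not the BantuNomics pipeline: the function name `tone_match_score`, the sample F₀ values, and the rule of labeling a mora H when it sits at or above the utterance median are all assumptions made for this example. A production system would also need to handle declination, downstep, and speaker normalization.

```python
from statistics import median


def tone_match_score(f0_hz, predicted):
    """Return the fraction of moras whose observed tone matches the prediction.

    f0_hz     -- one mean F0 value (Hz) per mora (hypothetical input format)
    predicted -- string of 'H'/'L' labels, one per mora
    """
    if not f0_hz or len(f0_hz) != len(predicted):
        raise ValueError("need one F0 value per predicted tone label")

    # Assumption: a mora counts as High when its F0 is at or above the
    # median F0 of the utterance, and Low otherwise.
    midpoint = median(f0_hz)
    observed = ["H" if value >= midpoint else "L" for value in f0_hz]

    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(predicted)


# Made-up F0 values for a four-mora form predicted as H L H L.
score = tone_match_score([182.0, 141.0, 176.0, 138.0], "HLHL")
print(f"tone match: {score:.2f}")
```

A score of this kind is one plausible reading of a field such as `f0_contour_match`, but the actual metric behind that field is not specified here.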

Validation Passport

Every record carries a machine-readable certificate documenting exactly which rules were verified.

// Example Validation Passport (simplified)
{
  "record_id": "bem_v_bomba_01847",
  "validations": {
    "meeussen_rule": "PASS",
    "binary_spreading": "PASS",
    "nasal_harmony": "PASS",
    "penultimate_length": "PASS",
    "f0_contour_match": "0.94"
  },
  "certified": true,
  "bts_version": "3.2.1"
}
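Because the passport is machine-readable, a consumer can re-check it before trusting the `certified` flag. The following is a minimal sketch under stated assumptions: the sample is trimmed to three checks, the helper `failed_checks` is hypothetical, and the 0.90 floor for `f0_contour_match` is an illustrative value, not a published BantuNomics threshold.

```python
import json

# Trimmed copy of the example passport (the leading // comment is dropped
# because standard JSON does not allow comments).
PASSPORT_JSON = """
{
  "record_id": "bem_v_bomba_01847",
  "validations": {
    "nasal_harmony": "PASS",
    "penultimate_length": "PASS",
    "f0_contour_match": "0.94"
  },
  "certified": true,
  "bts_version": "3.2.1"
}
"""


def failed_checks(passport, min_f0_match=0.90):
    """List the validation names that did not pass.

    min_f0_match is an assumed cutoff used only for this example.
    """
    failures = []
    for name, result in passport["validations"].items():
        if name == "f0_contour_match":
            # The example stores the score as a string, so convert it first.
            if float(result) < min_f0_match:
                failures.append(name)
        elif result != "PASS":
            failures.append(name)
    return failures


passport = json.loads(PASSPORT_JSON)
failures = failed_checks(passport)

# A passport is self-consistent when "certified" is true exactly when
# no individual check failed.
consistent = passport["certified"] == (not failures)
print(passport["record_id"], "failures:", failures, "consistent:", consistent)
```

Checking the flag against the individual results, rather than reading `certified` alone, catches passports that were edited or generated inconsistently.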

MRS Audit

132 stress-test records exercise every boundary case and are independently reviewed by a frontier AI model.

100% Tonal Accuracy

Gemini 2.5 Flash independently reviewed 132 stress-test records and found zero tonal logic errors.

Meeussen's Rule: 48/48
Binary Spreading: 82/82
Nasal Harmony: 22/22
Melodic Override: 26/26

LDR Certification

A quantitative check of the dataset's internal consistency.

How It Works

The Linguistic Delta Report computes the difference (Δ) between each MRS stress-test metric and the same metric measured across the full bulk dataset.

If the same rules are applied uniformly, the delta approaches zero. This is the mathematical "check engine light" for the entire corpus.

Certification Thresholds

Δ < 1% AUTHENTIC — certified consistent
Δ 1-5% REVIEW — manual inspection required
Δ > 5% REJECT — systematic error detected
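The thresholds above can be sketched as a small classifier. This is an assumption-laden illustration rather than the certified LDR procedure: it takes Δ to be the absolute difference, in percentage points, between one metric's rate in the stress-test set and in the bulk data, and it treats exactly 1% and exactly 5% as REVIEW. The function names and the sample rates are hypothetical, not corpus results.

```python
def ldr_delta(mrs_rate, bulk_rate):
    """Absolute difference between two rates, in percentage points.

    Both arguments are fractions between 0 and 1 (assumed input format).
    """
    return abs(mrs_rate - bulk_rate) * 100.0


def ldr_status(delta_pct):
    """Map a delta (in percent) onto the three certification bands."""
    if delta_pct < 1.0:
        return "AUTHENTIC"  # certified consistent
    if delta_pct <= 5.0:
        return "REVIEW"     # manual inspection required
    return "REJECT"         # systematic error detected


# Illustrative numbers only: a rule passes in 100% of stress-test records
# and in 99.4% of bulk records.
delta = ldr_delta(1.0, 0.994)
print(f"delta = {delta:.2f}% -> {ldr_status(delta)}")
```

In practice a report would compute one delta per metric and certify the corpus only if every metric falls in the AUTHENTIC band; how the real report aggregates metrics is not stated here.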