3. The density matrix as a feature

Stage 6 · Hilbert → ρ → linear SVM · 10-fold × 10 CV · N = 28

The pivot in Section 2 ended on a question. What does a feature space of the right shape look like for this signal? Theta/beta ratio is one specific reduction of a much richer object. The classifier that survived cross-validation was reading off relative beta power across the back of the head, but the deeper observation is that the cleaned EEG carries cross-channel and cross-frequency structure the textbook power-and-coherence menu collapses into pairwise scalars. The density matrix is the smallest object I know that keeps that structure intact.

The construction is short. For each band \(b\) in \(\{\delta, \theta, \alpha, \beta\}\), apply a zero-phase Butterworth bandpass to each of the fifteen channels, take the Hilbert transform of each, and stack the resulting analytic signals at every time point into a complex column vector \(\psi(t) \in \mathbb{C}^{15}\). Normalize \(\psi(t)\) to unit length, take its outer product with itself, and average over all kept time samples across all epochs:

\[ \rho_b \;=\; \frac{1}{T} \sum_t \frac{\psi(t)\,\psi(t)^{\dagger}}{\|\psi(t)\|^2} \]

The result is a Hermitian, positive semi-definite, trace-one \(15 \times 15\) complex matrix. It is a density matrix in the formal quantum-mechanical sense, even though every step of the construction is classical signal processing. Across twenty-eight children and four bands, the maximum trace error is \(4 \times 10^{-16}\) and the minimum eigenvalue stays above \(-7 \times 10^{-17}\), both within float64 round-off. Cauchy-Schwarz holds throughout.
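The construction above fits in a few lines of NumPy/SciPy. This is a minimal sketch, not the pipeline's code: the function name, the 4th-order filter, and the simulated input are illustrative assumptions, and the pipeline's epoching and artifact rejection are omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_density_matrix(eeg, fs, band):
    """Time-averaged outer product of the band-limited analytic signal.

    eeg  : (n_channels, n_samples) real array of cleaned EEG
    fs   : sampling rate in Hz
    band : (low, high) band edges in Hz
    """
    # Zero-phase Butterworth bandpass (filtfilt runs the filter forward and backward;
    # the order 4 here is an assumption, not the pipeline's documented choice)
    b, a = butter(4, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=1)
    # Analytic signal per channel: each column of psi is one psi(t) in C^15
    psi = hilbert(filtered, axis=1)
    # Normalize every column to unit length, then average psi psi^dagger over time
    psi = psi / np.linalg.norm(psi, axis=0, keepdims=True)
    rho = (psi @ psi.conj().T) / psi.shape[1]
    return rho

# Sanity check on simulated data: 15 channels, 10 s at 256 Hz, alpha band
rng = np.random.default_rng(0)
rho = band_density_matrix(rng.standard_normal((15, 2560)), fs=256.0, band=(8.0, 12.0))
```

Because each column is normalized before the outer product, the trace-one, Hermitian, and positive semi-definite properties hold by construction, up to float64 round-off.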


Figure 3 (interactive). Cohort-mean magnitude of \(\rho_b\) across the four bands (\(N = 28\)). Switch the band with the dropdown. Hover any cell to read the channel pair, \(|\rho_{ij}|\), and the off-diagonal phase \(\arg(\rho_{ij})\) in radians. Diagonal entries are channel occupation probabilities; off-diagonal entries carry cross-channel covariance and phase relationships that pairwise coherence and band power summaries throw away. The alpha band’s central-parietal block is consistent with the SHAP top-three from Section 2 (Pz, P3, O1).

Figure 4. Animated construction of \(\rho_b\) from the time-averaged outer product of the band-limited analytic signal across channels for a single subject.

For each subject and band, I flatten the matrix into a real feature vector. The fifteen real diagonal entries plus the real and imaginary parts of the 105 strict upper-triangle entries give 225 reals per band, 900 features in total across the four bands for the 15-channel montage. The \((\mathrm{Re}, \mathrm{Im})\) decomposition is mathematically equivalent to magnitude and phase but avoids the cyclic-angle pathology that breaks linear classifiers on phase. A linear SVM with top-ten ANOVA feature selection inside each fold runs under repeated stratified 10-fold cross-validation, ten repeats, fixed seed, the same fold indices reused for every model on every feature set. On the same twenty-eight children, that classifier reaches a mean balanced accuracy of 0.657 and a mean ROC AUC of 0.780.
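The flattening and the cross-validation protocol can be sketched with scikit-learn. The feature matrix and labels below are simulated stand-ins with the stated shapes (28 subjects, 4 × 225 = 900 features); the seed and `k=10` selection follow the text, but the helper names are mine.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def flatten_rho(rho):
    """15 real diagonals + Re/Im of the 105 strict upper-triangle entries = 225 reals."""
    iu = np.triu_indices_from(rho, k=1)
    upper = rho[iu]
    return np.concatenate([np.diag(rho).real, upper.real, upper.imag])

# Simulated stand-ins: 28 subjects x 900 features, balanced median-split labels
rng = np.random.default_rng(0)
X = rng.standard_normal((28, 900))
y = np.repeat([0, 1], 14)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),   # ANOVA top-ten, refitted inside each fold
    SVC(kernel="linear"),
)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")  # 100 per-fold AUCs
```

Putting the scaler and selector inside the pipeline is what keeps the feature selection honest: both are refitted on each training fold, so no test-fold information leaks into the top-ten choice.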

For comparison, I ran the same children through the standard QEEG menu under the same cross-validation: absolute and relative band power, frontal alpha asymmetry, theta/beta ratio, fronto-parietal coherence, alpha reactivity, and band-limited covariance, 622 features in total. The best classifier on that feature set is a Random Forest at 0.618 mean balanced accuracy, 0.615 ROC AUC. On the same data, the density-matrix features pull the AUC from 0.615 to 0.780. The 95% bootstrap confidence intervals on the means do not overlap (DM-SVM [0.71, 0.85]; classical-RF [0.54, 0.70]), and a paired Wilcoxon test across the 100 matched folds gives \(W = 1721\), \(p = 0.005\) on AUC. On balanced accuracy the gap is smaller and not distinguishable from zero at this \(N\) (\(W = 2193\), \(p = 0.25\)) because balanced accuracy at \(N = 28\) takes a small set of discrete values, but the AUC contrast survives the paired test on the same folds.
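The paired comparison above reduces to a Wilcoxon signed-rank test on the fold-wise differences. A minimal sketch, with simulated per-fold AUCs standing in for the two models' real values:

```python
import numpy as np
from scipy.stats import wilcoxon

# Simulated stand-ins for the 100 matched per-fold AUCs; the real values come
# from the two cross-validation runs on identical fold indices.
rng = np.random.default_rng(1)
auc_dm = np.clip(rng.normal(0.78, 0.10, 100), 0.0, 1.0)   # density-matrix SVM
auc_rf = np.clip(rng.normal(0.615, 0.10, 100), 0.0, 1.0)  # classical Random Forest

# Two-sided paired test on the differences. Exact zeros are dropped by the default
# zero_method, which matters when a metric takes only a few discrete values, as
# balanced accuracy does at N = 28.
stat, p = wilcoxon(auc_dm - auc_rf, alternative="two-sided")
```

The pairing is the point: testing the differences on the same folds removes the shared fold-to-fold variance that an unpaired test would count against the effect.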

The density matrix also appears in supervised quantum machine learning, but as a fidelity kernel rather than a feature. By the kernel-equivalence of supervised quantum models (Schuld, 2021), a quantum-kernel SVM with feature map \(|\psi(x)\rangle\) computes pairwise similarities as squared overlaps:

\[ K_Q(x, y) \;=\; |\langle\psi(x)|\psi(y)\rangle|^2 \;=\; \mathrm{Tr}(\rho_x \rho_y) \]

Read the equation term by term:

- \(K_Q(x, y)\): the quantum-fidelity kernel, a single non-negative number measuring how similar inputs \(x\) and \(y\) are after the encoding circuit lifts them into Hilbert space.
- \(|\psi(x)\rangle, |\psi(y)\rangle\): the two inputs encoded as quantum state vectors. The bra-ket symbols are physics shorthand for column vectors and their conjugate transposes.
- \(\langle\psi(x)|\psi(y)\rangle\): the complex inner product. Close to one when the two encoded states overlap; close to zero when they are orthogonal.
- \(|\cdot|^2\): squared magnitude. Maps the complex inner product back to a real, non-negative similarity.
- \(\mathrm{Tr}(\rho_x \rho_y)\): the Hilbert-Schmidt inner product on density matrices. Schuld (2021) showed this equals the squared overlap above, so the kernel can be computed from the density matrices alone, without ever evaluating the encoding circuit.

If the quantum kernel and the explicit density-matrix features were equivalent on this data, an SVM with a precomputed Hilbert-Schmidt kernel \(K(s, t) = \frac{1}{B}\sum_b \mathrm{Tr}(\rho_b^s \rho_b^t)\) should land in the same neighbourhood as a parameterised quantum-circuit kernel on quantum-inspired summary features (QEPP, von Neumann entropy, and related terms from the pipeline’s Stage 5). On this cohort, under matched cross-validation, the two kernels do not converge. The Hilbert-Schmidt SVM reaches mean balanced accuracy 0.540, AUC 0.395. The parameterised quantum kernel on the Stage 5 features reaches 0.495 and 0.525. The paired Wilcoxon difference between the two kernels is +0.045 with \(p = 0.44\), not distinguishable from zero.
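The band-averaged Hilbert-Schmidt kernel plugs straight into an SVM as a precomputed Gram matrix. A minimal sketch on simulated per-band density matrices (shapes as in the text: 28 subjects, 4 bands, 15 channels; the helper names are mine):

```python
import numpy as np
from sklearn.svm import SVC

def hs_kernel(rhos_a, rhos_b):
    """Band-averaged Hilbert-Schmidt Gram matrix: K[s, t] = mean_b Tr(rho_b^s rho_b^t).

    rhos_* : (n_subjects, n_bands, d, d) complex arrays of per-band density matrices.
    """
    # Tr(A B) = sum_ij A[i, j] B[j, i]; einsum also sums over the band axis b
    return np.einsum("sbij,tbji->st", rhos_a, rhos_b).real / rhos_a.shape[1]

# Simulated stand-ins: rank-one (pure) density matrices per subject and band
rng = np.random.default_rng(3)
def rand_rho(d=15):
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

rhos = np.array([[rand_rho() for _ in range(4)] for _ in range(28)])
y = np.repeat([0, 1], 14)

K = hs_kernel(rhos, rhos)
svm = SVC(kernel="precomputed")
# Inside cross-validation, slice the Gram matrix by fold:
# fit on K[train][:, train], score on K[test][:, train]
svm.fit(K, y)
```

The fold-slicing comment is the one subtlety: with a precomputed kernel, the "features" of a test subject are its similarities to the training subjects only.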

Figure 5. Mean ROC AUC across five classifiers under the same matched 10-fold × 10 cross-validation, \(N = 28\). Whiskers on the top three bars are 95% bootstrap CIs over 10 000 resamples of the 100 per-fold AUCs; the two kernel proxies are mean only because per-fold AUCs were not retained by the original Stage 5 / Hilbert-Schmidt evaluators and the rerun is queued for the next pipeline build. The two kernel routes to the density matrix (red) sit on either side of the random-classifier line at 0.50, while the direct-feature classifiers (purple) cluster well above the classical baseline (teal); the DM-SVM and classical-RF intervals do not overlap. The dashed vertical rule marks chance performance.

The two kernels land on either side of the chance line for different reasons, and the right reading needs both. The parameterised quantum-circuit kernel runs on the Stage 5 quantum-inspired summary features, which are first compressed through PCA to six dimensions and rescaled to \([0, \pi]\) before encoding; none of those steps preserves the off-diagonal complex structure that the explicit density matrix exposes, so the QSVM is operating on a lossy projection. The Hilbert-Schmidt kernel, by contrast, reads \(\rho\) directly without PCA. At \(N = 28\) with the binarised executive-function outcome, the 95% bootstrap CI on AUC = 0.395 overlaps the 0.5 chance line, so whether that score reflects a structured anti-alignment between the cohort-mean fidelity geometry and the median split, or simply sub-sample variance at this \(N\), is exactly what the ds004284 replication is designed to test. What the explicit-feature result shows is more limited and more defensible: on this cohort and at this scale, the entries of \(\rho\) carry information that survives explicit feature extraction in a way the similarity-matrix view does not.

Subject geometry under each kernel

Figure 6. Two-dimensional metric-MDS embedding of the 28 subjects under each kernel, coloured by the median split on Global Executive Function. The linear-kernel embedding on the flat density-matrix features (after standardisation), shown first, separates the two groups more cleanly; clicking through to the Hilbert-Schmidt embedding shows substantial overlap between groups under the fidelity geometry. The geometry is the visualisation of the AUC contrast in Figure 5.
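An embedding like Figure 6 can be produced by converting a kernel into the distance it induces, \(d(s,t)^2 = K_{ss} + K_{tt} - 2K_{st}\), and running metric MDS on the result. A minimal sketch, with a simulated feature matrix standing in for the standardised density-matrix features:

```python
import numpy as np
from sklearn.manifold import MDS

def kernel_to_mds(K, random_state=0):
    """2-D metric MDS on the distance induced by a kernel Gram matrix."""
    # d(s, t)^2 = K_ss + K_tt - 2 K_st; clip guards tiny negative round-off
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K
    D = np.sqrt(np.clip(d2, 0.0, None))
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=random_state)
    return mds.fit_transform(D)

# Simulated stand-in for the 28 subjects; a linear kernel is just the Gram
# matrix of the (standardised) feature vectors. The Hilbert-Schmidt kernel
# drops into the same function unchanged.
rng = np.random.default_rng(4)
X = rng.standard_normal((28, 10))
emb = kernel_to_mds(X @ X.T)
```

The same function applied to the two Gram matrices gives the two panels: the embedding only ever sees the kernel, so differences between the panels are differences in kernel geometry, not in the plotting.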

Figure 7 (interactive). Per-fold balanced accuracy under the same matched 10-fold × 10-repeats split used for Figure 5, with each fold’s score averaged across the ten repeats. Drag the slider or hover any bar to see the top features the classifier relied on at that fold. The classical panel surfaces the same top three every fold: relative beta power at the posterior channels (Pz, P3, O1). The structured-feature panel surfaces a mixture of \(\rho_{ii}\) diagonals and off-diagonal real and imaginary parts that varies across folds; that fold-to-fold instability is the small-sample noise behind the per-fold variance visible in the bar heights.

What the data shows, end to end: a linear classifier on the 900 entries of \(\rho\) outperforms the best classical QEEG classifier on this cohort by a substantial AUC gap (0.780 versus 0.615). Two parallel kernel routes to the same density matrix, one through the Hilbert-Schmidt inner product and one through a parameterised circuit, both underperform the direct features. The signal lives in the entries of \(\rho\). The kernels are one specific way of accessing it that, at this scale and on these features, throws information away.

Notation

\(\rho_b\): density matrix at frequency band \(b \in \{\delta, \theta, \alpha, \beta\}\); Hermitian, positive semi-definite, trace one.
\(\psi(t)\): complex multichannel analytic signal at time \(t\), a vector in \(\mathbb{C}^{15}\).
\(\psi(t)^{\dagger}\): conjugate transpose of \(\psi(t)\).
\(K_Q(x, y)\): quantum-fidelity kernel, the squared overlap of the embedded states.
\(\mathrm{Tr}(\rho_x \rho_y)\): Hilbert-Schmidt inner product of two density matrices.
\(p\): paired Wilcoxon two-sided p-value comparing matched per-fold scores.

→ Continue to Section 4: What this opens up

References

Schuld, M. (2021). Supervised quantum machine learning models are kernel methods. arXiv:2101.11020. https://arxiv.org/abs/2101.11020