Cognitive Representation Learner
- CogRL is a neural framework that automatically discovers cognitive models by extracting interpretable Knowledge Components from problem content.
- It employs tailored architectures—CNNs for images and bi-LSTMs for text—to process diverse input modalities and construct binary Q-matrices via thresholded bottleneck activations.
- CogRL improves adaptive tutoring by yielding lower AFM prediction errors and near-perfect correlations in skill learning rate estimates compared to human-annotated models.
The Cognitive Representation Learner (CogRL) is a neural framework for automatic cognitive model discovery in domains where student performance data is unavailable and substantial human knowledge engineering is infeasible. CogRL is designed to extract interpretable skill structure directly from problem content, producing representations that map to Knowledge Components (KCs) and enable estimation of skill difficulty and learning rate parameters. This is achieved through a principled recipe: neural architecture selection based on input modality, training for answer prediction, and systematic extraction and thresholding of intermediate representations.
1. Framework Architecture and Modalities
CogRL is implemented as a pipeline tailored to the input modalities present in the tutoring domain, specifically images (e.g., Rumble Blocks, Chinese Character recognition) and variable-length text (Article Selection). Two architectures operationalize the approach:
- Convolutional Neural Network (CNN; used for images):
- Input: RGB image
- Layers: One convolutional layer (10 filters; kernel size and stride are domain-specific), a learned per-channel nonlinearity, flattening, a fully-connected pre-output (bottleneck) layer, and an output layer (sigmoid or softmax, depending on the domain).
- Rumble Blocks configuration: 75×100 input images, 10 convolutional filters.
- Chinese Character configuration: 16×16 input images, 10 convolutional filters.
- Bi-directional LSTM (used for variable-length sequences):
- Input: Sentence with a blank, split into left/right character sequences.
- Embedding: Each character is mapped to a 32-dimensional vector.
- LSTM: Uses standard gates: input ($i_t$), forget ($f_t$), cell ($\tilde{c}_t$, $c_t$), and output ($o_t$), plus hidden state ($h_t$).
- Processing: Forward LSTM over left segment, backward LSTM over right segment; final hidden states concatenated, fully connected to 256 units, followed by 50-unit pre-output layer, then 3-way softmax for choices (“a,” “an,” “the”).
Both architectures employ a linear activation in the pre-output (bottleneck) layer to ensure direct interpretability.
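The bi-LSTM pipeline above can be sketched in plain numpy. This is an illustrative forward pass only (random, untrained weights); the tanh on the 256-unit fully-connected layer is an assumption, while the embedding size (32), hidden size (256), linear 50-unit bottleneck, and 3-way softmax follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_init(in_dim, hid):
    # one stacked weight matrix for the four gates: input, forget, cell, output
    return {"W": rng.normal(0, 0.1, (4 * hid, in_dim + hid)),
            "b": np.zeros(4 * hid)}

def lstm_run(params, xs, hid):
    """Run an LSTM over a sequence of vectors; return the final hidden state."""
    h, c = np.zeros(hid), np.zeros(hid)
    for x in xs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
    return h

# dimensions taken from the text; vocabulary size is a hypothetical example
EMB, HID, BOTTLENECK, CHOICES, VOCAB = 32, 256, 50, 3, 27
emb = rng.normal(0, 0.1, (VOCAB, EMB))
fwd, bwd = lstm_init(EMB, HID), lstm_init(EMB, HID)
W_fc = rng.normal(0, 0.1, (256, 2 * HID))
W_bn = rng.normal(0, 0.1, (BOTTLENECK, 256))
W_out = rng.normal(0, 0.1, (CHOICES, BOTTLENECK))

def forward(left_ids, right_ids):
    h_l = lstm_run(fwd, emb[left_ids], HID)            # forward pass, left segment
    h_r = lstm_run(bwd, emb[right_ids][::-1], HID)     # backward pass, right segment
    fc = np.tanh(W_fc @ np.concatenate([h_l, h_r]))    # assumed nonlinearity
    a = W_bn @ fc                                      # linear bottleneck: candidate KCs
    logits = W_out @ a                                 # 3-way choice: "a", "an", "the"
    p = np.exp(logits - logits.max())
    return a, p / p.sum()

a, p = forward([0, 1, 2], [3, 4])
```

The linear bottleneck is the key design choice: its activations `a` are read off directly as candidate Knowledge Components rather than passed through a squashing nonlinearity.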
2. Training Protocol and Objective Functions
CogRL networks are trained as supervised classifiers mapping problem content to correct answers. The training objective is cross-entropy loss with optional L2 regularization:

$$\mathcal{L} = -\sum_{j} y_j^\top \log \hat{y}_j + \lambda \lVert W \rVert_2^2$$

where $x_j$ is problem $j$'s raw content, $y_j$ is its correct-answer one-hot vector, $\hat{y}_j = f(x_j; W)$ is the predicted output distribution, and $\lambda$ is the weight decay (typically set to zero or a small value). Training utilizes stochastic gradient descent (SGD) or Adam with batch size 32, initial learning rate of $0.01$, and early stopping on held-out splits or a fixed epoch count (e.g., 20).
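A minimal numpy sketch of this objective, with the model's weight arrays passed in explicitly for the L2 term:

```python
import numpy as np

def cross_entropy_l2(y_true, y_pred, weights, lam=0.0, eps=1e-12):
    """Mean cross-entropy over problems plus L2 weight decay.

    y_true: (m, k) one-hot correct answers; y_pred: (m, k) predicted
    probabilities; weights: list of parameter arrays; lam: weight decay.
    """
    ce = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2

y_true = np.array([[1.0, 0.0, 0.0]])
y_pred = np.array([[0.5, 0.25, 0.25]])
loss = cross_entropy_l2(y_true, y_pred, [np.ones(2)], lam=0.0)
# with lam = 0 this reduces to -log(0.5) ≈ 0.6931
```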
3. Representation Extraction and Q-Matrix Construction
Post-training, each problem $j$ is forwarded through the network, and its $n$-dimensional (here, $n = 50$) bottleneck activations $a_j \in \mathbb{R}^n$ are interpreted as candidate KCs. The binary Q-matrix $Q \in \{0, 1\}^{m \times n}$ (where $m$ is the number of problems) is constructed via thresholding:

$$Q_{jk} = \mathbb{1}\left[a_{jk} > \tau\right]$$

with threshold $\tau = 0.95$. This signifies that if activation $a_{jk}$ exceeds $0.95$, problem $j$ is considered to require KC $k$.
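The thresholding step is a one-liner in numpy; a small sketch with a toy activation matrix:

```python
import numpy as np

def build_q_matrix(activations, tau=0.95):
    """Threshold bottleneck activations into a binary Q-matrix.

    activations: (m, n) array, one row of n candidate-KC activations per
    problem; Q[j, k] = 1 iff problem j's activation for KC k exceeds tau.
    """
    return (activations > tau).astype(int)

acts = np.array([[0.99, 0.10, 0.96],
                 [0.50, 0.97, 0.20]])
Q = build_q_matrix(acts, tau=0.95)
# Q == [[1, 0, 1], [0, 1, 0]]
```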
4. Skill-Parameter Estimation with AFM
For domains with available student logs, the discovered Q-matrix is utilized to fit an Additive Factors Model (AFM):

$$\ln \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} Q_{jk} \beta_k + \sum_{k} Q_{jk} \gamma_k T_{ik}$$

where $p_{ij}$ is the probability of correctness of student $i$ on problem $j$, $\theta_i$ denotes student ability, $\beta_k$ KC difficulty, $\gamma_k$ KC learning rate, and $T_{ik}$ is the number of prior opportunities for student $i$ with KC $k$ before $j$. Parameters are fit by maximum likelihood estimation (e.g., using glmnet). To assay agreement between CogRL-enabled cognitive models and human-annotated baselines, Pearson correlation is computed on the fitted $\beta_k$ and $\gamma_k$.
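The AFM prediction for a single student-problem pair can be sketched directly from the logit above (toy parameter values below are illustrative, not fitted):

```python
import numpy as np

def afm_prob(theta_i, beta, gamma, q_j, T_i):
    """AFM probability that student i answers problem j correctly.

    theta_i: scalar student ability; beta, gamma: (n,) KC difficulty and
    learning-rate vectors; q_j: (n,) binary Q-matrix row for problem j;
    T_i: (n,) prior opportunity counts for this student on each KC.
    """
    logit = theta_i + q_j @ beta + q_j @ (gamma * T_i)
    return 1.0 / (1.0 + np.exp(-logit))

q_j = np.array([1, 0, 1])              # problem j requires KCs 0 and 2
beta = np.array([-0.5, 0.3, 0.2])      # KC difficulties
gamma = np.array([0.1, 0.0, 0.05])     # KC learning rates
T_i = np.array([2, 0, 4])              # prior practice opportunities
p = afm_prob(0.4, beta, gamma, q_j, T_i)
# logit = 0.4 + (-0.5 + 0.2) + (0.2 + 0.2) = 0.5, so p = sigmoid(0.5)
```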
5. Quantitative Results and Comparative Performance
CogRL yields lower RMSE for AFM predictions compared to faculty transfer, identical transfer, and best human model baselines across multiple datasets. Table 1 summarizes item-stratified RMSE:
| Dataset | Faculty | Identical | Human | CogRL |
|---|---|---|---|---|
| Chinese Character | 0.471 | 0.493 | 0.465 | 0.444 |
| Rumble Blocks | 0.451 | 0.537 | 0.451 | 0.449 |
| Article Selection | 0.415 | 0.522 | 0.411 | 0.399 |
For skill-parameter estimation (Article Selection), Apprentice learner simulations using CogRL representations produced AFM parameter estimates closely matched with actual student fits:
| Method | ||
|---|---|---|
| Human-Authored Features | 0.742 | –0.187 |
| CogRL Representations | 0.748 | 0.986 |
A high correlation demonstrates that CogRL effectively captures skill learning rates without access to student response data.
6. Implementation Considerations
- Preprocessing:
- Images rescaled (75×100 Rumble, 16×16 Chinese), pixel values normalized to [0,1].
- Article Selection text lowercased, punctuation removed (excluding word separators), split at blank, characters mapped to indices (alphabet plus blank).
- Network hyperparameters: Pre-output (bottleneck) dimension $n = 50$, character embeddings of size 32, LSTM hidden layer size 256, convolutions with 10 filters, a learned per-channel nonlinearity in the convolutional layer, standard LSTM gating ($i_t$, $f_t$, $o_t$, $c_t$, $h_t$).
- Training: Batch size 32, initial learning rate $0.01$ with decay, optional L2 regularization (weight decay $\lambda$).
- Q-matrix threshold: $\tau = 0.95$; tunable in $[0.8, 0.99]$.
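The Article Selection preprocessing steps (lowercasing, punctuation removal, splitting at the blank, character-to-index mapping) can be sketched as follows. The blank marker `"___"` and the index scheme (a=0..z=25, space=26) are assumptions for illustration; the original encoding details are not specified here:

```python
import string

def preprocess_sentence(sentence, blank="___"):
    """Split an Article Selection item into left/right character-index
    sequences around the blank, after lowercasing and stripping punctuation.
    """
    left, _, right = sentence.partition(blank)

    def to_ids(text):
        text = text.lower()
        # drop punctuation but keep spaces as word separators
        text = "".join(ch for ch in text if ch not in string.punctuation)
        return [26 if ch == " " else ord(ch) - ord("a")
                for ch in text if ch == " " or ch.islower()]

    return to_ids(left), to_ids(right)

left, right = preprocess_sentence("She saw ___ eagle today.")
# left encodes "she saw ", right encodes " eagle today"
```

The left sequence feeds the forward LSTM and the right sequence the backward LSTM, as described in Section 1.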
7. Applications and Practical Significance
CogRL is applicable to bootstrapping cognitive models in ill-structured, perceptually rich domains where hand-authoring KCs is challenging or impossible. By bypassing the need for student performance data at the discovery stage, CogRL enables generation of effective Q-matrices and AFM estimates for new tutoring systems. The approach is empirically validated to produce skill parameters in near-perfect agreement ($r = 0.986$ for learning rates) with those derived from human-labeled models and real student outcomes. This suggests utility in rapid prototyping, feature engineering, and adaptive tutor initialization for domains lacking prior cognitive modeling infrastructure (Chaplot et al., 2018).