
Cognitive Representation Learner

Updated 29 January 2026
  • CogRL is a neural framework that automatically discovers cognitive models by extracting interpretable Knowledge Components from problem content.
  • It employs tailored architectures—CNNs for images and bi-LSTMs for text—to process diverse input modalities and construct binary Q-matrices via thresholded bottleneck activations.
  • CogRL improves adaptive tutoring by yielding lower AFM prediction errors and near-perfect correlations in skill learning rate estimates compared to human-annotated models.

The Cognitive Representation Learner (CogRL) is a neural framework for automatic cognitive model discovery in domains where student performance data is unavailable and substantial human knowledge engineering is infeasible. CogRL is designed to extract interpretable skill structure directly from problem content, producing representations that map to Knowledge Components (KCs) and enable estimation of skill difficulty and learning rate parameters. This is achieved through a principled recipe: neural architecture selection based on input modality, training for answer prediction, and systematic extraction and thresholding of intermediate representations.

1. Framework Architecture and Modalities

CogRL is implemented as a pipeline tailored to the input modalities present in the tutoring domain, specifically images (e.g., Rumble Blocks, Chinese Character recognition) and variable-length text (Article Selection). Two architectures operationalize the approach:

  • Convolutional Neural Network (CNN; used for images):
    • Input: RGB image $x \in \mathbb{R}^{H \times W \times 3}$
    • Layers: one convolutional layer ($J$ filters $k_{ij} \in \mathbb{R}^{r \times r}$, stride $s$), per-channel learned nonlinearity ($y_j = g_j \cdot \tanh(\sum_i k_{ij} * x_i)$), flattening, a fully connected layer of size $D = 50$ (the pre-output bottleneck layer), and an output layer (sigmoid or softmax, depending on domain).
    • Rumble Blocks configuration: $H \times W = 75 \times 100$, $r = 10$, $s = 5$, $J = 10$ filters.
    • Chinese Character configuration: $H \times W = 16 \times 16$, $r = 4$, $s = 2$, $J = 10$ filters.
  • Bi-directional LSTM (used for variable-length sequences):
    • Input: a sentence with a blank, split into left and right character sequences.
    • Embedding: each character is mapped to a 32-dimensional vector.
    • LSTM: standard gating with input ($i_t$), forget ($f_t$), cell ($g_t$, $c_t$), and output ($o_t$) gates, plus the hidden state ($h_t$).
    • Processing: a forward LSTM reads the left segment and a backward LSTM reads the right segment; their final hidden states are concatenated, passed through a 256-unit fully connected layer and a 50-unit pre-output layer, then a 3-way softmax over the choices (“a,” “an,” “the”).

Both architectures employ a linear activation in the pre-output (bottleneck) layer to ensure direct interpretability.
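As a concrete illustration, the CNN variant's forward pass up to the bottleneck can be sketched in plain NumPy, using the Rumble Blocks shapes given above. The weight initialization, function names, and random inputs are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 75, 100, 3      # Rumble Blocks input size (RGB)
J, r, s = 10, 10, 5       # number of filters, kernel size, stride
D = 50                    # pre-output (bottleneck) dimension

def conv_forward(x, kernels, gains):
    """One valid-mode strided convolution with per-filter learned gain:
    y_j = g_j * tanh(sum_i k_ij * x_i), as described in the text."""
    Ho = (H - r) // s + 1
    Wo = (W - r) // s + 1
    y = np.zeros((Ho, Wo, J))
    for j in range(J):
        for a in range(Ho):
            for b in range(Wo):
                patch = x[a*s:a*s+r, b*s:b*s+r, :]   # r x r x C window
                y[a, b, j] = gains[j] * np.tanh(np.sum(kernels[j] * patch))
    return y

x = rng.random((H, W, C))                            # synthetic image
kernels = rng.normal(0, 0.01, size=(J, r, r, C))
gains = np.ones(J)

feat = conv_forward(x, kernels, gains).ravel()       # flatten conv output
W_fc = rng.normal(0, 0.01, size=(feat.size, D))
bottleneck = feat @ W_fc                             # linear pre-output layer
```

With these shapes the convolution yields a 14×19×10 feature map (2660 units after flattening), which the linear bottleneck projects to 50 candidate-KC activations.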

2. Training Protocol and Objective Functions

CogRL networks are trained as supervised classifiers for mapping problem content to correct answers. The training objective is cross-entropy loss with optional L2 regularization:

L(Θ)=pcycplogy^cp+λΘ22L(\Theta) = -\sum_{p} \sum_{c} y^p_c \log \hat{y}^p_c + \lambda \|\Theta\|_2^2

where $x^p$ is problem $p$'s raw content, $y^p$ is its correct-answer one-hot vector, $\hat{y}^p = f(x^p; \Theta)$ is the predicted output, and $\lambda$ is the weight decay (typically zero or $10^{-4}$). Training uses stochastic gradient descent (SGD) or Adam with batch size 32, an initial learning rate of $0.01$, and either early stopping on held-out splits or a fixed epoch count (e.g., 20).
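The objective above can be written out directly. A minimal NumPy sketch, where the array names and the numerical-stability epsilon are assumptions:

```python
import numpy as np

def cogrl_loss(y_true, y_pred, params, lam=1e-4):
    """Cross-entropy over problems plus optional L2 weight decay.
    y_true: one-hot targets, shape (P, C); y_pred: predicted probabilities,
    shape (P, C); params: list of weight arrays; lam: weight decay lambda."""
    ce = -np.sum(y_true * np.log(y_pred + 1e-12))    # cross-entropy term
    l2 = lam * sum(np.sum(w ** 2) for w in params)   # L2 regularizer
    return ce + l2

# Two problems, three answer choices (e.g., "a"/"an"/"the"):
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1]])
loss = cogrl_loss(y_true, y_pred, params=[np.ones((2, 3))], lam=1e-4)
```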

3. Representation Extraction and Q-Matrix Construction

Post-training, each problem $p$ is forwarded through the network, and its $D$-dimensional (here, $D = 50$) bottleneck activations $r^p = (r^p_1, \ldots, r^p_D)$ are interpreted as candidate KCs. The binary Q-matrix $Q \in \{0,1\}^{P \times D}$ (where $P$ is the number of problems) is constructed via thresholding:

$$Q_{p,j} = \begin{cases} 1 & \text{if } r^p_j \geq \tau \\ 0 & \text{otherwise} \end{cases}$$

with threshold $\tau = 0.95$: if activation $r^p_j$ is at least $0.95$, problem $p$ is considered to require KC $j$.
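Q-matrix construction is thus a single thresholding step. A sketch with synthetic activations (shrunk to $D = 3$ for brevity):

```python
import numpy as np

tau = 0.95
# Synthetic bottleneck activations for P=2 problems, D=3 candidate KCs;
# in practice each row is a problem's 50-dimensional representation.
activations = np.array([[0.99, 0.10, 0.97],
                        [0.20, 0.96, 0.30]])
Q = (activations >= tau).astype(int)   # binary Q-matrix, shape (P, D)
```

Here problem 0 is assigned KCs 0 and 2, and problem 1 is assigned KC 1.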

4. Skill-Parameter Estimation with AFM

For domains with available student logs, the discovered Q-matrix is utilized to fit an Additive Factors Model (AFM):

$$\operatorname{logit} \Pr(y_{i,p} = 1) = \theta_i + \sum_{j=1}^{D} Q_{p,j} \left[ \beta_j + \gamma_j N_{i,j}(p) \right]$$

where $y_{i,p}$ is the correctness of student $i$ on problem $p$, $\theta_i$ denotes student ability, $\beta_j$ the difficulty of KC $j$, $\gamma_j$ its learning rate, and $N_{i,j}(p)$ is the number of prior opportunities student $i$ has had with KC $j$ before $p$. Parameters are fit by maximum likelihood estimation (e.g., using glmnet). To assess agreement between CogRL-derived cognitive models and human-annotated baselines, Pearson correlation is computed on $\{\beta_j, \gamma_j\}$.
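The AFM success probability for one (student, problem) pair follows directly from the logit above; a sketch with hypothetical parameter values:

```python
import math

def afm_prob(theta_i, q_row, beta, gamma, opportunities):
    """AFM probability of a correct response.
    q_row[j] = 1 if the problem requires KC j; opportunities[j] = N_{i,j}(p)."""
    logit = theta_i + sum(q * (b + g * n)
                          for q, b, g, n in zip(q_row, beta, gamma, opportunities))
    return 1.0 / (1.0 + math.exp(-logit))   # inverse-logit (sigmoid)

# Hypothetical student/KC parameters for a problem needing KCs 0 and 2:
p = afm_prob(theta_i=0.5,
             q_row=[1, 0, 1],
             beta=[-0.2, 0.4, 0.1],
             gamma=[0.05, 0.02, 0.03],
             opportunities=[3, 0, 5])
```

The logit here is $0.5 + (-0.2 + 0.05 \cdot 3) + (0.1 + 0.03 \cdot 5) = 0.7$, giving a success probability of about 0.67.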

5. Quantitative Results and Comparative Performance

CogRL yields lower RMSE for AFM predictions compared to faculty transfer, identical transfer, and best human model baselines across multiple datasets. Table 1 summarizes item-stratified RMSE:

| Dataset | Faculty Transfer | Identical Transfer | Best Human | CogRL |
|---|---|---|---|---|
| Chinese Character | 0.471 | 0.493 | 0.465 | 0.444 |
| Rumble Blocks | 0.451 | 0.537 | 0.451 | 0.449 |
| Article Selection | 0.415 | 0.522 | 0.411 | 0.399 |

For skill-parameter estimation (Article Selection), Apprentice learner simulations using CogRL representations produced AFM parameter estimates closely matched with actual student fits:

| Method | $\rho_{\text{intercept}}$ | $\rho_{\text{slope}}$ |
|---|---|---|
| Human-Authored Features | 0.742 | –0.187 |
| CogRL Representations | 0.748 | 0.986 |

The high slope correlation ($\rho_{\text{slope}} = 0.986$) demonstrates that CogRL effectively captures skill learning rates without access to student response data.
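The agreement metric itself is ordinary Pearson correlation between two sets of fitted AFM slopes. A sketch with illustrative values (not the paper's data):

```python
import numpy as np

# Hypothetical learning-rate (slope) estimates for four KCs:
sim_slopes  = np.array([0.12, 0.05, 0.30, 0.22])   # fits from simulated learners
true_slopes = np.array([0.11, 0.06, 0.28, 0.24])   # fits from real student data

# Pearson correlation = off-diagonal entry of the 2x2 correlation matrix.
rho = np.corrcoef(sim_slopes, true_slopes)[0, 1]
```

A value of $\rho$ near 1 indicates that the simulated fits rank and scale the KC learning rates much like the real-student fits do.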

6. Implementation Considerations

  • Preprocessing:
    • Images rescaled (75×100 Rumble, 16×16 Chinese), pixel values normalized to [0,1].
    • Article Selection text lowercased, punctuation removed (excluding word separators), split at blank, characters mapped to indices (alphabet plus blank).
  • Network hyperparameters: pre-output dimension $D = 50$, character embeddings of size 32, LSTM hidden layer size 256, convolutions with 10 filters, $\tanh$ activation in the convolutional layer, standard LSTM gating ($\sigma$, $\tanh$).
  • Training: batch size 32, initial learning rate of $10^{-2}$ with decay, optional regularization with $\lambda \approx 10^{-4}$.
  • Q-matrix threshold: $\tau = 0.95$; tunable in $[0.8, 0.99]$.
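The Article Selection text preprocessing described above can be sketched as follows; the blank marker, vocabulary, and index scheme are assumptions for illustration:

```python
import string

# Character vocabulary: lowercase letters plus space; index 0 reserved for padding.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + " ")}

def preprocess(sentence, blank="___"):
    """Lowercase, drop punctuation (keeping word separators), split at the
    blank, and map each character to an integer index."""
    left, right = sentence.split(blank)

    def to_ids(text):
        text = text.lower()
        text = "".join(c for c in text if c in VOCAB)   # drops punctuation
        return [VOCAB[c] for c in text]

    # Left ids feed the forward LSTM; right ids feed the backward LSTM.
    return to_ids(left), to_ids(right)

left_ids, right_ids = preprocess("She saw ___ elephant at the zoo.")
```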

7. Applications and Practical Significance

CogRL is applicable to bootstrapping cognitive models in ill-structured, perceptually rich domains where hand-authoring KCs is challenging or impossible. By bypassing the need for student performance data at the discovery stage, CogRL enables generation of effective Q-matrices and AFM estimates for new tutoring systems. The approach is empirically validated to produce skill parameters in near-perfect agreement ($\rho \approx 0.99$) with those derived from human-labeled models and real student outcomes. This suggests utility in rapid prototyping, feature engineering, and adaptive tutor initialization for domains lacking prior cognitive modeling infrastructure (Chaplot et al., 2018).

