Cognitive Representation Learner
- CogRL is a neural framework that automatically discovers cognitive models by extracting interpretable Knowledge Components from problem content.
- It employs tailored architectures—CNNs for images and bi-LSTMs for text—to process diverse input modalities and construct binary Q-matrices via thresholded bottleneck activations.
- CogRL improves adaptive tutoring by yielding lower AFM prediction errors and near-perfect correlations in skill learning rate estimates compared to human-annotated models.
The Cognitive Representation Learner (CogRL) is a neural framework for automatic cognitive model discovery in domains where student performance data is unavailable and substantial human knowledge engineering is infeasible. CogRL is designed to extract interpretable skill structure directly from problem content, producing representations that map to Knowledge Components (KCs) and enable estimation of skill difficulty and learning rate parameters. This is achieved through a principled recipe: neural architecture selection based on input modality, training for answer prediction, and systematic extraction and thresholding of intermediate representations.
1. Framework Architecture and Modalities
CogRL is implemented as a pipeline tailored to the input modalities present in the tutoring domain, specifically images (e.g., Rumble Blocks, Chinese Character recognition) and variable-length text (Article Selection). Two architectures operationalize the approach:
- Convolutional Neural Network (CNN; used for images):
- Input: RGB image
- Layers: One convolutional layer (10 filters; kernel size and stride are domain-specific), a learned per-channel nonlinearity, flattening, a fully-connected pre-output (bottleneck) layer, and an output layer (sigmoid or softmax, depending on the domain).
- Rumble Blocks configuration: 75×100 input images, 10 convolutional filters.
- Chinese Character configuration: 16×16 input images, 10 convolutional filters.
- Bi-directional LSTM (used for variable-length sequences):
- Input: Sentence with a blank, split into left/right character sequences.
- Embedding: Each character is mapped to a 32-dimensional vector.
- LSTM: Uses standard gates: input ($i_t$), forget ($f_t$), cell ($\tilde{c}_t$, $c_t$), and output ($o_t$), plus hidden state ($h_t$).
- Processing: Forward LSTM over left segment, backward LSTM over right segment; final hidden states concatenated, fully connected to 256 units, followed by 50-unit pre-output layer, then 3-way softmax for choices (“a,” “an,” “the”).
Both architectures employ a linear activation in the pre-output (bottleneck) layer to ensure direct interpretability.
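The bi-LSTM pipeline above can be sketched in plain numpy. This is an illustrative forward pass only (random, untrained weights); the tanh on the 256-unit fully-connected layer is an assumption, while the embedding size (32), hidden size (256), linear 50-unit bottleneck, and 3-way softmax follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_init(in_dim, hid):
    # one stacked weight matrix for the four gates: input, forget, cell, output
    return {"W": rng.normal(0, 0.1, (4 * hid, in_dim + hid)),
            "b": np.zeros(4 * hid)}

def lstm_run(params, xs, hid):
    """Run an LSTM over a sequence of vectors; return the final hidden state."""
    h, c = np.zeros(hid), np.zeros(hid)
    for x in xs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell update
        h = sigmoid(o) * np.tanh(c)                    # hidden state
    return h

# dimensions taken from the text; vocabulary size is a hypothetical example
EMB, HID, BOTTLENECK, CHOICES, VOCAB = 32, 256, 50, 3, 27
emb = rng.normal(0, 0.1, (VOCAB, EMB))
fwd, bwd = lstm_init(EMB, HID), lstm_init(EMB, HID)
W_fc = rng.normal(0, 0.1, (256, 2 * HID))
W_bn = rng.normal(0, 0.1, (BOTTLENECK, 256))
W_out = rng.normal(0, 0.1, (CHOICES, BOTTLENECK))

def forward(left_ids, right_ids):
    h_l = lstm_run(fwd, emb[left_ids], HID)            # forward pass, left segment
    h_r = lstm_run(bwd, emb[right_ids][::-1], HID)     # backward pass, right segment
    fc = np.tanh(W_fc @ np.concatenate([h_l, h_r]))    # assumed nonlinearity
    a = W_bn @ fc                                      # linear bottleneck: candidate KCs
    logits = W_out @ a                                 # 3-way choice: "a", "an", "the"
    p = np.exp(logits - logits.max())
    return a, p / p.sum()

a, p = forward([0, 1, 2], [3, 4])
```

The linear bottleneck is the key design choice: its activations `a` are read off directly as candidate Knowledge Components rather than passed through a squashing nonlinearity.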
2. Training Protocol and Objective Functions
CogRL networks are trained as supervised classifiers mapping problem content to correct answers. The training objective is cross-entropy loss with optional L2 regularization:

$$\mathcal{L} = -\sum_{j} y_j^\top \log \hat{y}_j + \lambda \lVert W \rVert_2^2$$

where $x_j$ is problem $j$'s raw content, $y_j$ is its correct-answer one-hot vector, $\hat{y}_j = f(x_j; W)$ is the predicted output distribution, and $\lambda$ is the weight decay (typically set to zero or a small value). Training utilizes stochastic gradient descent (SGD) or Adam with batch size 32, initial learning rate of $0.01$, and early stopping on held-out splits or a fixed epoch count (e.g., 20).
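A minimal numpy sketch of this objective, with the model's weight arrays passed in explicitly for the L2 term:

```python
import numpy as np

def cross_entropy_l2(y_true, y_pred, weights, lam=0.0, eps=1e-12):
    """Mean cross-entropy over problems plus L2 weight decay.

    y_true: (m, k) one-hot correct answers; y_pred: (m, k) predicted
    probabilities; weights: list of parameter arrays; lam: weight decay.
    """
    ce = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2

y_true = np.array([[1.0, 0.0, 0.0]])
y_pred = np.array([[0.5, 0.25, 0.25]])
loss = cross_entropy_l2(y_true, y_pred, [np.ones(2)], lam=0.0)
# with lam = 0 this reduces to -log(0.5) ≈ 0.6931
```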
3. Representation Extraction and Q-Matrix Construction
Post-training, each problem $j$ is forwarded through the network, and its $n$-dimensional (here, $n = 50$) bottleneck activations $a_j \in \mathbb{R}^n$ are interpreted as candidate KCs. The binary Q-matrix $Q \in \{0, 1\}^{m \times n}$ (where $m$ is the number of problems) is constructed via thresholding:

$$Q_{jk} = \mathbb{1}\left[a_{jk} > \tau\right]$$

with threshold $\tau = 0.95$. This signifies that if activation $a_{jk}$ exceeds $0.95$, problem $j$ is considered to require KC $k$.
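The thresholding step is a one-liner in numpy; a small sketch with a toy activation matrix:

```python
import numpy as np

def build_q_matrix(activations, tau=0.95):
    """Threshold bottleneck activations into a binary Q-matrix.

    activations: (m, n) array, one row of n candidate-KC activations per
    problem; Q[j, k] = 1 iff problem j's activation for KC k exceeds tau.
    """
    return (activations > tau).astype(int)

acts = np.array([[0.99, 0.10, 0.96],
                 [0.50, 0.97, 0.20]])
Q = build_q_matrix(acts, tau=0.95)
# Q == [[1, 0, 1], [0, 1, 0]]
```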
4. Skill-Parameter Estimation with AFM
For domains with available student logs, the discovered Q-matrix is utilized to fit an Additive Factors Model (AFM):

$$\ln \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} Q_{jk} \beta_k + \sum_{k} Q_{jk} \gamma_k T_{ik}$$

where $p_{ij}$ is the probability of correctness of student $i$ on problem $j$, $\theta_i$ denotes student ability, $\beta_k$ KC difficulty, $\gamma_k$ KC learning rate, and $T_{ik}$ is the number of prior opportunities for student $i$ with KC $k$ before $j$. Parameters are fit by maximum likelihood estimation (e.g., using glmnet). To assay agreement between CogRL-enabled cognitive models and human-annotated baselines, Pearson correlation is computed on the fitted $\beta_k$ and $\gamma_k$.
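The AFM prediction for a single student-problem pair can be sketched directly from the logit above (toy parameter values below are illustrative, not fitted):

```python
import numpy as np

def afm_prob(theta_i, beta, gamma, q_j, T_i):
    """AFM probability that student i answers problem j correctly.

    theta_i: scalar student ability; beta, gamma: (n,) KC difficulty and
    learning-rate vectors; q_j: (n,) binary Q-matrix row for problem j;
    T_i: (n,) prior opportunity counts for this student on each KC.
    """
    logit = theta_i + q_j @ beta + q_j @ (gamma * T_i)
    return 1.0 / (1.0 + np.exp(-logit))

q_j = np.array([1, 0, 1])              # problem j requires KCs 0 and 2
beta = np.array([-0.5, 0.3, 0.2])      # KC difficulties
gamma = np.array([0.1, 0.0, 0.05])     # KC learning rates
T_i = np.array([2, 0, 4])              # prior practice opportunities
p = afm_prob(0.4, beta, gamma, q_j, T_i)
# logit = 0.4 + (-0.5 + 0.2) + (0.2 + 0.2) = 0.5, so p = sigmoid(0.5)
```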
5. Quantitative Results and Comparative Performance
CogRL yields lower RMSE for AFM predictions compared to faculty transfer, identical transfer, and best human model baselines across multiple datasets. Table 1 summarizes item-stratified RMSE:
| Dataset | Faculty | Identical | Human | CogRL |
|---|---|---|---|---|
| Chinese Character | 0.471 | 0.493 | 0.465 | 0.444 |
| Rumble Blocks | 0.451 | 0.537 | 0.451 | 0.449 |
| Article Selection | 0.415 | 0.522 | 0.411 | 0.399 |
For skill-parameter estimation (Article Selection), Apprentice learner simulations using CogRL representations produced AFM parameter estimates closely matched with actual student fits:
| Method | ||
|---|---|---|
| Human-Authored Features | 0.742 | –0.187 |
| CogRL Representations | 0.748 | 0.986 |
A high correlation demonstrates that CogRL effectively captures skill learning rates without access to student response data.
6. Implementation Considerations
- Preprocessing:
- Images rescaled (75×100 Rumble, 16×16 Chinese), pixel values normalized to [0,1].
- Article Selection text lowercased, punctuation removed (excluding word separators), split at blank, characters mapped to indices (alphabet plus blank).
- Network hyperparameters: Pre-output (bottleneck) dimension $n = 50$, character embeddings of size 32, LSTM hidden layer size 256, convolutions with 10 filters, a learned per-channel nonlinearity in the convolutional layer, standard LSTM gating ($i_t$, $f_t$, $o_t$, $c_t$, $h_t$).
- Training: Batch size 32, initial learning rate $0.01$ with decay, optional L2 regularization (weight decay $\lambda$).
- Q-matrix threshold: $\tau = 0.95$; tunable in $[0.8, 0.99]$.
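The Article Selection preprocessing steps (lowercasing, punctuation removal, splitting at the blank, character-to-index mapping) can be sketched as follows. The blank marker `"___"` and the index scheme (a=0..z=25, space=26) are assumptions for illustration; the original encoding details are not specified here:

```python
import string

def preprocess_sentence(sentence, blank="___"):
    """Split an Article Selection item into left/right character-index
    sequences around the blank, after lowercasing and stripping punctuation.
    """
    left, _, right = sentence.partition(blank)

    def to_ids(text):
        text = text.lower()
        # drop punctuation but keep spaces as word separators
        text = "".join(ch for ch in text if ch not in string.punctuation)
        return [26 if ch == " " else ord(ch) - ord("a")
                for ch in text if ch == " " or ch.islower()]

    return to_ids(left), to_ids(right)

left, right = preprocess_sentence("She saw ___ eagle today.")
# left encodes "she saw ", right encodes " eagle today"
```

The left sequence feeds the forward LSTM and the right sequence the backward LSTM, as described in Section 1.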
7. Applications and Practical Significance
CogRL is applicable to bootstrapping cognitive models in ill-structured, perceptually rich domains where hand-authoring KCs is challenging or impossible. By bypassing the need for student performance data at the discovery stage, CogRL enables generation of effective Q-matrices and AFM estimates for new tutoring systems. The approach is empirically validated to produce skill parameters in near-perfect agreement ($r = 0.986$ for learning rates) with those derived from human-labeled models and real student outcomes. This suggests utility in rapid prototyping, feature engineering, and adaptive tutor initialization for domains lacking prior cognitive modeling infrastructure (Chaplot et al., 2018).