Architectural Fingerprinting Overview

Updated 18 January 2026
  • Architectural fingerprinting is the process of identifying unique system architectures by analyzing invariant external signals, spanning networks, hardware, and machine learning models.
  • It employs specialized methods such as traffic segmentation in SCADA, timing analysis in SDNs, and sub-unit variability detection in GPUs to derive robust fingerprints.
  • Empirical evaluations demonstrate high accuracy with metrics like F-score ≈ 0.97 in SCADA mapping and near-zero inter-device similarity in hardware fingerprinting.

Architectural fingerprinting is the process of inferring, identifying, or uniquely distinguishing the structure and internal components of technological systems—ranging from industrial control networks and hardware devices to neural networks and LLMs—using externally observable features, typically through black-box access or passive monitoring. Unlike implementation-level or software-only fingerprinting, architectural fingerprinting leverages invariants rooted in hardware design, protocol structure, or learned model behaviors to yield high-confidence identification or fine-grained mapping of architectural elements across a diverse set of computing environments.

1. Principles and Scope of Architectural Fingerprinting

Architectural fingerprinting operates by systematically extracting features that are indicative of system structure or intrinsic hardware/machine-learning model properties, in a manner that is agnostic to vendor, software configuration, or protocol implementation detail. The central aim is to recover key aspects of the system "architecture"—for example, network roles and communication hierarchy in SCADA networks, flow-table installation logic in SDNs, microarchitectural process variation in GPUs/DRAM, or family membership of a DNN or LLM—using only externally obtainable signals, such as network traces, side-channel delays, or observable model outputs.

Underlying this approach is the hypothesis that despite protocol irregularities, randomization at the implementation layer, or user-configurable features, certain invariants persist due to their roots in hardware manufacturing variability, protocol semantics, or statistical properties of learned function classes. These invariants form the crux of the fingerprint and can be formalized in terms of traffic statistics, timing behaviors, bit-flip patterns, hardware response vectors, or output-label distributions.

2. Architectural Fingerprinting in Communication Networks and Cyber-Physical Systems

The architectural fingerprinting of Supervisory Control And Data Acquisition (SCADA) networks in critical infrastructure exemplifies passive, protocol-agnostic inference by extracting five robust network-level invariants: periodicity, communication durability, device complexity gap, service popularity, and segment size. These are extracted from raw pcap traces using algorithmic segmentation and scoring techniques, as outlined below (Jeon et al., 2016):

  • Traffic segmentation: Packets are grouped into "communication segments" based on inter-packet timing ($t_{\text{comm}}$), and represented as five-tuples $\langle \mathrm{SrcIP}, \mathrm{SrcPort}, \mathrm{DstIP}, \mathrm{DstPort}, \mathrm{SegSize}\rangle$.
  • Composite scoring: Each tuple is ranked by a composite likelihood score $f(ft) = pR \times dR \times cR \times uR \times sR$, where:
    • $pR$ = mean/variance of segment inter-arrival time (periodicity),
    • $dR$ = total uptime (sum of inter-arrivals) times $\log(\text{segment count})$ (durability),
    • $cR$ = ratio of source to destination port diversity (complexity gap),
    • $uR$ = relative port popularity,
    • $sR$ = normalized segment size.
  • Inference workflow: The highest scoring tuple reveals the SCADA protocol port, while node degree and traffic dominance identify field devices and master servers. The hierarchical role of each host is deduced using quantitative thresholds, without the need to decode application-layer contents or protocol semantics.
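
The segmentation and scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Jeon et al.: the packet tuple layout, the function names, the grouping by four-tuple with segment size recorded per segment, and the small epsilon that guards against zero variance are all hypothetical choices. Only $pR$ and $dR$ are computed from timestamps; $cR$, $uR$, and $sR$ are passed in, because their exact normalization is not given here.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def segment_flows(packets, t_comm):
    """Group packets into communication segments per flow.

    packets: iterable of (timestamp, src_ip, src_port, dst_ip, dst_port, size),
             sorted by timestamp (assumed layout).
    A gap longer than t_comm within a flow starts a new segment.
    Returns {flow_key: [(segment_start_time, segment_size), ...]}.
    """
    segments = defaultdict(list)
    last_seen = {}
    for ts, sip, sport, dip, dport, size in packets:
        key = (sip, sport, dip, dport)
        if key in last_seen and ts - last_seen[key] <= t_comm:
            start, total = segments[key][-1]
            segments[key][-1] = (start, total + size)  # extend current segment
        else:
            segments[key].append((ts, size))           # open a new segment
        last_seen[key] = ts
    return dict(segments)


def periodicity(starts, eps=1e-9):
    """pR: mean / variance of segment inter-arrival times (eps is an assumption)."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if len(gaps) < 2:
        return 0.0
    return mean(gaps) / (pvariance(gaps) + eps)


def durability(starts):
    """dR: sum of inter-arrival times multiplied by log(segment count)."""
    if len(starts) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) * math.log(len(starts))


def composite_score(pR, dR, cR, uR, sR):
    """f(ft) = pR * dR * cR * uR * sR."""
    return pR * dR * cR * uR * sR
```

Because the score is a product, a tuple that scores near zero on any single invariant is pushed to the bottom of the ranking, so only flows that look periodic, long-lived, and popular at once can rank highly.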

This approach is validated on month-scale real-world traces, achieving precision and recall near unity (e.g., F-score ≈ 0.9709), with full SCADA maps reconstructable from small trace subsets, demonstrating high practicality and robustness. The paradigm generalizes to other network architectures where traffic features are dictated by protocol or operational invariants.

3. Hardware and Microarchitectural Fingerprinting

Hardware-level architectural fingerprinting identifies devices by exploiting manufacturing-level physical variations inaccessible to user-level modification. Several canonical techniques are highlighted:

  • GPU fingerprinting: As demonstrated in DrawnApart (Laor et al., 2022), sub-unit variability among GPU execution units (EUs), measured as timing vectors via WebGL in JavaScript, forms stable, device-unique signatures. The measurement involves crafting shaders that individually stall specific EUs, logging per-EU timing across multiple iterations, and constructing normalized trace embeddings for classification or $k$-NN matching. Results show substantial gains over base identification rates, and robust tracking even among identically configured devices.
  • DRAM fingerprinting via Rowhammer: FPHammer (Li et al., 2022) leverages the location and reproducibility of Rowhammer-induced bit flips as a persistent, device-unique signature. The process involves (1) controlled hammering of aggressor/victim rows under fixed patterns and repetition, (2) encoding flipped-bit locations as high-dimensional binary vectors, (3) similarity scoring using a modified Jaccard index or Hamming distance. The method achieves intra-device $J' \approx 0.85{-}0.88$ and inter-device $J' = 0$, providing long-term stable identification, immune to OS reinstalls or software changes.
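
To make the similarity-scoring step concrete, the sketch below compares two sets of flipped-bit locations. It uses the standard Jaccard index as a stand-in, since the modified index $J'$ used by FPHammer is not specified here. The (row, bit) location encoding, the function names, and the 0.5 matching threshold are illustrative assumptions.

```python
def jaccard(flips_a, flips_b):
    """Standard Jaccard index between two collections of flipped-bit locations."""
    a, b = set(flips_a), set(flips_b)
    union = a | b
    if not union:
        return 0.0  # no flips observed on either side: treat as no evidence
    return len(a & b) / len(union)


def hamming(flips_a, flips_b):
    """Hamming distance between the implied binary vectors.

    Counts the positions that flipped in exactly one of the two measurements.
    """
    return len(set(flips_a) ^ set(flips_b))


def match_device(probe, enrolled, threshold=0.5):
    """Return the enrolled device id most similar to the probe, or None.

    enrolled: {device_id: reference flip locations}; threshold is an assumed value.
    """
    best_id, best_score = None, 0.0
    for device_id, reference in enrolled.items():
        score = jaccard(probe, reference)
        if score > best_score:
            best_id, best_score = device_id, score
    return best_id if best_score >= threshold else None


# Usage: locations are hypothetical (row, bit) pairs.
enrolled = {"dimm-A": {(1, 2), (3, 4), (5, 6)}, "dimm-B": {(9, 9), (8, 1)}}
print(match_device({(1, 2), (3, 4), (7, 7)}, enrolled))  # dimm-A (score 0.5)
```

With intra-device similarity near 0.85 and inter-device similarity at zero, as reported above, almost any threshold between the two separates the classes.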

A broader class of hardware features accessible via web APIs (canvas/WebGL for GPU, getUserMedia for camera PRNU, Web Audio for audio chains, DeviceMotion for IMU) has also been proposed (Nakibly et al., 2015). Each class has distinct extraction and comparison procedures grounded in device-specific physical response characteristics.

4. Architectural Fingerprinting in Software-Defined Networks

In software-defined networking (SDN), architectural fingerprinting infers control-plane and data-plane separation by exploiting observable side-channel timing effects. The separation leads to distinct, fingerprintable signatures in packet Round-Trip Time (RTT) and packet-pair dispersion (Cui et al., 2015):

  • Active adversary: Sends carefully crafted probe packets to measure RTT or inter-packet dispersion; new flow rule installations manifest as $1{-}10$ ms extra delay, revealing flow-table update events.
  • Passive adversary: Observes natural or ambient flows, as short-term RTT or dispersion outliers indicate controller invocations.
  • Classification: Measurement samples are segmented by status (no-install/install) to estimate class-conditional probability densities. A threshold minimizes equal-error rate (EER), which is empirically below 2% for hardware switches (e.g., $k=3$ HW: EER $=1.08\%$ for dispersion, $0.43\%$ for RTT).
  • Countermeasures: Artificially delaying the first few packets of new or idle flows (based on fitted Generalized Pareto delay models) confounds timing attacks, driving adversarial EER to ≈50% with negligible steady-state performance cost.
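
The classification step can be illustrated with a short threshold search. This sketch works directly on empirical error rates rather than on fitted class-conditional densities, which is a simplification of the approach described above. The function name, the sample values, and the rule "classify as install when the delay exceeds the threshold" are assumptions.

```python
def equal_error_rate(no_install, install):
    """Find the delay threshold where false-positive and false-negative rates meet.

    no_install, install: lists of measured delays (e.g., RTT in ms) for each class.
    A sample is classified as 'install' when its delay is strictly above the threshold.
    Returns (threshold, eer), where eer is the mean of the two error rates
    at the threshold with the smallest gap between them.
    """
    best = None
    for t in sorted(set(no_install) | set(install)):
        fpr = sum(x > t for x in no_install) / len(no_install)   # false alarms
        fnr = sum(x <= t for x in install) / len(install)        # missed installs
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, t, (fpr + fnr) / 2)
    _, threshold, eer = best
    return threshold, eer


# Usage with made-up delay samples (ms):
threshold, eer = equal_error_rate([1, 2, 3, 6], [4, 5, 7, 8])
print(threshold, eer)  # 4 0.25
```

A countermeasure that delays the first packets of a flow makes the two sample sets overlap almost completely, which is what drives the adversary's EER toward 50%.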

The insights generalize to any virtualized or hybrid architecture where fast-path and control-path processing latencies differ by orders of magnitude, providing a powerful lens for both security risk assessment and network forensics.

5. Fingerprinting the Architecture of Machine Learning Models

Architectural fingerprinting extends to machine learning, where the goal is to recover model family, variant, or even model identity through black-box querying or passive output analysis. The FBI framework (Maho et al., 2022) and “Invisible Traces” LLM fingerprinting (Bhardwaj et al., 30 Jan 2025) formalize this as a two-tier detection and identification problem:

  • Detection and identification tasks: Given a blacklist of possible vanilla models $A$ (and families $\{\mathcal{C}_i\}$) and a black-box model $b$, the challenge is to decide if $b$ is a (possibly modified) variant in $\mathcal{C}$ (“detection”), and if so, which family or member (“identification”).
  • Greedy discrimination and MI-based methods: For finite candidate sets, adaptively-selected benign inputs are used to split candidate pools by maximizing output variability. When $b$ is an unknown but related variant, empirical mutual information (MI) between response distributions yields a normalized distance metric. Thresholded MI distances enable robust family-level identification even under quantization, pruning, or fine-tuning.
  • Hybrid static-dynamic LLM fingerprinting: Bhardwaj et al. (30 Jan 2025) propose two complementary modules:
    • Static: query the system with minimal, high-discrepancy prompts and classify the model via response embeddings,
    • Dynamic: observe outputs from authentic user or system prompts and employ transformer-based classifiers to build a posterior over candidate models.
    • Fusion of both modules achieves high identification rates (e.g., $>86\%$ at $n=10$ queries) across a suite of LLMs, with detailed per-class breakdowns confirming practical efficacy against real-world multi-agent, multi-model scenarios.
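
The greedy discrimination idea for a finite candidate set can be sketched as below. This is an illustrative reading, not the FBI algorithm itself: the table of precomputed candidate outputs, the "smallest worst-case group" selection rule, and all names are assumptions.

```python
from collections import Counter


def pick_input(candidates, inputs, outputs):
    """Pick the input that splits the candidates most evenly.

    outputs[(model, x)] is the precomputed output label of a candidate on input x.
    The chosen input minimizes the size of the largest group of candidates
    that would remain indistinguishable after the query.
    """
    def worst_group(x):
        return max(Counter(outputs[(m, x)] for m in candidates).values())
    return min(inputs, key=worst_group)


def identify(black_box, candidates, inputs, outputs):
    """Query the black box adaptively and return the consistent candidates.

    An empty result means no candidate matches, i.e. detection fails.
    """
    remaining = set(candidates)
    unused = list(inputs)
    while len(remaining) > 1 and unused:
        x = pick_input(remaining, unused, outputs)
        unused.remove(x)
        y = black_box(x)  # one black-box query
        remaining = {m for m in remaining if outputs[(m, x)] == y}
    return remaining


# Usage with three hypothetical models and two benign inputs:
outputs = {("A", "x1"): 0, ("B", "x1"): 0, ("C", "x1"): 1,
           ("A", "x2"): 0, ("B", "x2"): 1, ("C", "x2"): 1}
hidden = lambda x: outputs[("B", x)]  # the black box is secretly model B
print(identify(hidden, ["A", "B", "C"], ["x1", "x2"], outputs))  # {'B'}
```

Exact output matching works only for unmodified candidates; for quantized, pruned, or fine-tuned variants the text above relies on thresholded MI distances instead.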

This class of techniques enables not only instance identification but recovery of architectural family (e.g., ResNet, ViT, Swin for DNNs, or GPT-4, Claude, Gemini for LLMs), with precision tunable via the number and diversity of queries.

6. Evaluation Methodologies and Quantitative Performance

Architectural fingerprinting schemes are commonly benchmarked on real-world or large-scale synthetic datasets with the following characteristics:

  • Ground-truth: Known mapping between observable features and ground-truth architectural assignments (e.g., device inventory, network configuration, model family).
  • Metrics: Precision, recall, and $F$-score for component discovery ($\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$, etc.); for identification, top-$k$ accuracy, true/false-positive/negative rates under controlled query budgets.
  • Separation of intra-/inter-device or intra-/inter-family distances: Key for establishing the robustness and uniqueness of extracted fingerprints.
  • Time and computation scaling: Latency for fingerprint convergence, scalability to population size (number of devices or models), effect of environmental conditions.
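
For reference, the component-discovery and identification metrics listed above can be computed as in the following sketch. The function names and the set-based input format are assumptions chosen for brevity.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for a set of discovered components.

    predicted: components the fingerprinting method reported (e.g., field devices).
    actual:    ground-truth components from the device inventory.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # correctly discovered
    fp = len(predicted - actual)   # reported but not real
    fn = len(actual - predicted)   # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases whose true label appears among the first k ranked guesses."""
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_predictions, truths))
    return hits / len(truths)
```

For example, reporting {a, b, c} against ground truth {a, b, d} gives precision, recall, and F1 of 2/3 each.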

Empirical results from multiple domains confirm that architectural fingerprinting achieves (i) low error rates (EER $<2\%$ for SDN, $F\approx1$ in SCADA node mapping), (ii) strong inter-device or inter-model separation (FPHammer: $J'_\text{inter}=0$; DrawnApart: multiple-fold gains over random for device ID), (iii) rapid convergence using a fraction of available observations (SCADA: $<6\%$ of trace, LLM: $n=10$ queries), and (iv) resilience under realistic system drift or variant generation.

7. Practical Considerations, Limitations, and Emerging Directions

While architectural fingerprinting offers high efficacy, several limitations and deployment factors require consideration:

  • Requirement for persistent invariants: Methods depend on features resilient to moderate system or environment changes but may break under adversarial obfuscation, randomized scheduling, aggressive transformation, or strong cryptographic protections.
  • DRAM and GPU fingerprinting: These methods depend on cache bypass (clflush), reverse-engineered address mapping, or specific susceptible configurations (Rowhammer-prone DRAM, multi-EU GPU architectures).
  • SDN and networking: Timing attacks can be countered by probabilistic delay injection, and their effectiveness may degrade under network churn, parallel control channels, or unpredictable queueing.
  • ML and LLM fingerprinting: Success depends on query diversity, coverage of variant families, and ongoing retraining to match model evolution. Manual fingerprinting of LLMs is brittle; passive-only methods may lag behind rapidly deployed model families.
  • Privacy and policy: Widespread deployment of architectural fingerprinting, particularly via browser APIs or MLaaS endpoints, raises privacy concerns and may prompt standardization or user-centric mitigations (timing randomization, API restrictions).

Open research questions include the fusion of multi-modal architectural fingerprints, extension to non-standard architectures (e.g., neuromorphic or chiplet designs), and resilient detection under multi-agent, federated, or adversarial settings. Scaling to larger populations, cross-platform robustness, and the mitigation of fingerprint reuse (preventing linkability) remain prominent themes. The collective body of work demonstrates the breadth and rigor of architectural fingerprinting techniques, with generalization potential across application, hardware, and network domains (Jeon et al., 2016, Nakibly et al., 2015, Cui et al., 2015, Li et al., 2022, Laor et al., 2022, Maho et al., 2022, Bhardwaj et al., 30 Jan 2025).
