TMV-Hunter: TLS MitM Vulnerability Detection

Updated 6 February 2026

TMV-Hunter is a dynamic analysis tool that detects TLS certificate validation flaws in Android apps through automated MitM attack simulation.
It employs a foundation model–driven GUI agent, per-app VPN traffic interception, and sequential MitM testing to achieve high coverage across large app corpora.
Empirical results on a corpus of nearly 40,000 apps show that 22.42% of the analyzed apps are vulnerable, underscoring persistent TLS security flaws and the need for prompt remediation.

TMV-Hunter is the dynamic-analysis detection component of the Okara framework, designed for large-scale detection of Transport Layer Security (TLS) Man-in-the-Middle Vulnerabilities (TMVs) in Android applications. TMV-Hunter leverages foundation-model–driven graphical user interface (GUI) exploration and automated network-level MitM attack simulation to identify flaws in TLS certificate validation, achieving high coverage and scalability across market-sized app corpora (Yang et al., 30 Jan 2026).

1. System Architecture

TMV-Hunter operates as a standalone dynamic analysis tool that integrates into the Okara pipeline as its detection stage. Its architecture is organized around three core modules orchestrated by a centralized Test Orchestrator:

GUI Agent: Automates interaction with the app's UI to trigger possible TLS flows.
Traffic Forwarding Module: Sets up per-app VPN-based traffic interception and forwarding, enabling transparent capture and manipulation of encrypted flows.
MitM Test Module: Performs active man-in-the-middle probing on observed TLS flows to assess certificate validation robustness.

The Test Orchestrator receives an APK file and a set of testing parameters $\{S_{\mathrm{GUI}}, N_{\mathrm{steps}}, T_{\max}, T_{\mathrm{wait}}, P_{\mathrm{MitM}}\}$, and outputs a vulnerability report listing every TLS flow found susceptible to the MitM-T1, MitM-T2, or MitM-T3 attack variants. The full workflow is formalized in Algorithm 1 of Yang et al., which prescribes sequential installation, traffic interception, GUI exploration, and iterative MitM testing on discovered flows.
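The workflow described above can be pictured with a minimal Python sketch. It is not the authors' Algorithm 1: the names `RunParams`, `Flow`, `Report`, and `run_app_test` are hypothetical, app installation and VPN setup are folded into the caller-supplied `explore` function, and reading $P_{\mathrm{MitM}}$ as the set of tests to run is an assumption, since the source does not define the parameters individually.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class RunParams:
    gui_strategy: str            # S_GUI: exploration strategy name
    n_steps: int                 # N_steps: GUI action budget
    t_max: float                 # T_max: overall time limit, seconds
    t_wait: float                # T_wait: pause after each GUI action, seconds
    mitm_tests: Tuple[str, ...]  # P_MitM, read here as the tests to run (assumption)


@dataclass(frozen=True)
class Flow:
    fqdn: str
    port: int = 443


@dataclass
class Report:
    # Each finding is (fqdn, test name) for a flow that accepted a forged certificate.
    findings: List[Tuple[str, str]] = field(default_factory=list)


def run_app_test(
    apk_path: str,
    params: RunParams,
    explore: Callable[[str, RunParams], Iterable[Flow]],
    probe: Callable[[Flow, str], bool],
) -> Report:
    """Explore the app, then run each configured MitM test once per distinct flow."""
    report = Report()
    seen = set()
    for flow in explore(apk_path, params):
        if flow in seen:
            continue  # test each distinct flow only once
        seen.add(flow)
        for test in params.mitm_tests:
            if probe(flow, test):  # True means the app accepted the forged certificate
                report.findings.append((flow.fqdn, test))
    return report
```

Passing `explore` and `probe` as callables keeps the control flow testable without an emulator; a real deployment would back them with the GUI Agent and the MitM Test Module.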

2. Foundation Model-Driven GUI Exploration

At the center of TMV-Hunter's scalability is its GUI Agent, which replaces random and rule-based crawlers with foundation models for high-coverage interaction. The agent accepts as input the current UI observation $o_i$ (encompassing the UI hierarchy and optional screenshots), historical interaction traces $\{(o_j, a_j)\}_{j<i}$, and task instructions focused on maximal TLS flow discovery. The agent selects discrete actions $a_i$ from $\{\text{click}, \text{long\_click}, \text{type}, \text{scroll}, \text{drag}, \text{back}, \text{wait}, \text{finish}\}$, parameterized for specific UI elements.

Three decision strategies are implemented:

Random: Uniform random selection over legal $(a_i, o_i)$ pairs.
General LLM: One-shot prompting with a 32B-parameter vision-LLM (Qwen2.5-VL-Instruct) using whole-session context.
Specialized LLM: Multi-turn UI-specific interaction via a 7B-parameter UI-TARS model, leveraging session-based alternation and screenshot inputs.
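As an illustration of the Random strategy, the sketch below enumerates legal (action, element) pairs and draws one uniformly. The element representation (a dict with `id` and `actions`), the helper names `legal_pairs` and `random_policy`, and the choice to treat `back`, `wait`, and `finish` as always-legal global actions are all assumptions made for the example, not details from the source.

```python
import random
from typing import Dict, List, Optional, Tuple

# Action vocabulary as listed in the text.
ACTIONS = ("click", "long_click", "type", "scroll", "drag", "back", "wait", "finish")
# Assumption: these actions need no target element and are always legal.
GLOBAL_ACTIONS = frozenset({"back", "wait", "finish"})

Pair = Tuple[str, Optional[str]]


def legal_pairs(elements: List[Dict]) -> List[Pair]:
    """List every (action, element id) pair permitted on the current screen."""
    pairs: List[Pair] = [(a, None) for a in ACTIONS if a in GLOBAL_ACTIONS]
    for el in elements:
        for a in ACTIONS:
            if a not in GLOBAL_ACTIONS and a in el["actions"]:
                pairs.append((a, el["id"]))
    return pairs


def random_policy(elements: List[Dict], rng: random.Random) -> Pair:
    """Uniform random choice over the legal pairs."""
    return rng.choice(legal_pairs(elements))
```

Taking the `random.Random` instance as an argument makes runs reproducible from a seed, which helps when comparing this baseline against the LLM strategies.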

The agent operates on local vLLM inference servers. System prompts guide the agent to exhaust visible elements, employ back-navigation if stuck, and systematically attempt text input fields. Specialized LLM prompts further encode heuristics to reveal hidden or conditional screens, such as login dialogs and pop-ups. An interaction wait parameter $T_{\mathrm{wait}}$ ensures asynchronous content is realized before subsequent actions.

Coverage is quantified by several metrics: $C_{UI}$ and $C_{TLS}$, the intersection ratios between the agent's discoveries and manually collected ground-truth UI screens and FQDNs, respectively; their "novel" complements, which measure discoveries not present in the ground truth; and a high-level element coverage formula: $\text{Coverage} = \frac{\#\text{UI elements interacted}}{\#\text{Total UI elements}}\times 100\%.$
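The coverage metrics can be made concrete with a small sketch. The source does not spell out the normalization, so this example assumes that the intersection ratio divides by the size of the ground-truth set and that the "novel" complement is simply the set of discoveries missing from the ground truth; the function names are hypothetical.

```python
from typing import Iterable, Set


def intersection_ratio(found: Iterable[str], ground_truth: Iterable[str]) -> float:
    """Share of ground-truth items (screens or FQDNs) that the agent also found."""
    gt = set(ground_truth)
    if not gt:
        raise ValueError("ground truth must not be empty")
    return len(set(found) & gt) / len(gt)


def novel_items(found: Iterable[str], ground_truth: Iterable[str]) -> Set[str]:
    """Items the agent found that the manual ground truth does not contain."""
    return set(found) - set(ground_truth)


def element_coverage(interacted: int, total: int) -> float:
    """Percentage of UI elements interacted with, per the formula in the text."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * interacted / total


# Made-up FQDNs, purely for illustration.
agent_fqdns = {"api.example", "cdn.example", "ads.example"}
manual_fqdns = {"api.example", "cdn.example", "pay.example", "auth.example"}
c_tls = intersection_ratio(agent_fqdns, manual_fqdns)
```

With these illustrative sets the agent recovers two of four ground-truth FQDNs and contributes one FQDN the manual run missed.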

3. Automated MitM Vulnerability Testing Methodology

The MitM Test Module executes three attack protocols per observed TLS flow $f$, with server endpoint $d$ and certificate $C$:

MitM-T1 (Untrusted-CA Test): Presents an otherwise valid certificate $C$ chained to a self-generated, untrusted CA; a vulnerability is signaled if $\neg\mathrm{IsTrustedAnchor}(C) \wedge \mathrm{AppAccepts}(C)$.
MitM-T2 (Domain-Mismatch Test): Replaces the subject of $C$ with a domain $d' \neq d$ while keeping the CA chain valid; a vulnerability is signaled if $\mathrm{ChainValid}(C) \wedge (\mathrm{subject}(C)\neq d) \wedge \mathrm{AppAccepts}(C)$.
MitM-T3 (Pinning-Bypass Test): Installs the attacker's CA in the device trust store; apps without robust certificate pinning then accept arbitrary certificates signed by that CA (i.e., for trust manager $m$, there is some $C_{\mathrm{bad}}$ for which $m.\texttt{checkServerTrusted}(C_{\mathrm{bad}}, \ldots)$ does not throw a $\texttt{CertificateException}$).

Flows meeting vulnerability criteria are added to the aggregate report $\mathcal{R}$ along with relevant metadata.
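The three verdict conditions can be written as a single predicate over the outcome of a probe. This is a sketch under stated assumptions: `ProbeOutcome` and `is_vulnerable` are hypothetical names, the boolean fields stand in for the predicates used in the text, and the T3 rule (attacker CA trusted by the device and connection accepted) is one reading of the pinning-bypass description rather than the authors' exact criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeOutcome:
    test: str              # "T1", "T2", or "T3"
    anchor_trusted: bool   # IsTrustedAnchor(C): the chain ends in a CA the device trusts
    chain_valid: bool      # ChainValid(C)
    subject_matches: bool  # subject(C) == d
    app_accepted: bool     # AppAccepts(C): the app completed the handshake and sent data


def is_vulnerable(o: ProbeOutcome) -> bool:
    """Apply the verdict rule that belongs to the probe's test type."""
    if o.test == "T1":
        # Untrusted CA, yet the app accepted the certificate.
        return (not o.anchor_trusted) and o.app_accepted
    if o.test == "T2":
        # Valid chain for the wrong domain, yet the app accepted it.
        return o.chain_valid and (not o.subject_matches) and o.app_accepted
    if o.test == "T3":
        # Attacker CA sits in the device trust store; acceptance implies no effective pinning.
        return o.anchor_trusted and o.app_accepted
    raise ValueError(f"unknown test: {o.test}")
```

Keeping the rules in one pure function separates the verdict logic from the network code that produces the observations, so each rule can be checked in isolation.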

4. Empirical Results and Scale

TMV-Hunter was evaluated over a deduplicated dataset of 39,876 unique Android apps, sampled from Google Play (AndroZoo, 20,000 APKs) and the AppChina third-party store (20,000 APKs, latest from March 2025). The dynamic execution environment leveraged 8 parallel Android emulators (redroid on AWS Graviton2/Alibaba Ampere) and three high-end GPUs for model inference, achieving an average per-app analysis time of 144.75 seconds.

Key findings are summarized below:

Entity	AppChina (Count/%)	AndroZoo (Count/%)	Combined (Count/%)
Apps	7.82K (39.40%)	0.558K (3.19%)	8.37K (22.42%)
Flows	80K (9.94%)	6.43K (0.77%)	86K (5.25%)
FQDNs	5.04K (17.11%)	0.919K (4.69%)	5.88K (12.16%)
App-FQDN Pairs	30K (19.42%)	1.61K (1.23%)	32K (11.08%)

Of 37,349 analyzed apps, 8,374 (22.42%) exhibited at least one MitM-vulnerable TLS flow, spanning 5,881 unique vulnerable FQDNs and roughly 86,000 of 1.64 million tested flows. Vulnerability prevalence is uniform across popularity and app categories ($r_{pb} \approx 0$), with category-wise Jensen–Shannon divergences of 0.0499 (AppChina) and 0.2433 (AndroZoo) indicating minimal skew.
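The headline rate follows directly from the two app counts, and the Jensen–Shannon divergence can be computed as sketched below. The source does not say which category distributions were compared or which logarithm base was used; the base-2 form shown here, which is bounded by 0 and 1, is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence

# App-level vulnerability rate from the counts reported in the text.
vulnerable_apps = 8_374
analyzed_apps = 37_349
app_rate_pct = 100.0 * vulnerable_apps / analyzed_apps


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is, so the KL terms are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Under this convention identical distributions score 0 and distributions with disjoint support score 1, which gives a scale for reading the reported values.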

TLS 1.3 dominates among vulnerable flows (78.98%, versus 21.02% for TLS 1.2), and every vulnerable flow uses TCP as its transport. A plausible implication is that the vulnerabilities are not confined to deprecated protocol versions but affect the contemporary ecosystem.

Critical functionalities are recurrently affected. In a 100-app case study:

Category	% Flows Vulnerable	% Apps w/≥1 Vuln Flow
Content Delivery	61.28%	56.00%
Telemetry/Analytics	27.70%	61.00%
Executable Code	6.19%	27.00%
Authentication	4.06%	39.00%
Financial Transactions	0.75%	13.00%

Longitudinal analysis of 100 apps (five-year version history, sampled at three-month intervals) reveals that vulnerabilities are highly persistent, with a median vulnerable span of 1,384 days, a median app lifespan of 1,901 days, and a median remediation delay of 330 days.

5. Performance, Limitations, and Scalability

TMV-Hunter's coverage and detection quality depend on both the GUI Agent and the MitM Test Module. Empirically, per-app coverage and runtime vary with agent strategy: random ($\sim$95 s), general LLM ($\sim$532 s), and specialized LLM ($\sim$334 s) at a 4 s step wait and 50-step budget; the reported end-to-end mean is 144.75 s per app.

Principal sources of error include:

False negatives: Caused by incomplete GUI coverage, which leaves some live flows untriggered and therefore untested.
False positives: Stemming from heuristic mapping of flows to code regions; benign flows may be misattributed.

Scalability challenges are associated with LLM inference cost/latency and the instrumentation coverage of non-debuggable/native-code apps. Proposed mitigations include the deployment of smaller specialized models, parameterized step/time budgets to fine-tune exploration, extension to native libraries via eBPF and Frida-ART-TI hybrids, and further GUI exploration enhancements using multimodal memory and RL-based coverage guidance.

6. Context and Implications within TLS Security Research

TMV-Hunter’s approach of integrating foundation model–driven exploration with practical MitM probing distinguishes it from prior UI crawlers constrained by low coverage and high manual effort. Its design allows for efficient, market-scale scanning and systematic measurement of TLS certificate validation weaknesses, which were found to be widespread (22.42% of tested apps) and persistent over multi-year intervals. This suggests that, despite the adoption of improved TLS standards, implementation-level flaws remain pervasive across device and store boundaries. TMV-Hunter’s outputs enable subsequent code-level attribution and mitigation, supporting responsible disclosure and follow-on research (Yang et al., 30 Jan 2026).

References (1)

Okara: Detection and Attribution of TLS Man-in-the-Middle Vulnerabilities in Android Apps with Foundation Models (2026)
