Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts

Published 15 May 2024 in cs.NI, cs.AI, cs.LG, and cs.OS | (2405.17442v1)

Abstract: Identifying IoT devices is crucial for network monitoring, security enforcement, and inventory tracking. However, most existing identification methods rely on deep packet inspection, which raises privacy concerns and adds computational complexity. More importantly, existing works overlook the impact of wireless channel dynamics on the accuracy of layer-2 features, thereby limiting their effectiveness in real-world scenarios. In this work, we define and use the latency of specific probe-response packet exchanges, referred to as "device latency," as the main feature for device identification. Additionally, we reveal the critical impact of wireless channel dynamics on the accuracy of device identification based on device latency. Specifically, this work introduces "accumulation score" as a novel approach to capturing fine-grained channel dynamics and their impact on device latency when training machine learning models. We implement the proposed methods and measure the accuracy and overhead of device identification in real-world scenarios. The results confirm that by incorporating the accumulation score for balanced data collection and training machine learning algorithms, we achieve an F1 score of over 97% for device identification, even amidst wireless channel dynamics, a significant improvement over the 75% F1 score achieved by disregarding the impact of channel dynamics on data collection and device latency.

Summary

  • The paper introduces a novel ML approach that uses device latency and an accumulation score to uniquely fingerprint IoT devices in variable wireless conditions.
  • The paper demonstrates that incorporating the accumulation score raises the device identification F1 score from 75% to over 97%, maintaining robust performance across diverse channel utilization levels.
  • The paper validates its framework using tree-based models like LightGBM, enabling real-time, privacy-preserving device tracking on edge devices without relying on intrusive packet inspection.

A Machine Learning Framework for Privacy-Preserving and Accurate IoT Device Identification in Dynamic Wireless Environments

Introduction

The proliferation of IoT deployments has sharply increased the number and diversity of connected devices, motivating robust approaches to precise device identification. Traditional solutions built on deep packet inspection or static address associations (e.g., IP/MAC) suffer from privacy risks, computational overhead, and vulnerability to evasion through spoofing or address randomization. Furthermore, prevailing methods largely neglect real-world wireless channel dynamics, specifically how variability in channel utilization (CU) degrades the stability and discriminative power of layer-2 features, which limits their practical efficacy.

This paper, "Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts" (2405.17442), proposes an approach centered on device latency as a privacy-preserving signal for device fingerprinting. Crucially, the work introduces the "accumulation score," a novel metric that captures the instantaneous wireless channel contention surrounding each probe-response exchange. Both features are used for robust, data-efficient training of modern tree-based ML models, which achieve high device identification accuracy across a wide range of channel conditions.

Device Latency as a Fingerprinting Feature

Device latency is defined as the interval between the device's complete reception of a probe packet and the start of its transmission of the corresponding response. Unlike round-trip time (RTT), this interval excludes AP-side delays and other channel delays unrelated to the device under test, isolating device-specific processing time and the channel contention the device experiences before replying. A packet sniffer makes accurate measurement possible by matching each probe with its response and computing the interval between them.

Figure 1: Device latency ($l$) is defined as the interval between $t_4$ and $t_6$, capturing only device-side processing and channel contention.
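To make the definition concrete, the following Python sketch computes device latency from sniffer timestamps. The `Frame` record, its field names, and the use of a shared `seq` value to match a probe with its response are hypothetical; the summary does not describe the paper's capture format. It assumes $t_4$ corresponds to the last bit of the probe on the air and $t_6$ to the first bit of the response.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """Hypothetical sniffer record; field names are illustrative only."""
    seq: int          # value shared by a probe and its response (assumed matching key)
    kind: str         # "probe" (AP -> device) or "response" (device -> AP)
    start_us: float   # first bit on the air, microseconds
    end_us: float     # last bit on the air, microseconds


def device_latencies(frames: List[Frame]) -> Dict[int, float]:
    """Map each matched seq to device latency = response start - probe end."""
    probe_end: Dict[int, float] = {}
    latencies: Dict[int, float] = {}
    # Walk frames in air-time order so each response follows its probe.
    for f in sorted(frames, key=lambda fr: fr.start_us):
        if f.kind == "probe":
            probe_end[f.seq] = f.end_us
        elif f.kind == "response" and f.seq in probe_end:
            # Interval from the probe's last bit to the response's first bit.
            latencies[f.seq] = f.start_us - probe_end.pop(f.seq)
        # Responses without a captured probe are ignored.
    return latencies


frames = [
    Frame(1, "probe", 0.0, 120.0), Frame(1, "response", 450.0, 530.0),
    Frame(2, "probe", 1000.0, 1120.0), Frame(2, "response", 1300.0, 1380.0),
]
print(device_latencies(frames))  # {1: 330.0, 2: 180.0}
```

Measuring from the probe's end rather than its start keeps the probe's own airtime, which depends on payload size and rate, out of the feature.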

Measurements show that device latency distributions are device-dependent but also highly sensitive to variations in channel utilization (CU). This dual dependence complicates generalization, because a naive model may learn spurious associations with environmental factors rather than inherent device characteristics.

Figure 2: The testbed includes real IoT devices, background traffic generators, and monitoring infrastructure for precise, repeatable measurements.

Accumulation Score: Modeling Instantaneous Wireless Channel Conditions

The paper identifies a limitation of conventional CU measures (e.g., those sampled at 10 ms intervals by commodity hardware): their temporal granularity is too coarse relative to the duration of a probe-response exchange. To address this, the authors introduce the accumulation score, a metric that aggregates the durations and temporal proximity of "predecessor" packets (sent before the response) and "successor" packets (sent immediately after it), weighting each packet by its closeness to the response with either a bell-shaped or a gamma-shaped weight function.

Figure 4: Sample probe-response exchange illustrating predecessor and successor packets; the accumulation score integrates durations, intervals, and position-weighted influence.
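The exact accumulation score formula is not reproduced in this summary, so the sketch below only illustrates the idea under stated assumptions: each neighboring packet contributes its airtime multiplied by a weight that depends on its time distance from the response, with predecessors measured back from the response start and successors measured forward from the response end. The parameter values (`sigma_us`, `k`, `theta_us`, `window_us`) and the peak normalization of the gamma weight are hypothetical choices, not the authors'.

```python
import math
from typing import Callable, List, Tuple

# (start_us, duration_us) of every other frame the sniffer saw on the channel.
Packet = Tuple[float, float]


def bell_weight(dist_us: float, sigma_us: float = 500.0) -> float:
    """Gaussian-shaped weight: 1.0 at distance 0, decaying with distance."""
    return math.exp(-(dist_us ** 2) / (2.0 * sigma_us ** 2))


def gamma_weight(dist_us: float, k: float = 2.0, theta_us: float = 300.0) -> float:
    """Gamma-pdf-shaped weight (shape k > 1, scale theta), scaled to peak at 1."""
    def shape(x: float) -> float:
        return x ** (k - 1.0) * math.exp(-x / theta_us)
    return shape(dist_us) / shape((k - 1.0) * theta_us)  # peak is at the mode


def accumulation_score(resp_start: float, resp_end: float, packets: List[Packet],
                       window_us: float = 2000.0,
                       weight: Callable[[float], float] = bell_weight) -> float:
    """Sum of neighbor airtimes, each scaled by its proximity weight."""
    score = 0.0
    for start, dur in packets:
        end = start + dur
        if end <= resp_start:        # predecessor: ends before the response starts
            dist = resp_start - end
        elif start >= resp_end:      # successor: starts after the response ends
            dist = start - resp_end
        else:                        # overlaps the response itself; skip it
            continue
        if dist <= window_us:
            score += dur * weight(dist)
    return score


neighbors = [(500.0, 200.0), (1200.0, 100.0), (5000.0, 100.0)]
print(round(accumulation_score(1000.0, 1100.0, neighbors), 1))  # 265.1
print(accumulation_score(1000.0, 1100.0, neighbors, weight=gamma_weight))
```

The bell weight favors packets adjacent to the response, while a gamma weight with `k > 1` peaks at some distance away from it; which shape fits better would depend on how contention actually delays the response.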

The accumulation score characterizes the channel contention surrounding each individual probe-response exchange, rather than relying on coarse, time-averaged CU statistics. At this granularity, it correlates strongly with device latency across probe packet types and payload sizes.

Figure 6: Visualization of distinct accumulation score distributions under diverse CU conditions and different probe/response packet classes.

A critical empirical finding is that device latency increases monotonically with the accumulation score across all device types, which validates the score as a contextual feature that makes device fingerprinting robust to channel dynamics.

Figure 3: Device latency distributions stratified by accumulation score bins, demonstrating monotonicity and enhanced separability as accumulation increases.
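One simple way to check this monotonic trend on one's own measurements is to bin samples by accumulation score and compare a per-bin summary of latency. The sketch below uses the median as that summary and arbitrary bin edges; both are assumptions for illustration, not the paper's analysis procedure.

```python
from bisect import bisect_right
from statistics import median
from typing import Dict, List, Sequence, Tuple


def median_latency_by_bin(samples: Sequence[Tuple[float, float]],
                          edges: Sequence[float]) -> Dict[int, float]:
    """Median latency per accumulation-score bin.

    samples: (accumulation_score, latency_us) pairs.
    edges: sorted bin edges; bin 0 is below edges[0], bin i covers
    [edges[i-1], edges[i]), and the last bin is edges[-1] and above.
    """
    groups: Dict[int, List[float]] = {}
    for score, latency in samples:
        groups.setdefault(bisect_right(edges, score), []).append(latency)
    return {b: median(v) for b, v in sorted(groups.items())}


def is_non_decreasing(values: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


samples = [(1.0, 100.0), (2.0, 120.0), (12.0, 150.0), (15.0, 170.0), (25.0, 300.0)]
medians = median_latency_by_bin(samples, edges=[10.0, 20.0])
print(medians)                                     # {0: 110.0, 1: 160.0, 2: 300.0}
print(is_non_decreasing(list(medians.values())))   # True
```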

Machine Learning for Context-Aware Device Identification

With latency and accumulation score features in place, the authors deploy and evaluate multiple ML algorithms, including Decision Trees, Random Forests, LightGBM, and XGBoost. The pipeline comprises feature extraction, stratified sampling to mitigate class imbalance, and targeted experiments that isolate the effect of training-data diversity across accumulation score (and hence CU) values.
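Balanced data collection across channel conditions can be approximated by capping the number of samples drawn from each (device, accumulation score bin) cell. The helper below is a hypothetical sketch of that idea; the row layout and the `per_cell` cap are assumptions, and the paper's own stratification scheme may differ.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

# Hypothetical row layout: (device_label, score_bin, feature_vector).
Row = Tuple[str, int, List[float]]


def balanced_sample(rows: Sequence[Row], per_cell: int, seed: int = 0) -> List[Row]:
    """Draw at most `per_cell` rows from every (device, score_bin) cell."""
    cells: DefaultDict[Tuple[str, int], List[Row]] = defaultdict(list)
    for row in rows:
        cells[(row[0], row[1])].append(row)
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    out: List[Row] = []
    for key in sorted(cells):
        cell = cells[key]
        # Cells with fewer rows than the cap are kept whole.
        out.extend(rng.sample(cell, min(per_cell, len(cell))))
    return out


rows = ([("plug", 0, [float(i)]) for i in range(5)]
        + [("plug", 1, [9.0])]
        + [("camera", 0, [float(i)]) for i in range(3)])
subset = balanced_sample(rows, per_cell=2)
print(sorted((d, b) for d, b, _ in subset))
```

Capping per cell means a device heavily sampled under light load cannot dominate training, so the model sees every device across the range of channel conditions.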

Notably, models trained on latency/score samples from a single accumulation score (i.e., CU) range generalize poorly: F1 scores drop to as low as 25% on out-of-range test data, regardless of how many in-range samples are provided. This supports the argument that prior approaches, which do not account for wireless context during training, are inherently limited in deployments with variable channel conditions.

Figure 5: F1 device identification accuracy when training and testing on disjoint accumulation score ranges. In-range performance is high, while out-of-range generalization collapses.
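The disjoint-range experiment amounts to splitting samples by accumulation score bin rather than at random. A minimal sketch, assuming each row is a hypothetical (device_label, score_bin, features) tuple; the bin assignments and feature values below are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical row layout: (device_label, score_bin, feature_vector).
Row = Tuple[str, int, List[float]]


def split_by_score_range(rows: Iterable[Row],
                         train_bins: Set[int]) -> Tuple[List[Row], List[Row]]:
    """Train on rows whose score bin is in train_bins; test on all other bins."""
    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        (train if row[1] in train_bins else test).append(row)
    return train, test


rows = [("plug", 0, [210.0]), ("plug", 2, [540.0]),
        ("camera", 0, [95.0]), ("camera", 1, [160.0])]
train_rows, test_rows = split_by_score_range(rows, train_bins={0})
print(len(train_rows), len(test_rows))  # 2 2
```

A random split would mix every score bin into both sets and hide the out-of-range failure; splitting by bin exposes it.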

By contrast, training on diverse samples spanning the full spectrum of accumulation scores (i.e., all expected CU dynamics) enables identification models to achieve consistently high accuracy. Including the accumulation score as an explicit feature further boosts accuracy and provides robustness when some latency features (e.g., those dependent on specific transport protocols) are unavailable.

Figure 7: Device identification F1 scores for multiple feature sets, demonstrating the substantial and consistent benefit of including accumulation score features, especially with limited data.

LightGBM offered the best balance of accuracy, training time, and inference latency, outperforming the Random Forest and Decision Tree baselines. Device identification F1 scores consistently exceed 97% on real-world testbed data when accumulation score-based balancing and feature inclusion are applied, a significant improvement over the 75% F1 score of models trained without regard to channel dynamics.

Figure 11: Comparative F1 scores: Latency-only features versus latency plus accumulation score, with the latter achieving consistently higher performance and stability.


Figure 8: Algorithmic performance comparison, showing accuracy (F1) and training/inference overheads for DT, RF, LGBM, and XGBoost on embedded-class processors.
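The results above are reported as F1 scores. Whether the paper averages per-device F1 as a macro or a weighted mean is not stated here, so the sketch below assumes macro averaging (an unweighted mean over device classes). In ordinary cases this should match scikit-learn's `f1_score(y_true, y_pred, average="macro")`, with a class's F1 set to 0 when its precision and recall are both 0.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # how often each class actually occurred
    f1s: List[float] = []
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


y_true = ["camera", "camera", "plug", "plug"]
y_pred = ["camera", "plug", "plug", "plug"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Macro averaging gives each device equal weight, so a rarely seen device that is often misidentified lowers the score as much as a common one would.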

Practical Implications and Research Outlook

This research establishes that high-precision, privacy-preserving IoT device identification is achievable without DPI or reliance on potentially spoofed address-level information. Incorporating instantaneous channel context, as quantified by the accumulation score, is essential for robust generalization in realistic wireless environments. From a deployment perspective, the methods are practical for in situ execution on commercial access points, using current hardware acceleration and local monitoring hooks (packet sniffers, eBPF).

Implications include:

  • Scalable asset tracking and anomaly detection: Improved identification enables real-time microsegmentation and rapid isolation in response to compromise or misbehavior.
  • Privacy and regulatory compliance: No user-content inspection is performed, reducing privacy exposure and facilitating compliance with evolving legal mandates.
  • Automated management across heterogeneous deployments: Channel context is local and fine-grained, allowing unified models across diverse hardware and traffic conditions.

Future research directions may investigate transfer learning across sites with differing nominal channel characteristics, adversarial robustness against latency/score manipulation, and feature compression/selection for even lower resource utilization.

Conclusion

This paper provides a comprehensive methodology and empirical validation for robust IoT device identification based on device latency and the accumulation score, an instantaneous metric of wireless channel contention. The methodology is orthogonal to application-layer features, respects user privacy, and is empirically shown to deliver an F1 score above 97%, compared with 75% when the impact of channel dynamics is disregarded. The framework supports efficient ML implementation on edge devices with modest computational capability, underlining its practicality for modern, dynamic IoT deployments.
