This blog post features Stevens Institute of Technology PhD candidate Batyr Charyyev’s research on network traffic fingerprinting of IoT devices for device identification, anomaly detection, and user interaction identification. Learn more about Charyyev and his research, including its application to inferring voice commands issued to smart home speakers.
On his current research:
Internet of Things (IoT) devices are widely adopted in fields ranging from healthcare to entertainment. These devices offer numerous benefits, such as automatically controlling heaters, locking doors, turning on lights, and adjusting the temperature. However, IoT devices also raise security and privacy concerns. An attacker can infer user behavior, such as daily routines and activity patterns, from the network traffic generated by the devices, or recruit the devices into botnets to conduct large-scale denial-of-service (DoS) attacks. Network traffic fingerprinting is an important tool for network security and management, as it enables system administrators to identify devices connected to the network, characterize their traffic flows, and detect malicious activities. The main challenge in network traffic fingerprinting is identifying the most representative set of features (e.g., inter-arrival time of packets, packet lengths, protocol headers), which is typically a computationally intensive task requiring expert knowledge. Selecting the most representative set of features increases the accuracy of machine learning models used to solve network security problems. It also prevents models from overfitting or underfitting the training data, reduces the required storage and computation resources, and mitigates the curse of dimensionality.
On approaching the research challenge:
Our research mainly revolves around network traffic fingerprinting of IoT devices with machine learning, focusing on device identification, anomaly detection, and user interaction identification. IoT devices have well-defined traffic patterns because they have a limited set of functionalities. Thus, by creating traffic signatures of the devices, it is possible to identify a device and detect anomalies in its communications. We proposed LSIF (Locality-Sensitive IoT Fingerprinting) to identify IoT devices [1, 2] and LSAD (Locality-Sensitive Anomaly Detection) to detect anomalous network traffic [3], both using locality-sensitive hash functions such as Nilsimsa. Locality-sensitive hashing differs from cryptographic hashing in that it aims to produce similar hash values for similar inputs, whereas a cryptographic hash is designed so that even a small change in the input yields an unrelated digest.
Figure 1: Locality-Sensitive IoT Fingerprinting
As shown in Figure 1, LSIF uses locality-sensitive hashing to generate signatures for each device from its network traffic flows. These hashes are stored in a signature database along with the device information. When a new device joins the network, LSIF generates a hash of the device’s traffic flow and compares it with the signatures stored in the database. The similarity of two signatures is calculated by subtracting 128 from the number of matching bits; since a Nilsimsa digest is 256 bits long, the score ranges from -128 to 128. The device with the highest average similarity score is selected as the predicted identity.
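The comparison step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LSIF implementation: the function names `similarity` and `identify`, the dictionary-based signature database, and the digest values are all hypothetical, and real signatures would come from hashing captured traffic flows with Nilsimsa.

```python
from statistics import mean

DIGEST_BITS = 256  # a Nilsimsa digest is 256 bits (32 bytes)


def similarity(sig_a: bytes, sig_b: bytes) -> int:
    """Number of matching bits minus 128, so scores range from -128 to 128."""
    if len(sig_a) != 32 or len(sig_b) != 32:
        raise ValueError("expected 32-byte digests")
    # XOR leaves a 1 wherever the two digests disagree.
    differing = sum(bin(a ^ b).count("1") for a, b in zip(sig_a, sig_b))
    return (DIGEST_BITS - differing) - 128


def identify(new_sig: bytes, database: dict[str, list[bytes]]) -> str:
    """Return the device whose stored signatures are most similar on average."""
    return max(
        database,
        key=lambda device: mean(similarity(new_sig, s) for s in database[device]),
    )


# Hypothetical digests standing in for hashes of real traffic flows.
camera = bytes([0x0F] * 32)
bulb = bytes([0xF0] * 32)
database = {"camera": [camera], "bulb": [bulb]}

# A new flow whose digest differs from the camera signature in 4 bits.
new_flow = bytes([0x0F] * 31 + [0xFF])
print(similarity(new_flow, camera))  # 124
print(identify(new_flow, database))  # camera
```

Identical digests score 128 and digests that disagree in every bit score -128, which matches the "matching bits minus 128" rule.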
To detect anomalous communications, LSAD uses locality-sensitive hashing to generate signatures of the benign traffic flows of a device, as shown in Figure 2. LSAD computes the average hash similarity of the benign flows and uses it as the threshold value T. When a monitored device generates a new traffic flow, LSAD computes the Nilsimsa hash of the flow, compares it with the benign signatures of that device, and computes the average similarity score. If the average similarity score is below the threshold T, an alarm is raised and the flow is labeled as anomalous. Anomalous traffic flows include port scanning, network scanning, TCP/UDP/ICMP flooding, ARP spoofing, and denial-of-service (DoS) attacks. Detecting such flows enables network administrators to block certain kinds of network traffic, limit the traffic rate, reject flows coming from certain directions (e.g., local or Internet), and isolate the device that generates the anomalous traffic.
Figure 2: Locality-Sensitive Anomaly Detection
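The thresholding logic of LSAD can be sketched as follows. This is a rough illustration, not the published implementation: it assumes that T is the mean similarity over all pairs of benign signatures (one possible reading of "average hash similarity of benign flows"), and the helper names and digest values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def similarity(sig_a: bytes, sig_b: bytes) -> int:
    """Matching bits minus 128 for two 256-bit digests."""
    differing = sum(bin(a ^ b).count("1") for a, b in zip(sig_a, sig_b))
    return (256 - differing) - 128


def benign_threshold(benign: list[bytes]) -> float:
    """T: average similarity over all pairs of benign signatures (assumption)."""
    return mean(similarity(a, b) for a, b in combinations(benign, 2))


def is_anomalous(flow_sig: bytes, benign: list[bytes], threshold: float) -> bool:
    """Flag the flow when its average similarity to benign flows is below T."""
    return mean(similarity(flow_sig, s) for s in benign) < threshold


# Hypothetical benign digests that differ from each other in one or two bits.
benign = [
    bytes([0x0F] * 32),
    bytes([0x0F] * 31 + [0x0E]),
    bytes([0x0F] * 31 + [0x0D]),
]
T = benign_threshold(benign)

# Hypothetical digest of a flow that looks nothing like the benign traffic.
scan_like = bytes([0xF0] * 32)
print(is_anomalous(scan_like, benign, T))  # True
print(is_anomalous(benign[0], benign, T))  # False
```

In practice the benign signatures would be collected per device during normal operation, and the threshold would be computed separately for each monitored device.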
Since both LSIF and LSAD employ locality-sensitive hashing, they do not require feature selection and extraction from the data. Feature selection and extraction require expert knowledge to identify representative features, are computationally costly in terms of CPU power and storage, and are sometimes not possible due to privacy concerns. LSIF achieves equal or better accuracy in identifying IoT devices compared to machine learning models that select a set of traffic features. We further investigate the use of network traffic fingerprinting to identify user actions with IoT devices [4] as well as to infer voice commands to smart home speakers purely from network traffic characteristics [5].
User Interaction Identification
We also investigated how an attacker passively sniffing network traffic can infer IoT device activities such as turning a smart light bulb on or off or dimming it, locking or unlocking a smart lock, or watching a smart security camera. Our evaluation on a set of 39 different IoT devices showed that an attacker can achieve an average accuracy of 90% on half of the devices and 83% on three-quarters of the devices simply by applying machine learning algorithms (Random Forest, k-NN, etc.) to the network traffic generated by the devices [4]. We also explored the identification of voice commands on smart home speakers (Amazon Echo, Google Home Mini, etc.) [5]. Identifying voice commands such as “What is the weather?” or “What is in the news?” can compromise the privacy of device owners. Our evaluation on an Amazon Echo (2nd generation) smart speaker with 100 different voice commands showed that attackers can infer user commands with an accuracy of 42% using network traffic fingerprinting. These results raise privacy concerns about the adoption of IoT in our daily lives.
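To give a feel for how little machinery such an attack needs, here is a minimal k-NN sketch in pure Python. It is not the evaluation setup from the paper: the two features (mean packet length and mean inter-arrival time), the activity labels, and all numeric values are synthetic assumptions chosen only to make the example run.

```python
from collections import Counter
from math import dist


def knn_predict(sample, training, k=3):
    """Label a sample by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic (mean packet length in bytes, mean inter-arrival time in ms) pairs.
# These values are made up for illustration, not measured from real devices.
training = [
    ((120.0, 40.0), "bulb_on"),
    ((125.0, 42.0), "bulb_on"),
    ((118.0, 39.0), "bulb_on"),
    ((480.0, 12.0), "camera_stream"),
    ((500.0, 10.0), "camera_stream"),
    ((495.0, 11.0), "camera_stream"),
]

print(knn_predict((122.0, 41.0), training))  # bulb_on
print(knn_predict((490.0, 11.5), training))  # camera_stream
```

A real attacker would extract such features from sniffed flows and would typically scale them before computing distances, since raw features with larger ranges otherwise dominate the vote.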
On testbed needs:
Chameleon provides a cloud testbed of bare-metal machines with high connectivity between nodes. Unlike other public HPC platforms, Chameleon provides root access, allowing us to install software that can perform system measurements such as disk I/O. Another important aspect of Chameleon is its support community, which is very quick and eager to help with any challenges that we face while using the testbed.
I am Batyr Charyyev, a Ph.D. candidate in Systems Engineering at Stevens Institute of Technology in Hoboken, New Jersey. I received my MS degree in Computer Science and Engineering from the University of Nevada, Reno and my BS in Computer Engineering from Middle East Technical University in Ankara, Turkey. My research interests include networking, Internet of Things, traffic fingerprinting, cybersecurity, network science, and edge computing. In my spare time, I enjoy outdoor activities such as playing soccer, hiking, and camping, and indoor activities such as watching movies, playing video games, and cooking.
On staying motivated through a long research project:
I try to set small objectives that I can achieve in a short period of time. This enables me to see that there is progress and I am moving forward.
On researchers he admires:
I admire and closely follow the research conducted by Dr. Vijay Sivaraman, professor at the University of New South Wales in Sydney, Australia, and Dr. David Choffnes, associate professor at Northeastern University in Boston, USA. I find their studies interesting and really enjoy reading them.
On his most powerful piece of advice:
My advice to students beginning research or looking for a new research project is:
Invest time in literature reviews.
Create a routine and try to follow it.
Do not take rejection too personally; learn to deal with it.
Interested readers can explore the following papers:
[1] B. Charyyev and M. Gunes. 2020. IoT Traffic Flow Identification using Locality Sensitive Hashes. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC). 1–6.
[2] B. Charyyev and M. H. Gunes. 2021. Locality-Sensitive IoT Network Traffic Fingerprinting for Device Identification. IEEE Internet of Things Journal 8, 3 (2021), 1272–1281.
[3] B. Charyyev and M. H. Gunes. 2020. Detecting Anomalous IoT Traffic Flow with Locality Sensitive Hashes. In GLOBECOM 2020 - 2020 IEEE Global Communications Conference. 1–6.
[4] B. Charyyev and M. H. Gunes. 2020. IoT Event Classification Based on Network Traffic. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops. 854–859.
[5] B. Charyyev and M. H. Gunes. 2020. Voice Command Fingerprinting with Locality Sensitive Hashes. In Proceedings of the 2020 Joint Workshop on CPS&IoT Security and Privacy (CPSIOTSEC'20). Association for Computing Machinery, New York, NY, USA, 87–92.