FM Broadcast Monitoring Using Artificial Intelligence

In recent decades, artificial intelligence has demonstrated successful applications in computer vision and natural language processing thanks to its breakthrough performance. In this paper, we propose a deep convolutional neural network (DCNN)-based method, built on acoustic features, for FM broadcast monitoring. The results show that the proposed DCNN model, trained with FBank features extracted from FM audio files, achieves nearly 100% accuracy in FM station identification. Illegal broadcasting signals are then detected through comparison with a radio frequency station database. Integrated with the trained DCNN model, an online FM broadcast monitoring system has been implemented and deployed, helping to reduce human labor and automate the task of FM broadcast monitoring.


Introduction
Soon after the first radio communication system was invented in the late 19th century, the problem of radio interference among users of the same frequencies arose. As early as the 1920s, the United States drew up a radio frequency allocation chart for 0-60 MHz and began constructing a spectrum monitoring system. Subsequently, countries around the world began establishing monitoring systems for radio spectrum management. China has a national radio-monitoring system that covers all frequencies between 20 and 3,000 MHz. However, the system is not automatic and is incapable of real-time spectrum analysis (Lu et al., 2017). Owing to ever-increasing wireless applications, bandwidth demand is rapidly overwhelming the finite spectrum. To better manage the radio spectrum, many novel schemes have been proposed for the design of monitoring systems. Examples of state-of-the-art schemes include a permanent cloud-based system of systems for spectrum monitoring (Cooklev, 2015), an intelligent radio regulatory architecture based on edge computing (Huang et al., 2018), and "Electrosense" (Rajendran et al., 2018), which adopts a crowdsourcing paradigm to collect and analyze spectrum data using low-cost sensors. In addition, artificial intelligence (AI) has developed rapidly in the last decade, with performance gains from deep models fostering progress. Beyond the intensively studied fields of language, voice, and vision (Fogel & Kvedar, 2018; LeCun et al., 2015), deep learning has also shown its efficacy in spectrum signal analysis, such as identifying signals and modulation types at low signal-to-noise ratios (Rajendran et al., 2018; Zhang et al., 2018). AI in radio monitoring is thus promising for bringing automation and intelligence to spectrum management.
In China, illegal broadcasting occurs in almost every city. In 2018, national radio regulatory agencies investigated and dealt with 2,251 illegal broadcasting cases, making it the foremost concern among all types of radio interference. Because the FM broadcasting band (87-108 MHz) is adjacent to the civil aircraft communications band (118-137 MHz), illegal broadcasting is a potential threat that can interfere with civil aviation navigation signals. However, the current approach to identifying illegal broadcasting relies on manual monitoring, which is not only time consuming but also barely exploits the capabilities of radio-monitoring systems. In this paper, we use a DCNN model to classify the acoustic features of speech and noise files collected from FM radio to identify FM broadcasting signals and then detect illegal signals by verification against a database of legally registered radio stations.

Method and Process Description
©2020. American Geophysical Union. All Rights Reserved. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

A block diagram of the illegal broadcast identification process is shown in Figure 1. First, we scan and demodulate each 0.1-MHz-wide channel, one by one, over the frequency range of 87-108 MHz to obtain the corresponding audio files in the Waveform Audio File (WAV) format. These files are stored and tagged with their channel center frequency. The acoustic features extracted from the audio files, together with the corresponding labels, form a training dataset, which is used to train a DCNN model with four convolutional layers (details are given in the following sections). The trained model is then evaluated on a test dataset to recognize all FM broadcasting signals and record the corresponding frequencies.
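The channel-scanning step described above can be sketched in a few lines of Python. The receiver hardware calls are outside the scope of this sketch; the function and file names below are illustrative, not from the paper.

```python
# Sketch of the channel-scan step: enumerate the 0.1-MHz channels across
# the FM band (87-108 MHz) and derive a WAV filename tagged with each
# channel's center frequency, as the paper describes.

def fm_channels(start_mhz=87.0, stop_mhz=108.0, step_mhz=0.1):
    """Yield the center frequency (MHz) of each channel in the scan range."""
    n_steps = round((stop_mhz - start_mhz) / step_mhz)
    for i in range(n_steps + 1):
        yield round(start_mhz + i * step_mhz, 1)

def wav_name(center_mhz):
    """Audio files are stored and marked with the channel center frequency."""
    return f"fm_{center_mhz:.1f}MHz.wav"

channels = list(fm_channels())   # 211 channels, 87.0 ... 108.0 MHz
```

Each demodulated channel would then be written to `wav_name(f)` before feature extraction.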
The illegal signals are identified by comparing all identified FM broadcasting signals with a radio frequency station database that contains a list of all authorized broadcast stations.
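The verification step just described amounts to a set difference between detected frequencies and the authorized-station database. A minimal sketch, with invented example frequencies and station names:

```python
# Frequencies classified as FM broadcasts by the DCNN are checked against
# the database of authorized stations; any detected frequency not registered
# there is flagged as a suspected illegal broadcast.

authorized_stations = {98.1: "City Radio", 101.7: "Traffic Radio", 105.8: "Music FM"}

def find_illegal(detected_mhz, station_db):
    """Return detected broadcast frequencies absent from the licensed-station DB."""
    return sorted(f for f in detected_mhz if f not in station_db)

detected = [98.1, 99.3, 101.7, 106.6]
suspects = find_illegal(detected, authorized_stations)
# suspects -> [99.3, 106.6]
```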

Feature Extraction
The training audio files consist of four categories: voice, noise, mute, and strong noise with weak speech.
There are a total of 20,896 audio files in the training dataset, of which 10,445 are tagged as "voice" and the rest fall into the other three categories. The entire dataset is split into 80% for training and 20% for testing.

10.1029/2019RS006885
Radio Science

To extract the acoustic features from an audio file, the audio data are first converted into a binary spectrum matrix. As shown in Figure 2, the left column illustrates the speech waveforms, and the right column shows the corresponding spectral maps. From top to bottom, the panels correspond to the four kinds of audio samples. Because directly training the DCNN model on the original spectral matrix is time consuming, we first extract acoustic features from the audio data to facilitate DCNN training and testing. Four feature extraction methods are used: mel-frequency cepstral coefficients (MFCCs) (Zheng et al., 2001), mel-scale filter bank (FBank), shifted delta cepstral (SDC) coefficients (Torres-Carrasquillo et al., 2002), and perceptual linear prediction (PLP) (Hermansky, 1990). Block diagrams for obtaining PLP and MFCC features are shown in Figure 3. SDC features are derived from MFCCs; their benefit is greater robustness under noisy conditions. The FBank feature is obtained by omitting the discrete cosine transform in the last step of the MFCC pipeline and therefore retains more of the original voice information than the MFCC features.
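The relation between FBank and MFCC features described above can be sketched in Python. This is a simplified illustration; the frame size, filter count, and sample rate are assumed values, not parameters from the paper.

```python
import numpy as np
from scipy.fftpack import dct

# FBank features are log mel-filterbank energies computed from the power
# spectrum of each frame; MFCCs are obtained by applying a discrete cosine
# transform (DCT) to the FBank features. Skipping that final DCT is what
# lets FBank retain more of the original spectral detail.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fbank

def fbank_features(frames, n_filters=26, n_fft=512, sr=16000):
    """Log mel-filterbank energies of windowed frames (the FBank feature)."""
    power = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), n_fft)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(energies + 1e-10)

def mfcc_features(frames, n_ceps=13, **kw):
    """MFCCs: the DCT of the FBank features, keeping the first n_ceps."""
    return dct(fbank_features(frames, **kw), type=2, axis=1, norm="ortho")[:, :n_ceps]

frames = np.random.randn(100, 400)   # 100 frames of 25 ms audio at 16 kHz
fb = fbank_features(frames)          # shape (100, 26)
mf = mfcc_features(frames)           # shape (100, 13)
```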

Classification Identification
The DCNN structure is illustrated in Figure 4. The design of the convolutional feature extractor is based on the well-known LeNet architecture (LeCun et al., 1998). Our network uses four convolutional layers, each followed by batch normalization (Ioffe & Szegedy, 2015) and a ReLU activation function (Nair & Hinton, 2010). The kernel sizes and numbers of filters for the convolutional layers are (3 × 5, 32), (3 × 5, 64), (3 × 3, 128), and (3 × 3, 128), respectively. The last convolutional layer is followed by three fully connected layers, and the final layer has two output units that classify the input as signal or noise. To handle variable-length audio data, we use global average pooling between the convolutional layers and the fully connected layers.
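The architecture above can be sketched in PyTorch as follows. The conv kernel sizes, filter counts, and global average pooling follow the description in the text; the hidden fully connected layer sizes and the use of "same" padding are our assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

# Sketch of the four-layer DCNN: each conv layer is followed by batch
# normalization and ReLU; global average pooling absorbs variable-length
# input; three fully connected layers end in two outputs (signal vs. noise).

class FMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        specs = [(1, 32, (3, 5)), (32, 64, (3, 5)),
                 (64, 128, (3, 3)), (128, 128, (3, 3))]
        layers = []
        for c_in, c_out, k in specs:
            layers += [nn.Conv2d(c_in, c_out, k, padding=(k[0] // 2, k[1] // 2)),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.classifier = nn.Sequential(     # hidden sizes are assumed
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 32), nn.ReLU(inplace=True),
            nn.Linear(32, 2))                # signal / noise

    def forward(self, x):                    # x: (batch, 1, n_mel_filters, n_frames)
        x = self.gap(self.features(x)).flatten(1)
        return self.classifier(x)

model = FMClassifier()
out = model(torch.randn(4, 1, 26, 300))
```

Because the pooling collapses the time axis, the same model accepts inputs with any number of frames, which is how variable-length audio is handled.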
To explore the performance of the network, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize the feature output from the last convolutional layer, as shown in Figure 5. t-SNE is an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots (van der Maaten & Hinton, 2008). The results show much better class separability for the FBank feature than for the other three acoustic features.
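A visualization of this kind can be reproduced with scikit-learn's t-SNE implementation. The sketch below uses random stand-ins for the last-conv-layer feature vectors, and the perplexity value is an assumed default, not a setting from the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

# Embed high-dimensional feature vectors (stand-ins for the DCNN's
# last-conv-layer output) into 2-D points that can be scatter-plotted
# per class to inspect separability, as in Figure 5.

rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (50, 128)),   # "noise" class
                      rng.normal(4.0, 1.0, (50, 128))])  # "signal" class
labels = np.array([0] * 50 + [1] * 50)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
# emb has shape (100, 2); plotting emb[labels == c] for each class c
# gives the per-class scatter plot used to judge separability
```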
In this work, the DCNN takes FBank features extracted from the audio files as input. Table 1 compares the recognition error rates of the DCNN trained with different amounts of data, evaluated on 4,179 test samples. When the training data are insufficient, the error rate is noticeably higher; with only half an hour of training data, 52 samples are misclassified. With more training data, the error rate drops accordingly. Once the training data reach 5 h, the error rate converges with little further change, and the model becomes stable.

Figure 6. System architecture.
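As a quick sanity check on these figures (assuming the 4,179-sample test set), the error rates can be recomputed directly; the inference that the 0.0479% rate reported in the conclusions corresponds to about two misclassified samples is ours, not stated in the paper.

```python
# Error rate = misclassified / total test samples, expressed as a percentage.
n_test = 4179
half_hour_error = 52 / n_test * 100   # ~1.24% for the half-hour model
converged_error = 2 / n_test * 100    # ~0.0479%, i.e. roughly 2 errors
```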

System Design and Implementation
Leveraging the trained DCNN, we implemented an FM radio-monitoring system consisting of 14 nodes: one deployed in Kunming and the other 13 in the counties of Honghe Prefecture, Yunnan Province. The system architecture is shown in Figure 6. Each spectrum-sensing node consists of an RF receiver, a GPS receiver, a data processing and analysis module with edge computing capability, and a wired/4G router used for communication between the sensing node and the backend cloud. A hardware block diagram of the sensing node is shown in Figure 7. The specifications of the RF receiver are as follows: frequency range, 1 kHz to 2 GHz; bandwidth, 10 MHz; analog-to-digital conversion, 14 bit; typical noise figure, 3.3 dB at 100 MHz; frequency tolerance, 0.5 ppm. The trained DCNN model is integrated into each spectrum-sensing node. In the working mode, the sensing node continuously scans and demodulates each channel in the 87-108 MHz frequency range and stores each baseband signal as a 5-min audio file, which is then passed to the DCNN model for FM broadcast identification. In addition, each sensing node provides application programming interfaces (APIs) through which the cloud can access data at the node site, including the list of detected FM broadcasts and the corresponding audio files. The cloud pulls the FM broadcast list through the APIs for illegal broadcast identification. The spectrum-sensing node also provides web interfaces that allow users to access the results through a browser.
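The node-to-cloud exchange described above can be illustrated with a hypothetical payload. The field names, node identifier, and JSON layout below are invented for illustration; the paper does not specify the API format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical sketch of the data a sensing node could expose through its
# API for the cloud to pull: the list of detected FM broadcasts with their
# center frequencies and references to the stored audio files.

@dataclass
class DetectedBroadcast:
    center_mhz: float
    audio_file: str

def broadcast_list_payload(node_id, detections):
    """Serialize one node's detections as the JSON the cloud pulls via the API."""
    return json.dumps({"node": node_id,
                       "broadcasts": [asdict(d) for d in detections]})

payload = broadcast_list_payload(
    "honghe-01",
    [DetectedBroadcast(98.1, "fm_98.1MHz.wav")])
```

On the cloud side, the pulled list would be compared against the authorized-station database to flag unlicensed frequencies.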
The system is designed using a browser-server (BS) architecture and has been running stably for half a year. Figure 8 shows the interface of the system, which is accessed from the radio private network through a browser. Here, Region is an online map on which each deployed sensor node can be seen. By clicking on a node icon, the broadcast stations monitored by the spectrum-sensing node at the corresponding location are displayed in Region, and the numbers of authorized and unlicensed broadcasts in each measurement are counted and shown above the list. Region shows the statistics of the broadcast duration of each FM radio station monitored by the corresponding node over 1 day. By clicking the button in the upper-right corner, we can obtain a detailed statistical report of the authorized and unlicensed broadcast quantities and durations monitored by each sensing node. Region is the entrance to the station database, where, after login authentication, we can add, delete, and modify the authorized broadcast stations recorded in the database.

Conclusions
In this paper, a method of identifying FM broadcasts based on speech acoustic features and a DCNN has been proposed. For comparison, four acoustic features, namely PLP, MFCCs, FBank, and SDC, are extracted from audio files to train the DCNN model. The results show that for the FBank feature, with a 12-h training dataset, the error rate reaches 0.0479%, that is, nearly 100% identification accuracy. Illegal broadcasting signals are then identified by comparing the results with a database of authorized radio stations. An illegal broadcast monitoring system based on 14 spectrum-sensing nodes and a BS architecture has been deployed in Kunming and Honghe Prefecture of Yunnan Province. The system relies on spectrum-sensing nodes integrated with the trained DCNN model to imitate the process in which radio-monitoring staff listen to FM broadcasts and record the frequencies of illegal broadcasts. This use case demonstrates the potential of integrating Internet of Things (IoT) and AI technologies for intelligent and automatic radio monitoring.