The project is motivated by the need to develop advanced network monitoring tools coupled with automated statistical methods for the quick detection of Internet traffic anomalies due to ongoing attacks or impending cybersecurity threats. Emphasis is placed on detecting cybersecurity threats such as highly distributed malware infections, which can launch coordinated and crippling distributed denial of service attacks on the nation’s Internet infrastructure. This will be achieved through a study of so-called darknet traffic data. Malicious actors in the network systematically probe the Internet address space for vulnerable or misconfigured devices. In doing so, they automatically send data to the entire Internet address space, which includes the space of unused Internet addresses. This destined-to-nowhere traffic is indicative of malware infection attempts or stealthy vulnerability scanning. The investigators aim to develop and deploy specialized tools that allow cybersecurity analysts to efficiently analyze darknet traffic data. The research involves a team of computer engineers and statisticians, who will work closely together to implement a prototype system for detecting, mapping, and identifying worldwide malicious activity on the Internet. The project will create and communicate to the public a set of simple-to-interpret risk indices that summarize the current darknet threat activity. This effort will potentially enable the prevention and mitigation of cybersecurity network traffic threats.
Understanding Internet threats, which continue to evolve due to the dynamic nature of Internet actors and the rapid expansion of the Internet of Things ecosystem, requires adequate data at fine-grained spatial and temporal scales. The project team has access to unique cybersecurity data collected at Merit Network, Inc. that capture Internet-wide activity including network scanning, malware propagation, denial of service attacks, and network outages. These data consist of unsolicited Internet traffic destined to a routed but unused Internet address space, referred to as a darknet. This project will develop algorithmic and software infrastructure to collect and organize darknet data into high-dimensional, multivariate data streams, and will study statistical methods based on (i) extremal dependence, (ii) change-point detection, and/or (iii) high-dimensional sparse signal detection and recovery to inform the construction of Internet threat indices that quantify the risk of malicious scanning, the degree of network vulnerability, the risk of denial of service attacks, etc. The statistics of extremes in high-dimensional settings is challenging because it requires modeling and estimating an infinite-dimensional parameter, the spectral measure. Using multivariate regular variation, this project will study novel hyper-graphical models that quantify and provide interpretable abstractions for the simultaneous occurrence of extremes in high dimensions. Using limit theory for maxima of dependent variables, the project team will address open theoretical problems on the characterization of extremal dependence hyper-graphs and sparse signal detection in high dimensions. This analysis will lead to the development of novel, spatially dependent threat indices, which will be analyzed with fast, scalable change-point detection algorithms.
The new change-point methodology is designed to achieve large computational gains vis-a-vis standard approaches without compromising statistical accuracy and would be a significant contribution to the analysis of large data streams.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
This project is funded by NSF’s Algorithms for Threat Detection (ATD) program in the Division of Mathematical Sciences.
Project Partners: University of Michigan, University of Florida