A Data Repository for Cybersecurity Research and Education

Finding adequate data for network and cybersecurity research, student education, and staff training is a challenging task. Researchers and other data seekers usually either have no access to the data source, and so cannot collect the data themselves, or lack the expertise or equipment needed to perform the measurements. Obtaining relevant data therefore becomes a tedious exercise in identifying data providers that are willing to share their data. Even then, in most cases such data are exchanged between organizations only after bilateral legal agreements (e.g., non-disclosure agreements) have been executed. This blog entry introduces a data repository hosted by Merit Network that aims to bridge the gap between data requestors and providers.

Merit participates in a unique project that seeks to organize, structure, and combine the efforts of the network security research community with those of the Internet data measurement and collection community in order to grant researchers access to valuable data. Under the umbrella of the Information Marketplace for Policy and Analysis of Cyber-risk and Trust (IMPACT)1 initiative of the Department of Homeland Security Science and Technology Directorate, the project provides a common framework for managing datasets collected from various Internet data providers. It also formalizes a process by which a) qualified researchers can gain access to these datasets in order to prototype, test, and improve their Internet threat mitigation techniques, and b) educators can leverage the datasets for curriculum development and training, all while ensuring that the privacy and confidentiality of Internet users are not compromised. This common framework benefits both sides: data providers no longer have to review, approve, and monitor individual researchers who approach them for access to various datasets, and data seekers no longer need to rely on the ad hoc and often arbitrary policies of each data provider.

Merit’s data repository collects both network management and security data in the form of network packets and application logs. These data, however, contain no live payload data or other personally sensitive information. In addition, any information that could link the data to individuals is removed or anonymized to mitigate potential risks. As a result, the datasets collected as part of this repository pose minimal risk to the privacy of individuals.

Interested researchers may visit the project portal2, request an account by filling out the online form, and then search through the data catalogue and request data access. Merit’s data portfolio includes real-world Internet traffic data collected longitudinally at Merit’s border routers (e.g., search for Netflow data), datasets that capture Distributed Denial of Service (DDoS) attacks (e.g., look for the 2014 Network Time Protocol attacks, NTP DDoS 2014), packet data that capture traffic from malicious scanning or other nefarious Internet-wide activity (i.e., the so-called Darknet data), and many others! One can also check out data from other IMPACT participants, including CAIDA, University of Southern California, Colorado State University, University of Wisconsin, Georgia Institute of Technology, and the Packet Clearing House.

For more information, please contact [email protected].