4 - Automatic Mining of Cyber Intelligence from the Darkweb

Published online by Cambridge University Press:  06 April 2017

John Robertson (Arizona State University)
Ahmad Diab (Arizona State University)
Ericsson Marin (Arizona State University)
Eric Nunes (Arizona State University)
Vivin Paliath (Arizona State University)
Jana Shakarian (Arizona State University)
Paulo Shakarian (Arizona State University)
Summary

Introduction

The previous chapter described the hacker communities present on both the darknet and the clearnet. With that understanding in place, we can begin to use data-mining and machine-learning techniques to aggregate and analyze the data from these communities, with the goal of providing valuable cyber threat intelligence. This chapter is an extension of the work in [80]. We present a system for cyber threat intelligence gathering, built on top of data from communities similar to those presented in Chapter 3. At the time of writing, this system collects, on average, 305 high-quality cyber threat warnings each week. These threat warnings contain information regarding malware and exploits, many of which are newly developed and have not yet been deployed in a cyber-attack. This information can be particularly useful for cyberdefenders. Significantly augmented through the use of various data-mining and machine-learning techniques, the system is able to recall 92% of products in marketplaces and 80% of discussions on forums relating to malicious hacking, as labeled by a security analyst, with high precision. Additionally, we present a model based on topic modeling that automatically identifies new hacker forums and exploit marketplaces for data collection.
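The recall and precision figures quoted above are measured against a security analyst's labels. As a minimal sketch of how such metrics are computed (the item identifiers and the function name `precision_recall` are hypothetical and are not taken from the chapter):

```python
def precision_recall(flagged, labeled):
    """Return (precision, recall) for two collections of item identifiers.

    flagged: items the system marked as hacking related.
    labeled: items a security analyst marked as hacking related.
    """
    flagged, labeled = set(flagged), set(labeled)
    true_positives = len(flagged & labeled)
    # Precision: share of flagged items that the analyst also labeled relevant.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of analyst-labeled items that the system managed to flag.
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Hypothetical toy data: identifiers of marketplace products.
analyst_labeled = {"p1", "p2", "p3", "p4", "p5"}
system_flagged = {"p1", "p2", "p3", "p4", "p9"}

precision, recall = precision_recall(system_flagged, analyst_labeled)
```

In this toy case four of the five flagged items are correct and four of the five analyst-labeled items are found, so both metrics are 0.8.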

In the sections that follow, we introduce a machine-learning-based scraping infrastructure to gather such intelligence from these online communities. We also discuss the challenges associated with constructing such a system and how we addressed them. Figure 4.1 shows the number of detected threats over five weeks, and Table 4.1 shows the database statistics at the time of writing, which indicate that only a small fraction of the data collected is hacking related. The vendor and user statistics cited consider only those individuals involved in the discussion or sale of malicious hacking-related material, as identified by the system.

Specific contributions of this chapter include:

  • A description of a system for cyber threat intelligence gathering from various social platforms on the Internet, such as deepnet and darknet websites.

  • The implementation and evaluation of learning models to separate relevant information from noise in the data collected from these online platforms.

  • A machine-learning approach, based on topic modeling, to aid security experts in the discovery of new relevant deepnet and darknet websites of interest. This reduces the time and cost associated with identifying new deepnet and darknet sites.
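To make the second contribution concrete, the following is a minimal sketch of one way to separate hacking-related text from noise: a multinomial Naive Bayes classifier with add-one smoothing. This summary does not say which learning models the authors used, so the choice of model, the class name `NaiveBayes`, and the training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known word.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled listings; a real system would need far more data.
train_texts = [
    "selling zero day exploit for windows",
    "new ransomware builder with keylogger",
    "remote access trojan source code",
    "cheap gift cards and vouchers",
    "fake passports shipped worldwide",
    "premium streaming accounts for sale",
]
train_labels = ["relevant"] * 3 + ["noise"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
label_a = model.predict("android exploit kit with keylogger")
label_b = model.predict("streaming accounts and gift cards")
```

With this toy training set, the first query shares words only with the hacking-related examples and the second only with the noise examples, so they are assigned to "relevant" and "noise" respectively.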

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2017