Hunting Onions – A Framework for Simple Darknet Analysis

One of the things I spend a lot of time doing is researching the current threat landscape. I dedicate much of that time to pulling samples from VirusTotal, reversing and analyzing them, and searching for known and unknown IOCs everywhere I can in order to properly mimic a threat actor. One of the places I search is Tor (AKA the darknet). How I connect to Tor varies, and this post is not about how to stay safe and avoid data leakage. Rather, it demonstrates a novel approach to obtaining .onion addresses and analyzing them in a safe and efficient manner.

TL;DR: A full listing of classes and functions is available in the GitHub wiki.


How Onion Hunter Works

NOTE: Onion Hunter does not download images, malware, full site source code, etc. This is to protect the user and the system that is running the analysis. There are a ton of bad things that can easily get you into trouble, and for that reason alone only the HTML of the page in question is analyzed and nothing more.


Onion-Hunter is a Python 3-based framework that analyzes Onion site source code for user-defined keywords and stores relevant data in a SQLite3 backend. Currently, the framework uses several sources to aggregate Onion addresses:

  • Reddit subreddits: A predefined set of subreddits, listed in config.py, is scraped for Onion addresses.
  • Tor Deep Paste: This has turned out to be a very good source.
  • Fresh Onions Sources: Each onion address is analyzed to determine if it’s a fresh onion source (i.e., contains new/mapped onions) and if so, is saved to the FRESH_ONION_SOURCES table.
  • Additional_onions.txt: Any tertiary onion address (that is, any address that is found but not immediately analyzed) is saved to docs/additional_onions.txt for later analysis.
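
Whatever the source, the common first step is pulling onion addresses out of scraped text. The following is a minimal sketch of that step, not Onion Hunter's actual implementation: the `ONION_RE` pattern and the `extract_onions` helper are hypothetical names, and the pattern assumes 16-character (v2) and 56-character (v3) base32 hostnames.

```python
import re

# Assumption: v2 hostnames are 16 base32 characters and v3 hostnames are 56.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")


def extract_onions(text):
    """Return the unique .onion hostnames found in text, in first-seen order."""
    seen = {}
    for match in ONION_RE.finditer(text.lower()):
        # A dict preserves insertion order, so duplicates are dropped
        # without reordering the results.
        seen.setdefault(match.group(0), None)
    return list(seen)


# Synthetic placeholder addresses, not real services.
sample = (
    "Mirror at http://" + "a" * 56 + ".onion/index.html and again at "
    + "A" * 56 + ".onion, plus an old one: " + "b2" * 8 + ".onion"
)
found = extract_onions(sample)
```

Here `found` holds two hostnames: the mixed-case duplicate collapses into a single entry because the text is lowercased before matching.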

A researcher can also analyze an individual Onion address, or a text file filled with Onion addresses, if they come upon something interesting that warrants analysis and categorization.

Once valid .onion addresses have been found, each one is analyzed by issuing a GET request to the address and searching the index page source (HTML) for keywords predefined by the researcher.
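
The request-and-search step might look roughly like the sketch below. It is an illustration under stated assumptions rather than the framework's own code: `fetch_index` and `find_keywords` are hypothetical helpers, the proxy address assumes a local Tor client on its default SOCKS port (9050), and the fetch relies on the third-party requests library with SOCKS support installed.

```python
import re

# Assumption: a local Tor client exposes its SOCKS proxy on the default port.
# The socks5h scheme makes the proxy resolve the hostname, which .onion needs.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def fetch_index(onion_address, timeout=30):
    """GET the index page over Tor and return only its HTML text."""
    # Imported lazily so the keyword logic below runs without requests.
    import requests  # needs SOCKS support: pip install "requests[socks]"

    response = requests.get(
        f"http://{onion_address}/", proxies=TOR_PROXIES, timeout=timeout
    )
    response.raise_for_status()
    return response.text


def find_keywords(html, keywords):
    """Return the keywords present in html as whole words, ignoring case."""
    hits = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, html, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


# Offline demonstration on a canned page; no network access is needed.
page = "<html><body><h1>Fresh Ransomware builds</h1><p>contact admin</p></body></html>"
hits = find_keywords(page, ["ransomware", "carding", "Admin"])
```

Matching on whole words is a design choice in this sketch: it avoids flagging a page for "ransom" when only "ransomware" appears, at the cost of missing plural or inflected forms.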

Onion Hunting Process

The Database

A SQLite3 backend is used to maintain records of all analyzed Onion addresses as well as a record of all Onions that have been categorized as probable Fresh Onion Domains. There are a total of three (3) tables:

  1. ONIONS – Contains all Onion addresses observed and is, by far, the most interesting table for analysis.
  2. FRESH_ONION_SOURCES – Any onion address that lists 50+ unique addresses on its front page and also includes fresh onion keywords is categorized as a probable Fresh Onion Domain and saved to this table.
  3. KNOWN_ONIONS – This table is currently unused. It was designed for reporting purposes. That is, if you want to conduct analysis on weekly/monthly trends (for example), attributes of previously analyzed Onions can be added to this table to avoid duplication in reporting.
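
The FRESH_ONION_SOURCES rule (50+ unique addresses plus fresh onion keywords) can be sketched as a single predicate. This is a hedged illustration only: `is_fresh_onion_source`, the `FRESH_KEYWORDS` values, and the address pattern are assumptions made for the example, not the project's real keyword list or code.

```python
import re

# Assumption: v2 hostnames are 16 base32 characters and v3 hostnames are 56.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

# Hypothetical values; the real keyword list is defined by the project.
FRESH_KEYWORDS = ("fresh onions", "new onions", "onion list")
MIN_UNIQUE_ONIONS = 50  # the "50+" threshold described above


def is_fresh_onion_source(html):
    """Return True when a front page looks like a probable fresh onion source."""
    lowered = html.lower()
    unique_addresses = set(ONION_RE.findall(lowered))
    has_keyword = any(keyword in lowered for keyword in FRESH_KEYWORDS)
    # Both conditions must hold: enough unique addresses and a keyword hit.
    return len(unique_addresses) >= MIN_UNIQUE_ONIONS and has_keyword
```

Requiring both conditions keeps ordinary link-heavy pages, and pages that merely mention fresh onions, out of the table.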

An example of what the data looks like within the ONIONS table can be seen in the image below. For simply viewing the data, I like to use DB Browser for SQLite. However, all heavy operations should be done in code, as this application can be very clunky.

Viewing the ONIONS table within DB Browser for SQLite
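
As an example of doing the heavier work in code, the sketch below queries an ONIONS table with Python's built-in sqlite3 module. The schema here is a simplified assumption: the column names `domain`, `title`, and `keyword_matches` are hypothetical, the rows are synthetic, and an in-memory database stands in for the real database file.

```python
import sqlite3

# Hypothetical, simplified schema; the real ONIONS table may use other columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ONIONS (domain TEXT PRIMARY KEY, title TEXT, keyword_matches TEXT)"
)
conn.executemany(
    "INSERT INTO ONIONS VALUES (?, ?, ?)",
    [
        ("a" * 56 + ".onion", "Example market", "carding"),
        ("b" * 56 + ".onion", "Example forum", "ransomware"),
        ("c" * 56 + ".onion", "Example blog", "ransomware"),
    ],
)


def domains_matching(connection, keyword):
    """Return the domains whose recorded keyword matches contain keyword."""
    rows = connection.execute(
        "SELECT domain FROM ONIONS WHERE keyword_matches LIKE ? ORDER BY domain",
        (f"%{keyword}%",),  # parameterized to avoid SQL injection
    )
    return [row[0] for row in rows]


ransomware_domains = domains_matching(conn, "ransomware")
```

Pointing `sqlite3.connect` at the framework's database file instead of `":memory:"` would run the same kind of query against real data, once the column names are adjusted to the actual schema.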

User Configuration

As stated above, the only method of domain/site analysis is a simple keyword search. A user must supply the following within src/config.py:


Improvements

There are a lot of improvements that can be made to the current version, and I intend to make some of these changes in 2020. For example, I would like to use the DuckDuckGo API as well as Reddit to start the initial searching. There may also be sources that keep tabs on Fresh Onion databases that are active on Tor; these would be more reliable going forward.

Other improvements, such as database optimization, site categorization, and user-defined analysis techniques, are slated as well.

