Indexing service

Analysis results produced by the system are continuously indexed in the Knowledge Base. The process indexes these results by significant features, so you can query for analysis reports that share a feature without prior knowledge of any sample in which that feature was observed. Currently, the indexing service supports the following features:

Host features

Host features encompass all elements present on the host that constitute a potential indicator of compromise (IoC), such as elements of the file system, the registry, or synchronization objects such as mutexes.

  • Files are indexed either by name (file path) or by hash (file MD5, SHA1, SHA256, ImportHash). Indexing supports the distinction between different types of operation affecting the file: whether the file is executed or written to.

  • Registry keys and registry key/value pairs are both indexed. Indexing supports the distinction between the different types of operation affecting the given keys or values: whether they are written to.

  • Mutexes are indexed by name. Indexing supports the distinction between the different types of operation affecting the given mutexes: whether they are being created or opened.
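To illustrate the hash-based file features, the following minimal sketch computes the MD5, SHA1, and SHA256 digests of a file's content with Python's standard library. The function name file_hashes is hypothetical and is not part of the product. ImportHash is left out because it requires parsing the PE import table, which the standard library does not provide.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Return the hex digests commonly used to identify a file by content."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    # Hash an in-memory sample; a real file would be read in binary mode.
    for name, digest in file_hashes(b"example content").items():
        print(name, digest)
```

Any of these digests can then serve as the value when searching for reports that involve the same file content.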

Network features

Network features encompass observable elements from network traffic that constitute a potential IoC.

  • Contacted IPs are indexed independently of the protocol used.

  • Resolved domains are indexed independently of the success of the resolution.

  • Protocol-specific fields are also indexed, such as the HTTP user agents used in requests or the TLS certificate and JA3 client fingerprints used when setting up secure connections.
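For readers unfamiliar with JA3, the sketch below follows the publicly documented JA3 definition: five fields from the TLS ClientHello (version, cipher suites, extensions, elliptic curves, and point formats) are written as decimal values, joined with dashes within a field and commas between fields, and the resulting string is hashed with MD5. How the system itself computes the fingerprint is not described here. The function names are hypothetical, and the sketch assumes the caller has already removed GREASE values.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: comma-separated fields, dash-separated values."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative ClientHello values only, not taken from a real capture.
    print(ja3_string(771, [4865, 4866], [0, 23], [29], [0]))
    print(ja3_fingerprint(771, [4865, 4866], [0, 23], [29], [0]))
```

Because the fingerprint depends only on how the client builds its ClientHello, the same malware family tends to produce the same value across different destinations.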

String features

The system's dynamic analysis environments have increased visibility into memory changes throughout the analysis. Strings are automatically detected and collected, not only from the original artifact under analysis but also from dynamically allocated memory blocks on the stack or the heap. All these strings constitute powerful IoCs that reveal potentially malicious activity without depending on the actual code execution.

  • Strings extracted from all memory locations are indexed. Regular expressions are not supported on strings, but the system automatically extracts sub-strings of a recognizable format to make searching easier. Filenames, domains, IPs, URLs, or electronic currency wallets contained within these strings are extracted and indexed so they can be queried like any other string.
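The sub-string extraction described above can be pictured with the following sketch, which pulls URLs, IPv4 addresses, and URL host names out of a raw string. The patterns are deliberately simple assumptions for illustration and do not reflect the system's actual extraction rules. The function name extract_substrings is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Simplified, illustrative patterns; production extractors are much stricter.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_substrings(text):
    """Return the URLs, IPv4 addresses, and domains found in a string."""
    urls = URL_RE.findall(text)
    ips = IPV4_RE.findall(text)
    # Derive domains from URL host names, skipping hosts that are plain IPs.
    hosts = [urlsplit(url).hostname for url in urls]
    domains = [h for h in hosts if h and not IPV4_RE.fullmatch(h)]
    return {
        "url": sorted(set(urls)),
        "ip": sorted(set(ips)),
        "domain": sorted(set(domains)),
    }


if __name__ == "__main__":
    sample = "GET http://evil.example.com/gate.php then beacon to 10.0.0.5:8080"
    print(extract_substrings(sample))
```

Each extracted value would then be indexed on its own, which is why a search for a domain can match a report where that domain only appeared inside a longer string.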

Memory features

The analysis sandbox has access to the unpacked code in memory and extracts control flow graphs of untrusted executable code. These code blocks, extracted from the analysis subject or dynamically built by the subject itself, are indexed by hash. The extracted code blocks represent the capabilities of a sample, for example, a routine that performs command-and-control (C&C) communication.

Different methods of hashing are supported:

  • Code hashes for hashes based on the mnemonics of the disassembled code. In this case, the sandbox process keeps only the mnemonic for each instruction, that is, the short representation of the instruction stripped of any parameters. For example, mov eax, ebx would be just mov.

  • API hashes for hashes based on API calls observed in the disassembled code.
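The mnemonic-based code hash can be sketched as follows: each disassembled instruction is reduced to its mnemonic, and the resulting sequence is hashed. The choice of SHA-256, the comma separator, and the function names are assumptions made for this illustration, since the hash function the system uses is not stated here. Taking the first token as the mnemonic is also a simplification that ignores instruction prefixes such as rep.

```python
import hashlib


def mnemonics(disassembly):
    """Keep only the mnemonic (first token) of each disassembled instruction."""
    result = []
    for line in disassembly:
        tokens = line.split()
        if tokens:  # skip blank lines
            result.append(tokens[0].lower())
    return result


def code_hash(disassembly):
    """Hash the mnemonic sequence; operands do not influence the result."""
    joined = ",".join(mnemonics(disassembly))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


if __name__ == "__main__":
    block_a = ["mov eax, ebx", "push ebp", "call 0x401000"]
    block_b = ["mov ecx, edx", "push esi", "call 0x402abc"]
    # Same mnemonic sequence, different operands: the hashes are equal.
    print(code_hash(block_a) == code_hash(block_b))
```

Dropping the operands makes the hash stable across recompilation details such as register allocation or relocated addresses, so two samples that share a routine are more likely to share its code hash.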

Detection features

Detection results, such as the association of the analysis report with a given threat, are also indexed as part of the analysis results.

  • Threat names associated with known samples are indexed.

  • Detected behaviors are indexed in the form of analysis tags.

  • Network signature matches from the NSX base rules are indexed by identifier (see Detected threats).

These indexed features directly correspond to types of IoC that can be searched within the Intelligence interface. A complete reference of the supported types can be found in the list of query keys.
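Conceptually, indexing by feature can be pictured as an inverted index that maps each (feature type, value) pair to the reports in which it was observed. The toy sketch below is only an illustration of that idea and not the service's implementation. The class name FeatureIndex and the feature type labels such as "mutex" and "ip" are hypothetical and are not the actual query keys.

```python
from collections import defaultdict


class FeatureIndex:
    """Toy inverted index: (feature type, value) -> set of report IDs."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_report(self, report_id, features):
        """Register every (feature_type, value) pair observed in a report."""
        for feature_type, value in features:
            self._index[(feature_type, value)].add(report_id)

    def search(self, feature_type, value):
        """Return the IDs of all reports sharing the given feature."""
        return sorted(self._index.get((feature_type, value), set()))


if __name__ == "__main__":
    index = FeatureIndex()
    index.add_report("report-1", [("mutex", "Global\\abc"), ("ip", "10.0.0.5")])
    index.add_report("report-2", [("mutex", "Global\\abc")])
    # Both reports are found from the mutex name alone.
    print(index.search("mutex", "Global\\abc"))
```

The lookup starts from the feature rather than from a sample, which is what allows a search without prior knowledge of any report that contains the feature.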