Introduction
Data-intensive applications have characteristics that often prevent them from running efficiently on traditional cache-based processors. Their access patterns can be highly irregular, with very little locality, and so do not match the expectations of automatically managed caches. In other cases, such as when they process streaming data, they have no temporal locality at all and only limited spatial locality, which reduces the effectiveness of caches.
We present an application-driven study of several architectures that are suitable for data-intensive algorithms. Our chosen application is high-speed string matching, which exhibits two key properties of data-intensive codes: highly irregular access patterns and high-speed streaming data. Irregular access patterns appear in string matching when traversing graph-based representations of the pattern dictionaries being used. String matching is typically used in cybersecurity applications to scan incoming network traffic or files for the presence of signatures (such as specific sequences of symbols), which may relate to attack patterns, viruses, or other malware.
String Matching
String matching algorithms detect the presence of one or more known symbol sequences inside the analyzed data sets. Besides their well-known application to databases and text processing, they are the basis of several other critical, real-world applications. String matching algorithms are key components of DNA and protein sequencing, data mining, machine learning problems, anti-virus software, and security systems such as Intrusion Detection Systems (IDS) for networks (NIDS), applications (APIDS), protocols (PIDS), or systems (host-based IDS [HIDS]).
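To make the idea of multi-pattern matching over a graph-based dictionary concrete, the following is a minimal sketch in Python. It assumes an Aho-Corasick-style automaton (a trie with failure links), which is one well-known approach and is not named in this section. The function names `build_automaton` and `scan` and the sample dictionary are hypothetical, chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (Aho-Corasick style) over the patterns."""
    goto = [{}]   # goto[state][symbol] -> next state
    fail = [0]    # failure link of each state (the root is state 0)
    out = [[]]    # patterns recognized on reaching each state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for sym in pat:
            if sym not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        out[state].append(pat)

    # Phase 2: breadth-first traversal to compute the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for sym, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(sym, 0)
            # Inherit the matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(text, automaton):
    """Stream over text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, sym in enumerate(text):
        # Follow failure links until a transition on sym exists or we hit the root.
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


# Hypothetical dictionary and input, for illustration only.
automaton = build_automaton(["he", "she", "his", "hers"])
print(scan("ushers", automaton))
```

In this sketch the input is consumed one symbol at a time and never revisited, while each symbol triggers a data-dependent jump between automaton states. This mirrors the two properties discussed in the introduction: streaming input with little temporal locality, and irregular accesses into the dictionary structure.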