Standard text classification adopts the `bag of words' (BoW) model, in which a document is treated as an unstructured multiset of terms and information about word position and syntactic structure is ignored. This approach works well for document topic classification, but less well for sentiment or genre classification and for (sub)sentential classification tasks such as named entity recognition, anonymisation, or (non-)speculative assertion identification (e.g. Medlock, 2006).
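To make the limitation concrete, the following minimal sketch builds a BoW representation with Python's standard `collections.Counter`. The helper name `bag_of_words` and the whitespace tokenisation are assumptions made for illustration only; they are not part of any toolkit discussed here. Two sentences with opposite meanings receive identical representations because word order is discarded.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return the multiset of lower-cased, whitespace-separated tokens.

    Hypothetical helper for illustration: real systems would use a
    proper tokeniser rather than str.split().
    """
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Word order is lost, so the two representations are indistinguishable.
same_representation = a == b
```

A topic classifier loses little by this, since both sentences concern the same subject matter, whereas a task that depends on who did what to whom cannot separate them from these features alone.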

The RASP toolkit makes available a range of features beyond BoW, based on morphological analysis (lemmas, stems), part-of-speech tags, and word co-occurrences mediated by grammatical relations rather than by adjacency or windowing. These additional feature types can be supplied to machine learning classifiers; during training, the classifier can then select the feature instances of each type that are effective for a given classification task and apply them at run time.
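As an illustration of how such feature types might be combined, the sketch below turns a hand-annotated sentence into a single feature multiset. Everything in it is an assumption: the `Token` class, the `extract_features` function, the tag names (`NOUN`, `VERB`, `MODAL`) and the relation labels (`subj`, `obj`, `aux`) are hypothetical placeholders, and the input is written by hand rather than produced by RASP, whose actual output format and tagset are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical container for one analysed token."""
    word: str
    lemma: str
    pos: str


def extract_features(tokens, relations):
    """Build a multiset of word, lemma, POS and grammatical-relation features.

    `relations` is a list of (label, head_index, dependent_index) triples;
    this layout is an assumption, not a real parser format.
    """
    feats = Counter()
    for tok in tokens:
        feats["word=" + tok.word.lower()] += 1
        feats["lemma=" + tok.lemma] += 1
        feats["pos=" + tok.pos] += 1
    for label, head, dep in relations:
        # Co-occurrence mediated by a grammatical relation, not by adjacency.
        feats[f"gr={label}:{tokens[head].lemma}_{tokens[dep].lemma}"] += 1
    return feats


# Hand-written analysis of "Proteins may bind receptors" (illustrative only).
tokens = [
    Token("Proteins", "protein", "NOUN"),
    Token("may", "may", "MODAL"),
    Token("bind", "bind", "VERB"),
    Token("receptors", "receptor", "NOUN"),
]
relations = [("subj", 2, 0), ("aux", 2, 1), ("obj", 2, 3)]

features = extract_features(tokens, relations)
```

Because every feature type is encoded in the same flat namespace, a classifier that performs feature weighting or selection during training can retain, for instance, the `gr=aux:bind_may` instance for a speculation task while ignoring it for topic classification.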