In Part III we described program verification for C: tools and techniques to demonstrate that C programs satisfy correctness properties. What we ultimately want is the correctness of a compiled machine language binary image, running on some target hardware platform. We will use a correct compiler that turns source-level programs satisfying correctness properties into machine-level programs satisfying those same properties. But defining formally the interface between a compiler correctness proof and a program logic has proven to be fraught with difficulties. Resolving these difficulties is still the object of ongoing research. Here we will explore some of the issues that have arisen and report on the current state of the integration effort.
The two issues that have caused the most headaches revolve around understanding and specifying how compiled programs interact with their environment. First, how should we reason about the execution environment when it may behave in unpredictable ways at runtime? In other words, how do we reason about program nondeterminism? Second, how do we specify correctness for programs that exhibit shared memory interactions?
The first question regarding nondeterminism is treated in detail in Dockins's dissertation [38]. Dockins develops a general theory of refinements for nondeterministic programs based on bisimulation methods. This theory gracefully handles the case where the execution environment is nondeterministic, and it has the critical feature that it allows programs to become more defined as they are compiled.
Predicates (of type A → Prop) in type theory give a model for Natural Deduction. A separation algebra gives a model for separation logic. We formalize these statements in Coq.
For a more expressive logic that permits general recursive types and quasi-self-reference, we use step-indexed models built with indirection theory. We will explain this in Part V; for now it suffices to say that indirection theory requires that the type T be ageable—elements of T must contain an approximation index. A given element of the model contains only a finite approximation to some ideal predicate; these approximations become weaker as we “age” them—which we do as the operational semantics takes its steps.
To enforce that T is ageable we have a typeclass, ageable(T). Furthermore, when Separation is involved, the ageable mechanism must be compatible with the separating conjunction; this requirement is also expressed by a typeclass, Age_alg(T).
Theorem: Separation Algebras serve as a model of Separation Logic.
Proof. We express this theorem in Coq by saying that given type T, the function algNatDed models an instance of NatDed(pred T). Given a SepAlg over T, the function algSepLog models an instance of SepLog(pred T). The definability of algNatDed and algSepLog serves as a proof of the theorem.
What we show in this chapter is the indirection theory version (in the Coq file msl/alg_seplog.v), so ageable and Age_alg are mentioned from time to time.
Separation logics have assertions—for example P * (x ↦ y) * Q—that describe objects in some underlying model—for example “heaplets”—that separate in some way—such as “the heaplet satisfying P can join with (is disjoint from) the heaplet satisfying x ↦ y.” In this chapter we investigate the objects in the underlying models: what kinds of objects will we have, and what does it mean for them to join?
This study of join relations is the study of separation algebras. Once we know how the underlying objects join, this will explain the meaning of the * operator (and other operators), and will justify the reasoning rules for these operators.
In a typical separation logic, the state has a stack ρ for local variables and a heap m for pointers and arrays. Typically, m is a partial function from addresses to values. The key idea in separation logic is that each assertion characterizes the domain of this function as well as the value of the function. The separating conjunction P * Q requires that P and Q operate on subheaps with disjoint domains.
In contrast, for the stack we do not often worry about separation: we may assume that both P and Q operate on the entirety of the stack ρ.
For now, let us ignore stacks ρ, and let us assume that assertions P are just predicates on heaps, so m ⊨ P is simply P(m).
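To make this concrete, here is a minimal Python sketch of the classical model, with heaplets represented as finite partial functions (dicts) from addresses to values. The names join, sep_conj, and mapsto are illustrative inventions for exposition, not the definitions of VST's Coq development.

```python
from itertools import chain, combinations

def join(m1, m2):
    """Two heaplets join iff their domains are disjoint;
    the result is their union (None if they do not join)."""
    if m1.keys() & m2.keys():
        return None
    return {**m1, **m2}

def sep_conj(P, Q):
    """m |= P * Q iff m splits into disjoint heaplets m1, m2
    with m1 |= P and m2 |= Q."""
    def holds(m):
        addrs = list(m)
        # Enumerate all splits of the domain (exponential, but
        # fine for exposition).
        for sub in chain.from_iterable(
                combinations(addrs, r) for r in range(len(addrs) + 1)):
            m1 = {a: m[a] for a in sub}
            m2 = {a: m[a] for a in m if a not in sub}
            if P(m1) and Q(m2):
                return True
        return False
    return holds

def mapsto(x, y):
    """x |-> y: the heaplet is exactly the singleton cell {x: y}."""
    return lambda m: m == {x: y}

# (1 |-> 10) * (2 |-> 20) holds on exactly the two-cell heaplet,
# but a cell cannot separate from itself:
assert sep_conj(mapsto(1, 10), mapsto(2, 20))({1: 10, 2: 20})
assert not sep_conj(mapsto(1, 10), mapsto(1, 10))({1: 10})
```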
For convenient application of the VST program logic for C light, we have synthetic or derived rules: lemmas built from common combinations of the primitive inference rules for C light. We also have proof automation: programs that look at proof goals and choose which rules to apply.
For example, consider the C-language statements x:=e→f; and e1→f := e2; where x is a variable, f is the name of a structure field, and e, e1, e2 are expressions. The first command is a load field statement, and the second is a store field. Proofs about these statements could be done using the general semax-load and semax-store rules—along with the mapsto operator—but these require a lot of reasoning about field l-values. It's best to define a synthetic field_mapsto predicate that can be used as if it were a primitive:
We do not show the definition here (see floyd/field_mapsto.v), but basically field_mapsto π τ f v1 v2 is a predicate meaning: τ is a struct type whose field f of type τ2 has address-offset δ from the base address of the struct; the size/signedness of f is ch; v1 is a pointer to a struct of type τ; and the heaplet contains exactly v1 + δ ↦π v2 (value v2 at address v1 + δ with permission-share π), where v2 : τ2.
An important application of separation algebras is to model Hoare logics of programming languages with mutable memory. We generate an appropriate separation logic by choosing the correct semantic model, that is, the correct separation algebra. A natural choice is to simply take the program heaps as the elements of the separation algebra together with some appropriate join relation.
In most of the early work in this direction, heaps were modeled as partial functions from addresses to values. In those models, two heaps join iff their domains are disjoint, the result being the union of the two heaps. However, this simple model is too restrictive, especially when one considers concurrency. It rules out useful and interesting protocols where two or more threads agree to share read permission to an area of memory.
There are a number of different ways to do the necessary permission accounting. Bornat et al. [27] present two different methods: one based on fractional permissions and another based on token counting. Parkinson, in chapter 5 of his thesis [74], presents a more sophisticated system capable of handling both methods. However, this model has some drawbacks, which we shall address below.
Fractional permissions are used to handle the sorts of accounting situations that arise from concurrent divide-and-conquer algorithms. In such algorithms, a worker thread has read-only permission to the dataset and it needs to divide this permission among various child threads.
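As a tiny illustration of this accounting, consider the following hedged Python sketch in the style of fractional permissions: a share is an exact rational in (0, 1], a share of 1 grants full (read/write) permission, and any smaller share is read-only. The representation and names are ours, purely for exposition.

```python
from fractions import Fraction

FULL = Fraction(1)  # full permission: read and write

def split(share):
    """Divide a permission share in half, e.g. when a worker
    thread forks two children over the same dataset."""
    assert Fraction(0) < share <= FULL
    return share / 2, share / 2

def join_shares(s1, s2):
    """Shares of the same cell join by addition, provided the
    total does not exceed full permission."""
    total = s1 + s2
    return total if total <= FULL else None

# A worker holding read-only share 1/2 forks two children...
child1, child2 = split(Fraction(1, 2))            # each child gets 1/4
# ...and recovers its original share when they rejoin.
assert join_shares(child1, child2) == Fraction(1, 2)
```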
Outlier detection is an important subject in machine learning and data analysis. The term outlier refers to abnormal observations that are inconsistent with the bulk of the data distribution [16, 32, 98, 240, 265, 266, 287]. Some sample applications are as follows.
• Detection of imposters or rejection of unauthorized access to computer networks.
• Genomic research – identifying abnormal gene or protein sequences.
• Biomedical, e.g. ECG arrhythmia monitoring.
• Environmental safety detection, where outliers indicate abnormality.
• Personal safety, with security aids embedded in mobile devices.
For some real-world application examples, see, e.g., Hodge and Austin [98].
The standard approach to outlier detection is density-based, whereby the detection depends on the outlier's relationship with the bulk of the data. Many algorithms use concepts of proximity and/or density estimation in order to find outliers. However, in high-dimensional spaces the data become increasingly sparse and the notion of proximity/density becomes less meaningful; consequently, model-based methods have become more appealing [1, 39]. It is also typical to assume that the model is to be trained from only one type of (say, positive) training patterns, which makes outlier detection a fundamentally different problem and creates a new learning paradigm leading to one-class-based learning models.
SVM-type learning models are naturally amenable to outlier detection since certain support vectors can be identified as outliers.
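As a small, hedged illustration, the sketch below uses scikit-learn's OneClassSVM, which is trained on positive patterns only and labels test points as inliers (+1) or outliers (−1). The synthetic data and the parameter values are arbitrary choices for demonstration, not recommendations.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # one-class (positive) training patterns
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 2)),        # points resembling the training data
    rng.uniform(4.0, 6.0, size=(5, 2)),        # abnormal points far from the bulk
])

# nu upper-bounds the fraction of training points treated as outliers
# (equivalently, lower-bounds the fraction of support vectors).
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
print(model.predict(X_test))  # +1 = inlier, -1 = detected outlier
```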
The objective of cluster discovery is to subdivide a given set of training data, X ≡ {x1, x2, …, xN}, into a number of (say K) subgroups. Even with unknown class labels of the training vectors, useful information may be extracted from the training dataset to facilitate pattern recognition and statistical data analysis. Unsupervised learning models have long been adopted to systematically partition training datasets into disjoint groups, a process that is considered instrumental for classification of new patterns. This chapter will focus on conventional clustering strategies with the Euclidean distance metric. More specifically, it will cover the following unsupervised learning models for cluster discovery.
• Section 5.2 introduces two key factors – the similarity metric and clustering strategy – dictating the performance of unsupervised cluster discovery.
• Section 5.3 starts with the basic criterion and develops the iterative procedure of the K-means algorithm, which is a common tool for clustering analysis (a minimal sketch of the iteration follows this list). The convergence property of the K-means algorithm will be established.
• Section 5.4 extends the basic K-means to a more flexible and versatile expectation-maximization (EM) clustering algorithm. Again, the convergence property of the EM algorithm will be treated.
• Section 5.5 further considers the topological property of the clusters, leading to the well-known self-organizing map (SOM).
• Section 5.6 discusses bi-clustering methods that allow simultaneous clustering of the rows and columns of a data matrix.
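As a preview of the iteration developed in Section 5.3, here is a minimal K-means sketch with the Euclidean metric. The initialization, convergence test, and naming are our own illustrative choices; this is not Algorithm 5.1 verbatim.

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the K node vectors (centroids) from random samples.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        # Each step lowers the clustering objective (cf. Theorem 5.1),
        # so the iteration converges.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Two well-separated synthetic clusters in the plane:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(3.0, 0.3, size=(50, 2))])
centroids, labels = kmeans(X, K=2)
```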
The rapid advances in information technologies, in combination with today's internet technologies (wired and mobile), not only have a profound impact on our daily lifestyles but have also substantially altered the long-term prospects of humanity. In this era of big data, diverse types of raw datasets of enormous size are constantly collected from wired and/or mobile devices/sensors. For example, in Facebook alone, more than 250 million new photos are added on a daily basis. The amount of newly available digital data more than doubles every two years. Unfortunately, such raw data are far from being “information” useful for meaningful analysis unless they are processed and distilled properly. The main purpose of machine learning is to convert the wealth of raw data into useful knowledge.
Machine learning is a discipline concerning the study of adaptive algorithms to infer from training data so as to extract critical and relevant information. It offers an effective data-driven approach to data mining and intelligent classification/prediction. The objective of learning is to induce optimal decision rules for classification/prediction or to extract the salient characteristics of the underlying system which generates the observed data. Its potential application domains cover bioinformatics, DNA expression and sequence analyses, medical diagnosis and health monitoring, brain–machine interfaces, biometric recognition, security and authentication, robot/computer vision, market analysis, search engines, and social network association.
In machine learning, the learned knowledge should be represented in a form that can readily facilitate decision making in the classification or prediction phase.
The traditional curse of dimensionality is often focused on the extreme dimensionality of the feature space, i.e. M. However, for kernelized learning models for big data analysis, the concern naturally shifts to the extreme dimensionality of the kernel matrix, N, which is dictated by the size of the training dataset. For example, in some biomedical applications the sizes may reach hundreds of thousands; in social media applications, the sizes could easily be of the order of millions. This creates a new large-scale learning paradigm, which calls for a new level of computational tools, both in hardware and in software.
Given the kernelizability, we have at our disposal two learning models, respectively represented by two different kernel-induced vector spaces. Now our focus of attention should be shifted to the interplay between the two kernel-induced representations. Even though the two models are theoretically equivalent, they could incur very different implementation costs for learning and prediction. For cost-effective system implementation, one should choose the lower-cost representation, whether intrinsic or empirical. For example, if the dimension of the empirical space is small and manageable, an empirical-space learning model will be more appealing. The opposite holds if the number of training vectors is extremely large, which is the case for the “big data” learning scenario. In this case, one must give serious consideration to the intrinsic model, whose cost can be controlled by properly adjusting the order of the kernel function.
Unsupervised cluster discovery is instrumental for data analysis in many important applications. It involves a process of partitioning the training dataset into disjoint groups. The performance of cluster discovery depends on several key factors, including the number of clusters, the topology of node vectors, the objective function for clustering, iterative learning algorithms (often with multiple initial conditions), and, finally, an evaluation criterion for picking the best result among multiple trials. This part contains two chapters: Chapter 5 covers unsupervised learning models employing the conventional Euclidean metric for vectorial data analysis while Chapter 6 focuses on the use of kernel-induced metrics and kernelized learning models, which may be equally applied to nonvectorial data analysis.
Chapter 5 covers several conventional unsupervised learning models for cluster discovery, including K-means and expectation-maximization (EM) learning models, which are presented in Algorithms 5.1 and 5.2, along with the respective proofs of the monotonic convergence property, in Theorems 5.1 and 5.2. By imposing topological sensitivity on the cluster (or node) structure, we can extend the basic K-means learning rule to the SOM learning model presented in Algorithm 5.3. Finally, for biclustering problems, where features and objects are simultaneously clustered, several useful coherence models are proposed.
Chapter 6 covers kernel-based cluster discovery, which is useful both for vectorial and for nonvectorial data analyses.
Machine learning has successfully led to many promising tools for intelligent data filtering, processing, and interpretation. Naturally, proper metrics will be required in order to objectively evaluate the performance of machine learning tools. To this end, this chapter will address the following subjects.
It is commonly agreed that the testing accuracy serves as a more reasonable metric than the training accuracy for the performance evaluation of a learned classifier. Section A.1 discusses several cross-validation (CV) techniques for evaluating the classification performance of the learned models.
Section A.2 explores two important test schemes: the hypothesis test and the significance test.
Cross-validation techniques
Suppose that the dataset under consideration has N samples to be used for training the classifier model and/or estimating the classification accuracy. Before the training phase starts, a subset of the dataset must be set aside as the testing dataset. The class labels of the test patterns are assumed to be unknown during the learning phase. These labels will be revealed only during the testing phase, in order to provide the reference needed for the evaluation of the performance.
Some evaluation/validation methods are presented as follows.
(i) Holdout validation. N' (N' < N) samples are randomly selected from the dataset for training a classifier, and the remaining N − N' samples are used for evaluating the accuracy of the classifier. Typically, N' is about two-thirds of N. Holdout validation avoids the biased estimation that occurs in re-substitution by completely separating the training data from the validation data, as sketched below.
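A minimal sketch of such a split, assuming the N' ≈ 2N/3 convention above; the function name and the fixed seed are our own illustrative choices.

```python
import numpy as np

def holdout_split(X, y, train_fraction=2/3, seed=0):
    """Randomly reserve about one third of the N samples for testing."""
    N = len(X)
    perm = np.random.default_rng(seed).permutation(N)
    n_train = int(train_fraction * N)          # N' training samples
    train, test = perm[:n_train], perm[n_train:]
    return X[train], y[train], X[test], y[test]

# The classifier is trained on (X_tr, y_tr) only; the labels y_te stay
# hidden until the testing phase, where they score the learned model.
X_tr, y_tr, X_te, y_te = holdout_split(np.random.rand(90, 4),
                                       np.random.randint(0, 2, size=90))
```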
It is well known that a classifier's effectiveness depends strongly on the distribution of the (training and testing) datasets. Consequently, we will not know in advance the best possible classifiers for data analysis. This prompts the need to develop a versatile classifier endowed with an adequate set of adjustable parameters to cope with various real-world application scenarios. Two common ways to enhance the robustness of the classifiers are by means of (1) using a proper ridge factor to mitigate over-fitting problems (as adopted by KRR) and/or (2) selecting an appropriate number of support vectors to participate in the decision making (as adopted by SVM). Both regularization mechanisms are meant to enhance the robustness of the learned models and, ultimately, improve the generalization performance.
This chapter introduces the notion of a weight–error curve (WEC) for characterization of kernelized supervised learning models, including KDA, KRR, SVM, and Ridge-SVM. Under the LSP condition, the decision vector can be “voted” as a weighted sum of training vectors in the intrinsic space – each vector is assigned a weight in voting. The weights can be obtained by solving the kernelized learning model. In addition, each vector is also associated with an error, which is dictated by its distance from the decision boundary. In short, given a learned model, each training vector is endowed with two parameters: weight and error. These parameters collectively form the so-called WEC. The analysis of the WEC leads us to a new type of classifier named Ridge-SVM.
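To make the weight-error pairing concrete, here is a hedged sketch for KRR in the empirical space (ignoring the bias term); the function name and representation are ours. It also exhibits a property one can verify algebraically: for bias-free KRR the training errors are proportional to the dual weights (ε = ρ a), so KRR's WEC is a straight line, whereas SVM and Ridge-SVM reshape this curve by constraining the weights.

```python
import numpy as np

def krr_wec(K, y, rho):
    """Given an N x N kernel matrix K, labels y in {-1, +1}, and a
    ridge factor rho, return the (weight, error) pair of each
    training vector -- the points making up the WEC."""
    N = len(y)
    a = np.linalg.solve(K + rho * np.eye(N), y)  # dual weights a_i
    errors = y - K @ a                           # training errors eps_i
    # For bias-free KRR: eps = y - K (K + rho I)^{-1} y
    #                        = rho (K + rho I)^{-1} y = rho * a.
    assert np.allclose(errors, rho * a)
    return a, errors
```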