This chapter deals with problems that cannot be readily solved with a straightforward, deterministic algorithm. This includes problems that computer scientists would describe as being in NP but not in P (their solutions can be verified in polynomial time, but apparently cannot be found in polynomial time), which is a way of saying that a problem is not efficiently solvable. Whether a problem is straightforward to solve will depend on the complexity of the system. To take a classic example, solving the gravitational equations for two orbiting masses, like the Sun and Earth, is fairly easy, but adding more masses, e.g. the Moon, Mars etc., makes the problem much harder. The basic equations of the system do not have to be complicated though. Another famous (NP-hard) problem is the travelling salesman problem. Here the objective is to find the shortest route on a tour that goes through all the places on the salesman’s list. The problem is easy to describe, and it is easy to calculate the length of a given solution (a route), but the number of combinations grows very quickly with the number of places to visit, and so finding the best solution can be difficult, as the sketch below illustrates. This is somewhat different to a classic optimisation problem, e.g. finding the minimum of a function, where you can typically follow gradients to home in on the answer.
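To make the combinatorial explosion concrete, here is a minimal sketch (not one of the book’s own examples; the city coordinates and function names are invented for illustration) that scores a route by its total length and then finds the best tour by brute-force enumeration:

    from itertools import permutations
    from math import hypot

    # Hypothetical city coordinates, purely for illustration
    cities = {'A': (0.0, 0.0), 'B': (1.0, 5.0),
              'C': (2.0, 3.0), 'D': (5.0, 1.0)}

    def routeLength(route):
        # Sum of straight-line distances between consecutive cities:
        # checking one candidate solution is quick
        total = 0.0
        for cityA, cityB in zip(route, route[1:]):
            x1, y1 = cities[cityA]
            x2, y2 = cities[cityB]
            total += hypot(x2-x1, y2-y1)
        return total

    # The catch: the number of orderings grows factorially; 4 cities
    # give 24 routes, but 20 cities give roughly 2.4e18
    best = min(permutations(cities), key=routeLength)
    print(best, routeLength(best))

Note that scoring one route is trivial; it is only the exhaustive search over all permutations that becomes intractable as the list of places grows.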
When it comes to biological information there are many situations of this kind, because biology frequently deals with large and interacting systems. For example, determining the structure of a protein generally involves several thousand atoms and in general we can only ‘solve’ the structure with good experimental data (e.g. from high-resolution X-ray crystallography); it is not sufficient to start with unstructured atoms and a physical model. However, for a complex problem like this, and in a similar vein to measuring a travelling salesman’s route, testing a given solution to see if it is better or worse can be comparatively straightforward. Referring again to protein structures, there are many methods that can quickly calculate the likelihood (or energy) of a structural model.
So far in this book the more biological chapters have focussed on sequences: a linear and effectively one-dimensional representation of biological macromolecules. Studying sequences allows us to study the flow of biological information from the genome and how DNA, RNA and protein macromolecules evolve. However, this representation is somewhat removed from the physical reality of the biochemical soup of life, which of course occurs in three-dimensional space. We can even think in terms of four dimensions, if you consider time and how things change. Naturally, change in biological molecules is at the core of all life processes; nothing stands still. Here we will keep things relatively simple, however, and will not delve into the time-dependent, dynamic aspects. Hence, this chapter simply relates to the three-dimensional arrangements of biological molecules.
Here our primary focus is on the structure of proteins and RNA. This is not to say that the structure of DNA is not important, it is of course vital, but the difference is that for proteins (and directly functional, untranslated RNA) our understanding of the way biology works is so much more dependent on a precise three-dimensional structure. DNA, with its double helix, is by comparison an inert and repetitive structure. Things happen to cause deviations from this regularity when DNA is activated and deactivated (for reading), transcribed into mRNA, replicated, repaired etc., but it is the proteins of the cell that are the causal agents for these specific events. The way that proteins interact with DNA is just one of a plethora of different actions they perform to create the life-sustaining processes within organisms. The ability of an organism’s proteins to do a multitude of, usually very precise, jobs stems from the fact that different proteins, encoded in different gene transcripts, have different sequences of amino acid residues. The combinations of amino acids cause the different protein chains, initially made in a linear way, to fold into different three-dimensional structures. It is the precision of the various protein structures, i.e. that the same amino acid sequence virtually always gives the same three-dimensional arrangement of atoms, which allows proteins to perform a task and evolve according to this task, be it catalysing a chemical reaction, interacting with another biological molecule or anything else.
This chapter is concerned with improving Python code and we will illustrate, using short code snippets, various tips that help with speed, memory use and coding clarity. There may be several aspects of a program that we seek to improve, but we can’t necessarily expect to improve all of them all of the time. Often optimisation of a Python program is about compromise; you may make a program run faster at the expense of using more memory. Clarity is an especially important aspect that we will be mindful of when making suggestions, and in general we recommend making code more easily understood over mild improvements in performance. Finding and correcting errors in code can take a long time, sometimes longer than the program took to write in the first place, so keeping the code easy to understand is especially important.
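As a small taste of the kind of snippet this chapter deals with, the following illustrative comparison (not taken from the chapter itself; the function names are invented) uses the standard timeit module to weigh an explicit loop against a list comprehension. The exact timings will vary by machine, but the comprehension is typically both quicker and clearer:

    from timeit import timeit

    data = list(range(1000))

    def useLoop():
        # An explicit loop: clear, but with some avoidable overhead
        result = []
        for x in data:
            result.append(x*x)
        return result

    def useComprehension():
        # Usually both faster and easier to read
        return [x*x for x in data]

    # timeit accepts a callable directly and times repeated runs
    print(timeit(useLoop, number=10000))
    print(timeit(useComprehension, number=10000))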
A basic programming approach that the authors often follow, and which may be helpful for others, is a three-point plan:
Firstly, make the code work: an inelegant program is better than one that doesn’t work.
Next do it properly: with a working reference, you can take a step back and criticise your approach.
Then make it better: only once your program is working, and the general approach won’t change, is it worth optimising.
Often in biology and medicine the data people use comes in the form of an image. This could be as simple as a photograph of some cells or an image that has been constructed from other data, e.g. from an MRI scan. The images that we will be discussing in this chapter, whatever their source, will be pixmap images, also known as raster images. They will be constructed as rectangular arrays of colour or grey values, the smallest square element of which we refer to as a pixel. We will not be considering the vector graphics approach to making pictures, where the data is described in terms of lines and shape outlines. Here we will concentrate on pixel arrays, the kind of image data that comes from our digital cameras and various scientific instruments.
We will deal with pixmap images in a general, slightly mathematical way. It will not matter what the image actually represents for the most part, although we will endeavour to give examples with a biological flavour. Not so long ago images would largely be acquired by using photographic film, but now the digital camera is ubiquitous, and without the need to buy expensive film a scientist can capture as many images as time and storage capacity allow. Thus the examples presented here will often have an emphasis towards automation, and if you need to write programs dealing with biological data this will allow you to construct efficient analytical pipelines.
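As a foretaste, and assuming the third-party Pillow (PIL) and NumPy libraries are installed, an image file can be loaded into exactly this kind of rectangular pixel array; the file name here is just a placeholder:

    from PIL import Image
    import numpy

    # Load an image file into a rectangular array of pixel values;
    # 'cells.png' is an invented name for illustration
    img = Image.open('cells.png')
    pixmap = numpy.array(img)

    # Height, width and (for colour images) channels per pixel
    print(pixmap.shape)

    # Convert to greyscale: one intensity value per pixel
    grey = numpy.array(img.convert('L'))
    print(grey.min(), grey.max())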
This chapter delves more deeply into the topic of creating custom Python objects using class definitions. Given that we have discussed the basics of object-oriented programming in Chapter 7, we now move on to illustrate how such mechanisms can be used in a practical, scientific sense. If you are interested in only a light introduction to Python, you might consider skipping this chapter on a first reading. However, the objects we discuss here will underpin many of the examples given later on in this book, in Chapters 15 and 20, so you may like to look back to see how such things are constructed.
In the previous chapter we saw how to introduce our own types of data object into Python, using classes. Here we move on to look at how to use a number of different, but connected, classes to construct what is often known as a data model. A data model is an abstract description of concepts that can be used to build a computational version of some topic or real-world situation that you are interested in. Essentially, you examine the kind of information you wish to describe and divide it up into conceptual parcels. Each of these will become one kind of computer object (a class with attributes, functions and links to other classes), which then allows you to create a synthetic analogue of the thing you are interested in. Virtually all programs, irrespective of size, rely on some kind of underlying model to organise data, although this may not use object-oriented programming and is often not formalised in any way. No data model can be expected to be a perfect computer representation of what it describes, but the idea is to make it good enough to serve a useful purpose, by having some of the properties of the things being modelled.
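By way of illustration, here is a minimal sketch of two linked classes; this toy molecule model is invented for this description and is not the book’s actual data model:

    class Molecule:

        def __init__(self, name):
            self.name = name
            self.atoms = []   # links to child Atom objects

        def getMass(self):
            # A derived property, summed over the linked atoms
            return sum(atom.mass for atom in self.atoms)

    class Atom:

        def __init__(self, molecule, element, mass):
            self.element = element
            self.mass = mass
            self.molecule = molecule   # link back to the parent
            molecule.atoms.append(self)

    # Build a (crude) water molecule from the conceptual parcels
    water = Molecule('H2O')
    Atom(water, 'O', 15.999)
    Atom(water, 'H', 1.008)
    Atom(water, 'H', 1.008)
    print(water.getMass())

Each conceptual parcel (molecule, atom) becomes one class, and the links between objects let the model answer questions, like the total mass, that belong to the situation being described rather than to any single attribute.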
This chapter is all about how to make Python programs run faster. We will discuss optimising existing routines so that they take a shorter amount of time to run, above and beyond the simple Python tips and tricks discussed earlier. We first discuss, in a basic way, parallel computing, where a job is split into parts that run concurrently on separate processors (or processing cores). For this we use modules that are available from Python 2.6 and above, which allow programs to take advantage of the multiple processing cores present in a single computer. For the remainder of the chapter we will deal with improving the performance of a single processing job.
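For the parallel part, the standard multiprocessing module (available from Python 2.6) lets a pool of worker processes share out independent calculations. A minimal sketch, where the work function is just a stand-in for a genuinely heavy job:

    from multiprocessing import Pool, cpu_count

    def calcSquare(x):
        # Stand-in for a real, computationally intensive calculation
        return x * x

    if __name__ == '__main__':
        # One worker process per available processing core
        pool = Pool(processes=cpu_count())
        results = pool.map(calcSquare, range(10))
        pool.close()
        pool.join()
        print(results)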
At the end some timing results will be given so that the reader can see how much was gained for the effort. For mathematical routines involving lots of loops it is not uncommon to get speed improvements of better than tenfold. The fine details of the logic and underlying algorithms of the examples used here will not be described; an example will be taken from earlier in the book, where such things are described fully. Also, which particular example we have chosen is not especially important, other than the fact that it is a computationally intensive one that takes a noticeable time to run. It should be noted that this chapter comes with a ‘health warning’ for novice programmers, because the mainstay of the optimisation will be to move away from Python. Some of the focus will be on the low-level compiled language C, although it will be used in a way that provides a module which can still be used directly inside Python programs. The details of the C language, and how to compile it, will not be discussed; to actually learn to program in C we recommend further reading. Nonetheless, if you have no experience with C we hope that we can provide a basic appreciation of how it can help. We also consider Cython, a C-like extension to Python, which has made it possible to benefit from the speed of C without necessarily having to deal with all the complexities of C. This is particularly powerful in combination with NumPy arrays.
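To give a flavour of Cython (assuming it is installed, along with a working C compiler), the following sketch, with invented file and function names, adds C-level type declarations to an otherwise Python-like function:

    # mathUtil.pyx : a Cython sketch, not the book's actual example
    def sumSquares(double[:] values):
        # Typed variables are translated into fast C equivalents,
        # so this loop runs at compiled speed
        cdef int i, n = values.shape[0]
        cdef double total = 0.0
        for i in range(n):
            total += values[i] * values[i]
        return total

Once compiled (for example via a setup script or pyximport), sumSquares can be imported and called from ordinary Python code, e.g. with a NumPy array of floating point values.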
Given that Python is an interpreted programming language, rather than a fast compiled language, many people do not consider it for writing programs that involve extensive numerical work. While Python programs are certainly slower to execute than equivalents written in something like C or FORTRAN, mathematical functionality certainly exists in Python and has the inherent advantages of the language: it is easy for people to use and conveniently links to other helpful data structures. Of course speed of calculation may not be so important; for a scientific investigation it may not matter whether something takes 1 second or 0.1 seconds to run. Fortunately, computers get faster and the Python interpreter improves, so you can do quite a bit of numerical work without concern. However, if calculation speed really is important in a given situation then there are a few things you can do to make things faster while still keeping the convenience of Python. For example, you can write code in C, a very efficient numerical language, and use it from within Python (this is called a C extension), effectively extending the vocabulary of the interpreted language with speedy subroutines. More recently the Cython language has made C extensions very easy to write. Cython is a Python-like language, and virtually all Python programs can be compiled by it without alteration, but it ultimately generates C code that can be compiled. Cython can be used to call fast library code written in pure C, and can incorporate a mixture of Python and C data structures in the same code; although less flexible, the C data structures are very efficient. Writing C extensions and Cython modules is discussed in Chapter 27.
Python includes standard arithmetic operations as part of its core functionality: add, multiply etc. There is an additional module, math, which always comes packaged with Python and which provides further numerical functionality: logarithms, trigonometry etc. For numerical calculations that are not especially intensive, the core functionality and the math module will often suffice. There has been a history of trying to provide modules for quick numerical algorithms in Python. The first attempt, begun in 1995, was called Numeric, and the second was called Numarray. These two are now deemed obsolete, but the third attempt, NumPy (http://numpy.scipy.org/), begun in 2005, incorporates elements from the earlier attempts and will hopefully last longer.
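A quick comparison, as a minimal sketch, of the bundled math module against NumPy’s array-at-a-time operations:

    import math
    import numpy

    # The math module works on one number at a time
    print(math.sqrt(2.0), math.log(10.0))

    # NumPy applies the same operations across whole arrays at once,
    # with the loop running in fast compiled code
    values = numpy.array([1.0, 4.0, 9.0, 16.0])
    print(numpy.sqrt(values))        # [1. 2. 3. 4.]
    print(values.mean(), values.sum())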
The Python language can be viewed as a formalised system of understanding instructions (represented by letters, numbers and other funny characters) and acting upon those directions. Quite naturally, you have to put something in to get something out, and what you are going to be passing to Python is a series of commands. Python is itself a computer program, which is designed to interpret commands that are written in the Python language, and then act by executing what these instructions direct. A programmer will sometimes refer to such commands collectively as ‘code’.
Interpreting commands
So, to our first practical point: to get the Python interpreter to do something we will give it some commands in the form of a specially created piece of text. It is possible to give Python a series of commands one at a time, as we slowly type them into our computer. However, while giving Python instructions line by line is useful if you want to test out something small, like the examples in this chapter, for the most part this method of issuing commands is impractical. What we usually do instead is create all of the lines of text representing all the instructions, written as commands in the Python language, and store the whole lot in a file. We can then activate the Python interpreter program so that it reads all of the text from the file and acts on all of the commands issued within. A series of commands that we store together in such a way, and which do a specific job, can be considered a computer program. If you would like to try any of the examples given in the book, the next chapter will tell you how to actually get started. The initial intention, however, is mostly to give you a flavour of Python and introduce a few key principles.
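For example, typed at the interactive interpreter prompt (shown here with the usual ‘>>>’ markers), a few simple commands might look like this:

    >>> x = 5
    >>> y = x * 3
    >>> print(y)
    15
    >>> 'bio' + 'logy'
    'biology'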
At some stage when writing your own programs there may come a time when you want others to be able to use what you have created without them necessarily having to know anything about programming or Python. Should this happen the next step is to consider writing a more friendly interface to the program. Once upon a time in computing everything was text-based and the user had to type commands to get things to work. Fortunately things have moved on and we are now usually presented with graphics and a pointing device, either a mouse or a touch screen, and the user can interact with graphical objects like menus and buttons.
When building a graphical user interface (GUI) the programmer must be mindful of various factors, which are sometimes antagonistic, forcing us to make compromises. For example, the designer has to strike a balance between on the one hand giving lots of functionality and on the other hand keeping things simple for novices and intuitive to use. In this chapter we will aim to give some general advice about the programming, but we leave you to make the tough choices. We wish to be clear that this chapter deals with making graphical interfaces that run on the users’ local computer. We will not venture into the world of Internet-based applications, although these are becoming increasingly important, and the Pyjamas library, which is available for Python programmers, works in a remarkably similar way to the graphical libraries discussed here.
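As a minimal sketch of the flavour of GUI programming, the following uses Python 3’s standard tkinter module (named Tkinter in Python 2); this toolkit choice and the widget names are assumptions for illustration, not necessarily those used in the chapter:

    import tkinter

    def sayHello():
        # Callback run when the button is pressed
        label.config(text='Hello!')

    root = tkinter.Tk()
    root.title('Tiny GUI')

    label = tkinter.Label(root, text='Press the button')
    label.pack()

    button = tkinter.Button(root, text='Greet', command=sayHello)
    button.pack()

    # Hand control to the event loop, which waits for user actions
    root.mainloop()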
In many areas of biological and medical science, as new techniques and machinery are developed there is a tendency to record ever-increasing amounts of data. A notable example of this comes with ‘next-generation’ DNA sequencing, which we discuss further in Chapter 17. In general though, with high-throughput methods the idea is to perform many small experiments, of the same design, in parallel. When we simultaneously detect the outcome of many assays the procedure can be described as multiplexed. This not only has speed advantages but can also reduce costs and improve consistency between experiments. And naturally, to handle large numbers of experimental assays it is important to use computers for the processing and analysis of data.
A multitude of modern techniques involve parallel experiments, including the detection of potential drug compounds, RNA molecules, antibodies and protein crystals, to name only a few. However, in this chapter we do not have space to cover the informatics of lots of specific techniques, so instead we cover general themes, such as data organisation, normalisation and comparison. Also, all of the examples will be based on the notion of the experimental data being arranged as a rectangular array, which in turn is often a consequence of the physical manner in which the assays were performed and detected, on some form of regular grid.
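As a simple illustration of the rectangular-array theme, here is a sketch (with made-up numbers) that normalises each row of a data array relative to its own median, using NumPy:

    import numpy

    # Hypothetical assay readings: rows are samples,
    # columns are repeat measurements
    data = numpy.array([[ 2.1,  2.4,  1.9,  2.2],
                        [10.5, 11.2,  9.8, 10.9],
                        [ 0.9,  1.1,  1.0,  1.2]])

    # Divide each row by its median so samples become comparable
    medians = numpy.median(data, axis=1)
    normalised = data / medians[:, None]

    print(normalised)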
For simple tasks involving short programs, you can survive perfectly well with the standard Python data types for holding information, such as lists and dictionaries. However, for more complicated tasks involving long programs, this often becomes unwieldy. There are various ways to deal with this issue, but one of the most fruitful is the ability to define your own data types: objects built to your own specification, organised in the way that is convenient to you. Modern computer languages do this via the introduction of bespoke object definitions that are known as classes and this kind of thinking is generally termed object-oriented programming.
When creating your own custom data types, the class is the definition of a particular kind of object in terms of its component features and how it is constructed or implemented in code. The term object, however, refers to a specific instance, or occurrence, of the thing which has been made according to the class definition. The making of an object of a given class is what is usually termed instantiation. A convenient analogy is to think of the blueprint for a house being like a class, but the actual, solid house being the object. Also, given a single blueprint one may build many instances of different house objects, all to the same design. It is quite common to use the words ‘class’ and ‘object’ interchangeably, even in the same context, although they mean different things, and it is important to understand the difference. As it happens, everything that is brought into existence in Python is an object, so even integer and floating point numbers are objects, although most of the time you can work without noticing that.
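To make the blueprint analogy concrete, here is a minimal sketch (the names are invented):

    class House:
        # The class is the blueprint: it defines what every house has
        def __init__(self, colour, numRooms):
            self.colour = colour
            self.numRooms = numRooms

    # Each instantiation builds a distinct object to the same design
    home = House('red', 6)
    cottage = House('white', 3)

    print(home.colour, cottage.colour)   # red white
    print(isinstance(home, House))       # True

    # Even plain numbers are objects of a class in Python
    print(isinstance(5, int))            # True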