# Code for chapter 18

#### For instructors:

Below are listed all the scripts as shown in chapter 18 of the book. These are free to copy and paste into your code editor.

## Script 18.1

 script_18_1.py ```# Script 18.1mass = 3.4volume = 1.8density = mass/volume       # Divisiondensity = round(density, 3) # Round to three decimal placesprint(density) ``` script_18_1.out ```[ah@hobbes Documents]\$ python script_18_1.py 1.889[ah@hobbes Documents]\$ ```

## Script 18.2

 script_18_2.py ```# Script 18.2x = 2     # Integery = 5     # Integerz = 3.0   # Floating pointprint(x * y) # 10  - Integerprint(x / y) # 0.4 - Floating point due to divisionprint(x + z) # 5.0 - Floating pointt = type(x * y - z)  # Get a value's data typeprint(t)             # float ``` script_18_2.out ```[ah@hobbes Documents]\$ python script_18_2.py 1005.0[ah@hobbes Documents]\$ ```

## Script 18.3

 script_18_3.py ```# Script 18.3x = 1            # An arbitrary integerprint(x.__doc__) # Print Python documentation for this object ``` script_18_3.out `[ah@hobbes Documents]\$ python script_18_3.py int(x=0) -> int or longint(x, base=10) -> int or longConvert a number or string to an integer, or return 0 if no argumentsare given.  If x is floating point, the conversion truncates towards zero.If x is outside the integer range, the function returns a long instead.If x is not a number or if base is given, then x must be a string orUnicode object representing an integer literal in the given base.  Theliteral can be preceded by '+' or '-' and be surrounded by whitespace.The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means tointerpret the base from the string as an integer literal.>>> int('0b100', base=0)4[ah@hobbes Documents]\$ `

## Script 18.4

 script_18_4.py ```# Script 18.4x = Trueprint(x)print(bool(0.0)) # Falseprint(bool(7))   # True ``` script_18_4.out `[ah@hobbes Documents]\$ python script_18_4.py TrueFalseTrue[ah@hobbes Documents]\$`

## Script 18.5

 script_18_5.py ```# Script 18.5x = 2print(x > 5) # Falsey = x < 3print(y)     # True ``` script_18_5.out `[ah@hobbes Documents]\$ python script_18_5.py FalseTrue[ah@hobbes Documents]\$ `

## Script 18.6

 script_18_6.py ```# Script 18.6x = 1y = -1x > y and y > 1   # False - second comparison is Falseprint(x > y and y > 1)x != 2 or y == 1  # True - first comparison is True; x does not equal 2print(x != 2 or y == 1)not (y > x)       # True - comparison is False; not False is Trueprint(not (y > x)) ``` script_18_6.out `[ah@hobbes Documents]\$ python script_18_6.py FalseTrueTrue[ah@hobbes Documents]\$ `

## Script 18.7

 script_18_7.py ```# Script 18.7x = 'Hello'y = "world"print(x,y)  # Print both values to screen ``` script_18_7.out `[ah@hobbes Documents]\$ python script_18_7.py ('Hello', 'world')[ah@hobbes Documents]\$ `

## Script 18.8

 script_18_8.py ```# Script 18.8x = '''This text flows from one lineto the next line inside triple quotes'''print(x) ``` script_18_8.out `[ah@hobbes Documents]\$ python script_18_8.py This text flows from one lineto the next line inside triple quotes[ah@hobbes Documents]\$ `

## Script 18.9

 script_18_9.py ```# Script 18.9x = 'abcde'print(len(x)) # 5 print(x[0])   # First character 'a' at index 0print(x[-1])  # Last character 'e'print(x[1:3]) # 'bcd' range from index 1, up to, but not including 3 ``` script_18_9.out `[ah@hobbes Documents]\$ python script_18_9.py 5aebc[ah@hobbes Documents]\$ `

## Script 18.10

 script_18_10.py ```# Script 18.10name = 'Mary'x = 0.12y = 34121.0a = 'Hello {}'.format(name)    # name replaces bracketsprint(a)                       # 'Hello Mary'b = 'X {:.5f} Y {:.2e}'.format(x, y)   # 5 dp float 2 dp sciprint(b)                               # 'X 0.12000 Y 3.41e4' c = '{:3d},{:3d}'.format(12,7)   # pad digits to 3 charactersprint(c)                         # ' 12,  7' ``` script_18_10.out `[ah@hobbes Documents]\$ python script_18_10.py Hello MaryX 0.12000 Y 3.41e+04 12,  7[ah@hobbes Documents]\$ `

## Script 18.11

 script_18_11.py ```# Script 18.11x = ['a', 'b', 'c', 'd'] print(x[2])   # 'c' - the character at index 2print(x[1:])  # ['b', 'c', 'd'] - from index 1 to the endx[0] = 'z'    # character at index 0 is set to 'z'del x[2]      # Deletes item at index 2print(x)      # ['z', 'b', 'd']y = [[4,7], [9,6]] # A list containing two other listsprint(y[1])        # [9,6] - the sub-list at y index 1print(y[1][0])     # 9 - item 0 from sub-list 1   ``` script_18_11.out `[ah@hobbes Documents]\$ python script_18_11.py c['b', 'c', 'd']['z', 'b', 'd'][9, 6]9[ah@hobbes Documents]\$ `

## Script 18.12

 script_18_12.py ```# Script 18.12x = list('PQRST')     # Create list from text stringprint(x)              # ['P','Q','R','S','T'] - list of charactersy = list(range(10))   # A list of the range from 0 up to <10 print(y)              # [0,1,2,3,4,5,6,7,8,9] ``` script_18_12.out `[ah@hobbes Documents]\$ python script_18_12.py ['P', 'Q', 'R', 'S', 'T'][0, 1, 2, 3, 4, 5, 6, 7, 8, 9][ah@hobbes Documents]\$ `

## Script 18.13

 script_18_13.py ```# Script 18.13x = [4, 1, 0] # Three items a, b, c = x   # a is 4, b is 1, c is 0x = ['a', 'b', 'c', 'd']print(len(x))     # 4 - number of items in listprint('c' in x)   # True - string 'c' is in the listprint(2 in x)     # False - number 2 is not in list ``` script_18_13.out `[ah@hobbes Documents]\$ python script_18_13.py 4TrueFalse[ah@hobbes Documents]\$ `

## Script 18.14

 script_18_14.py ```# Script 18.14x = (9, 3, 1, 0)  # Create a tupley = (2,)          # Tuple with one item (needs trailing comma)z = ()            # Empty tupleprint(x[-1])      # Print the last item - 0#x[0] = 6          # Does not work! Tuples cannot be changed!w = list(x)       # Create equivalent list from a tuplew[0] = 6          # List copy can be changed print('x',x)print('w',w) ``` script_18_14.out `[ah@hobbes Documents]\$ python script_18_14.py 0('x', (9, 3, 1, 0))('w', [6, 3, 1, 0])[ah@hobbes Documents]\$ `

## Script 18.15

 script_18_15.py ```# Script 18.15x = {1,2,3,4,3,2,1}  # x is {1,2,3,4} - duplicates ignoredy = set([3,4,5,6])   # y created from a listprint(len(y))        # 4print(1 in y)        # False; 1 is not in yprint(x & y)         # {3,4} - intersection, items in bothprint(x | y)         # {1,2,3,4,5,6} - union, items in either ``` script_18_15.out `[ah@hobbes Documents]\$ python script_18_15.py 4Falseset([3, 4])set([1, 2, 3, 4, 5, 6])[ah@hobbes Documents]\$ `

## Script 18.16

 script_18_16.py ```# Script 18.16d = {"G":329.21, "C":289.18, "A":313.21, "T":314.19}print(d['A'])      # 313.21 -  value associated with 'A'print(len(d))      # 4 - number of key:value pairsprint(d.keys())    # Just keys 'G', 'A', 'T', 'C'print(d.values())  # Just values 329.21, 313.21, 314.19, 289.18 ``` script_18_16.out `[ah@hobbes Documents]\$ python script_18_16.py 313.214['A', 'C', 'T', 'G'][313.21, 289.18, 314.19, 329.21][ah@hobbes Documents]\$ `

## Script 18.17

 script_18_17.py ```# Script 18.17d = {"G":329.21, "C":289.18, "A":313.21, "T":314.19}d['T'] = 304.19   # Change the value of an existing itemd['U'] = 291.08   # Add a new key:value pairprint(len(d))     # 5 - dict is larger  del d['U']        # Delete a key and its value from the dictionary ``` script_18_17.out `[ah@hobbes Documents]\$ python script_18_17.py 5[ah@hobbes Documents]\$ `

## Script 18.18

 script_18_18.py ```# Script 18.18x = ['Mon', 'Tue', 'Wed'] # A list of stringsy = ['Fri', 'Sat', 'Sun'] # And anotherx.append('Thu')  # Add a single new item to endprint(x)x.extend(y)       # Extend with items from another collection print(x)x.sort()         # Sort contents alphabeticallyprint(x)x.remove('Sun')  # Remove an itemprint(x)x.index('Sat')   # Positional index of an itemprint(x) ``` script_18_18.out `[ah@hobbes Documents]\$ python script_18_18.py ['Mon', 'Tue', 'Wed', 'Thu']['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']['Fri', 'Mon', 'Sat', 'Sun', 'Thu', 'Tue', 'Wed']['Fri', 'Mon', 'Sat', 'Thu', 'Tue', 'Wed']['Fri', 'Mon', 'Sat', 'Thu', 'Tue', 'Wed'][ah@hobbes Documents]\$ `

## Script 18.19

 script_18_19.py ```# Script 18.19s = {'G', 'C', 'A', 'T'}   # A set with 4 stringst = {'N', 'R', 'Y'}print(s)s.add('U')       # Add a single item (if not present) print(s)s.update(t)      # Add any new items from another collectionprint(s) ``` script_18_19.out `[ah@hobbes Documents]\$ python script_18_19.py set(['A', 'C', 'T', 'G'])set(['A', 'C', 'U', 'T', 'G'])set(['A', 'C', 'G', 'N', 'R', 'U', 'T', 'Y'])[ah@hobbes Documents]\$`

## Script 18.20

 script_18_20.py ```# Script 18.20x = 3if x < -1 or x > 1:    # Run the next indented lines only when true       x *= 2                    print('Value was doubled')  print('Value is:', x)  # Always executed, not in indented block ``` script_18_20.out `[ah@hobbes Documents]\$ python script_18_20.py Value was doubled('Value is:', 6)[ah@hobbes Documents]\$ `

## Script 18.21

 script_18_21.py ```# Script 18.21x = 3print(x)if x > 0:    print('Positive')elif x < 0:            # Checked if first condition was false    print('Negative')else:                  # If all fails    print('Zero')x = -5print(x)if x > 0:    print('Positive')elif x < 0:            # Checked if first condition was false    print('Negative')else:                  # If all fails    print('Zero') ``` script_18_21.out `[ah@hobbes Documents]\$ python script_18_21.py 3Positive-5Negative[ah@hobbes Documents]\$ `

## Script 18.22

 script_18_22.py ```# Script 18.22x = 'abc'if x:              # Test innate truth of x    print('true')  # This prints; a non-empty string is trueelse:    print('false') ``` script_18_22.out `[ah@hobbes Documents]\$ python script_18_22.py true[ah@hobbes Documents]\$ `

## Script 18.23

 script_18_23.py ```# Script 18.23total = 0data = [1,4,9,25,36]for x in data:       # x is first 1, then 4, then 9 etc.    print(x)         # Current value of x in this cycle    total += x       # Add current value of x to total ``` script_18_23.out `[ah@hobbes Documents]\$ python script_18_23.py 1492536[ah@hobbes Documents]\$ `

## Script 18.24

 script_18_24.py ```# Script 18.24text = 'AGCAGTAGACGAACAT'     # String of charactersfor i, x in enumerate(text):  # Extract index and character value    print(i, x)               # Print index and value for each cycle ``` script_18_24.out `[ah@hobbes Documents]\$ python script_18_24.py (0, 'A')(1, 'G')(2, 'C')(3, 'A')(4, 'G')(5, 'T')(6, 'A')(7, 'G')(8, 'A')(9, 'C')(10, 'G')(11, 'A')(12, 'A')(13, 'C')(14, 'A')(15, 'T')[ah@hobbes Documents]\$ `

## Script 18.25

 script_18_25.py ```# Script 18.25x = 1while x < 1000:   # Repeat the indented block while this is true    print(x)     x *= 2        # Double the valueprint(x)  # 1024 - final value stopped the loop: not less than 1000 ``` script_18_25.out `[ah@hobbes Documents]\$ python script_18_25.py 12481632641282565121024[ah@hobbes Documents]\$ `

## Script 18.26

 script_18_26.py ```# Script 18.26t = 0data = [3, -1, 2, -5, None, 9, -2]for x in data:     if x < 0:       continue      # Skip the remainder of 'for' loop   elif x is None:       break         # Quit entirely        t += x * x        # Otherwise do a calculation   print(t)print('Final',t) ``` script_18_26.out `[ah@hobbes Documents]\$ python script_18_26.py 91394('Final', 94)[ah@hobbes Documents]\$  `

## Script 18.27

 script_18_27.py ```# Script 18.27squares = []for x in range(1, 10):    squares.append(x*x)  # 1, 4, 9, 16, 25, 36...    print('Option 1',squares)squares2 = []squares2 = [x*x for x in range(1,10)]print('Option 2',squares2) ``` script_18_27.out `[ah@hobbes Documents]\$ python script_18_27.py ('Option 1', [1])('Option 1', [1, 4])('Option 1', [1, 4, 9])('Option 1', [1, 4, 9, 16])('Option 1', [1, 4, 9, 16, 25])('Option 1', [1, 4, 9, 16, 25, 36])('Option 1', [1, 4, 9, 16, 25, 36, 49])('Option 1', [1, 4, 9, 16, 25, 36, 49, 64])('Option 1', [1, 4, 9, 16, 25, 36, 49, 64, 81])('Option 2', [1, 4, 9, 16, 25, 36, 49, 64, 81])[ah@hobbes Documents]\$ `

## Script 18.28

 script_18_28.py ```# Script 18.28x = 1y = 0try:                 # Run the following block and check for failure    w = x / yexcept ZeroDivisionError as err:        # y was zero   print('divided by zero, continuing') # warn, but otherwise ignoreexcept Exception as err:   # Other, unspecified problem    raise(err)             # Trigger the error, do not continue ``` script_18_28.out `[ah@hobbes Documents]\$ python script_18_28.py divided by zero, continuing[ah@hobbes Documents]\$ `

## Script 18.29

 script_18_29.py ```# Script 18.29def my_calc(x, y=1):     # x is mandatory, y defaults to 1 if not given    z = x * x - y * y    return za = my_calc(7)           # x is 7, y defaults to 1  b = my_calc(x=2, y=2)    # name both argumentsc = my_calc(y=9, x=-1)   # name arguments in a different orderd = my_calc(3, y=-2)     # unnamed arguments (x is 3) come firstprint(a, b, c, d) ``` script_18_29.out `[ah@hobbes Documents]\$ python script_18_29.py (48, 0, -80, 5)[ah@hobbes Documents]\$`

## Script 18.30

 script_18_30.py ```# Script 18.30class Person():         # Next indented block is in the class definition  def __init__(self, name, age):  # Values specified when object is made    self.name = name              # Link input values to the object    self.age = age  def get_first_name(self):       # A second, custom function    names = self.name.split()     # self refers to the run-time object    return names[0]               # Give back first wordp1 = Person('Lisa Simpson', 8)    # Make object of Person classp2 = Person('Bart Simpson', 10)   # Make anotherprint(p1.age, p2.age)             # Values linked to objects - 8, 10 print(p1.get_first_name())        # Run a linked function - gives 'Lisa' ``` script_18_30.out `[ah@hobbes Documents]\$ python script_18_30.py (8, 10)Lisa[ah@hobbes Documents]\$ `

## Script 18.31

 script_18_31.py ```# Script 18.31import math           # Import the inbuilt mathematics moduleprint(math.e)         # An attribute representing eprint(math.exp(2.0))  # Use the exponent (ex) function ``` script_18_31.out `[ah@hobbes Documents]\$ python script_18_31.py 2.718281828467.38905609893[ah@hobbes Documents]\$`

## Script 18.32

 script_18_32.py ```# Script 18.32import math as m      # Import as a different nameprint(m.log(2.0))     # Use the logarithm function ``` script_18_32.out `[ah@hobbes Documents]\$ python script_18_32.py 0.69314718056[ah@hobbes Documents]\$ `

## Script 18.33

 script_18_33.py ```# Script 18.33from math import sqrt, cos   # Import named module componentsx = sqrt(3/2)                # Use the square root functionprint(cos(x))                # Use the cosine function ``` script_18_33.out `[ah@hobbes Documents]\$ python script_18_33.py 0.540302305868[ah@hobbes Documents]\$ `

## Script 18.34

 script_18_34.py ```# Script 18.34import sysprint(sys.argv) # List of words typed at the command line after "python"print(sys.path) # Directory search path used to locate Python modules sys.exit()      # Quit the Python program ``` script_18_34.out `[ah@hobbes Documents]\$ python script_18_34.py ['script_18_34.py']['/home/ah/Documents', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/Numeric', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib64/python2.7/site-packages/wx-3.0-gtk3', '/usr/lib/python2.7/site-packages'][ah@hobbes Documents]\$ `

## Script 18.35

 script_18_35.py ```# Script 18.35import randomrandom.seed(3)                   # Set the random number seedprint(random.randint(1,10))      # A random integer from 1 to 10 inc.print(random.uniform(0.0, 2.0))  # A random float between 0.0 and 2.0data = [1,2,3,4,5]random.shuffle(data)             # Shuffle list order ``` script_18_35.out `[ah@hobbes Documents]\$ python script_18_35.py 31.08845845059[ah@hobbes Documents]\$ `

## Script 18.36

 script_18_36.py ```# Script 18.36import repattern = re.compile('\d+') # Make a regular expression object                            # (one or more digits)text = 'A 123 B 456'        # A string to look in match_obj = pattern.search(text)  # Match inside string  print(match_obj.group())          # '123' - matching substringprint(match_obj.start)            # 2 - position of matchtext_2 = pattern.sub(text, '**')  # Substitute all matches with '**'print(text_2)                     # 'A ** B **' hits = pattern.findall(text)      # List of matching sub-stringsprint(hits)                       # ['123', '456']) ``` script_18_36.out `[ah@hobbes Documents]\$ python script_18_36.py 123**['123', '456'][ah@hobbes Documents]\$`

## Script 18.37

 script_18_37.py ```# Script 18.37import sysprint(sys.path)                        # Current module search pathsys.path.append('./')                  # Directory with my_module.pyimport my_module                       # Import custom module (no .py)print(my_module.CONSTANT)       # Use variable from other Python fileprint(my_module.my_calc(9,-1))  # Use function from other Python file ``` script_18_37.out `[ah@hobbes Documents]\$ python script_18_37.py ['/home/ah/Documents', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/Numeric', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib64/python2.7/site-packages/wx-3.0-gtk3', '/usr/lib/python2.7/site-packages']1.0545718e-3498[ah@hobbes Documents]\$ `

## Script 18.38

 script_18_38.py ```# Script 18.38def my_calc(x, y):             # Example function    return x*x - 2*x*y - y*yif __name__ == '__main__':     # Is the code run as a main program?    print(my_calc(2,3))        # Run test code only when it is main    print(my_calc(-1,0))       # Module imports do not run this block ``` script_18_38.out `[ah@hobbes Documents]\$ python script_18_38.py -171[ah@hobbes Documents]\$`

## Script 18.39

 script_18_39.py `# Script 18.39path = 'data_file.txt'            # Location of data in file systemfile_obj = open(path, 'r')        # Create file object for readingdata = file_obj.read()            # Read all the data (as a string)print(data)` script_18_39.out `[ah@hobbes Documents]\$ python script_18_39.py Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.[ah@hobbes Documents]\$ `

## Script 18.40

 script_18_40.py ```# Script 18.1mass = 3.4volume = 1.8density = mass/volume       # Divisiondensity = round(density, 3) # Round to three decimal placesprint(density) ``` script_18_40.out `[ah@hobbes Documents]\$ python script_18_40.py ** First read **Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.** Second read **** Third read **Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.[ah@hobbes Documents]\$ `

## Script 18.41

 script_18_41.py ```# Script 18.41path = 'data.txt'           # Location of data in file systemfile_obj = open(path, 'r')  # Create file object for readingfor line in file_obj:       # Loop through all lines in file    print(line) ``` script_18_41.out `[ah@hobbes Documents]\$ python script_18_41.py chr        mb_size        proteins1        249.250        20122        243.199        12033        198.022        1040[ah@hobbes Documents]\$ `

## Script 18.42

 script_18_42.py ```# Script 18.42f_name = 'data.txt'total = 0with open(f_name) as file_obj:    file_obj.readline()         # Read first header line; not used    for line in file_obj:       # Go through each remaining line        data = line.split()     # Split line at whitespace into a list        size = float(data[1])   # Make a number from column [1]        total += size           # Increase total        print(size,total) ``` script_18_42.out `[ah@hobbes Documents]\$ python script_18_42.py (249.25, 249.25)(243.199, 492.449)(198.022, 690.471)[ah@hobbes Documents]\$ `

## Script 18.43

 script_18_43.py ```# Script 18.43import sys                 # Access the sys modulepy_script = sys.argv[0]    # 'script_18_43.py'   data_file = sys.argv[1]    # 'inputFile.txt'data = open(data_file).read()  # Make file object and read its dataprint(data_file)print(data) ``` script_18_43.out `[ah@hobbes Documents]\$ python script_18_43.py inputFile.txt inputFile.txt  SampleID Gender Weight Age Affected1       S1   Male   66.3  20     TRUE2       S2   Male   80.2  21    FALSE3       S3 Female   53.1  29     TRUE4       S4 Female   21.3  18     TRUE5       S5 Female   54.6  NA    FALSE6       S6 Female   78.1  23     TRUE[ah@hobbes Documents]\$ `

## Script 18.44

 script_18_44.py ```# Script 18.44from Bio import SeqIO    # Load BioPython module; must be installed file_name = "demoSequences.fasta"     # Location of datafile_obj = open(file_name, "rU")               # Universal read modefor protein in SeqIO.parse(file_obj, 'fasta'):  # Go through each entry  print(protein.id)                            # The ID of seq record  print(protein.seq)                           # The actual sequencefile_obj.close() ``` script_18_44.out ```[ah@hobbes Documents]\$ python script_18_44.py sp|P52907|CAPZA1_HUMANMADFDDRVSDEEKVRIAAKFITHAPPGEFNEVFNDVRLLLNNDNLLREGAAHAFAQYNMDQFTPVKIEGYEDQVLITEHGDLGNSRFLDPRNKISFKFDHLRKEASDPQPEEADGGLKSWRESCDSALRAYVKDHYSNGFCTVYAKTIDGQQTIIACIESHQFQPKNFWNGRWRSEWKFTITPPTAQVVGVLKIQVHYYEDGNVQLVSHKDVQDSLTVSNEAQTAKEFIKIIENAENEYQTAISENYQTMSDTTFKALRRQLPVTRTKIDWNKILSYKIGKEMQNAsp|Q9Y3C0|CCD53_HUMANMDEDGLPLMGSGIDLTKVPAIQQKRTVAFLNQFVVHTVQFLNRFSTVCEEKLADLSLRIQQIETTLNILDAKLSSIPGLDDVTVEVSPLNVTSVTNGAHPEATSEQPQQNSTQDSGLQESEVSAENILTVAKDPRYARYLKMVQVGVPVMAIRNKMISEGLDPDLLERPDAPVPDGESEKTVEESSDSESSFSD1V1A.pdb_AMLEVVTAGEPLVALVPQEPGHLRGKRLLEVYVGGAEVNVAVALARLGVKVGFVGRVGEDELGAMVEERLRAEGVDLTHFRRAPGFTGLYLREYLPLGQGRVFYYRKGSAGSALAPGAFDPDYLEGVRFLHLSGITPALSPEARAFSLWAMEEAKRRGVRVSLDVNYRQTLWSPEEARGFLERALPGVDLLFLSEEEAELLFGRVEEALRALSAPEVVLKRGAKG-AWAFVDGRRVEGSAFAVEAVDPVGAGDAFAAGYLAGAVWGLPVEERLRLANLLGASVAASRGDHEGAPYREDLEVLLKH.MGKDNYDVVVIGAVGVDTNVYLPGQDIDFDVEANFTTNIDYPGLAGSYTSRGFARLVDKVAVIAYIGEDHNGWYVKNELEGDGIDCTFFLDPVGTKRSINFMYKDGRRKNFYDGKASMEVKPDIDICKSILKKTNLAHFNIVNWTRYLLPVARELGVTISCDIQDITDLDDEYRQDFINYADVLFFSAVNFKDPVPLIKEFIKRKPDRIVICGMGDKGCALGTEQGIKFYKALDMLEPVVDTNGAGDGLAVGFLSSYFIDGFSLEDSILRGQIVARYTCTKKATSSELITGDKLNKYFEEMKESG[ah@hobbes Documents]\$ ```

## Script 18.45

 script_18_45.py ```# Script 18.45import pandasdata_set = pandas.read_csv('in.csv', sep='\t',                           header=None, names=['A','B','C'])for col in data_set:            # Loop through column names  for value in data_set[col]:   # Column names act as keys    print(col, value) ``` script_18_45.out ```[ah@hobbes Documents]\$ python script_18_45.py ('A', 'chr')('A', '1')('A', '2')('A', '3')('B', 'mb_size')('B', '249.250')('B', '243.199')('B', '198.022')('C', 'proteins')('C', '2012')('C', '1203')('C', '1040')[ah@hobbes Documents]\$ ```

## Script 18.46

 script_18_46.py ```# Script 18.46import sqlite3                  # Installed as standardconnection = sqlite3.connect('genome_example.sqlite')cursor = connection.cursor()    # Make a cursormake_table_smt = """ CREATE TABLE Genome (   build_id VARCHAR(12),  genus TEXT NOT NULL,   species TEXT NOT NULL,  strain TEXT,  num_chromsomes INT,  mb_length FLOAT,  pubmed_id VARCHAR(12),  PRIMARY KEY (build_id));"""cursor.execute(make_table_smt)  # Execute SQL string to make tablecursor.close()                  # The cursor is finishedconnection.commit()             # Commit the changes to the database ``` script_18_46.out `[ah@hobbes Documents]\$ ls -al genome_example.sqlite -rw-r--r--. 1 ah ah 3072 Jan  5 18:19 genome_example.sqlite[ah@hobbes Documents]\$`

## Script 18.47

 script_18_47.py ```# Script 18.47import sqlite3                  # Installed as standardconnection = sqlite3.connect('genome_example.sqlite')cursor = connection.cursor()    # A new cursor into the database# Create the INSERT statement using "\" to split over two linesadd_genome_smt = 'INSERT INTO Genome (build_id, genus, species, ' \                 'strain, num_chromsomes) VALUES (?, ?, ?, ?, ?)' mouse_data = ['GRCm38', 'Mus', 'musculus', 'C57BL/6', 22] # Data to addcursor.execute(add_genome_smt, mouse_data)                # Add one rowmulti_data = [['GRCh38', 'Homo', 'sapiens', None, 25],    # Data to add              ['GRCh37', 'Homo', 'sapiens', None, 25],              ['BDGP6', 'Drosophila', 'melanogaster', '2057', 7]]cursor.executemany(add_genome_smt, multi_data)            # Add many rowsset_size_smt = 'UPDATE Genome SET mb_length=? WHERE build_id=?'cursor.execute(set_size_smt, [142.573, 'BDGP6'])          # Alter a row cursor.execute(set_size_smt, [3482.010, 'GRCm38'])get_organism_smt = 'SELECT genus, species FROM Genome WHERE build_id=?'genus, species = cursor.execute(get_organism_smt, ['GRCh37']).fetchone()smt = 'SELECT build_id, genus, species, mb_length FROM Genome'for build, g, s, sz in cursor.execute(smt):  print(build, g, s, sz)   cursor.close() ``` script_18_47.out `[ah@hobbes Documents]\$ python script_18_47.py (u'GRCm38', u'Mus', u'musculus', 3482.01)(u'GRCh38', u'Homo', u'sapiens', None)(u'GRCh37', u'Homo', u'sapiens', None)(u'BDGP6', u'Drosophila', u'melanogaster', 142.573)[ah@hobbes Documents]\$ `

## Script 18.48

 script_18_48.py ```# Script 18.48from numpy import array       # Import the function to make ndarrays x = array([[1,2,3],[4,5,6]])  # Make array from list of listsprint(x.shape)                # (2,3) - rows x columnsprint(x.size)                 # 6 - elements in totalprint(x.ndim)                 # 2 - two axesprint(x.dtype)                # int - whole numbersy = array(x)                  # Copy an arrayy.reshape((3,2))              # [[1,2], [3,4], [5,6]] - Convert to 3 x 2 z = array((3,5,1,2), float)   # Make array forced as floating pointprint('y',y)print('z',z) ``` script_18_48.out `[ah@hobbes Documents]\$ python script_18_48.py (2, 3)62int64('y', array([[1, 2, 3],       [4, 5, 6]]))('z', array([ 3.,  5.,  1.,  2.]))[ah@hobbes Documents]\$ `

## Script 18.49

 script_18_49.py ```# Script 18.49from numpy import array # Import the function to make ndarrays x = array([5.1, 6.2, 7.3, 8.4, 9.5]) # Flat array of five float numbersx[2]         # 7.3                    - number at index 2x[1:4]       # array([6.2, 7.3, 8.4]) - slice range makes new arrayprint('x',x)y = array([[1,2,3], [4,5,6]]) # Two dimensional arrayy[1,2]       # 6                       - row one, column twoy[0]         # array([1, 2, 3])        - entire row zeroy[0,:]       # array([1, 2, 3])        - row zero, all columns (same)y[:,2]       # array([3, 6])           - all rows, column twoy[-1,:]      # array([4, 5, 6])        - last row, all columnsy[:,1:]      # array([[2, 3],[5, 6]])  - column one onwards y[1,0:2]     # array([4, 5])           - row one, first two columnsy[::-1,:]    # array([[4, 5, 6],[1, 2, 3]]) - reversed rowsy[:,(2,1,0)] # array([[3, 1, 2],[6, 4, 5]]) - new column orderprint('y',y) ``` script_18_49.out `[ah@hobbes Documents]\$ python script_18_49.py ('x', array([ 5.1,  6.2,  7.3,  8.4,  9.5]))('y', array([[1, 2, 3],       [4, 5, 6]]))[ah@hobbes Documents]\$ `

## Script 18.50

 script_18_50.py ```# Script 18.50import numpy as npa = np.zeros((2,3))          # 2 x 3 array full of 0.0b = np.ones((3,2), int)      # 3 x 2 array full of 1c = np.full((2,2), 7.0)      # 2 x 2 array full of 7.0d = np.identity(3)           # 3 x 3 identity (1 on diagonal 0 elsewhere) e = np.arange(1.0, 3.0, 0.5) # [1.0, 1.5, 2.0, 2.5]                             # From 1.0 up to <3.0 in steps of 0.5print('a',a)print('b',b)print('c',c)print('d',d)print('e',e)x = np.array([1.0, 2.0, 3.0])y = np.array([3.0, 4.0, 5.0])x + 2    # array([3.0, 4.0, 5.0]) - Add 2 to all valuesy / 2    # array([1.5, 2.0, 2.5]) - Divide all values by 2x + y    # array([4.0, 6.0, 8.0]) - i.e. 1+3, 2+4, 3+5 x * y    # array([3.0, 8.0, 15.0])    x - y    # array([-2.0, -2.0, -2.0])x / y    # array([0.33333333, 0.5, 0.6])angles  = np.array([30.0, 60.0, 90.0, 135.0])radians = np.radians(angles)cosines = np.cos(radians)     # array([0.866, 0.50, 0.0, -0.707])print('angles',angles)print('radians',radians)print('cosines',cosines)np.log([10.0, 2.71828, 1.0])  # array([2.302585, 1.0, 0.0])np.exp([2.302585, 1.0, 0.0])  # array([10.0, 2.71828, 1.0]) ``` script_18_50.out `[ah@hobbes Documents]\$ python script_18_50.py ('a', array([[ 0.,  0.,  0.],       [ 0.,  0.,  0.]]))('b', array([[1, 1],       [1, 1],       [1, 1]]))('c', array([[ 7.,  7.],       [ 7.,  7.]]))('d', array([[ 1.,  0.,  0.],       [ 0.,  1.,  0.],       [ 0.,  0.,  1.]]))('e', array([ 1. ,  1.5,  2. ,  2.5]))('angles', array([  30.,   60.,   90.,  135.]))('radians', array([ 0.52359878,  1.04719755,  1.57079633,  2.35619449]))('cosines', array([  8.66025404e-01,   5.00000000e-01,   6.12323400e-17,        -7.07106781e-01]))[ah@hobbes Documents]\$ `

## Script 18.51

 script_18_51.py ```# Script 18.51from numpy import arange, exp, fft, random, pil = 0.04w = 0.1t = arange(0.0, 100.0, 1.0)       # A range of time valuesx = exp(2j*pi*w*t) * exp(-l*t)    # Wave equation using complex numbersy = fft.fft(x)                    # Fast Fourier transform array of wavea = random.uniform(0.0, 5.0, 100) # Uniform sample 100 values in [0, 5)b = random.normal(0.0, 2.5, 100)  # Sample normal distribution                                  # Mean 0.0, standard deviation 2.5print(x[0],y[0])print(a[0],b[0]) ``` script_18_51.out `[ah@hobbes Documents]\$ python script_18_51.py ((1+0j), (0.59324391476340854+1.5043545361088564j))(1.7221136377293216, 3.4608000427150243)[ah@hobbes Documents]\$ `

## Script 18.52

 script_18_52.py ```# Script 18.52from matplotlib import pyplotfrom numpy import arange, random  x_vals = arange(1000)y_vals = random.poisson(5, 1000)pyplot.scatter(x_vals, y_vals, s=4, marker='*') # Make scatter plotpyplot.savefig("script_18_52.png", dpi=300)     # Save graph as an imagepyplot.show()                                   # Show on screen ``` script_18_52.out `[ah@hobbes Documents]\$ python script_18_52.py [ah@hobbes Documents]\$ ls -al script_18_52.png -rw-rw-r--. 1 ah ah 78409 Jan  5 22:25 script_18_52.png[ah@hobbes Documents]\$ ` script_18_52.png

## Script 18.53

 script_18_53.py ```# Script 18.53from scipy import ndimageimg_file = 'my_image.png'pixmap = ndimage.imread(img_file)     # Read image data as arrayheight, width, channels = pixmap.shape           red_channel   = pixmap[:,:,0]         # Color channels are last axisgreen_channel = pixmap[:,:,1]print(red_channel) ``` script_18_53.out `[ah@hobbes Documents]\$ python script_18_53.py [[ 0  0  0 ...,  0  0  0] [ 0  0  0 ...,  0  0  0] [ 3  2 14 ...,  8 17 23] ...,  [18 18  7 ..., 18  6  1] [10  9  9 ..., 19  9  7] [11  1  1 ...,  8  1  3]][ah@hobbes Documents]\$`

## Script 18.54

 script_18_54.py ```# Script 18.54def rev_complement(seq):       # Define function name and input argument  rev_map = {'A':'T','T':'A',  # DNA letter mapping dictionary             'G':'C','C':'G'}  rc_seq = ''                  # Rev. comp. string is initially empty  rev_comp = ''  for x in seq:                # Go through each seq letter in turn    y = rev_map[x]             # look up comp. letter in dictionary     rev_comp = y + rev_comp    # Rev. comp. string grows                               # New letter added to _beginning_    return rev_comp              # Rev. comp string sent as output f_seq = 'AGCATAAGAATAGCAGCAGCGCGA'  # A test DNA one-letter sequencer_seq = rev_complement(f_seq)       # Run function on test sequenceprint(f_seq, r_seq)                 # Print original and rev. complement ``` script_18_54.out `[ah@hobbes Documents]\$ python script_18_54.py ('AGCATAAGAATAGCAGCAGCGCGA', 'TCGCGCTGCTGCTATTCTTATGCT')[ah@hobbes Documents]\$ `

## Script 18.55

 script_18_55.py ```# Script 18.55def calc_seq_ident(seq_a, seq_b):  # Define func. taking two input args  n = min(len(seq_a), len(seq_b))  # Find minimum input seq. length  count = 0.0                      # Starting identity count is zero  for i in range(n):          # Loop through valid position indices    a = seq_a[i]              # Letter at index i for first seq    b = seq_b[i]              # Letter at index i for second seq    if a == b and a != '-':   # Test if letters are same and not a gap      count += 1.0            # .. if they are increase count by one  return 100.0 * count/n      # Calc. and send back identity as % total   seq1 = 'ALIGDPVENTS'seq2 = 'ALIGN-MENTS'x = calc_seq_ident(seq1, seq2) # Run on some test sequencesprint('Seq identity:', x, '%') # x is 72.7 ``` script_18_55.out `[ah@hobbes Documents]\$ python script_18_55.py ('Seq identity:', 72.72727272727273, '%')[ah@hobbes Documents]\$ `

## Script 18.56

 script_18_56.py ```# Script 18.56def download_db_id(db_id, url, file_name=None): # Save file is optional  #from urllib.request import urlopen    # python 3.0 Import urlopen() function    from urllib2 import urlopen            # python 2.7 Import urlopen() function    response = urlopen(url.format(db_id))  # Makes object to handle link  data = response.read()                 # Read data from URL  data = data.decode('utf-8')            # Interpret data as plain text    if file_name:                       # If save file was specified...    file_obj = open(file_name, 'w')   # Create file object for writing    file_obj.write(data)              # Save all data to file    file_obj.close()  return data                         # Hand back data from functionUP_FASTA_URL = 'http://www.uniprot.org/uniprot/{}.fasta'PDB_URL = 'http://www.rcsb.org/pdb/cgi/export.cgi/' \          '{}.pdb?format=PDB&compression=None'download_db_id('1AFO', PDB_URL, '1AF0.pdb')   # Save .pdb filedata = download_db_id('P02788', UP_FASTA_URL) # Get FASTA data, no save print(data)                                   # Show downloaded data ``` script_18_56.out `[ah@hobbes Documents]\$ python script_18_56.py >sp|P02788|TRFL_HUMAN Lactotransferrin OS=Homo sapiens GN=LTF PE=1 SV=6MKLVFLVLLFLGALGLCLAGRRRSVQWCAVSQPEATKCFQWQRNMRKVRGPPVSCIKRDSPIQCIQAIAENRADAVTLDGGFIYEAGLAPYKLRPVAAEVYGTERQPRTHYYAVAVVKKGGSFQLNELQGLKSCHTGLRRTAGWNVPIGTLRPFLNWTGPPEPIEAAVARFFSASCVPGADKGQFPNLCRLCAGTGENKCAFSSQEPYFSYSGAFKCLRDGAGDVAFIRESTVFEDLSDEAERDEYELLCPDNTRKPVDKFKDCHLARVPSHAVVARSVNGKEDAIWNLLRQAQEKFGKDKSPKFQLFGSPSGQKDLLFKDSAIGFSRVPPRIDSGLYLGSGYFTAIQNLRKSEEEVAARRARVVWCAVGEQELRKCNQWSGLSEGSVTCSSASTTEDCIALVLKGEADAMSLDGGYVYTAGKCGLVPVLAENYKSQQSSDPDPNCVDRPVEGYLAVAVVRRSDTSLTWNSVKGKKSCHTAVDRTAGWNIPMGLLFNQTGSCKFDEYFSQSCAPGSDPRSNLCALCIGDEQGENKCVPNSNERYYGYTGAFRCLAENAGDVAFVKDVTVLQNTDGNNNEAWAKDLKLADFALLCLDGKRKPVTEARSCHLAMAPNHAVVSRMDKVERLKQVLLHQQAKFGRNGSDCPDKFCLFQSETKNLLFNDNTECLARLHGKTTYEKYLGPQYVAGITNLKKCSTSPLLEACEFLRK[ah@hobbes Documents]\$ ls -al 1AF0.pdb -rw-rw-r--. 1 ah ah 2182302 Jan  5 22:39 1AF0.pdb[ah@hobbes Documents]\$ `

## Script 18.57

 script_18_57.py ```# Script 18.57def download_db_id(db_id, url, file_name=None): # Save file is optional  #from urllib.request import urlopen    # python 3.0 Import urlopen() function    from urllib2 import urlopen            # python 2.7 Import urlopen() function    response = urlopen(url.format(db_id))  # Makes object to handle link  data = response.read()                 # Read data from URL  data = data.decode('utf-8')            # Interpret data as plain text    if file_name:                       # If save file was specified...    file_obj = open(file_name, 'w')   # Create file object for writing    file_obj.write(data)              # Save all data to file    file_obj.close()  return data                         # Hand back data from functiondef get_uniprot_seq_record(db_id):   import os   from Bio import SeqIO   UNIPROT_FASTA_URL = 'http://www.uniprot.org/uniprot/{}.fasta'   file_name = db_id + '.fasta'  # Add extension to ID to make file name      if not os.path.exists(file_name):     download_db_id(db_id, UNIPROT_FASTA_URL, file_name)      file_obj = open(file_name)    # Open file for reading      for seq_record in SeqIO.parse(file_obj, 'fasta'):     return seq_record           # Give back first record encounteredsr = get_uniprot_seq_record('P18754')print(sr.id, sr.seq) ``` script_18_57.out `[ah@hobbes Documents]\$ python script_18_57.py ('sp|P18754|RCC1_HUMAN', Seq('MSPKRIAKRRSPPADAIPKSKKVKVSHRSHSTEPGLVLTLGQGDVGQLGLGENV...EQS', SingleLetterAlphabet()))[ah@hobbes Documents]\$ ls -al P18754.fasta -rw-rw-r--. 1 ah ah 522 Jan  5 22:48 P18754.fasta[ah@hobbes Documents]\$ `

## Script 18.58

 script_18_58.py ```# Script 18.58def download_db_id(db_id, url, file_name=None): # Save file is optional  #from urllib.request import urlopen    # python 3.0 Import urlopen() function    from urllib2 import urlopen            # python 2.7 Import urlopen() function    response = urlopen(url.format(db_id))  # Makes object to handle link  data = response.read()                 # Read data from URL  data = data.decode('utf-8')            # Interpret data as plain text    if file_name:                       # If save file was specified...    file_obj = open(file_name, 'w')   # Create file object for writing    file_obj.write(data)              # Save all data to file    file_obj.close()  return data                         # Hand back data from functiondef get_uniprot_seq_record(db_id):   import os   from Bio import SeqIO   UNIPROT_FASTA_URL = 'http://www.uniprot.org/uniprot/{}.fasta'   file_name = db_id + '.fasta'  # Add extension to ID to make file name      if not os.path.exists(file_name):     download_db_id(db_id, UNIPROT_FASTA_URL, file_name)      file_obj = open(file_name)    # Open file for reading      for seq_record in SeqIO.parse(file_obj, 'fasta'):     return seq_record           # Give back first record encountereddef blast_search(seq, database, program,          # Mandatory input                 e_cutoff=1e-10, cpu_cores=None): # Optional parameters    from subprocess import Popen, PIPE # imports for running external jobs     if not cpu_cores:                  # CPU count parameter not specified    import multiprocessing       cpu_cores = multiprocessing.cpu_count()  # Use all avail. CPU cores    # Make a list of all options and params to run BLAST  # Output format 6 means table output  # Numeric params. are converted to strings with str()    cmd_args = [program, '-outfmt', '6', '-db', database,              '-num_threads', str(cpu_cores),              '-evalue', str(e_cutoff)]      try:      # Try to create a process to run BLAST, but catch any error    proc = Popen(cmd_args, stdin=PIPE, stdout=PIPE)      except Exception as err:         # Something went wrong    print('BLAST command failed')    print('Command used: "%s"' % ' '.join(cmd_args))    print(err)    return []     # Send back an empty list if BLAST failed    template = '>UserQuery\n{}\n'          # FASTA format string template  in_data = template.format(seq.upper()) # Insert the seq in template  in_data.encode('utf-8')                # Removes any unicode chars  out_data, err_data = proc.communicate(in_data) # Send input get output    results_str = out_data.decode('utf-8')    # Interpret output as text    results = []                              # Start with empty results  for line in results_str.split('\n'):      # Split output into lines    data = line.split()                     # Split line into sub-list        if data:                                # If line was not blank       for col in range(3,10):                      data[col] = int(data[col])          # Some columns are integers      for col in (2,10,11):          data[col] = float(data[col])        # Other columns are float       results.append(data)                  # Add row of data to results    return results                            # Send back resultsdb = '/home/ah/database/blast/nrfilt'       # BLAST database: must existsr = get_uniprot_seq_record('P18754')print(sr.seq)blast_hits = blast_search(sr.seq, db, 'blastp') # Run on seq used beforefor hit in blast_hits:              # Go through each row in output list  print(hit[1], hit[10],  hit[11])  # Print some data columns ``` script_18_58.out `[ah@hobbes Documents]\$ python script_18_58.py MSPKRIAKRRSPPADAIPKSKKVKVSHRSHSTEPGLVLTLGQGDVGQLGLGENVMERKKPALVSIPEDVVQAEAGGMHTVCLSKSGQVYSFGCNDEGALGRDTSVEGSEMVPGKVELQEKVVQVSAGDSHTAALTDDGRVFLWGSFRDNNGVIGLLEPMKKSMVPVQVQLDVPVVKVASGNDHLVMLTADGDLYTLGCGEQGQLGRVPELFANRGGRQGLERLLVPKCVMLKSRGSRGHVRFQDAFCGAYFTFAISHEGHVYGFGLSNYHQLGTPGTESCFIPQNLTSFKNSTKSWVGFSGGQHHTVCMDSEGKAYSLGRAEYGRLGLGEGAEEKSIPTLISRLPAVSSVACGASVGYAVTKDGRVFAWGMGTNYQLGTGQDEDAWSPVEMMGKQLENRVVLSVSSGGQHTVLLVKDKEQS(u'gi|4502801|ref|NP_001260.1|', 0.0, 863.0)(u'gi|327199310|ref|NP_001192134.1|', 0.0, 861.0)(u'gi|119628100|gb|EAX07695.1|', 0.0, 860.0)(u'gi|397515820|ref|XP_003828141.1|', 0.0, 858.0)(u'gi|402853628|ref|XP_003891494.1|', 0.0, 856.0)(u'gi|119628097|gb|EAX07692.1|', 0.0, 854.0)(u'gi|114796646|ref|NP_001041660.1|', 0.0, 853.0)(u'gi|402853626|ref|XP_003891493.1|', 0.0, 853.0)(u'gi|544407822|ref|XP_005544289.1|', 0.0, 852.0)(u'gi|119628099|gb|EAX07694.1|', 0.0, 850.0)(u'gi|397515824|ref|XP_003828143.1|', 0.0, 849.0)(u'gi|4389390|pdb|1A12|A', 0.0, 847.0)(u'gi|114796644|ref|NP_001041659.1|', 0.0, 846.0)(u'gi|332245213|ref|XP_003271757.1|', 0.0, 844.0)(u'gi|119628095|gb|EAX07690.1|', 0.0, 844.0)(u'gi|402853634|ref|XP_003891497.1|', 0.0, 844.0)(u'gi|327199318|ref|NP_001192137.1|', 0.0, 843.0)(u'gi|544407820|ref|XP_005544288.1|', 0.0, 843.0)(u'gi|397515822|ref|XP_003828142.1|', 0.0, 843.0)(u'gi|327199324|ref|NP_001192140.1|', 0.0, 843.0)(u'gi|478503167|ref|XP_004425891.1|', 0.0, 842.0)(u'gi|410966597|ref|XP_003989817.1|', 0.0, 840.0)(u'gi|507575528|ref|XP_004671309.1|', 0.0, 838.0)(u'gi|560898119|ref|XP_006175513.1|', 0.0, 837.0)(u'gi|544407818|ref|XP_005544287.1|', 0.0, 837.0)(u'gi|560948741|ref|XP_006196937.1|', 0.0, 837.0)(u'gi|545490248|ref|XP_005617300.1|', 0.0, 837.0)(u'gi|511832853|ref|XP_004741077.1|', 0.0, 837.0)(u'gi|402853632|ref|XP_003891496.1|', 0.0, 837.0)(u'gi|327315350|ref|NP_001192141.1|', 0.0, 836.0)(u'gi|472350030|ref|XP_004394708.1|', 0.0, 835.0)(u'gi|301755120|ref|XP_002913415.1|', 0.0, 835.0)(u'gi|327315353|ref|NP_001192142.1|', 0.0, 833.0)(u'gi|355764243|gb|EHH62271.1|', 0.0, 833.0)(u'gi|545834593|ref|XP_005665197.1|', 0.0, 832.0)(u'gi|403308348|ref|XP_003944627.1|', 0.0, 832.0)(u'gi|478503165|ref|XP_004425890.1|', 0.0, 831.0)(u'gi|478503163|ref|XP_004425889.1|', 0.0, 830.0)(u'gi|548455340|ref|XP_005676827.1|', 0.0, 830.0)(u'gi|545207509|ref|XP_005607066.1|', 0.0, 829.0)(u'gi|426221838|ref|XP_004005113.1|', 0.0, 828.0)(u'gi|157427942|ref|NP_001098878.1|', 0.0, 827.0)(u'gi|545490246|ref|XP_005617299.1|', 0.0, 827.0)(u'gi|560898117|ref|XP_006175512.1|', 0.0, 826.0)(u'gi|471378580|ref|XP_004377091.1|', 0.0, 826.0)(u'gi|465990666|ref|XP_004266634.1|', 0.0, 826.0)(u'gi|348571060|ref|XP_003471314.1|', 0.0, 826.0)(u'gi|560948739|ref|XP_006196936.1|', 0.0, 825.0)(u'gi|395856826|ref|XP_003800819.1|', 0.0, 825.0)(u'gi|507935083|ref|XP_004678748.1|', 0.0, 824.0)(u'gi|478503171|ref|XP_004425893.1|', 0.0, 824.0)(u'gi|410966599|ref|XP_003989818.1|', 0.0, 824.0)(u'gi|545207507|ref|XP_005607065.1|', 0.0, 824.0)(u'gi|472350028|ref|XP_004394707.1|', 0.0, 824.0)(u'gi|504164105|ref|XP_004592234.1|', 0.0, 823.0)(u'gi|507694534|ref|XP_004642946.1|', 0.0, 823.0)(u'gi|344287488|ref|XP_003415485.1|', 0.0, 823.0)(u'gi|14278207|pdb|1I2M|B', 0.0, 822.0)(u'gi|560898115|ref|XP_006175511.1|', 0.0, 821.0)(u'gi|511832847|ref|XP_004741074.1|', 0.0, 821.0)(u'gi|511832845|ref|XP_004741073.1|', 0.0, 821.0)(u'gi|132171|sp|P23800.1|RCC1_MESAU', 0.0, 820.0)(u'gi|560948737|ref|XP_006196935.1|', 0.0, 820.0)(u'gi|472350026|ref|XP_004394706.1|', 0.0, 820.0)(u'gi|335290816|ref|XP_003356291.1|', 0.0, 818.0)(u'gi|533162396|ref|XP_005395013.1|', 0.0, 818.0)(u'gi|548455338|ref|XP_005676826.1|', 0.0, 817.0)(u'gi|507655597|ref|XP_004705111.1|', 0.0, 816.0)(u'gi|426221844|ref|XP_004005116.1|', 0.0, 816.0)(u'gi|512815930|ref|XP_004878182.1|', 0.0, 816.0)(u'gi|431891177|gb|ELK02054.1|', 0.0, 815.0)(u'gi|344245054|gb|EGW01158.1|', 0.0, 815.0)(u'gi|537237358|gb|ERE85933.1|', 0.0, 815.0)(u'gi|488527862|ref|XP_004455742.1|', 0.0, 815.0)(u'gi|528941068|ref|XP_005203264.1|', 0.0, 815.0)(u'gi|507935080|ref|XP_004678747.1|', 0.0, 815.0)(u'gi|354472395|ref|XP_003498425.1|', 0.0, 815.0)(u'gi|505780544|ref|XP_004603574.1|', 0.0, 814.0)(u'gi|148698161|gb|EDL30108.1|', 0.0, 814.0)(u'gi|528768277|gb|EPY87936.1|', 0.0, 814.0)(u'gi|514463590|ref|XP_005004313.1|', 0.0, 813.0)(u'gi|471378578|ref|XP_004377090.1|', 0.0, 813.0)(u'gi|465990664|ref|XP_004266633.1|', 0.0, 813.0)(u'gi|19527092|ref|NP_598639.1|', 0.0, 813.0)(u'gi|556729262|ref|XP_005960349.1|', 0.0, 812.0)(u'gi|148698159|gb|EDL30106.1|', 0.0, 812.0)(u'gi|504164103|ref|XP_004592233.1|', 0.0, 812.0)(u'gi|74213820|dbj|BAE29345.1|', 0.0, 812.0)(u'gi|344287492|ref|XP_003415487.1|', 0.0, 812.0)(u'gi|532063752|ref|XP_005317746.1|', 0.0, 811.0)(u'gi|74139909|dbj|BAE31793.1|', 0.0, 811.0)(u'gi|555967883|ref|XP_005895986.1|', 0.0, 811.0)(u'gi|532063754|ref|XP_005317747.1|', 0.0, 811.0)(u'gi|507694530|ref|XP_004642945.1|', 0.0, 810.0)(u'gi|426221842|ref|XP_004005115.1|', 0.0, 810.0)(u'gi|189491881|ref|NP_001121661.1|', 0.0, 810.0)(u'gi|471378576|ref|XP_004377089.1|', 0.0, 809.0)(u'gi|395856828|ref|XP_003800820.1|', 0.0, 809.0)(u'gi|528941060|ref|XP_005203260.1|', 0.0, 809.0)(u'gi|507935078|ref|XP_004678746.1|', 0.0, 808.0)(u'gi|507935075|ref|XP_004678745.1|', 0.0, 807.0)(u'gi|471378574|ref|XP_004377088.1|', 0.0, 807.0)(u'gi|465990661|ref|XP_004266632.1|', 0.0, 807.0)(u'gi|504164100|ref|XP_004592232.1|', 0.0, 806.0)(u'gi|344287490|ref|XP_003415486.1|', 0.0, 806.0)(u'gi|533162394|ref|XP_005395012.1|', 0.0, 805.0)(u'gi|351695932|gb|EHA98850.1|', 0.0, 805.0)(u'gi|354472397|ref|XP_003498426.1|', 0.0, 805.0)(u'gi|308153287|ref|NP_001184011.1|', 0.0, 804.0)(u'gi|532063756|ref|XP_005317748.1|', 0.0, 804.0)(u'gi|512815922|ref|XP_004878180.1|', 0.0, 804.0)(u'gi|532063750|ref|XP_005317745.1|', 0.0, 803.0)(u'gi|488527860|ref|XP_004455741.1|', 0.0, 803.0)(u'gi|512815918|ref|XP_004878179.1|', 0.0, 803.0)(u'gi|512815914|ref|XP_004878178.1|', 0.0, 803.0)(u'gi|148698160|gb|EDL30107.1|', 0.0, 803.0)(u'gi|505780540|ref|XP_004603573.1|', 0.0, 802.0)(u'gi|354472399|ref|XP_003498427.1|', 0.0, 798.0)(u'gi|488527856|ref|XP_004455739.1|', 0.0, 797.0)(u'gi|505780537|ref|XP_004603572.1|', 0.0, 797.0)(u'gi|505780532|ref|XP_004603571.1|', 0.0, 796.0)(u'gi|558163174|ref|XP_006097311.1|', 0.0, 795.0)(u'gi|532018884|ref|XP_005353185.1|', 0.0, 794.0)(u'gi|545490250|ref|XP_005617301.1|', 0.0, 791.0)(u'gi|281351636|gb|EFB27220.1|', 0.0, 791.0)(u'gi|488527858|ref|XP_004455740.1|', 0.0, 779.0)(u'gi|558163170|ref|XP_006097310.1|', 0.0, 776.0)(u'gi|488527854|ref|XP_004455738.1|', 0.0, 772.0)(u'gi|521031443|gb|EPQ13229.1|', 0.0, 767.0)(u'gi|354475663|ref|XP_003500047.1|', 0.0, 761.0)(u'gi|395730887|ref|XP_003780622.1|', 0.0, 754.0)(u'gi|470628969|ref|XP_004321307.1|', 0.0, 741.0)(u'gi|395521930|ref|XP_003765067.1|', 0.0, 740.0)(u'gi|327199320|ref|NP_001192138.1|', 0.0, 736.0)(u'gi|545834597|ref|XP_005665199.1|', 0.0, 729.0)(u'gi|533162398|ref|XP_005395014.1|', 0.0, 729.0)(u'gi|558153057|ref|XP_006121129.1|', 0.0, 721.0)(u'gi|558153052|ref|XP_006121128.1|', 0.0, 721.0)(u'gi|488527864|ref|XP_004455743.1|', 0.0, 718.0)(u'gi|465966594|gb|EMP31100.1|', 0.0, 715.0)(u'gi|557317496|ref|XP_006031961.1|', 0.0, 709.0)(u'gi|529437161|ref|XP_005238344.1|', 0.0, 706.0)(u'gi|326932912|ref|XP_003212555.1|', 0.0, 696.0)(u'gi|525022195|ref|XP_005058591.1|', 0.0, 695.0)(u'gi|525022197|ref|XP_005058592.1|', 0.0, 694.0)(u'gi|363742391|ref|XP_427366.3|', 0.0, 694.0)(u'gi|524959499|ref|XP_005079698.1|', 0.0, 693.0)(u'gi|543277880|ref|XP_005427132.1|', 0.0, 691.0)(u'gi|543372998|ref|XP_005529718.1|', 0.0, 689.0)(u'gi|542181352|ref|XP_005496408.1|', 0.0, 687.0)(u'gi|327315362|ref|NP_001192146.1|', 0.0, 684.0)(u'gi|543718943|ref|XP_005500705.1|', 0.0, 683.0)(u'gi|62859419|ref|NP_001016101.1|', 0.0, 664.0)(u'gi|115528650|gb|AAI24849.1|', 0.0, 664.0)(u'gi|171846900|gb|AAI61455.1|', 0.0, 663.0)(u'gi|47682592|gb|AAH70867.1|', 0.0, 662.0)(u'gi|48735020|gb|AAH72030.1|', 0.0, 661.0)(u'gi|147900380|ref|NP_001085384.1|', 0.0, 660.0)(u'gi|2134146|pir||I51558', 0.0, 658.0)(u'gi|148227322|ref|NP_001081229.1|', 0.0, 657.0)(u'gi|55742589|ref|NP_998343.1|', 0.0, 637.0)(u'gi|291190112|ref|NP_001167184.1|', 0.0, 629.0)(u'gi|348529242|ref|XP_003452123.1|', 0.0, 620.0)(u'gi|542238103|ref|XP_005456766.1|', 0.0, 615.0)(u'gi|499018444|ref|XP_004560016.1|', 0.0, 610.0)(u'gi|530584815|ref|XP_005286204.1|', 0.0, 609.0)(u'gi|548435391|ref|XP_005744531.1|', 0.0, 607.0)(u'gi|554870555|ref|XP_005945248.1|', 0.0, 607.0)(u'gi|551493873|ref|XP_005798503.1|', 0.0, 604.0)(u'gi|410905257|ref|XP_003966108.1|', 0.0, 604.0)(u'gi|556961668|ref|XP_005991115.1|', 0.0, 596.0)(u'gi|47220834|emb|CAG00041.1|', 0.0, 586.0)(u'gi|537236894|gb|ERE85794.1|', 0.0, 582.0)(u'gi|527248394|ref|XP_005142516.1|', 0.0, 573.0)(u'gi|426328672|ref|XP_004025374.1|', 0.0, 566.0)(u'gi|327289459|ref|XP_003229442.1|', 0.0, 563.0)(u'gi|297282712|ref|XP_002808329.1|', 0.0, 558.0)(u'gi|512815934|ref|XP_004878183.1|', 0.0, 558.0)(u'gi|355562421|gb|EHH19015.1|', 1e-159, 466.0)(u'gi|556103713|gb|ESO92365.1|', 6e-148, 438.0)(u'gi|524898165|ref|XP_005105535.1|', 3e-147, 437.0)(u'gi|260831025|ref|XP_002610460.1|', 2e-142, 432.0)(u'gi|521035504|gb|EPQ17286.1|', 8e-142, 420.0)(u'gi|224167221|ref|XP_002198843.1|', 7e-141, 414.0)(u'gi|449267782|gb|EMC78684.1|', 4e-140, 412.0)(u'gi|354475661|ref|XP_003500046.1|', 2e-139, 409.0)(u'gi|432951688|ref|XP_004084886.1|', 9e-139, 409.0)(u'gi|432951688|ref|XP_004084886.1|', 2e-14, 81.6)(u'gi|146332865|gb|ABQ22938.1|', 3e-136, 400.0)(u'gi|390345132|ref|XP_001186731.2|', 4e-136, 408.0)(u'gi|405976795|gb|EKC41279.1|', 2e-135, 407.0)(u'gi|443686928|gb|ELT90046.1|', 1e-133, 403.0)(u'gi|321478568|gb|EFX89525.1|', 1e-132, 399.0)(u'gi|345492280|ref|XP_001603331.2|', 2e-126, 383.0)(u'gi|327315376|ref|NP_001192151.1|', 3e-126, 385.0)(u'gi|221108961|ref|XP_002157481.1|', 1e-122, 374.0)(u'gi|241750184|ref|XP_002405888.1|', 3e-122, 372.0)(u'gi|332031143|gb|EGI70720.1|', 5e-122, 372.0)(u'gi|322786159|gb|EFZ12764.1|', 7e-122, 374.0)(u'gi|307173133|gb|EFN64233.1|', 2e-121, 373.0)(u'gi|383865068|ref|XP_003707997.1|', 2e-120, 370.0)(u'gi|312374979|gb|EFR22435.1|', 8e-120, 367.0)(u'gi|327315378|ref|NP_001192152.1|', 9e-119, 364.0)(u'gi|380022175|ref|XP_003694928.1|', 1e-118, 363.0)(u'gi|156361074|ref|XP_001625345.1|', 2e-118, 363.0)(u'gi|402880198|ref|XP_003903699.1|', 8e-118, 357.0)(u'gi|402880198|ref|XP_003903699.1|', 3e-64, 219.0)(u'gi|328792337|ref|XP_003251711.1|', 2e-117, 362.0)(u'gi|350419275|ref|XP_003492127.1|', 2e-115, 357.0)(u'gi|340709117|ref|XP_003393160.1|', 9e-115, 355.0)(u'gi|478250372|gb|ENN70867.1|', 3e-114, 352.0)(u'gi|189235771|ref|XP_970173.2|', 2e-112, 348.0)(u'gi|260795414|ref|XP_002592700.1|', 6e-112, 345.0)(u'gi|196008365|ref|XP_002114048.1|', 2e-109, 340.0)(u'gi|498964755|ref|XP_004525461.1|', 2e-109, 342.0)(u'gi|307213459|gb|EFN88881.1|', 2e-109, 338.0)(u'gi|170053955|ref|XP_001862909.1|', 8e-109, 340.0)(u'gi|157116974|ref|XP_001652914.1|', 5e-108, 339.0)(u'gi|347970520|ref|XP_310274.7|', 2e-107, 338.0)(u'gi|555703651|gb|ESO06884.1|', 3e-105, 328.0)(u'gi|557782445|ref|XP_005190835.1|', 9e-105, 331.0)(u'gi|313213144|emb|CBY37001.1|', 2e-102, 326.0)(u'gi|347970522|ref|XP_003436591.1|', 6e-102, 322.0)(u'gi|512893358|ref|XP_004923161.1|', 7e-100, 319.0)(u'gi|195020061|ref|XP_001985112.1|', 4e-98, 313.0)(u'gi|195127239|ref|XP_002008076.1|', 8e-98, 312.0)(u'gi|195376815|ref|XP_002047188.1|', 1e-97, 312.0)(u'gi|195492131|ref|XP_002093858.1|', 9e-96, 308.0)(u'gi|194867327|ref|XP_001972047.1|', 1e-95, 307.0)(u'gi|304445931|pdb|3MVD|K', 1e-95, 304.0)(u'gi|194750136|ref|XP_001957486.1|', 1e-95, 306.0)(u'gi|195337773|ref|XP_002035500.1|', 1e-94, 305.0)(u'gi|195588086|ref|XP_002083789.1|', 2e-94, 306.0)(u'gi|7660|emb|CAA41417.1|', 2e-94, 305.0)(u'gi|18859625|ref|NP_523943.1|', 2e-94, 305.0)(u'gi|340370015|ref|XP_003383542.1|', 2e-94, 301.0)(u'gi|195431549|ref|XP_002063799.1|', 2e-93, 300.0)(u'gi|195428731|ref|XP_002062419.1|', 7e-93, 299.0)(u'gi|555937498|emb|CDJ14619.1|', 2e-92, 303.0)(u'gi|344242290|gb|EGV98393.1|', 5e-92, 286.0)(u'gi|198455071|ref|XP_002138000.1|', 8e-90, 291.0)(u'gi|195157504|ref|XP_002019636.1|', 1e-89, 290.0)(u'gi|195174428|ref|XP_002027976.1|', 2e-89, 291.0)(u'gi|556516705|emb|CDJ19951.1|', 3e-89, 295.0)(u'gi|125977322|ref|XP_001352694.1|', 3e-89, 291.0)(u'gi|357626115|gb|EHJ76321.1|', 9e-89, 291.0)(u'gi|167534172|ref|XP_001748764.1|', 6e-85, 278.0)(u'gi|555689788|gb|ESN93020.1|', 4e-83, 271.0)(u'gi|242021219|ref|XP_002431043.1|', 2e-82, 268.0)(u'gi|391339176|ref|XP_003743928.1|', 1e-78, 267.0)(u'gi|358337819|dbj|GAA56144.1|', 2e-78, 270.0)(u'gi|426328674|ref|XP_004025375.1|', 3e-78, 250.0)(u'gi|470305350|ref|XP_004348307.1|', 4e-78, 261.0)(u'gi|195174458|ref|XP_002027991.1|', 3e-74, 248.0)(u'gi|391347935|ref|XP_003748209.1|', 2e-71, 241.0)(u'gi|328766889|gb|EGF76941.1|', 1e-66, 230.0)(u'gi|387598185|gb|AFJ91748.1|', 1e-65, 219.0)(u'gi|545369609|ref|XP_005649542.1|', 4e-65, 233.0)(u'gi|511003208|gb|EPB84619.1|', 2e-63, 221.0)(u'gi|198463144|ref|XP_002135445.1|', 4e-63, 217.0)(u'gi|514699980|ref|XP_004997558.1|', 8e-61, 214.0)(u'gi|291222248|ref|XP_002731129.1|', 1e-60, 204.0)(u'gi|389642071|ref|XP_003718668.1|', 6e-60, 217.0)(u'gi|432949840|ref|XP_004084284.1|', 3e-59, 200.0)(u'gi|384489724|gb|EIE80946.1|', 2e-58, 207.0)(u'gi|303270823|ref|XP_003054773.1|', 1e-57, 208.0)(u'gi|550805201|gb|ERS97118.1|', 2e-56, 206.0)(u'gi|402083663|gb|EJT78681.1|', 8e-56, 205.0)(u'gi|328712488|ref|XP_001950480.2|', 2e-55, 202.0)(u'gi|500260767|gb|EOO03290.1|', 3e-55, 203.0)(u'gi|475675995|gb|EMT73038.1|', 2e-54, 198.0)(u'gi|346321525|gb|EGX91124.1|', 4e-54, 198.0)(u'gi|340518879|gb|EGR49119.1|', 1e-53, 197.0)(u'gi|46127591|ref|XP_388349.1|', 2e-53, 197.0)(u'gi|340939343|gb|EGS19965.1|', 2e-53, 198.0)(u'gi|346972722|gb|EGY16174.1|', 3e-53, 198.0)(u'gi|302885386|ref|XP_003041585.1|', 3e-53, 197.0)(u'gi|255069897|ref|XP_002507030.1|', 4e-53, 194.0)(u'gi|342873514|gb|EGU75680.1|', 6e-53, 196.0)(u'gi|477508915|gb|ENH62208.1|', 7e-53, 196.0)(u'gi|348685793|gb|EGZ25608.1|', 7e-53, 195.0)(u'gi|408389507|gb|EKJ68954.1|', 1e-52, 196.0)(u'gi|384501769|gb|EIE92260.1|', 1e-52, 192.0)(u'gi|400597595|gb|EJP65325.1|', 1e-52, 194.0)(u'gi|19112818|ref|NP_596026.1|', 2e-52, 194.0)(u'gi|430813410|emb|CCJ29231.1|', 2e-52, 193.0)(u'gi|320588268|gb|EFX00743.1|', 5e-52, 194.0)(u'gi|336270150|ref|XP_003349834.1|', 5e-52, 194.0)(u'gi|552812683|ref|XP_005843760.1|', 5e-52, 197.0)(u'gi|452977561|gb|EME77327.1|', 9e-52, 191.0)(u'gi|145344573|ref|XP_001416804.1|', 1e-51, 192.0)(u'gi|512191448|gb|EPE07209.1|', 2e-51, 193.0)(u'gi|358394418|gb|EHK43811.1|', 4e-51, 191.0)(u'gi|485925292|gb|EOD49915.1|', 4e-51, 191.0)(u'gi|517319687|emb|CCT70497.1|', 4e-51, 191.0)(u'gi|477513564|gb|ENH66050.1|', 8e-51, 196.0)(u'gi|358385781|gb|EHK23377.1|', 1e-50, 190.0)(u'gi|367040685|ref|XP_003650723.1|', 3e-50, 189.0)(u'gi|407927258|gb|EKG20156.1|', 4e-50, 188.0)(u'gi|85097594|ref|XP_960475.1|', 8e-50, 188.0)(u'gi|336466175|gb|EGO54340.1|', 9e-50, 188.0)(u'gi|470319153|gb|EMR10777.1|', 9e-50, 186.0)(u'gi|258577767|ref|XP_002543065.1|', 2e-49, 186.0)(u'gi|116201941|ref|XP_001226782.1|', 3e-49, 186.0)(u'gi|521729061|gb|EPQ59146.1|', 3e-49, 184.0)(u'gi|171681784|ref|XP_001905835.1|', 6e-49, 185.0)(u'gi|449297034|gb|EMC93053.1|', 8e-49, 184.0)(u'gi|528317086|gb|EPY51662.1|', 9e-49, 184.0)(u'gi|471565888|gb|EMR66351.1|', 1e-48, 183.0)(u'gi|530741911|gb|EQC40153.1|', 2e-48, 182.0)(u'gi|303320911|ref|XP_003070450.1|', 3e-48, 183.0)(u'gi|367030153|ref|XP_003664360.1|', 3e-48, 183.0)(u'gi|531862984|gb|EQL00397.1|', 3e-48, 184.0)(u'gi|531862984|gb|EQL00397.1|', 6e-11, 73.6)(u'gi|50543738|ref|XP_500035.1|', 7e-48, 181.0)(u'gi|320033050|gb|EFW14999.1|', 9e-48, 182.0)(u'gi|406602412|emb|CCH46028.1|', 1e-47, 179.0)(u'gi|308802331|ref|XP_003078479.1|', 1e-47, 181.0)(u'gi|119179563|ref|XP_001241353.1|', 2e-47, 181.0)(u'gi|528060942|gb|EPX70506.1|', 2e-47, 181.0)(u'gi|526199565|gb|EPS40434.1|', 2e-47, 181.0)(u'gi|301111610|ref|XP_002904884.1|', 2e-47, 180.0)(u'gi|320583511|gb|EFW97724.1|', 3e-47, 179.0)(u'gi|385304742|gb|EIF48748.1|', 5e-47, 176.0)(u'gi|326470311|gb|EGD94320.1|', 9e-47, 179.0)(u'gi|384501453|gb|EIE91944.1|', 1e-46, 176.0)(u'gi|449549966|gb|EMD40931.1|', 2e-46, 177.0)(u'gi|302507508|ref|XP_003015715.1|', 2e-46, 178.0)(u'gi|325190927|emb|CCA25412.1|', 2e-46, 177.0)(u'gi|325190927|emb|CCA25412.1|', 3e-11, 74.3)(u'gi|539432770|gb|ERF69624.1|', 2e-46, 178.0)(u'gi|327292499|ref|XP_003230948.1|', 3e-46, 177.0)(u'gi|213409582|ref|XP_002175561.1|', 3e-46, 177.0)(u'gi|390601153|gb|EIN10547.1|', 4e-46, 177.0)(u'gi|254574068|ref|XP_002494143.1|', 5e-46, 175.0)(u'gi|169848433|ref|XP_001830924.1|', 5e-46, 176.0)(u'gi|527294507|gb|EPS95366.1|', 5e-46, 174.0)(u'gi|440639174|gb|ELR09093.1|', 7e-46, 176.0)(u'gi|451996623|gb|EMD89089.1|', 8e-46, 177.0)(u'gi|429859855|gb|ELA34615.1|', 9e-46, 177.0)(u'gi|530472733|gb|EQB53285.1|', 1e-45, 176.0)(u'gi|451847555|gb|EMD60862.1|', 2e-45, 176.0)(u'gi|296819631|ref|XP_002849878.1|', 2e-45, 175.0)(u'gi|5478530|gb|AAD43920.1|AF130441_1', 2e-45, 172.0)(u'gi|5478530|gb|AAD43920.1|AF130441_1', 1e-22, 107.0)(u'gi|310799740|gb|EFQ34633.1|', 2e-45, 176.0)(u'gi|315040529|ref|XP_003169642.1|', 3e-45, 174.0)(u'gi|326481150|gb|EGE05160.1|', 4e-45, 174.0)(u'gi|557090874|gb|ESQ31521.1|', 4e-45, 172.0)(u'gi|557090874|gb|ESQ31521.1|', 8e-23, 108.0)(u'gi|452839962|gb|EME41901.1|', 4e-45, 174.0)(u'gi|482805757|gb|EOA82843.1|', 6e-45, 174.0)(u'gi|312282233|dbj|BAJ33982.1|', 7e-45, 171.0)(u'gi|312282233|dbj|BAJ33982.1|', 1e-22, 107.0)(u'gi|453083534|gb|EMF11580.1|', 7e-45, 174.0)(u'gi|379318630|pdb|4DNU|A', 1e-44, 169.0)(u'gi|379318630|pdb|4DNU|A', 7e-25, 113.0)(u'gi|379318635|pdb|4DNW|A', 2e-44, 169.0)(u'gi|379318635|pdb|4DNW|A', 6e-24, 110.0)(u'gi|297797389|ref|XP_002866579.1|', 2e-44, 170.0)(u'gi|297797389|ref|XP_002866579.1|', 9e-24, 110.0)(u'gi|501755350|emb|CCG81367.1|', 3e-44, 171.0)(u'gi|554913264|gb|ESK94478.1|', 4e-44, 171.0)(u'gi|383280368|pdb|4D9S|A', 4e-44, 169.0)(u'gi|383280368|pdb|4D9S|A', 1e-23, 110.0)(u'gi|309366092|emb|CAP21957.2|', 4e-44, 171.0)(u'gi|336373725|gb|EGO02063.1|', 4e-44, 171.0)(u'gi|379318631|pdb|4DNV|A', 5e-44, 167.0)(u'gi|379318631|pdb|4DNV|A', 8e-24, 110.0)(u'gi|268530278|ref|XP_002630265.1|', 5e-44, 171.0)(u'gi|15237540|ref|NP_201191.1|', 6e-44, 169.0)(u'gi|15237540|ref|NP_201191.1|', 1e-23, 110.0)(u'gi|159488781|ref|XP_001702381.1|', 6e-44, 171.0)(u'gi|345563579|gb|EGX46567.1|', 6e-44, 171.0)(u'gi|393215520|gb|EJD01011.1|', 7e-44, 168.0)(u'gi|380473989|emb|CCF46021.1|', 7e-44, 171.0)(u'gi|380473989|emb|CCF46021.1|', 5e-11, 73.9)(u'gi|341896411|gb|EGT52346.1|', 9e-44, 171.0)(u'gi|341877072|gb|EGT33007.1|', 9e-44, 171.0)(u'gi|523422630|emb|CDF88905.1|', 1e-43, 169.0)(u'gi|482549182|gb|EOA13376.1|', 1e-43, 168.0)(u'gi|482549182|gb|EOA13376.1|', 2e-23, 110.0)(u'gi|448119392|ref|XP_004203719.1|', 3e-43, 168.0)(u'gi|552924917|gb|ESA08835.1|', 3e-43, 167.0)(u'gi|399218366|emb|CCF75253.1|', 4e-43, 169.0)(u'gi|17532079|ref|NP_495753.1|', 4e-43, 168.0)(u'gi|477531419|gb|ENH83126.1|', 6e-43, 169.0)(u'gi|448116941|ref|XP_004203136.1|', 6e-43, 167.0)(u'gi|426192396|gb|EKV42332.1|', 1e-42, 167.0)(u'gi|302417714|ref|XP_003006688.1|', 2e-42, 166.0)(u'gi|398404986|ref|XP_003853959.1|', 3e-42, 166.0)(u'gi|325092204|gb|EGC45514.1|', 3e-42, 166.0)(u'gi|330925699|ref|XP_003301155.1|', 3e-42, 166.0)(u'gi|396081299|gb|AFN82917.1|', 4e-42, 162.0)(u'gi|121714259|ref|XP_001274740.1|', 4e-42, 165.0)(u'gi|409079646|gb|EKM80007.1|', 5e-42, 166.0)(u'gi|242820402|ref|XP_002487503.1|', 5e-42, 165.0)(u'gi|308509404|ref|XP_003116885.1|', 5e-42, 165.0)(u'gi|367005462|ref|XP_003687463.1|', 6e-42, 164.0)(u'gi|322699843|gb|EFY91602.1|', 6e-42, 165.0)(u'gi|169610477|ref|XP_001798657.1|', 6e-42, 165.0)(u'gi|242208551|ref|XP_002470126.1|', 8e-42, 163.0)(u'gi|242208551|ref|XP_002470126.1|', 4e-11, 73.9)(u'gi|255089408|ref|XP_002506626.1|', 1e-41, 162.0)(u'gi|302832638|ref|XP_002947883.1|', 2e-41, 161.0)(u'gi|367039539|ref|XP_003650150.1|', 2e-41, 163.0)(u'gi|344231377|gb|EGV63259.1|', 2e-41, 162.0)(u'gi|402217324|gb|EJT97405.1|', 2e-41, 162.0)(u'gi|294654461|ref|XP_456522.2|', 3e-41, 162.0)(u'gi|409049848|gb|EKM59325.1|', 3e-41, 163.0)(u'gi|409081255|gb|EKM81614.1|', 3e-41, 161.0)(u'gi|392512630|emb|CAD25339.2|', 3e-41, 159.0)(u'gi|392512630|emb|CAD25339.2|', 3e-13, 79.7)(u'gi|392512630|emb|CAD25339.2|', 2e-11, 73.9)(u'gi|528292911|emb|CCU81601.1|', 3e-41, 163.0)(u'gi|19074229|ref|NP_584835.1|', 3e-41, 161.0)(u'gi|19074229|ref|NP_584835.1|', 5e-13, 79.7)(u'gi|19074229|ref|NP_584835.1|', 2e-11, 74.3)(u'gi|399169568|emb|CCE29551.1|', 4e-41, 163.0)(u'gi|310795953|gb|EFQ31414.1|', 8e-41, 160.0)(u'gi|552926495|gb|ESA10399.1|', 8e-41, 160.0)(u'gi|255563857|ref|XP_002522929.1|', 9e-41, 160.0)(u'gi|255563857|ref|XP_002522929.1|', 1e-19, 99.0)(u'gi|358372284|dbj|GAA88888.1|', 1e-40, 162.0)(u'gi|494827921|gb|EON64817.1|', 1e-40, 162.0)(u'gi|154273022|ref|XP_001537363.1|', 1e-40, 161.0)(u'gi|270004502|gb|EFA00950.1|', 1e-40, 152.0)(u'gi|378732654|gb|EHY59113.1|', 2e-40, 161.0)(u'gi|557721580|dbj|GAD99570.1|', 2e-40, 160.0)(u'gi|392593511|gb|EIW82836.1|', 2e-40, 159.0)(u'gi|302696855|ref|XP_003038106.1|', 2e-40, 159.0)(u'gi|401826164|ref|XP_003887176.1|', 2e-40, 157.0)(u'gi|241957479|ref|XP_002421459.1|', 2e-40, 160.0)(u'gi|303389130|ref|XP_003072798.1|', 2e-40, 157.0)(u'gi|449328953|gb|AGE95228.1|', 3e-40, 159.0)(u'gi|449328953|gb|AGE95228.1|', 5e-13, 79.3)(u'gi|449328953|gb|AGE95228.1|', 7e-11, 73.2)(u'gi|212545723|ref|XP_002153015.1|', 4e-40, 160.0)(u'gi|238883419|gb|EEQ47057.1|', 4e-40, 159.0)(u'gi|459366611|gb|EMG45917.1|', 4e-40, 159.0)(u'gi|549045443|emb|CCX33475.1|', 5e-40, 160.0)(u'gi|3850148|emb|CAA21948.1|', 7e-40, 158.0)(u'gi|12001926|gb|AAG43106.1|AF049867_1', 8e-40, 158.0)(u'gi|77022888|ref|XP_888888.1|', 9e-40, 158.0)(u'gi|322702426|gb|EFY94076.1|', 9e-40, 159.0)(u'gi|261188107|ref|XP_002620470.1|', 1e-39, 158.0)(u'gi|255731754|ref|XP_002550801.1|', 1e-39, 158.0)(u'gi|350639170|gb|EHA27524.1|', 1e-39, 158.0)(u'gi|239609087|gb|EEQ86074.1|', 1e-39, 158.0)(u'gi|390600679|gb|EIN10074.1|', 1e-39, 157.0)(u'gi|531980384|gb|EQL30971.1|', 2e-39, 158.0)(u'gi|317027708|ref|XP_001399882.2|', 2e-39, 157.0)(u'gi|145248596|ref|XP_001400637.1|', 2e-39, 158.0)(u'gi|350634707|gb|EHA23069.1|', 2e-39, 157.0)(u'gi|225555094|gb|EEH03387.1|', 2e-39, 157.0)(u'gi|557534224|gb|ESR45342.1|', 2e-39, 156.0)(u'gi|557534224|gb|ESR45342.1|', 8e-21, 102.0)(u'gi|156848822|ref|XP_001647292.1|', 3e-39, 157.0)(u'gi|170090772|ref|XP_001876608.1|', 3e-39, 156.0)(u'gi|296424151|ref|XP_002841613.1|', 3e-39, 156.0)(u'gi|403353979|gb|EJY76535.1|', 3e-39, 159.0)(u'gi|403353979|gb|EJY76535.1|', 7e-14, 82.8)(u'gi|403353979|gb|EJY76535.1|', 2e-13, 81.6)(u'gi|380490700|emb|CCF35832.1|', 4e-39, 157.0)(u'gi|323348646|gb|EGA82889.1|', 4e-39, 156.0)(u'gi|548850227|gb|ERN08779.1|', 4e-39, 155.0)(u'gi|548850227|gb|ERN08779.1|', 2e-18, 95.9)(u'gi|548850227|gb|ERN08779.1|', 2e-18, 95.5)(u'gi|349578129|dbj|GAA23295.1|', 5e-39, 156.0)(u'gi|151943711|gb|EDN62021.1|', 5e-39, 156.0)(u'gi|323337589|gb|EGA78834.1|', 6e-39, 155.0)(u'gi|323309055|gb|EGA62283.1|', 7e-39, 155.0)(u'gi|389743006|gb|EIM84191.1|', 8e-39, 155.0)(u'gi|296082603|emb|CBI21608.3|', 9e-39, 154.0)(u'gi|296082603|emb|CBI21608.3|', 9e-18, 94.0)(u'gi|557534225|gb|ESR45343.1|', 9e-39, 154.0)(u'gi|557534225|gb|ESR45343.1|', 1e-21, 105.0)(u'gi|557534225|gb|ESR45343.1|', 8e-13, 79.0)(u'gi|365765541|gb|EHN07048.1|', 1e-38, 155.0)(u'gi|6321341|ref|NP_011418.1|', 1e-38, 155.0)(u'gi|145545147|ref|XP_001458258.1|', 1e-38, 155.0)(u'gi|134056804|emb|CAK37712.1|', 1e-38, 156.0)(u'gi|526118004|ref|NP_001267915.1|', 2e-38, 154.0)(u'gi|526118004|ref|NP_001267915.1|', 1e-16, 90.5)(u'gi|500253801|gb|EON97448.1|', 2e-38, 154.0)(u'gi|126134081|ref|XP_001383565.1|', 2e-38, 154.0)(u'gi|326634388|pdb|3OF7|A', 3e-38, 154.0)(u'gi|531862981|gb|EQL00394.1|', 3e-38, 154.0)(u'gi|363749193|ref|XP_003644814.1|', 3e-38, 154.0)(u'gi|255719666|ref|XP_002556113.1|', 3e-38, 154.0)(u'gi|94469124|gb|ABF18411.1|', 3e-38, 147.0)(u'gi|50310715|ref|XP_455379.1|', 3e-38, 153.0)(u'gi|170575713|ref|XP_001893354.1|', 3e-38, 154.0)(u'gi|550334138|gb|EEE90389.2|', 4e-38, 152.0)(u'gi|550334138|gb|EEE90389.2|', 7e-20, 99.8)(u'gi|550334138|gb|EEE90389.2|', 2e-16, 90.1)(u'gi|224093583|ref|XP_002309939.1|', 4e-38, 152.0)(u'gi|224093583|ref|XP_002309939.1|', 7e-20, 99.8)(u'gi|224093583|ref|XP_002309939.1|', 2e-16, 89.7)(u'gi|557726493|dbj|GAD94871.1|', 5e-38, 154.0)(u'gi|254586731|ref|XP_002498933.1|', 5e-38, 153.0)(u'gi|523780116|gb|EPR79468.1|', 5e-38, 150.0)(u'gi|401625749|gb|EJS43742.1|', 6e-38, 153.0)(u'gi|226495309|ref|NP_001141147.1|', 6e-38, 152.0)(u'gi|226495309|ref|NP_001141147.1|', 9e-11, 72.8)(u'gi|242073146|ref|XP_002446509.1|', 8e-38, 152.0)(u'gi|242073146|ref|XP_002446509.1|', 4e-20, 100.0)(u'gi|190344431|gb|EDK36104.2|', 9e-38, 152.0)(u'gi|514801519|ref|XP_004975663.1|', 2e-37, 151.0)(u'gi|514801519|ref|XP_004975663.1|', 5e-19, 97.4)(u'gi|410074851|ref|XP_003955008.1|', 2e-37, 152.0)(u'gi|429850121|gb|ELA25421.1|', 2e-37, 150.0)(u'gi|342867548|gb|EGU72488.1|', 2e-37, 149.0)(u'gi|322701288|gb|EFY93038.1|', 2e-37, 152.0)(u'gi|514801517|ref|XP_004975662.1|', 2e-37, 150.0)(u'gi|514801517|ref|XP_004975662.1|', 6e-19, 97.4)(u'gi|119486678|ref|XP_001262325.1|', 2e-37, 152.0)(u'gi|402590741|gb|EJW84671.1|', 2e-37, 152.0)(u'gi|393905325|gb|EJD73936.1|', 2e-37, 152.0)(u'gi|512203121|gb|EPE31946.1|', 3e-37, 151.0)(u'gi|471874627|emb|CCO35985.1|', 4e-37, 150.0)(u'gi|346327098|gb|EGX96694.1|', 4e-37, 151.0)(u'gi|444318081|ref|XP_004179698.1|', 4e-37, 150.0)(u'gi|448536004|ref|XP_003871048.1|', 4e-37, 150.0)(u'gi|168052711|ref|XP_001778783.1|', 5e-37, 150.0)(u'gi|168052711|ref|XP_001778783.1|', 4e-22, 106.0)(u'gi|326518338|dbj|BAJ88198.1|', 5e-37, 149.0)(u'gi|326518338|dbj|BAJ88198.1|', 9e-21, 102.0)(u'gi|146421762|ref|XP_001486825.1|', 5e-37, 149.0)(u'gi|145504819|ref|XP_001438376.1|', 6e-37, 150.0)(u'gi|444313687|ref|XP_004177501.1|', 6e-37, 150.0)(u'gi|168008673|ref|XP_001757031.1|', 7e-37, 149.0)(u'gi|168008673|ref|XP_001757031.1|', 6e-22, 105.0)(u'gi|260950135|ref|XP_002619364.1|', 7e-37, 150.0)(u'gi|154297017|ref|XP_001548937.1|', 1e-36, 150.0)(u'gi|365760753|gb|EHN02449.1|', 1e-36, 149.0)(u'gi|344299635|gb|EGW29988.1|', 1e-36, 149.0)(u'gi|361125476|gb|EHK97517.1|', 1e-36, 149.0)(u'gi|545365640|ref|XP_005647780.1|', 1e-36, 148.0)(u'gi|545365640|ref|XP_005647780.1|', 4e-17, 92.0)(u'gi|545365640|ref|XP_005647780.1|', 1e-16, 90.1)(u'gi|545365640|ref|XP_005647780.1|', 3e-14, 83.2)(u'gi|124088041|ref|XP_001346941.1|', 1e-36, 149.0)(u'gi|462407499|gb|EMJ12833.1|', 1e-36, 148.0)(u'gi|462407499|gb|EMJ12833.1|', 1e-17, 93.2)(u'gi|358389183|gb|EHK26775.1|', 2e-36, 147.0)(u'gi|429964945|gb|ELA46942.1|', 2e-36, 146.0)[ah@hobbes Documents]\$ `

## Script 18.59

 script_18_59.py ```# Script 18.59from pandas import DataFrame, Series, ExcelWriterfrom matplotlib import pyplotdef download_db_id(db_id, url, file_name=None): # Save file is optional  #from urllib.request import urlopen    # python 3.0 Import urlopen() function    from urllib2 import urlopen           # python 2.7 Import urlopen() function    response = urlopen(url.format(db_id))  # Makes object to handle link  data = response.read()                 # Read data from URL  data = data.decode('utf-8')            # Interpret data as plain text    if file_name:                       # If save file was specified...    file_obj = open(file_name, 'w')   # Create file object for writing    file_obj.write(data)              # Save all data to file    file_obj.close()  return data                         # Hand back data from functiondef blast_search(seq, database, program,          # Mandatory input                 e_cutoff=1e-10, cpu_cores=None): # Optional parameters    from subprocess import Popen, PIPE # imports for running external jobs     if not cpu_cores:                  # CPU count parameter not specified    import multiprocessing       cpu_cores = multiprocessing.cpu_count()  # Use all avail. CPU cores    # Make a list of all options and params to run BLAST  # Output format 6 means table output  # Numeric params. are converted to strings with str()    cmd_args = [program, '-outfmt', '6', '-db', database,              '-num_threads', str(cpu_cores),              '-evalue', str(e_cutoff)]      try:      # Try to create a process to run BLAST, but catch any error    proc = Popen(cmd_args, stdin=PIPE, stdout=PIPE)      except Exception as err:         # Something went wrong    print('BLAST command failed')    print('Command used: "%s"' % ' '.join(cmd_args))    print(err)    return []     # Send back an empty list if BLAST failed    template = '>UserQuery\n{}\n'          # FASTA format string template  in_data = template.format(seq.upper()) # Insert the seq in template  in_data.encode('utf-8')                # Removes any unicode chars  out_data, err_data = proc.communicate(in_data) # Send input get output    results_str = out_data.decode('utf-8')    # Interpret output as text    results = []                              # Start with empty results  for line in results_str.split('\n'):      # Split output into lines    data = line.split()                     # Split line into sub-list        if data:                                # If line was not blank       for col in range(3,10):                      data[col] = int(data[col])          # Some columns are integers      for col in (2,10,11):          data[col] = float(data[col])        # Other columns are float       results.append(data)                  # Add row of data to results    return results                            # Send back resultsdef get_uniprot_seq_record(db_id):   import os   from Bio import SeqIO   UNIPROT_FASTA_URL = 'http://www.uniprot.org/uniprot/{}.fasta'   file_name = db_id + '.fasta'  # Add extension to ID to make file name      if not os.path.exists(file_name):     download_db_id(db_id, UNIPROT_FASTA_URL, file_name)      file_obj = open(file_name)    # Open file for reading      for seq_record in SeqIO.parse(file_obj, 'fasta'):     return seq_record           # Give back first record encounteredcols = ('query_id', 'subject_id', '%_identity', 'alignment_length',          'mismatches', 'gap_opens', 'q_start', 'q_end', 's_start',        's_end', 'evalue', 'bit_score')   # Column/key namesdb = '/home/ah/database/blast/nrfilt'     # BLAST database: must existsr = get_uniprot_seq_record('P18754')print(sr.seq)hits = blast_search(sr.seq, db, 'blastp') # Run on seq used beforedata_set = DataFrame(hits, columns=cols)  # Convert to Pandas tableprint(data_set.dtypes)                # Show data types of the columnscolumn = data_set['%_identity']                  # Fetch a columnprint(column.min(), column.mean(), column.max()) # numpy-like functionsrows = data_set['%_identity'] > 75    # Conditionally select indicesprint(data_set[rows])                 # Show selected indices  # Create a new column from existing column seriesdata_set['offset'] = Series(data_set['q_start']-data_set['s_start'])data_set.to_csv('BLAST_results.csv', '\t') # Save as tab-separated fileexcel_writer = ExcelWriter('BLAST_results.xlsx')       # Write to Exceldata_set.to_excel(excel_writer, sheet_name='Sheet1')   # spreadsheetexcel_writer.save()# Plot a graph directly from a column Series (uses matplotlib)ax = column.plot('hist', bins=50, title='Seq identity distribution')ax.set_xlabel('% Identity') # Set axis labelsax.set_ylabel('Count')pyplot.show()               # Display graphics in screen ``` script_18_59.out `[ah@hobbes Documents]\$ python script_18_59.py MSPKRIAKRRSPPADAIPKSKKVKVSHRSHSTEPGLVLTLGQGDVGQLGLGENVMERKKPALVSIPEDVVQAEAGGMHTVCLSKSGQVYSFGCNDEGALGRDTSVEGSEMVPGKVELQEKVVQVSAGDSHTAALTDDGRVFLWGSFRDNNGVIGLLEPMKKSMVPVQVQLDVPVVKVASGNDHLVMLTADGDLYTLGCGEQGQLGRVPELFANRGGRQGLERLLVPKCVMLKSRGSRGHVRFQDAFCGAYFTFAISHEGHVYGFGLSNYHQLGTPGTESCFIPQNLTSFKNSTKSWVGFSGGQHHTVCMDSEGKAYSLGRAEYGRLGLGEGAEEKSIPTLISRLPAVSSVACGASVGYAVTKDGRVFAWGMGTNYQLGTGQDEDAWSPVEMMGKQLENRVVLSVSSGGQHTVLLVKDKEQSquery_id             objectsubject_id           object%_identity          float64alignment_length      int64mismatches            int64gap_opens             int64q_start               int64q_end                 int64s_start               int64s_end                 int64evalue              float64bit_score           float64dtype: object(24.789999999999999, 52.088226691042053, 100.0)      query_id                        subject_id  %_identity  \0    UserQuery       gi|4502801|ref|NP_001260.1|      100.00   1    UserQuery  gi|327199310|ref|NP_001192134.1|       99.76   2    UserQuery       gi|119628100|gb|EAX07695.1|      100.00   3    UserQuery  gi|397515820|ref|XP_003828141.1|       99.76   4    UserQuery  gi|402853628|ref|XP_003891494.1|       99.05   5    UserQuery       gi|119628097|gb|EAX07692.1|       96.12   6    UserQuery  gi|114796646|ref|NP_001041660.1|       96.12   7    UserQuery  gi|402853626|ref|XP_003891493.1|       99.05   8    UserQuery  gi|544407822|ref|XP_005544289.1|       99.05   9    UserQuery       gi|119628099|gb|EAX07694.1|       96.12   10   UserQuery  gi|397515824|ref|XP_003828143.1|       95.89   11   UserQuery             gi|4389390|pdb|1A12|A      100.00   12   UserQuery  gi|114796644|ref|NP_001041659.1|       93.14   13   UserQuery  gi|332245213|ref|XP_003271757.1|       94.98   14   UserQuery       gi|119628095|gb|EAX07690.1|       93.14   15   UserQuery  gi|402853634|ref|XP_003891497.1|       95.21   16   UserQuery  gi|327199318|ref|NP_001192137.1|       97.86   17   UserQuery  gi|544407820|ref|XP_005544288.1|       95.21   18   UserQuery  gi|397515822|ref|XP_003828142.1|       92.92   19   UserQuery  gi|327199324|ref|NP_001192140.1|       96.91   20   UserQuery  gi|478503167|ref|XP_004425891.1|       97.39   21   UserQuery  gi|410966597|ref|XP_003989817.1|       96.91   22   UserQuery  gi|507575528|ref|XP_004671309.1|       96.20   23   UserQuery  gi|560898119|ref|XP_006175513.1|       96.44   24   UserQuery  gi|544407818|ref|XP_005544287.1|       92.26   25   UserQuery  gi|560948741|ref|XP_006196937.1|       96.20   26   UserQuery  gi|545490248|ref|XP_005617300.1|       96.44   27   UserQuery  gi|511832853|ref|XP_004741077.1|       96.67   28   UserQuery  gi|402853632|ref|XP_003891496.1|       92.26   29   UserQuery  gi|327315350|ref|NP_001192141.1|       96.44   ..         ...                               ...         ...   142  UserQuery  gi|326932912|ref|XP_003212555.1|       79.81   143  UserQuery  gi|525022195|ref|XP_005058591.1|       78.62   144  UserQuery  gi|525022197|ref|XP_005058592.1|       78.62   145  UserQuery     gi|363742391|ref|XP_427366.3|       79.57   146  UserQuery  gi|524959499|ref|XP_005079698.1|       90.69   147  UserQuery  gi|543277880|ref|XP_005427132.1|       78.86   148  UserQuery  gi|543372998|ref|XP_005529718.1|       77.80   149  UserQuery  gi|542181352|ref|XP_005496408.1|       78.15   150  UserQuery  gi|327315362|ref|NP_001192146.1|       77.91   151  UserQuery  gi|543718943|ref|XP_005500705.1|       77.96   152  UserQuery   gi|62859419|ref|NP_001016101.1|       76.90   153  UserQuery       gi|115528650|gb|AAI24849.1|       76.43   154  UserQuery       gi|171846900|gb|AAI61455.1|       80.15   155  UserQuery        gi|47682592|gb|AAH70867.1|       76.74   156  UserQuery        gi|48735020|gb|AAH72030.1|       75.95   157  UserQuery  gi|147900380|ref|NP_001085384.1|       75.71   158  UserQuery            gi|2134146|pir||I51558       79.40   159  UserQuery  gi|148227322|ref|NP_001081229.1|       76.44   165  UserQuery  gi|530584815|ref|XP_005286204.1|       77.98   172  UserQuery       gi|537236894|gb|ERE85794.1|       88.68   173  UserQuery  gi|527248394|ref|XP_005142516.1|       77.43   174  UserQuery  gi|426328672|ref|XP_004025374.1|       99.64   176  UserQuery  gi|297282712|ref|XP_002808329.1|       98.54   177  UserQuery  gi|512815934|ref|XP_004878183.1|       86.16   183  UserQuery  gi|224167221|ref|XP_002198843.1|       80.08   184  UserQuery       gi|449267782|gb|EMC78684.1|       79.75   185  UserQuery  gi|354475661|ref|XP_003500046.1|       88.00   188  UserQuery       gi|146332865|gb|ABQ22938.1|       98.46   239  UserQuery       gi|344242290|gb|EGV98393.1|       86.34   251  UserQuery  gi|426328674|ref|XP_004025375.1|       90.78        alignment_length  mismatches  gap_opens  q_start  q_end  s_start  s_end  \0                 421           0          0        1    421        1    421   1                 421           1          0        1    421        1    421   2                 421           0          0        1    421       55    475   3                 421           1          0        1    421       55    475   4                 421           4          0        1    421        1    421   5                 438           0          1        1    421       87    524   6                 438           0          1        1    421        1    438   7                 421           4          0        1    421       55    475   8                 421           4          0        1    421       55    475   9                 438           0          1        1    421       55    492   10                438           1          1        1    421       26    463   11                413           0          0        9    421        1    413   12                452           0          1        1    421        1    452   13                438           5          1        1    421        1    438   14                452           0          1        1    421       55    506   15                438           4          1        1    421       26    463   16                421           9          0        1    421        1    421   17                438           4          1        1    421       55    492   18                452           1          1        1    421       26    477   19                421          13          0        1    421        1    421   20                421          11          0        1    421        1    421   21                421          13          0        1    421        1    421   22                421          16          0        1    421        1    421   23                421          15          0        1    421        1    421   24                452           4          1        1    421       55    506   25                421          16          0        1    421        1    421   26                421          15          0        1    421       45    465   27                421          14          0        1    421       10    430   28                452           4          1        1    421       26    477   29                421          15          0        1    421        1    421   ..                ...         ...        ...      ...    ...      ...    ...   142               421          81          2        1    421        1    417   143               421          88          1        1    421       15    433   144               421          88          1        1    421        1    419   145               421          82          2        1    421        1    417   146               376          22          1        1    363        1    376   147               421          86          1        4    421        8    428   148               419          93          0        3    421        2    420   149               421          91          1        1    421       27    446   150               421          92          1        1    421        1    420   151               422          92          1        1    421        1    422   152               420          94          2        1    418        1    419   153               420          96          2        1    418        1    419   154               398          78          1       21    418       22    418   155               417          94          2        4    418        1    416   156               420          98          2        1    418        1    419   157               420          99          2        1    418        1    419   158               398          81          1       21    418        8    404   159               416          96          2        4    418        9    423   165               377          83          0        1    377        1    377   172               318          36          0       77    394        1    318   173               350          78          1       72    420       21    370   174               274           1          0      148    421      219    492   176               274           4          0      148    421      107    380   177               318          22          1        1    296        8    325   183               241          48          0      181    421        1    241   184               242          49          0      180    421        1    242   185               225          27          0      197    421        1    225   188               195           3          0      227    421        1    195   239               161          22          0      261    421        1    161   251               141           6          1       25    165       27    160               evalue  bit_score  0     0.000000e+00        863  1     0.000000e+00        861  2     0.000000e+00        860  3     0.000000e+00        858  4     0.000000e+00        856  5     0.000000e+00        854  6     0.000000e+00        853  7     0.000000e+00        853  8     0.000000e+00        852  9     0.000000e+00        850  10    0.000000e+00        849  11    0.000000e+00        847  12    0.000000e+00        846  13    0.000000e+00        844  14    0.000000e+00        844  15    0.000000e+00        844  16    0.000000e+00        843  17    0.000000e+00        843  18    0.000000e+00        843  19    0.000000e+00        843  20    0.000000e+00        842  21    0.000000e+00        840  22    0.000000e+00        838  23    0.000000e+00        837  24    0.000000e+00        837  25    0.000000e+00        837  26    0.000000e+00        837  27    0.000000e+00        837  28    0.000000e+00        837  29    0.000000e+00        836  ..             ...        ...  142   0.000000e+00        696  143   0.000000e+00        695  144   0.000000e+00        694  145   0.000000e+00        694  146   0.000000e+00        693  147   0.000000e+00        691  148   0.000000e+00        689  149   0.000000e+00        687  150   0.000000e+00        684  151   0.000000e+00        683  152   0.000000e+00        664  153   0.000000e+00        664  154   0.000000e+00        663  155   0.000000e+00        662  156   0.000000e+00        661  157   0.000000e+00        660  158   0.000000e+00        658  159   0.000000e+00        657  165   0.000000e+00        609  172   0.000000e+00        582  173   0.000000e+00        573  174   0.000000e+00        566  176   0.000000e+00        558  177   0.000000e+00        558  183  7.000000e-141        414  184  4.000000e-140        412  185  2.000000e-139        409  188  3.000000e-136        400  239   5.000000e-92        286  251   3.000000e-78        250  [172 rows x 12 columns][ah@hobbes Documents]\$ ls -al *.fasta *.csv *.xlsx-rw-rw-r--. 1 ah ah 48589 Jan  7 13:54 BLAST_results.csv-rw-rw-r--. 1 ah ah 45210 Jan  7 13:54 BLAST_results.xlsx-rw-rw-r--. 1 ah ah   522 Jan  7 13:53 P18754.fasta[ah@hobbes Documents]\$ ` script_18_59.png

## Script 18.60

 script_18_60.py ```# Script 18.60from numpy import random, dot, cov, linalg, concatenate, array from matplotlib import pyplotdef get_principle_components(data, n=2):  data = array(data)               # Convert input to array  mean = data.mean(axis=0)         # Mean vector  centred_data = (data - mean).T   # Centre all vectors and transpose  covar = cov(centred_data)        # Get covariance matrix  evals, evecs = linalg.eig(covar) # Get Eigenvalues and Eigenvectors  indices = evals.argsort()[::-1]  # E. value indices by decreasing size  evecs = evecs[:,indices]    # Sort Eigenvecs according to Eigenvals  p_comp_mat = evecs[:,:n]    # Select required principle components  return p_comp_matsize = (100, 3)                     # 100 points times 3 dimensionsd1 = random.normal(0.0, 0.5, size) d2 = random.normal(0.0, 0.5, size) + array([4.0, 1.0, 2.0])d3 = random.normal(0.0, 0.5, size) + array([2.0, 0.0, -1.0])test_data = concatenate([d1, d2, d3], axis=0)pcomps = get_principle_components (test_data, n=2)   # Run PCApc1, pc2 = pcomps.T   # Extract the two PC vectors from columnsfor x, y, z in (pc1, pc2):  x *= 10             # Scale value so it can be seen better on graph         y *= 10  pyplot.plot((0, x), (0, y))  # Plot PC x, y as line from origin (0,0)x, y, x = test_data.T          # Extract x and y vals from transposepyplot.scatter(x, y, s=20, c='#0080FF', marker='o', alpha=0.5)pyplot.show() ``` script_18_60.out `[ah@hobbes Documents]\$ python script_18_60.py [ah@hobbes Documents]\$ ` script_18_60.png

## Script 18.61

 script_18_61.py ```# Script 18.61from scipy.stats import norm, poissonfrom numpy import arangefrom matplotlib import pyplotpoisson_rand_var = poisson(2.0)       # Random variable objectx_xals = arange(0, 10, 1)             # Value to plot probabilities fory_vals = poisson_rand_var.pmf(x_xals) # Get probabilities from distrib.pyplot.plot(x_xals, y_vals, color='black') # Make line plot pyplot.show()                              # Show on screen ``` script_18_61.out `[ah@hobbes Documents]\$ python script_18_61.py [ah@hobbes Documents]\$ ` script_18_61.png

## Script 18.62

 script_18_62.py ```# Script 18.62from numpy import arrayfrom scipy.stats import normdef normal_tail_test(values, mv, std, one_sided=True):    norm_rv = norm(mv, std)     # Normal distrib. random variable object    diffs = abs(values-mv)         # Calc differences from distrib. mean  result = norm_rv.cdf(mv-diffs) # Use cumulative density function    if not one_sided:              # Two-tailed test    result *= 2                  # Distrib. is symmetric: double area  return resultmean    = 1.76std_dev = 0.075values  = array([1.8, 1.9, 2.0])result = normal_tail_test (values, mean, std_dev, one_sided=True)print('Normal one-tail', result)         # [0.297, 0.03097, 0.000687] ``` script_18_62.out `[ah@hobbes Documents]\$ python script_18_62.py ('Normal one-tail', array([ 0.29690143,  0.03097408,  0.00068714]))[ah@hobbes Documents]\$ `

## Script 18.63

 script_18_63.py # Script 18.63from numpy import array, zeros, fromstring, int8, arangefrom matplotlib import pyplotdef plot_fastq_qualities(fastq_file, read_len=100):      file_obj = open(fastq_file)    # Open FASTQ file for reading  qual_scores = []               # Initial scores are empty list  line = file_obj.readline()     # Read first line (a header)  pad = ' ' * read_len           # Lots of spaces (to pad short data)  while line: # Continue looping while there are more lines    seq = file_obj.readline()    # Sequence line; not used     head = file_obj.readline()   # Second header line; not used     codes = file_obj.readline()  # Quality score code line; used    codes = codes.strip()        # Remove trailing newline char etc      if len(codes) < read_len:    # If quality code string too short      codes += pad               # Extend with spaces     codes = codes[:read_len]     # Chop codes string to desired length     qvals = fromstring(codes, dtype=int8) # Convert string to numbers    qvals -= 32                  # Set lowest possible value to zero        qual_scores.append(qvals)    # Add qual. scores for this seq to list    line = file_obj.readline()   # Read next header line   qual_scores = array(qual_scores)   # Convert final list to array  positions = arange(1, read_len+1)  # Make array of seq. positions  ave_scores = qual_scores.mean(axis=0)  # Find mean for each position  st_devs = qual_scores.std(axis=0)      # Standard dev. for each pos.  # Plot errorbars of stardard dev. height at score positions  pyplot.errorbar(positions, ave_scores, yerr=st_devs, color='red')  # Plot a line graph of scores on top  pyplot.plot(positions, ave_scores, color='black', linewidth=2)  pyplot.show()fastq_file = 'My_DNA_sample.fastq' # Test FASTQ file locationplot_fastq_qualities(fastq_file) script_18_63.out `[ah@hobbes Documents]\$ python script_18_63.py [ah@hobbes Documents]\$ ` script_18_63.png

## Script 18.64

 script_18_64.py ```# Script 18.64def find_discordant_read_pairs(sam_file, max_sep=2e3, verbose=True):    import os  from time import time     # Import inside the function, for a change  from pysam import Samfile, AlignmentFile  start_time = time()       # Record start time (seconds since 1/1/1970)    root_name, ext = os.path.splitext(sam_file) # Split input file ext.  out_file = root_name + "_discordant.bam"    # Out file has new ending    in_sam = Samfile(sam_file, 'rb')          # Open BAM file for reading   out_sam = AlignmentFile(out_file, "wb",   # Make new (empty) BAM file                          template=in_sam)  # base headers on input file      chromo_names = in_sam.references          # Fetch chromosome names  msg_cis = 'Long range: {}:{} - {}:{}'     # Some message templates  msg_trans = 'Interchromosome: {}:{} - {}:{}'    n = 0                    # Number of discordant read pairs found  for align in in_sam:     # Go through each input alignment record    chr_a = align.tid      # Primary chromosome of alignment    pos_a = align.pos      # Primary base pair position    chr_b = align.rnext    # Secondary chromosome    pos_b = align.pnext    # Secondary position    if chr_a == chr_b:          # If both chromosomes the same      delta = abs(pos_b-pos_a)  # Get separation of read ends            if delta > max_sep:         # If too big        out_sam.write(align)      # Write alignment record to output BAM        n += 1                    # Increase count                if verbose:               # Check option to print a message          chr_name = chromo_names[chr_a]          print(msg_cis.format(chr_name, pos_a, chr_name, pos_b))          else:                         # Chromosomes were different      out_sam.write(align)        # Write alignment record to output BAM      n += 1                      # Increase count            if verbose:                 # Check option to print a message        chr_name_a = chromo_names[chr_a]        chr_name_b = chromo_names[chr_b]        print(msg_trans.format(chr_name_a, pos_a, chr_name_b, pos_b))   in_sam.close()  out_sam.close()   delta_time = time()-start_time  # Time taken: current time minus start    # Wite out some useful information  print('Written {:d} records to BAM file: {}'.format(n, out_file))  print('Time taken: {:.1f} s'.format(delta_time)) bam_file = 'GSM409307_UCSD.H3K4me1.bam'  # Test on a _paired_ BAM filefind_discordant_read_pairs(bam_file, verbose=False) ``` script_18_64.out `[ah@hobbes Documents]\$ python script_18_64.py [W::bam_hdr_read] EOF marker is absent. The input is probably truncated.Written 6146231 records to BAM file: GSM409307_UCSD.H3K4me1_discordant.bamTime taken: 48.1 s[ah@hobbes Documents]\$ ls -al *.bam-rw-rw-r--. 1 ah ah 356121115 Jan  8 08:04 GSM409307_UCSD.H3K4me1.bam-rw-rw-r--. 1 ah ah 300195982 Jan  8 08:15 GSM409307_UCSD.H3K4me1_discordant.bam[ah@hobbes Documents]\$ `

## Script 18.65

 script_18_65.py ```# Script 18.65from subprocess import call           # Function to run an external program#from urllib.request import urlopen   # python 3.0 Import urlopen() functionfrom urllib2 import urlopen           # python 2.7 Import urlopen() functionfrom pandas import ExcelWriter, read_tableimport ruffusBLAST_DATABASE = '/home/ah/database/blast/nrfilt'  # Location of BLAST databaseUNIPROT_URL = 'http://www.uniprot.org/uniprot/{}'# Create a list of FASTA file names from UniProt IDsuniprot_id_list = ['P68431', 'P62805', 'P0C0S8', 'Q96A08']fasta_files = [x + '.fasta' for x in uniprot_id_list] @ruffus.originate(fasta_files)      # First stage, makes FASTA filesdef get_uniprot_seq(fasta_file):    # Downloads a named FASTA file    url = UNIPROT_URL.format(fasta_file)    # Insert file name into URL  response = urlopen(url)                 # Access remote location  data = response.read().decode('utf-8')  # Read data as text    file_obj = open(fasta_file, 'w')  # Open file for writing  file_obj.write(data)              # Save downloaded data   file_obj.close() @ruffus.transform(get_uniprot_seq, ruffus.suffix(".fasta"), ".blast")def run_blast(fasta_file, table_file, e_cutoff=1e-10, cpu_cores=1):      cmd_args = ['blastp', '-query', fasta_file, '-out', table_file,              '-outfmt', '6', '-num_threads', str(cpu_cores),              '-db', BLAST_DATABASE, '-evalue', str(e_cutoff)]      call(cmd_args)  # Run external program with command parameters.@ruffus.transform(run_blast, ruffus.suffix(".blast"), ".xlsx")def make_spreadsheet(table_file, xl_file):    # A tuple containing spreadsheet column headings  columns = ('query_id', 'subject_id', '%_identity', 'alignment_length',             'mismatches', 'gap_opens', 'q_start', 'q_end', 's_start',             's_end', 'evalue', 'bit_score')  data_types = [str, str, float, int, int, int,    # Data type for each                 int, int, int, int, float, float]  # column in sheet      data_types = zip(columns, data_types)   # Pair cols with data types  data_types = dict(data_types)           # Make dict of col_name:dtype  # Make a data table object with pandas, with the dtype dict  # allowing the correct conversion of the BLAST table to numbers  data_frame = read_table(table_file, dtype=data_types,                                 names=columns, header=None)  excel_writer = ExcelWriter(xl_file)         # Make Excel file  data_frame.to_excel(excel_writer,           # Write data to sheet                      sheet_name='Sheet1')    # in Excel file  excel_writer.save()ruffus.pipeline_printout()ruffus.pipeline_run(multiprocess=4,)ruffus.pipeline_printout() ``` script_18_65.out `[ah@hobbes Documents]\$ python script_18_65.py ________________________________________Tasks which will be run:Task = 'get_uniprot_seq'       Job  = [None             -> P68431.fasta]         Job needs update: ...               Missing file [P68431.fasta]       Job  = [None             -> P62805.fasta]         Job needs update: ...               Missing file [P62805.fasta]       Job  = [None             -> P0C0S8.fasta]         Job needs update: ...               Missing file [P0C0S8.fasta]       Job  = [None             -> Q96A08.fasta]         Job needs update: ...               Missing file [Q96A08.fasta]Task = 'run_blast'       Job  = [P0C0S8.fasta             -> P0C0S8.blast]         Job needs update: ...               Missing files [P0C0S8.fasta, P0C0S8.blast]       Job  = [P62805.fasta             -> P62805.blast]         Job needs update: ...               Missing files [P62805.fasta, P62805.blast]       Job  = [P68431.fasta             -> P68431.blast]         Job needs update: ...               Missing files [P68431.fasta, P68431.blast]       Job  = [Q96A08.fasta             -> Q96A08.blast]         Job needs update: ...               Missing files [Q96A08.fasta, Q96A08.blast]Task = 'make_spreadsheet'       Job  = [P0C0S8.blast             -> P0C0S8.xlsx]         Job needs update: ...               Missing files [P0C0S8.blast, P0C0S8.xlsx]       Job  = [P62805.blast             -> P62805.xlsx]         Job needs update: ...               Missing files [P62805.blast, P62805.xlsx]       Job  = [P68431.blast             -> P68431.xlsx]         Job needs update: ...               Missing files [P68431.blast, P68431.xlsx]       Job  = [Q96A08.blast             -> Q96A08.xlsx]         Job needs update: ...               Missing files [Q96A08.blast, Q96A08.xlsx]________________________________________________________________________________Tasks which will be run:Task enters queue = 'get_uniprot_seq' Completed Task = 'get_uniprot_seq' Task enters queue = 'run_blast' Completed Task = 'run_blast' Task enters queue = 'make_spreadsheet' Completed Task = 'make_spreadsheet' ________________________________________Tasks which will be run:________________________________________[ah@hobbes Documents]\$ ls -al *.fasta *.blast *.xlsx-rw-rw-r--. 1 ah ah 44432 Jan  7 13:59 P0C0S8.blast-rw-rw-r--. 1 ah ah   213 Jan  7 13:57 P0C0S8.fasta-rw-rw-r--. 1 ah ah 35802 Jan  7 14:00 P0C0S8.xlsx-rw-rw-r--. 1 ah ah 43209 Jan  7 13:59 P62805.blast-rw-rw-r--. 1 ah ah   174 Jan  7 13:57 P62805.fasta-rw-rw-r--. 1 ah ah 35912 Jan  7 14:00 P62805.xlsx-rw-rw-r--. 1 ah ah 47852 Jan  7 13:59 P68431.blast-rw-rw-r--. 1 ah ah   211 Jan  7 13:57 P68431.fasta-rw-rw-r--. 1 ah ah 38021 Jan  7 14:00 P68431.xlsx-rw-rw-r--. 1 ah ah 45540 Jan  7 13:59 Q96A08.blast-rw-rw-r--. 1 ah ah   213 Jan  7 13:57 Q96A08.fasta-rw-rw-r--. 1 ah ah 35796 Jan  7 14:00 Q96A08.xlsx[ah@hobbes Documents]\$ `

Not already registered? Create an account now. ×