Here are some exercises to get you into basic Perl programming. This document, and any files it refers to, can be found in /home/shared/5120/Perl on fbsdpcu001. There are 10 exercises, which you should be able to do using the lecture notes, perhaps with an occasional consultation with an experienced Perl programmer or with "Programming Perl" (Wall L, Christiansen T and Schwartz RL, Second Edition, O'Reilly, 1996, ISBN 1-56592-149-6). To help you, solutions to the early problems are also in the above directory.
To do these exercises you will need to log in to fbsdpcu001 using ssh, and you will have to edit your code using the editor "vi". You might have done this already for the last exercise of BIOL 5010 assignment 1. If not, you have some learning to do. Here are some brief vi instructions to help with the exercises below.
To edit (or create and edit) a file called "HelloWorld.pl" in vi, type the following command at the UNIX command prompt:
vi HelloWorld.pl
When in vi there are two main modes, insertion mode (where you are typing your program) and command mode (when you are typing commands). It is not quite as friendly as Forte. You start in command mode, so to enter insertion mode type "i". You can now type in some code. Press ESC when you want to leave insertion mode.
To save the file you need to be in command mode and to type ":w" (REMEMBER the colon!). To save and quit vi at the same time type ":wq", and to quit without saving type ":q!".
There are some useful commands that you can use in command mode:
"x" will delete the character under the cursor
"dd" will delete the entire line containing the cursor
"yy" will yank the entire line into a buffer
"p" will paste the line out of the buffer (put there by "dd" or "yy")
"r" will replace the character under the cursor with whatever you type next
To enter insertion mode from command mode you can use "i" (as above), "a" to insert after the cursor, or "R" to replace existing text from the cursor onwards. Remember that to enter any commands you first need to leave insertion mode by typing ESC. I hope that's enough to be going on with.
Exercise 1: The "Hello World" program
All attempts to learn a new programming language should start with a "Hello World" program. This is a quick way of making sure that you understand the process of writing a simple program and making it run.
Using vi as explained above, edit a new file called "helloworld.pl". Put in the following lines:
#!/usr/bin/perl -w
print STDOUT "Hello World";
The line beginning #! tells the shell to run the script using Perl, and it must be the first line of the file. The path shown assumes that Perl is installed as /usr/bin/perl; the UNIX command "which perl" will tell you where it actually lives on your system. The flag -w tells Perl to generate warnings if it thinks there are mistakes in your code (enable this while debugging).
Save the file and make it executable (enter the command "chmod +x helloworld.pl" at the UNIX prompt). Finally, execute the program by typing the filename, as you would for any other UNIX executable. If "." is not on your PATH (an environment variable), you will need to type ./helloworld.pl instead of just helloworld.pl. The program should print "Hello World" on the screen.
Exercise 2: File IO
Copy the data file "marks.dat" into your working directory. Write a Perl program to open the file (or die in the case of failure) and to read the file line by line and print each line on the screen. (HINT: see the lecture notes.)
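The open-or-die and line-by-line reading idiom described above can be sketched as follows. This is only one possible shape, not the model solution: the subroutine name print_file and the decision to take the file name from the command line are assumptions made so that the snippet can be run on any file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# print_file is a hypothetical helper name: it prints every line of the
# named file and returns the number of lines it read.
sub print_file {
    my ($path) = @_;

    # Three-argument open; die with the system error ($!) on failure.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    my $count = 0;
    while (my $line = <$fh>) {   # reads one line per iteration
        print $line;             # the line still ends in its newline
        $count++;
    }
    close($fh);
    return $count;
}

# Assumed usage: perl readfile.pl marks.dat
print_file($ARGV[0]) if @ARGV;
```

Because each line is read with its trailing newline still attached, no extra "\n" is needed when printing it back.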
Exercise 3: Using arrays and foreach loops
Modify the above program so that the while loop simply reads each line into an element of an array (remember that strings are scalars in Perl, so unlike C, where a two-dimensional array would have been needed, here you only need a one-dimensional array). Then write a foreach loop to output every line on the screen.
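A minimal sketch of the read-into-array, then foreach pattern, under the same assumptions as before (the helper name read_lines and the command-line file name are illustrative, not part of the exercise):

```perl
use strict;
use warnings;

# read_lines is a hypothetical helper: it returns every line of the file
# as one element of a one-dimensional array.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    my @lines;
    while (my $line = <$fh>) {
        chomp $line;           # strip the trailing newline
        push @lines, $line;    # append the string as one array element
    }
    close($fh);
    return @lines;
}

# Assumed usage: perl lines.pl marks.dat
if (@ARGV) {
    my @lines = read_lines($ARGV[0]);
    foreach my $line (@lines) {
        print "$line\n";
    }
}
```

Here chomp removes the newline as each line is stored, so the foreach loop has to add it back when printing.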
Exercise 4: Using "split"
Perl has a very useful function called split, which is analogous to the StringTokenizer class in Java. It will split a string into tokens using specified delimiters. For instance, if the delimiter is any white space character, the string "The cow jumped over the moon" would be split into six tokens (The, cow, jumped, over, the, moon). If the delimiter is "," then the string "345,456,567" would split into three tokens, each consisting of three digits. The syntax of split is
@tokens = split /PATTERN/, $mystring;
where PATTERN is a regular expression defining the delimiters, $mystring is the string to be split, and the tokens are returned in the @tokens array. For example,
@tokens = split /\s+/, "The cow jumped over the moon";
would return $tokens[0] = "The", $tokens[1] = "cow", etc.
Modify the program from exercise 3 so that instead of outputting the line in its raw format it is first split into two tokens (a name and a number) which are then written out on a single line separated by a tab (\t) character.
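The split-and-rejoin step might look something like the sketch below. It assumes that each line of marks.dat holds a name and a number separated by white space; the helper name tab_separate and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# tab_separate is a hypothetical helper: it splits a line on runs of
# white space and rejoins the first two tokens with a tab.
sub tab_separate {
    my ($line) = @_;
    chomp $line;                               # drop the trailing newline
    my ($name, $mark) = split /\s+/, $line;    # two tokens: name, number
    return "$name\t$mark";
}

# "Smith   72" is made-up sample data in the assumed marks.dat layout.
print tab_separate("Smith   72\n"), "\n";
```

Note that /\s+/ treats a run of several spaces as a single delimiter, so the amount of padding between the two fields does not matter.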
Exercise 5: Using a Perl hash
Modify one of the programs written above to first read in the data from the file, then ask the user for an input surname, and finally output the mark associated with the input surname.
(HINT: a good way to store the data would be in a hash. Hashes were mentioned in the lecture. They are a bit like arrays, except that the elements are accessed using strings rather than numbers. Each mark can be associated with a surname as the hash key.)
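As a rough illustration of the hint, the sketch below stores marks in a hash keyed by surname and then looks one up. The data are made up, the helper name build_marks is hypothetical, and the surname is hard-coded here rather than read from the user.

```perl
use strict;
use warnings;

# build_marks is a hypothetical helper: it turns "surname mark" strings
# into a hash with the surname as key and the mark as value.
sub build_marks {
    my @lines = @_;
    my %mark_for;
    foreach my $line (@lines) {
        my ($name, $mark) = split /\s+/, $line;
        $mark_for{$name} = $mark;
    }
    return %mark_for;
}

# Made-up sample data standing in for the contents of marks.dat.
my %mark_for = build_marks("Smith 72", "Jones 65", "Patel 81");

# In the real program the surname would come from the user, e.g.
#   my $surname = <STDIN>; chomp $surname;
my $surname = "Jones";
if (exists $mark_for{$surname}) {
    print "$surname: $mark_for{$surname}\n";
} else {
    print "No mark recorded for $surname\n";
}
```

When the surname is read from STDIN it arrives with a trailing newline, so it needs a chomp before it will match any hash key.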
Exercise 6: More hashes
Modify the above program so that instead of asking the user for a name, it just writes out all the marks in alphabetical order of surname.
(HINT: the "keys" function will return the keys of the hash as a list; the "sort" function will sort this list in alphabetical order.)
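The keys-then-sort idea from the hint can be sketched like this, again with made-up data and a hypothetical helper name (sorted_report):

```perl
use strict;
use warnings;

# Made-up sample data; in the exercise the hash would be filled from marks.dat.
my %mark_for = (Smith => 72, Jones => 65, Patel => 81);

# sorted_report is a hypothetical helper: it returns one "surname<TAB>mark"
# string per entry, ordered by surname.
sub sorted_report {
    my (%marks) = @_;
    my @report;
    foreach my $name (sort keys %marks) {   # keys gives the surnames, sort orders them
        push @report, "$name\t$marks{$name}";
    }
    return @report;
}

print "$_\n" foreach sorted_report(%mark_for);
```

By default sort compares strings character by character, so surnames beginning with an upper-case letter come before any beginning with a lower-case letter.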
Exercise 7: Regular expressions
Design and test Perl regular expressions to match the following:
A file name with the extension ".emb"
A file name with extension .htm or .html
A line of text beginning with a digit
A line of text containing a word and a number, separated by one or more white space characters
Leeds academic URLs (i.e. strings of the form "http://www.bioinf.leeds.ac.uk", where any word may appear in place of bioinf)
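One way to test a candidate pattern is to run it against a handful of strings, some of which should match and some of which should not. The sketch below does this for the first item only; the pattern is one possible answer rather than the model solution, and the helper name matches and the test file names are made up.

```perl
use strict;
use warnings;

# A candidate pattern for the first item: the dot is escaped so that it
# matches a literal ".", and $ anchors the match to the end of the string.
my $emb_file = qr/\.emb$/;

# matches is a hypothetical helper: it returns 1 if the string matches
# the pattern and 0 otherwise.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Made-up test names: include strings that should fail as well as succeed.
foreach my $name ("seq1.emb", "seq1.embl", "emb.txt", "notes.emb.bak") {
    printf "%-15s %s\n", $name, matches($emb_file, $name) ? "match" : "no match";
}
```

Of the four test names, only "seq1.emb" should be reported as a match; the near misses are what show whether the escaping and the anchor are doing their job.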
Exercise 7 (continued): Extracting information with regular expressions
Write a program to scan an HTML document, and find (list and count) all Leeds URLs contained within it.
Exercise 8: Using a Perl subroutine
Rewrite the program from exercise 7 so that the main functionality is contained within a subroutine which takes the HTML file name as its only argument and prints the relevant URLs on standard output, returning the number of URLs found.
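The overall shape of such a subroutine might be sketched as below. The subroutine name list_leeds_urls and the URL pattern are assumptions for illustration: the pattern matches only the host part ("www.", any single word, ".leeds.ac.uk") and ignores anything that follows it.

```perl
use strict;
use warnings;

# list_leeds_urls is a hypothetical name. It takes an HTML file name, prints
# each matching URL on standard output, and returns how many it found.
sub list_leeds_urls {
    my ($file) = @_;

    # One possible pattern (an assumption, not the model answer).
    my $pattern = qr{http://www\.\w+\.leeds\.ac\.uk};

    open(my $fh, '<', $file) or die "Cannot open $file: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        # With /g in list context the match returns every URL on the line.
        foreach my $url ($line =~ /($pattern)/g) {
            print "$url\n";
            $count++;
        }
    }
    close($fh);
    return $count;
}

# Assumed usage: perl leeds_urls.pl page.html
if (@ARGV) {
    my $found = list_leeds_urls($ARGV[0]);
    print "Found $found Leeds URL(s)\n";
}
```

Arguments arrive in the special array @_, and the value given to return becomes the result of the call, which is how the caller gets the count back.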
Exercise 9: Perl CGI
Try to get the CGI example contained in the files PerlCGI_example.cgi and PerlCGI_example.html working from your own WWW space on fbsdpcu001.
(NB: your WWW space on fbsdpcu001 is accessed by URLs like http://fbsdpcu001.leeds.ac.uk/~username/filename.html. To use it you will need to make a directory called public_html in your own home directory (if it doesn't already exist) with the right access permissions. The UNIX commands for this are "cd ~", "mkdir public_html" and "chmod a+rx public_html". Make sure all HTML files are readable by everyone, and that the .cgi file is readable and executable by everyone.)
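For orientation, the bare minimum that a CGI script has to do is print a Content-type header, a blank line, and then the page itself. The sketch below shows just that; it is not the content of PerlCGI_example.cgi, the helper name build_page is hypothetical, and the #! path assumes Perl is installed as /usr/bin/perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_page is a hypothetical helper returning a complete CGI response:
# a Content-type header, a blank line, then the HTML body.
sub build_page {
    my ($message) = @_;
    return "Content-type: text/html\n\n"
         . "<html><body><p>$message</p></body></html>\n";
}

# The web server sends whatever the script prints back to the browser.
print build_page("Hello from Perl CGI");
```

The message is inserted into the page unchanged, which is acceptable for a fixed string but not for anything typed in by a user; such input would need to be escaped first.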
Exercise 10: Assessed course work
Using exercise 9 as a basis, produce a WWW page which acts as a front end to a fasta search. Give the user control over whatever fasta options you think are important and return the results in a similar format to that used by the fasta program.
The fasta programs are installed on fbsdpcu001 in the directory /applic/Fasta and you probably want to produce an interface to the program fasta33. You can get information on how to run fasta from /applic/Fasta/fasta3x.doc and from the various .doc files in the above directory. Remember that you will need the Perl "system" function to run an external program from your Perl script. You will also need to provide a sequence library to search. I suggest you make your own short sequence library, e.g. by extracting about 100 sequences from SWISSPROT.
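A minimal sketch of calling an external program with "system" and checking how it went. The helper name run_command is hypothetical, and the command run here is simply the Perl interpreter itself, as a harmless stand-in; the real fasta33 command line is not shown and has to be worked out from the fasta documentation.

```perl
use strict;
use warnings;

# run_command is a hypothetical helper: it runs an external program using
# the list form of system and returns the program's exit code
# (assuming the program exited normally rather than being killed).
sub run_command {
    my @command = @_;
    my $status = system(@command);
    die "Could not run $command[0]: $!" if $status == -1;
    return $status >> 8;    # the exit code is held in the high byte
}

# $^X is the path of the running perl, used here only as a stand-in
# for the external program.
my $exit = run_command($^X, '-e', 'exit 0');
print "Exit status: $exit\n";
```

Passing the program and each of its arguments as separate list elements means that no shell is involved, which matters when some of the arguments come from a web form. The program's output goes straight to standard output; backticks or a piped open are alternatives if the script needs to capture and reformat it.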
This exercise will be marked out of ten, in two equal components: first, a mark for the quality of your (working) code; second, a mark for how you give the user access to fasta options and for the aesthetic qualities of your interface and output.