Manual for the TAGsmart software

  1. Registration
    1.1  Personal information
    1.2  Key number
    1.3  Sign in
  2. Preprocessing data
    2.1  Input files: Original TAG array readout file, Array description file, Tag description files, Chip description file, HETERO/HOMO difference set file, Tag-Quality file
    2.2  Preprocessor program
    2.3  Output of the preprocess program
  3. Online TAGsmart software
    3.1 General software description
    3.2 Experimental design
    3.3 Starting analysis
    3.4 Offline TAGsmart software
  4. Interactive outputs
    4.1 Interactive text output
    4.2 Heatmap output

 

1. Registration

1.1 Personal information

Before using TAGsmart, you need to register some brief information: your name, e-mail address, and institute.

1.2 Key number

To confirm your e-mail address, we send a key number consisting of 7 letters to that address. You should receive the e-mail immediately; you can then finish the registration with the key number.

1.3 Sign in

After registration, you can use TAGsmart. To enter the main pages, sign in with your registered e-mail address.

 

2. Preprocessing data (Windows, Linux, and Mac versions)

2.1 Input files

Before preprocessing, let's walk through all the files needed for the data analysis.

2.1.1 Original TAG array readout files:

Each Affymetrix TAG3 microarray measures the population of each mutant strain at a given generation under a given environment. A typical Affymetrix TAG3 microarray readout file is given below:

1_I0_3~1.TXT (right-click and choose "Save Target As" to download). Part of this file looks like this:

  1_I0_3_17_03B ( 570 nm ) 1_I0_3_17_03B ( 570 nm ) 1_I0_3_17_03B ( 570 nm ) 1_I0_3_17_03B ( 570 nm ) 1_I0_3_17_03B ( 570 nm )
  PM   MM   CPM   CMM   BG  
tag_3_at AAACAAACACCCGCGTGGTT 10 373 5 4 30
tag_4_at AAAGATATAACCCTGTGCCC 1 1 10 13 30
tag_5_at AAAGGAAGAACCGCGCCTCT 10 1 8 5 30
tag_6_at AAAGGCGTAAACATGCGGCC 19 11 33 20 30
tag_7_at AAATCAGCAAACGGGCTCCG 29 123 4 8 30
tag_8_at AAATGTCTAAACCCGCAGCG 19 14 14 18 30
tag_9_at AACAATGAAACGCTTCTCCG 1 6 18 13 30
tag_10_at AACTCAATAAAGCGCCCTGG 25 6 43 37 30
tag_11_at AACTCCGGCAAAGACACGGT 47 60 1 30 30
tag_12_at AACTGACTAAACTAGGTGCC 181 380 5 1 30
tag_13_at AAGAAGGGAAACTCGTTCGC 4 8 1 1 30
tag_14_at AAGGGTGGAAACGTATATCC 6 1 1 1 30
tag_15_at AAGTGGCCCAAATAACTGCC 1 1 4 1 30
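For readers who want to inspect a readout file directly, the following Python sketch reads the data rows of the excerpt above. It assumes the layout shown: header lines, then one whitespace-separated row per probe set with the tag name, the tag sequence and the PM, MM, CPM, CMM and BG values. The names TagReading and parse_tag3_txt are hypothetical and are not part of TAGsmart or of any Affymetrix library.

```python
from typing import Iterable, List, NamedTuple


class TagReading(NamedTuple):
    """One probe-set row of a TAG3 readout file (hypothetical helper type)."""
    tag: str
    sequence: str
    pm: float
    mm: float
    cpm: float
    cmm: float
    bg: float


def parse_tag3_txt(lines: Iterable[str]) -> List[TagReading]:
    """Parse the data rows of a TAG3 TXT readout; header lines are skipped.

    Assumes whitespace-separated columns: tag, sequence, PM, MM, CPM, CMM, BG.
    """
    readings = []
    for line in lines:
        fields = line.split()
        # Data rows start with a probe-set id such as "tag_3_at" and have 7 fields.
        if len(fields) != 7 or not fields[0].startswith("tag_"):
            continue
        tag, sequence, *values = fields
        readings.append(TagReading(tag, sequence, *(float(v) for v in values)))
    return readings
```

For example, parse_tag3_txt(open("1_I0_3~1.TXT")) would return one TagReading per tag row of that file.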

TAGsmart also supports the CEL format. A typical CEL file is given below:

A-original.CEL (right-click and choose "Save Target As" to download). Part of this file looks like this:

[CEL]
Version=3

[HEADER]
Cols=266
Rows=266
TotalX=266
TotalY=266
OffsetX=0
OffsetY=0
GridCornerUL=73 119
GridCornerUR=2755 137
GridCornerLR=2739 2822
GridCornerLL=57 2805
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[0..5671]  A 7-12-02 RKDA:CLS=2920 RWS=2920 XIN=3  YIN=3  VE=17        2.0 07/12/02 12:49:35       2109_MU.1sq                  6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004

[INTENSITY]
NumberCells=70756
CellHeader=X	Y	MEAN	STDV	NPIXELS
  0	  0	2750.3	506.2	 64
  1	  0	49.0	13.9	 64
  2	  0	2668.5	565.9	 64
  3	  0	47.8	19.5	 64
  4	  0	35.3	10.4	 64
  5	  0	2664.5	699.8	 64
  6	  0	49.0	32.0	 81
  7	  0	2499.3	292.0	 64
  8	  0	47.3	15.6	 64
  9	  0	2494.0	394.3	 64
 10	  0	48.0	16.2	 64
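The text CEL layout above can be read in a similar way. This Python sketch, with the hypothetical function name parse_cel_v3, collects the [HEADER] keys and maps each (X, Y) cell of the [INTENSITY] section to its MEAN value. It assumes the text (version 3) format shown here; binary CEL files are not handled.

```python
from typing import Dict, Iterable, Tuple


def parse_cel_v3(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[Tuple[int, int], float]]:
    """Read [HEADER] keys and per-cell MEAN intensities of a text (version 3) CEL file.

    Returns (header, intensities), where intensities maps (X, Y) -> MEAN.
    Other sections such as [MASKS] or [OUTLIERS] are ignored.
    """
    section = None
    header: Dict[str, str] = {}
    intensities: Dict[Tuple[int, int], float] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # e.g. "CEL", "HEADER", "INTENSITY"
            continue
        if section == "HEADER" and "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
        elif section == "INTENSITY":
            if "=" in line:  # NumberCells=..., CellHeader=...
                continue
            x, y, mean, _stdv, _npixels = line.split()
            intensities[(int(x), int(y))] = float(mean)
    return header, intensities
```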

		

2.1.2. Array description file:

This is a user-provided file in which the user documents the generation and environment under which each original TAG array was applied. A typical array description file is given below:

file_information3.txt (right-click and choose "Save Target As" to download). Part of this file is shown below. The array description file is a tab-delimited file with 3 columns and one row per array. The first column gives the file name of each original Tag array readout file. The second column gives the time (generation) at which the Tag array was applied; we recommend applying 2 or more arrays at each time point. The third column gives a code for the treatment/environment. For example, in the following file, 0 codes for the control group, and 1, 2, 3 code for treatments with 3 different dosages of the same drug. Please fill in a number for every row and every column in this file. Array data from the same time and the same treatment are treated as replicates in the subsequent analysis. For more information about how to construct an array description file, please see 3.2 Experimental design.

filename generation treatment
1_I0_3~1.TXT 0 0
2_E0_3~1.TXT 0 0
8_J0_3~1.TXT 0 0
10_F4D~1.TXT 4 0
1_C4D_~1.TXT 4 0
2_I4D_~1.TXT 4 0
11_F4C~1.TXT 4 1
4_E4C_~1.TXT 4 1
G4C_3_~1.TXT 4 1
10_J4C~1.TXT 4 2
3_I4C_~1.TXT 4 2
K4CIN_~1.TXT 4 2
C_T4CI~1.TXT 4 3
D_T4CI~1.TXT 4 3
11_J8D~1.TXT 8 0
12_F8D~1.TXT 8 0
4_I8D_~1.TXT 8 0
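The grouping into replicates described above can be sketched as follows. This Python sketch (hypothetical names read_array_description and under_replicated) groups the file names by (generation, treatment) and lists the groups with fewer than the recommended 2 arrays. It splits on any whitespace, so it accepts tabs or the spaces shown in the excerpt, assuming file names contain no spaces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_array_description(lines: Iterable[str]) -> Dict[Tuple[int, int], List[str]]:
    """Group array readout files by (generation, treatment).

    Arrays sharing the same generation and treatment are replicates.
    An optional header row starting with "filename" is skipped.
    """
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0].lower() == "filename":
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        filename, generation, treatment = fields
        groups[(int(generation), int(treatment))].append(filename)
    return dict(groups)


def under_replicated(groups: Dict[Tuple[int, int], List[str]],
                     minimum: int = 2) -> List[Tuple[int, int]]:
    """Return the (generation, treatment) groups with fewer than `minimum` arrays."""
    return sorted(key for key, files in groups.items() if len(files) < minimum)
```

With the excerpt above, the (4, 3) group holds 2 arrays and every other group holds 3.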

2.1.3.  Tag description file:

This is a file provided by the array manufacturer. It includes the annotation information for every probe (corresponding to a Tag) on the array. The Tag description files for the Affymetrix Tag3 and Agilent arrays are given below:

TAG3_HET.txt and AGILENT.txt (right-click and choose "Save Target As" to download). Both heterozygous and homozygous deletion mutants can use the same Tag description file. This file has to be stored in the same directory as the preprocessor program (see below).

2.1.4.  Chip description file:

The CDF (Chip Description File) for a particular array type can be obtained from the Affymetrix library file website or from the Affymetrix GCOS software. It includes the annotation information for every coordinate on the array. The CDF file for the Affymetrix Tag3 array is given below:

TAG_3.CDF (right-click and choose "Save Target As" to download). This file has to be stored in the same directory as the preprocessor program (see below).

2.1.5.  HETERO/HOMO difference set file:

The homozygous deletion set is a subset of the heterozygous deletion set. To distinguish homozygous from heterozygous mutants, we use the difference set, which is obtained from the following URL:
http://www-sequence.stanford.edu/group/yeast_deletion_project/Essential_ORFs.txt
In other words, this set contains the mutants that are in the heterozygous set but not in the homozygous set.
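As a minimal sketch of this set relationship, assuming the ORF names from the difference-set file have already been read into a list, the homozygous set can be obtained by removing them from the heterozygous set. The function name homozygous_orfs is hypothetical, and parsing Essential_ORFs.txt is not shown because its layout is not described here.

```python
from typing import Iterable, List


def homozygous_orfs(heterozygous: Iterable[str], difference_set: Iterable[str]) -> List[str]:
    """Heterozygous ORFs minus the difference set, keeping the original order.

    ORF names are compared case-insensitively after stripping whitespace.
    """
    excluded = {orf.strip().upper() for orf in difference_set}
    return [orf for orf in heterozygous if orf.strip().upper() not in excluded]
```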

2.1.6.  Tag-Quality file:

Based on our experimental results, we summarized the quality of each tag on the Tag3 array. The method used is as follows:

Method:
1. Normalize with the 98% trimmed mean.
2. Take the log of each entry in the matrix.
3. Let u be the mean of the reference data, stddev the standard deviation of the reference data, and x a treated data point, and calculate Pho = (u - log(x)) / stddev. This Pho is the entry in the Pho column of markedgenes_new.txt.
4. Mark the tags with the following rules:
1) If a mutant has only one tag, mark it as +.
2) If a mutant has two good tags, both of them are +.
3) If a mutant has one good and one bad tag, the bad one is - and the good one is +.
4) If a mutant has two bad tags, the relatively better one is + and the other is -.
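Steps 1 to 4 can be sketched in Python as below. Several details are not specified above and are assumptions here: the 98% trimmed mean is taken to drop 1% of the values from each tail, normalization scales each array so that all arrays share the same trimmed mean, stddev is the sample standard deviation, and the good/bad decision and the score used to rank two bad tags are supplied by the caller. All function names are hypothetical. Note that Pho does not depend on the base of the logarithm, because u, log(x) and stddev all scale by the same factor.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def trimmed_mean(values: Sequence[float], keep: float = 0.98) -> float:
    """Mean of the central `keep` fraction of the values.

    Assumption: a "98% trimmed mean" drops 1% of the values from each tail.
    """
    data = sorted(values)
    cut = int(len(data) * (1 - keep) / 2)
    return mean(data[cut:len(data) - cut])


def normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Step 1 (assumed form): scale every array to the average trimmed mean."""
    centers = [trimmed_mean(a) for a in arrays]
    target = mean(centers)
    return [[v * target / c for v in a] for a, c in zip(arrays, centers)]


def pho(reference_logs: Sequence[float], treated_log: float) -> float:
    """Step 3: Pho = (u - log(x)) / stddev, where u and stddev come from the
    log-transformed reference data and treated_log is log(x)."""
    return (mean(reference_logs) - treated_log) / stdev(reference_logs)


def mark_tags(tags: List[Tuple[str, bool, float]]) -> Dict[str, str]:
    """Step 4: assign "+" or "-" to the one or two tags of one mutant.

    Each tag is (name, is_good, score); is_good and score are hypothetical
    caller-supplied inputs (higher score = better).
    """
    if len(tags) == 1:  # rule 1
        return {tags[0][0]: "+"}
    (name1, good1, score1), (name2, good2, score2) = tags
    if good1 or good2:  # rules 2 and 3
        return {name1: "+" if good1 else "-", name2: "+" if good2 else "-"}
    first_better = score1 >= score2  # rule 4: both tags are bad
    return {name1: "+" if first_better else "-", name2: "-" if first_better else "+"}
```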

The markedgenes.txt file has to be stored in the same directory as the preprocessor program (see below).

2.2 The preprocessor program

Currently this preprocessor program is available for execution under Windows, Linux and Mac environments.

(a) Windows - Download

With these files at hand, we can proceed to work with the preprocessing program. Please download the zipped file PreTag3.zip, which contains both the main preprocessor program and the description files.

When you run this program, the following window appears. Use the first Select button to choose the Array Description File (see 2.1.2), which should be in the same directory as all the original array readout data files.

Next, specify the file format: Affymetrix TXT, Affymetrix CEL, or Agilent TXT. To filter out bad tags from the input files, check the 'Bad Tag Filtering' check box.

You can also select either Homozygous or Heterozygous, as shown in the following picture.

After making these settings, press the Start button to start preprocessing. The progress bar indicates the progress of the process, and a dialog appears when preprocessing is finished.

(b) Linux (Download) and Mac (Download)

With these files at hand, we can proceed to work with the preprocessing program. Download the main preprocessor program, PreTag3.tar.gz, which contains both the main preprocessor program and the description files. Put all the original array readout data files in the same directory as well. PreTag3.exe requires five input parameters: the file path of the Array Description File (see 2.1.2), an output file path, a file format (-txt, -cel, or -agi), a bad-tag-filter option (-a or -f), and a heterozygous option (-he or -ho). For example,

PreTag3.exe fileinformation.txt TagAll.txt -cel -f -he

		-txt : Affymetrix TXT format
		-cel : Affymetrix CEL format
		-agi : Agilent format
		-a : no bad-tag filtering
		-f : bad-tag filtering
		-he : heterozygous
		-ho : homozygous
		

The file names printed while the program runs indicate the progress of the process.
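When the preprocessor is run from a script, the argument list can be assembled programmatically. The Python sketch below builds the five parameters in the order documented above and runs the program with subprocess; the names pretag3_command and run_pretag3 are hypothetical. On Linux and Mac, pass executable="./PreTag3.exe" if the program's directory is not on your PATH.

```python
import subprocess
from typing import List

FORMAT_FLAGS = {"txt": "-txt", "cel": "-cel", "agilent": "-agi"}


def pretag3_command(description_file: str, output_file: str, file_format: str = "txt",
                    filter_bad_tags: bool = True, heterozygous: bool = True,
                    executable: str = "PreTag3.exe") -> List[str]:
    """Build the PreTag3.exe argument list in the documented order."""
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unknown format {file_format!r}; use one of {sorted(FORMAT_FLAGS)}")
    return [
        executable,
        description_file,
        output_file,
        FORMAT_FLAGS[file_format],
        "-f" if filter_bad_tags else "-a",  # -f: bad-tag filtering, -a: no filtering
        "-he" if heterozygous else "-ho",   # heterozygous or homozygous mutants
    ]


def run_pretag3(*args, **kwargs) -> None:
    """Run the preprocessor; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(pretag3_command(*args, **kwargs), check=True)
```

For example, pretag3_command("fileinformation.txt", "TagAll.txt", "cel", True, True) reproduces the example command above.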

2.3 Output of the preprocess program

The output file of the preprocessor looks like the following:

orf gene tag tagtype PM/CPM R0_3_2~1.TXT S0_3_31_03.TXT 10_F4D~1.TXT 1_C4D_~1.TXT 11_J8D~1.TXT 12_F8D~1.TXT S8DMSO_3_31_03.TXT 11_F4C~1.TXT 4_E4C_~1.TXT G4C_3_~1.TXT 13_F8C~1.TXT 6_E8C_~1.TXT G8C_3_~1.TXT
Treatment         0 0 0 0 0 0 0 1 1 1 1 1 1
Generation         0 0 4 4 8 8 8 4 4 4 8 8 8
YKL134C Oct1 tag_9600_at uptag PM 40 9 34 32 68 37 6 46 29 31 53 45 23
YKL134C Oct1 tag_9600_at uptag CPM 23 24 23 23 46 26 54 30 27 41 27 30 38
YKL134C Oct1 tag_9696_at downtag PM 300 301 261 201 321 219 297 286 273 271 288 282 316
YKL134C Oct1 tag_9696_at downtag CPM 39 74 70 43 65 45 87 88 59 94 76 69 98
YMR056C AAC1 tag_11910_at uptag PM 60 81 63 57 95 64 62 66 41 60 23 17 11
YMR056C AAC1 tag_11910_at uptag CPM 188 278 211 185 357 207 293 212 151 246 90 81 62
YMR056C AAC1 tag_144_at downtag PM 127 41 97 78 117 116 47 86 66 56 20 19 18
YMR056C AAC1 tag_144_at downtag CPM 69 104 62 49 98 69 239 56 49 70 12 12 29
YBR085W AAC3 tag_6111_at uptag PM 124 75 131 124 153 147 67 162 131 127 145 144 107
YBR085W AAC3 tag_6111_at uptag CPM 76 120 80 66 128 86 256 114 75 130 92 105 127
YBR085W AAC3 tag_6207_at downtag PM 31 20 16 15 28 12 19 13 14 16 13 16 14
YBR085W AAC3 tag_6207_at downtag CPM 18 11 11 10 17 10 10 12 9 9 10 7 10
YNL331C AAD14 tag_1309_at uptag PM 7 3 4 5 5 3 7 7 3 3 1 4 1
YNL331C AAD14 tag_1309_at uptag CPM 4 4 5 1 1 2 1 1 1 1 2 9 5
YNL331C AAD14 tag_1405_at downtag PM 4 6 5 3 5 5 2 6 2 3 5 8 8
YNL331C AAD14 tag_1405_at downtag CPM 4 4 4 2 4 3 3 3 1 3 4 1 1
YCR107W AAD3 tag_12451_at uptag PM 99 114 50 58 92 59 109 65 52 86 69 60 51
YCR107W AAD3 tag_12451_at uptag CPM 112 285 58 59 114 59 372 73 63 142 65 58 100
YCR107W AAD3 tag_12475_at downtag PM 49 51 19 16 48 15 32 18 18 17 21 19 11
YCR107W AAD3 tag_12475_at downtag CPM 37 49 14 18 34 12 36 16 17 14 15 10 11

It is a tab-delimited file. The column names are self-explanatory. From the 6th column on, the column names are the names of the original array data files. The second row lists the treatment (environment) codes; 0, 1, etc. are codes for different environments (treatments). The third row lists the time (cell generation). By design, every mutant has two tags spiked in, called the uptag and the downtag. Every tag has two perfect-match probes, which are complementary to each other; they are termed PM and CPM probes. Altogether there are 4 probes measuring the fitness of a mutant on a Tag array, and the data for these 4 probes are compiled into 4 consecutive rows in the output file.
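The layout just described can be read into per-mutant records. The Python sketch below (hypothetical name parse_preprocessed) returns the array names, the treatment and generation codes, and, for each ORF, the 4 probe rows keyed by (tagtype, probe). It assumes the header, Treatment and Generation rows come first, as in the excerpt above, and that no field contains a space.

```python
from typing import Dict, Iterable, List, Tuple


def parse_preprocessed(lines: Iterable[str]) -> Tuple[List[str], List[int], List[int], Dict[str, dict]]:
    """Split a preprocessor output file into its parts.

    Returns (arrays, treatment, generation, mutants), where mutants maps
    ORF -> {"gene": name, "probes": {(tagtype, probe): values}}.
    Splitting on whitespace handles tabs as well as the spaces in the excerpt.
    """
    rows = [line.split() for line in lines if line.strip()]
    arrays = rows[0][5:]  # columns 6 onward are the array file names
    n = len(arrays)
    if rows[1][0] != "Treatment" or rows[2][0] != "Generation":
        raise ValueError("expected Treatment and Generation rows after the header")
    treatment = [int(v) for v in rows[1][-n:]]
    generation = [int(v) for v in rows[2][-n:]]
    mutants: Dict[str, dict] = {}
    for fields in rows[3:]:
        orf, gene, _tag, tagtype, probe = fields[:5]
        values = [float(v) for v in fields[5:]]
        if len(values) != n:
            raise ValueError(f"{orf}: expected {n} values, got {len(values)}")
        entry = mutants.setdefault(orf, {"gene": gene, "probes": {}})
        entry["probes"][(tagtype, probe)] = values
    return arrays, treatment, generation, mutants
```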

This is a truncated output file: processed-1000.txt (right-click and choose "Save Target As" to download). It was truncated from the preprocessor output so that it contains only 1000 mutants.

The figure below may help in understanding the array design.

The figure above is a generic gene-deletion cassette module. The biotin-labelled, deletion-specific primers (B-U1, B-U2-comp, B-D1 and B-D2-comp) are used to amplify the unique UPTAG and DNTAG sequences from genomic preparations generated in the fitness-profiling studies. B-U1 can hybridize to the uptag-PM and uptag-MM probes on the tag array, B-U2 can hybridize to the uptag-cPM and uptag-cMM probes, and so on. The figure is from Giaever et al. (Nature 2002; redistribution permission granted by Nature Publishing Group).

3. Online TAGSmart software

Online TAGsmart is optimized for Internet Explorer 6.0 and Firefox 1.5. If you have trouble browsing TAGsmart, please try IE or Firefox.

3.1 General software description

The TAGsmart software searches for mutants with very different survival rates in the treated and control environments. In other words, it searches for mutants with a specific interaction with a certain environment (treatment). For example, the following graph shows that the MPS1 mutant has a different survival rate in the drug-treated environment compared to the control environment. The X axis is the generation (time); the Y axis is the array signal (positively related to the population size of this mutant).

3.2 Experimental design
The simplest experimental design compares one treated pool of mutants to one control pool at a single time point. More complex designs may detect mutant-environment interactions with higher sensitivity. For example, the mutant population may be measured at multiple time points under both the treated and the control environments. Moreover, the treatment can be applied at different dosages (Dorer et al, Current Biology 2006). TAGsmart supports all of these designs, simple or complex. The experimental design is passed to TAGsmart in an array description file (see 2.1.2). To write an array description file, simply record the actual time and the numeric treatment code for every original data file. Treatment "0" is regarded as the control treatment. It is usually good to have replicates at every time point and in every treatment (environment) group.

Besides the analysis algorithm, we also provide a heat map of a previous result.

3.3 Starting analysis
On the home page, choose the "Analyze a preprocessed data file" radio button and click the "Next>>" button. The following screen will appear.

Use the "Browse" button to select the input data file, which should be the output file of the preprocessor. The "Sample input file" link gives an example input file; this file is explained in 2.3 "Output of the preprocess program". TAGsmart lets you select mutants that show different fitness (survival rates) in the multiple environments by Q-value (False Discovery Rate), by Fold Change (FC), or by both.

In the "Q-value threshold (%)" text box, enter a number between 0.01 and 100. The smaller the Q-value, the stronger the statistical evidence that the mutant has different survival rates. Setting the Q-value threshold (%) to 100 effectively disables the Q-value filter, because every mutant is expected to have a Q-value no greater than 100%.

In the "FC Threshold" text box, enter a positive number and choose "<" or ">" from the drop-down list accordingly. When the FC threshold is set between 0 and 1 and "<" is chosen, the mutants that have inferior fitness in the treated environment (i.e. treatment code ≠ 0) compared to the control environment (i.e. treatment code = 0) are kept in the output. The MPS1 mutant in 3.1 General software description is such a mutant. The smaller the FC threshold, the more stringent the selection. When the FC threshold is set above 1 and ">" is chosen, the mutants with superior fitness in the treated environment are kept in the output; the larger the FC threshold, the more stringent the selection. By definition, FC = average(fitness in treated environment) / average(fitness in control environment).
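The FC definition and the "<"/">" choice can be illustrated with the Python sketch below. It is a simplification, not TAGsmart's own implementation: fitness is approximated by the array signal of a single probe, all arrays with a non-zero treatment code are pooled regardless of generation or dosage, and the comparison with the threshold is assumed to be strict. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change(values: Sequence[float], treatments: Sequence[int]) -> float:
    """FC = average(signal in treated arrays) / average(signal in control arrays).

    Treatment code 0 is the control; every other code counts as treated.
    """
    treated = [v for v, t in zip(values, treatments) if t != 0]
    control = [v for v, t in zip(values, treatments) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control array")
    return mean(treated) / mean(control)


def passes_fc(fc: float, threshold: float, direction: str) -> bool:
    """Apply the "<" or ">" choice of the FC Threshold box (strict comparison assumed)."""
    if direction == "<":
        return fc < threshold
    if direction == ">":
        return fc > threshold
    raise ValueError("direction must be '<' or '>'")
```

For instance, signals 10, 10 in control arrays and 5, 5 in treated arrays give FC = 0.5, which passes a "<" filter with threshold 0.8.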

In the "# of permutations" text box, enter a natural number between 1 and 1000. This is an internal parameter used in computing the Q-value. The larger it is, the more accurate the computation, but the longer it takes to compute the result. The default of 500 is a good balance between accuracy and speed.
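The statistic that TAGsmart permutes is not described here, so the Python sketch below only illustrates the general idea with a textbook two-sided permutation test on the difference between treated and control means (the function name is hypothetical). With B permutations the smallest attainable p-value is 1/(B+1), about 0.002 for the default of 500, and the Monte Carlo error shrinks as B grows, which is why more permutations give more accurate results at a higher cost. Converting such p-values into Q-values is not shown.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(values: Sequence[float], is_treated: Sequence[bool],
                        n_permutations: int = 500, seed: int = 0) -> float:
    """Two-sided permutation p-value for a treated-vs-control difference in means.

    Group labels are shuffled n_permutations times; the p-value is the fraction
    of shuffles whose |mean difference| is at least the observed one, with +1
    in numerator and denominator so that it is never exactly zero.
    """
    def statistic(labels: Sequence[bool]) -> float:
        treated = [v for v, lab in zip(values, labels) if lab]
        control = [v for v, lab in zip(values, labels) if not lab]
        return abs(mean(treated) - mean(control))

    rng = random.Random(seed)
    observed = statistic(is_treated)
    labels = list(is_treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if statistic(labels) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```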

Check the "Generation-0 Correction" option if, at time (generation) 0, there is theoretically no difference between the treated and the control environments. The generation-0 data will then be used as the common measurement of the starting population for both groups.
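In the example output of 2.3, the two generation-0 arrays carry treatment code 0, so on its own the treated group (code 1) has no starting measurement. Under the assumption that the correction simply shares the generation-0 arrays with every treatment group, the column grouping can be sketched in Python as follows (hypothetical function name).

```python
from typing import Dict, List, Sequence


def groups_with_gen0(treatment: Sequence[int], generation: Sequence[int]) -> Dict[int, List[int]]:
    """Column indices per treatment group, with generation-0 arrays shared by all groups.

    Generation-0 columns are used as the common starting population,
    whatever treatment code they carry in the file.
    """
    gen0 = [i for i, g in enumerate(generation) if g == 0]
    groups: Dict[int, List[int]] = {}
    for i, (t, g) in enumerate(zip(treatment, generation)):
        if g != 0:
            groups.setdefault(t, []).append(i)
    return {t: sorted(gen0 + cols) for t, cols in sorted(groups.items())}
```

With the treatment and generation rows of that example, group 1 would use columns 0 and 1 (generation 0) plus columns 7 to 12, counting from 0.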

Click the "Submit" button after making your analysis choices.

3.4 Offline TAGsmart software (Windows version)
Download the offline TAGsmart program. TAGSmart.exe requires a different number of arguments depending on the algorithm you choose to run.
For the TAGsmart method, the arguments are "T", Qthreshold, FCthreshold, Number of permutations, inputfilename, outputfilename, Lower, and Correction, in that order. See the figure under section 3.3 for reference.
For example,

TAGSmart.exe T 1 1 100 input.txt output.txt true false
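For batch runs, the argument list for the TAGsmart method can be built as in the Python sketch below (hypothetical function name tagsmart_command). The mapping of Lower to the "<"/">" choice and of Correction to the "Generation-0 Correction" option is inferred from section 3.3 and is an assumption; the 1 to 1000 check on the number of permutations mirrors the online form.

```python
from typing import List


def tagsmart_command(q_threshold: float, fc_threshold: float, n_permutations: int,
                     input_file: str, output_file: str, lower: bool, correction: bool,
                     executable: str = "TAGSmart.exe") -> List[str]:
    """Arguments for the TAGsmart method ("T"), in the documented order."""
    if not 1 <= n_permutations <= 1000:  # same range as the online form
        raise ValueError("number of permutations must be between 1 and 1000")
    return [
        executable, "T",
        f"{q_threshold:g}", f"{fc_threshold:g}", str(n_permutations),
        input_file, output_file,
        "true" if lower else "false",       # presumably the "<" / ">" choice
        "true" if correction else "false",  # presumably Generation-0 Correction
    ]
```

For example, tagsmart_command(1, 1, 100, "input.txt", "output.txt", True, False) gives the same arguments as the example above; the list can be passed to subprocess.run.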

To see the visualized result, use the online TAGsmart and choose the "Result from this site or offline-TAGSmart" option, which gives you access to the full functionality of our server.

4. Interactive outputs

4.1 Interactive text output
A text output like the following will be given. Every mutant that satisfies the selection criteria specified in "Starting analysis" (3.3) is listed in the text output together with its related information. The mutants can be sorted by their Q-values, p-values, or FC (fold change).

If there are multiple pages, page numbers "1,2,3,..." will appear at the bottom of the page.

Left-click the disk icon in the upper left corner to get a plain-text version of the output. To save the plain-text output as a file on the local computer, right-click the disk icon and select "Save Target As".

Click the HEATMAP icon in the upper left corner to see a graphical display of the result.

4.2 Heatmap output
The heatmap output looks like the following. The first two colored lines indicate the treatment and the time of each microarray sample. The actual heatmap for the mutants starts from the 3rd colored line. Because each mutant usually has 4 probes on a Tag3 array (not counting mismatch probes), each mutant is represented by 4 rows in the heatmap.

Move the mouse cursor over any color-coded region to get detailed information about that color code. For example, in the picture below the cursor rests on the first row, where the treatments (environments) are color coded. The following picture shows that the first 37 microarrays measure the Treatment 0 group (control environment).

The following picture shows that arrays 45-47 measured mutants in generation 16.

The following picture shows that the APM2 mutant has a survival rate of -1.571812 on the 43rd microarray.