Manual for the TAGsmart software
Before using TAGsmart, you need to register by providing brief information: your name, e-mail address, and institute.

To confirm your e-mail address, we send a 7-character key to that address. You should receive the e-mail immediately, and you can then finish the registration with the key.

After registration, you can use TAGsmart. To enter the main pages, sign in with your registered e-mail address.

2. Preprocessing data (Windows version, Linux version, Mac version)
Before preprocessing, let's walk through all the files needed for the data analysis.
2.1.1 Original TAG array readout files:
Each Affymetrix TAG3 microarray measures the population of each mutant strain at a given generation under a given environment. A typical Affymetrix TAG3 microarray readout file is linked below:
1_I0_3~1.TXT (right-click the link and choose "Save Target As"). Part of this file looks like this:
| | | 1_I0_3_17_03B ( 570 nm ) | 1_I0_3_17_03B ( 570 nm ) | 1_I0_3_17_03B ( 570 nm ) | 1_I0_3_17_03B ( 570 nm ) | 1_I0_3_17_03B ( 570 nm ) |
| probe set | tag sequence | PM | MM | CPM | CMM | BG |
| tag_3_at | AAACAAACACCCGCGTGGTT | 10 | 373 | 5 | 4 | 30 |
| tag_4_at | AAAGATATAACCCTGTGCCC | 1 | 1 | 10 | 13 | 30 |
| tag_5_at | AAAGGAAGAACCGCGCCTCT | 10 | 1 | 8 | 5 | 30 |
| tag_6_at | AAAGGCGTAAACATGCGGCC | 19 | 11 | 33 | 20 | 30 |
| tag_7_at | AAATCAGCAAACGGGCTCCG | 29 | 123 | 4 | 8 | 30 |
| tag_8_at | AAATGTCTAAACCCGCAGCG | 19 | 14 | 14 | 18 | 30 |
| tag_9_at | AACAATGAAACGCTTCTCCG | 1 | 6 | 18 | 13 | 30 |
| tag_10_at | AACTCAATAAAGCGCCCTGG | 25 | 6 | 43 | 37 | 30 |
| tag_11_at | AACTCCGGCAAAGACACGGT | 47 | 60 | 1 | 30 | 30 |
| tag_12_at | AACTGACTAAACTAGGTGCC | 181 | 380 | 5 | 1 | 30 |
| tag_13_at | AAGAAGGGAAACTCGTTCGC | 4 | 8 | 1 | 1 | 30 |
| tag_14_at | AAGGGTGGAAACGTATATCC | 6 | 1 | 1 | 1 | 30 |
| tag_15_at | AAGTGGCCCAAATAACTGCC | 1 | 1 | 4 | 1 | 30 |
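You may want to inspect a readout file outside TAGsmart. In that case, it can be loaded into a table keyed by probe set. The Python sketch below makes two assumptions: the file is tab-delimited, and it has the seven columns shown in the excerpt (probe set, tag sequence, PM, MM, CPM, CMM, BG). `read_tag3_txt` and `ProbeSignals` are hypothetical names and are not part of TAGsmart.

```python
import csv
import io
from typing import Dict, NamedTuple, TextIO


class ProbeSignals(NamedTuple):
    """Signals for one tag probe set, in the column order of the excerpt."""
    sequence: str
    pm: float
    mm: float
    cpm: float
    cmm: float
    bg: float


def read_tag3_txt(handle: TextIO) -> Dict[str, ProbeSignals]:
    """Read a TAG3 readout table into {probe set: ProbeSignals}.

    Assumes tab-delimited columns: probe set, tag sequence, PM, MM, CPM,
    CMM, BG. Header rows are skipped because their first cell is not a
    probe-set ID such as 'tag_3_at'.
    """
    signals: Dict[str, ProbeSignals] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 7 or not row[0].strip().startswith("tag_"):
            continue  # header row or blank line
        name, seq, *values = (cell.strip() for cell in row[:7])
        signals[name] = ProbeSignals(seq, *(float(v) for v in values))
    return signals


# Usage with the two header rows and the first data row of the excerpt above.
sample = (
    "\t\t1_I0_3_17_03B ( 570 nm )\n"
    "\t\tPM\tMM\tCPM\tCMM\tBG\n"
    "tag_3_at\tAAACAAACACCCGCGTGGTT\t10\t373\t5\t4\t30\n"
)
probes = read_tag3_txt(io.StringIO(sample))
print(probes["tag_3_at"].mm)  # 373.0
```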
TAGsmart also supports the CEL format. A typical CEL file is linked below:
A-original.CEL (right-click the link and choose "Save Target As"). Part of this file looks like this:
[CEL]
Version=3
[HEADER]
Cols=266
Rows=266
TotalX=266
TotalY=266
OffsetX=0
OffsetY=0
GridCornerUL=73 119
GridCornerUR=2755 137
GridCornerLR=2739 2822
GridCornerLL=57 2805
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[0..5671] A 7-12-02 RKDA:CLS=2920 RWS=2920 XIN=3 YIN=3 VE=17 2.0 07/12/02 12:49:35 2109_MU.1sq 6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004
[INTENSITY]
NumberCells=70756
CellHeader=X Y MEAN STDV NPIXELS
0 0 2750.3 506.2 64
1 0 49.0 13.9 64
2 0 2668.5 565.9 64
3 0 47.8 19.5 64
4 0 35.3 10.4 64
5 0 2664.5 699.8 64
6 0 49.0 32.0 81
7 0 2499.3 292.0 64
8 0 47.3 15.6 64
9 0 2494.0 394.3 64
10 0 48.0 16.2 64
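A version-3 CEL file is plain text. It is organized in [SECTION] blocks of key=value lines, and the [INTENSITY] section holds one "X Y MEAN STDV NPIXELS" record per cell. The hypothetical `read_cel_v3` sketch below reads only this text layout; later CEL versions are binary and need a different reader. Mapping an (X, Y) cell to a tag probe also requires the TAG_3.CDF file described below, which this sketch does not handle.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple

Cell = Tuple[int, int, float, float, int]  # X, Y, MEAN, STDV, NPIXELS


def read_cel_v3(handle: TextIO) -> Tuple[Dict[str, Dict[str, str]], List[Cell]]:
    """Parse a plain-text (version 3) CEL file.

    Returns the key=value pairs of every [SECTION] and the per-cell
    records listed under [INTENSITY].
    """
    sections: Dict[str, Dict[str, str]] = {}
    cells: List[Cell] = []
    current: Optional[str] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, {})
        elif current is None:
            continue  # text before the first section header
        elif "=" in line:
            key, _, value = line.partition("=")  # split at the first '=' only
            sections[current][key] = value
        elif current == "INTENSITY":
            x, y, mean, stdv, npix = line.split()
            cells.append((int(x), int(y), float(mean), float(stdv), int(npix)))
    return sections, cells


sample = """[CEL]
Version=3

[HEADER]
Cols=266
Rows=266

[INTENSITY]
NumberCells=70756
CellHeader=X Y MEAN STDV NPIXELS
0 0 2750.3 506.2 64
1 0 49.0 13.9 64
"""
sections, cells = read_cel_v3(io.StringIO(sample))
print(sections["HEADER"]["Cols"], cells[1])  # 266 (1, 0, 49.0, 13.9, 64)
```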
2.1.2. Array description file:
This is a user-provided file. In it, the user documents the generation and the environment (treatment) under which each original TAG array was applied. A typical array description file is linked below:
file_information3.txt (right-click the link and choose "Save Target As"). Part of this file is shown below. The array description file is a tab-delimited file with 3 columns and one row per array.
- The first column gives the file name of each original TAG array readout file.
- The second column gives the time (generation) at which the array was applied. We recommend applying 2 or more arrays at each time point.
- The third column gives a code for the treatment (environment). For example, in the file below, 0 codes for the control group, and 1, 2 and 3 code for treatments with 3 different dosages of the same drug.

Fill in a number in every row and every column of this file. Arrays from the same time and the same treatment are treated as replicates in the subsequent analysis. For more information about how to construct the array description file, see 3.2 Experimental design.
| filename | generation | treatment |
| 1_I0_3~1.TXT | 0 | 0 |
| 2_E0_3~1.TXT | 0 | 0 |
| 8_J0_3~1.TXT | 0 | 0 |
| 10_F4D~1.TXT | 4 | 0 |
| 1_C4D_~1.TXT | 4 | 0 |
| 2_I4D_~1.TXT | 4 | 0 |
| 11_F4C~1.TXT | 4 | 1 |
| 4_E4C_~1.TXT | 4 | 1 |
| G4C_3_~1.TXT | 4 | 1 |
| 10_J4C~1.TXT | 4 | 2 |
| 3_I4C_~1.TXT | 4 | 2 |
| K4CIN_~1.TXT | 4 | 2 |
| C_T4CI~1.TXT | 4 | 3 |
| D_T4CI~1.TXT | 4 | 3 |
| 11_J8D~1.TXT | 8 | 0 |
| 12_F8D~1.TXT | 8 | 0 |
| 4_I8D_~1.TXT | 8 | 0 |
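Before uploading, you can check that the array description file parses and that each (generation, treatment) group has replicates. The sketch below assumes a tab-delimited file whose first row is the header shown in the table. The function name is hypothetical. The check for a treatment-0 group follows the rule that treatment 0 is regarded as the control.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def read_array_description(handle: TextIO) -> Dict[Tuple[int, int], List[str]]:
    """Group readout files by (generation, treatment); each group holds replicates.

    Assumes a tab-delimited file whose first row is the header
    'filename<TAB>generation<TAB>treatment', as in the example above.
    """
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for line_no, row in enumerate(csv.DictReader(handle, delimiter="\t"), start=2):
        try:
            key = (int(row["generation"]), int(row["treatment"]))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: generation and treatment must be numbers") from exc
        groups[key].append(row["filename"].strip())
    if not any(treatment == 0 for _, treatment in groups):
        raise ValueError("no control arrays (treatment code 0) found")
    return dict(groups)


sample = (
    "filename\tgeneration\ttreatment\n"
    "1_I0_3~1.TXT\t0\t0\n"
    "2_E0_3~1.TXT\t0\t0\n"
    "11_F4C~1.TXT\t4\t1\n"
)
groups = read_array_description(io.StringIO(sample))
for (generation, treatment), files in sorted(groups.items()):
    print(f"generation {generation}, treatment {treatment}: {len(files)} array(s)")
# generation 0, treatment 0: 2 array(s)
# generation 4, treatment 1: 1 array(s)
```

A group that reports a single array has no replicate, which weakens the later statistics.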
2.1.3. Tag description file:
This file is provided by the array manufacturer. It includes the annotation information for every probe (corresponding to a tag) on the array. The Tag description files for the Affymetrix Tag3 and Agilent arrays are linked below:
TAG3_HET.txt and AGILENT.txt (right-click the link and choose "Save Target As"). The same Tag description file serves both heterozygous and homozygous deletion mutants. This file must be stored in the same directory as the preprocessing program (see below).
2.1.4. CDF file:
The CDF (Chip Description File) for a particular array type can be obtained from the Affymetrix library files website or from the Affymetrix GCOS software. It includes the annotation information for every coordinate on the array. The CDF file for the Affymetrix Tag3 array is linked below:
TAG_3.CDF (right-click the link and choose "Save Target As"). This file must be stored in the same directory as the preprocessing program (see below).
2.1.5. HETERO/HOMO difference set file:
The homozygous deletion mutants are a subset of the heterozygous deletion mutants, because deleting both copies of an essential ORF is lethal.
To distinguish homozygous from heterozygous mutants, we use the difference set, i.e. the ORFs that are in the heterozygous set but not in the homozygous set. It is available from the following URL:
http://www-sequence.stanford.edu/group/yeast_deletion_project/Essential_ORFs.txt
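The difference set is the list of ORFs that appear only in the heterozygous collection, so separating the two collections is a set difference. The sketch below does not assume a particular column layout for Essential_ORFs.txt. It only assumes that the file contains standard systematic yeast ORF names, which it extracts with a regular expression. The function names and the placeholder ORFs in the usage lines are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Systematic yeast ORF names such as YKL134C or YMR056C, with an optional
# suffix such as "-A".
ORF_PATTERN = re.compile(r"\bY[A-P][LR]\d{3}[WC](?:-[A-Z])?\b")


def orfs_in_text(text: str) -> Set[str]:
    """Collect every systematic ORF name that appears anywhere in `text`."""
    return set(ORF_PATTERN.findall(text))


def split_heterozygous(heterozygous: Iterable[str],
                       difference_set: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split heterozygous ORFs into (also homozygous, heterozygous only)."""
    het = set(heterozygous)
    return het - difference_set, het & difference_set


print(sorted(orfs_in_text("YKL134C\tOct1\nYMR056C\tAAC1\n")))  # ['YKL134C', 'YMR056C']
# Placeholder names only; these are not real ORFs or real list contents.
both, het_only = split_heterozygous({"orfA", "orfB", "orfC"}, {"orfB"})
print(sorted(both), sorted(het_only))  # ['orfA', 'orfC'] ['orfB']
```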
2.1.6. Tag quality file (markedgenes.txt):
Based on our experimental results, we summarized the quality of each tag on the Tag3 array with the following method:
Method:
1. Normalize the data with the 98% trimmed mean.
2. Take the log of each entry in the matrix.
3. Let u be the mean of the reference data, stddev the standard deviation of the reference data, and x a treated data point. Calculate $\mathrm{Pho} = (u - \log x)/\mathrm{stddev}$. This Pho is the entry in the Pho column of markedgenes_new.txt.
4. Mark the tags with the following rules:
1) If a mutant has only one tag, mark it +.
2) If a mutant has two good tags, mark both +.
3) If a mutant has one good and one bad tag, mark the good one + and the bad one -.
4) If a mutant has two bad tags, mark the relatively better one + and the other -.
The markedgenes.txt file must be stored in the same directory as the preprocessing program (see below).
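The tag-quality method can be sketched in Python as follows. Several details are not specified above and are assumptions in this sketch:
- the 98% trimmed mean cuts 1% from each tail;
- normalization rescales an array to a trimmed mean of 1;
- the logarithm is natural;
- stddev is the sample standard deviation;
- the caller supplies each tag's good/bad call and the quality score used to rank two bad tags.

`pho` applies the Pho formula literally. All function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def trimmed_mean(values: Sequence[float], keep: float = 0.98) -> float:
    """Mean of the central `keep` fraction; assumes (1 - keep) / 2 is cut from each tail."""
    data = sorted(values)
    cut = int(len(data) * (1.0 - keep) / 2.0)
    core = data[cut:len(data) - cut]
    return sum(core) / len(core)


def normalize(array: Sequence[float], keep: float = 0.98) -> List[float]:
    """Step 1 (assumed form): rescale one array so its trimmed mean becomes 1."""
    tm = trimmed_mean(array, keep)
    return [v / tm for v in array]


def pho(reference: Sequence[float], x: float) -> float:
    """Step 3: Pho = (u - log(x)) / stddev, with u and stddev from the reference data."""
    u = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation (assumed)
    return (u - math.log(x)) / sd


def mark_tags(tags: List[Tuple[str, bool, float]]) -> Dict[str, str]:
    """Step 4. Each tag is (name, is_good, quality), where higher quality is better.
    How a tag is judged good or bad, and the quality score, are assumptions."""
    if len(tags) == 1:                       # rule 1: only one tag
        return {tags[0][0]: "+"}
    (n1, good1, q1), (n2, good2, q2) = tags
    if good1 and good2:                      # rule 2: two good tags
        return {n1: "+", n2: "+"}
    if good1 != good2:                       # rule 3: one good, one bad
        return {n1: "+" if good1 else "-", n2: "+" if good2 else "-"}
    first_better = q1 >= q2                  # rule 4: two bad tags
    return {n1: "+" if first_better else "-", n2: "-" if first_better else "+"}


print(mark_tags([("uptag", False, 0.2), ("downtag", False, 0.5)]))
# {'uptag': '-', 'downtag': '+'}
```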
The preprocessing program is currently available for Windows, Linux and Mac.
(a) Windows (Download)
With these files at hand, we can proceed to the preprocessing program. Download the zipped file PreTag3.zip, which contains both the main preprocessor program and the description files.
When you run this program, the following window appears. Use the first Select button to choose the Array Description File (see 2.1.2), which should be in the same directory as all the original array readout data files.

Next, specify the file format: Affymetrix TXT, Affymetrix CEL, or Agilent TXT. To filter out bad tags from the input files, check the 'Bad Tag Filtering' check box.


After making these settings, press the Start button to start preprocessing. The progress bar shows the progress, and a dialog appears when preprocessing is finished.

(b) Linux (Download) and Mac (Download)
With these files at hand, we can proceed to the preprocessing program. Download PreTag3.tar.gz, which contains both the main preprocessor program and the description files. Put all the original array readout data files in the same directory as well. PreTag3.exe requires five input parameters, in this order:
1. the file path of the Array Description File (see 2.1.2);
2. an output file path;
3. a file format (-txt, -cel, or -agi);
4. a bad-tag filtering option (-a or -f);
5. a heterozygous/homozygous option (-he or -ho).

For example,
PreTag3.exe fileinformation.txt TagAll.txt -cel -f -he
-txt : Affymetrix TXT format
-cel : Affymetrix CEL format
-agi : Agilent format
-a : no bad-tag filtering
-f : bad-tag filtering
-he : heterozygous
-ho : homozygous
The program lists the file names as it processes them, which indicates its progress.
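When many experiments are preprocessed in batch, the command line can be assembled programmatically. The hypothetical helper below only arranges the five documented parameters in the order of the example above. It uses no flags beyond those listed. You could pass the resulting list to Python's `subprocess.run` from the directory that holds the data files.

```python
from typing import List

FORMAT_FLAGS = {"txt": "-txt", "cel": "-cel", "agilent": "-agi"}


def pretag3_args(description_file: str, output_file: str, fmt: str,
                 filter_bad_tags: bool, heterozygous: bool,
                 exe: str = "PreTag3.exe") -> List[str]:
    """Assemble the five PreTag3 parameters in the documented order."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"fmt must be one of {sorted(FORMAT_FLAGS)}")
    return [
        exe,
        description_file,
        output_file,
        FORMAT_FLAGS[fmt],
        "-f" if filter_bad_tags else "-a",   # bad-tag filtering on / off
        "-he" if heterozygous else "-ho",    # heterozygous / homozygous
    ]


args = pretag3_args("fileinformation.txt", "TagAll.txt", "cel",
                    filter_bad_tags=True, heterozygous=True)
print(" ".join(args))  # PreTag3.exe fileinformation.txt TagAll.txt -cel -f -he
# e.g. subprocess.run(args, check=True), run from the data directory (not executed here)
```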

2.3 Output of the preprocess program
The output file of the preprocessor looks like the following:
| orf | gene | tag | tagtype | PM/CPM | R0_3_2~1.TXT | S0_3_31_03.TXT | 10_F4D~1.TXT | 1_C4D_~1.TXT | 11_J8D~1.TXT | 12_F8D~1.TXT | S8DMSO_3_31_03.TXT | 11_F4C~1.TXT | 4_E4C_~1.TXT | G4C_3_~1.TXT | 13_F8C~1.TXT | 6_E8C_~1.TXT | G8C_3_~1.TXT |
| Treatment | | | | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Generation | | | | | 0 | 0 | 4 | 4 | 8 | 8 | 8 | 4 | 4 | 4 | 8 | 8 | 8 |
| YKL134C | Oct1 | tag_9600_at | uptag | PM | 40 | 9 | 34 | 32 | 68 | 37 | 6 | 46 | 29 | 31 | 53 | 45 | 23 |
| YKL134C | Oct1 | tag_9600_at | uptag | CPM | 23 | 24 | 23 | 23 | 46 | 26 | 54 | 30 | 27 | 41 | 27 | 30 | 38 |
| YKL134C | Oct1 | tag_9696_at | downtag | PM | 300 | 301 | 261 | 201 | 321 | 219 | 297 | 286 | 273 | 271 | 288 | 282 | 316 |
| YKL134C | Oct1 | tag_9696_at | downtag | CPM | 39 | 74 | 70 | 43 | 65 | 45 | 87 | 88 | 59 | 94 | 76 | 69 | 98 |
| YMR056C | AAC1 | tag_11910_at | uptag | PM | 60 | 81 | 63 | 57 | 95 | 64 | 62 | 66 | 41 | 60 | 23 | 17 | 11 |
| YMR056C | AAC1 | tag_11910_at | uptag | CPM | 188 | 278 | 211 | 185 | 357 | 207 | 293 | 212 | 151 | 246 | 90 | 81 | 62 |
| YMR056C | AAC1 | tag_144_at | downtag | PM | 127 | 41 | 97 | 78 | 117 | 116 | 47 | 86 | 66 | 56 | 20 | 19 | 18 |
| YMR056C | AAC1 | tag_144_at | downtag | CPM | 69 | 104 | 62 | 49 | 98 | 69 | 239 | 56 | 49 | 70 | 12 | 12 | 29 |
| YBR085W | AAC3 | tag_6111_at | uptag | PM | 124 | 75 | 131 | 124 | 153 | 147 | 67 | 162 | 131 | 127 | 145 | 144 | 107 |
| YBR085W | AAC3 | tag_6111_at | uptag | CPM | 76 | 120 | 80 | 66 | 128 | 86 | 256 | 114 | 75 | 130 | 92 | 105 | 127 |
| YBR085W | AAC3 | tag_6207_at | downtag | PM | 31 | 20 | 16 | 15 | 28 | 12 | 19 | 13 | 14 | 16 | 13 | 16 | 14 |
| YBR085W | AAC3 | tag_6207_at | downtag | CPM | 18 | 11 | 11 | 10 | 17 | 10 | 10 | 12 | 9 | 9 | 10 | 7 | 10 |
| YNL331C | AAD14 | tag_1309_at | uptag | PM | 7 | 3 | 4 | 5 | 5 | 3 | 7 | 7 | 3 | 3 | 1 | 4 | 1 |
| YNL331C | AAD14 | tag_1309_at | uptag | CPM | 4 | 4 | 5 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 9 | 5 |
| YNL331C | AAD14 | tag_1405_at | downtag | PM | 4 | 6 | 5 | 3 | 5 | 5 | 2 | 6 | 2 | 3 | 5 | 8 | 8 |
| YNL331C | AAD14 | tag_1405_at | downtag | CPM | 4 | 4 | 4 | 2 | 4 | 3 | 3 | 3 | 1 | 3 | 4 | 1 | 1 |
| YCR107W | AAD3 | tag_12451_at | uptag | PM | 99 | 114 | 50 | 58 | 92 | 59 | 109 | 65 | 52 | 86 | 69 | 60 | 51 |
| YCR107W | AAD3 | tag_12451_at | uptag | CPM | 112 | 285 | 58 | 59 | 114 | 59 | 372 | 73 | 63 | 142 | 65 | 58 | 100 |
| YCR107W | AAD3 | tag_12475_at | downtag | PM | 49 | 51 | 19 | 16 | 48 | 15 | 32 | 18 | 18 | 17 | 21 | 19 | 11 |
| YCR107W | AAD3 | tag_12475_at | downtag | CPM | 37 | 49 | 14 | 18 | 34 | 12 | 36 | 16 | 17 | 14 | 15 | 10 | 11 |
The output is a tab-delimited file, and the column names are self-explanatory. From the 6th column on, the column names are the names of the original array data files. The second row lists the treatment (environment) code of each array; 0, 1, etc. are codes for different environments (treatments). The third row lists the time (cell generation) codes.

By design, every mutant carries two tags, called the uptag and the downtag. Every tag is measured by two perfect-match probes that are complementary to each other, termed the PM and CPM probes. Altogether, 4 probes measure the fitness of a mutant on a Tag array, and their data are compiled into 4 consecutive rows in the output file.
A truncated output file is linked here: processed-1000.txt (right-click the link and choose "Save Target As"). It was truncated from the preprocessor output so that it contains only 1000 mutants.
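For downstream scripting, the output can be read back into per-mutant records. The sketch below assumes the file is tab-delimited as described above. The printed example does not show exactly which cells hold the Treatment and Generation codes, so the sketch reads the non-empty cells after the row label. Names are hypothetical, and the sketch does not require all 4 probe rows to be present for a mutant. The sample values are the YKL134C rows for arrays R0_3_2~1.TXT and 11_F4C~1.TXT from the example.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

ProbeKey = Tuple[str, str]  # (tagtype, probe), e.g. ("uptag", "PM")


def _codes(row: List[str], n_arrays: int) -> List[int]:
    """Read the Treatment or Generation row, ignoring empty cells."""
    codes = [int(cell) for cell in row[1:] if cell.strip()]
    if len(codes) != n_arrays:
        raise ValueError(f"{row[0]} row has {len(codes)} codes for {n_arrays} arrays")
    return codes


def read_preprocessed(handle: TextIO):
    """Parse a tab-delimited preprocessor output file.

    Returns (array names, treatment codes, generation codes, mutants), where
    mutants maps each ORF to {(tagtype, probe): one signal per array}.
    """
    rows = csv.reader(handle, delimiter="\t")
    arrays = [name for name in next(rows)[5:] if name.strip()]
    treatment = _codes(next(rows), len(arrays))
    generation = _codes(next(rows), len(arrays))
    mutants: Dict[str, Dict[ProbeKey, List[float]]] = {}
    for row in rows:
        if len(row) < 5 + len(arrays):
            continue  # blank or incomplete line
        orf, _gene, _tag, tagtype, probe = row[:5]
        values = [float(v) for v in row[5:5 + len(arrays)]]
        mutants.setdefault(orf, {})[(tagtype, probe)] = values
    return arrays, treatment, generation, mutants


lines = [
    "orf\tgene\ttag\ttagtype\tPM/CPM\tR0_3_2~1.TXT\t11_F4C~1.TXT",
    "Treatment\t\t\t\t\t0\t1",
    "Generation\t\t\t\t\t0\t4",
    "YKL134C\tOct1\ttag_9600_at\tuptag\tPM\t40\t46",
    "YKL134C\tOct1\ttag_9600_at\tuptag\tCPM\t23\t30",
    "YKL134C\tOct1\ttag_9696_at\tdowntag\tPM\t300\t286",
    "YKL134C\tOct1\ttag_9696_at\tdowntag\tCPM\t39\t88",
]
arrays, treatment, generation, mutants = read_preprocessed(io.StringIO("\n".join(lines)))
print(treatment, generation, mutants["YKL134C"][("downtag", "PM")])
# [0, 1] [0, 4] [300.0, 286.0]
```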
The figure below may help you understand the array design in more depth.

The figure above shows a generic gene-deletion cassette module. The biotin-labelled, deletion-specific primers (B-U1, B-U2-comp, B-D1 and B-D2-comp) are used to amplify the unique UPTAG and DNTAG sequences from genomic preparations generated in the fitness-profiling studies. B-U1 can hybridize to the uptag-PM and uptag-MM probes on the tag array, B-U2 can hybridize to the uptag-cPM and uptag-cMM probes, and so on. The figure is from Giaever et al. (Nature 2002; redistribution permission granted by Nature Publishing Group).
The online TAGsmart is optimized for Internet Explorer 6.0 and Firefox 1.5. If you have trouble browsing TAGsmart, please try IE or Firefox.
3.1 General software description
The TAGsmart software searches for mutants whose survival rates differ strongly between treated and control environments. In other words, it searches for mutants that interact specifically with a certain environment (treatment). For example, the following graph shows that the MPS1 mutant has a different survival rate in the drug-treated environment compared with the control environment. The X axis is the generation (time), and the Y axis is the array signal (positively related to the population size of this mutant).

3.2 Experimental design
The simplest experimental design would be comparing one treated pool of
mutants to one control pool at a single time point. More complex designs
may detect the mutant-environment interaction with higher sensitivity.
For example, the mutant population may be measured at multiple time
points under both the treated environment and the control environment.
Moreover, the treatment can be applied at different dosages (Dorer
et al, Current Biology 2006). The TAGsmart software supports all of these
simple and complex experimental designs. The experimental design is
passed to TAGsmart in the array description file (see 2.1.2). To write
an array description file, simply enter the actual time and treatment
as numbers for every original data file. Treatment "0" is regarded
as the control treatment. It is usually good to have replicates
for every time point and every treatment (environment) group.
We provide not only our algorithm but also a heatmap of a previous result.

3.3 Starting analysis
Choose the "Analyze a preprocessed data file" radio button and click the "Next>>" button on the home page. The following screen will appear.

Use the "Browse" button to select the input data file. This should be the output file of the preprocessing program. The "Sample input file" link gives an example input file, which is explained in "Output of the preprocess program" (2.3). TAGsmart lets you select mutants that show different fitness (survival rates) across environments by Q-value (false discovery rate), by fold change (FC), or by both metrics.
In the "Q-value threshold (%)" text box, enter a number between 0.01 and 100. The smaller the Q-value, the stronger the statistical evidence that a mutant has different survival rates. Setting the Q-value threshold (%) to 100 effectively disables the Q-value filter, because every mutant has a Q-value of at most 100%.
In the "FC Threshold" text box, enter a positive number and choose "<" or ">" from the drop-down list accordingly.
- Threshold between 0 and 1 with "<" chosen: the output keeps the mutants that have inferior fitness in the treated environment (treatment code ≠ 0) compared with the control environment (treatment code = 0). The MPS1 mutant in 3.1 General software description is such a mutant. The smaller the FC threshold, the more stringent the selection.
- Threshold above 1 with ">" chosen: the output keeps the mutants with superior fitness in the treated environment. The larger the FC threshold, the more stringent the selection.

By definition, FC = average(fitness in treated environment) / average(fitness in control environment).
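The FC definition and the threshold choice can be sketched as below. This manual does not specify what TAGsmart uses as the per-array fitness. The example therefore uses raw array signals as a stand-in, pools all nonzero treatment codes into one treated group, and ignores generations. The numbers are the AAC1 uptag PM row from the output example in 2.3. Function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change(fitness: Sequence[float], treatment: Sequence[int]) -> float:
    """FC = average(fitness in treated) / average(fitness in control).

    One fitness value per array; treatment code 0 is control, any other
    code counts as treated (all treated groups are pooled here).
    """
    control = [f for f, t in zip(fitness, treatment) if t == 0]
    treated = [f for f, t in zip(fitness, treatment) if t != 0]
    return mean(treated) / mean(control)


def passes_fc(fc: float, threshold: float, direction: str) -> bool:
    """Apply the "FC Threshold" box together with the "<" / ">" choice."""
    if threshold <= 0:
        raise ValueError("FC threshold must be positive")
    if direction == "<":
        return fc < threshold
    if direction == ">":
        return fc > threshold
    raise ValueError('direction must be "<" or ">"')


# AAC1 uptag PM signals and the Treatment row from the output example,
# using the raw array signal as a stand-in for fitness.
signals = [60, 81, 63, 57, 95, 64, 62, 66, 41, 60, 23, 17, 11]
codes = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
fc = fold_change(signals, codes)
print(round(fc, 3), passes_fc(fc, 0.6, "<"))  # 0.528 True
```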
In the "# of permutations" text box, enter an integer between 1 and 1000. This internal parameter is used to compute the Q-value. The larger it is, the more accurate the computation, but the longer it takes. The default of 500 is a good balance between accuracy and speed.
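This manual does not spell out TAGsmart's own Q-value procedure. To show what the "# of permutations" parameter trades off, the sketch below swaps in a generic, well-known approach: per-mutant permutation p-values (shuffling the treatment labels), followed by Benjamini-Hochberg adjustment. This is not TAGsmart's algorithm. Using the absolute mean difference as the test statistic is also an assumption. With this formula, each extra permutation recomputes the statistic once more per mutant, and the smallest attainable p-value is 1/(n_perm + 1).

```python
import random
from statistics import mean
from typing import List, Sequence


def mean_difference(signals: Sequence[float], labels: Sequence[int]) -> float:
    """|mean(treated) - mean(control)|, with treatment code 0 as control."""
    control = [s for s, t in zip(signals, labels) if t == 0]
    treated = [s for s, t in zip(signals, labels) if t != 0]
    return abs(mean(treated) - mean(control))


def permutation_pvalue(signals: Sequence[float], labels: Sequence[int],
                       n_perm: int = 500, seed: int = 0) -> float:
    """Shuffle the treatment labels n_perm times and count how often the
    shuffled statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(signals, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(signals, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


p = permutation_pvalue([1, 1, 1, 10, 10, 10], [0, 0, 0, 1, 1, 1], n_perm=200, seed=1)
print(p, bh_qvalues([0.01, 0.04, 0.03]))
```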
Check the "Generation-0 Correction" option if, at time (generation) 0, there is theoretically no difference between the treated and control environments. The generation-0 data will then be used as the common measurement of the starting population for both groups.
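One way to read this option is that each generation-0 array is reused as the starting point of every treatment group, so the control and treated time courses share the same origin. The hypothetical sketch below shows only that bookkeeping. TAGsmart's internal handling is not documented here.

```python
from typing import List, Sequence, Tuple


def generation0_correction(treatment: Sequence[int], generation: Sequence[int]
                           ) -> List[Tuple[int, int, int]]:
    """Return (array index, generation, treatment) records in which each
    generation-0 array is shared by every treatment group."""
    groups = sorted(set(treatment))
    records: List[Tuple[int, int, int]] = []
    for index, (trt, gen) in enumerate(zip(treatment, generation)):
        if gen == 0:
            records.extend((index, 0, group) for group in groups)
        else:
            records.append((index, gen, trt))
    return records


# Two control arrays at generation 0, then one control and two treated arrays at generation 4.
for record in generation0_correction([0, 0, 0, 1, 1], [0, 0, 4, 4, 4]):
    print(record)
```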
Click the "Submit" button after making your analysis choices.
3.4 Offline TAGsmart software (Windows version)
Download the offline TAGsmart program. TAGSmart.exe requires a different number of arguments depending on the algorithm you choose to run.
For the TAGsmart method, the arguments are, in order: "T", Qthreshold, FCthreshold, number of permutations, inputfilename, outputfilename, Lower, and Correction. See the figure under section 3.3 as a reference.
For example,
TAGSmart.exe T 1 1 100 input.txt output.txt true false
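The positional order is easy to get wrong, so a small hypothetical helper can build the argument list. Two mappings here are assumptions: `lower` stands for the "<" choice of section 3.3 and `correction` for the Generation-0 Correction option. The range checks are borrowed from the online form described in 3.3 and may not match the offline program exactly.

```python
from typing import List


def tagsmart_args(q_threshold: float, fc_threshold: float, permutations: int,
                  input_file: str, output_file: str, lower: bool,
                  correction: bool, exe: str = "TAGSmart.exe") -> List[str]:
    """Arguments for the "T" (TAGsmart) method, in the documented order."""
    if not 0.01 <= q_threshold <= 100:
        raise ValueError("Q threshold (%) must be between 0.01 and 100")
    if fc_threshold <= 0:
        raise ValueError("FC threshold must be positive")
    if not 1 <= permutations <= 1000:
        raise ValueError("permutations must be between 1 and 1000")
    return [exe, "T", f"{q_threshold:g}", f"{fc_threshold:g}", str(permutations),
            input_file, output_file, str(lower).lower(), str(correction).lower()]


print(" ".join(tagsmart_args(1, 1, 100, "input.txt", "output.txt",
                             lower=True, correction=False)))
# TAGSmart.exe T 1 1 100 input.txt output.txt true false
```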
To see the visualized result, use our online TAGsmart and choose the "Result from this site or offline-TAGSmart" option, which lets you use the full functionality of our server.
4.1 Interactive text output
A text output like the following will be given. It lists every mutant that satisfies the selection criteria specified in "Starting analysis" (3.3), together with its related information. The mutants can be sorted by Q-value, p-value, or FC (fold change).

If there are multiple pages, page numbers "1,2,3,..." will appear at the bottom of the page.
Left-click the disk icon in the upper-left corner to get a plain-text version of the output. To save this plain-text output as a file on your local computer, right-click the disk icon and select "Save Target As".
Click the HEATMAP icon in the upper-left corner to see a graphical display of the result.
4.2 Heatmap output
The heatmap output looks like the following. The first two colored lines indicate the treatment and the time of each microarray sample. The actual heatmap for the mutants starts from the 3rd colored line. A Tag3 array usually has 4 probes for each mutant (not counting the mismatch probes), so each mutant is represented as 4 rows in the heatmap.

Move the mouse cursor over any color-coded region to get detailed information about that color code. For example, below the cursor rests on the first row, where the treatments (environments) are color coded. The following picture shows that the first 37 microarrays measure the Treatment 0 group (control environment).

The following picture shows that arrays 45-47 measured mutants in generation 16.

The following picture shows that the APM2 mutant has a survival rate of -1.571812 on the 43rd microarray.
