OSCAR

1. General information about OSCAR

2.  Format of input data files

a)        Microarray data file

b)        Ortholog mapping file

c)        Cluster file

3. Use the existing clustering algorithms in OSCAR

a)        Choose algorithm

b)        Set parameters and input data files

c)        Data output

d)        Heatmap visualization

4.  Clustering executable specification

a)        Input

b)        Output

5. Add a new clustering executable to OSCAR

a)        Step 1

b)        Step 2

c)        Success   

 

 

 

 

1.        General information about OSCAR

An enormous number of clustering algorithms for microarray data analysis have been proposed, and new algorithms continue to emerge at a fast rate, yet no single software tool incorporates the majority of them so that users can apply them side by side. OSCAR is an open web platform for cluster analysis of microarray data and for the development of clustering-related analytical tools. It provides a comprehensive yet user-friendly environment for both users and algorithm developers. For users, OSCAR provides 1) a palette of clustering tools for analysis of single-species data, including Hierarchical Clustering, K-means Clustering, Self-Organizing Map, Tight Clustering, and a new algorithm called Parallel Tight-Clustering; 2) a novel tool for analysis of two-species data; and 3) visualization and interactive analysis capabilities. For algorithm developers, OSCAR is a plug-and-play platform. Developers can plug their own algorithms into the OSCAR server, which equips their command-line executables with a user-friendly interface and interactive analysis capabilities and, most importantly, allows their algorithms to run on a dedicated server and serve users through a web interface.

   

2.        Format of input data files

Three types of input data files are typically used in clustering analysis of gene expression data. This section describes the format of each file as accepted by OSCAR.

a)        Microarray data file   

A microarray data file is a tab-delimited file consisting of a column of gene IDs (optionally followed by a column of gene names), followed by columns of experiment data. The first line indicates whether the file contains gene names: 1 means only gene IDs are provided; 2 means both gene IDs and gene names are provided. The second line gives the name of each column, and the column names must all be distinct. An example microarray data file can be found here, and part of the file looks like this:

Missing values in the microarray data are supported and should be coded as "NA".
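The following Python sketch shows one way to read a file in this format, for example to check it locally before uploading. It assumes that the second line names every column, including the gene ID (and gene name) columns, and that missing values are written exactly as "NA". The function name read_microarray is hypothetical and is not part of OSCAR.

```python
import math

def read_microarray(text):
    """Parse OSCAR microarray data text into (experiment names, gene rows)."""
    lines = text.splitlines()
    n_id_cols = int(lines[0].strip())  # 1 = gene IDs only, 2 = gene IDs and names
    if n_id_cols not in (1, 2):
        raise ValueError("first line must be 1 or 2")
    header = lines[1].split("\t")
    if len(set(header)) != len(header):
        raise ValueError("column names must be distinct")
    experiments = header[n_id_cols:]  # assumes line 2 also names the ID columns
    genes = []
    for line in lines[2:]:
        if not line.strip():
            continue  # tolerate trailing blank lines
        fields = line.split("\t")
        ids = fields[:n_id_cols]
        # "NA" marks a missing value; it is stored as NaN
        values = [math.nan if v.strip() == "NA" else float(v) for v in fields[n_id_cols:]]
        if len(values) != len(experiments):
            raise ValueError(f"row {ids[0]!r} has {len(values)} values, expected {len(experiments)}")
        genes.append((ids, values))
    return experiments, genes
```

Each gene row comes back as its ID (and name) fields plus a list of floats, with NaN for each "NA".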

b)        Ortholog mapping file

An ortholog mapping file is a tab-delimited file consisting of two columns: the first column is a gene ID from one species, and the second column is the corresponding gene ID from the other species. An example ortholog mapping file can be found here, and part of the file looks like this:

OSCAR also provides several ready-made ortholog mapping files:

"Human U133 plus 2" to "Mouse MOE430 A": ortholog-U133Plus2-MOE430A.txt

"Human U133 plus 2" to "Mouse MOE430 B": ortholog-U133Plus2-MOE430B.txt

"Human U133 A and B" to "Mouse MOE430 A": ortholog-U133AB-MOE430A.txt

"Human U133 A and B" to "Mouse MOE430 B": ortholog-U133AB-MOE430B.txt

"Mouse MOE430 A and B" to "Human U133 plus 2": ortholog-MOE430AB-U133Plus2.txt

"Mouse MOE430 A and B" to "Human U133 A": ortholog-MOE430AB-U133A.txt

"Mouse MOE430 A and B" to "Human U133 B": ortholog-MOE430AB-U133B.txt
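A minimal Python sketch for loading an ortholog mapping file into a lookup table follows. It assumes that the file has no header line and that one gene ID may map to several IDs in the other species. The helper read_ortholog_map is hypothetical, not an OSCAR function.

```python
def read_ortholog_map(text):
    """Map each gene ID in column 1 to the list of its column-2 orthologs."""
    mapping = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.strip().split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated columns, got {len(fields)}")
        source_id, target_id = fields
        # setdefault keeps every pairing when one ID maps to several orthologs
        mapping.setdefault(source_id, []).append(target_id)
    return mapping
```

For example, read_ortholog_map(open("ortholog-U133Plus2-MOE430A.txt").read()) would load one of the files listed above, provided the file follows the two-column layout.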

c)        Cluster file

The cluster file is the output of a clustering run. See 3.c) Data output, and see 4.b) Output for the format of the file.

 

3.        Use the existing clustering algorithms in OSCAR

a)        Choose algorithm

A list of the available clustering algorithms is displayed on the home page. Click "Select" to choose the algorithm you would like to use. Each algorithm has a corresponding HTML description page, which you can open by clicking the hyperlink under "Description". On the description page, the author of the algorithm gives a general introduction to the algorithm and/or the format of the input files it accepts.

b)        Set parameters and input data files

After you select an algorithm, you will be directed to the parameter setting page. Here you can set the input parameters of the algorithm within the ranges specified by the algorithm designer. A default value is provided for each parameter for convenience. A sample file is also provided for each input file of the algorithm, and you can view it by clicking its hyperlink. To run the algorithm on the sample files, click "Submit using Sample Files". Alternatively, upload the data files you would like to process and click "Submit using my files". (Files larger than 2 MB are not accepted, so you may need to filter your data first to bring each file under 2 MB.) The computation time depends on the algorithm you choose and on the amount of data, so please be patient.
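If a file exceeds the upload limit, one common way to filter it is to keep only the most variable genes. OSCAR does not prescribe this method. The sketch below assumes the microarray format from 2.a) and reads "2 MBytes" as 2 * 1024 * 1024 bytes. Both function names are hypothetical.

```python
import statistics

# Assumption: the limit is 2 * 1024 * 1024 bytes; adjust if the server uses decimal megabytes.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024

def row_variance(row, n_id_cols):
    """Population variance of a row's numeric values, ignoring "NA"."""
    values = [float(v) for v in row.split("\t")[n_id_cols:] if v.strip() != "NA"]
    return statistics.pvariance(values) if len(values) >= 2 else 0.0

def shrink_to_limit(text, limit=MAX_UPLOAD_BYTES):
    """Keep the most variable genes of a microarray file so that it fits in `limit` bytes."""
    lines = text.splitlines()
    n_id_cols = int(lines[0].strip())  # 1 = IDs only, 2 = IDs and names
    header = lines[:2]
    rows = [r for r in lines[2:] if r.strip()]
    size = sum(len((h + "\n").encode("utf-8")) for h in header)
    # Visit rows from most to least variable, stopping at the first row that would exceed the limit.
    order = sorted(range(len(rows)), key=lambda i: row_variance(rows[i], n_id_cols), reverse=True)
    kept = set()
    for i in order:
        row_bytes = len((rows[i] + "\n").encode("utf-8"))
        if size + row_bytes > limit:
            break
        kept.add(i)
        size += row_bytes
    # Write the surviving rows back in their original order.
    return "\n".join(header + [rows[i] for i in sorted(kept)]) + "\n"
```

Variance filtering is only one option. Filtering by mean expression or by fold change would work with the same byte-budget loop.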

c)        Data output

When the computation is finished, the following page displays the output file of clusters. Left-click the disk icon in the upper right corner to view a plain-text version of the output. To save the plain-text output as a file on your local computer, right-click the disk icon and select "Save target as". Click the HEATMAP icon in the upper right corner to see a graphical display of the result.
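After you save the plain-text output, you can process it locally. The sketch below reads it according to the layout specified in 4.b) Output: a first line of tab-separated column counts, a second line of names, data rows, "NONE" lines between clusters, and "pseudo-probe" lines to skip. It assumes that the optional counts appear without literal parentheses and that any line whose first field is "NONE" is a separator. The name read_clusters is hypothetical.

```python
import math

def read_clusters(text):
    """Split OSCAR cluster output into (names, clusters)."""
    lines = text.splitlines()
    counts = [int(x) for x in lines[0].split("\t") if x.strip()]
    n_id_cols, width = counts[0], counts[0] + sum(counts[1:])
    names = lines[1].split("\t")
    clusters, current = [], []
    for line in lines[2:]:
        fields = line.split("\t")
        if fields[0].strip() == "NONE":
            # Separator: close the current cluster; repeated NONE lines are harmless.
            if current:
                clusters.append(current)
                current = []
            continue
        if not line.strip() or line.startswith("pseudo-probe"):
            continue  # blank lines and extra information are skipped
        if len(fields) < width:
            raise ValueError(f"expected {width} fields, got {len(fields)}: {line!r}")
        ids = fields[:n_id_cols]
        values = [math.nan if v.strip() == "NA" else float(v) for v in fields[n_id_cols:width]]
        current.append((ids, values))
    if current:
        clusters.append(current)
    return names, clusters
```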


d)        Heatmap visualization

The Heatmap page visualizes the output clusters as follows:

Click the "Change Color" button to switch the heatmap coloring. Yellow-Blue and White-Black color schemes are currently available.

 

Move the mouse cursor over the first line to see the name of each experiment. In the example below, the cursor rests on number 2, which represents experiment "h1Cell-B".

The following picture shows a normalized value of 1.054527 on the 6th microarray.

The following picture shows a heatmap for a two species clustering analysis.

 

4.        Clustering executable specification

a)        Input

The clustering executable takes a single command-line argument: the name of the parameter setting file, which can be assumed to be in the same directory as the executable. The parameter setting file contains a list of "[name] = [value]" entries, one per line. Three types of parameters are accepted in "params.txt":

-        Double values, such as "percentile = 0.10"

-        Input data files, such as "exprFile = ../data/mouse_expr.txt" (relative file paths should be supported)

-        The output data file. This should be a single line with the name "outputFile" and a value such as "../result/output.txt" (relative file paths should be supported)
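A Python sketch of the command-line side of such an executable follows. The manual does not say how to tell a numeric parameter from a file parameter, so this sketch uses a heuristic: any value that float() accepts is numeric, and everything else is a path joined to the directory of the parameter file. The function names are hypothetical.

```python
import sys
from pathlib import Path

def parse_params(text):
    """Parse "[name] = [value]" lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter line: {line!r}")
        params[name.strip()] = value.strip()
    return params

def split_params(params, base_dir):
    """Split values into numbers and paths (heuristic: float() decides)."""
    numbers, files = {}, {}
    for name, value in params.items():
        try:
            numbers[name] = float(value)
        except ValueError:
            # Relative paths such as ../data/mouse_expr.txt are joined to base_dir.
            files[name] = Path(base_dir) / value
    return numbers, files

def main(argv):
    """Entry point: argv[1] names the parameter setting file."""
    if len(argv) != 2:
        print("usage: <executable> params.txt", file=sys.stderr)
        return 2
    params_path = Path(argv[1])
    params = parse_params(params_path.read_text())
    numbers, files = split_params(params, params_path.resolve().parent)
    output_path = files.pop("outputFile", None)
    if output_path is None:
        print("params file must define outputFile", file=sys.stderr)
        return 2
    # Placeholder: cluster the data in `files` using `numbers`, then write the
    # result to output_path in the layout described in 4.b) Output.
    return 0
```

A real executable would end with sys.exit(main(sys.argv)) and would write its clusters to output_path. On the OSCAR server, the executable must ultimately run on Windows Server 2003 (see 5.a), so a Python script would need to be packaged as an executable there.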

An example parameter setting file for hierarchicalClustering.exe looks like the following:

 

b)        Output

The output data file of the executable must satisfy several constraints:

1).      The first line of the file should look like "[n] [m1] ([m2] [m3] …)", where n and each mi (i >= 1) are positive integers, separated by tabs:

-        [n]: the first n columns contain gene IDs/names

-        [mi] (i >= 1): the number of microarray data columns for species i. The numbers in parentheses are optional

2).      The second line should contain the names of the experiments

3).      All following lines of the file should follow the layout described in the first line exactly, except that:

-        Several lines of "NONE" may be output to separate different clusters

-        If additional information needs to be output, the line should start with "pseudo-probe" so that the OSCAR visualization module ignores it
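The sketch below writes results in this layout. It assumes one bare "NONE" line between consecutive clusters and a tab after "pseudo-probe". The function format_clusters and its arguments are hypothetical, and the example output file for hierarchicalClustering.exe remains the reference.

```python
def format_clusters(n_id_cols, species_cols, names, clusters, notes=()):
    """Render clusters as text in the OSCAR output layout.

    n_id_cols:    [n], the number of leading gene ID/name columns
    species_cols: [m1], [m2], ..., the data columns per species
    names:        experiment names for line 2
    clusters:     list of clusters; each cluster is a list of rows
    notes:        extra lines, written with a "pseudo-probe" prefix
    """
    width = n_id_cols + sum(species_cols)
    lines = ["\t".join(str(c) for c in [n_id_cols, *species_cols])]
    lines.append("\t".join(names))
    for k, cluster in enumerate(clusters):
        if k > 0:
            lines.append("NONE")  # assumption: one bare NONE line between clusters
        for row in cluster:
            if len(row) != width:
                raise ValueError(f"row has {len(row)} fields, expected {width}")
            lines.append("\t".join(str(v) for v in row))
    for note in notes:
        lines.append("pseudo-probe\t" + note)  # ignored by the visualization module
    return "\n".join(lines) + "\n"
```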

The example output file for hierarchicalClustering.exe looks like the following:

5.        Add a new clustering executable to OSCAR

a)        Step 1

On the following page, you should:

-        Choose a name for your clustering algorithm. The name must be different from those of the existing algorithms.

-        Use the "Browse" button to select the executable file for the algorithm. (The executable must be runnable on Microsoft Windows Server 2003 Enterprise Edition.)

-        Use the "Browse" button to select the HTML description file for the executable.

-        Set the number of parameters for your algorithm. All parameters are assumed to be of type "double". For example, hierarchicalClustering.exe needs only one parameter, "maxClusterNumber", so fill in "1".

-        Set the number of input data files for your algorithm. These input data files include microarray data files, annotation files, and regulator files.

-        Fill in your name, institution, and contact email.

-        Finally, click "Add".

 

b)        Step 2

After Step 1, you will be directed to the following page, where you should:

-        Fill in the "ParameterName", "Desc", "LowerBound", "UpperBound", and "DefaultValue" fields for each input parameter of your algorithm.

-        Fill in the "ParameterName" and "Desc" fields and upload a sample file for each input data file of your algorithm.

-        Finally, click "Add".
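The Step 2 fields map naturally onto a small record that can check a user-supplied value and produce a params.txt line. This is one way to picture the range check on the parameter setting page described in 3.b). It is a hypothetical sketch. It assumes inclusive bounds and assumes that a blank entry falls back to the default, neither of which the manual states. The bounds used for maxClusterNumber in the usage line are made up.

```python
from dataclasses import dataclass

@dataclass
class ParameterSpec:
    """One row of the Step 2 form (hypothetical in-memory representation)."""
    name: str       # ParameterName
    desc: str       # Desc
    lower: float    # LowerBound
    upper: float    # UpperBound
    default: float  # DefaultValue

    def __post_init__(self):
        # Reject a default that lies outside the declared range.
        if not (self.lower <= self.default <= self.upper):
            raise ValueError(f"{self.name}: default {self.default} not in [{self.lower}, {self.upper}]")

    def accept(self, raw):
        """Return the submitted value as a float; blank input falls back to the default."""
        if raw is None or not raw.strip():
            return float(self.default)
        value = float(raw)
        if not (self.lower <= value <= self.upper):  # bounds assumed inclusive
            raise ValueError(f"{self.name}: {value} not in [{self.lower}, {self.upper}]")
        return value

    def params_line(self, value):
        """Format the value as a "[name] = [value]" line for params.txt."""
        return f"{self.name} = {value}"
```

For example, ParameterSpec("maxClusterNumber", "maximum number of clusters", 1, 50, 10).accept("5") returns 5.0, and params_line(5.0) gives "maxClusterNumber = 5.0".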

 

c)        Success

After Step 2, you should reach this page. Congratulations! Your algorithm has been added to our database. Before activating it, however, we will carefully examine and test your executable. Once it is activated, everyone can see your algorithm on our home page and use it freely!

User Manual