Analysis Shell and Log 2 Transform Example

One common task that CompClustTk does not currently support is transforming your data set. The CompClust Python package, however, supports this extensively, along with many other powerful features. In this example, let's say you want to log2 transform your data set. Load a data set as shown in section 3.2, and then launch the 'Analysis Shell' by going to the 'Analysis' menu and choosing 'Analysis Shell'. You will find the 'Analysis Shell' either in the original shell window you used to launch CompClustTk or in the second window that was launched when CompClustTk started.

To give you a glimpse of the many powerful things you can do with the CompClust Python package, I'll show you how to log2 transform the data set using a data set 'View' called the 'FunctionView'. The 'FunctionView' transforms your data set by applying a function you pass in to every element of the data set. That function must take one argument and return one value. In our case, we will write a function that takes one element of the data set, computes its log2, and returns the result. As you can imagine, any function you come up with that follows this one-argument, one-value rule can be applied in the same way.
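The one-value-in, one-value-out contract described above can be pictured with the plain-Python sketch below. It is not the CompClust implementation: `apply_elementwise` is a hypothetical helper, and the data set is assumed to be a simple list of rows of numbers. Unlike a 'View', this sketch builds a new copy of the data.

```python
import math

def apply_elementwise(func, rows):
    """Return a new 2-D list with func applied to every element.

    Hypothetical helper for illustration only. func must take one
    argument and return one value, the same rule the 'FunctionView'
    expects of the function you pass to it.
    """
    return [[func(value) for value in row] for row in rows]

def log2(x):
    # One argument in, one value out.
    return math.log(x, 2)

# A tiny stand-in data set: two rows, three columns.
data = [[1.0, 2.0, 4.0],
        [8.0, 16.0, 0.5]]

print(apply_elementwise(log2, data))
```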

The 'FunctionView' is only one of the many 'Views' you can use on your data set. The nice thing about a 'View' is that it doesn't store a copy of the data set; it retrieves the data from the original data set whenever you access the 'View'. This also means that any 'Labelings' attached to your original data set are accessible through your 'View', even if the 'View' covers only a subset of the original data set. Since a 'View' implements all the functions a data set object has, it can be used wherever a function expects a data set, which also means you can create a view of a view. Don't worry if that didn't make much sense; basically, it means that views are relatively memory efficient and easy to use (from a programmer's point of view).
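To make the "no copy" idea concrete, here is a minimal sketch of a lazy view in plain Python. `FunctionViewSketch` is a hypothetical class written for this illustration, not CompClust's `FunctionView`; it assumes the data set is a list of rows and ignores labelings entirely. It keeps a reference to its source and computes transformed values only when a row is requested, so a change to the original shows up in the view, and one view can serve as the source of another.

```python
import math

class FunctionViewSketch:
    """Hypothetical, minimal stand-in for a lazy 'View' (not CompClust's class).

    Holds a reference to its source (a list of rows or another view)
    and applies func only when a row is requested, so no transformed
    copy of the data is ever stored.
    """

    def __init__(self, source, func):
        self.source = source  # shared reference, not a copy
        self.func = func

    def __len__(self):
        return len(self.source)

    def __getitem__(self, row_index):
        # Compute the transformed row on demand from the source.
        return [self.func(value) for value in self.source[row_index]]


data = [[1.0, 2.0], [4.0, 8.0]]
log2_view = FunctionViewSketch(data, lambda x: math.log(x, 2))
doubled_log2 = FunctionViewSketch(log2_view, lambda x: 2 * x)  # a view of a view

print(log2_view[1])      # computed from data on demand
data[1][0] = 16.0        # change the original data set...
print(log2_view[1])      # ...and the view sees the change
print(doubled_log2[0])   # reads through log2_view, then through data
```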

To see which views are available, type the following in the 'Analysis Shell' and press TAB after typing the period:

views.
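If TAB completion isn't available (for example, in a plain Python interpreter), the built-in `dir()` gives a similar list of names. The sketch below uses the standard `math` module so it runs anywhere; with CompClust you would pass `views` instead. The helper name `public_names` is hypothetical.

```python
import math

def public_names(module):
    """Return the names a module exposes, skipping private '_' names."""
    return sorted(name for name in dir(module) if not name.startswith("_"))

# With CompClust loaded you would call public_names(views) instead.
print(public_names(math))
```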

If you are using the CompClust Python package from within Python itself, without the GUI, you will need to import the views module with the following command. If you're using the 'Analysis Shell', this command has already been executed for you.

from compClust.mlx import views

To get information about any particular view (or any Python object, for that matter), type the variable, function, or object name followed by a '?' and press Enter. For example, for the 'FunctionView' you would type:

views.FunctionView?<press-enter>
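The trailing '?' is an IPython feature. In a plain Python interpreter, the built-in `help()` function or an object's `__doc__` attribute gives similar information, for example `help(views.FunctionView)`. The runnable sketch below uses `math.log` as a stand-in, since it is always available.

```python
import math

# Plain-Python equivalents of typing "math.log?" in IPython:
help(math.log)            # prints the full help text
print(math.log.__doc__)   # prints just the docstring
```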

Now on to the example. The first thing we are going to do is create the log2 function that we will pass to the 'FunctionView'. To do this, we need to import the 'math' Python module by typing:

import math

We are going to use the math.log function, which takes two arguments: a number and a base. Because the 'FunctionView' expects a function that takes only one argument, we need to wrap math.log in a function that sets the 'base' argument to 2. Type the following to define the log2 function:

def log2(x):
    return math.log(x, 2)

Note that you need to press Enter twice after writing the last line of the function; the empty line tells Python that the function definition is complete. If everything went well, your command prompt should look like this:

In [5]:def log2(x):
   ...:    return math.log(x, 2)
   ...:

In [6]:

If you write more Python code on the next line rather than pressing enter twice, you'll probably end up with a SyntaxError like the following:

In [5]:def log2(x):
   ...:    return math.log(x, 2)
   ...:log2(2)
------------------------------------------------------------
   File "<console>", line 3
     log2(2)
        ^
SyntaxError: invalid syntax

Feel free to try out your new function by typing:

In [9]:log2(2)
Out[9]:1.0
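Keep in mind that a logarithm is only defined for positive numbers: `math.log` raises `ValueError` for zero and negative input, so a data set containing such values will likely raise an error when the view computes those elements. One possible workaround is sketched below. `safe_log2` is a hypothetical helper that returns NaN for non-positive values; whether NaN is the right choice depends on your analysis. The sketch also shows `math.log2`, which is available in Python 3.3 and later.

```python
import math

def safe_log2(x):
    """Hypothetical helper: log2 of x, or NaN when x <= 0.

    math.log raises ValueError for zero and negative numbers; returning
    NaN is one possible way to keep the transform from failing.
    """
    if x <= 0:
        return float("nan")
    return math.log(x, 2)

print(safe_log2(8))
print(safe_log2(0))      # nan instead of a ValueError

# Python 3.3+ also provides math.log2, which the Python documentation
# describes as usually more accurate than math.log(x, 2).
print(math.log2(8))
```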

Now that we have our function, it's time to get the data set which you've already loaded from within the GUI. To grab the data set, type the following:

dataSet = gui.data['myDataSet']

Since we are going to replace gui.data['myDataSet'] with the transformed data set, you should save the original data set somewhere you can reach it later if you want to keep access to it. If you save it in the gui.data Python dictionary, you will be able to access the original data set even if you close the 'Analysis Shell' and re-open it later. To do this, type the following:

gui.data['originalDataSet'] = dataSet
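Saving the original works because Python names and dictionary entries hold references to objects, not copies: replacing the 'myDataSet' entry later does not destroy the object that the 'originalDataSet' entry still refers to. The sketch below uses an ordinary dictionary named `data` as a stand-in for `gui.data` and a small list as a stand-in data set; both are assumptions for illustration.

```python
# A plain dict standing in for gui.data (assumption for illustration).
data = {"myDataSet": [[1.0, 2.0], [4.0, 8.0]]}

dataSet = data["myDataSet"]
data["originalDataSet"] = dataSet   # a second reference, not a copy

# Later, the entry is replaced with a transformed version...
data["myDataSet"] = [[0.0, 1.0], [2.0, 3.0]]

# ...but the original object is still reachable under its new key.
print(data["originalDataSet"])
print(data["originalDataSet"] is dataSet)
```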

Now we will create the log2 transformed view (that is, the transformed data set). To do this, call the 'FunctionView' with the data set and the log2 function by typing:

log2DataSet = views.FunctionView(dataSet, log2)

Now that you have the log2 view of your data, you need to store it in gui.data['myDataSet'] if you want to view it in CompClustTk; that is how the GUI knows to use the log2 view when doing visualizations. To do this, type the following:

gui.data['myDataSet'] = log2DataSet

That's it! You can now go back to the GUI and use your log2 transformed data. At this point you can either quit the 'Analysis Shell' or leave it open; it's up to you. Note that if you close the 'Analysis Shell', you will be able to launch it again, but all of your local variables, such as your log2 function, will be lost.

If you don't want to lose what you've written, or you want to enter a lot of code at once, IPython (a.k.a. the 'Analysis Shell') lets you write your code in a text editor. To launch the default text editor, type the following, where <path> is the path to the file you want to create or use:

edit <path>.py

That command should launch a text editor. Type your Python code, and when you're done, save the file and exit the text editor. IPython will then read in and execute your Python code. If you get an error or want to make a change, just type the same edit command as before and you will be able to modify the code further.

Using the edit command to write and load Python code lets you quickly load, test, and edit your code. You can use it to automate loading data and labelings, or to run more advanced analyses, and then view the results from within the GUI.

By the way, to change the default editor, set the environment variable 'EDITOR' to the name of or the path to the editor you wish to use.

Brandon King 2005-05-16