2007 04 29

The e1071 package is an interesting add-on to your list of R packages. It includes functions for latent class analysis, short-time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classification, independent component analysis, and more.

2007 04 24

There are at least :) two ways to compute the principal component analysis of a data set in R. The first one is from scratch, computing the eigenvectors and eigenvalues of the covariance matrix. It works as follows:

#
# From scratch
#
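# Two correlated columns: a ramp plus a noisy copy of it
cbind(1:10,1:10 + 0.25*rnorm(10)) -> myData
# Center each column (a plain subtraction would recycle the means down the rows)
sweep(myData,2,colMeans(myData)) -> myDataZM
cov(myDataZM) -> cvm
eigen(cvm,TRUE) -> eCvm
# Project the centered data onto the eigenvectors (one per column)
myDataZM%*%eCvm$vectors -> newMyData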

This simple code just transforms the data to align it with the principal components obtained.
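
As a quick sanity check of the from-scratch approach, here is a minimal sketch. The fixed seed and the names covNew, offDiagonal, and diagMatches are my own assumptions for illustration. If the projection is right, the covariance matrix of the transformed data should be diagonal, with the eigenvalues on its diagonal.

```r
set.seed(42)  # assumption: fixed seed so the run is reproducible

# Same toy data as above: a ramp plus a noisy copy of it
myData <- cbind(1:10, 1:10 + 0.25 * rnorm(10))

# Center each column, then project onto the eigenvectors of the covariance
myDataZM <- sweep(myData, 2, colMeans(myData))
eCvm <- eigen(cov(myDataZM), symmetric = TRUE)
newMyData <- myDataZM %*% eCvm$vectors

# Covariance of the projected data
covNew <- cov(newMyData)

# The off-diagonal term should vanish (up to rounding error)
offDiagonal <- covNew[1, 2]

# The diagonal should reproduce the eigenvalues
diagMatches <- isTRUE(all.equal(diag(covNew), eCvm$values))
```

The projected columns are uncorrelated, which is exactly what aligning the data with its principal components means.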
Of course, the second way to compute them is to use the functions that R provides in the stats package.


#
# Using the stats package
#
cbind(1:10,1:10 + 0.25*rnorm(10)) -> myData
# prcomp centers the data itself; the rotated data is stored in $x
prcomp(myData) -> pcaMyData
pcaMyData$x -> newMyData
# Equivalent by hand: sweep(myData,2,pcaMyData$center)%*%pcaMyData$rotation
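
To see that the two approaches agree, here is a small sketch that runs both on the same data. The fixed seed and the names scratchScores, variancesAgree, and scoresAgree are assumptions of mine. Eigenvectors are only defined up to sign, so the projected data is compared in absolute value.

```r
set.seed(42)  # assumption: fixed seed so the run is reproducible

myData <- cbind(1:10, 1:10 + 0.25 * rnorm(10))

# From scratch: center, eigendecompose the covariance, project
myDataZM <- sweep(myData, 2, colMeans(myData))
eCvm <- eigen(cov(myDataZM), symmetric = TRUE)
scratchScores <- myDataZM %*% eCvm$vectors

# Using the stats package
pcaMyData <- prcomp(myData)

# The squared standard deviations from prcomp are the eigenvalues
variancesAgree <- isTRUE(all.equal(pcaMyData$sdev^2, eCvm$values))

# The projected data matches up to the sign of each component
scoresAgree <- isTRUE(all.equal(abs(unname(pcaMyData$x)), abs(scratchScores)))
```

Both flags should come out TRUE; any visible difference between the two results is just a sign flip of a component.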

2007 04 23

I have been running into some problems with a feed generator I am using (yes, it is the one in WordPress MU, long story). However, I found a useful tool: an online feed validator for Atom and RSS.

2007 04 20

Need one? Check Eclipse.

2007 04 18

Ben Shneiderman is visiting UIUC today. I am sitting at his talk “The Thrill of Discovery” in room 1040 NCSA. If you miss this one, he will be at 126 GSLIS this afternoon at 3pm giving another talk, “Accelerating Discovery and Innovation: Designing Creativity Support Tools”. His opening today:

This talk will start by reviewing the growing commercial success stories such as www.spotfire.com, www.smartmoney.com/marketmap and www.hivegroup.com. Then it will cover recent research progress for visual exploration of large time series data applied to financial, Ebay auction, and genomic data (www.cs.umd.edu/hcil/timesearcher).

After a set of demos, he also introduced the Many Eyes project for visualization sharing and exploration. Following that, he showed a tree map visualization of the stock market that plots its current situation. The same tree map visualization is also used to display some data provided by the music billboard. As long as you have two attributes (color and size), the tree map renders nicely (for instance, color = topic and size = number of news items released on the topic).

He then showed some more examples of time series visualization. The interesting point is to help navigation, but also how relevant patterns can be identified. More interestingly, supporting fast visual queries requires stores that can be swept quickly to retrieve the stored information. Moreover, identifying features can be done automatically, but assessing which of those are interesting is left to human interpretation.

Some forms of analysis can greatly benefit from a proper visualization of the results. For instance, color coding low-dimensional projections of high-dimensional data helps to reveal patterns easily identifiable by the human eye. The bottom line: such visualizations blend analysis and users together to boost the ability to identify relevant/interesting patterns.

And to close, how can you validate such tools? Ben’s group took the compelling road: put people to use them. When they make relevant discoveries, try to publish them at a top conference/journal (or pass some similar sort of social screening).

To wrap up, a great speaker and a very compelling case for the need for, and the benefits of, information visualization techniques. Unfortunately, I cannot attend his afternoon class since it overlaps with the course I am teaching.

2007 04 17

Torsten Hothorn maintains a page with a list of packages for machine learning and statistical learning in R.

2007 04 16

I found a couple of interesting tutorials. One is on principal component analysis by Lindsay I. Smith and the second one is about independent component analysis by Hyvärinen and Oja. Good introductions if that is what you are looking for.

2007 04 14

Last Friday, together with the ALG and DITA people, we gave a brief presentation for NCSA’s cyberarch group on our common efforts to create a generic framework for querying and visualizing content stored in metadata stores. It combines Mulgara, SOAP, XSLTs, and custom Java code to render content using Prefuse and JFreeChart. You can download the slides here.

2007 04 14

The group was created a while ago to unify the research efforts conducted inside the Automated Learning Group. Michael Welge, Loretta Auvil, and I were sitting in Michael’s office one Monday morning scratching our heads. He generated the initial population, Loretta recombined the ideas, and I just selected what I liked. So, we became Data-Intensive Technologies and Applications :D.