2007 05 01

Jon Kleinberg is visiting UIUC today and giving a talk “Decentralized Search, Cascading Behavior, and the Structure of On-Line Communities” in the Age of Networks seminar series. The abstract of his talk today:

The rise of large-scale information networks has provided us with systems that are simultaneously social and technological in nature, and in which the human interactions that unfold can be recorded and studied in extraordinary detail. Against this backdrop, a significant amount of recent work has focused on the development of mathematical models that capture some of the qualitative properties observed in large-scale network data; such models have the potential to help us reason, at a general level, about the ways in which large networks are organized, and about the ways in which abstract models can capture the processes we observe.

We discuss two lines of research that illustrate this theme, concerned with two network processes that can be viewed as complementary: search, which narrows toward a specific target; and cascading behavior, which spreads outward from a small seed. We relate the search problem to issues that underpin the well-known “six degrees of separation” phenomenon in social networks, and describe how recent data from on-line communities aligns surprisingly well with some of the basic mathematical models for the underlying process. We identify cascading behavior in many social network processes that can be thought of as unfolding with the dynamics of an epidemic: as individuals become aware of new ideas, technologies, fads, rumors, or gossip, they have the potential to pass them on to their friends and colleagues, causing the resulting behavior to spread rapidly through the network. Here too, we find that the processes taking place within on-line communities provide new insights into models for cascading behavior that have been extensively studied in the social sciences.

He covered some results on algorithmic network analysis. He started with the “six degrees of separation” result and then walked through several different models. One of the interesting stops was models of innovation diffusion. Yes, he went over Everett Rogers’ “Diffusion of Innovations” work. He also presented the NP-hard problem of finding the key diffusers, along with the diminishing-returns property and some heuristics for it.

2007 04 29

The package e1071 for R is an interesting add-on to your list of R packages. It includes functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, independent component analysis, and more.

2007 04 24

There are at least :) two ways to compute the principal component analysis of a data set in R. The first one is from scratch, computing the eigenvectors and eigenvalues of the covariance matrix. It works as follows:

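#
# From scratch
#
cbind(1:10,1:10 + 0.25*rnorm(10)) -> myData
# Center each column on its mean (plain subtraction would recycle the means down the rows)
sweep(myData,2,colMeans(myData)) -> myDataZM
cov(myDataZM) -> cvm
eigen(cvm,TRUE) -> eCvm
# Project the centered data onto the eigenvectors (the principal axes)
myDataZM%*%eCvm$vectors -> newMyData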

This simple code just transforms the data to align it with the principal components obtained.
Of course, the second way to compute them is to use one of the functions that R provides in the stats package.


#
# Using the stats package
#
cbind(1:10,1:10 + 0.25*rnorm(10)) -> myData
# Center each column on its mean
sweep(myData,2,colMeans(myData)) -> myDataZM
prcomp(myData) -> pcaMyData
# Project onto the rotation matrix; this matches pcaMyData$x
myDataZM%*%pcaMyData$rotation -> newMyData
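As a quick sanity check, here is a minimal sketch comparing the two routes on the same data. The fixed seed, the use of sweep() for centering, and the variable names scoresEigen, scoresPrcomp, agree, and varianceMatch are my own choices for illustration. Since the sign of each eigenvector is arbitrary, the projected coordinates are compared in absolute value.

```r
# Compare the eigen-decomposition route with prcomp() on the same data.
set.seed(42)  # assumption: fixed seed so the run is repeatable
myData <- cbind(1:10, 1:10 + 0.25 * rnorm(10))

# Center each column on its own mean.
myDataZM <- sweep(myData, 2, colMeans(myData))

# From scratch: eigenvectors of the covariance matrix are the principal axes.
eCvm <- eigen(cov(myDataZM), symmetric = TRUE)
scoresEigen <- myDataZM %*% eCvm$vectors

# Using the stats package: prcomp() centers the data itself by default.
pcaMyData <- prcomp(myData)
scoresPrcomp <- pcaMyData$x

# Eigenvector signs are arbitrary, so compare absolute values.
agree <- isTRUE(all.equal(abs(scoresEigen), abs(scoresPrcomp),
                          check.attributes = FALSE))

# The eigenvalues should equal the squared standard deviations from prcomp().
varianceMatch <- isTRUE(all.equal(eCvm$values, pcaMyData$sdev^2))

print(agree)
print(varianceMatch)
```

If both checks print TRUE, the two approaches give the same principal components up to a sign flip per component.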

2007 04 23

I have been running into some problems with a feed generator I am using (yes, it is the one in WordPress MU, long story). However, I found a useful tool: an on-line feed validator for Atom and RSS.

2007 04 20

Need one? Check Eclipse.

2007 04 18

Ben Shneiderman is visiting UIUC today. I am sitting in his talk “The Thrill of Discovery” in room 1040 NCSA. If you miss this one, he will be at 126 GSLIS this afternoon at 3pm giving another talk, “Accelerating Discovery and Innovation: Designing Creativity Support Tools”. His opening today:

This talk will start by reviewing the growing commercial success stories such as www.spotfire.com, www.smartmoney.com/marketmap and www.hivegroup.com. Then it will cover recent research progress for visual exploration of large time series data applied to financial, Ebay auction, and genomic data (www.cs.umd.edu/hcil/timesearcher).

After a set of demos, he also introduced the Many Eyes project for visualization sharing and exploration. Following that came a tree map visualization of the stock market that plots its current situation. The same tree map visualization is also used to display some data provided by the music billboard. As long as you have two attributes to map (color and size), the tree map renders nicely (for instance, color = topic and size = number of news items released on the topic).

He then showed some more examples of time series visualization. The interesting point is not only to help navigation, but also how relevant patterns can be identified. More interestingly, fast visual queries require fast-sweeping stores to retrieve the stored information. Moreover, identifying features can be done automatically, but assessing which of them are interesting is left to human interpretation.

Some forms of analysis can greatly benefit from a proper visualization of the results. For instance, color-coding low-dimensional projections of high-dimensional data helps to reveal patterns easily identifiable by the human eye. The bottom line: such visualizations blend analysis and users together to boost the ability to identify what is relevant or interesting.

And to close: how can you validate such visualizations? Ben’s group took the compelling road: put people to use them. When they make relevant discoveries, try to publish them in a top conference/journal (or some similar form of social screening).

To wrap up, a great speaker and a very compelling case for the need for, and benefits of, information visualization techniques. Unfortunately, I cannot attend his afternoon class since it overlaps with the course I am teaching.

2007 04 17

Torsten Hothorn maintains a page with a list of packages for machine learning and statistical learning in R.

2007 04 16

I found a couple of interesting tutorials. One is on principal component analysis by Lindsay I. Smith and the second one is about independent component analysis by Hyvärinen and Oja. Good introductions if that is what you are looking for.

2007 04 14

The group was created a while ago to unify the research efforts conducted inside the Automated Learning Group. Michael Welge, Loretta Auvil, and I were sitting in Michael’s office one Monday morning scratching our heads. He generated the initial population, Loretta recombined the ideas, and I just selected what I liked. So we became Data-Intensive Technologies and Applications :D.
