<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/wordpress-mu-1.2.5" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Xavier Llorà @ DITA</title>
	<link>http://dita.ncsa.uiuc.edu/xllora</link>
	<description>There is never enough data</description>
	<pubDate>Sun, 18 May 2008 04:20:40 +0000</pubDate>
	<generator>http://wordpress.org/?v=wordpress-mu-1.2.5</generator>
	<language>en</language>
			<item>
		<title>Moving This Blog</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/05/17/moving-this-blog/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/05/17/moving-this-blog/#comments</comments>
		<pubDate>Sun, 18 May 2008 04:20:40 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Notes]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/05/17/moving-this-blog/</guid>
		<description><![CDATA[I am just working on unifying all my blogs, and this has been the first one. The new one is located here http://www.xavierllora.net/
]]></description>
			<content:encoded><![CDATA[<p>I am just working on unifying all my blogs, and this has been the first one. The new one is located here <a href="http://www.xavierllora.net/">http://www.xavierllora.net/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/05/17/moving-this-blog/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Meandre: Semantic-Driven Data-Intensive Flow Engine</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/04/18/meandre-semantic-driven-data-intensive-flow-engine/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/04/18/meandre-semantic-driven-data-intensive-flow-engine/#comments</comments>
		<pubDate>Sat, 19 Apr 2008 00:30:22 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Notes]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/04/18/meandre-semantic-driven-data-intensive-flow-engine/</guid>
		<description><![CDATA[Finally we have finished setting up the website for Meandre, a semantic-driven data-intensive flow engine. Meandre provides basic infrastructure for data-intensive computation. It provides, among other things, tools for creating components and flows, a high-level language to describe flows, and a multicore and distributed execution environment based on a service-oriented paradigm. We are currently working on getting [...]]]></description>
			<content:encoded><![CDATA[<p>Finally we have finished setting up the website for Meandre, a semantic-driven data-intensive flow engine. Meandre provides basic infrastructure for data-intensive computation. It provides, among other things, tools for creating components and flows, a high-level language to describe flows, and a multicore and distributed execution environment based on a service-oriented paradigm. We are currently working on getting geared up for a first alpha release. You can visit the Meandre site <a href="http://seasr.org/meandre">here</a>. I will be posting on the Meandre blog about our current steps toward getting the release out the door. The Meandre infrastructure is being built to support the <a href="http://seasr.org">SEASR project</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/04/18/meandre-semantic-driven-data-intensive-flow-engine/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Summary of BDCSG2008 blogging</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 04:16:48 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/</guid>
		<description><![CDATA[It has been a great meeting. Lots of interesting ideas and a lot to explore from now on. Just what I like :D. Below I have summarized the list of posts I made related to the meeting.

Introductory post
Data-Intensive Scalable Computing. Randy Bryant, CMU
Text Information Management: Challenges and Opportunities. ChengXiang Zhai, UIUC
Clouds and ManyCore: The Revolution. Dan [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a great meeting. Lots of interesting ideas and a lot to explore from now on. Just what I like :D. Below I have summarized the list of posts I made related to the meeting.</p>
<ul>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/big-data-computing-study-group-2008/">Introductory post</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/data-intensive-scalable-computing-randy-bryant/">Data-Intensive Scalable Computing. Randy Bryant, CMU</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-text-information-management-challenges-and-oportunities-chengxiang-zhai/">Text Information Management: Challenges and Opportunities. ChengXiang Zhai, UIUC</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-clouds-and-manycores-the-revolution-dan-reed/">Clouds and ManyCore: The Revolution. Dan Reed, MSR</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-computational-paradigms-for-genomic-medicine-jill-mesirov/">Computational Paradigms for Genomic Medicine. Jill Mesirov, Broad Institute of MIT and Harvard</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-simplicity-and-complexity-in-data-systems-garth-gibson/">Simplicity and Complexity in Data Systems. Garth Gibson, CMU</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-handling-large-datasets-at-google-current-systems-and-future-directions-jeff-dean/">Handling Large Datasets at Google: Current Systems and Future Directions. Jeff Dean, Google</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/">Algorithmic Perspectives on Large-Scale Social Network Data. Jon Kleinberg, Cornell</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-mining-the-web-graph-marc-najork/">Mining the Web Graph. Marc Najork, MSR</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-what-goes-around-joe-hellerstein/">&#8220;What&#8221; Goes Around. Joe Hellerstein, Berkeley</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-sherpa-cloud-computing-of-the-third-kind-raghu-ramakrishnan/">Sherpa: Hosted Data Serving. Raghu Ramakrishnan, Yahoo!</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-scientific-applications-of-large-databases-alex-szalay/">Scientific Applications of Large Databases. Alex Szalay, JHU</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-data-rich-computing-where-its-all-phil-gibbons/">Data-Rich Computing: Where It&#8217;s At. Phil Gibbons, Intel</a></li>
<li><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-nsf-plans-for-supporting-data-intensive-computing-jeannette-wing-and-christophe-bisciglia/">NSF Plans for Supporting Data Intensive Computing: Jeannette Wing, NSF. The Google/IBM data center: Christophe Bisciglia, Google</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] NSF Plans for Supporting Data Intensive Computing (Jeannette Wing and Christophe Bisciglia)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-nsf-plans-for-supporting-data-intensive-computing-jeannette-wing-and-christophe-bisciglia/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-nsf-plans-for-supporting-data-intensive-computing-jeannette-wing-and-christophe-bisciglia/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 00:39:06 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-nsf-plans-for-supporting-data-intensive-computing-jeannette-wing-and-christophe-bisciglia/</guid>
		<description><![CDATA[NSF listens to you academics. Jeannette opens the floor with this claim. Questions: What are the limitations of this modeling paradigm (the data-intensive one)? What are meaningful metrics of performance here? What about the security of processes and data on a shared resource? How can we reduce power consumption? Can this paradigm solve problems not possible otherwise, or simplify [...]]]></description>
			<content:encoded><![CDATA[<p>NSF listens to you academics. Jeannette opens the floor with this claim. Questions: What are the limitations of this modeling paradigm (the data-intensive one)? What are meaningful metrics of performance here? What about the security of processes and data on a shared resource? How can we reduce power consumption? Can this paradigm solve problems not possible otherwise, simplify them, or open the door to new applications? NSF is rolling out the Cluster Exploratory program, and is also going to roll out a new solicitation for Data-Intensive Computing. It is also emphasizing the path from data to knowledge, since scientists are throwing data away. This is a great opportunity for collaborative efforts between CS and scientists. NSF goal: provide access to cluster resources and to massive data sets. Google and IBM are rolling out the cluster (for academics). The Cluster Exploratory, the solicitation program announced yesterday, will distribute access to the cluster and research grants. Christophe reviewed his experience teaching a class about clustering, where he realized that giving away computer cycles is more valuable than plain grant money. It runs on Hadoop. The cluster will be allocated by rack weeks, with 5 terabytes and priority on 80 processes (but other people are still there, with lower priority and large data sets). Since reviewing is not Google&#8217;s expertise, they reached out to NSF to handle it. Googlers will start collaborations, and IBM will also help by providing support. Jeannette claims this is a new model, but NSF is open to new models and other partners.</p>
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-nsf-plans-for-supporting-data-intensive-computing-jeannette-wing-and-christophe-bisciglia/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] Data-Rich Computing: Where It&#8217;s At (Phil Gibbons)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-data-rich-computing-where-its-all-phil-gibbons/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-data-rich-computing-where-its-all-phil-gibbons/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 00:18:08 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-data-rich-computing-where-its-all-phil-gibbons/</guid>
		<description><![CDATA[The next speaker of the afternoon is Phil Gibbons from Intel Research. Intel has created a research theme on data-rich computing for the next few years (the same as the other one presented at the Hadoop summit about ground modeling). One approach is to bring the computation to the data (the cluster approach), but there are also two elements [...]]]></description>
			<content:encoded><![CDATA[<p>The next speaker of the afternoon is Phil Gibbons from Intel Research. Intel has created a research theme on data-rich computing for the next few years (the same as the other one presented at the Hadoop summit about ground modeling). One approach is to bring the computation to the data (the cluster approach), but there are two more elements in the picture: (1) memory hierarchy issues, and (2) pervasive multimedia sensing. The first one is important for pure performance; the second one keeps pushing the computation closer to the sensors. The memory hierarchy implies that multiple cores share a common L2 cache, that bandwidth drops the farther we move from the cores, and that you can keep pushing down to HD/SSD. This basic unit is what gets replicated to build clusters. (A little note about the SSD rewriting quirk and how cache coherency plays into the overall picture.) All this led to the HI-SPADE project (Hierarchy-Savvy Parallel Algorithm Design). The goal is to hide as much as possible and expose only what is important to tune. He continues with an example of cache misses and how they can be palliated with the right scheduler. Phil then moved on to show how that would work on parallel merge sort (a solution à la merge first depth search), later also compared with hash join, and again, no free lunch (they run into the usual: one works like a champ, the other fails miserably). The next topic of the presentation was the quirks of SSDs (flash based). The improvement over traditional HDs reaches three orders of magnitude. But again, random rewrites in flash are painful, although expressing algorithms in a semi-random way may help palliate the problem. Shifting gears, pervasive multimedia sensing is the next topic in the arena. Phil starts by reviewing sensor networks and how the sensors are becoming more powerful but, more importantly, how their numbers are growing and scaling up (exponentially).
Again, he moves to their example, the IrisNet project, and how pushing computation down also helps with the distribution of the data, pushing the results as feeds (XML) into aggregation nodes (in a tree shape). Once aggregated, they provide the usual distribution, replication, and querying.</p>
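<p>To make the parallel merge sort mentioned above a bit more concrete, here is a minimal sketch in Python. It is only my own illustration, not the HI-SPADE scheduler or anything shown in the talk: the function names and the <code>workers</code> parameter are assumptions, and in CPython the threads will not actually speed up CPU-bound sorting. The point is only the structure: sort independent chunks concurrently, then merge the sorted runs.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def merge_sort(values):
    """Plain recursive merge sort; returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return list(merge(merge_sort(values[:mid]), merge_sort(values[mid:])))


def parallel_merge_sort(values, workers=2):
    """Sort contiguous chunks concurrently, then merge the sorted runs.

    `workers` is a hypothetical tuning knob, not something from the talk.
    """
    if len(values) <= 1 or workers <= 1:
        return merge_sort(values)
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(merge_sort, chunks))
    # heapq.merge lazily merges any number of already sorted inputs.
    return list(merge(*runs))
```

<p>How the chunks are laid out relative to the shared cache is exactly the kind of detail a hierarchy-savvy scheduler would decide; this sketch ignores it.</p>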
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-data-rich-computing-where-its-all-phil-gibbons/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] Scientific Applications of Large Databases (Alex Szalay)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-scientific-applications-of-large-databases-alex-szalay/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-scientific-applications-of-large-databases-alex-szalay/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 23:38:03 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-scientific-applications-of-large-databases-alex-szalay/</guid>
		<description><![CDATA[Alex opens the talk showing a clear exponential growth in astronomy data (LSST and the petabyte generation example). Data generated from sensors keeps growing like crazy. Images have more and more resolution. Hopkins&#8217; databases started with the digital sky initiative, which generated 3 terabytes of data in the 90&#8217;s, and the number keeps growing up [...]]]></description>
			<content:encoded><![CDATA[<p>Alex opens the talk showing a clear exponential growth in astronomy data (LSST and the petabyte generation example). Data generated from sensors keeps growing like crazy. Images have more and more resolution. Hopkins&#8217; databases started with the digital sky initiative, which generated 3 terabytes of data in the 90&#8217;s, and the number keeps growing up to the point that LSST will be forced to dump images because it is not possible to store all of them. SkyServer is based on SQL Server and .NET, serving the 3 terabytes and answering SQL queries, recently reaching 15 million queries. Alex presented a review of the usage and data delivery. They are planning to anonymize the log and make it publicly available. Then he switched gears toward the immersive turbulence project, which seeks to generate high-resolution turbulence images. Again, they are storing the information on a SQL Server. Switching gears to SkyQuery: a federation of web services that builds a join query on the fly, but the problem is that some join results turn out to be unfeasible. The review of projects then moved to &#8220;Life under your feet&#8221;, a non-intrusive way to measure environments. The key component is the aggregation and drill-down mode for exploring those aggregates down to the sensor level. Another one is OncoSpace, which tracks oncology treatment evolution by comparing images of the same patients across time, again implemented on the same SQL Server. All these projects have commonalities: indexing and the need to extract small subsets from larger datasets. MyDQ aims to extract data from other databases and services and leave it in a relational database the user can use. Graywulf: there is no off-the-shelf solution for 1000 TB of scientific data, so the solution is to scale out. They took the route of fragmentation and created chunks that are easy to manage, again using SQL Server clusters.
They also introduced a workflow manager to monitor and control it. This led them to create a petascale center to play with at Johns Hopkins University, built on Pan-STARRS.</p>
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-scientific-applications-of-large-databases-alex-szalay/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] Sherpa: Cloud Computing of the Third Kind (Raghu Ramakrishnan)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-sherpa-cloud-computing-of-the-third-kind-raghu-ramakrishnan/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-sherpa-cloud-computing-of-the-third-kind-raghu-ramakrishnan/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 22:44:14 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-sherpa-cloud-computing-of-the-third-kind-raghu-ramakrishnan/</guid>
		<description><![CDATA[Raghu (former professor at Madison, Wisconsin, now at Yahoo!) is leading a very interesting project on large-scale storage (Sherpa). Here you can find some of my unconnected notes. Software as a service requires both CPU and data. Cloud computing is usually assimilated to Map-Reduce grids, but they decouple computation and data. For instance, Condor is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://pages.cs.wisc.edu/~raghu/">Raghu</a> (former professor at Madison, Wisconsin, now at Yahoo!) is leading a very interesting project on large-scale storage (Sherpa). Here you can find some of my unconnected notes. Software as a service requires both CPU and data. Cloud computing is usually assimilated to Map-Reduce grids, but they decouple computation and data. For instance, Condor is great for high-throughput computing, but on the data side you run into SSDS, Hadoop, etc. But there is a third kind: transactional storage. Moreover, SQL is the most widely used parallel programming language. Raghu wonders why we can&#8217;t build on the lessons learned from RDBMS for OLTP. Sherpa aims not to support ACID models, but to be massively scalable via relaxation. Updates: creation, or simple object updates. Queries: selection with filtering. The vision is to start in a box; if it needs to scale, that should be transparent. PNUTS is part of Sherpa, and it is the technology for geographic replication, uniform updates, queries, and flexible schemas. Then he goes on to describe the inner parts of PNUTS and the software stack. Some interesting notes: no logging, message validation, no traditional transactions. The lower levels put and get a key; on top of that sit ranges and sorting; PNUTS, on top, provides the querying facility (insert, select, remove). Flexible schemas: the fields are declared at the table level, but do not need to be present (flexible growth). Records are mastered on different nodes, and the masters can migrate depending on how the records are used. The basic consistency model is based on a timeline. Master writes and reads are ordered; others can catch up in time. Load balancing is done by splitting and migration, and guaranteed by the Yahoo! Message Broker. The goal: simple, light, massively scalable.</p>
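<p>The timeline idea (the master orders the writes, the other copies catch up later) can be sketched in a few lines of Python. This is only my reading of the notes above, not the PNUTS implementation: the class names, the per-record version counter, and the <code>deliver</code> method are all assumptions made for illustration.</p>

```python
class Replica:
    """Holds a copy of one record and applies updates in version order."""

    def __init__(self):
        self.version = 0    # last version applied to this copy
        self.value = None
        self.pending = {}   # version -> value, for updates that arrived early

    def deliver(self, version, value):
        """Accept an update; apply it only once all earlier versions are applied."""
        self.pending[version] = value
        while self.version + 1 in self.pending:
            self.version += 1
            self.value = self.pending.pop(self.version)


class Master(Replica):
    """The record's master (hypothetical) assigns the next version to every write."""

    def write(self, value):
        version = self.version + 1
        self.deliver(version, value)
        return version, value  # the update to ship to the other replicas


# Usage: updates may reach a replica out of order, but are applied in order.
master, replica = Master(), Replica()
first = master.write("a")
second = master.write("b")
replica.deliver(*second)   # arrives early: replica stays stale at version 0
replica.deliver(*first)    # now both updates are applied, in timeline order
```

<p>A replica may be behind the master, but it never applies the updates of a record in a different order than the master did.</p>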
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-sherpa-cloud-computing-of-the-third-kind-raghu-ramakrishnan/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] &#8220;What&#8221; goes around (Joe Hellerstein)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-what-goes-around-joe-hellerstein/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-what-goes-around-joe-hellerstein/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 22:10:01 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-what-goes-around-joe-hellerstein/</guid>
		<description><![CDATA[Joe opens fire saying &#8220;The web is big, a lot of monkeys pushing keys&#8221;. Funny. The industrial revolution of data is coming. Large amounts of data are going to be produced. The other revolution is the hardware revolution, leading to the question of how we program such animals to avoid the death of the hardware [...]]]></description>
			<content:encoded><![CDATA[<p>Joe opens fire saying &#8220;The web is big, a lot of monkeys pushing keys&#8221;. Funny. The industrial revolution of data is coming. Large amounts of data are going to be produced. The other revolution is the hardware revolution, leading to the question of how we program such animals to avoid the death of the hardware industry. The last one is the industrial revolution in software, echoing automatic programming. Declarative programming is great, but how many domains, and which ones, can absorb it? Benefits: rapid prototyping, pocket-size code bases, independence from the runtime, ease of analysis and security, room for optimization and adaptability. But the key question is: where is this useful (besides SQL and spreadsheets)? His group has rolled out declarative languages for networking. That includes routing algorithms, other networking stacks, and wireless sensor nets. His approach is a reincarnation of DATALOG. It fits the centrality of graphs and rendezvous in networks. After these initial applications, P2 has been used for consensus (Paxos), secure networking, flexible data replication, and mobile networks. Other applications currently being built: compilers, natural language, computer games, security protocols, information extraction, modular robotics. The current challenges they are facing include a sound system design, a language fit for real-world programming, the lack of analysis tools for the languages, not being Turing complete, connections to graph theory and algebraic modeling, and efficient models for A*. Another challenge is how to do distributed inference and metacompilation to provide hardware runtimes. With data and network uncertainty, P2 can help embed the routing information, the network routing information, and the conceptual networks together, and express them together. Evita Raced is the runtime for P2 (a simple wire data flow bootstrapper).
More info <a href="http://www.declarativity.net/">here</a>.</p>
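<p>To give a flavor of why a DATALOG-style language fits networking, here is a small Python sketch that evaluates the classic two-rule reachability program by naive fixpoint iteration. It is my own illustration, not P2 code or P2 syntax: the <code>link</code> and <code>path</code> relation names and the <code>reachable</code> function are assumptions.</p>

```python
def reachable(links):
    """Naive fixpoint evaluation of the two Datalog-style rules:

         path(X, Y) :- link(X, Y).
         path(X, Z) :- link(X, Y), path(Y, Z).

    `links` is a set of (source, destination) pairs; returns the set of paths.
    """
    paths = set(links)  # first rule: every link is a path
    while True:
        # Second rule: extend every known path by one link at the front.
        derived = {(x, z) for (x, y) in links for (y2, z) in paths if y == y2}
        if derived <= paths:  # nothing new was derived: fixpoint reached
            return paths
        paths |= derived


# Usage: a chain a -> b -> c -> d yields the six reachable pairs.
links = {("a", "b"), ("b", "c"), ("c", "d")}
paths = reachable(links)
```

<p>The two rules say what a path is and nothing about how to compute it, which is what leaves room for the runtime to optimize and distribute the evaluation.</p>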
]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-what-goes-around-joe-hellerstein/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] Mining the Web Graph (Marc Najork)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-mining-the-web-graph-marc-najork/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-mining-the-web-graph-marc-najork/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 21:34:37 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-mining-the-web-graph-marc-najork/</guid>
		<description><![CDATA[Marc takes the floor and starts talking about web graphs (the ones generated by the hyperlinks between pages). Hyperlinks are a key element of the web. Lately web pages have seen an increase in the number of links, usually generated by a CMS (for instance, navigation). However, there is a change in the meaning of those hyperlinks. Analytics have [...]]]></description>
			<content:encoded><![CDATA[<p>Marc takes the floor and starts talking about web graphs (the ones generated by the hyperlinks between pages). Hyperlinks are a key element of the web. Lately web pages have seen an increase in the number of links, usually generated by a CMS (for instance, navigation). However, there is a change in the meaning of those hyperlinks. Analytics come in different flavors: for example, PageRank is pretty simple, but others require random access and therefore in-memory storage (that is, holding huge graphs in memory). Using their own Microsoft tools, they distribute and replicate the graph across a cluster to be able to run some of these analytic algorithms (for instance, HITS for page ranking). Sampling can help deal with high-arity nodes in a graph. He continues by presenting the SALSA algorithm (the successor of HITS). SALSA requires sampling (and Marc suggests that uniform sampling works pretty well). However, how do you evaluate the ranking algorithms? Compile a truth set? Sometimes these are assembled by humans (who may not know what the intent of the query was), but another alternative is to use click logs (potentially biased toward the first results presented). He argues that, as a field, we need to collaborate with the social sciences to model and better understand the meaning and motivations of hyperlinks.</p>
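<p>For readers who have not seen HITS, below is a minimal sketch of the textbook hub/authority iteration in Python. It is an illustration under my own assumptions, not the distributed Microsoft implementation from the talk, and it does not include SALSA or its sampling step. The function name <code>hits</code>, the iteration count, and the toy edge list are all made up for the example.</p>

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS power iteration on a small directed graph (illustrative)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to the node.
        auth = {n: sum(hub[u] for (u, v) in edges if v == n) for n in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub score: sum of the authority scores of the pages the node links to.
        hub = {n: sum(auth[v] for (u, v) in edges if u == n) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical graph: pages a and b both link to c, and a also links to d.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
print(hub, auth)
```

<p>Every step scans the full edge list, which hints at why such analytics need random access to the graph and, at web scale, a graph held in memory across a cluster.</p>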
<div class="aizatto_related_posts"><span class="aizatto_related_posts_header" >Related Posts</span><ul><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/" rel="bookmark" title="Permanent Link: Summary of BDCSG2008 blogging" >Summary of BDCSG2008 blogging</a></span><div class="aizatto_related_posts_excerpt"></div></li><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-text-information-management-challenges-and-oportunities-chengxiang-zhai/" rel="bookmark" title="Permanent Link: [BDCSG2008] Text Information Management: Challenges and Oportunities (ChengXiang Zhai)" >[BDCSG2008] Text Information Management: Challenges and Oportunities (ChengXiang Zhai)</a></span><div class="aizatto_related_posts_excerpt"></div></li><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2007/07/20/loading-rdfxml-files-into-virtuosos-metadata-store-2/" rel="bookmark" title="Permanent Link: Loading RDF/XML files into Virtuoso&#8217;s metadata store" >Loading RDF/XML files into Virtuoso&#8217;s metadata store</a></span><div class="aizatto_related_posts_excerpt"></div></li></ul></div>]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-mining-the-web-graph-marc-najork/feed/</wfw:commentRss>
		</item>
		<item>
		<title>[BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)</title>
		<link>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/</link>
		<comments>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 21:07:53 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
		
		<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/</guid>
		<description><![CDATA[How can we help social scientists do their science, and how can we create systems from the lessons learned? These topics also include security and sensitivity of the data. He also reviews the literature, from the Karate papers to the latest papers about social networks. Scale changes the way you approach the data. The original [...]]]></description>
			<content:encoded><![CDATA[<p>How can we help social scientists do their science, and how can we create systems from the lessons learned? These topics also include security and sensitivity of the data. He also reviews the literature, from the Karate papers to the latest papers about social networks. Scale changes the way you approach the data. The original studies allowed you to know what each link meant, but large-scale networks lose this property. However, he is looking for a language to express some of the analyses of social networks and processes. Also, how do we bind information per user, and how can we model users? And then there are the security policies. Another topic is diffusion in social networks and how things propagate (even locally), but it is hard to measure how people change their minds during the diffusion process. He presents a chain-letter study where the petition and its trace were collected; copies can also be forwarded to mailing lists, but some of those traces can be recovered from the mailing lists. The paths were messed up by mutations (typos), amputations, etc. They developed maximum-likelihood algorithms to assemble the tree. But the output was unexpected: contrary to the six degrees of separation, they found narrow, deep trees. Why would a chain letter run as a depth-first search? Time played a role. Since circles of friends are small, replicated copies were basically discarded. The shape of the trees could be reproduced by a model that includes this time dimension. Another element thrown into the mix is the diffusion threshold. Basically, a message comes in, but how many repeated inputs do you require to validate it and pass it along? Results show that the second input is the one that provides the boost. Viral marketing is another area that wants to understand diffusion. All this leads to multiple models and the question of how to integrate them. Privacy in social networks is another key element. How does that play out? Is anonymization the way to go? 
In social network graphs, even if anonymized, hints can lead to the deanonymization of the picture. Before the network is released, you can add actions to it, and then you have something to roll back from. The idea is to create a unique pattern and then tie it to other people. You can compromise a graph with about the square root of the log of the number of nodes. Jon&#8217;s final reflections: toward a model of you. Models of human behavior are possible (for instance, the model of time to reply to email). But computers track more information about your behavior, opening the door to new modeling (something that the <a href="http://www-discus.ge.uiuc.edu/">DISCUS project</a> has also been postulating for the last 5 years).</p>
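<p>The threshold idea from the diffusion part of the talk can be sketched as a toy model in Python. This is my own simplification, not the model presented by the speaker: the <code>spread</code> function, the hard threshold, and the small friendship graph are assumptions for illustration. A person passes the message along only after hearing it from at least <code>threshold</code> friends.</p>

```python
def spread(neighbors, seeds, threshold=2):
    """Toy threshold diffusion: a node adopts the message once at least
    `threshold` of its neighbors have adopted it (illustrative only)."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, friends in neighbors.items():
            heard_from = len(active & set(friends))
            if node not in active and heard_from >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical undirected friendship graph, stored as adjacency lists.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

print(sorted(spread(neighbors, {"a", "b"}, threshold=2)))
print(sorted(spread(neighbors, {"a"}, threshold=1)))
```

<p>With a threshold of two, the message started by a and b reaches c but stalls there, because d only ever hears it from one friend. With a threshold of one it reaches everybody.</p>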
<div class="aizatto_related_posts"><span class="aizatto_related_posts_header" >Related Posts</span><ul><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/summary-of-bdcsg2008-blogging/" rel="bookmark" title="Permanent Link: Summary of BDCSG2008 blogging" >Summary of BDCSG2008 blogging</a></span><div class="aizatto_related_posts_excerpt"></div></li><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2007/05/01/jon-kleinberg/" rel="bookmark" title="Permanent Link: Jon Kleinberg visits UIUC" >Jon Kleinberg visits UIUC</a></span><div class="aizatto_related_posts_excerpt"></div></li><li><span class="aizatto_related_posts_title" ><a href="http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-handling-large-datasets-at-google-current-systems-and-future-directions-jeff-dean/" rel="bookmark" title="Permanent Link: [BDCSG2008] Handling Large Datasets at Google: Current Systems and Future Directions (Jeff Dean)" >[BDCSG2008] Handling Large Datasets at Google: Current Systems and Future Directions (Jeff Dean)</a></span><div class="aizatto_related_posts_excerpt"></div></li></ul></div>]]></content:encoded>
			<wfw:commentRss>http://dita.ncsa.uiuc.edu/xllora/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
