<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Networked Insights &#187; Sam Baskinger</title>
	<atom:link href="http://blog.networkedinsights.com/index.php/author/sbaskinger/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.networkedinsights.com</link>
	<description>Fueling Intelligent Brands</description>
	<lastBuildDate>Fri, 27 Aug 2010 20:54:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Sentiment in the drips-and-drabs of informal writing</title>
		<link>http://blog.networkedinsights.com/index.php/2010/02/sentiment-in-the-drips-and-drabs-of-informal-writing/</link>
		<comments>http://blog.networkedinsights.com/index.php/2010/02/sentiment-in-the-drips-and-drabs-of-informal-writing/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 14:46:27 +0000</pubDate>
		<dc:creator>Sam Baskinger</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[sentiment]]></category>
		<category><![CDATA[user-generated content]]></category>

		<guid isPermaLink="false">http://blog.networkedinsights.com/?p=5814</guid>
		<description><![CDATA[Most of the effective algorithms for measuring sentiment rely on fairly well formatted, “predictable” text that follows formal grammar rules. But formal writing carries a bias. It is an immensely more difficult task to harvest information from the drips-and-drabs of informal writing such as is found on Twitter and in forums (or even blogs).]]></description>
			<content:encoded><![CDATA[<p>In a secret mission given to me by the commanding <a href="http://en.wikipedia.org/wiki/Natural_language_processing" target="_blank">NLP</a> officer at Networked Insights I bumped into a new kid on the sentiment-analysis block (founded in June ‘09ish, I believe), <a href="http://corporate.evri.com/" target="_blank">Evri</a>. What they do is pretty interesting! First, they comb<em> a limited number of “highly regarded” sources</em><a href="http://www.readwriteweb.com/archives/how_does_the_web_feel_evri_tells_you.php" target="_blank"><sup>[1]</sup></a>, extract entities (NLP jargon for the stuff we’re talking about — words and phrases) and relate them together. If you traffic in NLP-land this isn’t super-awesome-cool, but it is a lot of fun to see someone productize some of the algorithms out there. Kudos!</p>
<p>Now, you’re probably wondering why I italicized the little quote about highly regarded sources, and if you are the foreshadowing type, you may already have guessed where I’m going with this. First, let me say that most of the effective algorithms for extracting entities, and almost all of NLP, rely on fairly well formatted, “predictable” text. By “predictable” I mean that it follows formal grammar rules, etc. So, in selecting highly regarded sources (say, CNN?), you are constraining your <a href="http://en.wikipedia.org/wiki/Sampling_(statistics)#Sampling_frame" target="_blank">sampling frame</a> to sites that you can process. There isn’t anything terribly wrong with doing what you’re good at, but I would argue that, at least at Networked Insights, we fight to stay clear of this restriction. In part, this is because formal writing carries a bias: its message is shaped by latent motives, such as keeping a high reputation. It is an immensely more difficult task to harvest information from the drips-and-drabs of informal writing such as is found on Twitter and in forums (or even blogs).</p>
<p>It’s because of this that, while the thought of Evri is exciting, I don’t think it will tell you anything that you didn’t already know. My goal isn’t to pick on Evri, but I find it fascinating that analyzing the more formal, easily processed text on the web is a bit of a losing battle, because that formalism comes in part from the author knowing that we are watching. Fox or CNN write with a specific audience in mind, and that audience is the same one that TV seeks to entertain, and on, and on, and on. What is so powerful about the social web is that its text is written for an expected audience of only two or three people.</p>
<p>I say this is powerful in two ways. First, biases even out through the sheer volume and diversity of publishers. If someone is trying to catch the attention of a particular audience and tailors their text to fit, that intentional wordsmithing is likely diluted by countless other authors with similar overall ideas to express but different spins to put on them. Second, since most of this text is generated quickly, and admittedly not always well thought out, people’s raw feelings tend to leak into it. It’s the living-room conversation had without thinking. It’s the reflexive “boo” at the stadium. This rawness, if you will, is far more valuable because it is seldom what you were expecting.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.networkedinsights.com/index.php/2010/02/sentiment-in-the-drips-and-drabs-of-informal-writing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stats, Baseball, and Software Development</title>
		<link>http://blog.networkedinsights.com/index.php/2009/07/stats-baseball-and-software-development/</link>
		<comments>http://blog.networkedinsights.com/index.php/2009/07/stats-baseball-and-software-development/#comments</comments>
		<pubDate>Wed, 08 Jul 2009 17:28:45 +0000</pubDate>
		<dc:creator>Sam Baskinger</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[measurement]]></category>
		<category><![CDATA[metrics]]></category>

		<guid isPermaLink="false">http://blog.networkedinsights.com/?p=3191</guid>
		<description><![CDATA[&#8220;If you can&#8217;t measure it, you can&#8217;t change it.&#8221; This is one of those fundamentals that is so often forgotten at many levels of an organization. Every effort made by an organization to change something has behind it an implicit measurement, a representation in the mind of someone with some quality that may differ [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;<em>If you can&#8217;t measure it, you can&#8217;t change it</em> &#8220;. This is one of those fundamentals that is so often forgotten at many levels of an organization. Every effort made by an organization to change something has behind it an implicit measurement, a representation in the mind of someone with some quality that may differ from what what someone else believes it should be.</p>
<p>There is power in having concrete numbers, even if they do not mean much in isolation, if only for the sake of comparison. One of the best examples of a measure without absolute meaning is baseball&#8217;s <a href="http://mlb.mlb.com/mlb/official_info/baseball_basics/abbreviations.jsp" target="_top">OPS</a>. A player&#8217;s OPS is the sum of their on-base percentage (OBP, a fairly self-explanatory number: roughly, how often the batter reaches base) and their slugging percentage (SLG). SLG is the total number of bases a batter earns on hits (one for a single, two for a double, and so on) divided by their number of at-bats. The concept that SLG captures is the hitter&#8217;s overall effectiveness and power<sup> <a href="http://ezinearticles.com/?How-to-Calculate-Slugging-Percentage&amp;id=466045" target="_top">[1]</a></sup>.</p>
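<p>To make these definitions concrete, here is a small Java sketch that applies the standard formulas: OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = total bases / AB. Java is used only because the application discussed below is built on JMX. The BattingLine class and its field names are hypothetical, made up for illustration.</p>

```java
/**
 * Hypothetical helper applying the standard formulas:
 *   OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
 *   SLG = total bases / AB
 *   OPS = OBP + SLG
 */
final class BattingLine {
    final int atBats, hits, doubles, triples, homeRuns, walks, hitByPitch, sacrificeFlies;

    BattingLine(int atBats, int hits, int doubles, int triples, int homeRuns,
                int walks, int hitByPitch, int sacrificeFlies) {
        this.atBats = atBats;
        this.hits = hits;
        this.doubles = doubles;
        this.triples = triples;
        this.homeRuns = homeRuns;
        this.walks = walks;
        this.hitByPitch = hitByPitch;
        this.sacrificeFlies = sacrificeFlies;
    }

    /** Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run. */
    int totalBases() {
        int singles = hits - doubles - triples - homeRuns;
        return singles + 2 * doubles + 3 * triples + 4 * homeRuns;
    }

    double onBasePercentage() {
        int denominator = atBats + walks + hitByPitch + sacrificeFlies;
        return denominator == 0 ? 0.0 : (double) (hits + walks + hitByPitch) / denominator;
    }

    double sluggingPercentage() {
        return atBats == 0 ? 0.0 : (double) totalBases() / atBats;
    }

    /** OPS has no natural unit; it only means something when compared with other players. */
    double ops() {
        return onBasePercentage() + sluggingPercentage();
    }

    public static void main(String[] args) {
        BattingLine line = new BattingLine(500, 150, 30, 5, 25, 60, 5, 5);
        System.out.printf("OBP %.3f  SLG %.3f  OPS %.3f%n",
                line.onBasePercentage(), line.sluggingPercentage(), line.ops());
    }
}
```

<p>Consider a batter with 500 at-bats, 150 hits (30 doubles, 5 triples, 25 home runs), 60 walks, 5 hit-by-pitches and 5 sacrifice flies. For that line, the sketch gives an OBP of about 0.377, an SLG of 0.530 and an OPS of about 0.907. That is a very good season compared with other players, though the number itself has no unit you could point to.</p>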
<p>If you close your eyes and get near to that crevasse where consciousness and lucid dreaming meet, you may start to understand what on earth SLG or OBP meant in the minds of their creators. Without a context, they are meaningless numbers. That said, if you give them either the context of <em>Other Players</em> or the context of <em>Chance We&#8217;ll Win This Game</em>, their value is unmatched in today&#8217;s baseball. An OPS near 1 is fabulous! What does 1 mean? Well&#8230; uh&#8230; it means we win a lot, and that&#8217;s good enough (unless you are my <a href="http://www.mets.com/" target="_top">Mets</a>, who have <a href="http://newyork.mets.mlb.com/stats/sortable_player_stats.jsp?c_id=nym" target="_top">great stats</a> yet constantly <a href="http://newyork.mets.mlb.com/news/wrap.jsp?ymd=20090630&amp;content_id=5622594&amp;vkey=wrapup2005&amp;fext=.jsp&amp;team=away&amp;c_id=nym" target="_top">defy</a> the <a href="http://mlb.mlb.com/mlb/standings/index.jsp?ymd=20080930" target="_top">odds</a>). <em>Sigh,</em> I digress&#8230;</p>
<p>Software, like baseball players, is magnificent, oftentimes mysterious, and occasionally of questionable value for what you paid for it. And to evaluate it, as with baseball players (whose behavior also often seems <a href="http://sigsports.net/forums/index.php?showtopic=27839" target="_top">peculiar and irrational</a>), you must boil your application down to some simple numbers you can digest.</p>
<p>For instance, if you were to compute a <em>health</em> metric to summarize your system&#8217;s state, what might you use? Insights from a recent software sprint I was involved in point to more than the typical CPU, memory, and network usage. While these numbers are valuable, they do not tell you how your system is behaving compared to how it <em>should</em> be behaving. That is to say, if your application uses 90% of the CPU most of the time, then 90% is normal and acceptable for that application. It is not until you establish that baseline (the most common, or modal, level of usage) that you can start reasoning about what <em>high</em> and <em>low</em> look like.</p>
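<p>One way to put &#8220;normal&#8221; into numbers is to judge each reading against the system&#8217;s own history instead of an absolute limit. The Java sketch below is a hypothetical illustration, not the system described here. It swaps in the median of past samples as the baseline because the median resists the occasional spike, and it uses an assumed tolerance band around that baseline.</p>

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: classify a reading as LOW, NORMAL or HIGH relative to the
 * system's own history rather than to an absolute threshold. The median of past
 * samples stands in for the "typical" level; the tolerance is an assumed knob.
 */
final class BaselineClassifier {
    enum Level { LOW, NORMAL, HIGH }

    private final double baseline;
    private final double tolerance; // fraction of the baseline still counted as normal

    BaselineClassifier(double[] history, double tolerance) {
        if (history.length == 0) throw new IllegalArgumentException("history must not be empty");
        double[] sorted = history.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Median: robust to the occasional spike, unlike the mean.
        this.baseline = (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
        this.tolerance = tolerance;
    }

    double baseline() { return baseline; }

    Level classify(double reading) {
        double band = Math.abs(baseline) * tolerance;
        if (reading > baseline + band) return Level.HIGH;
        if (reading < baseline - band) return Level.LOW;
        return Level.NORMAL;
    }

    public static void main(String[] args) {
        // An application that habitually runs near 90% CPU.
        BaselineClassifier cpu = new BaselineClassifier(new double[] {88, 90, 91, 92, 89}, 0.05);
        System.out.println(cpu.classify(93)); // NORMAL: within 5% of the 90% baseline
        System.out.println(cpu.classify(60)); // LOW: unusually quiet for this application
    }
}
```

<p>Suppose the history hovers around 90%. A reading of 93% then counts as normal for this application, while 60% is flagged as unusually low. A fixed &#8220;90% is high&#8221; rule would get both of those backwards.</p>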
<p>To begin instrumenting our application, we first built a RESTful web page that publishes our statistics as XML. We also wired up simple serialization of the values of all the named objects (MBeans) available from the system&#8217;s JMX server. This gives us, essentially for free, the maximum and current values for memory, garbage collection statistics, thread statistics, and a host of other diagnostic information.</p>
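<p>The following Java sketch gives a rough idea of the kind of data JMX hands you for free. It reads a few platform MXBeans from java.lang.management and renders them as a small XML document. It is a simplified, hypothetical stand-in: the post serialized every named object on the JMX server, whereas this reads only heap, thread, and garbage-collector values. The element names are invented, and serving the string over HTTP (from a servlet, for example) is left out.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch: render a few platform MXBean values as a small XML document. */
final class JmxXmlSnapshot {
    static String toXml() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder xml = new StringBuilder("<stats>\n");
        // Heap: bytes in use now and the most the JVM may use (-1 if undefined).
        xml.append("  <heap used=\"").append(heap.getUsed())
           .append("\" max=\"").append(heap.getMax()).append("\"/>\n");
        // Threads: live threads now and the peak since the JVM started (or the peak was reset).
        xml.append("  <threads current=\"").append(threads.getThreadCount())
           .append("\" peak=\"").append(threads.getPeakThreadCount()).append("\"/>\n");
        // One element per collector: number of collections and accumulated time in milliseconds.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            xml.append("  <gc name=\"").append(escape(gc.getName()))
               .append("\" count=\"").append(gc.getCollectionCount())
               .append("\" timeMillis=\"").append(gc.getCollectionTime()).append("\"/>\n");
        }
        return xml.append("</stats>\n").toString();
    }

    /** Minimal escaping so collector names are safe inside an XML attribute. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(toXml());
    }
}
```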
<p>In retrospect, we should have wired our application metrics into JMX as well, and perhaps that will be a future task. For now, they reside in a series of iBatis calls to the database that store simple double-precision values. For each statistic, we keep its <a href="http://en.wikipedia.org/wiki/Moving_average" target="_top">weighted moving average</a> and its minimum, maximum, and current values. We also keep some timestamp information: the interval between updates to that statistic and how often it has been updated.</p>
<p>Of particular interest is that the weighted moving average consists of only two data points: the current average and the new value. Also stored with the statistic is the weight w given to the new value (0 &lt;= w &lt;= 1), so that the new average is w &#215; new value + (1 &#8722; w) &#215; old average. The larger this weight, the faster the average moves toward the new value. For daily job runs this weight should be high; for frequent calls to the database it should be low. This makes the average move quickly for infrequent events and slowly for very frequent ones.</p>
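<p>Below is a minimal in-memory sketch of such a statistic, written in Java with hypothetical names; the iBatis persistence is omitted. The update is the two-point weighted average described above, a form usually called an exponentially weighted moving average. Seeding the average with the first sample is an assumption of this sketch.</p>

```java
/**
 * Hypothetical in-memory version of one stored statistic: current, min and max values,
 * a weighted moving average, the update count, and the interval between the last two updates.
 */
final class Statistic {
    private final double weight; // weight given to each new value, 0 <= weight <= 1
    private double current, min, max, average;
    private long updateCount;
    private long lastUpdateMillis;
    private long lastIntervalMillis = -1; // -1 until there have been two updates

    Statistic(double weight) {
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be between 0 and 1");
        }
        this.weight = weight;
    }

    void update(double value, long nowMillis) {
        if (updateCount == 0) {
            // First sample seeds every field (an assumption; the post does not say).
            min = max = average = value;
        } else {
            min = Math.min(min, value);
            max = Math.max(max, value);
            // Only two data points are needed: the old average and the new value.
            average = weight * value + (1.0 - weight) * average;
            lastIntervalMillis = nowMillis - lastUpdateMillis;
        }
        current = value;
        lastUpdateMillis = nowMillis;
        updateCount++;
    }

    double current() { return current; }
    double min() { return min; }
    double max() { return max; }
    double average() { return average; }
    long updateCount() { return updateCount; }
    long lastIntervalMillis() { return lastIntervalMillis; }

    public static void main(String[] args) {
        Statistic jobSeconds = new Statistic(0.5); // infrequent daily job: high weight
        jobSeconds.update(10, 1_000);
        jobSeconds.update(20, 3_000);
        System.out.println(jobSeconds.average()); // 15.0
    }
}
```

<p>A daily job might use a weight around 0.5, so that one run moves the average noticeably. A statistic updated on every database call might use 0.01 or less. Both values are illustrative, not taken from the original system.</p>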
<p>One added bonus: errors and exceptions, regardless of language, should be treated as data to be handled and, preferably, captured and stored. The statistics engine provides an exception-handling routine to store an exception, though this part of the system needs refinement as we learn more about how we will use this information.</p>
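<p>Treating exceptions as data can be as simple as reducing each one to a timestamp, a type, and a message. The hypothetical Java sketch below does that in memory and counts entries in a trailing window, which is what an &#8220;exceptions in the last day&#8221; figure needs. Storing the entries in the database alongside the other statistics is omitted.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: capture exceptions as plain data (time, type, message) so they
 * can be stored and counted, e.g. for an "exceptions in the last day" figure.
 */
final class ExceptionLog {
    static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000;

    /** One captured exception, reduced to storable fields. */
    static final class Entry {
        final long timestampMillis;
        final String type;
        final String message;

        Entry(long timestampMillis, Throwable t) {
            this.timestampMillis = timestampMillis;
            this.type = t.getClass().getName();
            this.message = String.valueOf(t.getMessage()); // "null" if there is no message
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(Throwable t, long nowMillis) {
        entries.add(new Entry(nowMillis, t));
    }

    /** Counts entries whose timestamps fall in the window (nowMillis - windowMillis, nowMillis]. */
    int countInWindow(long nowMillis, long windowMillis) {
        int count = 0;
        for (Entry e : entries) {
            if (e.timestampMillis > nowMillis - windowMillis && e.timestampMillis <= nowMillis) {
                count++;
            }
        }
        return count;
    }

    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }

    public static void main(String[] args) {
        ExceptionLog log = new ExceptionLog();
        try {
            Integer.parseInt("not a number");
        } catch (NumberFormatException e) {
            log.record(e, System.currentTimeMillis()); // handled, but also kept as data
        }
        System.out.println(log.countInWindow(System.currentTimeMillis(), ONE_DAY_MILLIS));
    }
}
```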
<p>The goal of all this is a dashboard with fields not unlike the following:</p>
<ul type="disc">
<li>exception count for the last day</li>
<li>current memory divided by peak memory</li>
<li>current threads divided by peak threads</li>
<li>batch job timeout count + batch job completion count = total job count</li>
<li>user calls per minute</li>
</ul>
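<p>Once the raw counters exist, most of these fields are simple ratios or sums. Here is a hypothetical Java sketch that derives them. The class name and constructor parameters are assumptions for illustration, as is the choice to report 0 when a peak or the elapsed time is still zero.</p>

```java
/**
 * Hypothetical sketch: derive the dashboard fields listed above from raw counters.
 * Where the counters come from (JMX, the statistics table, an exception log) is left open.
 */
final class HealthDashboard {
    final int exceptionsLastDay;
    final double memoryRatio;        // current memory / peak memory
    final double threadRatio;        // current threads / peak threads
    final long totalJobs;            // timed-out jobs + completed jobs
    final double userCallsPerMinute;

    HealthDashboard(int exceptionsLastDay,
                    long currentMemory, long peakMemory,
                    int currentThreads, int peakThreads,
                    long jobTimeouts, long jobCompletions,
                    long userCalls, double elapsedMinutes) {
        this.exceptionsLastDay = exceptionsLastDay;
        this.memoryRatio = ratio(currentMemory, peakMemory);
        this.threadRatio = ratio(currentThreads, peakThreads);
        this.totalJobs = jobTimeouts + jobCompletions;
        this.userCallsPerMinute = elapsedMinutes > 0 ? userCalls / elapsedMinutes : 0.0;
    }

    /** Returns 0 instead of dividing by zero before any peak has been recorded. */
    private static double ratio(double current, double peak) {
        return peak > 0 ? current / peak : 0.0;
    }

    @Override
    public String toString() {
        return String.format("exceptions/day=%d memory=%.0f%% threads=%.0f%% jobs=%d calls/min=%.1f",
                exceptionsLastDay, memoryRatio * 100, threadRatio * 100, totalJobs, userCallsPerMinute);
    }

    public static void main(String[] args) {
        System.out.println(new HealthDashboard(3, 512, 1024, 40, 50, 2, 98, 600, 10.0));
    }
}
```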
<p>These are all instrumentable values, and at a glance they can tell you some profound things about the general activity of your system: something about throughput, and something about how close you are to your hardware limits.</p>
<p>What is even more profound is that once the application holds this information, the application can &#8220;learn&#8221; and take action when the situation changes. It can throttle user queries to let the database catch up. It can send warning emails. It can stop running jobs. It can signal other clients of the DB or other resources to &#8220;back off use&#8221; for 10 minutes. These are all very simple reactions to resource strains, but impossible to do intelligently without some very basic measurements.</p>
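<p>The reactions listed above can start out as plain threshold rules over the stored statistics. The following is a hypothetical Java sketch. It assumes database latency and its moving average as the inputs, and two made-up multipliers (1.5&#215; and 3&#215; the average) as the strain levels. Actually throttling, emailing, or stopping jobs is left to the caller.</p>

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch: turn one strained resource (database latency compared with its
 * moving average) into the simple reactions described above.
 */
final class ReactionPolicy {
    enum Action { THROTTLE_USER_QUERIES, SEND_WARNING_EMAIL, STOP_JOBS, SIGNAL_BACK_OFF }

    private final double strainFactor;    // e.g. 1.5: latency 50% above its average
    private final double criticalFactor;  // e.g. 3.0: latency three times its average

    ReactionPolicy(double strainFactor, double criticalFactor) {
        if (strainFactor <= 1.0 || criticalFactor < strainFactor) {
            throw new IllegalArgumentException("need 1 < strainFactor <= criticalFactor");
        }
        this.strainFactor = strainFactor;
        this.criticalFactor = criticalFactor;
    }

    Set<Action> decide(double latency, double latencyAverage) {
        Set<Action> actions = EnumSet.noneOf(Action.class);
        if (latencyAverage <= 0) {
            return actions; // no baseline yet, so no basis for reacting intelligently
        }
        double ratio = latency / latencyAverage;
        if (ratio >= strainFactor) {
            // Moderate strain: slow users down so the database can catch up, and tell a human.
            actions.add(Action.THROTTLE_USER_QUERIES);
            actions.add(Action.SEND_WARNING_EMAIL);
        }
        if (ratio >= criticalFactor) {
            // Severe strain: also stop running jobs and ask other clients to back off for a while.
            actions.add(Action.STOP_JOBS);
            actions.add(Action.SIGNAL_BACK_OFF);
        }
        return actions;
    }

    public static void main(String[] args) {
        ReactionPolicy policy = new ReactionPolicy(1.5, 3.0);
        System.out.println(policy.decide(160, 100)); // [THROTTLE_USER_QUERIES, SEND_WARNING_EMAIL]
    }
}
```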
<p><a href="http://www.addtoany.com/add_to/digg?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="Digg" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/digg.png" width="16" height="16" alt="Digg"/></a> <a href="http://www.addtoany.com/add_to/twitter?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="Twitter" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/twitter.png" width="16" height="16" alt="Twitter"/></a> <a href="http://www.addtoany.com/add_to/reddit?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="Reddit" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/reddit.png" width="16" height="16" alt="Reddit"/></a> <a href="http://www.addtoany.com/add_to/delicious?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="Delicious" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/delicious.png" width="16" height="16" alt="Delicious"/></a> <a href="http://www.addtoany.com/add_to/stumbleupon?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="StumbleUpon" rel="nofollow" target="_blank"><img 
src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/stumbleupon.png" width="16" height="16" alt="StumbleUpon"/></a> <a href="http://www.addtoany.com/add_to/facebook?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F07%2Fstats-baseball-and-software-development%2F&amp;linkname=Stats%2C%20Baseball%2C%20and%20Software%20Development" title="Facebook" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/facebook.png" width="16" height="16" alt="Facebook"/></a> <a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save">Share elsewhere...</a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.networkedinsights.com/index.php/2009/07/stats-baseball-and-software-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding Consensus (and Hiding from Disagreement)</title>
		<link>http://blog.networkedinsights.com/index.php/2009/06/finding-consensus-and-hiding-from-disagreement/</link>
		<comments>http://blog.networkedinsights.com/index.php/2009/06/finding-consensus-and-hiding-from-disagreement/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 17:32:08 +0000</pubDate>
		<dc:creator>Sam Baskinger</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[analysis]]></category>

		<guid isPermaLink="false">http://blog.networkedinsights.com/?p=1421</guid>
		<description><![CDATA[I was hoping to have a link to a research paper or report to back up some of what I’m about to claim, but alas, the radio interview that prompted these thoughts aired perhaps two years ago and I can’t seem to search up (Google-up or Bing-up) the right results. I did find a nice [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-3691 alignright" src="http://blog.networkedinsights.com/wp-content/uploads/2009/06/image002.jpg" alt="Thumbs up and one down." width="305" height="221" align="right" />I was hoping to have a link to a research paper or report to backup some of what I’m about to claim, but alas, the radio interview that is prompting these thoughts happened perhaps two years ago and I can’t seem to search-up (google or bing -up) the right results. I did find a nice <a href="http://secretperson.wordpress.com/2008/07/04/social-networking-sites-bad-for-social-networking" target="_blank">blog entry</a> and Wikipedia has a good definition for <a href="http://en.wikipedia.org/wiki/Confirmation_bias" target="_blank">Confirmation Bias</a>. Nevertheless…</p>
<p>People come to the Internet with an established view on a particular topic. The precise topic does not matter; suffice it to say that it is one the Internet user knows something about, would perhaps like to know more about, and has some preconceived notions about. They punch their question into a search engine and get back a few hundred results.</p>
<p>They read the headlines and zero in on the ones that match their preconceptions. They may also click on a few that disagree with what they believe is true, feeling compelled to check those out as well. But the vast array of disagreeing sites is like a hydra: too numerous, and in this case too varied, for our searcher to make much sense of. In the end, they are most likely not persuaded by any of them. This is mostly because the opposing sites all get lumped into one category of “opposing views,” while the single category of “my view” has a definite, clear, coherent representation.</p>
<p>The result? People find what they already know on the Internet. This is how forums become so mono-thought-esque, and this is why people shout down or ignore opposing views that appear in their specialized forums. This is, of course, a generalization, but it describes a prevalent pattern.</p>
<p>I point this out because it is a re-presentation of a fundamental pair of challenges that the Internet faced back when it was just the World Wide Web. Social media offers some ways to alleviate the problem, but not many.</p>
<ol type="1">
<li>How do you find a <em>trusted</em> voice for your point of view? That is, how can you be sure that you have a valid point of view and have found on the Internet someone who represents it with reason and balanced judgment?</li>
<li>Very similar to the first, and indeed necessary to satisfy it: how can you find balanced, well-reasoned arguments that oppose the viewpoint you are searching for? There are simply too many disagreeing views to single out the few that, on reflection, genuinely challenge your perspective.</li>
</ol>
<p>To put it another way, to solve #1 you need #2. To solve #2, currently, you need to read through a <em>lot</em> of information and most people simply do not have time for that. As a result, they tend to form their opinion off-line and <em>validate</em> it online.</p>
<p>The result is online communities that are essentially <a href="http://www.livingstonbuzz.com/2009/01/19/leveraging-idea-markets-while-avoiding-echo-chambers/" target="_blank">echo</a> <a href="http://personaldemocracy.com/node/5776" target="_blank">chambers</a> in which opposition perspectives have a very tough time gaining any sort of traction. Is this a bad thing? Well… not really, so long as you don’t think that your online community is some sort of truth-broker.</p>
<p>Social media offers some methods by which you can more easily find valid voices of agreement and opposition, by creating continuity of relationships. If you can find one person with well-reasoned arguments for or against your position, chances are that they mostly converse with people of similar quality. When they disagree online, they most likely do so at the same level at which they argue for their own position. Thus, if you are trying to find valid opposition, look for those who disagree with your initial social-network contact on the subject.</p>
<p>Spidering along the social web this way, you should be able to find a much smaller set of points of view with higher-quality arguments. The problem then becomes finding just that entry person, something you could perhaps accomplish in the “real” world and take online.</p>
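<p>The spidering idea can be sketched as a depth-limited breadth-first walk over a social graph. The Java below is hypothetical. It assumes you already have two invented inputs: an adjacency list of who engages with whom, and each person's stance on the topic. It starts from one trusted contact and collects nearby people whose stance differs from that contact’s.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: "spider" outward from one trusted contact looking for opposing voices. */
final class OppositionSpider {
    /**
     * @param engagesWith person -> people they converse with (assumed input)
     * @param stance      person -> stance on the topic, e.g. "for" or "against" (assumed input)
     * @param seed        the well-reasoned contact you started from
     * @param maxDepth    how many hops away from the seed to look
     * @return people within maxDepth hops whose stance differs from the seed's, nearest first
     */
    static List<String> findOpposition(Map<String, List<String>> engagesWith,
                                       Map<String, String> stance,
                                       String seed, int maxDepth) {
        String seedStance = stance.get(seed);
        List<String> opposition = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(seed);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String contact : engagesWith.getOrDefault(person, Collections.emptyList())) {
                    if (!visited.add(contact)) {
                        continue; // already reached by a shorter (or equal) path
                    }
                    String contactStance = stance.get(contact);
                    if (contactStance != null && !contactStance.equals(seedStance)) {
                        opposition.add(contact);
                    }
                    next.add(contact);
                }
            }
            frontier = next;
        }
        return opposition;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "alice", List.of("bob", "carol"),
                "bob", List.of("dave"),
                "carol", List.of("erin", "alice"),
                "dave", List.of("frank"));
        Map<String, String> stance = Map.of(
                "alice", "for", "bob", "for", "carol", "against",
                "dave", "against", "erin", "for", "frank", "against");
        System.out.println(findOpposition(graph, stance, "alice", 2)); // [carol, dave]
    }
}
```

<p>Keeping the depth small is the point. People close to a well-reasoned contact tend to argue at a similar level, so the walk stays in that neighborhood instead of crawling the whole web.</p>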
<p>One last thought: Malcolm Gladwell rightly points out in <a href="http://www.amazon.com/Tipping-Point-Little-Things-Difference/dp/0316346624" target="_blank"><span style="text-decoration: underline;">The Tipping Point</span></a> that there are <a href="http://expertvoices.nsdl.org/cornell-info204/2008/04/14/the-tipping-point-three-agents-of-change/" target="_blank">different people who serve different roles</a> in societies. He sorts them into three groups, two of which are Mavens and Connectors (the third being Salesmen). Mavens are the subject-matter experts whose opinions you want; Connectors are social butterflies who like to bring people together. If you can find a subject-matter expert who argues at the level you are looking for, and then identify which of their connections is a Connector type, you will have a fabulous initial sub-network of people to read for opinions.</p>
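<p>A crude way to spot Connector types among a Maven’s contacts is to rank those contacts by how many contacts they have themselves. This hypothetical Java sketch uses that count as an assumed proxy. Gladwell’s Connectors are defined socially, not by a graph metric, so treat it as a starting heuristic only.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rank a Maven's contacts by how connected they are themselves. */
final class ConnectorFinder {
    private static int degree(Map<String, List<String>> contacts, String person) {
        return contacts.getOrDefault(person, Collections.emptyList()).size();
    }

    /** Returns up to topN of the maven's contacts, most-connected first, ties broken alphabetically. */
    static List<String> likelyConnectors(Map<String, List<String>> contacts, String maven, int topN) {
        List<String> candidates = new ArrayList<>(contacts.getOrDefault(maven, Collections.emptyList()));
        candidates.sort((a, b) -> {
            int byDegree = Integer.compare(degree(contacts, b), degree(contacts, a)); // descending
            return byDegree != 0 ? byDegree : a.compareTo(b);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(topN, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> contacts = Map.of(
                "mia", List.of("ann", "ben", "cal"),
                "ann", List.of("w", "x", "y", "z"),
                "ben", List.of("x"),
                "cal", List.of("w", "x", "y", "z"));
        System.out.println(likelyConnectors(contacts, "mia", 2)); // [ann, cal]
    }
}
```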
<p><a href="http://www.addtoany.com/add_to/digg?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="Digg" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/digg.png" width="16" height="16" alt="Digg"/></a> <a href="http://www.addtoany.com/add_to/twitter?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="Twitter" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/twitter.png" width="16" height="16" alt="Twitter"/></a> <a href="http://www.addtoany.com/add_to/reddit?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="Reddit" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/reddit.png" width="16" height="16" alt="Reddit"/></a> <a href="http://www.addtoany.com/add_to/delicious?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="Delicious" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/delicious.png" width="16" height="16" alt="Delicious"/></a> <a href="http://www.addtoany.com/add_to/stumbleupon?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="StumbleUpon" 
rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/stumbleupon.png" width="16" height="16" alt="StumbleUpon"/></a> <a href="http://www.addtoany.com/add_to/facebook?linkurl=http%3A%2F%2Fblog.networkedinsights.com%2Findex.php%2F2009%2F06%2Ffinding-consensus-and-hiding-from-disagreement%2F&amp;linkname=Finding%20Consensus%20%28and%20Hiding%20from%20Disagreement%29" title="Facebook" rel="nofollow" target="_blank"><img src="http://blog.networkedinsights.com/wp-content/plugins/add-to-any/icons/facebook.png" width="16" height="16" alt="Facebook"/></a> <a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save">Share elsewhere...</a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.networkedinsights.com/index.php/2009/06/finding-consensus-and-hiding-from-disagreement/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>