<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6483962</id><updated>2012-01-16T15:07:36.214-08:00</updated><category term='stats'/><category term='nlp'/><category term='scala'/><category term='machine learning'/><category term='linguistics'/><category term='hacks'/><category term='hadoop'/><category term='cs'/><category term='coding'/><title type='text'>:dlwh</title><subtitle type='html'>Not an editor command: dlwh.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>11</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6483962.post-510475581136096552</id><published>2009-04-08T19:29:00.001-07:00</published><updated>2009-04-08T20:57:30.052-07:00</updated><title type='text'>Bandwidth</title><content type='html'>Some back of the envelope style calculations:&lt;br /&gt;&lt;br /&gt;You have &lt;a 
href="www.willamette.edu/%7Egorr/classes/cs449/brain.html"&gt;10 billion nerve cells in your brain, with an average of 10,000 synapses per nerve cell.&lt;/a&gt; Each of these fires &lt;a href="http://askville.amazon.com/average-neuron-firing-rate-humans-include-journal-article-reference/AnswerViewer.do?requestId=1368796"&gt;about 100 times a second&lt;/a&gt;, within an order of magnitude. If each firing carries one bit of information, this puts the bandwidth of your brain at around 10 quadrillion bits every second, or about a petabyte per second. &lt;br /&gt;&lt;br /&gt;Compare that to the bandwidth of the internet, which in 2004 was a mere &lt;a href="http://singularity.com/charts/page80.html"&gt;4,200 petabytes per &lt;b&gt;year&lt;/b&gt;&lt;/a&gt;, or just 141 gigabytes/second. That is, in 2004, the amount of data transferred on the entire internet was just 0.0135% of the bandwidth going on in an average person.&lt;br /&gt;&lt;br /&gt;If we're being extremely optimistic and assuming that the internet will double its bandwidth every year, then it will take roughly 13 years for the internet to reach the bandwidth of a single human brain, and another 35-ish years to reach the thinking power of the world's population.&lt;br /&gt;&lt;br /&gt;These calculations aren't entirely fair (not to mention horribly inexact) for a lot of reasons. In particular, there is a lot of error in both directions: the internet isn't going to grow that fast, and the human brain has a lot of redundancy. 
Regardless, fun talking point.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-510475581136096552?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/510475581136096552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=510475581136096552' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/510475581136096552'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/510475581136096552'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2009/04/bandwidth.html' title='Bandwidth'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-534234571658657484</id><published>2009-02-20T13:25:00.001-08:00</published><updated>2009-02-20T13:25:04.178-08:00</updated><title type='text'>Excerpt
</title><content type='html'>... known that it is no coincidence that the sign for integration ∫ matches the contour of the Taijitu's interior, nor that the limits of integration were placed at the seeds of the other, the dual centers toward which both yin and yang are inextricably drawn. Leibniz, ever the sinophile, knew that motion and stasis were linked through the slope of time, that change could be captured in a single stroke...&lt;br /&gt;&lt;br /&gt;Unfortunately, Leibniz ignored the most crucial point, that the whole is often far more than the limit of sums, and often far less...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-534234571658657484?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/534234571658657484/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=534234571658657484' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/534234571658657484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/534234571658657484'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2009/02/excerpt.html' title='Excerpt&#xA;'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-290905582198461293</id><published>2008-11-14T23:57:00.001-08:00</published><updated>2008-11-15T00:06:10.171-08:00</updated><title type='text'>Scala and Bash in the same file</title><content 
type='html'>&lt;span style="font-family:arial;"&gt;...or really any other language with C-like comments:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;// 2&gt;/dev/null; echo '&lt;br /&gt;println("Hello world!");&lt;br /&gt;/*&lt;br /&gt;# Yes this script does two things.&lt;br /&gt;//' &gt; /dev/null&lt;br /&gt;echo "Hello World!"&lt;br /&gt;# */&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:times new roman;"&gt;&lt;span style="font-family:arial;"&gt;How does it work?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;// 2&gt;/dev/null;&lt;/span&gt;&lt;span style="font-family:arial;"&gt; &lt;/span&gt;&lt;span style="font-family:times new roman;"&gt;looks like a comment to Scala, but to Bash it looks like an attempt to execute the root directory (the error message is sent to /dev/null). The 'echo' bit wraps all the Scala code in a single-quoted string, keeping it away from Bash, and then Scala happily ignores the parts between the /* */.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;My first polyglot program!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-290905582198461293?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/290905582198461293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=290905582198461293' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/290905582198461293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/290905582198461293'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/11/scala-and-bash-in-same-file.html' 
title='Scala and Bash in the same file'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-8837989729784545751</id><published>2008-10-27T14:13:00.000-07:00</published><updated>2008-10-27T19:21:31.977-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nlp'/><category scheme='http://www.blogger.com/atom/ns#' term='cs'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Labels are Noisy Features</title><content type='html'>Every label is a simplification of something more complicated.&lt;br /&gt;Every label could be wrong with non-zero probability.&lt;br /&gt;&lt;br /&gt;A principle I've been thinking about is that you just can't trust annotators. Even if interannotator agreement is high, the manual they were trained to use is a poor indicator of their understanding. Obvious examples are in the NetFlix challenge: the same person might use completely different features to rank two different movies as five-stars. One might have a great soundtrack and the other might have some actor or another. Alternatively, in parsing, one might want to distinguish between different kinds of NPs, since the distribution of nouns in a subject NP is different from the distribution of nouns in an object NP.&lt;br /&gt;&lt;br /&gt;A non-trivial amount of the work I've seen here at EMNLP and just from reading in the past couple of months can be cast as dealing with impoverished labels. Broadly, I think the approaches fall into three categories:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Deterministic Splitting. Two reasonable ways of doing this. 
First, along the lines of &lt;a href="nlp.stanford.edu/%7Emanning/papers/unlexicalized-parsing.pdf"&gt;Klein and Manning, 2003&lt;/a&gt;, take your labels (e.g. NP) and add information from other nearby labels (e.g. S) to produce a new label (NP^S). Alternatively, you can instead "lexicalize" your labels by adding information about observed features (e.g. NP becomes NP-dog).&lt;/li&gt;&lt;li&gt;Addition of Latent Variables. This is like the machine learning version of the above. Instead of deterministically renaming features, assume that there is a latent variable that controls what label the human assigns and acts as an intermediary between the label and the feature. In a sense, turn the label into a feature. For example, if Y is your label, X your features, and Z your latent variables, then add a new latent variable B:&lt;br /&gt;&lt;br /&gt;Y -&gt; Z -&gt; X&lt;br /&gt;&lt;br /&gt;becomes something like&lt;br /&gt;&lt;br /&gt;B -&gt; Y&lt;br /&gt;B -&gt; Z&lt;br /&gt;Z -&gt; X&lt;br /&gt;&lt;br /&gt;There's of course a lot more to be discussed here. How big should |Z| be? Is it discrete? What's the interaction between other labels? Some papers that work out (some of) these details are &lt;a href="http://www.cs.berkeley.edu/%7Epetrov/data/emnlp08a.pdf"&gt;Petrov and Klein, 2008&lt;/a&gt;, and &lt;a href="http://www.cs.umass.edu/%7Emccallum/papers/mcl-aaai06.pdf"&gt;McCallum, et al. 2006&lt;/a&gt;. One might argue that any latent variable problem is an example of this phenomenon, but it seems that in general you gain by reserving one latent variable "just" for the additional layer of indirection.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Assume a mixture of latent variables. I'm mostly interested in this method for the multilabel setting. Here, you assume that a number of unseen components lead to the actual labels. 
The one that I find most interesting at the moment is inspired by &lt;a href="http://en.wikipedia.org/wiki/Independent_component_analysis"&gt;ICA&lt;/a&gt;, which assumes that the observed labels are a noisy combination of the "real" labels that actually created the data. (&lt;a href="http://nyc.lti.cs.cmu.edu/yiming/Publications/zgy-nips05.pdf%20"&gt;Zhang, et al. 2005&lt;/a&gt;)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;I don't think that this is especially profound, but it seems that too often people don't bother to try this simple extension.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-8837989729784545751?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/8837989729784545751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=8837989729784545751' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/8837989729784545751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/8837989729784545751'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/10/labels-are-noisy-features.html' title='Labels are Noisy Features'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-317574391015111177</id><published>2008-09-22T10:24:00.000-07:00</published><updated>2008-09-22T10:47:13.790-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hacks'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hadoop'/><category scheme='http://www.blogger.com/atom/ns#' term='coding'/><title type='text'>Look Ma No Jars!</title><content type='html'>Anyone who's used Hadoop knows you have to use jars to package your MapReduces. You don't get a chance to specify a CLASSPATH, and your environment variables aren't respected since Hadoop runs as a different user. This is, to be sure, a good idea, but it can be awfully annoying to figure out exactly what your dependencies are.&lt;br /&gt;&lt;br /&gt;Luckily, jars have Manifests, which let you specify meta-information like Main-Class and version information. And... a Class-Path for other dependencies on jars and directories. And there's the in.&lt;br /&gt;&lt;br /&gt;Let's suppose that you have a hadoop cluster all linked via NFS (or sshfs, or...), and you don't like jars, and you've carefully set up your $CLASSPATH to contain all the classes you'd ever care to use. Then try out this script:&lt;br /&gt;&lt;pre&gt;#!/usr/bin/env python&lt;br /&gt;import os&lt;br /&gt;&lt;br /&gt;# open() does not expand "~", so resolve the home directory explicitly&lt;br /&gt;home = os.path.expanduser("~")&lt;br /&gt;&lt;br /&gt;out = open(os.path.join(home, ".super-manifest.txt"), "w")&lt;br /&gt;&lt;br /&gt;out.write("Class-Path: ")&lt;br /&gt;&lt;br /&gt;# Creates a Jar from a CLASSPATH:&lt;br /&gt;classpath=os.environ["CLASSPATH"]&lt;br /&gt;for x in classpath.split(':'):&lt;br /&gt;  if x != '':&lt;br /&gt;    if not x.endswith(".jar"):&lt;br /&gt;      x = x + "/" # dirs must end with a slash&lt;br /&gt;    out.write(" %s \n"% x)&lt;br /&gt;&lt;br /&gt;out.close()&lt;br /&gt;&lt;br /&gt;os.system("mkdir -p ~/.superlibs/lib")&lt;br /&gt;os.system("jar cmf ~/.super-manifest.txt ~/.superlibs/lib/supererJar.jar")&lt;br /&gt;os.system("jar cf ~/.superlibs/superJar.jar -C ~/.superlibs/ lib/supererJar.jar")&lt;br /&gt;&lt;/pre&gt;Now, in your code, when you set up your job:&lt;br /&gt;&lt;pre&gt;jobConf.setJar("/home/YOURNAME/.superlibs/superJar.jar");&lt;br /&gt;&lt;/pre&gt;And give it a spin. 
Hadoop, with no jars!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-317574391015111177?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/317574391015111177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=317574391015111177' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/317574391015111177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/317574391015111177'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/09/look-ma-no-jars.html' title='Look Ma No Jars!'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-5076433380029430945</id><published>2008-09-04T18:27:00.000-07:00</published><updated>2008-09-04T18:43:11.074-07:00</updated><title type='text'>Evolution, and Empiricism</title><content type='html'>I had a thought last night that I've been toying with all day. It seems obvious that certain kinds of knowledge are completely inaccessible to other animals: language with recursion, "deep" representation of concepts, etc. 
They don't have this, of course, because Nature hasn't selected for them.&lt;br /&gt;&lt;br /&gt;We have (to some extent) both of these traits, but it seems naive to think that we aren't limited in the same way, so we can only assume that there are certain aspects of the universe humanity cannot understand.&lt;br /&gt;&lt;br /&gt;This is not to say that I'm advocating the teaching of intelligent design, or the space unicorn, or whatever, and I'm certainly not advocating transhumanism. But it does put into doubt radical empiricism: that only those things we can observe or reason out exist.  In a sense, if we can't ever know about some certain aspect of the world, then it's indistinguishable from chance, and should not concern us. So from that anthropocentric point of view, radical empiricism is justified.&lt;br /&gt;&lt;br /&gt;But what if we're entitled to incomplete knowledge of something? Then any form of empiricism seems less justified.&lt;br /&gt;&lt;br /&gt;Enough philosophy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-5076433380029430945?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/5076433380029430945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=5076433380029430945' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/5076433380029430945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/5076433380029430945'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/09/evolution-and-empiricism.html' title='Evolution, and Empiricism'/><author><name>David 
Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-6695098444344156509</id><published>2008-09-03T22:48:00.000-07:00</published><updated>2008-09-03T22:51:04.548-07:00</updated><title type='text'>SMR, Hadoop, and Scala</title><content type='html'>I've more or less finished up the port of SMR to Hadoop. See the blogpost linked for information.&lt;br /&gt;&lt;br /&gt;Let me know here (or there) what you'd like to see in future versions of SMR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-6695098444344156509?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://scala-blogs.org/2008/09/scalable-language-and-scalable.html' title='SMR, Hadoop, and Scala'/><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/6695098444344156509/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=6695098444344156509' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/6695098444344156509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/6695098444344156509'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/09/smr.html' title='SMR, Hadoop, and Scala'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-7056460050926154714</id><published>2008-07-14T21:04:00.000-07:00</published><updated>2008-07-14T22:09:29.591-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stats'/><category scheme='http://www.blogger.com/atom/ns#' term='scala'/><title type='text'>Monadic Sampling in Scala</title><content type='html'>(If you don't know what Monads are, see &lt;a href="http://james-iry.blogspot.com/2007/09/monads-are-elephants-part-1.html"&gt;here&lt;/a&gt;. Short answer: they're a design pattern coupled with syntactic sugar for implementing "wrappers" around an arbitrary type T.)&lt;br /&gt;&lt;br /&gt;I was initially quite resistant to monads, probably because I didn't really grok the syntax that Haskell was pushing. Seeing Scala's for-comprehensions, which are just like Haskell's do-notation, I've changed my mind, and in an effort to teach myself Monads, I've decided to write a simple library for doing sampling for Bayesian inference. What follows is inspired in part by &lt;a href="http://www.blogger.com/reports-archive.adm.cs.cmu.edu/anon/2004/CMU-CS-04-173.pdf"&gt;this paper&lt;/a&gt; and &lt;a href="http://www.randomhacks.net/articles/2007/02/21/refactoring-probability-distributions"&gt;this blog series&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For motivation, let's look at the specification of a generative model in standard notation:&lt;br /&gt;&lt;pre&gt;θ ~ Dir(α)&lt;br /&gt;z ~ Mult(θ)&lt;br /&gt;w ~ Mult(β_z)&lt;br /&gt;&lt;/pre&gt; Our goal is to make a syntax that looks a lot like that for generating new data. 
We'll end up with something that looks like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;for( theta &lt;- Dir(alpha);   &lt;br /&gt;  z &lt;- Mult(theta);   &lt;br /&gt;  w &lt;- Mult(beta_z))&lt;br /&gt;    yield w;&lt;br /&gt;&lt;/pre&gt; which isn't so bad.&lt;br /&gt;&lt;br /&gt;Let's start with a simple monad in the Scala tradition:&lt;br /&gt;&lt;pre&gt;trait Rand[T] {                                                            &lt;br /&gt; def get() : T&lt;br /&gt;&lt;br /&gt; def draw() = get()&lt;br /&gt;                                                                   &lt;br /&gt; def sample(n : Int) = List.tabulate(n,x =&gt; draw());                         &lt;br /&gt;                                                                  &lt;br /&gt; def flatMap[E](f : T =&gt; Rand[E] ) : Rand[E]&lt;br /&gt;                                                                   &lt;br /&gt; def map[E](f : T=&gt;E) : Rand[E]&lt;br /&gt;                                                                 &lt;br /&gt; def filter(p: T=&gt;Boolean) = condition(p);                                &lt;br /&gt;&lt;br /&gt; def condition(p : T =&gt; Boolean) : Rand[T];&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;Thus far, nothing terribly special. Aside from normal monadic stuff and aliases, we have draw and get, which intuitively should give us a sample from the random distribution. 
sample is just an easy way to get many samples at once.&lt;br /&gt;&lt;br /&gt;A default implementation for most of these methods is fairly straightforward:&lt;br /&gt;&lt;pre&gt;trait Rand[T] {&lt;br /&gt; def get() : T // still undefined&lt;br /&gt;&lt;br /&gt; def draw() = get();&lt;br /&gt;&lt;br /&gt; def sample(n : Int) = List.tabulate(n,x =&gt; get);&lt;br /&gt;&lt;br /&gt; def flatMap[E](f : T =&gt; Rand[E]) =  {&lt;br /&gt;   def x = f(get());&lt;br /&gt;   new Rand[E] {&lt;br /&gt;     def get = x.get;&lt;br /&gt;   }&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; def map[E](f : T=&gt;E) =  {&lt;br /&gt;   def x = f(get());&lt;br /&gt;   new Rand[E] {&lt;br /&gt;     def get = x;&lt;br /&gt;   }&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; def filter(p: T=&gt;Boolean) = condition(p);&lt;br /&gt;&lt;br /&gt; // Not the most efficient implementation ever, but meh.&lt;br /&gt; def condition(p : T =&gt; Boolean) = new Rand[T] {&lt;br /&gt;   // Rand.this.get is the outer distribution; a bare get here would call itself.&lt;br /&gt;   def get() : T = {&lt;br /&gt;     var x = Rand.this.get();&lt;br /&gt;     while(!p(x)) {&lt;br /&gt;       x = Rand.this.get();&lt;br /&gt;     }&lt;br /&gt;     x&lt;br /&gt;   }&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The only difference from a normal "container" monad like Option is that x is a def rather than a val, so get() is re-evaluated every time the Rand returned by map or flatMap is sampled, to ensure that we're being random.&lt;br /&gt;&lt;br /&gt;Now for some basic random number generators. These live in object Rand, but that's omitted here for something akin to clarity. 
First, we have the analogue of Haskell's return, for completeness.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; def always[T](t : T) = new Rand[T] {&lt;br /&gt;   def get = t;&lt;br /&gt; }&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Straightforward enough. Now for our building block:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; val uniform = new Rand[Double] {&lt;br /&gt;   private val r = new Random;&lt;br /&gt;   def get = r.nextDouble;&lt;br /&gt; }&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Which lets us do things like:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;scala&gt; Rand.uniform.sample(10)&lt;br /&gt;res6: List[Double] = List(0.8940037286915604, 0.34021110772450114,&lt;br /&gt;0.2045633072974703, 0.44871569906073616, 0.47697121133477594,&lt;br /&gt;0.8410830818576492, 0.6738322287017577, 0.16060602963773707,&lt;br /&gt;0.602623326916021, 0.34327615862458416)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;and:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;scala&gt; val twice = for( x &lt;- Rand.uniform) yield x * 2; &lt;br /&gt;twice: java.lang.Object with scalanlp.stats.Rand[Double] =  &lt;br /&gt;scalanlp.stats.Rand$$anon$3@5c772046  &lt;br /&gt;&lt;br /&gt;scala&gt; twice.sample(10)&lt;br /&gt;res7: List[Double] = List(1.8602320334301579, 0.0872446976570771,&lt;br /&gt;0.032309170483379335, 1.9753336995209254, 1.220452839716684,&lt;br /&gt; 1.0214181828533413, 1.41457180527561, 1.6988361279393165,&lt;br /&gt; 1.460110077486223, 0.6762038442765996)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;So we already have some use of do-notation. And if that's not convincing for you, consider sampling two correlated Gaussian variables. 
First, we need a univariate Gaussian:&lt;pre&gt;&lt;br /&gt; val gaussian = new Rand[Double] {&lt;br /&gt;   private val r = new java.util.Random;&lt;br /&gt;   def get = r.nextGaussian;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; // m is the mean and s is the standard deviation (s^2 is the variance)&lt;br /&gt; def gaussian(m : Double, s : Double) = new Rand[Double] {&lt;br /&gt;   def get = m + s * gaussian.get&lt;br /&gt; }&lt;br /&gt;&lt;/pre&gt;Then sampling two independent Gaussians is straightforward:&lt;pre&gt;&lt;br /&gt;scala&gt; val biGauss = for(x &lt;- Rand.gaussian; y &lt;- Rand.gaussian) yield (x,y)&lt;br /&gt;biGauss: java.lang.Object with scalanlp.stats.Rand[(Double, Double)] = &lt;br /&gt;scalanlp.stats.Rand$$anon$2@caa6635&lt;br /&gt;&lt;br /&gt;scala&gt; biGauss.sample(3)&lt;br /&gt;res9: List[(Double, Double)] = List((-0.9601505823179303,-0.1480670696609196), &lt;br /&gt;(0.02594332256575975,0.02401831998712138),&lt;br /&gt; (1.4885591927916324,1.1998923591137476))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now suppose we want to draw two correlated Gaussians. That is, Gaussians where knowing one tells you something about the other. Drawing on &lt;a href="http://www.ds.unifi.it/VL/VL_EN/special/special7.html"&gt;this article&lt;/a&gt;, suppose that z1 and z2 are independent Gaussians drawn i.i.d. from the standard normal distribution, and that we are interested in sampling normals (x,y) with means mu1 and mu2 and standard deviations s1 and s2, with correlation r. 
Then, we can draw x and y by computing:&lt;pre&gt;&lt;br /&gt;x = mu1 + s1 * z1;&lt;br /&gt;y = mu2 + s2 * (r * z1 + (1 - r&lt;sup&gt;2&lt;/sup&gt;)&lt;sup&gt;1/2&lt;/sup&gt; * z2)&lt;br /&gt;&lt;/pre&gt;With that, we can easily write a generator in Scala:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;scala&gt; def corrGauss(mu1 : Double, mu2: Double, s1: Double, s2 : Double, r: Double) =&lt;br /&gt;     |   for( (z1,z2) &lt;- biGauss;&lt;br /&gt;     |     x = mu1 + s1 * z1;&lt;br /&gt;     |     y = mu2 + s2 * (r * z1 + Math.sqrt(1 - r * r) * z2))&lt;br /&gt;     |   yield (x,y);&lt;br /&gt;corrGauss: (Double,Double,Double,Double,Double)java.lang.Object with scalanlp.stats.Rand[(Double, Double)]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;And that's it! Next time, I'll write about Multinomials, Dirichlets, and generating more data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-7056460050926154714?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/7056460050926154714/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=7056460050926154714' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/7056460050926154714'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/7056460050926154714'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/07/monadic-sampling-in-scala.html' title='Monadic Sampling in Scala'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-3348152528909774124</id><published>2008-05-18T16:50:00.000-07:00</published><updated>2008-05-18T16:51:08.674-07:00</updated><title type='text'>Dynamic Code Reloading in the Scala Interpreter</title><content type='html'>A week ago I went to the &lt;a href="http://liftweb.net/index.php/Scala_lift_off"&gt;Scala Liftoff Unconference&lt;/a&gt;, which was a pretty fun time. At some point I'll write about a conversation I had with &lt;a href="http://biosimilarity.blogspot.com/"&gt;Greg Meredith&lt;/a&gt; about Bayes and category theory, but only when I understand it all better myself.&lt;br /&gt;&lt;br /&gt;The Platinum Sponsor, &lt;a href="http://www.blogger.com/www.zeroturnaround.com"&gt;ZeroTurnaround&lt;/a&gt;, gave everyone there a free copy of &lt;a href="http://www.zeroturnaround.com/javarebel/"&gt;JavaRebel&lt;/a&gt;, which automatically reloads classes on the JVM whenever you change them. No need to restart your java instance. Pretty sweet.&lt;br /&gt;&lt;br /&gt;This got me to thinking: one of the things I love about Scala is the interpreter. In particular, one of the greatest advantages of the interpreter is ":load", which just compiles and executes your scala file and lets you play with the results. Unfortunately, the interpreter doesn't work so well when your code base gets over a certain size. But JavaRebel fixes all that: just recompile your class files in your editor, and JavaRebel will autoreload them! 
Magic!&lt;br /&gt;&lt;br /&gt;Example:&lt;br /&gt;&lt;br /&gt;foo.scala:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;class Foo {&lt;br /&gt;def bar() = 3;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Then, compile it, and open up a modified jrscala interpreter (more on this below):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;##########################################################&lt;br /&gt;&lt;br /&gt;ZeroTurnaround JavaRebel 1.1&lt;br /&gt;(c) Copyright Webmedia, Ltd, 2007. All rights reserved.&lt;br /&gt;&lt;br /&gt;This product is licensed to David Hall&lt;br /&gt;&lt;br /&gt;##########################################################&lt;br /&gt;&lt;br /&gt;Welcome to Scala version 2.7.1.final (Java HotSpot(TM) Client VM, Java 1.5.0_13).&lt;br /&gt;Type in expressions to have them evaluated.&lt;br /&gt;Type :help for more information.&lt;br /&gt;&lt;br /&gt;scala&gt; val f = new Foo&lt;br /&gt;f: Foo = Foo@950feb&lt;br /&gt;&lt;br /&gt;scala&gt; f.bar&lt;br /&gt;res0: Int = 3&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Now, open back up foo.scala, and change it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;class Foo {&lt;br /&gt;def bar() = 10;&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Recompile, and go back to your interpreter:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;scala&gt; f.bar&lt;br /&gt;f.bar&lt;br /&gt;JavaRebel: Reloading class 'Foo'.&lt;br /&gt;res1: Int = 10&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Magic! Ok, so how do I do this? 
Well, I just made a copy of the scala script (called jrscala) and added two options to the very last line, namely&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;"-noverify -javaagent:/path/to/your/javarebel.jar"&lt;br /&gt;&lt;/pre&gt; So it looks like:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;${JAVACMD:=java} $JAVA_OPTS -cp "$TOOL_CLASSPATH" -noverify \&lt;br /&gt;-javaagent:/path/to/your/javarebel.jar -Dscala.home="$SCALA_HOME" \&lt;br /&gt;-Denv.classpath="$CLASSPATH" -Denv.emacs="$EMACS"  scala.tools.nsc.MainGenericRunner  "$@"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And you're done. Obviously I should be more principled/less hacky about it, but it's pretty slick.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-3348152528909774124?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/3348152528909774124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=3348152528909774124' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/3348152528909774124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/3348152528909774124'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2008/05/dynamic-code-reloading-in-scala.html' title='Dynamic Code Reloading in the Scala Interpreter'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-6157015387409154166</id><published>2007-11-16T13:26:00.000-08:00</published><updated>2007-11-16T13:45:29.220-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linguistics'/><title type='text'>Baker's Paradox Revoked</title><content type='html'>Recently there's been some talk here about Baker's Learnability Paradox, particularly with regard to the &lt;a href="http://links.jstor.org/sici?sici=0097-8507%28198906%2965%3A2%3C203%3ATLAAOT%3E2.0.CO%3B2-G"&gt;Dative Alternation&lt;/a&gt;. The paradox is, essentially, that you know the following:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;John gave a book to Sue.&lt;br /&gt;John gave Sue a book.&lt;/li&gt;&lt;li&gt;John donated a book to the library.&lt;br /&gt;*John donated the library a book.&lt;/li&gt;&lt;/ol&gt;(That asterisk means "unacceptable," by the way. If you think it's good, I'd love to hear from you.) However, no one told you that "donate" should behave any differently from "give," yet you know it does. Moreover, there are relatively new words ("text" for SMS) that do undergo the alternation without issue, so it isn't just that you learned only a few verbs undergo dative alternation.&lt;br /&gt;&lt;br /&gt;So what's going on? Linguists have spent a lot of time working on this issue, and in some sense this issue is indicative of pretty much all linguistic research: why is it that you can say certain things, and can't say others?&lt;br /&gt;&lt;br /&gt;However, in this instance, and in a lot of other instances, I'm not sure there's really a "why" answer here. Instead, I'm coming to believe that we just have heard "donate" many times, almost never in the double object construction, and so we just think we can't say it, because we haven't heard it. 
It's just like how, on the one hand, we think an arbitrary coin is unbiased, but we also know the sun is going to rise tomorrow: we've never seen it not rise, and we've never seen anything telling us the coin is biased. And sure, maybe there's some physics behind the scenes that can prove us right, but I'd wager that we can learn arbitrary exceptions (like "donate") without having to worry about Newton's or Kepler's laws.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-6157015387409154166?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/6157015387409154166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=6157015387409154166' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/6157015387409154166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/6157015387409154166'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2007/11/bakers-paradox-revoked.html' title='Baker&apos;s Paradox Revoked'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6483962.post-5477841522840620488</id><published>2007-08-19T20:20:00.000-07:00</published><updated>2007-11-16T13:59:03.212-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='coding'/><title type='text'>Command line arguments considered global</title><content type='html'>&lt;span style="font-family:georgia;"&gt;I'm not gonna bother with 
introductions, yet. I'll see if I like this, and then I'll introduce myself.&lt;br /&gt;&lt;br /&gt;On a software project I've recently become involved with, the others involved have this strange love of command line flags. They argue (rightly) that it speeds up their research cycle since they don't have to edit any files or compile before they start another run. This is research code, so keeping the experiment cycle to a minimum is their foremost concern. Alright, that makes sense.&lt;br /&gt;&lt;br /&gt;However, there's a larger problem: the unfettered use of command line arguments (and, more importantly, a global registry for these flags) gives all the same problems as global variables, and we all know how evil global variables are. Don't believe me? Here's a look at what some code looks like:&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;pre&gt;class Foo {&lt;br /&gt;private BarFile barfile;&lt;br /&gt;public Foo() {&lt;br /&gt;  barfile = new BarFile(Args.getCLA("barfile"));&lt;br /&gt;}&lt;br /&gt;}&lt;/pre&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:georgia;"&gt;Ick. Ick. Ick. If you use these, don't write code like that, please. I'll love you forever. Try to allow people some kind of flexibility, or at least... document what flags you use. Prefer something like this:&lt;br /&gt;&lt;/span&gt;&lt;pre&gt;class Foo {&lt;br /&gt;private BarFile barfile;&lt;br /&gt;public Foo(){this(Args.getCLA("barfile"));}&lt;br /&gt;public Foo(String pathToBarFile) {&lt;br /&gt;  barfile = new BarFile(pathToBarFile);&lt;br /&gt;}&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;That's not too bad, in fact that's downright reasonable. And you're probably saying, "that's no&lt;/span&gt;&lt;span style="font-family:georgia;"&gt;t  so bad. Just refactor!" Sure. But what if the ctor in question actually makes 3 member function calls before using the CLA? 
That's no fun to refactor, and it's confusing as hell to figure out where the configuration is coming from. Don't do this, or ninjas will kill kittens or Dijkstra or something.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6483962-5477841522840620488?l=dlwh.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dlwh.blogspot.com/feeds/5477841522840620488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6483962&amp;postID=5477841522840620488' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/5477841522840620488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6483962/posts/default/5477841522840620488'/><link rel='alternate' type='text/html' href='http://dlwh.blogspot.com/2007/08/command-line-arguments-considered.html' title='Command line arguments considered global'/><author><name>David Hall</name><uri>http://www.blogger.com/profile/17870365888858866253</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
