<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Language and Computation</title>
	<atom:link href="http://omlog.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://omlog.wordpress.com</link>
	<description>Academia, Linguistics, Programming, and Personal stuff</description>
	<lastBuildDate>Tue, 16 Aug 2011 14:29:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='omlog.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/917cbe660078eb237d18a172ca969b9c?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Language and Computation</title>
		<link>http://omlog.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://omlog.wordpress.com/osd.xml" title="Language and Computation" />
	<atom:link rel='hub' href='http://omlog.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Using Neotoma to parse PEG in Erlang</title>
		<link>http://omlog.wordpress.com/2011/02/25/using-neotoma-to-parse-peg-in-erlang/</link>
		<comments>http://omlog.wordpress.com/2011/02/25/using-neotoma-to-parse-peg-in-erlang/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 12:36:20 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=181</guid>
		<description><![CDATA[For a project I need some easy and simple way to read structured data from a text file. Initially I considered JSON, and found a JSON parser for Erlang, but then decided that this was just overkill for what I needed. Ideally there would be a better match between the data structures I needed (lists, [...]]]></description>
			<content:encoded><![CDATA[<p>For a project I need some easy and simple way to read structured data from a text file. Initially I considered <a href="http://en.wikipedia.org/wiki/Json">JSON</a>, and found <a href="https://github.com/jchris/erlang-json-eep-parser">a JSON parser for Erlang</a>, but then decided that this was just overkill for what I needed. Ideally there would be a better match between the data structures I needed (lists, atoms, strings) and the file format.</p>
<p>I then decided to use Lisp-like <a href="http://en.wikipedia.org/wiki/S-expression">S-expressions</a>; at least a simplified version thereof. The data I read from the file is basically a list which can contain other lists, strings (which technically are also just lists), and atoms. A while ago I wrote a simple Erlang module to process something similar, but that had made certain assumptions that didn&#8217;t hold anymore, and I felt something more maintainable was required. And what better way to do that than by using a formal grammar to describe the file format and a tool to generate a parser from that?</p>
<p>A simple and straightforward grammar formalism is <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">PEG</a>, Parsing Expression Grammar, and there is already an Erlang parser generator available for it, <a href="https://github.com/seancribbs/neotoma">Neotoma</a> by Sean Cribbs. Installation was easy, and so was writing a grammar:<br />
<code><br />
list &lt;- open elem* close;<br />
elem &lt;- list / atom / sstring / dstring;<br />
atom &lt;- [a-z0-9_]+ space*;<br />
dstring &lt;- &#039;&quot;&#039; [^&quot;]* &#039;&quot;&#039; space*;<br />
sstring &lt;- &quot;&#039;&quot; [^&#039;]* &quot;&#039;&quot; space*;<br />
open &lt;- &#039;(&#039; space* ;<br />
close &lt;- &#039;)&#039; space* ;<br />
space &lt;- &#039; &#039; / &#039;\t&#039; / eol;<br />
eol &lt;- &#039;\r\n&#039; / &#039;\n&#039; / &#039;\r&#039;;</code></p>
<p>A list is something (or nothing) enclosed in parentheses (with optional spaces). An element is a choice of things, atoms consist of lower case letters, digits and underscores (at least one character), and with strings I allow both double and single quotes. I saved this grammar in a file &#8220;terms.peg&#8221;, generated the parser from it and compiled the resulting module:<br />
<code>Eshell V5.7.3  (abort with ^G)<br />
1&gt; neotoma:file("terms.peg").<br />
ok<br />
2&gt; c(terms).<br />
{ok,terms}</code></p>
<p>and you&#8217;re ready to go. I created four short one-line test files, with the following content:</p>
<ol>
<li>(atom)</li>
<li>( &quot;string&quot; )</li>
<li>(foo bar)</li>
<li>(())</li>
</ol>
<p>This is the output:<br />
<code>3&gt; terms:file("test1").<br />
[["(",[]],[["atom",[]]],[")",["\n"]]]<br />
4&gt; terms:file("test2").<br />
[["(",[" "]],[["\"","string","\"",[" "]]],[")",["\n"]]]<br />
5&gt; terms:file("test3").<br />
[["(",[]],[["foo",[" "]],["bar",[]]],[")",["\n"]]]<br />
6&gt; terms:file("test4").<br />
[["(",[]],[[["(",[]],[],[")",[]]]],[")",["\n"]]]</code><br />
Not all that helpful, as there is a lot of noise in there, such as the spaces in &#8220;test2&#8221;, and all the line-breaks. So I need to go back to the AST and extract just those bits from the parse tree that I actually want. In Neotoma you can do this by adding bits of Erlang code to the grammar definition, like so:<code><br />
list &lt;- open elem* close<br />
    `[Open, Elem, Close] = Node, Elem`<br />
;<br />
atom &lt;- [a-z0-9_]+ space*<br />
    `[Atom, Space] = Node, list_to_atom(Atom)`<br />
;<br />
dstring &lt;- &#039;&quot;&#039; [^&quot;]* &#039;&quot;&#039; space*<br />
    `[Quote, Str, Quote, Space] = Node, Str`<br />
;<br />
sstring &lt;- &quot;&#039;&quot; [^&#039;]* &quot;&#039;&quot; space*<br />
    `[Quote, Str, Quote, Space] = Node, Str`<br />
;<br />
</code><br />
(All other rules are unchanged from the grammar listed above.)</p>
<p>What I do here is to split the Node into its component parts, and then discard the bits I don&#8217;t want. In the &#8216;list&#8217; rule I am only interested in the elements, but not in the enclosing brackets, so I just return &#8216;Elem&#8217;. For the &#8216;atom&#8217; I ignore the spaces and convert the matched character sequence into an atom. Now the output looks like this:<br />
<code>7&gt; neotoma:file("terms.peg").<br />
ok<br />
8&gt; c(terms).<br />
{ok,terms}<br />
9&gt; terms:file("test1").<br />
[atom]<br />
10&gt; terms:file("test2").<br />
["string"]<br />
11&gt; terms:file("test3").<br />
[foo,bar]<br />
12&gt; terms:file("test4").<br />
[[]]</code><br />
Much better, and just what I wanted. The &#8216;terms.erl&#8217; file that neotoma generated is 7kb in size, just over 220 lines, and just under 8kb compiled.</p>
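<p>The same clean-up could also be done after parsing, on the raw tree. The following is only a sketch: <tt>clean/1</tt> is a made-up helper, not part of Neotoma, and it assumes that the raw tree has exactly the shape printed in the first session above (nested lists of strings).</p>
<pre>
%% Hypothetical post-processing of the raw parse tree.
%% list node: [Open, Elems, Close]
clean([["(", _], Elems, [")", _]]) -&gt;
    [clean(E) || E &lt;- Elems];
%% string node: [Quote, Chars, Quote, Space]
clean([Quote, Str, Quote, _Space]) -&gt;
    Str;
%% atom node: [Chars, Space]
clean([Chars, _Space]) -&gt;
    list_to_atom(Chars).
</pre>
<p>The three node types can be told apart by their length alone, which is why plain pattern matching is enough here. Applied to the raw tree for test3 this yields <tt>[foo,bar]</tt>. Doing the work inside the grammar keeps everything in one place, though.</p>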
<p>The only issue is speed and memory consumption: on my 8GB MacBook Pro a file of less than 40k runs out of memory and crashes after 30+ seconds. If I take a part off at the end to make it 35k, the parser succeeds, but needs 35 seconds (hand-timed). So I think I will have to revisit my hand-made parser again after all&#8230; :(</p>
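<p>For comparison, a hand-written recursive-descent parser for this simplified format fits into a few dozen lines. This is a minimal sketch and not the module mentioned above: the module name <tt>sexp</tt> and all function names are invented for illustration, it works on a string already read into memory, and it simply crashes on malformed input. No claim is made about its speed.</p>
<pre>
-module(sexp).
-export([parse/1]).

%% Parse a string holding one list, e.g. "(foo 'bar' (baz))".
parse(Input) -&gt;
    [$( | _] = Trimmed = skip(Input),   % the top level must be a list
    {Term, Rest} = elem(Trimmed),
    [] = skip(Rest),                    % only whitespace may follow
    Term.

%% Skip leading blanks, tabs and line breaks.
skip([C | Rest]) when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r -&gt;
    skip(Rest);
skip(Rest) -&gt;
    Rest.

%% An element is a list, a quoted string or an atom.
elem([$( | Rest]) -&gt; list(skip(Rest), []);
elem([Q | Rest]) when Q =:= $"; Q =:= $' -&gt; string(Rest, Q, []);
elem(Input) -&gt; atom(Input, []).

%% Collect elements until the closing bracket.
list([$) | Rest], Acc) -&gt; {lists:reverse(Acc), Rest};
list(Input, Acc) -&gt;
    {Elem, Rest} = elem(Input),
    list(skip(Rest), [Elem | Acc]).

%% Collect characters up to the matching quote.
string([Q | Rest], Q, Acc) -&gt; {lists:reverse(Acc), Rest};
string([C | Rest], Q, Acc) -&gt; string(Rest, Q, [C | Acc]).

%% Atoms consist of a-z, 0-9 and underscore; at least one character.
atom([C | Rest], Acc)
  when C &gt;= $a, C =&lt; $z; C &gt;= $0, C =&lt; $9; C =:= $_ -&gt;
    atom(Rest, [C | Acc]);
atom(Rest, Acc = [_ | _]) -&gt;
    {list_to_atom(lists:reverse(Acc)), Rest}.
</pre>
<p>With this, <tt>sexp:parse("(foo bar)")</tt> returns <tt>[foo,bar]</tt> and <tt>sexp:parse("(())")</tt> returns <tt>[[]]</tt>, matching the output of the generated parser above.</p>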
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2011/02/25/using-neotoma-to-parse-peg-in-erlang/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>The longest Chip in the World</title>
		<link>http://omlog.wordpress.com/2010/12/18/the-longest-chip-in-the-world/</link>
		<comments>http://omlog.wordpress.com/2010/12/18/the-longest-chip-in-the-world/#comments</comments>
		<pubDate>Sat, 18 Dec 2010 11:39:27 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[misc]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=177</guid>
		<description><![CDATA[In his Metamagical Themas: Questing for the Essence of Mind and Pattern, Douglas Hofstadter talks about numerical literacy, the ability to understand large numbers. This is especially important when state budgets are thrown around which deal with billions of pounds or euros. At some point you just lose all feeling for quantities, as they are [...]]]></description>
			<content:encoded><![CDATA[<p>In his <a href="http://www.amazon.co.uk/gp/product/0465045669?ie=UTF8&amp;tag=phrasysnlp-21&amp;linkCode=as2&amp;camp=1634&amp;creative=19450&amp;creativeASIN=0465045669">Metamagical Themas: Questing for the Essence of Mind and Pattern</a>, Douglas Hofstadter talks about numerical literacy, the ability to understand large numbers. This is especially important when state budgets are thrown around which deal with billions of pounds or euros. At some point you just lose all feeling for quantities, as they are all so unimaginably large and abstract. He then goes on to pose some rough calculation questions, such as &#8220;How many cigarettes are smoked in the US every day?&#8221; &#8211; you start with some estimates, and then work out an answer, which might be near the order of magnitude of the right answer (which we of course don&#8217;t know). Quite an interesting and useful mental exercise.</p>
<p>Yesterday we had Fish &amp; Chips for dinner. The girls were comparing the sizes of their chips, and then we came on to the topic of the longest chip in the world. Obviously, with &#8216;proper&#8217; chips this is limited by the size of the source potato. However, industrially produced chips are made of mashed potato formed into chip-shapes, so there is not really any fixed limit on the length. So, thinking of Hofstadter, I asked them how many potatoes we would need to make a chip that spans around the whole world.</p>
<p>Rough assumptions: one potato contains enough matter to produce 10cm worth of chip (the thickness is not specified). So, how many potatoes do we need for one metre of chip? 10. (This is also useful for practising basic primary-school-level maths.) How many for a kilometre? 10,000. How many kilometres do we need to span the world? Roughly 40,000 km. So how many potatoes do we need? 400 million.</p>
<p>The next question is whether there are enough potatoes in the world to do this. Assuming a potato weighs 100g, how much do our 400 million potatoes weigh? 40 million kilogrammes, or 40,000 (metric) tons. What is the world&#8217;s potato production? According to <a href="http://en.wikipedia.org/wiki/Potato">Wikipedia</a>, this is 315 million metric tons, so more than enough. Now, if we were to turn the annual potato crop into one long chip, how many times would it go round the Earth? 315 million divided by 40,000, which is 7,875 times.</p>
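<p>For those who would rather let the computer do the sums, the whole estimate can be written down as a few lines of Erlang. This is just a sketch of the calculation above with the same rough assumptions; the module and function names are made up.</p>
<pre>
-module(chip).
-export([estimate/0]).

%% Returns {Potatoes, Tonnes, TimesRoundTheEarth}.
estimate() -&gt;
    PotatoesPerMetre = 10,        % one potato gives 10 cm of chip
    EarthKm = 40000,              % rough circumference of the Earth
    Potatoes = PotatoesPerMetre * 1000 * EarthKm,
    %% 100 g per potato, 1,000,000 g per metric ton
    Tonnes = Potatoes * 100 div 1000000,
    CropTonnes = 315000000,       % annual world crop (Wikipedia figure)
    {Potatoes, Tonnes, CropTonnes div Tonnes}.
</pre>
<p>Calling <tt>chip:estimate()</tt> returns <tt>{400000000,40000,7875}</tt>, the three numbers worked out by hand above.</p>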
<p>So, with a bit of basic maths (and Wikipedia for the data) you can make maths exciting for kids, practise how to multiply and divide, teach problem-solving, and have fun at the same time. And they also get a feeling for numbers: 400 million &#8211; that&#8217;s how many potatoes you need for a chip to span the Earth.</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/12/18/the-longest-chip-in-the-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>

		<media:content url="http://www.assoc-amazon.co.uk/e/ir?t=phrasysnlp-21&#38;l=as2&#38;o=2&#38;a=0465045669" medium="image" />
	</item>
		<item>
		<title>On the trouble with &#8220;Global Warming&#8221;</title>
		<link>http://omlog.wordpress.com/2010/12/06/on-the-trouble-with-global-warming/</link>
		<comments>http://omlog.wordpress.com/2010/12/06/on-the-trouble-with-global-warming/#comments</comments>
		<pubDate>Mon, 06 Dec 2010 16:12:25 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[linguistics]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=167</guid>
		<description><![CDATA[Global warming is a real danger to life on the planet. As I write this, another extremely cold winter approaches, with snow and ice (and -9 degrees) already starting in late November. Global warming? WTF!? The term &#8220;global warming&#8221; is obviously problematic, for several reasons, two of which I will discuss here: firstly, the climate [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/DB-Werbung_1980er_Jahre_%22Alle_reden_vom_Wetter._Wir_nicht.%22.jpg/220px-DB-Werbung_1980er_Jahre_%22Alle_reden_vom_Wetter._Wir_nicht.%22.jpg" alt="'Everybody is talking about the weather. Not us.' - German Railways poster from 1966 (Source: Wikipedia)" align="left" hspace="10" vspace="10" />Global warming is a real danger to life on the planet. As I write this, another extremely cold winter approaches, with snow and ice (and -9 degrees) already starting in late November. Global warming? WTF!?</p>
<p>The term &#8220;global warming&#8221; is obviously problematic, for several reasons, two of which I will discuss here: firstly, the climate is a complex system, and secondly, the climate is not the weather. Both reasons have links to linguistics, which is my justification to talk about them on this blog.</p>
<p><strong>Climate is a complex system</strong></p>
<p>Chaos theory was discovered by meteorologists running climate models on computers. Small changes in the starting conditions result in vastly different outcomes. That is why nobody can really predict reliably what is going to happen, as there is no clearly visible link from A to B, from the conditions today to the conditions at whatever time in the future. Reducing this to a simple statement such as &#8220;temperatures will increase globally&#8221; is dangerous, as climate is not that simple itself, and you then get people who demolish your argument on the grounds of inaccuracy.</p>
<p>Language can be seen as a complex system as well; it is influenced by so many factors that it is not possible to make any predictions about how language will change. Any statements such as those on the bad influence of text messaging on the English language are clearly not appropriate; broadly general statements of this kind miss the point about the varieties and different language communities that make up the &#8220;English&#8221; language.</p>
<p><strong>The climate is not the weather</strong></p>
<p>This point is somewhat related: &#8216;weather&#8217; is what we&#8217;ve got now, but &#8216;climate&#8217; is a broader, more general tendency. So while we might indeed have a cold winter, if we have a correspondingly hotter summer, the average annual temperature might indeed rise, even if it doesn&#8217;t feel like that as you shiver your way to work in the morning. And this year is a single event, which in context might be an outlier if it becomes really warm next winter. Weather is somewhat unpredictable and chaotic, otherwise the Met Office would be out of work.</p>
<p>Global climate would also mean that it could become colder in Western Europe, while other regions of the Earth heat up, and the people of Tuvalu will have a different view about melting ice caps than American farmers in Arizona.</p>
<p>Michael Halliday compares <em>langue</em> and <em>parole</em> (or <em>competence</em> and <em>performance</em>) with climate and weather: while we can observe one (weather/parole/performance), the other can only be perceived indirectly (climate/langue/competence) through studying the former. But essentially they are different views on the same phenomenon, one short-term and one long-term.</p>
<p><strong>A solution?</strong></p>
<p>Coming to the point of this post, I would suggest abandoning the term &#8220;Global Warming&#8221; in favour of &#8220;Climate Change&#8221;. Change can go in different directions, and so it is harder for climate-change-deniers to win easy points whenever the weather is colder, and it also emphasises the <em>climate</em> as opposed to the <em>weather</em>. This might seem like a simplistic point similar to the political correctness debate, but lexical choices when representing reality in language are really important.</p>
<p>And thus we have moved from complex systems via Halliday&#8217;s view of langue and parole to Critical Discourse Analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/12/06/on-the-trouble-with-global-warming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/DB-Werbung_1980er_Jahre_%22Alle_reden_vom_Wetter._Wir_nicht.%22.jpg/220px-DB-Werbung_1980er_Jahre_%22Alle_reden_vom_Wetter._Wir_nicht.%22.jpg" medium="image">
			<media:title type="html">'Everybody is talking about the weather. Not us.' - German Railways poster from 1966 (Source: Wikipedia)</media:title>
		</media:content>
	</item>
		<item>
		<title>Sentiment Analysis &amp; the English Language</title>
		<link>http://omlog.wordpress.com/2010/11/12/sentiment-analysis-the-english-language/</link>
		<comments>http://omlog.wordpress.com/2010/11/12/sentiment-analysis-the-english-language/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 09:53:30 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[Sentiment Analysis]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=150</guid>
		<description><![CDATA[Currently I am working on Sentiment Analysis, so I will probably post a series of smaller posts on issues that I come across. Today I was looking at YouGov&#8217;s website and public opinions of Nick Clegg, the social anthropologist currently serving as deputy prime minister. Here is a snapshot of the current opinions at the [...]]]></description>
			<content:encoded><![CDATA[<p>Currently I am working on <a href="http://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment Analysis</a>, so I will probably post a series of smaller posts on issues that I come across.  Today I was looking at <a href="http://today.yougov.co.uk/">YouGov&#8217;s website</a> and public opinions of <a href="http://en.wikipedia.org/wiki/Nick_Clegg">Nick Clegg</a>, the social anthropologist currently serving as deputy prime minister.  Here is a snapshot of the current opinions at the time, and you can see that most of them are classed as negative:</p>
<p><a href="http://omlog.files.wordpress.com/2010/10/clegg-opinions.png"><img src="http://omlog.files.wordpress.com/2010/10/clegg-opinions.png?w=295&#038;h=300" alt="" title="clegg-opinions" width="295" height="300" class="alignleft size-medium wp-image-157" /></a></p>
<p>Most? I would say all&#8230; but apparently whatever system YouGov use for sentiment analysis cannot cope with idioms. And shooting yourself in the foot is not exactly a tricky one to identify, I should think.</p>
<p>But this raises a more complex issue: there are many ways to express opinions, attitudes, judgements, etc. in language. This is a much larger problem than counting the number of &#8216;positive&#8217; and &#8216;negative&#8217; words in a text. To begin with, words in isolation rarely have a meaning; opinions are usually subjective; and then there&#8217;s irony and sarcasm.</p>
<p>Yes, Clegg did really well when he supported the Tories on tuition fees&#8230;</p>
<p>Continuing on this theme, here&#8217;s another issue: the assumption that the scope of a sentiment is the whole text. Here&#8217;s an opinion (positive) from the same site about student protests:</p>
<blockquote><p>I completely support the right to protest; however, violence is unreasonable.</p></blockquote>
<p>This is somewhat positive, supporting the students, but in the second clause there is an additional judgment condemning the violent incidents that happened at the demonstration. This seems to suggest that the proper carrier of attitude should be the clause, rather than the sentence, let alone the text. Not everything is just black and white.</p>
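<p>To make the difference concrete, here is a toy sketch of word counting over a whole text versus per clause. Everything in it is an assumption made for illustration: the module name, the four-word lexicon and the idea of splitting clauses at semicolons. It says nothing about how YouGov&#8217;s system actually works.</p>
<pre>
-module(clause_sentiment).
-export([score/1, by_clause/1]).

%% Toy lexicon, invented for this example only.
weight("support") -&gt; 1;
weight("violence") -&gt; -1;
weight("unreasonable") -&gt; -1;
weight(_) -&gt; 0.

%% Naive approach: add up the word weights over the whole text.
score(Text) -&gt;
    Words = string:tokens(string:to_lower(Text), " ,.;"),
    lists:sum([weight(W) || W &lt;- Words]).

%% Score each clause separately, splitting at semicolons.
by_clause(Text) -&gt;
    [score(Clause) || Clause &lt;- string:tokens(Text, ";")].
</pre>
<p>For the opinion quoted above, <tt>score/1</tt> gives -1, so the text as a whole would be filed as negative, whereas <tt>by_clause/1</tt> gives <tt>[1,-2]</tt>: one supportive clause and one condemning clause. Idioms, irony and sarcasm are of course still out of reach for anything this simple.</p>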
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/11/12/sentiment-analysis-the-english-language/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>

		<media:content url="http://omlog.files.wordpress.com/2010/10/clegg-opinions.png?w=295" medium="image">
			<media:title type="html">clegg-opinions</media:title>
		</media:content>
	</item>
		<item>
		<title>Update/Correction to &#8220;Elegant IR with Erlang&#8221;</title>
		<link>http://omlog.wordpress.com/2010/10/14/updatecorrection-to-elegant-ir-with-erlang/</link>
		<comments>http://omlog.wordpress.com/2010/10/14/updatecorrection-to-elegant-ir-with-erlang/#comments</comments>
		<pubDate>Thu, 14 Oct 2010 21:41:15 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[algorithm]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=153</guid>
		<description><![CDATA[When I tried to actually use my implementation of tf-idf that I described in the previous post, I realised that it&#8217;s not quite what I wanted: as it is, I get a different tf-idf value for each token and each document. So with a collection of 1000 documents I get 1000 dictionaries containing the tokens [...]]]></description>
			<content:encoded><![CDATA[<p>When I tried to actually use my implementation of tf-idf that I described <a href="http://omlog.wordpress.com/2010/10/11/elegant-ir-with-erlang/">in the previous post</a>, I realised that it&#8217;s not quite what I wanted: as it is, I get a different tf-idf value for each token and each document. So with a collection of 1000 documents I get 1000 dictionaries containing the tokens in each text. However, what I really want is ONE dictionary with all the tokens in, and ONE tf-idf value for each token.</p>
<p>Merging the values is tricky, as it involves relative frequencies, so I needed to make some subtle changes. First, the <tt>term_freq/1</tt> function now deals with <em>absolute</em> frequencies, and returns a tuple containing the frequency values and the document size in tokens, so that the relative frequencies can easily be computed if required:</p>
<pre>
%% Returns {Dict, Size}: the absolute token frequencies and the
%% document size in tokens.
term_freq(Text) -&gt;
    term_freq(Text, 0, dict:new()).

term_freq([], Sum, Dict) -&gt;
    {Dict, Sum};

term_freq([Token|Rest], Sum, Dict) -&gt;
    term_freq(Rest, Sum+1,
           dict:update_counter(Token,1,Dict)).
</pre>
<p>Not much has changed: only the terminating clause of <tt>term_freq/3</tt> has dropped its <tt>dict:map</tt> call, which computed the relative values, and instead returns the tuple with the frequency dictionary and the document size.</p>
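<p>Should the relative frequency of a token be needed, it can be recovered from the tuple. A small sketch; <tt>rel_freq/2</tt> is a hypothetical helper that is not part of the code from the previous post:</p>
<pre>
%% Relative frequency of Token, given the tuple returned by term_freq/1.
%% Tokens that do not occur in the document get 0.0.
rel_freq(Token, {Dict, Sum}) -&gt;
    case dict:find(Token, Dict) of
        {ok, Count} -&gt; Count / Sum;
        error -&gt; 0.0
    end.
</pre>
<p>For example, <tt>rel_freq("rose", term_freq(["a", "rose", "is", "a", "rose"]))</tt> evaluates to 0.4, as "rose" accounts for two of the five tokens.</p>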
<p>This also requires a minor change in the <tt>inv_doc_freq/3</tt> function, where we need to deal with the tuple and extract the dictionary from it in the second and final clause:</p>
<p><strong>old</strong></p>
<pre>
inv_doc_freq([Doc|Rest], DocNum, Dict) -&gt;
</pre>
<p><strong>new</strong></p>
<pre>
inv_doc_freq([{Doc, _Sum}|Rest], DocNum, Dict) -&gt;
</pre>
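<p>For readers who do not have the previous post to hand, a complete function with this clause head might look like the sketch below. Only the clause head is taken from this post; the body is an assumption (document counts per token, then the natural logarithm of the number of documents divided by that count) and may differ in detail from the original.</p>
<pre>
inv_doc_freq(Docs) -&gt;
    inv_doc_freq(Docs, 0, dict:new()).

%% All documents seen: turn the document counts into idf values.
inv_doc_freq([], DocNum, Dict) -&gt;
    dict:map(
        fun(_Key, Count) -&gt; math:log(DocNum / Count) end,
        Dict);

%% Count each token once per document it occurs in.
inv_doc_freq([{Doc, _Sum}|Rest], DocNum, Dict) -&gt;
    inv_doc_freq(Rest, DocNum + 1,
        dict:fold(
            fun(Key, _Value, AccIn) -&gt;
                dict:update_counter(Key, 1, AccIn)
            end,
            Dict,
            Doc)).
</pre>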
<p>The biggest change, however, is in the combined <tt>tf_idf/1</tt> function, as the algorithm has somewhat changed. Originally the function was a full screen in the editor, but I have extracted two functions to make it easier to follow; the gain in clarity will surely outweigh the minute performance penalty&#8230;</p>
<pre>
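tf_idf(Docs) -&gt;
    Idfs = inv_doc_freq(Docs),
    %% size of the whole collection in tokens
    DocLen = total_doc_size(Docs),
    %% absolute frequency of each token over all documents
    DocTotalFreqs = total_token_freqs(Docs),
    dict:map(
        fun(Key, Value) -&gt;
            %% idf times the relative frequency in the collection
            dict:fetch(Key, Idfs) * Value / DocLen
        end,
        DocTotalFreqs).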
</pre>
<p>I need to calculate the overall size (in tokens) of the full document collection, and then add up the token frequency over all documents. These have been factored out into separate functions. Then all that is left is a map over all tokens to calculate the tf-idf value from the relative frequency in the document collection multiplied by the idf value as calculated earlier.</p>
<p>Computing the total document size is trivial: we loop over the list of term frequency dictionaries and this time extract the lengths, ignoring the actual dictionaries:</p>
<pre>
total_doc_size(Docs) -&gt;
    lists:foldl(
        fun({_Doc, DocSum}, Total) -&gt; Total + DocSum end,
        0,
        Docs).
</pre>
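<p>The same sum could also be written with a list comprehension and <tt>lists:sum/1</tt>; this is only an alternative sketch (the module name <tt>doc_size_sketch</tt> is made up for illustration), and it behaves the same as the fold above:</p>
<pre>
-module(doc_size_sketch).
-export([total_doc_size/1]).

%% Docs is a list of {Dict, Sum} tuples as returned by term_freq/1.
%% We pick out the sizes and add them up, ignoring the dictionaries.
total_doc_size(Docs) -&gt;
    lists:sum([DocSum || {_Doc, DocSum} &lt;- Docs]).
</pre>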
<p>And finally, that leaves computing the total frequencies of all tokens.</p>
<pre>
total_token_freqs(Docs) -&gt;
    lists:foldl(
        fun({Doc, _Sum}, Current) -&gt;
            dict:fold(
                fun(Key, Value, AccIn) -&gt;
                    dict:update_counter(Key,Value,AccIn)
                    end,
                Current,
                Doc)
            end,
        dict:new(),
        Docs).
</pre>
<p>Here we process the document list (as there are likely to be fewer documents than tokens) and fold each dictionary, adding the tokens with their respective frequencies to our accumulator dictionary.</p>
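<p>To see how the revised pieces fit together, here is a sketch that collects the functions from this post into one module and runs them on a tiny collection. The module name <tt>tfidf_corrected</tt>, the <tt>demo/0</tt> function and the two toy documents are assumptions made purely for illustration:</p>
<pre>
-module(tfidf_corrected).
-export([term_freq/1, tf_idf/1, demo/0]).

%% Absolute frequencies plus document size, as in the revised version.
term_freq(Text) -&gt;
    term_freq(Text, 0, dict:new()).

term_freq([], Sum, Dict) -&gt;
    {Dict, Sum};
term_freq([Token|Rest], Sum, Dict) -&gt;
    term_freq(Rest, Sum+1, dict:update_counter(Token, 1, Dict)).

inv_doc_freq(Docs) -&gt;
    inv_doc_freq(Docs, 0, dict:new()).

inv_doc_freq([], DocNum, Dict) -&gt;
    dict:map(fun(_Key, Value) -&gt; math:log10(DocNum/Value) end, Dict);
inv_doc_freq([{Doc, _Sum}|Rest], DocNum, Dict) -&gt;
    inv_doc_freq(Rest, DocNum+1,
        dict:fold(
            fun(Key, _Value, AccIn) -&gt; dict:update_counter(Key, 1, AccIn) end,
            Dict,
            Doc)).

total_doc_size(Docs) -&gt;
    lists:foldl(fun({_Doc, DocSum}, Total) -&gt; Total + DocSum end, 0, Docs).

total_token_freqs(Docs) -&gt;
    lists:foldl(
        fun({Doc, _Sum}, Current) -&gt;
            dict:fold(
                fun(Key, Value, AccIn) -&gt; dict:update_counter(Key, Value, AccIn) end,
                Current,
                Doc)
        end,
        dict:new(),
        Docs).

tf_idf(Docs) -&gt;
    Idfs = inv_doc_freq(Docs),
    DocLen = total_doc_size(Docs),
    DocTotalFreqs = total_token_freqs(Docs),
    dict:map(
        fun(Key, Value) -&gt; dict:fetch(Key, Idfs) * Value / DocLen end,
        DocTotalFreqs).

%% Hypothetical two-document collection, for illustration only.
demo() -&gt;
    Docs = [term_freq(["the", "cat"]), term_freq(["the", "dog"])],
    Weights = tf_idf(Docs),
    {dict:fetch("the", Weights), dict:fetch("cat", Weights)}.
</pre>
<p>In this toy collection <em>the</em> occurs in both documents, so its idf is log10(2/2) = 0 and its weight is zero; <em>cat</em> occurs in one of two documents and once among four tokens, so its weight is log10(2) / 4, roughly 0.075.</p>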
<p>Apologies for this correction; but sometimes you only really realise that a particular interpretation of an algorithm is not the right one when you actually need to use it. The curse of developing libraries without proper specification of the requirements&#8230; </p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/10/14/updatecorrection-to-elegant-ir-with-erlang/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>Elegant IR with Erlang</title>
		<link>http://omlog.wordpress.com/2010/10/11/elegant-ir-with-erlang/</link>
		<comments>http://omlog.wordpress.com/2010/10/11/elegant-ir-with-erlang/#comments</comments>
		<pubDate>Mon, 11 Oct 2010 17:39:01 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=138</guid>
		<description><![CDATA[I am currently working on a project that requires processing documents. As part of that I wanted to use term weighting as used in information retrieval (IR); the individual texts I&#8217;m working with are of course of different lengths and contain different sets of words, and I didn&#8217;t want that to mess things up as [...]]]></description>
			<content:encoded><![CDATA[<p>I am currently working on a project that requires processing documents. As part of that I wanted to use term weighting as used in information retrieval (IR); the individual texts I&#8217;m working with are of course of different lengths and contain different sets of words, and I didn&#8217;t want that to mess things up as it did when I initially worked with raw token frequencies only.</p>
<p>What I actually wanted is <a href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">tf-idf</a>, the product of term frequency (tf) and inverse document frequency (idf); essentially you see how often a word/term/token occurs in a text, and multiply that by a measure of how &#8216;bursty&#8217; it is. The idea is that common words (<em>the</em>, <em>of</em>, <em>and</em> etc) occur in pretty much every document and are thus useless for categorisation of the content. In a way it is a more sophisticated approach to using a stop word list. Sophisticated because you don&#8217;t have to create such a list, and it is also not binary include/exclude, but assigns each token a continuous weight depending on its distribution.</p>
<p><strong>Term Frequency</strong></p>
<p>This is simply the relative frequency of occurrence, the number of times a token occurs in the text divided by the text length. As input I assume that the text has already been tokenised and is represented as a list of tokens. The output should be a dictionary (ie a set of key/value tuples) with each token as a key and its <em>tf</em> as the value:</p>
<pre>
term_freq(Text) -&gt;
    term_freq(Text, 0, dict:new()).

term_freq([], Sum, Dict) -&gt;
    dict:map(
        fun(_Key, Value) -&gt; Value / Sum end,
        Dict);

term_freq([Token|Rest], Sum, Dict) -&gt;
    term_freq(Rest, Sum+1,
        dict:update_counter(Token,1,Dict)).
</pre>
<p>In case another token is available, I simply update its frequency by one, add one to the text size, and re-run the function on the rest of the text. If no more tokens are left, then I map the dictionary (which at this point contains absolute frequencies) to another dictionary by way of dividing each value by the text size; this new dictionary is then returned.</p>
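<p>A quick usage sketch, assuming the function above is placed in a module (the module name <tt>tf_sketch</tt>, the <tt>demo/0</tt> function and the sample tokens are only there for illustration):</p>
<pre>
-module(tf_sketch).
-export([term_freq/1, demo/0]).

term_freq(Text) -&gt;
    term_freq(Text, 0, dict:new()).

term_freq([], Sum, Dict) -&gt;
    %% End of text: turn absolute counts into relative frequencies.
    dict:map(fun(_Key, Value) -&gt; Value / Sum end, Dict);
term_freq([Token|Rest], Sum, Dict) -&gt;
    %% Count the token and move on to the rest of the text.
    term_freq(Rest, Sum+1, dict:update_counter(Token, 1, Dict)).

%% Four tokens, one of which occurs twice.
demo() -&gt;
    lists:sort(dict:to_list(term_freq(["a", "b", "a", "c"]))).
</pre>
<p>Here <tt>demo()</tt> should give <tt>[{"a",0.5},{"b",0.25},{"c",0.25}]</tt>: <em>a</em> accounts for two of the four tokens, the other two for one each.</p>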
<p><strong>Inverse Document Frequency</strong></p>
<p>For the <em>idf</em> I count how many documents each token occurs in, and divide the total number of documents by that number; so the rarer the token, the larger the resulting value. A token such as <em>the</em>, which occurs in every document, gives a ratio of exactly 1.0. To make it a bit more complicated we then take the logarithm (base 10) of that ratio, so such a token ends up with an <em>idf</em> of zero, and the final value is always greater than or equal to zero.</p>
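<p>The arithmetic for a single token can be sketched on its own; the module <tt>idf_sketch</tt> and the function <tt>idf/2</tt> below are illustrative names and not part of the code that follows:</p>
<pre>
-module(idf_sketch).
-export([idf/2]).

%% DocNum:   number of documents in the collection.
%% DocCount: number of documents that contain the token (at least 1).
idf(DocNum, DocCount) when DocCount &gt; 0, DocCount =&lt; DocNum -&gt;
    math:log10(DocNum / DocCount).
</pre>
<p>With a collection of 10 documents, a token found in all of them gets <tt>idf(10, 10)</tt>, which is 0.0; a token found in only one document gets <tt>idf(10, 1)</tt>, which is 1.0; and a token found in half of them gets <tt>idf(10, 5)</tt>, roughly 0.301.</p>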
<p>This time the input is a list of dictionaries, one for each document. The dictionary representing each document is the output of our <tt>term_freq/1</tt> function, ie the keys are the tokens, and the values the term frequencies. We don&#8217;t really care about the frequencies here, as they all will be greater than zero &#8211; a word that does not occur in a text will not be a key in the respective dictionary. As output we will have a single dictionary of all tokens that occur in our document collection, with the values being the <em>idf</em> of each token.</p>
<pre>
inv_doc_freq(Docs) -&gt;
    inv_doc_freq(Docs, 0, dict:new()).

inv_doc_freq([], DocNum, Dict) -&gt;
    dict:map(
        fun(_Key, Value) -&gt; math:log10(DocNum/Value) end,
        Dict);

inv_doc_freq([Doc|Rest], DocNum, Dict) -&gt;
    inv_doc_freq(Rest, DocNum+1,
        dict:fold(
            fun(Key, _Value, AccIn) -&gt;
               dict:update_counter(Key,1,AccIn) end,
            Dict,
            Doc)
    ).
</pre>
<p>Again we iterate over all elements of our input list (ie the documents), and this time we iterate over all tokens of the document using a <tt>dict:fold/3</tt> function, by adding 1 to the count for each token of the current document that we have already encountered, or entering it with a frequency of 1 if we haven&#8217;t yet.  We also increment the document count by 1. This time the <tt>dict:map/2</tt> function performs the calculation for the <em>idf</em> value as soon as we have reached the end of our document list.</p>
<p><strong>tf-idf</strong></p>
<p>At this stage we have a dictionary for each document containing the term frequencies, and a dictionary for the whole document collection containing the inverse document frequencies for all the tokens. Combining the two we then get the value for the <em>tf-idf</em>, which is different for each document (so the output is a list of dictionaries, one per document).</p>
<p>To make things easier, the call to compute the <em>idf</em> is integrated into the <tt>tf_idf/1</tt> function, so the input is the same as for the <tt>inv_doc_freq/1</tt> function, a list of term frequency dictionaries:</p>
<pre>tf_idf(Docs) -&gt;
    Idfs = inv_doc_freq(Docs),
    lists:map(
        fun(TFs) -&gt; dict:map(
            fun(Key,Value) -&gt; Value *
                dict:fetch(Key, Idfs) end,
            TFs) end,
        Docs).
</pre>
<p>Here we map the list of term frequency dictionaries (<tt>Docs</tt>) to a list of dictionaries containing the <em>tf-idf</em> values. For this mapping we map each (document) term frequency dictionary to the respective (document) <em>tf-idf</em> dictionary by multiplying each token&#8217;s term frequency by its <em>idf</em> value as computed by <tt>inv_doc_freq/1</tt>.</p>
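<p>Putting the three functions from this post into one module gives something that can be tried out directly. This is only a sketch: the module name <tt>tfidf_original</tt>, the <tt>demo/0</tt> function and the two toy documents are assumptions for the sake of the example.</p>
<pre>
-module(tfidf_original).
-export([term_freq/1, inv_doc_freq/1, tf_idf/1, demo/0]).

term_freq(Text) -&gt;
    term_freq(Text, 0, dict:new()).

term_freq([], Sum, Dict) -&gt;
    dict:map(fun(_Key, Value) -&gt; Value / Sum end, Dict);
term_freq([Token|Rest], Sum, Dict) -&gt;
    term_freq(Rest, Sum+1, dict:update_counter(Token, 1, Dict)).

inv_doc_freq(Docs) -&gt;
    inv_doc_freq(Docs, 0, dict:new()).

inv_doc_freq([], DocNum, Dict) -&gt;
    dict:map(fun(_Key, Value) -&gt; math:log10(DocNum/Value) end, Dict);
inv_doc_freq([Doc|Rest], DocNum, Dict) -&gt;
    inv_doc_freq(Rest, DocNum+1,
        dict:fold(
            fun(Key, _Value, AccIn) -&gt; dict:update_counter(Key, 1, AccIn) end,
            Dict,
            Doc)).

tf_idf(Docs) -&gt;
    Idfs = inv_doc_freq(Docs),
    lists:map(
        fun(TFs) -&gt;
            dict:map(
                fun(Key, Value) -&gt; Value * dict:fetch(Key, Idfs) end,
                TFs)
        end,
        Docs).

%% Hypothetical two-document collection; returns one sorted
%% {Token, Weight} list per document so the result is easy to read.
demo() -&gt;
    Docs = [term_freq(["the", "cat"]), term_freq(["the", "dog"])],
    [lists:sort(dict:to_list(D)) || D &lt;- tf_idf(Docs)].
</pre>
<p>In both toy documents <em>the</em> ends up with a weight of zero, because it occurs in every document, while <em>cat</em> and <em>dog</em> each get 0.5 * log10(2), roughly 0.15.</p>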
<p><strong>Summary</strong></p>
<p>Calculating a set of values from texts is very concise with Erlang. In languages like C or Java one would have to code various (nested) loops, but this can easily be accomplished by using the <tt>map</tt> and <tt>fold</tt> functions that operate on lists and dictionaries in Erlang. It does need a bit of mental acrobatics, but if you are familiar with Prolog, then the basic structure of an Erlang program is not too difficult to follow. It&#8217;s those nested mappings that sometimes can be a little confusing.</p>
<p>The beauty of Erlang, of course, is that each <tt>map</tt> can be done in parallel; if you have a large list of documents and a processor with several cores then it is not hard to make use of its full power by simply using a parallel map function. To do this in other languages where nested loops are used in place of the map function is not trivial.</p>
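<p>A parallel map of that kind can be sketched in a few lines. This is a hand-rolled illustration (the module name <tt>pmap_sketch</tt> and the function <tt>pmap/2</tt> are my own names), not production code: it spawns one process per list element and would wait forever if the mapped function crashed.</p>
<pre>
-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of List, one process per element,
%% and collect the results in the original order.
pmap(F, List) -&gt;
    Parent = self(),
    Refs = [begin
                %% A unique reference ties each result to its element.
                Ref = make_ref(),
                spawn(fun() -&gt; Parent ! {Ref, F(X)} end),
                Ref
            end || X &lt;- List],
    %% Receive the results in the order the references were created.
    [receive {Ref, Result} -&gt; Result end || Ref &lt;- Refs].
</pre>
<p>For instance, <tt>pmap_sketch:pmap(fun(X) -&gt; X * X end, [1, 2, 3])</tt> returns <tt>[1,4,9]</tt>; in the same way the term frequencies of a list of tokenised texts could be computed by passing <tt>fun term_freq/1</tt> as the first argument.</p>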
<p>So Erlang is not only very concise, but it can also be future-proof by allowing easy concurrency.</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/10/11/elegant-ir-with-erlang/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>On Planning and Reality</title>
		<link>http://omlog.wordpress.com/2010/06/03/on-planning-and-reality/</link>
		<comments>http://omlog.wordpress.com/2010/06/03/on-planning-and-reality/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 21:08:12 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[objective-c]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=131</guid>
		<description><![CDATA[When I got my iPhone a little more than a year ago, and started developing programs for it, I had a clear idea what my first program was going to be. However, as always, things turn out quite different from how you think they are going to be&#8230; First, it did take me a bit [...]]]></description>
			<content:encoded><![CDATA[<p>When I got my iPhone a little more than a year ago, and started developing programs for it, I had a clear idea what my first program was going to be. However, as always, things turn out quite different from how you think they are going to be&#8230;</p>
<p>First, it did take me a bit to get used to Objective-C. Not because it is very different from Java (I used to program in C after all before Java came along), but because all the classes in the Cocoa framework need to be learned. There are subtle differences between those and their Java cousins, and after a bit more experience I believe that the Cocoa classes are actually more powerful and easier to use than their Java counterparts.</p>
<p>Some teething troubles, lack of automatic memory management on the iPhone, and a surfeit of square brackets meant further delays. Finally I had a program written, but it needed more work on the graphics side, artwork and so on. The stuff that really makes a difference, but is very time-consuming and hard if you&#8217;re not used to using graphics software. So the easier way out was to write a different program, which is lighter on the artwork.</p>
<p>This then was a todo-list program, which is also suitable for planning small projects. I wanted a program like that, but didn&#8217;t want to fork out the money for <a href="http://culturedcode.com/things/iphone/">Things</a>, which also looked a bit like overkill. On the life hack blog I read an article by <a href="http://dustinwax.com/">Dustin Wax</a> <a href="http://www.lifehack.org/articles/productivity/getting-ready-for-2010-my-moleskine-setup.html">on his moleskine setup</a>, and that seemed like something usable, which I then went about implementing as an iPhone app. With a bit of help from a friend with the icon design, and thanks to freely available sound files and icons, <a href="http://phrasys.net/apps/planner/">ePlanner</a> was born.</p>
<p>In ePlanner I tried out Core Data, which is really a lot easier than messing about with SQLite directly. It uses both tabs and navigation views, and a lot of tables. I found it rather tedious in that all the classes were almost identical, but only almost, not 100%, and it&#8217;s hard to see how that could be changed. The behaviour of those classes is ever so slightly different.</p>
<p>The submission procedure was very easy, thanks to <a href="http://www.idev101.com/code/Distribution/">a description I found on the web</a>. My app did get rejected, due to a crash on a 3GS; but I don&#8217;t have a 3GS, so I could only test it on a 3G and an iPod touch. Thanks to Instruments I could track down the error, which was of course a memory management issue, but one without consequences on the machines I could test it on. After that was changed, the app went through, and has indeed been bought by people all over the world.</p>
<p>It is really a nice feeling to think that someone in Argentina is using my app, as is someone in Hong Kong, some people in the US, Sweden, etc. I used some free Google advertising at the beginning, but that is really expensive, though when I stopped it, sales began to trail off. But that could also have been an effect of it slipping out of the &#8216;newly released&#8217; slots.</p>
<p>It is indeed not too hard to come up with a program that does sell. The overall process is manageable, though there were some frustrating moments battling with the various code signing and certificate requirements that Apple imposes.</p>
<p>I since have bought an iPad, and am thinking of porting ePlanner to this; however, I&#8217;ll give it a while so that I get used to how the iPad works. Knowing your way round the platform makes it a lot easier to develop good software, and I am not yet sure how the UI design for the small iPhone screen can best be translated to the iPad&#8217;s larger display. But it will come, and I will describe the process on this blog&#8230;! </p>
<p>In the meantime, I will re-visit some of my previous program ideas, as it is really not hard to turn them into something that will end up in the App Store, and it is really satisfying to do so.</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/06/03/on-planning-and-reality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>Single user vs Multi user: how will the iPad work?</title>
		<link>http://omlog.wordpress.com/2010/03/17/single-user-vs-multi-user-how-will-the-ipad-work/</link>
		<comments>http://omlog.wordpress.com/2010/03/17/single-user-vs-multi-user-how-will-the-ipad-work/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 15:19:56 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[iphone]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=128</guid>
		<description><![CDATA[Note: this post is somewhat speculative, as I obviously do not have an iPad (yet!). It is simply an observation that got me thinking about how it is going to be used, and how that possible usage will influence the user experience. I suspect that in our house the iPad will be a multi-user gadget. [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Note</strong>: this post is somewhat speculative, as I obviously do not have an iPad (yet!). It is simply an observation that got me thinking about how it is going to be used, and how that possible usage will influence the user experience.</p>
<p>I suspect that in our house the iPad will be a multi-user gadget.</p>
<p>I have an iPhone, my wife has got an iPod touch, and the kids currently use a clapped-out ancient Sony Vaio laptop whose disk is about to fail. Everybody has their own device to do things on. However, this is likely going to change when we acquire an iPad. I envisage this as being a general device that just lies on the sitting room table, to be picked up by whoever wants to use it for reading their email, checking something on Wikipedia, playing a quick game, adding an entry to a calendar, looking up a phone number, and so on.</p>
<p>When I check mail on my iPhone, it is set up to look at my email accounts. Similarly, the calendar is sync&#8217;d with my general calendar, and the address book is too. I don&#8217;t (and cannot) easily switch between identities (though it is possible to do so with mail and calendar). Some games that I play on my iPhone store their state in case of interruption, and I can resume them later. The same applies to other applications, which usually have one set of data they work with.</p>
<p>This is OK for a phone.  Typically you have a phone, and you&#8217;re the only one using it, otherwise it&#8217;d lose some of its usefulness if you don&#8217;t know who you will reach when calling a particular number. But if the iPad is shared between people, how can I avoid reading my wife&#8217;s email, or swamping her address book with all my student email addresses (which Google puts into it automatically)? If I play a game, and then have to stop and go back to it later, what if one of my daughters wants to have a go in-between? She might not only end the game I&#8217;m playing, but also mess up my high-score records. My todo-list application only has one set of todos, so what if I look at it and suddenly find &#8220;Feed my puffle on Club Penguin&#8221; on top of my priorities?</p>
<p>On a Unix system you have different user accounts, and you log in and out; this avoids the problem on Mac OS X. But logging in and out is tricky if a system is shared. If I forget to log out when I interrupt my task, and the device is locked, nobody else can use it until I have come back. And how can this be made <em>easy</em>, without constantly having to remember a user name and password?</p>
<p>I somehow suspect that this will be an issue for which there is no satisfactory solution. But if the iPad is to become a general household item like a television or a radio, then there needs to be some non-intrusive way that allows easy sharing. I&#8217;m looking forward to finding out what Apple came up with here&#8230;!</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2010/03/17/single-user-vs-multi-user-how-will-the-ipad-work/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>Go &#8211; Went &#8211; Gone</title>
		<link>http://omlog.wordpress.com/2009/12/30/go-went-gone/</link>
		<comments>http://omlog.wordpress.com/2009/12/30/go-went-gone/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 15:48:47 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=125</guid>
		<description><![CDATA[I did play around with the unhelpfully named &#8216;go&#8217; programming language, another output of the don&#8217;t-be-evil company. Trying to find any web resources for it is pretty much impossible, for one thing because it was too new, and then because of the name. I would have expected something more search-friendly from the number 1 web [...]]]></description>
			<content:encoded><![CDATA[<p>I did play around with the unhelpfully named &#8216;go&#8217; programming language, another output of the don&#8217;t-be-evil company. Trying to find any web resources for it is pretty much impossible, for one thing because it was too new, and then because of the name.  I would have expected something more search-friendly from the number 1 web search engine!</p>
<p>There were a few things I liked about go.  It&#8217;s smallish, C-like, has garbage collection, built-in support for concurrency, and Unicode strings.  Hash-tables (&#8216;maps&#8217;) as a first-class data type. A nice-looking set of libraries for all sorts of purposes.  Not quite fast, but with lots of scope for performance improvements. No header files. First-class support for unit tests.</p>
<p>This was looking attractive as opposed to Erlang, which is older and more mature/stable, but still not very high-performance, has slightly awkward string handling, and builds almost everything from just three data types (list, tuple, atom).  And a Prolog-style syntax with a number of inconveniences about the use of commas, semicolons, and full stops.  Editing a clause is never straightforward.</p>
<p>I have since abandoned go again. It also has inconsistencies (the use of &#8216;new&#8217; for some data types and &#8216;make&#8217; for others), and worst of all, there was so much talk about wanting to add generics to the language that I fear they will become a feature of it.  I don&#8217;t like generics: they seem to me to be more trouble than they&#8217;re worth.  They make code really hard to read, and inflexible.  They might make some kinds of bugs impossible, but in my view that is a feeble gain for wrecking a language.  As Knuth (I think) said, part of writing programs is aesthetics.  I cannot like Java code full of abstract type annotations. Objective-C is so much cleaner in comparison.  And so was go, until now.</p>
<p>Another reason is the concurrency support. Go uses channels (a kind of typed pipe) for that, which seems awkward.  I much prefer Erlang&#8217;s mailboxes, which neatly work together with pattern matching to respond to certain messages and ignore others.  You do not need to worry about the order in which messages arrive as much, and the whole communication process is a lot easier with only the basic data types.</p>
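<p>A small sketch of what I mean by mailboxes and pattern matching (the module name <tt>mailbox_sketch</tt>, the <tt>high</tt> and <tt>low</tt> tags and the message texts are invented for this example): a process sends itself two messages and then picks them out of its mailbox by pattern rather than by arrival order.</p>
<pre>
-module(mailbox_sketch).
-export([demo/0]).

demo() -&gt;
    Self = self(),
    %% The low-priority message arrives first...
    Self ! {low, "tidy up"},
    Self ! {high, "fix the build"},
    %% ...but a selective receive skips it and takes the first
    %% message that matches the {high, _} pattern.
    First = receive {high, Msg1} -&gt; Msg1 end,
    %% The skipped message is still in the mailbox and is picked up now.
    Second = receive {low, Msg2} -&gt; Msg2 end,
    {First, Second}.
</pre>
<p>Calling <tt>mailbox_sketch:demo()</tt> returns <tt>{"fix the build","tidy up"}</tt>, even though the messages were sent in the opposite order.</p>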
<p>So I&#8217;m going back to Erlang. I will dig out the string library that I started, and get back into thinking recursively.  At least I know where I am with it, and it is not suddenly going to change!</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2009/12/30/go-went-gone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>
	</item>
		<item>
		<title>Sentence Disambiguation &#8211; Modality to the Rescue!</title>
		<link>http://omlog.wordpress.com/2009/11/12/sentence-disambiguation-modality-to-the-rescue/</link>
		<comments>http://omlog.wordpress.com/2009/11/12/sentence-disambiguation-modality-to-the-rescue/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 16:56:08 +0000</pubDate>
		<dc:creator>Oliver Mason</dc:creator>
				<category><![CDATA[linguistics]]></category>

		<guid isPermaLink="false">http://omlog.wordpress.com/?p=120</guid>
		<description><![CDATA[I&#8217;m currently reading a new book on iPhone development, iPhone Advanced Projects by Apress. I will probably talk about that book in a later post, but today I will just focus on one sentence I came across on page 212: I also adore the capability that I have to flag articles from folks I follow [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=omlog.wordpress.com&amp;blog=2985839&amp;post=120&amp;subd=omlog&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently reading a new book on iPhone development, <a href="http://www.amazon.co.uk/gp/product/1430224037?ie=UTF8&amp;tag=phrasysnlp-21&amp;linkCode=as2&amp;camp=1634&amp;creative=6738&amp;creativeASIN=1430224037">iPhone Advanced Projects</a> by Apress. I will probably talk about that book in a later post, but today I will just focus on one sentence I came across on page 212:</p>
<p><em>I also adore the capability that I have to flag articles from folks I follow on Twitter and save them to Instapaper.</em></p>
<p>This sentence has (at least) two readings, which are probably only obvious to a linguist (and who else would care?); I highlight the differences by adding commas:</p>
<ol>
<li>I adore the capability, that I have to flag articles&#8230;</li>
<li>I adore the capability that I have, to flag articles&#8230;</li>
</ol>
<p>In the first reading, the capability you adore is that you are obliged to do something (flag articles). Sounds rather odd, doesn&#8217;t it? The second reading is more clear-cut and easier to understand: you can flag articles, and that&#8217;s the capability you have and adore.</p>
<p>So in terms of <a href="https://arts-ccr-002.bham.ac.uk/ccr/patgram/">pattern grammar</a>, you&#8217;re looking at either <strong>N that</strong> or <strong>N <em>to</em>-inf</strong> with <em>capability</em>. If you consult the <a href="http://www.amazon.co.uk/gp/product/1424008255?ie=UTF8&amp;tag=phrasysnlp-21&amp;linkCode=as2&amp;camp=1634&amp;creative=6738&amp;creativeASIN=1424008255">Cobuild Dictionary</a>, you&#8217;ll find that <em>capability</em> only occurs with the second pattern, the <em>to</em>-infinitive, so you can rule out the first reading.</p>
<p>Another possibility would be to look at it in terms of <a href="http://en.wikipedia.org/wiki/English_modal_auxiliary_verb">modality</a>: here we could argue that <em>capability</em> prospects a modality of ability, but <em>have to</em> expresses obligation; the two don&#8217;t go together. Hence the first reading sounds odd, as a capability does not usually force you to do anything, but rather enables you. It could, however, be used to signal sarcasm or irony, as in (the obviously made-up) <em>I really like that my new computer gives me the capability to have to save my work every five minutes.</em> This is clearly an odd sentence, suggesting that modality works along similar lines to <a href="http://en.wikipedia.org/wiki/Discourse_prosody">discourse prosody</a> as described by Louw (1993) [for the full reference follow the previous link].</p>
<p>Here we have discussed two ways of disambiguating a sentence, one based on grammatical properties (or typical environments), and one on a non-syntactic phenomenon (modality). Pattern grammar allows us to identify what the typical usage would be, whereas modality explains why the first reading is at odds with the words involved. Now all we need is a &#8216;pattern grammar&#8217; for modality!</p>
]]></content:encoded>
			<wfw:commentRss>http://omlog.wordpress.com/2009/11/12/sentence-disambiguation-modality-to-the-rescue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b3e5e6b5ecd2707930a109a46c0cfafe?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ojmason</media:title>
		</media:content>

	</item>
	</channel>
</rss>
