<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anterotesis &#187; digital humanities</title>
	<atom:link href="http://anterotesis.com/wordpress/tag/digital-humanities/feed/" rel="self" type="application/rss+xml" />
	<link>http://anterotesis.com/wordpress</link>
	<description>Answering one question with another</description>
	<lastBuildDate>Fri, 18 May 2012 07:51:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Mapping Petersburg</title>
		<link>http://anterotesis.com/wordpress/2011/04/mapping-petersburg/</link>
		<comments>http://anterotesis.com/wordpress/2011/04/mapping-petersburg/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 10:34:32 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[My Projects]]></category>
		<category><![CDATA[dostoevsky]]></category>
		<category><![CDATA[literature]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[petersburg]]></category>
		<category><![CDATA[russia]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=387</guid>
		<description><![CDATA[After months of work, Mapping Petersburg is now live! Built in collaboration with Dr Sarah J. Young, it is a pilot for a much larger project taking in two centuries of the Petersburg text. The aim is not only to &#8230; <a href="http://anterotesis.com/wordpress/2011/04/mapping-petersburg/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After months of work, <a title="Mapping Petersburg" href="http://www.mappingpetersburg.org" target="_blank">Mapping Petersburg</a> is now live! Built in collaboration with <a title="Dr Sarah J. Young's blog" href="http://sarahjyoung.com/site/" target="_blank">Dr Sarah J. Young</a>, it is a pilot for a much larger project taking in two centuries of the Petersburg text. The aim is not only to investigate the actual writings, but also to see what tools and techniques are applicable to &#8216;literary cartography&#8217;, and to theorize just what it means to read a book in such a fashion. This test case focuses on Dostoevsky&#8217;s <em>Crime and Punishment</em>, plotting the places, events and characters of that novel upon a backdrop currently provided courtesy of Google via <a title="Mapstraction" href="http://www.mapstraction.com/" target="_blank">Mapstraction</a>.</p>
<p>Building the site has been an intense and rewarding experience, especially as the deadline drew closer, and one that requires mulling over. In the meantime, to go with <a title="Dr Sarah J Young on Mapping Petersburg" href="http://sarahjyoung.com/site/2011/04/06/mapping-petersburg/" target="_blank">Dr Young&#8217;s first thoughts</a>, here are eight things I learned from it:</p>
<p>1: Data sets are hard. It&#8217;s painstaking work generating data, especially from an unstructured, subjective text like <em>Crime and Punishment</em>.</p>
<p>2: Get into the source. The first map took ages to make, hand-coded as it was. But being close up to the code taught me a lot.</p>
<p>3: A little code goes a long way. The first script to automate data-plotting took ages to write. But once it was done, I was able to generate a map in a few minutes.</p>
<p>4: We need research, theory, design. There are many possibilities when making maps, and even something seemingly simple, like icons, requires a lot of thought.</p>
<p>5: We need documentation. There were a number of promising tools that had to be put aside, because without documentation they were little more than black boxes. No, source code isn&#8217;t enough. And similarly, it behoves the ethical webmaker to describe how they constructed their site.</p>
<p>6: Geo-rectification is complex. We had wanted to use maps contemporary to Dostoevsky, but ran into all sorts of difficulties. Tools like <a title="Mapwarper" href="http://warper.geothings.net/" target="_blank">Mapwarper</a> are great, but without understanding them, and the mathematics behind them, I was unable to surmount the problems we faced.</p>
<p>7: Maps are to be read. They are not transparent depictions of place. One can just as much read the map through the book as the book through the map.</p>
<p>8: The Digital Humanities is all about <a title="Stephen Ramsay &quot;On Building&quot;" href="http://lenz.unl.edu/wordpress/?p=340" target="_blank">building things</a>. The experience of doing is irreplaceable and inexhaustible.</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2011/04/mapping-petersburg/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Digital Humanities GIS projects</title>
		<link>http://anterotesis.com/wordpress/2011/03/digital-humanities-gis-projects/</link>
		<comments>http://anterotesis.com/wordpress/2011/03/digital-humanities-gis-projects/#comments</comments>
		<pubDate>Wed, 16 Mar 2011 22:49:40 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[gis]]></category>
		<category><![CDATA[maps]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=354</guid>
		<description><![CDATA[Being involved in a number of projects with a spatial dimension, I&#8217;ve been teaching myself digital cartography for over a year. The code, however, is only half the story. Maps are not transparent depictions of reality, there are many problems, &#8230; <a href="http://anterotesis.com/wordpress/2011/03/digital-humanities-gis-projects/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Being involved in a number of projects with a spatial dimension, I&#8217;ve been teaching myself digital cartography for over a year. The code, however, is only half the story. Maps are not transparent depictions of reality; there are many problems, conceptual and technical, with combining older mapping technologies with modern cartography; and let&#8217;s not even get started on the problems of usability (a map on a computer screen is as difficult to manipulate as a fold-out map or an A-Z book).</p>
<p>One part of answering these questions is simply looking at what others are doing. So I&#8217;ve begun to <a title="List of DH GIS projects" href="http://anterotesis.com/wordpress/dh-gis-projects/">compile a list</a> of Digital Humanities projects where GIS (Geographical Information Systems) has a leading part. Aside from my own bookmarks, I&#8217;ve drawn on two similar lists: that at <a title="Historical GIS Research Network" href="http://www.hgis.org.uk/resources.htm" target="_blank">Historical GIS Research Network</a> and the <a title="AAG Historical GIS Clearing House" href="http://www.aag.org/cs/projects_and_programs/historical_gis_clearinghouse/hgis_projects_programs" target="_blank">AAG Historical GIS Clearing House</a>. It is a list of <em>academic</em> projects: although there are many excellent extra-mural mapping projects I specifically wanted to see how the digital and the humanities are combining in the university. It is also heavily weighted towards history and literary studies, as those are what I am involved in and know about. Please tell me of any other projects through the comments.</p>
<p>I&#8217;ve used GIS in a rather loose way, taking in what has been termed &#8216;neogeography&#8217; and &#8216;webmapping.&#8217; A couple of the projects I&#8217;ve listed don&#8217;t even aim to produce maps, but gazetteers of old place names, and utilize text processing technologies rather than anything that could be considered GIS. Part of this exercise is to see how space and place are being analysed, and what technologies are being used to do so; GIS seemed a useful catch-all term. I hope the purists will forgive me.</p>
<p>This list takes a snapshot of the state of the &#8216;spatial turn&#8217; in (some of) the (digital) humanities up to early 2011. The technologies used fall into four types: flash animations, Google Maps, server-side delivery and old-style downloadable shapefiles. The focus is frequently based on geographical units &#8211; cities, regions, countries, continents &#8211; and less often on particular subjects. Surprisingly, there&#8217;s only one project on the <a title="Holocaust Geographies at Stanford" href="http://www.stanford.edu/group/spatialhistory/cgi-bin/site/project.php?id=1015" target="_blank">Holocaust</a> and that barely begun; I heard of two other projects, but both seem to be defunct. Further analysis will follow as time allows.</p>
<p>Thanks to all who responded to my query on the Humanist list; the relevant postings can be found in the <a title="Humanist email list, March 2011" href="http://lists.digitalhumanities.org/pipermail/humanist/2011-March/thread.html" target="_blank">March 2011 archives</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2011/03/digital-humanities-gis-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Victorian Books: The Frequency of Revolution</title>
		<link>http://anterotesis.com/wordpress/2011/02/victorian-books-the-frequency-of-revolution/</link>
		<comments>http://anterotesis.com/wordpress/2011/02/victorian-books-the-frequency-of-revolution/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 00:07:18 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital history]]></category>
		<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[revolution]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[victorian]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=235</guid>
		<description><![CDATA[Opened to the public late last year was the long awaited Victorian Books, &#8216;a Distant Reading of Victorian Publications.&#8217; Working with data from Google Books,  Dan Cohen and Fred Gibbs are text mining every book published in Britain in the &#8230; <a href="http://anterotesis.com/wordpress/2011/02/victorian-books-the-frequency-of-revolution/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Opened to the public late last year was the long awaited <a title="Victorian Books" href="http://victorianbooks.org/" target="_blank">Victorian Books</a>, &#8216;a Distant Reading of Victorian Publications.&#8217; Working with data from Google Books,  <a title="Dan Cohen's homepage" href="http://www.dancohen.org/">Dan Cohen</a> and Fred Gibbs are text mining every book published in Britain in the long (meaning 1789 to 1914) nineteenth century. That&#8217;s 1,681,161 titles. And they&#8217;re releasing the data, not just the <a title="Victorian Books: The graphs" href="http://victorianbooks.org/words-in-titles-1789-1914/" target="_blank">graphs</a> showing the frequency of selected words, from &#8216;Agnosticism&#8217; to &#8216;Worship&#8217;, but also the <a title="Victorian Books: The data" href="http://victorianbooks.org/open-access-data/" target="_blank">actual counts</a> of 99 terms, in .xls (Microsoft Excel*) and .tsv (tab separated) formats.</p>
<p>Cohen&#8217;s specific historical object is the Victorian &#8216;frame of mind.&#8217; How did they think, how did they see the world, and how did they believe? His method is to use Google&#8217;s vast digitization program to read the Victorians, or at least those who were published, <em>en masse</em>, rather than rely on a canon of notable authors. The move from the anecdotal and elite selection of Houghton&#8217;s <em><a title="Open Library Record for Houghton, Victorian Frame of Mind" href="http://openlibrary.org/works/OL8070678W/The_Victorian_frame_of_mind_1830-1870" target="_blank">The Victorian Frame of Mind, 1830-1870</a>, </em>to a truly comprehensive survey of all Victorian authors, will hopefully give a broader, more accurate and more subtle view of Victorian modes of thought, and perhaps a more open one that allows for discordance and diversity.</p>
<p>This isn&#8217;t a simple matter of chucking a load of material into a database, pushing a button and then having the computer throw out unambiguous facts and truths. Cohen and Gibbs have posted <a title="Victorian Books: Some Caveats" href="http://victorianbooks.org/some-caveats/" target="_blank">some caveats</a>: the data isn&#8217;t perfect, the meanings of words change over time, as yet only the titles of books are being mined, and no collocation or context is given. It also requires some careful methodology, and weighting for all sorts of extraneous factors: <a title="Victorian Book Title Statistics" href="http://wmbriggs.com/blog/?p=3252" target="_blank">William Briggs</a> has done some very interesting analysis bringing in population statistics. But with freely available data, anyone with a spreadsheet program can try out ideas and run checks, allowing for the collaborative development of analytical techniques.</p>
<div id="attachment_265" class="wp-caption aligncenter" style="width: 370px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/11/revolutionchart.png"><img class="size-full wp-image-265 " title="revolutionchart" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/11/revolutionchart.png" alt="Percentage of British books with 'revolution' in the title, 1789-1914" width="360" height="300" /></a><p class="wp-caption-text">Percentage of British books with &#39;revolution&#39; in the title, 1789-1914</p></div>
<p>Of the words Cohen and Gibbs have chosen, one stands out as being more <em>temporal</em> than the others: revolution. None of the other terms is so event-related, or has a specific chronological location. Many are abstract, like &#8216;God&#8217; or &#8216;honour&#8217;; some are names (&#8216;Aristotle&#8217;, &#8216;Jesus&#8217;, &#8216;Plato&#8217; and &#8216;Socrates&#8217;); and there&#8217;s one place, Rome. That does not mean that there is no relation between these words and contemporary events &#8211; Rome has a startling <a title="Victorian Books graph for 'Rome'" href="http://chart.apis.google.com/chart?chxp=0,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910|1,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1&amp;chxr=0,1789,1914|1,0,1&amp;chxs=0,676767,11.5,0,lt,676767&amp;chxt=x,y&amp;chs=600x500&amp;cht=lc&amp;chco=3D7930&amp;chds=0,1&amp;chd=t:0.12,0.09,0.07,0.08,0.13,0.12,0.04,0.08,0.15,0.26,0.22,0.18,0.21,0.08,0.09,0.08,0.09,0.02,0.11,0.05,0.33,0.12,0.18,0.2,0.16,0.24,0.17,0.3,0.16,0.44,0.29,0.25,0.28,0.07,0.27,0.21,0.24,0.45,0.36,0.31,0.27,0.33,0.13,0.19,0.12,0.2,0.31,0.33,0.18,0.34,0.37,0.27,0.32,0.25,0.24,0.34,0.38,0.47,0.3,0.5,0.44,0.44,0.96,0.63,0.32,0.27,0.26,0.27,0.2,0.26,0.32,0.24,0.27,0.24,0.15,0.19,0.23,0.28,0.53,0.42,0.38,0.38,0.41,0.26,0.33,0.23,0.22,0.38,0.42,0.15,0.37,0.24,0.18,0.22,0.23,0.15,0.18,0.27,0.2,0.25,0.17,0.14,0.14,0.2,0.21,0.16,0.22,0.21,0.25,0.14,0.19,0.13,0.22,0.22,0.22,0.21,0.2,0.18,0.26,0.14,0.19,0.25,0.22,0.2,0.22,0.15&amp;chg=8,10,1,1,1,0&amp;chls=2,4,0&amp;chm=B,C5D4B5BB,0,0,0&amp;chtt=Rome&amp;chts=000000,16" target="_blank">peak in 1851</a>, possibly related to the French occupation in the aftermath of 1848. 
Nor does revolution refer only to moments of uprising; it can equally mean the movement of the planets and the development of industry (Google&#8217;s ngram machine has the latter taking off in the <a title="Google Ngram for 'industrial revolution'" href="http://ngrams.googlelabs.com/graph?content=industrial+revolution&amp;year_start=1789&amp;year_end=1914&amp;corpus=0&amp;smoothing=0" target="_blank">1880s</a>). But it is the only chosen term that has a specific chronological corollary. Although the project is oriented around more long-term and subtle concerns, the changes in Victorian mentalities, I began to wonder how much the data reflected more immediate responses to human affairs.</p>
<p>Unsurprisingly, in the case of revolution, we have a mass of titles registering in the 1790s, and a very sharp peak in 1848. There are two other clear spikes in 1817 and 1830/1. A little bit of scrutiny, and you&#8217;ll see that 1871, the year of the Paris Commune, shows a marked increase. From prior knowledge of revolutions and threats of them, we can validate the data as reflecting events. As yet the statistics are not telling us anything new. There are some differences if one visualizes the data as the number of publications rather than percentages. 1830-1 and 1848 still stand out, 1817 and the Paris Commune less so. There also seems to be a different distribution: the last few decades have far more occurrences more evenly distributed than the first half of the century.</p>
<div id="attachment_321" class="wp-caption aligncenter" style="width: 602px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2011/01/revolution.png"><img class="size-full wp-image-321  " title="Revolution in English book titles" src="http://anterotesis.com/wordpress/wp-content/uploads/2011/01/revolution.png" alt="Graph of no. of English books published 1789 - 1914 with 'Revolution' in the title" width="592" height="225" /></a><p class="wp-caption-text">Graph of no. of English books published 1789 - 1914 with &#39;Revolution&#39; in the title</p></div>
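<p>The gap between percentages and raw counts is easy to check for oneself. What follows is only a minimal sketch in Python: it assumes a tab-separated file with one row per year, and the column names (<code>year</code>, <code>total_titles</code>, <code>revolution_titles</code>) and every number in the sample are invented for illustration, not taken from the published data.</p>
<pre>
```python
import csv
import io

# Toy rows standing in for the published .tsv file. The column names and
# every number here are invented for illustration only.
SAMPLE_TSV = (
    "year\ttotal_titles\trevolution_titles\n"
    "1847\t10000\t20\n"
    "1848\t12000\t60\n"
    "1849\t11000\t33\n"
)

def title_shares(tsv_text, term_column, total_column="total_titles"):
    """Return {year: (count, percentage)} for one term column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    shares = {}
    for row in reader:
        count = int(row[term_column])
        total = int(row[total_column])
        # Percentage of that year's titles containing the term.
        shares[int(row["year"])] = (count, 100.0 * count / total)
    return shares

shares = title_shares(SAMPLE_TSV, "revolution_titles")
for year in sorted(shares):
    count, pct = shares[year]
    print(year, count, round(pct, 2))
```
</pre>
<p>Plotting the count and the percentage side by side shows whether a spike is a real surge of interest or merely a busy year for the presses.</p>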
<p>Although it is important to check the data against what is already known, one must guard against presumptions of correlation. Can we be sure we know which revolution is being reflected? 1848 saw revolutions throughout Europe, but were the titles referring to all of them, a subset, or even just the domestic radicalism of the Chartists? Similarly, Cohen considers the <a title="Cohen, Searching for the Victorians" href="http://www.dancohen.org/2010/10/04/searching-for-the-victorians/" target="_blank">1830 spike</a> to point to &#8220;the successful 1830 revolution in France&#8221;; but given the figures for 1831, it could be a result of the turmoil preceding the Reform Act of 1832. <a title="Libcom articles on Merthyr Tydfil uprising, 1831" href="http://libcom.org/library/1831-merthyr-tydfil-uprising" target="_blank">Merthyr Tydfil</a> saw perhaps the first industrial working class uprising in Britain; <a title="Spartacus Schoolnet on Bristol Riots of 1831" href="http://www.spartacus.schoolnet.co.uk/PRbristol.htm" target="_blank">Bristol</a> and <a title="People's Histreh booklet on the Nottingham Reform riots" href="http://peopleshistreh.wordpress.com/2010/10/19/to-the-castle-booklet/" target="_blank">Nottingham</a> saw state institutions go up in flames; there were incidents across the country, from Exeter to Huddersfield. The British publishing trade may have taken more note of this than three glorious days in Paris: the small rise around 1871 may also indicate that British publishing would register domestic concerns far more dramatically than events abroad. Against this, the jump in 1857 is probably due to the Indian Mutiny. In turn, the 1831 figures could indicate that the situation in Britain was far more volatile than today&#8217;s historians have judged it.</p>
<p>So although there is evidence of a causal relationship between events and book titles, it is not transparent. It is further clouded by changes in the meaning of the word. The sustained increase over the last 25 years suggests a change in the conception of revolution from taking to the streets to building working class organizations, from riot and insurgency to factory strikes and the new unionism, from an immediate event to a longer term social struggle.  This indicates a fundamental change in class structure &#8211; the growth of an industrial proletariat &#8211; and consistent class antagonism. But note that events still affect the numbers: the increase from 1904 to 1905 is probably due to the first Russian revolution.</p>
<p>The greater concern with domestic events and the change in meaning of the word &#8216;revolution&#8217; are working hypotheses. Hopefully, the full corpus from which the numbers are drawn will be opened up, allowing these to be checked. I&#8217;d also like to investigate the 1853 spike, after the defeat of the Chartists and with no foreign correlate that I can think of.</p>
<p>Finally, a curious absence, and a warning against presuming an easy reflection of reality in words. The following graph is of the occurrence of the word &#8216;money&#8217; in book titles, expressed as a percentage.</p>
<div id="attachment_329" class="wp-caption aligncenter" style="width: 370px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2011/02/moneychart.png"><img class="size-full wp-image-329 " title="Money" src="http://anterotesis.com/wordpress/wp-content/uploads/2011/02/moneychart.png" alt="Money in Book Titles, 1789-1914" width="360" height="300" /></a><p class="wp-caption-text">Percentage of British books with &#39;money&#39; in the title, 1789-1914</p></div>
<p>See that dip for 1825? Yet there was a banking crisis that year!</p>
<p>* Insert standard complaint about proprietary file formats here. However, it&#8217;s a simple spreadsheet, and neither Open Office nor <a title="Libre Office, the free software office suite" href="http://www.libreoffice.org/download/" target="_blank">Libre Office</a> had any difficulties opening it.</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2011/02/victorian-books-the-frequency-of-revolution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Ngram Games</title>
		<link>http://anterotesis.com/wordpress/2010/12/google-ngram-games/</link>
		<comments>http://anterotesis.com/wordpress/2010/12/google-ngram-games/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 16:48:01 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[ngram]]></category>
		<category><![CDATA[split infinitive]]></category>
		<category><![CDATA[star trek]]></category>
		<category><![CDATA[swearing]]></category>
		<category><![CDATA[walter benjamin]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=282</guid>
		<description><![CDATA[Google have just opened up their text mining project, a vast and ambitious project to allow searching their digital library for the frequency of words and phrases. It&#8217;s an astonishing resource, not only for its research potential but also for &#8230; <a href="http://anterotesis.com/wordpress/2010/12/google-ngram-games/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Google have just opened up their text mining project, a vast and ambitious project to allow searching their digital library for the frequency of words and phrases. It&#8217;s an astonishing resource, not only for its research potential but also for its ludic possibilities, not to mention the time-frittering capabilities.</p>
<p>It&#8217;s easy to play. Just go to <a title="Google Ngrams home" href="http://ngrams.googlelabs.com/" target="_blank">http://ngrams.googlelabs.com/</a>, put in some words or phrases, separating multiples with commas, adjust the settings as you wish, and press the button. Up comes a graph showing distribution across your chosen time period. Thrill to the peaks! Gasp at the troughs! Wonder at the abundances! Curse the absences!</p>
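<p>The same query can be put together in code. Here is a minimal sketch in Python, assuming the query parameters visible in the graph links in this post (<code>content</code>, <code>year_start</code>, <code>year_end</code>, <code>corpus</code>, <code>smoothing</code>); the helper name <code>ngram_url</code> is my own invention, and the address may well change.</p>
<pre>
```python
from urllib.parse import urlencode

# Base address as used in the links in this post; it may not stay put.
NGRAM_BASE = "http://ngrams.googlelabs.com/graph"

def ngram_url(phrases, year_start=1800, year_end=2000, corpus=0, smoothing=3):
    """Build a graph URL for a list of phrases (hypothetical helper)."""
    query = {
        # Multiple phrases are separated with commas, as on the web form.
        "content": ",".join(phrases),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    return NGRAM_BASE + "?" + urlencode(query)

print(ngram_url(["paris", "london", "berlin", "new york"], 1800, 1900))
```
</pre>
<p>Paste the printed address into a browser and the graph should appear, just as if the form had been filled in by hand.</p>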
<p>Like all good games, there&#8217;s much wisdom to be gained from playing. For one thing, it tests the sources, both the original works and their translation into digital format. (Some of Google&#8217;s metadata is <a title="Google Books: A Metadata Train Wreck" href="http://languagelog.ldc.upenn.edu/nll/?p=1701" target="_blank">bizarrely inaccurate</a>.) It makes one think about possible reasons: whatever the results of a query are, they never explain <em>why</em>. It questions the technologies of language, whether printed or digital, including orthography, typeface and grammar. (Note that it only covers  written language, not that spoken or sung, with rhythm, accents and inflections.) In all, it indicates the ambiguity and instability of language, as against any needs or claims of clarity and transparency of meaning.</p>
<p>Below are four jests: testing a grandiose claim, a revealing anachronism, typographical obscenity and the deleterious effects of popular culture on the Queen&#8217;s English.</p>
<h3>Benjamin was right!</h3>
<p>By searching on four major Western cities, we can see that <a title="Wikipedia: Walter Benjamin" href="http://en.wikipedia.org/wiki/Walter_Benjamin" target="_blank">Walter Benjamin</a> was right to consider Paris &#8216;<a title="Benjamin, Paris Capital of the 19th Century [PDF]" href="www.casbarcelona.org/BenjaminParis.pdf" target="_blank">the capital of the nineteenth century</a>&#8217; [pdf].</p>
<div id="attachment_293" class="wp-caption aligncenter" style="width: 910px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/cities19thcentury.png"><img class="size-full wp-image-293" title="19th century cities" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/cities19thcentury.png" alt="Google Ngram for Paris, London, Berlin, New York, 1800-1900" width="900" height="330" /></a><p class="wp-caption-text">Google Ngram for Paris, London, Berlin, New York, 1800-1900</p></div>
<p><a title="Google Ngram for 4 c19th cities" href="http://ngrams.googlelabs.com/graph?content=paris%2Clondon%2Cberlin%2Cnew+york&amp;year_start=1800&amp;year_end=1900&amp;corpus=0&amp;smoothing=3" target="_blank">Link</a></p>
<p>And never mind population, trade, dominions and suchlike. But note that had I included &#8216;Rome&#8217;, Benjamin would have been refuted by the number of works on ancient history.</p>
<h3>Surrealism: A Victorian Creation?</h3>
<p>Although the word &#8216;surrealism&#8217; was originally coined by Apollinaire in 1917, and given substance in the 1920s by Andre Breton, Google finds it in mid-Victorian English:</p>
<div id="attachment_289" class="wp-caption aligncenter" style="width: 910px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/surrealismngram.png"><img class="size-full wp-image-289" title="Surrealism Ngram" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/surrealismngram.png" alt="Google Ngram for 'surrealism.'" width="900" height="330" /></a><p class="wp-caption-text">Google Ngram for &#39;surrealism.&#39;</p></div>
<p><a title="Google Ngram for 'surrealism'" href="http://ngrams.googlelabs.com/graph?content=surrealism&amp;year_start=1840&amp;year_end=2000&amp;corpus=0&amp;smoothing=3" target="_blank">Link</a></p>
<p>There&#8217;s an interesting error behind this. In parsing <a title="Surrealism in 1860!" href="http://books.google.com/books?id=TFxFAAAAYAAJ&amp;pg=PA584&amp;dq=%22surrealism%22&amp;hl=en&amp;ei=ZokLTZ-rLaSJ4gaKl8DADA&amp;sa=X&amp;oi=book_result&amp;ct=result&amp;resnum=5&amp;ved=0CDoQ6AEwBA#v=onepage&amp;q=%22surrealism%22&amp;f=false" target="_blank">The London Review</a>, volume 15, 1860-1, the last hyphenated word of one page and the first of the next (the chapter heading rather than the continuation of the text proper) have been run together by Google&#8217;s OCR software. Such bugs, the &#8216;<a title="Revealing Errors blog" href="http://revealingerrors.com/">revealing errors</a>&#8217; of logic, can be considered a manifestation of the surrealist spirit.</p>
<h3>FFS!</h3>
<p>Early modern printing is notorious for using &#8216;f&#8217; in place of &#8216;s.&#8217; Oh the comedic potential, as is borne out by this graph:</p>
<div id="attachment_291" class="wp-caption aligncenter" style="width: 910px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/fuckngram.png"><img class="size-full wp-image-291" title="Fuck Ngram" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/fuckngram.png" alt="Google Ngram for 'fuck.'" width="900" height="330" /></a><p class="wp-caption-text">Google Ngram for &#39;fuck.&#39;</p></div>
<p><a title="Google Ngram for 'fuck'" href="http://ngrams.googlelabs.com/graph?content=fuck&amp;year_start=1620&amp;year_end=2000&amp;corpus=0&amp;smoothing=3" target="_blank">Link</a></p>
<p>The alternative explanation is, of course, that we were a foulmouthed bunch until the Victorians, and only since the 1950s have we begun to throw off those moral shackles.</p>
<h3>Star Trek and the split infinitive</h3>
<p>That most controversial of grammatical issues had a historical turn in the 1980s, when &#8220;to boldly go&#8221; overtook &#8220;to go boldly.&#8221; A way of interrogating the corpus for the frequency of split infinitives isn&#8217;t obvious, but the results would be very interesting.</p>
<div id="attachment_286" class="wp-caption aligncenter" style="width: 910px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/startrekngram.png"><img class="size-full wp-image-286" title="The Star Trek Ngram" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/12/startrekngram.png" alt="Google Ngram: 'to go boldly' and 'to boldly go.'" width="900" height="330" /></a><p class="wp-caption-text">Google Ngram: &#39;to go boldly&#39; and &#39;to boldly go.&#39;</p></div>
<p><a title="Google Ngram for Star Trek's split infinitive" href="http://ngrams.googlelabs.com/graph?content=to+boldly+go%2Cto+go+boldly&amp;year_start=1800&amp;year_end=2000&amp;corpus=0&amp;smoothing=3" target="_blank">Link.</a> <a title="Wikipedia: Split Infinitive" href="http://en.wikipedia.org/wiki/Split_infinitive" target="_blank">Wikipedia on Split Infinitives</a></p>
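<p>One crude way in, if you have plain text to hand, is pattern matching. The sketch below is a heuristic of my own and not anything Google provides: it looks for &#8216;to&#8217;, then a word ending in -ly, then another word. It will miss adverbs that do not end in -ly and will throw up false positives, so treat the counts as candidates to be read, not results.</p>
<pre>
```python
import re
from collections import Counter

# Heuristic only: "to", then a word ending in -ly, then one more word.
SPLIT_INFINITIVE = re.compile(r"\bto\s+(\w+ly)\s+(\w+)", re.IGNORECASE)

def split_infinitives(text):
    """Count candidate split infinitives in a plain-text string."""
    return Counter(
        "to {} {}".format(adverb.lower(), verb.lower())
        for adverb, verb in SPLIT_INFINITIVE.findall(text)
    )

sample = "Our mission: to boldly go where no one has gone before, not to go boldly."
print(split_infinitives(sample))
```
</pre>
<p>On the sample sentence this picks out &#8216;to boldly go&#8217; and ignores &#8216;to go boldly&#8217;, which is the distinction wanted.</p>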
<p>Google have provided some basic, but literate, <a title="Google documentation for their Ngrams" href="http://ngrams.googlelabs.com/info" target="_blank">documentation</a>. And the <a title="Google Ngram datasets" href="http://ngrams.googlelabs.com/datasets" target="_blank">datasets</a> are freely available under a Creative Commons license. A <a title="Guardian on Google data-mining" href="http://www.guardian.co.uk/science/2010/dec/16/google-tool-english-cultural-trends" target="_blank">Guardian article</a> serves as a decent introduction, although it exaggerates the originality of the techniques. See also the <a title="New York Times on Google's ngrams" href="http://www.nytimes.com/2010/12/17/books/17words.html" target="_blank">New York Times</a>. And don&#8217;t ever, <em>ever</em>, use the pseudo-word &#8216;<a title="Interesting site, horrible name" href="http://www.culturomics.org/" target="_blank">culturomics</a>.&#8217;</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2010/12/google-ngram-games/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Visualizing the Gnu GPL</title>
		<link>http://anterotesis.com/wordpress/2010/08/visualizing-the-gpl/</link>
		<comments>http://anterotesis.com/wordpress/2010/08/visualizing-the-gpl/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 09:13:35 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[free software]]></category>
		<category><![CDATA[gnugpl]]></category>
		<category><![CDATA[ucl ddh]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=190</guid>
		<description><![CDATA[My suggestion for the Decoding Digital Humanities meeting has been accepted, by both the London and Melbourne groups, for next Tuesday (24th August) here in the Great Wen, and next Thursday (26th August) down under. I&#8217;m feeling the warm glow &#8230; <a href="http://anterotesis.com/wordpress/2010/08/visualizing-the-gpl/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My <a title="Two Gnus" href="http://anterotesis.com/wordpress/2010/07/two-gnus-the-gnu-project-and-the-gnu-gpl/" target="_self">suggestion</a> for the Decoding Digital Humanities meeting has been accepted, by both the <a title="London DDH meetings" href="http://www.ucl.ac.uk/dh/decoding_digital_humanities" target="_blank">London</a> and <a title="Melbourne DDH meetings" href="http://www.2cultures.net/ddh/" target="_blank">Melbourne</a> groups, for next Tuesday (24th August) here in the Great Wen, and next Thursday (26th August) down under. I&#8217;m feeling the warm glow of internationalism!</p>
<p>One reason I suggested the Gnu GPL as a text was for its unfamiliarity of form. It&#8217;s a software license, a genre often viewed but rarely read. I&#8217;ve clicked through many, barely registering the dense legalese, meaning I&#8217;ve probably promised to sacrifice my first-born to Bill Gates. The GPL, to its great credit, has a clear and concise preamble. But nevertheless, it is a legal document, written to withstand exacting juridical scrutiny.</p>
<p>As digital humanists, we shouldn&#8217;t be frightened of such things, for we make tools to deal with such difficulties. Whether the texts are in another language, damaged, obscured, fragmentary, long-winded, self-referential, or simply too numerous &#8211; not forgetting that no text is so transparent that one simple reading will comprehend it entirely &#8211; we can hack them.</p>
<p>One popular way of doing this is with <a title="Wordle.net" href="http://www.wordle.net/" target="_blank">wordles</a>. These are, in essence, visualized concordances. The words are weighted according to frequency, then displayed as clouds. There are various options for colour, layout and font, but these do not reflect any aspect of the text; they are there for aesthetic appeal, which is one cause of their popularity. (The creator of Wordle, Jonathan Feinberg, discusses this in Viégas et al., &#8220;Participatory Visualization with Wordle.&#8221;)</p>
<p>So here I present the three versions of the Gnu GPL as wordles. They are made from the 100 most used words, filtered for the common and ordinary (&#8216;the&#8217;, &#8216;and&#8217;). I have attempted to minimize the extraneous as much as possible, having the words displayed horizontally, (near) alphabetically, in plain, plain black and white.</p>
<div id="attachment_196" class="wp-caption aligncenter" style="width: 310px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl1-100wordsx.jpg"><img class="size-medium wp-image-196" title="GPL v.1: 100 top words wordle" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl1-100words-300x139.jpg" alt="Wordle of the 100 most used words in the Gnu GPL v.1." width="300" height="139" /></a><p class="wp-caption-text">Wordle of the 100 most used words in the Gnu GPL v.1.</p></div>
<div id="attachment_197" class="wp-caption aligncenter" style="width: 310px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl2-100wordsx.jpg"><img class="size-medium wp-image-197" title="GPL v.2: top 100 words wordle" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl2-100words-300x231.jpg" alt="Wordle of the 100 most used words in the Gnu GPL v.2." width="300" height="231" /></a><p class="wp-caption-text">Wordle of the 100 most used words in the Gnu GPL v.2, 1991.</p></div>
<div id="attachment_198" class="wp-caption aligncenter" style="width: 310px"><a href="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl3-100wordsx.jpg"><img class="size-medium wp-image-198" title="GPL v.3: Top 100 words wordle" src="http://anterotesis.com/wordpress/wp-content/uploads/2010/08/gpl3-100words-300x139.jpg" alt="GPL v.3: Wordle of 100 most used words" width="300" height="139" /></a><p class="wp-caption-text">Wordle of 100 most used words, GPL v.3, 2007.</p></div>
<p>By taking the three versions, I&#8217;m treating the GPL historically, as changing over time. The most obvious and startling finding is that the term &#8216;program&#8217; has dramatically declined in use from version 2 to version 3, changing the whole picture from being arrow-shaped to more cloud-like. (The algorithm for laying out the words is in Viégas et al.) Its synonym, &#8216;Work&#8217;, has risen in its place. &#8216;Free&#8217; has declined proportionally, but in absolute terms the story is quite different: it appears 23 times in v.1, 28 times in v.2, and 20 times in v.3. &#8216;Freedom&#8217;, not found in the graphics above, rises from 3 usages in v.1, to 4 in v.2, and 8 &#8211; doubled &#8211; in v.3.</p>
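<p>Counts like these can be checked without Wordle. What follows is only a minimal sketch in Python, and not how the figures above were produced: the stop list, the rule for splitting text into words, and the function names are all assumptions made for illustration, and Wordle&#8217;s own filtering of common words will differ. It shows the two measures discussed here, the absolute count of a word and its share of all the counted words.</p>

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; Wordle's own list of
# common words is not reproduced here and is certainly longer.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "or", "is", "that", "for"}


def word_counts(text):
    """Count lower-cased words, skipping anything in the stop list."""
    # Assumed tokenizing rule: runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def top_words(text, n=100):
    """Return the n most frequent words as (word, count) pairs."""
    return word_counts(text).most_common(n)


def relative_frequency(text, word):
    """Share of all counted (non-stop) words taken up by one word."""
    counts = word_counts(text)
    total = sum(counts.values())
    return counts[word.lower()] / total if total else 0.0


# A made-up sentence, not a quotation from any version of the GPL.
sample = "The program is free. The free program protects freedom, and freedom matters."
print(top_words(sample, 3))
print(relative_frequency(sample, "free"))
```

<p>Run over the full text of each version, <code>top_words(text, 100)</code> would give the raw material for a cloud, while <code>relative_frequency(text, "free")</code> gives the proportional figure as opposed to the absolute count. Note that whole words are matched, so &#8216;free&#8217; and &#8216;freedom&#8217; are counted separately.</p>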
<p>I could spend all day poring over these things, but I&#8217;ve probably spent too long already when I have a dissertation to write. In any case, the purpose has been to suggest ways of reading the Gnu GPL, and I will leave discussion to the convivial atmosphere of the meetings.</p>
<p>NB: The code behind wordle.net is owned by IBM, and closed. A free version that allows adjusting and playing with the code would be most desirable.</p>
<p>Reference: Fernanda B. Viégas, Martin Wattenberg, Jonathan Feinberg, &#8220;Participatory Visualization with Wordle,&#8221; IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1137-1144, Nov./Dec. 2009, doi:10.1109/TVCG.2009.171. Behind a paywall, sadly, but <a title="'Participatory Visualization with Wordle' abstract" href="http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.171" target="_blank">abstract available</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2010/08/visualizing-the-gpl/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DH 2010, day two</title>
		<link>http://anterotesis.com/wordpress/2010/07/dh-2010-day-two/</link>
		<comments>http://anterotesis.com/wordpress/2010/07/dh-2010-day-two/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 08:05:55 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[archives]]></category>
		<category><![CDATA[dh2010]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[maps]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=154</guid>
		<description><![CDATA[I really don&#8217;t do mornings. But somehow I got to Kings on time (8.30!) and started work watching over the TEI (Text Encoding Initiative) session in the bowels of the Strand building. Errands meant I only heard the first of &#8230; <a href="http://anterotesis.com/wordpress/2010/07/dh-2010-day-two/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I really don&#8217;t do mornings. But somehow I got to King&#8217;s on time (8.30!) and started work watching over the TEI (Text Encoding Initiative) session in the bowels of the Strand building.</p>
<p>Errands meant I only heard the first of those talks, given by Flanders on <a title="Flanders and Bauman, Using ODD for Multi-purpose TEI Documentation" href="http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-750.html" target="_blank">TEI documentation</a>. To be honest, I wasn&#8217;t expecting much, but it proved to be a very important paper. Although it was focused on the needs and capabilities of TEI, the fundamental idea &#8211; that people need different forms of documentation, but basically the same information &#8211; has far wider application. From this Flanders identified nine (!) different types of document, and ways &#8216;bricks&#8217; of information could be re-used. This is moving &#8216;help&#8217; from being a bundle of text files to being a proper software application. I think the TEI ODD (&#8216;One Document Does it All&#8217;) system has some similarities with Perl&#8217;s POD (Plain Old Documentation) markup, though not knowing a great deal about either means I may be (very) wide of the mark.</p>
<p>In the afternoon I attended the Archives session. First up was Dirk Roorda talking about &#8220;<a title="Doorn and Roorda, Ecology of Longevity" href="http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-680.html" target="_blank">The ecology of longevity</a>&#8221;, using evolutionary theory to think about the preservation of data. Normally, such biological metaphors have me reaching for my proverbial revolver, but here they were used with some subtlety and care. Unfortunately, a great leap was suddenly made into some thoroughly specious economics, which the audience rightly picked up on in the questions. How, after discussing the complexity and chaos of biology, could the speaker throw up platitudes dating from a century before Darwin?</p>
<p>Schlosser and Ulman&#8217;s <a title="Schlosser and Ulman, The Specimen Case and The Garden" href="http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-626.html" target="_blank">talk</a> on preserving digital projects had an interesting dialectic going on between the academic and the archivist, and &#8211; very important to me &#8211; recognized that not all digital projects are ambitious, heavily funded, grand collaborations, but also &#8216;fragile vessels&#8217;, projects that are on the margin, not mission critical. Buchanan then spoke on building <a title="Buchanan and Bohata, Digital Libraries of Scholarly Editions" href="http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-814.html" target="_blank">Digital Libraries of Scholarly Editions</a>. The problem here is aggregating individual projects into a library: each edition has its own aims, quirks and standards, and a library has to create some uniformity. Buchanan spoke of the difficulties in building such libraries; it occurred to me later that perhaps the problem has to be solved by the makers of the editions, and portability is their responsibility.</p>
<p>Late afternoon was spent looking round the poster displays, noting especially the cartography projects. Google Maps was used, though some were chafing against its limitations. There is a real need for an easily deployed, standalone mapping CMS using free data. (And it&#8217;s on my to-do list.)</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2010/07/dh-2010-day-two/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DH 2010, day one</title>
		<link>http://anterotesis.com/wordpress/2010/07/dh-2010-day-one/</link>
		<comments>http://anterotesis.com/wordpress/2010/07/dh-2010-day-one/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 21:19:57 +0000</pubDate>
		<dc:creator>johnl</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[dh2010]]></category>
		<category><![CDATA[digital history]]></category>
		<category><![CDATA[music]]></category>

		<guid isPermaLink="false">http://anterotesis.com/wordpress/?p=139</guid>
		<description><![CDATA[For the next few days I&#8217;m a student assistant at Digital Humanities 2010, doing a bit of everything, from giving directions to waving microphones under people&#8217;s noses The first day of the conference proper (there&#8217;s been many associated events in &#8230; <a href="http://anterotesis.com/wordpress/2010/07/dh-2010-day-one/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the next few days I&#8217;m a student assistant at <a title="Digital Humanities 2010" href="http://dh2010.cch.kcl.ac.uk/" target="_blank">Digital Humanities 2010</a>, doing a bit of everything, from giving directions to waving microphones under people&#8217;s noses.</p>
<p>The first day of the conference proper (there have been many associated events in the last few days) was mainly dealing with organization, with only a few events. I missed the second day of <a title="THATCamp London 2010" href="http://thatcamplondon.org/" target="_blank">THATCamp London</a>, Twitter proving more frustrating than informative as it just made me want to be there more than ever, but managed to catch <a title="Dan Cohen" href="http://www.dancohen.org/" target="_blank">Dan Cohen</a> afterwards for my first interview.</p>
<p>The only event I attended was the launch of the <a title="CHARM at Royal Holloway" href="http://www.charm.rhul.ac.uk/index.html" target="_blank">CHARM</a> (Centre for the History and Analysis of Recorded Music) sound files. These are digitisations of out-of-copyright, lesser-known, 20s and 30s 78 rpm records, and are freely downloadable. Hallelujah for free, because there are some gems to be discovered. Check out Mischa Spoliansky&#8217;s excellent, jaunty version of Gershwin&#8217;s Rhapsody in Blue (seemingly no static URLs, but the <a title="CHARM search for sound files" href="http://charm.kcl.ac.uk/sound/sound_search.html" target="_blank">search interface</a> is easy to use). And thank you to CHARM for not locking the music up: both the speakers spoke with an enthusiasm they wanted to share. Got interviews with them too.</p>
<p>Duties meant I missed the opening ceremony &#8211; which also featured CHARM &#8211; but had a snigger at the tweets about <a title="The Guardian on King's axing the only UK Paleography chair" href="http://www.guardian.co.uk/education/2010/feb/09/writing-off-last-palaeographer-university">paleography</a> provoked by the words of King&#8217;s lamentable principal.</p>
<p>Serious seminars start tomorrow. Perhaps serious blog posts too.</p>
]]></content:encoded>
			<wfw:commentRss>http://anterotesis.com/wordpress/2010/07/dh-2010-day-one/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

