Visualizing the Gnu GPL

My suggestion for the Decoding Digital Humanities meeting has been accepted, by both the London and Melbourne groups, for next Tuesday (24th August) here in the Great Wen, and next Thursday (26th August) down under. I’m feeling the warm glow of internationalism!

One reason I suggested the Gnu GPL as a text was for its unfamiliarity of form. It’s a software license, a genre often viewed but rarely read. I’ve clicked through many, barely registering the dense legalese, meaning I’ve probably promised to sacrifice my first-born to Bill Gates. The GPL, to its great credit, has a clear and concise preamble. Nevertheless, it is a legal document, written to withstand exacting juridical scrutiny.

As digital humanists, we shouldn’t be frightened of such things, for we make tools to deal with such difficulties. Whether the texts are in another language, damaged, obscured, fragmentary, long-winded, self-referential, or simply too numerous – not forgetting that no text is so transparent that one simple reading will comprehend it entirely – we can hack them.

One popular way of doing this is with wordles. These are, in essence, visualized concordances: the words are weighted according to frequency, then displayed as clouds. There are various options for colour, layout and font, but these do not reflect any aspect of the text; they serve aesthetic appeal, and that is one cause of wordles’ popularity. (The creator of Wordle, Jonathan Feinberg, discusses this in Viégas et al., “Participatory Visualization with Wordle.”)

So here I present the three versions of the Gnu GPL as wordles. They are made from the 100 most used words, filtered to remove the common and ordinary (‘the’, ‘and’). I have tried to minimize the extraneous as much as possible, displaying the words horizontally, (near) alphabetically, in a plain font, in plain black and white.
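Since Wordle’s own code is closed, what follows is not its method. It is a minimal Python sketch, under stated assumptions, of the counting step described above: lowercase the text, split it into words, drop a stop list, keep the 100 most frequent, and weight each word by its count. The stop list, the word-splitting rule and the linear font scaling are all assumptions made for illustration; the cloud layout itself (described in Viégas et al.) is not attempted.

```python
import re
from collections import Counter

# Illustrative stop list only: Wordle's own list is not public, so this
# small set is an assumption made for the sketch.
STOPWORDS = {
    "the", "and", "of", "to", "a", "or", "in", "that", "is", "it",
    "you", "any", "this", "be", "for", "by", "on", "as", "not", "with",
}

# Words are runs of letters, optionally with one internal apostrophe
# (so "licensee's" stays whole while surrounding quote marks are dropped).
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(text, n=100, stopwords=STOPWORDS):
    """Return the n most frequent non-stopwords as (word, count) pairs."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(n)


def font_sizes(pairs, min_size=10.0, max_size=72.0):
    """Map each word to a font size scaled linearly with its count.

    Linear scaling is an assumption; it is not Wordle's own formula.
    """
    if not pairs:
        return {}
    counts = [c for _, c in pairs]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {w: min_size + (c - lo) * (max_size - min_size) / span
            for w, c in pairs}


if __name__ == "__main__":
    sample = ("The Program is free. The Program may be copied. "
              "Free software is free.")
    pairs = top_words(sample, n=3)
    print(pairs)
    # Alphabetical display, as in the wordles above, is a simple sort.
    print(sorted(font_sizes(pairs).items()))
```

In the toy sample, ‘free’ (three uses) gets the largest size and ‘program’ (two uses) sits midway; a real run would feed in the full license text with a much longer stop list.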

Wordle of the 100 most used words in the Gnu GPL v.1.

Wordle of the 100 most used words in the Gnu GPL v.2, 1991.

Wordle of the 100 most used words in the Gnu GPL v.3, 2007.

By taking the three versions, I’m treating the GPL historically, as changing over time. The most obvious and startling finding is that the term ‘program’ has dramatically declined in use from version 2 to version 3, changing the whole picture from arrow-shaped to more cloud-like. (The algorithm for laying out the words is described in Viégas et al.) Its synonym, ‘work’, has risen in its place. ‘Free’ has declined proportionally, but in absolute terms the story is quite different: it appears 23 times in v.1, 28 times in v.2, and 20 times in v.3. ‘Freedom’, which does not appear in the graphics above, rises from 3 usages in v.1 to 4 in v.2, and 8 – doubled – in v.3.
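The gap between proportional and absolute change is easy to make concrete: a word can be used more often while taking up a smaller share of a longer text. Below is a minimal Python sketch, assuming whole-word, case-insensitive matching (so ‘freedom’ is counted separately from ‘free’, as in the figures above). How the counts in this post were actually produced isn’t stated, and the two toy texts are invented for illustration, not GPL extracts.

```python
import re

# Same assumed word-splitting rule as a simple tokenizer: letters,
# optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def term_frequency(text, term):
    """Return (absolute count, share of all word tokens) for one term.

    Matching is case-insensitive and on whole words only, so counting
    'free' does not pick up 'freedom' or 'freely'.
    """
    words = WORD_RE.findall(text.lower())
    count = words.count(term.lower())
    share = count / len(words) if words else 0.0
    return count, share


if __name__ == "__main__":
    # Invented toy texts, not GPL extracts: the longer one uses 'free'
    # more often in absolute terms but less often proportionally.
    shorter = "free software is free"
    longer = "free software is free and free to share and to change"
    for label, text in [("shorter", shorter), ("longer", longer)]:
        count, share = term_frequency(text, "free")
        print(f"{label}: {count} uses, {share:.1%} of words")
```

Here the shorter text has 2 uses of ‘free’ in 4 words (50%), the longer 3 uses in 11 words (about 27%): an absolute rise alongside a proportional fall.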

I could spend all day poring over these things, but I’ve probably spent too long already when I have a dissertation to write. In any case, the purpose has been to suggest ways of reading the Gnu GPL; I will leave discussion to the convivial atmosphere of the meetings.

NB: The code behind wordle.net is owned by IBM, and closed. A free version that allows adjusting and playing with the code would be most desirable.

Reference: Fernanda B. Viégas, Martin Wattenberg, Jonathan Feinberg, “Participatory Visualization with Wordle,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1137-1144, Nov./Dec. 2009, doi:10.1109/TVCG.2009.171. Behind a paywall, sadly, but the abstract is available.

DH 2010, day four

For me, the final day was the important one, with both the geography and history sessions taking place. The former saw three excellent presentations, from the University of North Carolina, Ian Gregory and the Hestia project. But the big news is that UNC has built a locally-deployable, open source map server, called Main Street Carolina and available sometime this summer. There’s not much information available, but it is used for many of their projects, including Going To The Show, and there is a blurb and a blog post online. I have seriously high hopes for this, as a way of easily putting maps on the web without having to go down the Google route.

The highlight of the Professional Reflection strand was Claire Ross’s Pointless Babble or Enabled Backchannel, a witty and zippy analysis of Twitter usage during three Digital Humanities conferences in 2009. It offered far more than 140 characters, without any excess, and left plenty of time for questions.

The History strand saw two very good presentations, and one that had me gawping in disbelief. Roorda’s Letters, Ideas and Information Technology, on visualizing seventeenth-century correspondence, and Sainte’s Reading Darwin Between The Lines, analysing Darwin’s rare use of the term ‘evolution’, were very fine. But Blaney’s Developing a Collaborative Online Environment for History – The Experience of British History Online was a trip into the digital netherworld.

What British History Online wanted to do was crowdsource the Calendars of State Papers, those abstracts of government paperwork compiled in Victorian times and now showing their age. So what do they do? Raise obstacles to participation. First, the CSP are behind a paywall, and as far as I can tell, there are no institutional subscriptions available. So the academics they hoped would annotate the documents had to pay for the honour. Then, to minimise contributions either malicious or erroneous, they deliberately put in obstacles and constraints to make annotation difficult. *rollseyes* Do they have any idea what crowdsourcing is?

Contributions were, unsurprisingly, sparse.

One of the audience asked about re-use. We were informed that the XML was locked up, the documents copyrighted (even though much of the material on BHO has long since passed into the public domain), but generously, we can print off as many copies as we wish. This was the only time I heard such sentiments expressed at DH2010; everyone else understood the importance of openness, of re-use, of contributing corrections and improvements, of sharing. It’s called community. And if you look at the graphic below, you’ll see it’s one of the prominent words (used 25 times) in the closing address from Melissa Terras, Present, Not Voting.

Wordle of Melissa Terras’ speech at DH2010

‘Transcribe’ and ‘Bentham’ also feature, as Transcribe Bentham is a crowdsourcing project Terras is involved in. As she says:

one of the things we want to do with Transcribe Bentham is to provide access to the resulting XML files so that others can reuse the information (via web-services, etc). The hosting and transcription environment we are developing will be open source, so that others can use it. And this sea change, from working in small groups, to really reaching out to users is something we have to embrace, and learn to work with.

The prospect of easily setting up such collaborations is mouthwatering. Access, re-use, reaching out, yes yes yes. Sharing is fundamental to what we do, and we are stronger when we share. And right now the Digital Humanities community – like everyone else – faces terrible pressure, from government and university management, and needs to get stuck in:

We need people who are not just prepared to whine but prepared to roll up their sleeves and do things to improve our associations, our community, and our presence in academia.

Her whole speech was barnstorming, critical but not despondent, electrifying the audience, and the highlight of a conference that, for all the heat and rushing around and getting up way too early, truly inspired me.

DH 2010, day three

Not such an early start, so I missed Joshua Sternfeld’s talk on Digital Historiography. Annoying, but a sign of a good conference is that there’s too much of interest rather than too little.

For me, the important presentation in the Teaching/Managing strand was Nowviskie and Porter’s “The Graceful Degradation Survey: Managing Digital Humanities Projects Through Times of Transition and Decline.” The afterlife of digital projects – and websites in general – is not only very important but quite neglected, seemingly being handled on an ad-hoc, voluntary basis. The talk was more to do with project management, organization and funding; I had hoped to hear something about technical solutions. It did suggest that there is a move to creating smaller, more preservable packets of information: a granular approach insuring against complete meltdown.

Another suggestion was that Digihum projects are increasingly being operated outside the academy. There’s a subterranean current here at DH2010 of extra-academic projects, ‘fragile vessels’ (as mentioned yesterday), small unfunded projects. One of those – a graduate project now continuing independently – is contextus, which featured in the Scanning Between the Lines: The Search for the Semantic Story panel in the afternoon. Besides giving a very clear and useful introduction to RDFa (FOAF etc.) and sprinkling their talk with Doctor Who references, the speakers showed the great potential of the ‘semantic web’, about which I’d previously been a bit doubtful.

Many of the posters displayed, as on day two, were also for small, semi-independent or semi-official projects, using whatever tools are available free (in the financial sense). Somehow, this aspect of the Digital Humanities isn’t getting the full recognition it deserves. The lack of money shouldn’t mean abandoning a good or interesting idea, nor should it be considered a denial of permission to do what we want to do. It’s an obstacle, yes, but not insurmountable. Ways of operating on a shoestring need to be shared. And there is the advantage that without funds, one isn’t beholden to funders.

DH 2010, day two

I really don’t do mornings. But somehow I got to King’s on time (8.30!) and started work watching over the TEI (Text Encoding Initiative) session in the bowels of the Strand building.

Errands meant I only heard the first of those talks, given by Flanders on TEI documentation. To be honest, I wasn’t expecting much, but it proved to be a very important paper. Although it was focused on the needs and capabilities of TEI, the fundamental idea – that people need different forms of documentation, but basically the same information – has far wider application. From this, Flanders identified nine (!) different types of document, and ways ‘bricks’ of information could be re-used. This is moving ‘help’ from being a bundle of text files to being a proper software application. I think the TEI ODD (‘One Document Does it All’) system has some similarities with Perl’s POD (Plain Old Documentation) markup, though not knowing a great deal about either means I may be (very) wide of the mark.

In the afternoon I attended the Archives session. First up was Dirk Roorda talking about “The ecology of longevity”, using evolutionary theory to think about the preservation of data. Normally, such biological metaphors have me reaching for my proverbial revolver, but here they were used with some subtlety and care. Unfortunately, a great leap was suddenly made into some thoroughly specious economics, which the audience rightly picked up on in the questions. How, after discussing the complexity and chaos of biology, could the speaker throw up platitudes dating from a century before Darwin?

Schlosser and Ulman’s talk on preserving digital projects had an interesting dialectic going on between the academic and the archivist, and – very important to me – recognized that not all digital projects are ambitious, heavily funded, grand collaborations, but also ‘fragile vessels’, projects that are on the margin, not mission critical. Buchanan then spoke on building Digital Libraries of Scholarly Editions. The problem here is aggregating individual projects into a library: each edition has its own aims, quirks and standards, and a library has to create some uniformity. Buchanan spoke of the difficulties in building such libraries; it occurred to me later that perhaps the problem has to be solved by the makers of the editions, and portability is their responsibility.

Late afternoon was spent looking round the poster displays, noting especially the cartography projects. Google Maps was used, though some were chafing against its limitations. There is a real need for an easily deployed, standalone mapping CMS using free data. (And it’s on my to-do list.)

DH 2010, day one

For the next few days I’m a student assistant at Digital Humanities 2010, doing a bit of everything, from giving directions to waving microphones under people’s noses.

The first day of the conference proper (there have been many associated events in the last few days) was mainly taken up with organization, with only a few events. I missed the second day of THATCamp London, Twitter proving more frustrating than informative as it just made me want to be there more than ever, but managed to catch Dan Cohen afterwards for my first interview.

The only event I attended was the launch of the CHARM (Centre for the History and Analysis of Recorded Music) sound files. These are digitisations of out-of-copyright, lesser-known 78 rpm records from the 20s and 30s, and are freely downloadable. Hallelujah for free, because there are some gems to be discovered. Check out Mischa Spoliansky’s excellent, jaunty version of Gershwin’s Rhapsody in Blue (there seem to be no static URLs, but the search interface is easy to use). And thank you to CHARM for not locking the music up: both speakers spoke with an enthusiasm they wanted to share. Got interviews with them too.

Duties meant I missed the opening ceremony – which also featured CHARM – but I had a snigger at the tweets about paleography provoked by the words of King’s lamentable principal.

Serious seminars start tomorrow. Perhaps serious blog posts too.

Two Gnus: The Gnu Project and the Gnu GPL

For the next Decoding Digital Humanities meeting, I’d like to propose reading two fundamental documents of the free software movement, Richard Stallman’s Gnu Project and the Gnu GPL (General Public License). These texts build on the last meeting’s reading of Eric Raymond’s The Cathedral and the Bazaar, but are less about the process of coding and more about the programmer in the world. The first is a brief history of sharing code and a plan for a completely free operating system, the second the most popular free software license, designed to protect both sharing and code.

They’re relevant to the Digital Humanities, and what we’ve been discussing, in numerous ways:

  • They show the human culture around the code, both implicitly (styles of writing, ways of thinking about a problem) and explicitly (Stallman’s description of sharing at MIT). The humanity around the digital, one can say.
  • We face very similar problems with sharing other things, like data and findings. Such sharing is fundamental to learning; too much material is being locked up under dubious copyright claims and illiterate T&Cs, never mind paywalls.
  • Talking of paywalls, both texts have a subtle attitude to commerce, seemingly unconcerned with money but overtly opposed to monopolisation.

And of course, we use the fruits of these works.

More than that, I think these texts can be read in very different ways: beyond being a license, the GPL can be seen as a ‘hack’, repurposing copyright into copyleft; a history of debate and struggle is found across its three revisions (and its offspring for web-deployed software, the Affero GPL); the Gnu Project is history, philosophy, polemic and an embodiment of sheer will. Reading differently is what the (digital) humanities does.

Further discussion is for the pub; this is just to suggest some suitable – and interesting! – reading around which to talk.

Murdoch redux

Following on from my previous post, some news:

The British Library has backed down from digitising and putting online out-of-copyright editions of The Times. This raises serious questions about its whole mission: are they resigned to the irreplaceable newspapers in Colindale crumbling over time, along with their deteriorating microfilm copies, without any digital preservation at all? It also sets a dangerous precedent in ceding public domain material to a private owner. Whilst this has no force in law, it has an intimidatory aspect. Still worse is that some of this material was not even created by The Times. Plus ça change…

And as a footnote, given The Times’ building of a paywall, Glyn Moody wonders whether commercial strategies should trump the public record: “Should retractions be behind a paywall?”

The Enclosure of the Historical Commons (2): Murdoch Junior

Last week James Murdoch spoke at the launch of UCL’s new Centre for Digital Humanities. Quite why they invited him I don’t know, for he appears to have no idea of what the Digital Humanities are. That said, his speech got plenty of media coverage, so it may have been a clever piece of publicity-mongering.

Notwithstanding his protestations to be speaking “as dispassionately and factually as I can”, it was a partisan and aggressive statement for the so-called “creative industries.” The usual suspects were lined up: the BBC, file sharers, the public sector, search engines, digital utopians et al. The notorious Tera report was cited, apocalyptic visions of redundancies painted (how ironic coming from Wapping), government enforcement of “basic property rights” demanded, Sky iPhone apps and Fox films promoted.

(It was also semi-literate: “almost exactly”; “the era of Pope, and Johnson, and writers after them.” As for the trite Tolkien quotation, was he trying to show he was down with the geeks?)

But it was the attack on the British Library’s newspaper digitisation program that garnered most of the headlines; see, for example, The Indy and The Guardian. The library’s project aims to turn some 40 million pages of their newspaper holdings into searchable, preservable, accessible, distributable text; a great resource for historians. The majority of this material is clearly out of copyright and in the public domain. Where it is not, there will be agreement with, and remuneration for, the copyright holders. The press release states:

…. the partnership will also seek to digitise a range of in-copyright material, with the agreement of the relevant rightsholders. This copyright material will, with the express permission of the publishers, be made available via the online resource – providing fuller coverage for users and a much-needed revenue stream for the rightsholders.

So what’s Murdoch getting so angry about? Immediately, competition with archive.timesonline.co.uk. It’s curious he didn’t take the opportunity to plug that product along with all the others. But there’s something else: free content.

Just yesterday, the Library announced the digitisation of their newspaper archive – originally given to them by publishers as a matter of legal obligation. This is not simply being done for posterity, nor to make free access for library users easier, but also for commercial gain via a paid‐for website. The move is strongly opposed by major publishers. If it goes ahead, free content would not only be a justification for more funding, but actually become a source of funds for a public body.

As the old saw goes, there’s free as in freedom, and free as in beer. One means the freedom to use a resource in any fashion, the other simply not to pay for something. The British Library project is not free in either sense, as I’ve previously shown. Likewise the Times archive.

Clearly, the “free content” referred to is material on which copyright has expired and which is now in the public domain. It can be used in any way anyone desires. The problem is obtaining it. That means only certain institutions, the British Library and News International alike, can take advantage of this common wealth, and enclose it with an array of technological (DRM), financial (paywalls) or contractual (the terms of use and copyright claims over remastering into digital formats) fences.

For Murdoch, this isn’t enough. He believes the Times archive is his inviolable property. Although in practice only the British Library can compete with him, the root problem is that this material is in the public domain. Consequently, not only does he berate public sector competition for “profiting from work they do not create”, but demands that only the creative industries should be allowed to “develop and protect the value of what they create”,  even where copyright has lapsed. This privatisation envisages the digital humanities as an adjunct to commercial exploitation, in line with current ideologies of the “impact agenda” and business-driven education; and it dramatically diminishes our historical commons.

Standard caveats: I am not a lawyer. Nor do I play one on TV. Nor am I a mind-reader with privileged access to the inner workings of the mindset of the wealthy.

See also  Richard Lewis and The Guardian.

You can read the transcript here and here. Curiously, neither page has an explicit copyright statement, so presumably both sites are claiming the rights to the lecture!

Money the measure of all things

Despite currently studying at King’s College London, I haven’t been involved in the campaigns against the cuts in higher education in the U.K. Partly this is due to a lack of time, but also because I’ve been burnt out by politicking.

But news today that Middlesex “University” (should read: Corporate Services Provider) is to shut down the philosophy department has caused the bile to rise.

The Dean, one Edward Esche, has stated that the decision is “purely financial” and that the department has no “measurable” contribution to the university.

Yet it is an internationally renowned department, the highest research-rated subject in that University, a grade 5 rating in the 2001 RAE assessment, a 2.8 in the 2008 assessment, with “65% of its research activity judged ‘world-leading’ or ‘internationally excellent’.”

Despite the malevolence of the RAE, it does produce some sort of measurement, one designed for managers and bureaucrats such as this Dean, who has arbitrarily disregarded it. Whether he has done so because he truly believes ‘money is the measure of all things’, or because he is hiding his real reasons (too critical? not ‘useful’ enough?), the result is the same: an agenda driven entirely by commercial concerns, one that has finally disposed of any educational and academic criteria.

Opposition is gathering: see the Save Middlesex philosophy website and facebook page for more details.

Alchemy and Economy

Due to overwhelming popular demand *cough* I mean, as I had a couple of requests for it at Christopher Moses’ talk on Money Matters at the IHR a few weeks back, I’m posting the PDF of an essay I wrote 10 years ago on alchemy and economics. I’ve done a little tidying up, but it’s substantially unchanged.

I’ve been interested in monetary crime since reading George Caffentzis’ remarkable Clipped Coins, Abused Words and Civil Government (Open Library record). This led me to write a dissertation on coin clipping and the ‘Great Recoinage’ of the 1690s for my B.A. at the University of North London, and also gave me a taste of the delights held by the archives. The present essay, for an M.A. course, was a spin-off from that, as I had a sense of an alchemical subtext lurking in the background; the clippers were accused of being alchemists, their nemesis Isaac Newton was one, and there are fleeting mentions of it in the voluminous contemporary pamphlet literature.

A couple of disclaimers: as Moses mentioned at the seminar, the technologies developed at Potosi for extracting silver are an important element in this story, and one absent from this essay. There’s nothing about the coin clippers themselves, yet they’re the most interesting part of the whole story. And I don’t like the way it is written; it is rather thick with academese.

But for all that, it addresses the economic aspects of alchemy, which have otherwise been ignored, and it does have the merit of giving voice to George Starkey’s marvellous espousal of inflation!

This article is released under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England and Wales License.

Levin, Alchemy and Economy PDF
