Making the TCP-ECCO texts accessible

In April, the Text Creation Partnership released over 2,000 eighteenth-century works into the public domain, in plain text. You can read more about this project and the texts on their blog:

TCP Releases Over 4,000 New EEBO-TCP Texts

What the Public Release of ECCO-TCP Texts Means for You, Now and in the Future

Unfortunately, they didn’t make the texts easily accessible. To obtain them, one had to apply by email to be subscribed to a Dropbox folder. There is a database and search interface, but it requires registration, and it is unclear who qualifies for an account. I think that the database holds the marked-up XML texts, which have not (yet) been publicly released.

So I have created a package via the Open Knowledge Foundation’s Data Hub. You can download the zip package and an index in CSV format from CKAN. Note that the zip bundle is around 142 MB, so don’t try this on dial-up, and check the index first. When I have time, I’ll work on a web interface that allows easy searching and sorting of it. I hope also that these texts will be made available individually, but given the number of them that’s not a trivial task.
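Until there is a web interface, the CSV index can be searched locally before committing to the full download. The following is only a minimal sketch in Python: the file name `ecco-tcp-index.csv` and the search term are placeholders, and because I am not assuming any particular column names, it simply looks for the term in every field of each row.

```python
import csv


def search_index(lines, term):
    """Return the index rows in which any field contains `term`.

    `lines` is any iterable of CSV text lines (an open file works);
    the first line is treated as the header row. Matching is
    case-insensitive and makes no assumption about column names.
    """
    needle = term.lower()
    matches = []
    for row in csv.DictReader(lines):
        # Short rows yield None values; skip anything that is not text.
        fields = [value for value in row.values() if isinstance(value, str)]
        if any(needle in field.lower() for field in fields):
            matches.append(row)
    return matches


# Example use (the file name is a placeholder for wherever you saved the index):
#
#     with open("ecco-tcp-index.csv", newline="", encoding="utf-8") as handle:
#         for row in search_index(handle, "sermon"):
#             print(row)
```

Each match comes back as a dictionary keyed by whatever headers the index actually uses, so you can inspect the first result to see which columns are available.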

What to do with these texts will be discussed tomorrow, Saturday 13th August, at Textcamp (London); the Twitter hashtag is #tcamp11.

Update: XML (.tei) and EPUB versions are available from tei-Oxford.

Update, 17/04/2016: the texts can now be found at Oxford Text Archive.


3 Responses to Making the TCP-ECCO texts accessible

  1. Pingback: Thinking about texts and communities at Textcamp – The Aust Gate

  2. Pingback: At Last: Our Publicly Accessible Portal to Search, Browse, and Read ECCO-TCP « TCP News & Views

  3. Dan Fowler says:

    Hi, this dataset was lost in a recent migration. If possible (it’s been quite some time!) would it be possible to republish?
