20-24 October 2014, 13-17h, Waag Society – Nieuwmarkt 4, Amsterdam
During the last decade the humanities have witnessed an explosive growth in the use of digital tools. While this trend has been beneficial for much humanities research, it also threatens to create a gap between humanities scholars who have and scholars who haven’t acquired the latest digital tools. To bridge this gap, the Centre for Digital Humanities offers a one-week crash course on state-of-the-art digital tools for textual, historical, visual and other humanities research. The course includes demonstrations and explanations of tools and small assignments to get hands-on experience, and it offers ample space for critical discussion of the benefits and shortcomings of digital humanities.
The course is open to a maximum of 50 participants. It is taught in English and consists of five full afternoons from 13.00 to 17.00. Participants should bring their own laptop, as the tools discussed and used will be mostly web-based. Please install the following tools before the start of the crash course:
- Cygwin (for Windows users only, Mac and Linux users have Unix command line tools pre-installed)
- To determine whether you need the 32-bit or 64-bit version of Cygwin, check this support page.
Monday: Gathering Data
[Adjusted programme, session moved to Wednesday.] In the shortened version of the Monday session, the schedule for the rest of the week was discussed. A short introduction was also given to the seven steps of data visualization, which make up our workflow for the rest of the week. Issues that came up were the importance of focusing on your own research question, the learning curves of new tools, reservations about Digital Humanities and the use of computational tools in research, and the pros and cons of collaborative learning and interdisciplinarity. One of the students of the Coding the Humanities bachelor and minor course presented his group’s research on the images in the NY Times database. Material:
- The student project discussed by Robert-Jan Korteschiel can be found here.
- Ben Fry’s discussion of the seven steps of data visualization is recommended reading for the rest of the week.
In the Crash Course, two sessions revolve around specific DH projects, which provide research questions, methods and tools to work with. On Tuesday, the BiographyNet team talks about linking data and extracting relations between people and events.
- Topics: Linked Data, Natural Language Processing
- Exercises: Computational Thinking
- The BMGN issue on Digital History mentioned in this session can be found here.
- Slides of the presentation, as well as links to the BiographyNet website and Het Biografisch Portaal van Nederland.
Wednesday: Preparing Data
After a general introduction to the goals of Digital Humanities, we discuss the shift in perspective when we look at our objects of study as data points. What questions and methodologies does this new point of view prompt? We encourage you to start thinking about new research questions that could be answered by analyzing the data. To facilitate this, we introduce a workflow that guides the process of getting from raw to presentable data. We look at the process of acquiring data and at how to structure it in a meaningful way, focusing on Application Programming Interfaces (APIs) and on writing REST queries to acquire data. We then continue with our workflow. First, we prepare a raw dataset so that we can start exploring. We discuss how to parse, filter and extract information from our data sets. Once acquired, data is hardly ever in the shape and structure that we need to answer our research questions. Several steps are needed to transform the data before we can query, represent and visualize it. During the transformations, new insights and questions may come up that require us to go back to previous steps.
- Topics: Research questions, digital research workflow, Regular Expressions, Unix Command Line Tools
- Exercises: APIs, queries, searching for patterns in data
- Tools: API consoles, Unix Command Line Tools (Cygwin for Windows users)
- Slides of today’s session, plus the correct reference to Unix for Poets
- Data sets: NY Times API data (click here)
- Tools: Cygwin
- Literature: John Unsworth – Scholarly Primitives (click here)
- Jelle Zuidema mentioned the concept of Culturomics
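The acquire, parse, filter and extract steps described for this session can be sketched in Python. This is a minimal illustration, not the material used in the session: the endpoint `https://api.example.org/articles/search`, its parameter names and the sample response are all invented for the example, and no network request is made. A real API, such as the NY Times one, documents its own URL and parameters.

```python
import json
import re
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.org/articles/search"


def build_query(term, page=0, api_key="YOUR-KEY"):
    """Acquire: build the URL of a REST query for a search term."""
    params = {"q": term, "page": page, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


# Invented sample response, standing in for what an API might return.
SAMPLE_RESPONSE = """
{"docs": [
  {"headline": "Rembrandt exhibition opens in Amsterdam", "pub_date": "2014-10-20"},
  {"headline": "Election results", "pub_date": "2012-11-07"},
  {"headline": "Amsterdam canals turn 400", "pub_date": "2013-05-01"}
]}
"""


def parse(raw):
    """Parse: turn the raw JSON text into a list of records."""
    return json.loads(raw)["docs"]


def filter_by_pattern(docs, pattern):
    """Filter: keep records whose headline matches a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [doc for doc in docs if regex.search(doc["headline"])]


def extract_years(docs):
    """Extract: pull the four-digit year out of each publication date."""
    return [int(re.match(r"(\d{4})-", doc["pub_date"]).group(1)) for doc in docs]


url = build_query("Amsterdam museum")
docs = parse(SAMPLE_RESPONSE)
hits = filter_by_pattern(docs, r"\bamsterdam\b")
years = extract_years(hits)
print(url)
print(years)
```

Keeping each step in its own function makes it easy to go back and change one step, for example the filter pattern, without touching the others.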
Thursday: The Riddle of Literary Quality
The second DH project of the week is The Riddle of Literary Quality. Team members focus on the analysis of digital text, discussing topics ranging from the availability of texts and the accessibility of text analysis tools to regular expressions as a way of searching for patterns in text data.
- Topics: Machine learning, classification
- Exercises: Identifying genres and authors
- Tools: Anaconda/Python
- Data sets: Genre classification data set (click here)
- Tools: Anaconda
- Download Andreas’ corpus & code: tinyurl.com/n9aaoht
- Andreas’ slides
- Slides of Marijn’s part of the session
- Script in Marijn’s session (click here)
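To give an impression of what classifying texts by genre involves, here is a minimal sketch of a word-count classifier (naive Bayes with add-one smoothing) using only the Python standard library. It is not the code used in the session or by the project team; the function names and the four training sentences are invented for illustration, and a real experiment would use a proper corpus and a library such as the ones shipped with Anaconda.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_texts):
    """Count words per genre and documents per genre."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labelled_texts:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the genre with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: how common the genre is in the training data.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no information
            # Add-one smoothing avoids taking the log of zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data, for illustration only.
TRAINING = [
    ("the detective found the body and a bloody knife", "thriller"),
    ("the killer fled before the police arrived", "thriller"),
    ("she kissed him and her heart raced with love", "romance"),
    ("their wedding was full of love and tears", "romance"),
]

model = train(TRAINING)
first = classify("the police found the knife", model)
second = classify("a heart full of love", model)
print(first, second)
```

The same structure carries over to author identification: only the labels attached to the training texts change.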
Friday: Cooking Data
We finish with the last stages of our workflow, focusing on presenting and interacting with the data through visualization. There are many ways to visualize information, but only some of them will present the data in a way that answers our research questions. We will experiment with Many Eyes, a web-based visualization tool that offers a broad range of visualizations to select from, to optimally present the story we want to tell with the data. We will close the Crash Course with drinks.
- Topics: Storytelling, narrative, visualizing data
- Exercises: telling stories through data visualizations
- Tools: Raw, Many Eyes, Vega, Excel, …
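Before data can be visualized, it usually has to be reduced to a small table. The sketch below, which assumes nothing about any particular tool, counts invented records per year and writes the result as CSV text, a tabular form that spreadsheets and many visualization tools can import. The records and function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented records (year, section), for illustration only.
RECORDS = [
    ("2012", "politics"),
    ("2013", "arts"),
    ("2013", "politics"),
    ("2014", "arts"),
    ("2014", "arts"),
    ("2014", "science"),
]


def count_by(records, index):
    """Count how often each value occurs in the given column."""
    return Counter(record[index] for record in records)


def to_csv(counts, header):
    """Write the counts as CSV text, one row per key, sorted by key."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for key in sorted(counts):
        writer.writerow([key, counts[key]])
    return buffer.getvalue()


per_year = count_by(RECORDS, 0)
table = to_csv(per_year, ["year", "articles"])
print(table)
```

Counting by a different column, for example `count_by(RECORDS, 1)`, gives a different table and therefore a different story to tell with the same data.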