This module introduces tabular data and explains how to structure it to support different styles of analysis.
- Increased understanding of what data is and why it is important to organize it in particular ways.
- Ability to “clean” or organize small sets of data using a software tool (OpenRefine).
- Increased ability to see how digital projects use structured data sets to answer research questions.
- Kosara, Robert. “Spreadsheet Thinking vs. Database Thinking.” Eager Eyes, April 14, 2016. https://eagereyes.org/basics/spreadsheet-thinking-vs-database-thinking. ♦ Estimated Read Time = 4 minutes
- Manovich, Lev. “Database as a Genre of New Media.” AI & Society 14, no. 2 (June 1, 2000). http://time.arts.ucla.edu/AI_Society/manovich.html. ♦ Estimated Read Time = 15 minutes
- Mullen, Lincoln. Introduction to Computational Historical Thinking: With Applications in R (2017). http://dh-r.lincolnmullen.com/introduction.html. ♦ Estimated Read Time = 20 minutes
- Wickham, Hadley. “Tidy Data.” Journal of Statistical Software 59, no. 10 (September 2014). http://vita.had.co.nz/papers/tidy-data.pdf. ♦ Estimated Read Time = 35 minutes
Questions to Consider
- Do you agree with Manovich that “database and narrative are natural enemies”? Why or why not?
- Manovich asserts the dominance of databases in new media, but due to the age of the piece, his examples (CD-ROMs and HTML-only web pages) are outdated. What examples from the present day—in popular culture or your own field—would you use to support or critique his argument?
- Wickham and Kosara both discuss how to structure data in order to facilitate analysis. How is your data currently structured? What changes would you need to make to match their definitions of tidy data?
- Mullen argues for the use of computational methods by historians. If you are a historian, do you find his argument convincing? If you are not a historian, what aspects of his argument for computational methods apply to your field?
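To make the tidy data question above more concrete, here is a minimal sketch in Python of one restructuring Wickham describes: column headers that are really values (here, years) are turned into a variable of their own, so that each row holds one observation. The table, the town names, and the `to_tidy` helper are all hypothetical and invented for illustration; they do not come from any of the readings.

```python
# Hypothetical "wide" table: one row per town, one column per census year.
# The year columns are values of a variable, not variable names.
wide_rows = [
    {"town": "Springfield", "1850": 1200, "1860": 1500},
    {"town": "Riverton", "1850": 800, "1860": 950},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied into each new row below
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows, "town", "year", "population")
for observation in tidy_rows:
    print(observation)
```

In the reshaped version, "town", "year", and "population" are each a column, which is the layout that filtering, grouping, and plotting tools generally expect.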
Data sets for social scientists:
Cornell Institute for Social and Economic Research’s list of freely available data sets, https://www.ciser.cornell.edu/ASPs/datasource.asp
Manipulating data with Excel:
A basic introduction to best practices for data entry in a spreadsheet as well as some essential functions for manipulating data in spreadsheet applications (Excel, OpenOffice, Google Sheets) developed by Spencer Roberts, a graduate student at George Mason: http://swroberts.ca/academic/spreadsheets-for-historians/.
If you are doing a data-driven project, we have some tips for preparing your data.
Cleaning data with OpenRefine:
Follow the instructions on this page to use OpenRefine, a software tool for cleaning and transforming data, to normalize and tidy two data sets: https://labs.ssrc.org/dds/articles/tidying-data-with-openrefine.
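If it helps to see what "normalizing" means in practice, the sketch below shows the general idea in Python: inconsistent spellings of the same value are reduced to a shared key and grouped together. It is loosely inspired by key-based clustering of the kind OpenRefine offers, but it is a simplified illustration and not OpenRefine's actual implementation. The `fingerprint` and `cluster` helpers and the sample place names are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a simplified key so that near-identical spellings collide."""
    # Trim surrounding whitespace and ignore capitalization.
    cleaned = value.strip().lower()
    # Drop ASCII punctuation such as commas and periods.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique words so that word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group the original values under their shared key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Hypothetical messy column of place names.
names = ["New York", " new york ", "York, New", "Boston", "boston."]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each group lists the variants that a person would then review and merge into one preferred spelling. That review step matters, because a key this simple can also group values that are genuinely different.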
Trans-Atlantic Slave Trade Database:
Take about 20 minutes to review Voyages: The Trans-Atlantic Slave Trade Database at http://slavevoyages.org/.
- What are the goals of this digital project? Who is the primary audience? What are the assets of this site?
- What kind of data is available on this site, and how might it be useful for scholarly research?
- What else might be possible with this type of data set as a starting point?
Take a look at this summary we created to help walk you through the site: https://labs.ssrc.org/dds/articles/project-lens-trans-atlantic-slave-trade-database.