Digital Data Management - Linguistic Module - Expanded Materials - Learn and Teach
American Anthropological Association

Expanded Materials for Digital Data Management in Linguistic Anthropology

Arienne M. Dwyer

Recommended citation: Dwyer, Arienne M. “Expanded Materials for Digital Data Management in Linguistic Anthropology.” Arlington, VA: American Anthropological Association, 2016.
© American Anthropological Association 2016

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

These materials expand on those presented in:
Dwyer, Arienne M. “Linguistic Anthropology: Principles and Practices of Digital Data Management.” In Bringing Digital Data Management Training into Methods Courses for Anthropology, edited by Blenda Femenías. Arlington, VA: American Anthropological Association, 2016.

Project support: National Science Foundation, Workshop Grant 1529315; Jeffrey Mantz, Program Director, Cultural Anthropology

Aim: To introduce best practices in data management for researchers in linguistic anthropology.

Timeline: This unit can serve either as a one- or two-week standard university course or a short-term (e.g., 1.5-day) intensive workshop.

Target audience in any country: (Post-)Graduate students before and after data collection; postdoctoral researchers; early- and mid-career faculty; community members; collaborative projects.

Table of Contents

Course Guide
Course Content
1. Ensuring the future of your data and avoiding catastrophe
2. The basics: Working with data
3. What are our responsibilities?
4. Archiving and re-use of data
5. Making the most of your data (Optional additional unit)

Course Guide
This course guide may be used as a course introduction.

  1. Data management is crucial for good research. Scholarship creates a range of data forms that require converting, analyzing, storing, and sharing. As scholars, we have a responsibility to make sure that data endure into the future in accessible formats. These data are gathered by researchers (often from participants), and usually receive input from many people. We are therefore also responsible for the ethical, legal, and intellectual property issues arising from these data, including proper attribution/anonymization and adherence to the conventions and laws of relevant locales. Archiving and sharing the output of research in print and online venues (a.k.a. publication) requires attending to best practices in data management. Most research funders now require a data management plan, so that your results and data will be enduring and public.
  2. Beyond scope: This course does not cover methods of obtaining grants or human subjects permission. It is not a tutorial on intellectual property or other national or international laws. It is also not a guide to digitization or format conversion. Some further resources on these topics are found in the references and appendices; some will need to be added.
  3. Data management and archiving begin at research design, not after the data are collected.
  4. Ethics begins at research design: we have a responsibility to plan and carry out research in partnership with a community; to ensure that the work benefits that community, as well as our institutions, our funders, and ourselves; and to ensure that our work conforms to local moral and ethical practices, institutional regulations, and national and international laws.
  5. Data types in linguistic anthropology: Linguistic anthropological methods reflect the inter-disciplinary nature of the sub-discipline, and overlap with qualitative and quantitative methods for linguistics (e.g., documentary linguistics, sociolinguistics, discourse analysis, cognitive tasks) and cultural anthropology (participant observation, interviewing, surveying, and reflexivity). Linguistic anthropologists are likely to work with human subjects. They are likely to generate data that include any or all of the following: audio and/or video (A/V) recordings and transcriptions of them (often with translation and grammatical annotation, which is sometimes time-aligned); notes, sketches, images (photographs, maps [georectified or not], and diagrams), spatial data, artifacts and other physical data; websites, blogs, emails or other Internet-based communication; ultrasound and MRI; word lists, grammatical paradigms, sentences, grammaticality judgments of them, and texts; the texts may include printed, handwritten, or electronic texts, questionnaires, surveys, and inscriptions, as well as metadata about these primary research data. These data may be digital or non-digital, structured or unstructured.
  6. Pretty good practice is good enough. Good practices are not out of reach; don't let “best practices” or this course keep you from learning pretty good practice (EMELD 2006).
  7. Key data management practices are common to all anthropologists. To maximize access to and ethical use of anthropological data, the AAA advocates unified guidelines for data management areas in common among all sub-disciplines. Data management methods common to all anthropologists can be summarized in four points:
    1. Data should be put into an enduring format.
    2. Data should be discoverable via metadata.
    3. Data should be archived.
    4. Data gathering, archiving, and dissemination should be fully consultative and with the permission of involved participants.

Course Content

The course sessions present an overview of the key issues in data management via a tip-of-the-iceberg approach: the digital data workflow from research design and project planning, through data creation, management, and analysis, to preservation, reusability, and publication. Each of the five numbered units (four main units and one optional unit) can constitute one or more class session(s); the contents can be covered in more or less detail depending on available time. Each unit requires participants to come up with examples from their own experience and apply that unit's concepts to those examples. Ideally, the instructor (and/or a future version of this course) would also provide use cases for each unit.

1. Ensuring the future of your data and avoiding catastrophe

Aim: To provide an overview of the issues; each bullet is relevant, no matter what the project. Each of these introductory issues is revisited later in the course. Bullet points can be exemplified by the instructor.


Further reading: Linguistic Data Consortium DMP resources


Exercise: Create a first draft of a Data Management Plan (1-2 pp.) and answer the two reflection questions.

Reflection questions

2. The basics: Working with data

Aim: To introduce the many data types and formats, and to describe the minimum a researcher must do to create enduring data. Part two (regarding software) will need regular updating.

2.1. Data and metadata

Projects that create digital data during research (“in the field”) may need immediate storage for large files (e.g. A/V recordings and their associated metadata). DMPs describe each “field” and archival data type, and follow best practices for storing originals and altered versions.


“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” (NISO 2004).
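To make the NISO definition concrete, the following is a minimal sketch of a metadata record for a single recording, written in Python. The field names are an assumption, loosely modeled on Dublin Core element names; they are not a schema prescribed by this course, and the record contents are invented for illustration. The helper `missing_fields` is likewise hypothetical.

```python
import json

# Hypothetical minimal field set, loosely modeled on Dublin Core element names.
REQUIRED_FIELDS = ("title", "creator", "date", "language", "format", "rights")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented example record describing one audio recording.
record = {
    "title": "Interview 03, market narratives",
    "creator": "A. Researcher",
    "date": "2016-05-30",
    "language": "und",  # ISO 639 code for an undetermined language
    "format": "audio/x-wav",
    "rights": "",  # access conditions not yet agreed with participants
}

print(missing_fields(record))  # ['rights']
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Keeping the record as structured text (here, JSON) rather than as free-form notes is what allows it to be checked, searched, and later mapped onto whatever schema an archive requires.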

Data quality

Data capture, regularization, and organization

Data standards and conventions

Organize data and metadata (spreadsheets, databases, structured text, audio, video, etc.)

Document the above activities and conventions

Need more information?

Well-formed data is enduring data.
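If "well-formed" is read in the narrow XML sense (an assumption; the phrase may also be meant more broadly as "consistently structured"), a file can be checked mechanically before it is archived. The sketch below uses Python's standard library parser; the function name `is_well_formed` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the string parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented snippets: the second one never closes its <line> element.
print(is_well_formed("<text><line>first utterance</line></text>"))
print(is_well_formed("<text><line>first utterance</text>"))
```

A check like this only confirms that the markup is syntactically sound; it says nothing about whether the content follows a project's own conventions, which still need to be documented.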

Discussion exercises of data and metadata case studies:

Instructor: can provide examples of data/metadata in discourse analysis, documentary ethnolinguistics, language socialization


2.2. Tools (software)

Specific tools rapidly become obsolete; these will need regular updating. In any case, open-source tools that allow maximal re-use are preferable.

How tools fit into the workflow; proprietary compared to open-source tools

Digital Humanities tools of use to linguistic anthropologists

At start of project: get the necessary tools that allow you to collect, organize and work with digital data.

Optional topic: Further workflow: corpus development, lexicon, interlinear glossed texts

Discussion exercises

3. What are our responsibilities?

Aim: To discuss researchers' ethical and legal responsibilities and Intellectual Property Rights. It is best to confront these issues during project planning, well before an IRB application. Attention to ethics equals good data. Also emphasized are the limits of Open Access: full consultation with communities forms the basis for solid data management plans and sharing arrangements that align with community norms.


Exercise: Closed/Limited/Open Access debate:

Imagine or enact a role-playing debate between people who take strong positions on the issue of protecting community knowledge from exploitation vs. “all information wants to be free.” Bring up the strongest arguments for each position (with real-life examples), and then see how a compromise position that addresses all needs can best be reached. Possible roles (who may argue any one or multiple sides of the debate): Indigenous community elders, indigenous linguists, a digital humanist or corpus linguist, an NSF representative, a PhD student (indigenous of the community, indigenous of another community, or non-indigenous), a specialist professor, the university IRB, etc.

4. Archiving and re-use of data

Why should I archive?

Data care (backup and data protection):
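One common way to confirm that a backup copy is intact is to compare checksums of the original and the copy. The following is a minimal sketch in Python, assuming SHA-256 as the checksum; the function names `sha256_of` and `backup_matches` are hypothetical, and this is one possible approach rather than a procedure prescribed by this course.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks
    so that large A/V files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True if the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the digest of each file at the time of creation, alongside its metadata, also makes it possible to detect silent corruption later, even when only one copy is at hand.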

Key archives for linguistic anthropologists; see Appendices

Archival Concepts:

Mobilization: Re-using your outputs


5. Making the most of your data (Optional additional unit)

Below five separate topics are outlined, whose only commonality is that they are beyond introductory. Each topic awaits further development.

  1. Using Regular Expressions (RegEx) to convert data into new forms
  2. Working collaboratively at great distance, such as via remote data access or collaboration environments.
  3. Making data and websites accessible to people of all abilities (e.g. colorblind, hearing/sight impaired, multilingual, non-English speaker, elderly etc.), and to people with slow internet connections. See the W3C's Web Accessibility Initiative recommendations.
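As an illustration of the first topic, the sketch below uses a regular expression in Python to convert word-list lines into tab-separated fields. The input convention (form, gloss in single quotes, part of speech in parentheses), the sample entries, and the function name `to_tsv` are all assumptions made for this example.

```python
import re

# Hypothetical input convention: form, 'gloss' in single quotes, (part of speech).
ENTRY = re.compile(r"^(?P<form>\S+)\s+'(?P<gloss>[^']+)'\s+\((?P<pos>[^)]+)\)$")


def to_tsv(line):
    """Convert one word-list line to tab-separated form, gloss, and part of speech.
    Returns None when the line does not follow the expected convention."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return "\t".join(match.group("form", "gloss", "pos"))


# Illustrative entries; the last one is deliberately malformed.
lines = ["su 'water' (n.)", "kel- 'come' (v.)", "malformed entry"]
for line in lines:
    print(repr(to_tsv(line)))
```

Returning None for lines that do not match, instead of guessing, leaves irregular entries visible so that they can be corrected by hand.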



Websites mentioned and/or linked in this document do not necessarily represent the views of the author. Commercial websites mentioned and/or linked here are intended as examples, and do not represent the endorsement of the author.

Case Study: File Formats. Stanford: Stanford University Libraries, n.d.

Digital Preservation 101. Digital POWRR [Preserving (Digital) Objects with Restricted Resources], 2012.

DuBois, John W. Transcription in Action: Resources for the Representation of Linguistic Interaction. Santa Barbara: University of California, 2006.

Dwyer, Arienne M. “Ethics and Practicalities of Cooperative Fieldwork and Analysis.” In Fundamentals of Language Documentation: A Handbook, edited by Jost Gippert, Ulrike Mosel, and Nikolaus Himmelmann, 31-66. Berlin: Mouton de Gruyter, 2006.

EMELD [Electronic Metastructure for Endangered Languages Data] 2006. Working Group 1 report on Collecting Primary Texts, 2016. (Marianna Di Paolo, Gary Holton, Susan Smith, Arienne Dwyer, Steve Moran, Doug Whalen, Julia Good Fox, and Barbara Need.)

IPinCH [Intellectual Property Issues in Cultural Heritage Project]. Think before You Appropriate: Things to Know and Questions to Ask in Order to Avoid Misappropriating Indigenous Cultural Heritage. Vancouver: Simon Fraser University, 2016.

IPinCH. Traditional Knowledge Fact Sheet. Vancouver: Simon Fraser University, 2016.

Leipzig Glossing Rules. Leipzig, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, 2015.

Levine, Melissa. “Policy, Practice and Law.” In DH Curation Guide: A Community Resource Guide to Data Curation in the Digital Humanities. 2016.

Library of Congress. Sustainability of Digital Formats: Planning for Library of Congress Collections. 2013.

Library of Congress. 2013. Recommended Formats Statement. 2015-2016.

Newman, Paul. “Copyright Essentials for Linguists.” Language Documentation & Conservation 1(1) (June 2007): 28-43.

National Information Standards Organization. Understanding Metadata. Bethesda: NISO, 2004.

OLAC. Metadata. Open Language Archives Community, 2008.

Simons, Gary F. “Ensuring That Digital Data Last: The Priority of Archival Form over Working Form and Presentation Form.” SIL Electronic Working Papers 2006-003.

UNESCO. Best practices on Indigenous Knowledge. UNESCO, Management of Social Transformations Programme, n.d.

Van den Eynden, Veerle, and Libby Bishop. Incentives and Motivations for Sharing Research Data, a Researcher’s Perspective. A Knowledge Exchange Report. 2014.

van Driem, George. “Endangered Language Research and the Moral Depravity of Ethics Protocols.” Language Documentation & Conservation 10 (2016): 243-52.

W3C [World Wide Web Consortium]. Data on the Web Best Practices. Latest published version, 2016.

W3C [World Wide Web Consortium]. WAI: Web Accessibility Initiative. 2016. http//


General Resources for all anthropologists


Modules: Writers, Arienne M. Dwyer, Blenda Femenías, Lindsay Lloyd-Smith, Kathryn Oths, George H. Perry; Editor, Blenda Femenías

Discussants: Workshop One, February 12, 2016: Andrew Asher, Candace Greene, Lori Jahnke, Jared Lyle, Stephanie Simms; Workshop Two, May 13, 2016: Phillip Cash Cash, Jenny Cashman, Ricardo B. Contreras, Sara Gonzalez, Candace Greene, Christine Mallinson, Ricky Punzalan, Thurka Sangaramoorthy, Darlene Smucny, Natalie Underberg-Goode, Fatimah Williams Castro, Amber Wutich

American Anthropological Association:
Executive Director, Edward Liebow
Project Manager, Blenda Femenías
Research Assistant, Brittany Mistretta
Executive Assistant, Dexter Allen
Professional Fellow, Daniel Ginsberg
Web Services Administrator, Vernon Horn
Director, Publishing, Janine Chiappa McKenna

Author note: Feedback on this document is welcome. Previous versions: draft v.1, 2016-01-13; draft v.2, 2016-03-13; draft v.3, 2016-05-30; draft v.3.5, 2016-06-10. This document has benefitted from and incorporated the specific comments of Phillip Cash Cash, Jenny Cashman, Fatimah Williams Castro, Sara Gonzales, Candace Greene, Jared Lyle, Christine Mallinson, Ricardo Punzalan, Thurka Sangaramoorthy, and Stephanie Simms. Naturally, the author is responsible for any errors or infelicities.
