Digital Data Management - Linguistic Module - Learn and Teach
Digital Data Management - Linguistic Module

[Slide 1]

Bringing Digital Data Management Training into Methods Courses for Anthropology

Linguistic Anthropology: Principles and Practices of Digital Data Management

Arienne M. Dwyer

[Slide 2]

Recommended citation:

Dwyer, Arienne M. “Linguistic Anthropology: Principles and Practices of Digital Data Management.” In Bringing Digital Data Management Training into Methods Courses for Anthropology, edited by Blenda Femenías. Arlington, VA: American Anthropological Association, 2016.

© American Anthropological Association 2016

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Bringing Digital Data Management Training into Methods Courses for Anthropology is a set of five modules:

General Principles and Practices of Digital Data Management
Archaeology: Principles and Practices of Digital Data Management
Biological Anthropology: Principles and Practices of Digital Data Management
Cultural Anthropology: Principles and Practices of Digital Data Management
Linguistic Anthropology: Principles and Practices of Digital Data Management

Project support: National Science Foundation, Workshop Grant 1529315; Jeffrey Mantz, Program Director, Cultural Anthropology

[Slide 3]


  1. Review of material from “General principles and practices” module
  2. Key data management practices
  3. Ensuring the future of your data and avoiding catastrophe
  4. The basics: Working with data
  5. What are our responsibilities?
  6. Archiving and re-use of data
  7. Exercises
  8. Instructor notes
  9. References
  10. Additional resources
  11. Acknowledgments

[Slide 4]

Review of material from “General principles and practices” module

[Slide 5]

Key data management practices

Key data management practices can be summarized in four points:

  1. Data should be put into an enduring format.
  2. Data should be discoverable via metadata.
  3. Data should be archived.
  4. Data gathering, archiving, and dissemination should be fully consultative and done with the permission of involved participants.

[Slide 6]

Ensuring the future of your data

Workflow: The data lifecycle

What constitutes data?

[Image: Map, Native Aymara language domain, Peru-Bolivia-Chile. Credit: By Haylli.]

[Optional in-class exercise: Discussion of data workflow]


[Slide 7]

Ensuring the future of your data

Archival, working, and presentation data formats: a useful distinction (Simons 2006)

[Slide 8]

Ensuring the future of your data

Ethical and legal considerations (overview)

Common data disasters include

[Slide 9]

Ensuring the future of your data
Data management plans

Data management plans (DMPs) are closely linked to the data lifecycle through

(On DMPs, see Linguistic Data Consortium 1992-2016)

[Optional in-class exercise: Discuss data management plans]
[Outside-class exercise: Draft a data management plan]

[Slide 10]

Ensuring the future of your data:
Data management plans

What's in a DMP?

[Slide 11]

Ensuring the future of your data:
Data management plans

[Image] From left: Bill Toledo, Robert Walley and Alfred Newman, World War II Navajo code talker veterans, April 12, 2013, at Luke Air Force Base exchange [Credit]


[Slide 12]

The basics: Working with data
Data types in linguistic anthropology

Linguistic anthropological methods

Linguistic anthropologists are likely to


[Slide 13]

The basics: Working with data
Digital and non-digital data

Linguistic anthropological data may be digital or non-digital, structured or unstructured.

Born-digital data include “field” recordings, images, geolocations, and experimental data.

[Slide 14]

The basics: Working with data
Non-digital data

Non-digital data include print, an extant collection or archive, and pre-existing dictionaries and grammars

[Slide 15]

The basics: Working with data

“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (NISO 2004).
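As an illustration of the NISO definition, the sketch below stores descriptive metadata for one recording as a small structured record and checks it for gaps. It is a minimal sketch, not a prescribed schema: the field names are borrowed from Dublin Core element names, while every value, the `REQUIRED` list, and the helper names `missing_fields` and `to_sidecar_json` are assumptions made for this example.

```python
import json

# Hypothetical descriptive metadata for one field recording.
# Field names follow Dublin Core element names; all values are invented.
record = {
    "title": "Narrative about spring planting",
    "creator": "Local Researcher B",
    "contributor": "Speaker A",
    "date": "2016-05-13",
    "language": "aym",  # ISO 639-3 macrolanguage code for Aymara
    "type": "Sound",
    "format": "WAV (uncompressed PCM)",
    "rights": "Access restricted; see consent agreement",
    "identifier": "aym_20160513_spkA_001.wav",
}

# Assumed minimum for this example; a real archive defines its own list.
REQUIRED = ("title", "creator", "date", "language", "format", "rights")


def missing_fields(rec, required=REQUIRED):
    """Return the required fields that are absent or empty in rec."""
    return [name for name in required if not rec.get(name)]


def to_sidecar_json(rec):
    """Serialize the record as human-readable JSON, keeping non-ASCII text."""
    return json.dumps(rec, ensure_ascii=False, indent=2, sort_keys=True)


if __name__ == "__main__":
    print("Missing fields:", missing_fields(record))
    print(to_sidecar_json(record))
```

Saving such a record in a plain-text file next to the recording keeps the description readable without special software.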

Data quality

[In-class exercise: Discussion of data]

[Slide 16]

The basics: Working with data
Data capture, regularization, and organization

[Image] IPA chart-vowels translated into Basque
[Credit] By Gorkaazk


[Slide 17]

The basics: Working with data

Communities share certain linguistic conventions, including orthographic and disciplinary ones.


[Slide 18]

The basics: Working with data
Data standards and conventions

Organize data and metadata

Document the above activities and conventions (see Digital Preservation 101 [2012] and Case Study [n.d.])

Well-formed data are enduring data.
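One way to document and enforce a naming convention is to write it down as a pattern that software can check. The sketch below assumes a purely hypothetical convention, `<language code>_<YYYYMMDD>_<speaker ID>_<3-digit sequence>.<extension>`; the pattern and the helper names `parse_name` and `nonconforming` are illustrative only, and any project would substitute its own documented convention.

```python
import re

# Hypothetical convention (an assumption for this example):
# <language code>_<YYYYMMDD>_<speaker ID>_<3-digit sequence>.<extension>
NAME_PATTERN = re.compile(
    r"(?P<lang>[a-z]{3})_"
    r"(?P<date>\d{8})_"
    r"(?P<speaker>[A-Za-z0-9]+)_"
    r"(?P<seq>\d{3})\."
    r"(?P<ext>[a-z0-9]+)"
)


def parse_name(filename):
    """Return the name's components as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


def nonconforming(filenames):
    """Return the file names that do not follow the convention."""
    return [name for name in filenames if parse_name(name) is None]


if __name__ == "__main__":
    names = ["aym_20160513_spkA_001.wav", "interview final (2).WAV"]
    print(parse_name(names[0]))
    print("Needs renaming:", nonconforming(names))
```

Because the components are recoverable from the name itself, a file that is separated from its folder or its metadata can still be identified.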

[Slide 19]

The basics: Working with data
Tools (software)

[Slide 20]

The basics: Working with data
Tools (software)

Digital Humanities tools of use to linguistic anthropologists

[Slide 21]

What are our responsibilities?

Ethical and possibly legal obligation: Consent, whether oral, written, or both; and attribution of value

[Slide 22]

What are our responsibilities?

Ethical and moral obligations

Sharing of data and research results is obligatory in two contexts:

[Optional in-class exercise: Discussion of access]

[Slide 23]

What are our responsibilities?

Legal obligations: Numerous committees and organizations review and monitor projects involving “human subjects.”

Moral obligations, beyond what the IRB requires:

[Optional in-class exercise: Discussion of obligations]

[Slide 24]

What are our responsibilities?


[Slide 25]

Archiving and re-use of data

Why should I archive?

Data care: backup and data protection
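A backup protects data only if the copy matches the original. The sketch below shows one common way to check this, by comparing SHA-256 checksums of every file in a working folder against its backup. It is a minimal sketch using only the Python standard library; the function names and the folder layout are assumptions for illustration, not a procedure prescribed by this module.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_backup(original, backup):
    """Return relative paths that are missing from, or differ in, the backup."""
    source = build_manifest(original)
    copy = build_manifest(backup)
    return sorted(name for name, digest in source.items() if copy.get(name) != digest)
```

Calling `compare_to_backup("fieldwork", "backup/fieldwork")` (both folder names hypothetical) returns an empty list when every original file has an identical copy.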

[Slide 26]

Archiving and re-use of data

Archival concepts

Use the “How to Deposit” guidelines of existing archives to learn more about

[Slide 27]

Archiving and re-use of data

Key archives for linguistic anthropologists: An alarmingly short list

Open Language Archives Community, OLAC, participating archives,

Mobilization: Re-using your outputs

[Slide 28]

In-class exercises: The basics: Working with data (Slide 15)

Discussion questions:

  1. How do you create data? In what formats? How is it organized?
  2. If you start out a project with the goal of open access (meaning that at least some of the primary data must be publicly accessible),
  3. Name at least one born-digital data type and one non-digital data type that you might use.
  4. What issues should your digital data management plan take into account? How might these issues differ for born-digital vs. non-digital data?
  5. Suppose a Martian (who speaks your language) discovers your miraculously preserved data in 100 years. What documentation would you need to include to make sure the Martian can open and understand your data?

[Slide 29]

Outside-class exercise: Create a first draft of a data management plan (Slide 9)

Objectives: Draft a DMP (1-2 pp.) and answer two reflection questions.

Use an existing DMP as a kind of a checklist, to make sure all elements are included. (See sample DMPs from NEH and NSF available at

  1. Who is your target: what institution's DMP will you use? Funding agency, First Nations/tribal institution, NSF Doctoral Dissertation Improvement Grant (DDIG)
  2. What elements must be included? Permission letter from community; permission letter from archive; acceptable archival formats; research locale and community; research scope; data backup and sustainability
  3. What elements are specific to your project? Community-specific data access and collaboration requirements; community research product desiderata; political or social considerations; special data types; special strategies to protect media in a particularly humid or cold climate.
  4. Your DMP will likely include most of the elements listed on Slides 10 and 11.

Reflection questions:

  1. What difficulties (logistical, methodological, or other) did you identify in the process?
  2. Do the above requirements change your project design? If so, how?

[Slide 30]

Optional in-class exercises

Slide 6. Ensuring the future: Data workflow

Discussion: Give examples. Brainstorm data’s path through the workflow.

Slide 7. Ensuring the future: Formats


  1. Is a word processing document (e.g., .docx, .odt) an Archival, Working, or Presentation format? An mp3 audio file? A .tiff image?
    Note: Many A/V formats can be more or less compressed, and for archival purposes, less compressed (less lossy) is always better.
  2. Why might an archival format be awkward to work with or share?
  3. Why might a presentation format be a poor choice for archiving?

[Slide 31]

Optional in-class exercise: Ensuring the future: Data management plans (Slide 9)

Discussion of data management plans

Video and notes of Speaker A were recorded with consent by Local Researcher B and analyzed by graduate student C; these need to be placed on Professor D's website.

Scenario 1:

  1. What will go into the DMP about crediting participants in a research paper?
  2. How about when publishing the video online?

Scenario 2:

  1. How will the DMP attend to changes in consent? Examples: a consenting participant wants to be anonymized or recognized; a consenting participant withdraws consent; community leaders or a participant stipulate(s) that part of the archived materials be closed to the public.
  2. How does your own relationship with the community affect your research design?

[Slide 32]

Optional in-class exercise: What are our responsibilities? (Slides 22, 23)

Ethical, legal, and moral obligations

  1. Instructor presents examples of legal actions that are ethically dubious and ethical actions that are potentially illegal for the students to debate. Course participants then present examples of how they have shared or will share data in the two contexts, and discuss how intellectual property rights and legal issues interact with their obligations as researchers.
  2. Name at least two locally appropriate steps that can be taken to ensure shared data access by the language community.

Closed/Limited/Open Access debate:

  1. Imagine or enact a role-playing debate between people who take strong positions on the issue of protecting community knowledge from exploitation vs. “all information wants to be free.”
  2. Bring up the strongest arguments for each position (with real-life examples), and then see how a compromise position that addresses all needs can best be reached.
  3. Possible roles (who may argue any one or multiple sides of the debate): Indigenous community elders, indigenous linguists, a digital humanist or corpus linguist, an NSF representative, a PhD student (indigenous of the community, indigenous of another community, or non-indigenous), a specialist professor, the university IRB, etc.
  4. When might we or our language consultants not want to share data?
  5. Will wide data-sharing lead to researchers being “scooped” (i.e., having someone publish your intellectual property before you do)?
  6. Not all data users will be uni-disciplinary linguists or even academics; what steps can be taken to make the data maximally accessible to and interesting for multidisciplinary groups as well as non-academics (such as those in public policy, NGOs, unrelated language communities looking for a possible model, and the public)?
  7. Data collection can be faulty and preservation imperfect. What steps can be taken to mitigate mistakes?

[Slide 33]

Instructor notes: Organization and Key data management practices (Slides 3, 5)

For additional information, see also Arienne M. Dwyer, “Expanded Materials for Digital Data Management in Linguistic Anthropology.” Arlington, VA: American Anthropological Association, 2016. Available at

Aim: To introduce best practices in data management for researchers in linguistic anthropology.

Introduction: Data management is crucial for good research. Scholarship creates a range of data forms that require converting, analyzing, storing, and sharing. As scholars, we have a responsibility to make sure that data endure into the future in accessible formats. These data are gathered by researchers (often from participants), and usually receive input from many people. We are therefore also responsible for the ethical, legal, and intellectual property issues arising from these data, including proper attribution/anonymization and adhering to conventions and laws of relevant locales. Archiving and sharing the output of research in print and online venues (a.k.a. publication) requires attending to best practices in data management. Most research funders now require a data management plan so that your results and data are enduring and public.

Beyond scope: This module does not cover methods of obtaining grants or human subjects permission. It is not a tutorial on intellectual property or other national or international laws. It is also not a guide to digitization or format conversion. (See References and Additional Resources.)

Data management and archiving begin at research design, not after the data are collected.

Ethics begins at research design: we have a responsibility to plan and carry out research in partnership with a community; to ensure that the work benefits that community, as well as our institutions, our funders, and ourselves; and to ensure that our work conforms to local moral and ethical practices, institutional regulations, and national and international laws.

[Slide 34]

Instructor notes: Key data management practices (Slide 5) and
Ensuring the future of your data (Slides 6, 8)

Pretty good practice is good enough. Good practices are not out of reach. Don’t let “best practices” or this module keep you from learning pretty good practice (Di Paolo et al. 2006).

Module content: The module presents an overview of the key issues in data management via a tip-of-the-iceberg approach, following the digital data workflow from research design and project planning, through data creation, management, and analysis, to preservation, reusability, and publication. The contents of each unit can be covered in more or less detail depending on available time. The accompanying exercises ask participants to come up with examples from their own experience and apply that unit's concepts to those examples. Ideally, the instructor will also provide use cases for each unit.

Ensuring the future of your data (Slide 6): Aim: To provide an overview of the issues; each bullet is relevant, no matter what the project. Each of these introductory issues is revisited later in the course. Bullet points can be exemplified by the instructor.

Ensuring the future (Slide 8). Common data disasters and workflow inefficiencies (exemplified by instructor)

[Slide 35]

Instructor notes: The basics: Working with data (Slide 12) and
The basics: Working with data: Naming and conventions (Slide 18)

Overview, Slide 12: Aim: To introduce the many data types and formats, and to describe the minimum a researcher must do to create enduring data. Information regarding software will need regular updating.

Naming and conventions (Slide 18): detailed information

Well-formed data:

[Slide 36]

Instructor notes: What are our responsibilities? (Slides 21–23)

Aim: To discuss researchers' ethical and legal responsibilities and Intellectual Property Rights. It is best to confront these issues during project planning, well before an IRB application. Attention to ethics equals good data. Also emphasized are the limits of Open Access: full consultation with communities forms the basis for solid data management plans and sharing arrangements that align with community norms.

Community access: The community, not just the researcher, determines access, in terms of who has access, and how much access.

Moral obligations, additional elements

[Slide 37]

Instructor notes: What are our responsibilities? (Slide 23)

Agreements with stakeholders

[Slide 38]

References


Case Study: File Formats. Stanford: Stanford University Libraries, n.d.

Dwyer, Arienne M. “Ethics and Practicalities of Cooperative Fieldwork and Analysis.” In Fundamentals of Language Documentation: A Handbook, edited by Jost Gippert, Ulrike Mosel, and Nikolaus Himmelmann, 31-66. Berlin: Mouton de Gruyter, 2006.

Di Paolo, Marianna, with the assistance of Gary Holton, Susan Smith, Arienne Dwyer, Steve Moran, Doug Whalen, Julia Good Fox, and Barbara Need. “Collecting Primary Texts, Working Group 1 Report.” In Proceedings of 2006 E-MELD Workshop. 2006.

Digital Preservation 101. Digital POWRR [Preserving (Digital) Objects with Restricted Resources], 2012.

DuBois, John W. Transcription in Action: Resources for the Representation of Linguistic Interaction. Santa Barbara: University of California, 2006.

IPinCH [Intellectual Property Issues in Cultural Heritage Project]. Think before You Appropriate: Things to Know and Questions to Ask in Order to Avoid Misappropriating Indigenous Cultural Heritage. Vancouver: Simon Fraser University, 2016.

IPinCH. Traditional Knowledge Fact Sheet. Vancouver: Simon Fraser University, 2016.

Leipzig Glossing Rules. Leipzig, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, 2015.

Levine, Melissa. “Policy, Practice and Law.” In DH Curation Guide: A Community Resource Guide to Data Curation in the Digital Humanities. 2016.

Library of Congress. Sustainability of Digital Formats: Planning for Library of Congress Collections. 2013.

Linguistic Data Consortium. Data Management Plans. Philadelphia: University of Pennsylvania, 1992-2016.

Newman, Paul. “Copyright Essentials for Linguists.” Language Documentation & Conservation 1(1) (June 2007): 28-43.

[Slide 39]


National Information Standards Organization. Understanding Metadata. Bethesda: NISO, 2004.

OLAC. Metadata. Open Language Archives Community, 2008.

Simons, Gary F. “Ensuring That Digital Data Last: The Priority of Archival Form over Working Form and Presentation Form.” SIL Electronic Working Papers 2006-003.

Traditional Ecological Knowledge. Washington, DC: Society for Ecological Restoration, 2016.

Copyright FAQ. Lexington: Department of Linguistics, University of Kentucky, n.d.

Best practices on Indigenous Knowledge. UNESCO, Management of Social Transformations Programme, n.d.

van Driem, George. “Endangered Language Research and the Moral Depravity of Ethics Protocols.” Language Documentation and Conservation 10 (2016): 243-52.

Additional resources

See also “Resources for Anthropological Digital Data Management” available at

Ethics and rights

American Anthropological Association. Principles of Professional Responsibility. 2012.

American Sociological Association. ASA Code of Ethics. 1999.

Linguistic Society of America. Ethics Statement. 2009.

Panel on Research Ethics. TCPS 2: Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans. Government of Canada, 2014.

World Archaeological Congress. Code of Ethics. 2016.

[Slide 40]

Acknowledgments


Modules: Writers, Arienne M. Dwyer, Blenda Femenías, Lindsay Lloyd-Smith, Kathryn Oths, George H. Perry; Editor, Blenda Femenías

Discussants: Workshop One, February 12, 2016: Andrew Asher, Candace Greene, Lori Jahnke, Jared Lyle, Stephanie Simms
Workshop Two, May 13, 2016: Phillip Cash Cash, Jenny Cashman, Ricardo B. Contreras, Sara Gonzalez, Candace Greene, Christine Mallinson, Ricky Punzalan, Thurka Sangaramoorthy, Darlene Smucny, Natalie Underberg-Goode, Fatimah Williams Castro, Amber Wutich
American Anthropological Association:
Executive Director, Edward Liebow
Project Manager, Blenda Femenías
Research Assistant, Brittany Mistretta
Executive Assistant, Dexter Allen
Professional Fellow, Daniel Ginsberg
Web Services Administrator, Vernon Horn
Director, Publishing, Janine Chiappa McKenna
