American Anthropological Association
Expanded Materials for Digital Data Management in Linguistic Anthropology
Arienne M. Dwyer
Recommended citation: Dwyer, Arienne M. “Expanded Materials for Digital Data Management in Linguistic Anthropology.” Arlington, VA: American Anthropological Association, 2016.
© American Anthropological Association 2016
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
These materials expand on those presented in:
Dwyer, Arienne M. “Linguistic Anthropology: Principles and Practices of Digital Data Management.” In Bringing Digital Data Management Training into Methods Courses for Anthropology, edited by Blenda Femenías. Arlington, VA: American Anthropological Association, 2016.
Project support: National Science Foundation, Workshop Grant 1529315; Jeffrey Mantz, Program Director, Cultural Anthropology
Aim: To introduce best practices in data management for researchers in linguistic anthropology.
Timeline: This unit can serve either as a one- or two-week standard university course or a short-term (e.g., 1.5-day) intensive workshop.
Target audience in any country: (Post-)Graduate students before and after data collection; postdoctoral researchers; early- and mid-career faculty; community members; collaborative projects.
Table of Contents
1. Ensuring the future of your data and avoiding catastrophe
2. The basics: Working with data
3. What are our responsibilities?
4. Archiving and re-use of data
5. Making the most of your data (Optional additional unit)
This course guide may be used as a course introduction.
- Data management is crucial for good research. Scholarship creates a range of data forms that require converting, analyzing, storing, and sharing. As scholars, we have a responsibility to make sure that data endure into the future in accessible formats. These data are gathered by researchers (often from participants), and usually receive input from many people. We are therefore also responsible for the ethical, legal, and intellectual property issues arising from these data, including proper attribution/anonymization and adhering to conventions and laws of relevant locales. Archiving and sharing the output of research in print and online venues (a.k.a. publication) requires attending to best practices in data management. Most research funders now require a data management plan, in order that your results and data be enduring and public.
- Beyond scope: This course does not cover methods of obtaining grants or human subjects permission. It is not a tutorial on intellectual property or other national or international laws. It is also not a guide to digitization or format conversion. Some further resources on these topics are found in the references and appendices; some will need to be added.
- Data management and archiving begin at research design, not after the data are collected.
- Ethics begins at research design: we have a responsibility to plan and carry out research in partnership with a community; to ensure that the work benefits that community, as well as our institutions, our funders, and ourselves; to ensure that our work conforms to local moral and ethical practices, institutional regulations, as well as national and international laws.
- Data types in linguistic anthropology: Linguistic anthropological methods reflect the inter-disciplinary nature of the sub-discipline, and overlap with qualitative and quantitative methods for linguistics (e.g., documentary linguistics, sociolinguistics, discourse analysis, cognitive tasks) and cultural anthropology (participant observation, interviewing, surveying, and reflexivity). Linguistic anthropologists are likely to work with human subjects. They are likely to generate data that include any or all of the following: audio and/or video (A/V) recordings and transcriptions of them (often with translation and grammatical annotation, which is sometimes time-aligned); notes, sketches, images (photographs, maps [georectified or not], and diagrams), spatial data, artifacts and other physical data; websites, blogs, emails or other Internet-based communication; ultrasound and MRI; word lists, grammatical paradigms, sentences, grammaticality judgments of them, and texts; the texts may include printed, handwritten, or electronic texts, questionnaires, surveys, and inscriptions, as well as metadata about these primary research data. These data may be digital or non-digital, structured or unstructured.
- Pretty good practice is good enough. Good practices are not out of reach; don't let “best practices” or this course keep you from learning pretty good practice (EMELD 2006).
- Key data management practices are common to all anthropologists. To maximize access to and ethical use of anthropological data, the AAA advocates unified guidelines for data management areas in common among all sub-disciplines. Data management methods common to all anthropologists can be summarized in four points:
- Data should be put into an enduring format.
- Data should be discoverable via metadata.
- Data should be archived.
- Data gathering, archiving, and dissemination should be fully consultative and with the permission of involved participants.
The course sessions present an overview of the key issues in data management via a tip of the iceberg approach: the digital data workflow from research design and project planning to data creation, data management and analysis to preservation, reusability and publication. Each of the five numbered units (four main units and one optional unit) can constitute one or more class session(s); the contents can be covered in more or less detail depending on available time. Each unit requires participants to come up with examples from their own experience and apply that unit's concepts to those examples. Ideally, the instructor (and/or a future version of this course) would also provide use cases for each unit.
1. Ensuring the future of your data and avoiding catastrophe
Aim: To provide an overview of the issues; each bullet is relevant, no matter what the project. Each of these introductory issues is revisited later in the course. Bullet points can be exemplified by the instructor.
- Workflow: The data lifecycle
- What constitutes data? Participants give examples, brainstorm its path through the workflow.
- Data in, data out: input/output formats; planning for output at the time of input
- in situ (“field”) work or other research creates input (recordings, notes, maps, images)
- Analysis and archiving creates output (transcriptions, translations, annotations, lessons, articles, websites, books)
- Archival, Working, and Presentation data formats (Simons 2006): a useful distinction
- Archival: highest-quality, lossless (uncompressed), non-proprietary, structured
- Working: whatever format and size that allows for data analysis
- Presentation: optimized for sharing (e.g., web or print), so usually compressed audio-visual (A/V) and highly formatted text are preferable.
- Is a word processing document (e.g., .docx, .odt) an Archival, Working, or Presentation format? An mp3 audio? A .tiff image? N.B. Many A/V formats can be more or less compressed, and for archival purposes, less compressed (less lossy) is always better.
- Why might an archival format be awkward to work with or share? Why might a presentation format be a poor choice for archiving?
- Ethical and legal considerations (for details see Unit 3, What are our responsibilities?)
- Planning data management entails negotiation with research subjects and/or a community, about:
- Informed consent for the collecting, archiving, and sharing of data.
- Access to and availability of data.
- Issues of intellectual property.
- Common data disasters and workflow inefficiencies (exemplified by instructor):
- Losing the sole copy of one's data, by failing to back up one's in situ research
- solution: regular backup (LOCKSS)
- storing data in an obsolete proprietary format, such that it can no longer be read
- solution: use open formats and consider storing in multiple formats
- using a non-Unicode font, and later having a “character salad”
- solution: use a Unicode-based font (there are hundreds, but e.g. Arial Unicode)
- overwriting a file with an inferior copy with the same name
- solution: versioning (better) or unique file naming
- a collaborative team naming files every which way, including wedding.wav.
- solution: systematic file naming
- Data Management Plans (DMPs) and their link to the data lifecycle
- Concern sharing and protecting data and participants
- Motivations: funding requirement; dissertation/publication; better analysis; data reuse.
- Our responsibilities: sharing maximally and ethically
- Maximally, because a greater number of people can (1) re-use the data; (2) learn and benefit from your work; (3) learn about you and your scholarship; (4) benefit from your funder, which helps justify continued funding. (read more: Van den Eynden & Bishop 2014)
- Ethically, because we (1) enact the wishes of all project participants (especially those of the community, and including data access and credit); (2) attend to local, national, disciplinary, and funder ethical practices and codes; (3) do no harm, including unintentional harm.
- What's in a DMP?
- Ethics (research design, data gathering, collaboration, sharing, attribution, privacy, etc.)
- Data (raw and processed): What constitutes the data and its (required accompanying) metadata?
- Metadata and project documentation
- Data formats and needed tools: In what formats will the data and relevant tools be archived?
- Data sustainability and archiving plan: How exactly will the materials be archived and preserved?
- A named archive: At which archive will the materials be hosted?
- roles and responsibilities of participants: Who does what? How are they credited?
- Principles of access, attribution, and privacy for collaboration and sharing, including:
- production, access, and sharing of research products and their re-use: How and who? And how much? Who controls the data?
- relevant intellectual property rights (IPR): nation-states' perspective
- relevant IPR: Indigenous perspectives (e.g. Nichols et al. 2010)
- local, national and international laws
Further reading: Linguistic Data Consortium DMP resources
- Scenario 1:
- Video and notes of Speaker A were recorded with consent by Local Researcher B and analyzed by graduate student C, and need to be placed on Professor D's website.
- What will go into the DMP about crediting participants in a research paper?
- How about when publishing the video online?
- Scenario 2:
- How will the DMP attend to changes in consent? Examples: a consenting participant wants to be anonymized or recognized; a consenting participant withdraws consent; community leaders or a participant stipulate(s) that part of the archived materials be closed to the public.
- How does your own relationship with the community affect your research design?
Exercise: Create a first draft of a Data Management Plan (1-2 pp.) and answer the two reflection questions.
- Use an existing DMP (e.g., NSF, NEH) as a kind of a checklist, to make sure all elements are included.
- Who is your target: what institution's DMP will you use? (E.g. funding agency, First Nations/tribal institution, etc.); for grad students, e.g. NSF-DDIG
- What elements must be included? (E.g. permission letter from community; permission letter from archive; acceptable archival formats; research locale and community; research scope; data backup and sustainability; etc.)
- What elements are specific to your project?
- These may include community-specific data access and collaboration requirements, community research product desiderata, political or social considerations, special data types, and special strategies to protect media in a particularly humid or cold climate.
- (Not all of these belong in a DMP, but all will affect research design.)
- Yours will likely include most of the elements listed in What's in a DMP? above.
- What difficulties (logistical, methodological, or other) did you identify in the process?
- How do the above requirements change your project design, if at all?
2. The basics: Working with data
Aim: To introduce the many data types and formats, and to describe the minimum a researcher must do to create enduring data. Part two (regarding software) will need regular updating.
2.1. Data and metadata
Projects that create digital data during research (“in the field”) may need immediate storage for large files (e.g. A/V recordings and their associated metadata). DMPs describe each “field” and archival data type, and follow best practices for storing originals and altered versions.
- Digital and non-digital data
- Born-digital data: e.g., “field” recordings, images, geolocations, experimental data
- Require associated metadata (to understand the scope and organization of the data)
- May be quite large; require external storage planning (e.g., hard drive or Internet-based)
- Archiving may require conversion to open formats
- Require systematic file-naming, organizing, versioning (always retain an original, unchanged version), and backup
- Non-digital data: print; extant collection or archive; pre-existing dictionaries and grammars
- May require hundreds of hours to digitize and structure
- Processes may be different from born-digital data, e.g.:
- creating a dictionary from a digital wordlist only requires structuring and possibly encoding conversion;
- digitizing a print dictionary requires either scanning and OCR or keyboarding, and then structuring.
- Archival data may also require conversion to open formats
- Also require systematic file-naming, organizing, versioning (always retain an original, unchanged version), and backup
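The advice above to always retain an original, unchanged version can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the function name `make_working_copy` and the idea of a separate working folder are assumptions made for the example. It copies a raw file into a working folder, refuses to overwrite an existing copy, and then removes write permission from the original.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(original: Path, working_dir: Path) -> Path:
    """Copy `original` into `working_dir`, then mark the original read-only.

    Hypothetical helper: the raw/working split is an assumption of this sketch.
    """
    working_dir.mkdir(parents=True, exist_ok=True)
    copy_path = working_dir / original.name
    if copy_path.exists():
        # Never silently replace an existing working file.
        raise FileExistsError(f"refusing to overwrite {copy_path}")
    shutil.copy2(original, copy_path)  # copy2 also preserves timestamps
    # Remove write permission from the original so edits happen only on the copy.
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    original.chmod(original.stat().st_mode & no_write)
    return copy_path
```

A read-only flag only guards against accidents on one machine; it complements backup rather than replacing it.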
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” (NISO 2004).
- Like a bibliographic entry (book citation), but for your data
- Provides the necessary context about your data, allowing access and retrieval
- Types of metadata: descriptive vs. structural; also administrative, rights management, preservation
- Which elements to include, and how much is enough, are your decisions.
- One possible metadata standard: OLAC 2008
- Data quality: lossless (or less lossy), documented with metadata.
- Data quality requirements do vary depending on research purpose.
- Phonetic analysis, for example, will generally require higher-quality audio than syntactic work.
- Still: archive the highest possible data quality, regardless of your topic or discipline.
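The metadata points above (a citation-like record that travels with the data) can be sketched as a small "sidecar" file written next to each data file. The field names below loosely follow Dublin Core element names, on which OLAC metadata builds, but the required-field list, the `.meta.json` suffix, and the function name are assumptions of this sketch; it does not produce a valid OLAC record.

```python
import json
from pathlib import Path

# Illustrative minimum only; which elements are "enough" is a project decision.
REQUIRED_FIELDS = ("title", "creator", "date", "language", "format", "rights")


def write_metadata_sidecar(data_file: Path, record: dict) -> Path:
    """Write `record` as UTF-8 JSON next to `data_file` and return the new path."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return sidecar
```

For a recording named `rec001.wav`, this writes `rec001.wav.meta.json` in the same folder, so the description and the data stay together when files are copied or deposited.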
Data capture, regularization, and organization
- Digitization, regularization, conventionalization, conversion
- Digitization: representing analog data (sound, activity, object, etc.) in a digital format, i.e. into sets of binary numbers.
- Regularization: systematically replacing irregular forms with regular ones, in order to make certain data comparable. (A version of the non-regularized data is retained.) For example, some transcriptions of conversations might be done in the International Phonetic Alphabet, and others in multiple practical orthographies. To make these usable together, one system (e.g. the orthographic) is regularized to the other (e.g. to IPA).
- Conventionalization: Communities share certain linguistic conventions (e.g. orthographic, disciplinary, etc.). For example, it is conventional for discourse analysts to transcribe speech in the language's official orthography, adding conventionalized speech annotation marking (such as the Santa Barbara system). Phoneticians or documentary linguistic anthropologists, by contrast, often conventionally transcribe in a phonetic system (e.g., IPA), and those interested in grammar would add interlinear grammatical annotation, conventionalized by the so-called Leipzig Glossing Rules. Ethnographic convention often encourages the identification of key words. And so on.
- Conversion: transforming from one data format or structure to another, such as non-Unicode to Unicode character encoding; from a spreadsheet to comma separated text (csv).
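The two conversions named above can be sketched with Python's standard library. This is an illustration under stated assumptions: the function names are hypothetical, and the encoding step only handles text stored in a standard legacy code page (such as cp1252) whose name you already know. Text typed with a custom "hacked" font needs a project-specific character mapping table instead.

```python
import csv
import io


def legacy_bytes_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8."""
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on invalid bytes
    return text.encode("utf-8")


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize spreadsheet-like rows as comma-separated text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buffer.getvalue()
```

Keeping the unconverted original alongside the converted file follows the same principle as regularization: the earlier version is retained.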
Data standards and conventions
- Data format: What file format are the data stored in? (Best: open formats)
- Character format: How are the characters encoded? (Best: Unicode (UTF); second best: [lower] ASCII.)
- File naming: consistent, documented, short, no whitespace or upper ASCII characters.
- File structure: consistent, documented, avoid over-use of folders
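The file-naming convention above can be made concrete with a small sketch. The scheme used here (project code, date, speaker identifier, three-digit item number, all lowercase) is only one possible convention and is an assumption of this example, as are the function names; what matters is that the scheme is documented and applied consistently.

```python
import re

# Assumed scheme: project_YYYYMMDD_speaker_NNN.ext (lowercase ASCII, no whitespace).
SCHEME = re.compile(r"^[a-z0-9]+_\d{8}_[a-z0-9]+_\d{3}\.[a-z0-9]+$")


def build_name(project: str, date_iso: str, speaker_id: str, item: int, ext: str) -> str:
    """Build a file name following the assumed scheme, or raise ValueError."""
    date_part = date_iso.replace("-", "")
    name = f"{project}_{date_part}_{speaker_id}_{item:03d}.{ext}".lower()
    if not SCHEME.fullmatch(name):
        raise ValueError(f"name does not follow the scheme: {name!r}")
    return name


def follows_scheme(name: str) -> bool:
    """Check whether an existing file name follows the assumed scheme."""
    return SCHEME.fullmatch(name) is not None
```

Under this scheme a name such as `wedding.wav` is rejected, not because its characters are unsafe but because it carries none of the agreed information.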
Organize data and metadata (spreadsheets, databases, structured text, audio, video, etc.)
Document the above activities and conventions
Need more information?
Well-formed data is enduring data.
- What is it? Distinguishing the form and content of data
- Well-formedness: durable and reusable
- Interoperable: document formats and data structures (e.g., in a readme.txt), so that your data can be opened with a wide range of generic software on any platform
- Keep data in non-proprietary (open) formats
- Proprietary (closed) formats: encoding is a trade secret of a commercial company; you generally have to purchase their software (and keep purchasing the latest version) in order to decode the data. If you don't, you lose your content.
- Non-proprietary (open) formats: published and open to everyone.
- In between the above two types, there are also:
- “open” proprietary formats: the company publishes the specifications, e.g., mp3;
- formerly closed formats whose specs are recently published, but whose code is not re-usable, like those covered by Microsoft's Open Specification Promise;
- closed proprietary formats that are not publicly documented, but at least are designed to interoperate in a limited way (e.g., rtf).
The same data allow multiple possible outputs if well-formed
Discussion exercises of data and metadata case studies:
Instructor: can provide examples of data/metadata in discourse analysis, documentary ethnolinguistics, language socialization
- How do you create data? In what formats? How is it organized?
- If you start out a project with the goal of open access (at least some of the primary data must be publicly accessible), how does that choice affect your working with speakers and creating a data collection?
- In particular, how do you respect community norms, which will almost certainly show that not all data can be freely available to the public?
- Name at least one born-digital data type and one non-digital data type that you might use. What issues should your digital data management plan take into account? How might these issues differ for born-digital vs. non-digital data?
- Suppose a Martian (who speaks your language) discovers your miraculously-preserved data in 100 years. What documentation would you need to include, to make sure the Martian can open and understand your data?
2.2. Tools (software)
Specific tools rapidly become obsolete; these will need regular updating. In any case, open-source tools that allow maximal re-use are preferable.
How tools fit into the workflow; proprietary compared to open-source tools
- Transcription tools
- For discourse and conversation analysis (DA/CA): no special software needed; optionally EXMARaLDA
- For time-alignment: Praat (originally for phonetic analysis), Transcriber
- Multi-level annotation tools for A/V (including time alignment): ANVIL, ELAN
- Producing Interlinear Glossed Text (minimally: transcription, grammatical glossing, and translation)
Digital Humanities tools of use to linguistic anthropologists
- To start out, web-based interfaces are easy: text analysis tools such as Taporware allow concordancing, frequency counts, etc., and online visualization tools like Vidi map or graph data.
- For more functionality and customization, however, locally-installed tools (which may be command-line or GUI) are useful, including R (and R Studio) for both quantitative research and visualization, Gephi for network visualization, and so on.
At start of project: get the necessary tools that allow you to collect, organize and work with digital data.
Optional topic: Further workflow: corpus development, lexicon, interlinear glossed texts
- Lexical, text, and other databases
- Discuss knowledge-sharing between participants, of the benefits and limitations of the linguistic anthropology tools you've used.
- Again, imagine a Martian discovers your well-preserved data. What tools might you need to include (if any), in addition to documentation, to allow the Martian to re-use your data?
3. What are our responsibilities?
Aim: To discuss researchers' ethical and legal responsibilities and Intellectual Property Rights. It is best to confront these issues during project planning, well before an IRB application. Attention to ethics equals good data. Also emphasized are the limits of Open Access: full consultation with communities forms the basis for solid data management plans and sharing arrangements that align with community norms.
- Responsibilities include ethics, rights, and legal issues
- Ethical (and possibly legal) obligation: Consent (oral vs. written); attribution of value
- Informed consent: participation in research is voluntary, and research participants must be fully aware of the purpose of the research, how the data will be gathered, how their privacy (and/or recognition) will be handled, and how the data will be shared.
- Attribution of value means recognizing Indigenous Knowledge, Traditional Knowledge or Traditional Ecological Knowledge of indigenous and local peoples (UNESCO n.d.; IPinCH 2016; Traditional Ecological Knowledge 2016).
- Ethical/moral obligation: sharing of data and research results is obligatory in two contexts:
- For community-based research, respecting community norms for data access
- The Open Access mantra: “information wants to be free” (Stewart Brand)
- The community, not just the researcher, determines access (who? how much?)
- Project data are co-owned by researcher and community.
- What is shared depends on the types of data and the research context.
- Indigenous communities may also have their own protocols. (See Legal obligations, below)
- “Giving back”: to the individuals and community in which the data were collected, in a format, in a language, and with content that can be used locally, while attending to privacy concerns of participants.
- Sharing with the public research results and at least some of the data, unless restricted.
- Legal obligations
- University IRB (institutional review board), independent ethics committee (IEC), ethical review board (ERB), research ethics board (REB): a committee that reviews and monitors projects involving “human subjects.”
- Required at most North American institutions
- Separately, approval of other IRBs may be needed, such as school boards and Tribal IRBs.
- Moral obligations: Beyond what the IRB requires, researchers should:
- Do no (even unintentional) harm; and
- Arguably, create a research product that is useful to the native-speaker participants.
- Moral obligations sometimes appear at odds with legal obligations, such as
- Requesting written permission from “a community,” as an IRB requires, often sows mistrust (Dwyer 2006, cf. van Driem 2016)
- Research products useful to communities (e.g., pedagogical materials, children's books) are usually not allowable expenses by funders
- Such difficulties do not justify avoiding these legal and moral obligations;
- Resolving these issues is usually community- and project-specific.
- Examples of key conflicts in linguistic anthropology (instructor supplies additional examples)
- Withdrawing “informed consent” on dictionary making
- Rights: Intellectual Property Rights (IPR), other rights
- IPR: about ownership of “creations of the mind” (see Newman 2007; Levine 2016)
- Based on Western notion of ownership, but can include TEK
- Copyright: who owns and can distribute a particular work (see e.g. UKYLing 2016)
- Recognition: All participants have the right to receive credit for their contributions
- Co-authorship, and/or citing the specific contributions of all participants in the metadata of research data, as well as research products
- Anonymity: participants have the right to be anonymized in the case of sensitive data.
- Using alphanumeric identifiers (rather than names) for speakers
- Ensuring that speakers are not identifiable by public audio-visual materials
- Not all data can be effectively anonymized.
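The first anonymity step above, using alphanumeric identifiers rather than names, can be sketched as follows. The names, the `SPK` prefix, and both function names are hypothetical. The sketch only covers names in text such as transcripts or notes: it does nothing for voices or faces in A/V materials, and the name-to-identifier table it produces is itself sensitive and would need to be stored separately from shared data.

```python
import re


def assign_speaker_ids(names: list[str], prefix: str = "SPK") -> dict[str, str]:
    """Give each distinct name a stable identifier in order of first appearance."""
    mapping: dict[str, str] = {}
    for name in names:
        key = name.strip()
        if key and key not in mapping:
            mapping[key] = f"{prefix}{len(mapping) + 1:03d}"
    return mapping


def pseudonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace every mapped name in `text` with its identifier in a single pass."""
    if not mapping:
        return text
    # Longest names first, so a short name never pre-empts a longer one.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda match: mapping[match.group(0)], text)
```

Whether a participant is anonymized or credited by name remains their decision; a sketch like this applies only where anonymization has been agreed.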
- Legalities: Projects are subject to the laws of host and researcher country and international law, at a minimum.
- Taking ethics, rights, and laws together, our responsibilities include:
- Agreements with stakeholders
- “Stakeholders” include participants, research team, local or national bodies, funding bodies, and home institution. (See Dwyer 2006.)
- Agreements are proposed during research design and re-visited for potential changes throughout the research.
- Agreements concern many key topics: attribution/anonymization, compensation, responsibilities and division of labor, access to and rights in data and field notes, co-authorship on deliverables including publications, data access, archiving plan and liabilities.
- Agreements may be written, verbal, or third-party (e.g., via a village leader).
- International and/or interdisciplinary teams must consider all relevant nations and ethical codes/practices of all relevant disciplines.
- The IPinCH project is an excellent resource; see Think before You Appropriate.
- Instructor presents examples of legal actions that are ethically dubious and ethical actions that are potentially illegal for the students to debate. Course participants then present examples of how they have/will share data in the two contexts, and discuss how intellectual property rights and legal issues interact with their obligations as researchers.
- Name at least two locally appropriate steps that can be taken to ensure shared data access by the language community.
Exercise: Closed/Limited/Open Access debate:
Imagine or enact a role-playing debate between people who take strong positions on the issue of protecting community knowledge from exploitation vs. “all information wants to be free.” Bring up the strongest arguments for each position (with real-life examples), and then see how best a compromise position that addresses all needs is reached. Possible roles (who may argue any one or multiple sides of the debate): Indigenous community elders, indigenous linguists, a digital humanist or corpus linguist, an NSF representative, a PhD student (indigenous of the community, indigenous of another community, or non-indigenous), a specialist professor, the university IRB, etc.
4. Archiving and re-use of data
Why should I archive?
- Many linguistic anthropology research funders require archiving.
- The data (a product of enormous effort) are backed up in a trusted repository.
- Other people and the researcher can re-use the data, subject to any access restrictions.
Data care (backup and data protection):
- LOCKSS (Lots of Copies Keeps Stuff Safe); pros and cons of online/offline storage
- Formats - document your archival, working, and presentation formats (Simons 2006).
- Versioning - keep track of different versions of the data (possibly with versioning software such as Subversion, a must for collaborative projects)
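The LOCKSS principle above can be sketched as "copy to several places, then verify every copy." The function names and the choice of SHA-256 are assumptions of this sketch. In practice the destinations would be on physically separate media or services; folders on the same disk do not protect against hardware failure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> str:
    """Copy `source` into each destination folder and verify each copy."""
    expected = sha256_of(source)
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / source.name
        shutil.copy2(source, copy_path)
        if sha256_of(copy_path) != expected:
            raise IOError(f"checksum mismatch for {copy_path}")
    return expected
```

Recording the returned checksum in the project documentation lets anyone confirm later that an archived copy is still identical to the original.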
Key archives for linguistic anthropologists; see Appendices
- The list is alarmingly short
- Use the “How to Deposit” guidelines of existing archives to learn more about common data formats, ethical protocols, and best practices
- Users and use case scenarios
- Access: Fully open, Graded/Tiered, Closed (based on confidentiality agreements)
- Ownership: Intellectual property, copyright
- Deposit: guidelines for which language, data and metadata formats
- Original (“raw”) vs. edited data
Mobilization: Re-using your outputs
- publishing articles while writing a dissertation
- publishing a monograph after the dissertation
- sharing primary data and metadata
- maximizing re-use potential by others
- When might we or our language consultants not want to share data?
- Will wide data-sharing lead to researchers being “scooped” (i.e., having someone publish your intellectual property before you do)?
- Not all data users will be uni-disciplinary linguists or even academics; what steps can be taken to make the data maximally accessible to and interesting for multidisciplinary groups as well as non-academics (e.g. those in public policy, NGOs, unrelated language communities looking for a possible model, and the public)?
- Data collection can be faulty, preservation imperfect; what steps can be taken to mitigate mistakes?
5. Making the most of your data (Optional additional unit)
Five separate topics are outlined below; their only commonality is that they go beyond the introductory level. Each topic awaits further development.
- Using Regular Expressions (RegEx) to convert data into new forms
- RegExes are computational shorthand entered usually into a command line interface
- They allow for more powerful transformations than search and replace.
- General RegEx tutorials or sites, and RegEx linguist tutorials can be used to learn more.
- Working collaboratively at great distance, such as via remote data access or collaboration environments.
- Documentable collaboration practices include using a project wiki rather than email and attachments for communication.
- Making data and websites accessible to people of all abilities (e.g. colorblind, hearing/sight impaired, multilingual, non-English speaker, elderly etc.), and to people with slow internet connections. See the W3C's Web Accessibility Initiative recommendations.
- Establishing your own digital archive (optional topic if there is interest)
- Include any of the following more advanced topics: versioning; linked open data; controlled vocabularies; ontologies; data persistence (location and formats accessible into the future)
- The trusted repositories mandated by funding agencies are at best in very short supply, and there's little funding to create them.
- Archiving includes not just data, but also metadata and code.
- If you already have an archive, how can you improve it? Via assessment metrics such as ISO 16363 and the Trustworthy Repositories Checklist.
- Planning for data re-use
- In teaching
- In comparative research
- In research design
- In re-analysis (with or without supplementary new data)
- Hands on data cleaning, e.g. using data from a website
- Create 2 tables and then try to merge them (shows how you have to create the conditions for easy re-use.)
- Using RegEx for simple substitutions
- RegEx lite: many text editors have some common RegEx features at a menu click; these may already be on your laptop, such as Notepad++ (PC), BBedit or TextWrangler (Mac), or Aquamacs. Give these a try to, for example, convert ALL UPPERCASE LETTERS to all lowercase.
- Can't be bothered with RegEx? Try cutting and pasting your text into Data Wrangler
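The data-cleaning exercises above (merging two tables, simple RegEx substitutions, converting uppercase to lowercase) can be tried together in one short Python sketch. All names and sample formats here are hypothetical. It shows why merging needs preparation: two tables only join cleanly once their shared key has been regularized, and a RegEx with capture groups can rearrange text in a way plain search and replace cannot.

```python
import re


def clean_headword(raw: str) -> str:
    """Regularize a key: lowercase, single spaces, no trailing punctuation."""
    text = raw.strip().lower()              # ALL UPPERCASE becomes lowercase
    text = re.sub(r"\s+", " ", text)        # collapse runs of whitespace
    text = re.sub(r"[.,;:!?]+$", "", text)  # drop trailing punctuation
    return text


def iso_dates(text: str) -> str:
    """Rewrite DD/MM/YYYY dates as YYYY-MM-DD using capture groups."""
    return re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", text)


def merge_tables(glosses: list[dict], recordings: list[dict], key: str = "headword") -> list[dict]:
    """Join two tables (lists of rows) on a cleaned key; unmatched rows are dropped."""
    by_key = {clean_headword(row[key]): row for row in recordings}
    merged = []
    for row in glosses:
        cleaned = clean_headword(row[key])
        match = by_key.get(cleaned)
        if match is not None:
            merged.append({**match, **row, key: cleaned})
    return merged
```

Without `clean_headword`, a key typed as `SU ` in one table and `su.` in the other would never match, which is the point of the merging exercise.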
- Accessibility: (1) Look over your own or the AAA's website and make accessibility recommendations based on the W3C's guidelines. (2) For your current research project, what are three things you could do to present the data in a more accessible fashion?
- Digital archiving:
- Look at the ingesting (data-acquisition) requirements for the following two archives: (1) The Language Archive; (2) AILLA. Will they accept anyone's data from any language? What kind of data and metadata formats do they accept?
- Imagine at least two use cases for each of the two archives above.
Websites mentioned and/or linked in this document do not necessarily represent the views of the author. Commercial websites mentioned and/or linked here are intended as examples, and do not represent the endorsement of the author.
Case Study: File Formats. Stanford: Stanford University Libraries, n.d. http://library.stanford.edu/research/data-management-services/case-studies/case-study-file-formats
Digital Preservation 101. Digital POWRR [Preserving (Digital) Objects with Restricted Resources], 2012. http://digitalpowrr.niu.edu/digital-preservation-101
DuBois, John W. Transcription in Action: Resources for the Representation of Linguistic Interaction. Santa Barbara: University of California, 2006. http://www.linguistics.ucsb.edu/projects/transcription/representing
Dwyer, Arienne M. “Ethics and Practicalities of Cooperative Fieldwork and Analysis.” In Fundamentals of Language Documentation: A Handbook, edited by Jost Gippert, Ulrike Mosel, and Nikolaus Himmelmann, 31-66. Berlin: Mouton de Gruyter, 2006. https://kuscholarworks.ku.edu/handle/1808/7058
EMELD [Electronic Metastructure for Endangered Languages Data] 2006. Working Group 1 report on Collecting Primary Texts, 2016. (Marianna Di Paolo, Gary Holton, Susan Smith, Arienne Dwyer, Steve Moran, Doug Whalen, Julia Good Fox, and Barbara Need.) http://emeld.org/workshop/2006/wg/wg1-report.rtf
IPinCH [Intellectual Property Issues in Cultural Heritage Project]. Think before You Appropriate: Things to Know and Questions to Ask in Order to Avoid Misappropriating Indigenous Cultural Heritage. Vancouver: Simon Fraser University, 2016. http://www.sfu.ca/ipinch/sites/default/files/resources/teaching_resources/think_before_you_appropriate_jan_2016.pdf
IPinCH. Traditional Knowledge Fact Sheet. Vancouver: Simon Fraser University, 2016. http://www.sfu.ca/ipinch/sites/default/files/resources/fact_sheets/ipinch_tk_factsheet_march2016_final_revised.pdf
Leipzig Glossing Rules. Leipzig, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, 2015. https://www.eva.mpg.de/lingua/resources/glossing-rules.php
Levine, Melissa. “Policy, Practice and Law.” In DH Curation Guide: A Community Resource Guide to Data Curation in the Digital Humanities. 2016. https://guide.dhcuration.org/contents/policy-practice-and-law/
Library of Congress. Sustainability of Digital Formats: Planning for Library of Congress Collections. 2013. http://www.digitalpreservation.gov/formats/
Library of Congress. Recommended Formats Statement. 2015-2016. http://www.loc.gov/preservation/resources/rfs/index.html
Newman, Paul. “Copyright Essentials for Linguists.” Language Documentation & Conservation 1(1) (June 2007): 28-43. http://scholarspace.manoa.hawaii.edu/bitstream/handle/10125/1724/newman.html
National Information Standards Organization. Understanding Metadata. Bethesda: NISO, 2004. http://niso.org/publications/press/UnderstandingMetadata.pdf
OLAC. Metadata. Open Language Archives Community, 2008. http://www.language-archives.org/OLAC/metadata.html
Simons, Gary F. “Ensuring That Digital Data Last: The Priority of Archival Form over Working Form and Presentation Form.” SIL Electronic Working Papers 2006-003. http://www-01.sil.org/silewp/2006/003/SILEWP2006-003.htm
UNESCO. Best practices on Indigenous Knowledge. UNESCO, Management of Social Transformations Programme, n.d. http://www.unesco.org/most/bpindi.htm
Van den Eynden, Veerle, and Libby Bishop. Incentives and Motivations for Sharing Research Data, a Researcher’s Perspective. A Knowledge Exchange Report. 2014. http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf
van Driem, George. “Endangered Language Research and the Moral Depravity of Ethics Protocols.” Language Documentation & Conservation 10 (2016): 243-52. http://scholarspace.manoa.hawaii.edu/bitstream/10125/24693/1/vandriem.pdf
W3C [World Wide Web Consortium]. Data on the Web Best Practices. Latest published version, 2016. http://www.w3.org/TR/dwbp/
W3C [World Wide Web Consortium]. WAI: Web Accessibility Initiative. 2016. http://www.w3.org/WAI/
General Resources for all anthropologists
- Data management and DMP planning tools
- Data management requirements and sample DMPs from funders
- U.S. National Science Foundation (NSF)
- U.S. National Endowment for the Humanities (NEH)
- Data management guidance from institutions
- Australian National Data Service on Data Management [html] and DMPs [html]
- Economic and Social Research Council - DMP guidance [pdf]
- SOAS Endangered Languages Archive - depositing guidelines [html]
- Linguistic Data Consortium DMP resources [html]
- ICPSR (Inter-university Consortium for Political and Social Research) Guidelines for effective DMPs [html] [pdf]
- Best practices
- Resources on ethics and rights
- Resources specific to linguistic anthropologists
- Archives and trusted repositories for linguistic anthropology data: Take a look at the OLAC participating archives; some examples include:
Modules: Writers, Arienne M. Dwyer, Blenda Femenías, Lindsay Lloyd-Smith, Kathryn Oths, George H. Perry; Editor, Blenda Femenías
Discussants: Workshop One, February 12, 2016: Andrew Asher, Candace Greene, Lori Jahnke, Jared Lyle, Stephanie Simms. Workshop Two, May 13, 2016: Phillip Cash Cash, Jenny Cashman, Ricardo B. Contreras, Sara Gonzalez, Candace Greene, Christine Mallinson, Ricky Punzalan, Thurka Sangaramoorthy, Darlene Smucny, Natalie Underberg-Goode, Fatimah Williams Castro, Amber Wutich
American Anthropological Association:
Executive Director, Edward Liebow
Project Manager, Blenda Femenías
Research Assistant, Brittany Mistretta
Executive Assistant, Dexter Allen
Professional Fellow, Daniel Ginsberg
Web Services Administrator, Vernon Horn
Director, Publishing, Janine Chiappa McKenna
Author note: Feedback on this document is welcome. Previous versions: draft v.1, 2016-01-13; draft v.2, 2016-03-13; draft v.3, 2016-05-30; draft v.3.5, 2016-06-10. This document has benefitted from and incorporated the specific comments of Phillip Cash Cash, Jenny Cashman, Fatimah Williams Castro, Sara Gonzales, Candace Greene, Jared Lyle, Christine Mallinson, Ricardo Punzalan, Thurka Sangaramoorthy, and Stephanie Simms. Naturally, the author is responsible for any errors or infelicities.