Text and Data Mining
Text and data mining (TDM) searches through large amounts of text and data using computer-assisted processes. Unstructured data is processed and automatically examined for patterns, trends and connections.
Text and data mining (TDM) refers to various methods for searching through and evaluating large quantities of text or data. Using computer-assisted analysis procedures, mostly unstructured data is first converted into a systematic, machine-readable form and then automatically analysed for patterns, trends and other research-relevant correlations.
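The two stages described above, preparation and analysis, can be illustrated with a minimal Python sketch. The mini-corpus, the stop-word list and the function names are assumptions made for illustration only; real TDM projects work with far larger corpora and more elaborate methods.

```python
import re
from collections import Counter
from typing import List

# Hypothetical mini-corpus standing in for a large collection of unstructured text.
DOCUMENTS = [
    "Text mining finds patterns in large text collections.",
    "Data mining finds trends and patterns in large data sets.",
    "Libraries license e-resources for research.",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"in", "and", "for", "the", "a", "of"}


def tokenize(text: str) -> List[str]:
    """Preparation: lower-case the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_frequency(documents: List[str]) -> Counter:
    """Analysis: count in how many documents each term occurs."""
    counts: Counter = Counter()
    for doc in documents:
        # set() ensures a term is counted once per document.
        counts.update(set(tokenize(doc)))
    return counts


if __name__ == "__main__":
    # Print the terms that occur in the most documents.
    for term, n in document_frequency(DOCUMENTS).most_common(5):
        print(term, n)
```

Terms that appear in many documents, such as "patterns" in this toy corpus, are a very simple example of the kind of pattern that TDM surfaces automatically.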
When copyrighted materials such as texts, images or audiovisual media are used as a data source for TDM, both the legal and the technical terms of use must be observed. The providers' web interfaces are generally not suitable for downloading large quantities of data directly. If, for example, you would like to analyse large amounts of content from licensed e-resources of the University Library, please note the information we provide in the Self-Service portal (KI 3355) (ZHAW login required).
Many publishers have general rules on the use of text and data mining with their publications. These rules often also include information on the publishers' interfaces and how to use them (registration, default request and download rates, etc.). The following list is not exhaustive:
- Cambridge University Press
- CrossRef
- Elsevier
- Oxford University Press
- Royal Society of Chemistry
- SAGE
- Springer Nature
- Wiley
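Where a provider specifies a download rate, a script can enforce a minimum interval between requests. The following Python sketch shows one way to do this with the standard library. The function names, the `min_interval` parameter and the idea of identifying yourself through a `User-Agent` header are assumptions for illustration; the actual limits, endpoints and registration requirements must be taken from each provider's TDM policy.

```python
import time
from typing import Callable, Dict, Iterable, Optional
from urllib.request import Request, urlopen


def fetch_url(url: str, user_agent: str) -> bytes:
    """Fetch one URL, sending an identifying User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read()


def polite_download(
    urls: Iterable[str],
    min_interval: float,
    fetch: Callable[[str], bytes],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, bytes]:
    """Download URLs one at a time, keeping at least `min_interval`
    seconds between the start of consecutive requests."""
    results: Dict[str, bytes] = {}
    last_start: Optional[float] = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[url] = fetch(url)
    return results
```

A call might look like `polite_download(urls, min_interval=1.0, fetch=lambda u: fetch_url(u, "my-tdm-project (mailto:name@example.org)"))`, where the interval and the contact string are placeholders. Passing `fetch`, `sleep` and `clock` as parameters keeps the throttling logic testable without network access.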
In addition to licensed content, there are also freely accessible databases that allow the use of TDM (list not exhaustive):
- arXiv
  Free access to preprints from the fields of physics, mathematics, computer science, statistics, financial mathematics and biology.
- BioMed Central
  Open access journals from BioMed Central, Chemistry Central and SpringerOpen from the fields of biology and medicine.
- Europeana
  Digital library with digitised material on scientific and cultural heritage from more than 2,000 European institutions.
- HathiTrust Digital Library
  Digitised material from more than 100 academic institutions around the world.
- Public Library of Science (PLOS)
  Access to content from the journals of the Public Library of Science, an open-access scientific publisher.
- PubMed Central: Databases and Text Mining Tools
  Various freely accessible mining tools that can be used to search through PubMed Central, an archive with freely accessible content from the fields of biology and biomedicine.
Making your own content openly accessible, in the spirit of open science, facilitates TDM. Clear rights management with standardised, machine-readable, open-content Creative Commons licences helps to ensure that TDM methods can be applied to data and text corpora with legal certainty.
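Because Creative Commons licences are identified by standard URLs, a corpus can be filtered by licence automatically. The Python sketch below assumes that each record carries its licence URL in a `licence` field and that the project has decided which licence codes are acceptable for its purpose; the records, the field name and the `ALLOWED` set are hypothetical.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlparse

# Hypothetical project policy: licence codes considered acceptable here.
ALLOWED = {"by", "by-sa", "cc0"}

# Hypothetical metadata records; the "licence" field name is an assumption.
RECORDS = [
    {"id": "doc-1", "licence": "https://creativecommons.org/licenses/by/4.0/"},
    {"id": "doc-2", "licence": "https://creativecommons.org/licenses/by-nc-nd/4.0/"},
    {"id": "doc-3", "licence": ""},
    {"id": "doc-4", "licence": "http://creativecommons.org/publicdomain/zero/1.0/"},
]


def cc_licence_code(licence_url: str) -> Optional[str]:
    """Return the licence code (e.g. 'by', 'by-nc-nd', 'cc0') for a
    Creative Commons URL, or None if the URL is not recognised."""
    parsed = urlparse(licence_url)
    if parsed.netloc.lower() not in {"creativecommons.org", "www.creativecommons.org"}:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "licenses":
        return parts[1].lower()
    if len(parts) >= 2 and parts[0] == "publicdomain" and parts[1] == "zero":
        return "cc0"
    return None


def filter_by_licence(records: List[Dict[str, str]], allowed: Set[str]) -> List[Dict[str, str]]:
    """Keep only records whose licence code is in the allowed set."""
    return [r for r in records if cc_licence_code(r.get("licence", "")) in allowed]


if __name__ == "__main__":
    for record in filter_by_licence(RECORDS, ALLOWED):
        print(record["id"])
```

Records without a recognisable licence are excluded rather than guessed at, which is the cautious default when the rights situation is unclear.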
Additional information:
- Information on TDM and Swiss copyright from CCdigitallaw.ch
- Information on legal aspects of publishing data in the DMLawTool
- More on ZHAW Research Data Service (ZHAW login required)