Institute for Language Sciences Labs

Accessing the CGN (Corpus Gesproken Nederlands)

Last updated on 22 September 2020 by Ty Mees

If you have suggestions on how to improve this document, or find mistakes, please send them to ilslabs@uu.nl

As an ILS Labs user you can access the CGN (Corpus Gesproken Nederlands) under Windows and Linux on all lab PCs (and from home). It is hosted on the UU O-Drive, under O:\Research\GW\Projects\CGNv2.

To run the corpus exploitation software (COREX), run either corex_start_for_linux.sh or corex_start_for_windows.bat. Please note that starting COREX might take up to 10 seconds!
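
As a minimal sketch, the choice between the two start scripts can be automated. The script names come from the text above. That they sit directly in the CGNv2 folder is an assumption, and the helper names corex_start_script and launch_corex are illustrative and not part of COREX.

```python
import platform
import subprocess
from pathlib import Path

# Start script names as given in this how-to.
START_SCRIPTS = {
    "Windows": "corex_start_for_windows.bat",
    "Linux": "corex_start_for_linux.sh",
}


def corex_start_script(system=None):
    """Return the start script name for the given (or current) operating system."""
    system = system or platform.system()
    try:
        return START_SCRIPTS[system]
    except KeyError:
        raise RuntimeError(f"No COREX start script is provided for {system}")


def launch_corex(cgn_dir):
    """Start COREX from cgn_dir (assumed to contain the start scripts)."""
    cgn_dir = Path(cgn_dir)
    script = cgn_dir / corex_start_script()
    if not script.is_file():
        raise FileNotFoundError(f"Start script not found: {script}")
    if script.suffix == ".bat":
        command = ["cmd", "/c", str(script)]
    else:
        command = ["sh", str(script)]
    # Run with the corpus folder as working directory; starting may take a while.
    return subprocess.Popen(command, cwd=str(cgn_dir))
```

On a Windows lab PC this could be called as launch_corex(r"O:\Research\GW\Projects\CGNv2"), provided the scripts are indeed located there.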

The future of the COREX software is unknown: the software is getting rather old, and we expect some changes in infrastructure with regard to the CGN. Currently, the CGN and COREX are only accessible from the lab computers. The corpora share can be accessed as follows:

  • In the lab (for instance K06), under Windows OS: the shared Lab-Drives should have the drive letter L and should be visible in File Explorer. Here you should be able to find the shared folder corpora, which contains the CGNv2 folder.
  • In the lab (for instance K06), under Linux OS: the Lab-Drives mount should be available as a link on your desktop.
  • From outside the lab: we do not recommend connecting to any corpora from outside the lab. Connection, speed and software problems are serious obstacles.
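
To check quickly whether the corpora share is reachable from the machine you are on, a small sketch like the following can help. The Windows path follows the description above (drive L, folder corpora, then CGNv2). The Linux path is purely hypothetical and should be replaced by the target of the desktop link. The helper find_cgn_dir is an illustrative name.

```python
from pathlib import Path


def find_cgn_dir(candidates):
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


# Candidate locations: the first follows the Windows description above;
# the second is a hypothetical Linux mount point (an assumption).
CANDIDATES = [
    r"L:\corpora\CGNv2",
    "/mnt/lab-drives/corpora/CGNv2",  # assumption: adjust to the desktop link target
]

if __name__ == "__main__":
    cgn_dir = find_cgn_dir(CANDIDATES)
    print(cgn_dir if cgn_dir else "CGNv2 not found; are you on a lab computer?")
```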

If you expect to be searching the corpus intensively for a longer period, we recommend asking us to create a local copy for you. Running large searches over the network is substantially slower than using a local copy.

New online tools

There are also many online initiatives through which the CGN is accessible, offering modern tools that can do just about everything that is possible with COREX. Examples are highly optimised searches of the CGN (or many other corpora) for simple words, and expert searches with Corpus Query Language. With your university credentials, you should be able to log in and have a look. Most of these projects are still actively developed and will be maintained for quite some years.
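
To give an idea of what an expert search looks like, the sketch below builds a few Corpus Query Language patterns as strings. CQL describes each token as a set of attribute constraints in square brackets. The attribute names used here (word, lemma, pos) and the tag value are assumptions, since the available annotations differ per corpus and tool. The helpers cql_token and cql_sequence are hypothetical.

```python
def cql_token(**constraints):
    """Build one CQL token pattern, e.g. [lemma="lopen"]; no constraints gives [].

    Values are treated as regular expressions by CQL and are not escaped here.
    """
    parts = [f'{attr}="{value}"' for attr, value in constraints.items()]
    return "[" + " & ".join(parts) + "]"


def cql_sequence(*tokens):
    """Join token patterns into a query that matches them as consecutive tokens."""
    return " ".join(tokens)


# All forms of the lemma "lopen".
print(cql_token(lemma="lopen"))
# An adjective followed by the word "huis"; "ADJ" is an assumed tag value.
print(cql_sequence(cql_token(pos="ADJ"), cql_token(word="huis")))
```

The resulting strings can be pasted into the expert search field of a tool that accepts CQL. Check the documentation of the tool for the attribute names and tag set it actually supports.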

OpenSonar (INT)

https://portal.clarin.inl.nl/opensonar_frontend/opensonar/search

This tool will also make it possible to link to the audio files in the search results; however, this is still under development.

Treebank-searchable CGN links

PaQu (RUG):

http://www.let.rug.nl/alfa/paqu

GrETEL 3 (Leuven):

https://gretel.ccl.kuleuven.be/gretel3/ 

GrETEL 4 (UU, under development):

http://gretel.hum.uu.nl/gretel4/ng/home

Links to general and more specific corpus tools can be found here:

http://portal.clarin.nl

http://portal.clarin.nl/clariah-tools-fs