OpenSoNaR+ / Whitelab 2.0

Spoken Dutch Corpus

After the success of OpenSoNaR, the Dutch language community asked for access to the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN) via Whitelab as well. This collection contains about 1,000 hours (more than 9 million words) of recordings of spoken Dutch from various sources. The spoken material is fully transcribed and annotated with linguistic information such as lemmas and parts of speech.

For this reason we developed Whitelab 2.0 in the OpenSoNaR+ project. Like the original Whitelab, Whitelab 2.0 offers exploration and search interfaces for all types of users. In addition, Whitelab 2.0 lets users play the sound clips associated with their search results. For example, researchers or students can collect different pronunciations of the same word, including the context in which each was uttered. Convenient!

Whitelab 2.0 supports making multiple collections accessible. To this end, it now also includes a dedicated interface for administrators, in which they can compare corpus statistics and align the metadata of different collections. That way they can give their users the best possible experience.