Query Parameters


Quick Help

Here follows a short description of the query parameters. For more info please see the FAQ.

Search Bar: Here you can enter one or more words. Words that should be counted separately are separated by ','. Words that should be counted together are separated by '|'. '*' can be used as a wild-card character. For example, the search string 'dino*, dog|cat' would yield a graph with two lines, one showing the frequency of words beginning with 'dino' and the other showing the combined frequency of 'dog' and 'cat'.
Order by: Decides whether word frequencies will be ordered according to the age of the children or the MLU (Mean Length of Utterance).
Age/MLU range: Range of the age/MLU of the children to be included.
Group size: Decides how the data is grouped. A group size of 12 months will yield a graph where, for example, the data point labeled 24 includes children in the range 24 ≤ AGE < 36.
Split sexes: Splits the data by the sex of the children. This results in a graph with twice as many lines as there are search terms.
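
The grouping rule for Group size can be sketched in a few lines of Python. This is only an illustration of the description above, not ChildFreq's actual code: the function name group_label is hypothetical, and the sketch assumes that groups start at multiples of the group size, so that with a group size of 12 the group labeled 24 covers ages 24 up to but not including 36.

```python
def group_label(age_months, group_size):
    """Return the label of the group that an age (in months) falls into.

    Assumption: groups start at multiples of group_size. ChildFreq may
    anchor its groups differently, e.g. at the lower end of the age range.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Integer division drops the remainder, giving the group's lower bound.
    return (age_months // group_size) * group_size


for age in (24, 30, 35, 36):
    print(age, "->", group_label(age, 12))
```

With a group size of 12, ages 24, 30 and 35 all map to the label 24, while 36 starts the next group.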

FAQ - Friendly Asked Questions

What is ChildFreq?
A tool to explore word frequencies in the American and British parts of the Childes database. For example, ChildFreq can be used to generate hypotheses about word acquisition in children or to see how the frequencies of words or word groups relate to each other in children of different ages. It can of course also be used to answer less serious questions such as: "Do children in Childes speak more about dinosaurs than candy?". Even though the Childes database is huge, one should be aware that it does not necessarily represent typical child language development. With this in mind, ChildFreq should still be a useful tool for finding inspiration for new hypotheses.

What is the target audience of ChildFreq?
Researchers and others who are interested in language and child development.

What is Childes?
From wikipedia.org:

The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 by Brian MacWhinney and Catherine Snow to serve as a central repository for first language acquisition data. Its earliest transcripts date from the 1960s, and it now has contents (transcripts, audio, and video) in over 20 languages from 130 different corpora, all of which are publicly available worldwide.

The version of Childes used in ChildFreq is the one that was publicly available on the Childes homepage on 1 February 2012.

Is there any other documentation of ChildFreq than this FAQ?
Yes! There is a technical report describing ChildFreq in more detail which can be found here. Please cite that report if you use ChildFreq in your research.

How can the search string parameter be formatted?
The search string parameter can contain letters 'a' to 'z', space, and the following special characters, ',|*?'.
',' separates words so that they are counted separately.
'|' clusters words so that they are counted together.
'*' is a wild-card character that matches zero or more characters. E.g. 'cup*' would count 'cup', 'cups', and 'cupcake'.
'?' is not strictly a special character, as it counts the question marks in the Childes database. That is, it counts the number of questions.
The two words 'xxx' and 'yyy' are also special: the first indicates unintelligible speech and the second indicates a word that was coded phonologically. It is also possible to include spaces to count phrases such as 'give me' or 'I like'.

Here follow some examples of possible search strings:
'cat, dog' would count 'cat' and 'dog' separately.
'cat|dog' would count 'cat' and 'dog' as one word.
'cat|dog, dino*' would count 'cat' and 'dog' as one word and all words beginning with 'dino' as one word.
'he|him|his, she|her' would count the male and female pronouns as separate words.
'i like *' would count all occurrences of 'i like' followed by any word.
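
The rules above can be made concrete with a small Python sketch. It is not ChildFreq's implementation: the function names parse_search_string and count_matches are hypothetical, the sketch works on raw counts over a plain list of words, and for simplicity it only handles single-word terms (not phrases such as 'i like *'). It assumes that '*' should match any run of non-space characters within a word.

```python
import re


def parse_search_string(search):
    """Split a search string into groups of compiled word patterns.

    ',' separates groups that are counted separately, '|' separates
    alternatives that are counted together, and '*' is a wild-card.
    """
    groups = []
    for group in search.split(","):
        patterns = []
        for term in group.split("|"):
            term = term.strip().lower()
            if not term:
                continue
            # Escape the term, then turn the escaped '*' back into a wild-card.
            regex = re.escape(term).replace(r"\*", r"\S*")
            patterns.append(re.compile(regex))
        if patterns:
            groups.append(patterns)
    return groups


def count_matches(groups, words):
    """Count, per group, the words that fully match any pattern in the group."""
    return [
        sum(1 for word in words if any(p.fullmatch(word) for p in group))
        for group in groups
    ]


words = ["dinosaur", "dog", "cat", "dino", "candy"]
groups = parse_search_string("dino*, dog|cat")
print(count_matches(groups, words))  # prints [2, 2]
```

Here 'dino*' matches 'dinosaur' and 'dino', while 'dog|cat' counts 'dog' and 'cat' together, so each of the two groups gets a count of 2.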

What is MLU?
MLU stands for Mean Length of Utterance and is a measure of the complexity of children's speech, calculated as the average number of morphemes per utterance. When MLU is used in ChildFreq, it is calculated directly from the morpheme counts in Childes. As not all transcripts include morpheme counts, some are filtered away when ordering the data by MLU.
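
As a worked illustration of the definition, the following Python sketch computes MLU from a list holding one morpheme count per utterance. The function name and the input format are assumptions made for this example; how ChildFreq reads the morpheme counts from Childes is not shown here.

```python
def mean_length_of_utterance(morpheme_counts):
    """Return the average number of morphemes per utterance.

    morpheme_counts holds one morpheme count per utterance.
    """
    if not morpheme_counts:
        raise ValueError("need at least one utterance")
    return sum(morpheme_counts) / len(morpheme_counts)


# Four utterances with 2, 3, 1 and 4 morphemes: 10 morphemes / 4 utterances.
print(mean_length_of_utterance([2, 3, 1, 4]))  # prints 2.5
```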

What happens when I check 'Split Sexes'?
Enabling this option splits the data by the sex of the children and thus results in a graph with twice as many lines as there are search terms. The words are prefixed with 'F' and 'M' to indicate word frequencies for females and males. As not all transcripts are tagged with the sex of the child, some are filtered away when this option is checked.
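
The effect of this option can be sketched in Python as follows. The data layout (a list of pairs of a sex tag and a word list), the function name counts_by_sex, and the exact label format 'F dog' / 'M dog' are all assumptions made for illustration, and the sketch uses raw counts rather than the frequencies ChildFreq plots.

```python
from collections import Counter


def counts_by_sex(term, transcripts):
    """Count a term separately for female and male children.

    transcripts is a list of (sex, words) pairs, where sex is 'F', 'M',
    or None when the transcript carries no sex tag.
    """
    counts = Counter()
    for sex, words in transcripts:
        if sex not in ("F", "M"):
            continue  # transcripts without a sex tag are filtered away
        counts[f"{sex} {term}"] += words.count(term)
    return dict(counts)


transcripts = [
    ("F", ["dog", "cat", "dog"]),
    ("M", ["dog"]),
    (None, ["dog", "dog"]),  # untagged, so ignored
]
print(counts_by_sex("dog", transcripts))  # prints {'F dog': 2, 'M dog': 1}
```

One search term thus yields two series, and the untagged transcript contributes to neither.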

What if I don't like the layout of the charts generated by ChildFreq?
In that case, copy the information in the tables displayed below the graph and paste it into your favourite spreadsheet program. That way you can customize the chart to your liking.

Who is behind ChildFreq?
The one behind ChildFreq is me, Rasmus Bååth. I am a research assistant at Lund University Cognitive Science, and I implemented ChildFreq as part of two projects, the VAAG project and the CCL project. Apart from language development, my main interests are semantics and cognitive algorithms/robotics (I would like to write artificial intelligence, but that term is (un)fortunately a bit passé). If you want to know more about me and my research, check out my webpage.

Is it possible to add SOME_FEATURE to ChildFreq?
Yes, it is possible. Please send emails with suggestions for new features to rasmus.baath@lucs.lu.se! This does not necessarily mean that suggested features will be implemented, though. If you are interested in a collaboration, that would be an incentive to add new features.

What is the picture of seemingly randomly sprinkled words found in the header?
It is a picture generated by Wordle, a program that visualizes word frequency. Frequent words are shown larger, while less frequent words are shown smaller. The word frequencies used for the header image are from all child utterances from the American and British parts of the Childes database. Here is a full Wordle picture of Childes, where frequent English words (such as 'I') are removed. And here is one where all words are included.

Is the source of ChildFreq available?
Yes, it is available here under the open source MIT license.

What do I do if this FAQ doesn't hold the answer to my question?
Just send it to rasmus.baath@lucs.lu.se!
