Information

  • What is this?

    Context of Diacritics is an analysis of diacritics made to help type designers with refining the character sets of their fonts.

  • Why?

    Because you should design fonts that support as many languages as possible. Let's say your font contains an st ligature. Which variations of this ligature with diacritical marks should you make? Of course, you could create all possible combinations to be sure you don't miss anything. Or you can check this database and find out which variations are actually used in real life.

  • Where does the data come from?

    The data is the result of a frequency analysis of Wikipedia articles. So far, 2 million characters have been analyzed for each language, which is roughly the length of a book with more than 1,000 pages.

  • What is absolute frequency?

    Absolute frequency is the total number of occurrences of a diacritical letter or combination found in the analyzed text.

  • What is relative frequency?

    Relative frequency is the absolute frequency weighted by the number of speakers of the respective language. This value is used to color-code the lists: red means high relative frequency, blue means low.

  • What are the dots under some letters and combinations?

    A dot means that the letter or combination exists as a stand-alone word.
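For readers who want to try the idea on their own text, here is a minimal sketch of the two frequency measures described above. It is not the code used for this project. The function names are made up for illustration, a "combination" is assumed to be a pair of adjacent letters in which at least one carries a diacritic, and the weighting formula (each language's count multiplied by its share of all speakers) is only one possible reading of "weighted by the number of speakers".

```python
import unicodedata
from collections import Counter, defaultdict


def has_diacritic(ch: str) -> bool:
    """True if ch is a letter whose canonical decomposition contains a combining mark.

    Limitation: letters such as "ø" or "ł" have no canonical decomposition,
    so this simple test does not flag them.
    """
    if not ch.isalpha():
        return False
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def absolute_frequencies(text: str) -> Counter:
    """Count diacritical letters and adjacent letter pairs that contain one.

    Assumption: a "combination" is a pair of neighbouring letters.
    """
    text = unicodedata.normalize("NFC", text)  # one code point per accented letter
    counts = Counter()
    for i, ch in enumerate(text):
        if has_diacritic(ch):
            counts[ch] += 1
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalpha() and nxt.isalpha() and (has_diacritic(ch) or has_diacritic(nxt)):
            counts[ch + nxt] += 1
    return counts


def relative_frequencies(per_language_counts: dict, speakers: dict) -> dict:
    """Weight each language's counts by its share of all speakers (assumed formula)."""
    total_speakers = sum(speakers[lang] for lang in per_language_counts)
    weighted = defaultdict(float)
    for lang, counts in per_language_counts.items():
        share = speakers[lang] / total_speakers
        for item, n in counts.items():
            weighted[item] += n * share
    return dict(weighted)
```

Calling `absolute_frequencies("šťastný šíp")` counts "š" twice and the pair "šť" once, while the plain pair "st" is not counted at all. Any speaker numbers passed to `relative_frequencies` are up to the caller; the sketch does not ship real population data.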

Disclaimer

Context of Diacritics is not scientific research. It is the result of a frequency analysis of Wikipedia articles. The main goal of this project is to list diacritical combinations that exist in real life. Because of the methods used, all other data (absolute and relative frequency, position within words) is not and cannot be 100% accurate.

Two main operations were performed during the analysis:

• The first operation was to find and count occurrences of each diacritical letter and combination. The results of this operation should be highly accurate.

• The second operation was to detect the possible positions of each combination within words. For now, these results are far from perfect: to be exact, the analysis would need to find every existing word in all of its forms. Therefore, if a combination isn't marked as "Is a word", "Can start words" or "Can end words", that doesn't mean it cannot occur in that position. It only means it did not occur there in the analyzed text.
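The second operation can be pictured with a short sketch like the one below. It is an illustration under stated assumptions, not the code behind this project: the function name `detect_positions`, the flag names, and the choice to treat a word as any run of Unicode letters are all hypothetical. It also shows why the results are incomplete: a flag is only set when a matching word happens to appear in the text.

```python
import re
import unicodedata

# Runs of Unicode letters (no digits, no underscore). Assumed word definition.
WORD_RE = re.compile(r"[^\W\d_]+")


def detect_positions(text: str, combinations: list) -> dict:
    """Flag whether each combination occurs as a word, at a word start, or at a word end."""
    # NFC keeps each accented letter as one code point; decomposed combining
    # marks would otherwise split words in the regex above.
    text = unicodedata.normalize("NFC", text).lower()
    flags = {
        c: {"is_word": False, "can_start": False, "can_end": False}
        for c in combinations
    }
    for word in WORD_RE.findall(text):
        for c in combinations:
            if word == c:
                flags[c]["is_word"] = True
            elif len(word) > len(c):
                # Independent checks: one word may both start and end with c.
                if word.startswith(c):
                    flags[c]["can_start"] = True
                if word.endswith(c):
                    flags[c]["can_end"] = True
    return flags


result = detect_positions("à la carte, voilà! Été.", ["à", "là", "ét"])
```

In this tiny sample, "à" is found as a word and at the end of "voilà", but never at the start of a longer word. That missing flag says nothing about the language itself, only about the sample.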

The data is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

Who?

Context of Diacritics was made by Ondrej Jób, who runs his type foundry Setup Type in Slovakia, can be followed on Instagram, and checks email every day.