Tools for NLP in French and texts from Marcel Proust

About this page

This page (built with pkgdown) gives an overview of what you can do with {proustr}, an R package designed for natural language processing in French (though it can also be used with other languages).


The package gives you access to tools designed for Natural Language Processing in French. You can use these tools with the volumes of Marcel Proust's “À la recherche du temps perdu”, which are provided in this package. These tools can also be applied to almost any French text.

All the functions from this package are consistent with the tidyverse philosophy.

Here is a list of all the books contained in this package, each followed by the name of the corresponding data object:

  • Du côté de chez Swann (1913): ducotedechezswann.
  • À l’ombre des jeunes filles en fleurs (1919): alombredesjeunesfillesenfleurs.
  • Le Côté de Guermantes (1921): lecotedeguermantes.
  • Sodome et Gomorrhe (1922): sodomeetgomorrhe.
  • La Prisonnière (1923): laprisonniere.
  • Albertine disparue (1925, also known as La Fugitive): albertinedisparue.
  • Le Temps retrouvé (1927): letempretrouve.
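
As a minimal sketch, assuming {proustr} is installed and each volume is available as a dataset under the object name listed above, a volume could be loaded and inspected as follows. The column layout is not described on this page, so the example only prints the structure instead of assuming column names.

```r
# Sketch only: assumes {proustr} is installed and ships the volumes as datasets.
swann <- NULL

if (requireNamespace("proustr", quietly = TRUE)) {
  # Load the first volume under the object name given in the list above
  data("ducotedechezswann", package = "proustr")
  swann <- ducotedechezswann

  # Inspect the structure rather than assuming specific column names
  str(swann, max.level = 1)
}
```

If the package is not installed, `swann` stays `NULL` and nothing is printed.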

Find your way into {proustr}

{proustr} is divided into two types of functions:

  • proust_*() functions return data objects (books, characters, stop words, random Proust extract…)

  • pr_*() functions perform actions on the data. pr is short for p(roust)r, pr(oust), p(rocessing f)r(ench), or anything you can think of :). This prefix is used by functions such as pr_clean_punc().
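
One way to explore the two families described above is to group the exported function names by prefix. The sketch below is illustrative only: `group_by_prefix()` is a hypothetical helper, not part of {proustr}, and the last step assumes the package is installed.

```r
# Hypothetical helper (not part of {proustr}): split function names
# into the two families described above, based on their prefix.
group_by_prefix <- function(fun_names) {
  list(
    # proust_*() functions return data objects
    data   = sort(grep("^proust_", fun_names, value = TRUE)),
    # pr_*() functions perform actions on the data
    action = sort(grep("^pr_", fun_names, value = TRUE))
  )
}

# Apply it to the names exported by the package, if it is installed
if (requireNamespace("proustr", quietly = TRUE)) {
  print(group_by_prefix(getNamespaceExports("proustr")))
}
```

The pattern `^pr_` does not match names starting with `proust_`, so each function lands in one family at most.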

For an overview of all the available functions, please visit the Reference page.