À l’Ombre des Mains en Latex Violet

Post Category: archives, digital, editing

Posted May 9th, 2011 at 01:32pm by Jeff Drouin

Proust

Just found out about this website called Proust that facilitates asking life-defining questions of your loved ones and having deep conversations with them. It works like a questionnaire. Not really sure how this is different from StoryCorps, or why you wouldn’t just ask your loved ones in person.

Introduction To Proust.com from Proust on Vimeo.

Post Category: networked media, other websites, social software, video

Posted March 29th, 2011 at 09:27pm by Jeff Drouin

Foray Into Topic Modeling

Topic modeling suggests new avenues for Proust studies. Applications like Mallet and PhiloMine compute the statistical relationships among tokens (single words or two- and three-word phrases) that appear within specified spans of text, such as paragraphs or groups of, say, fifty words. Since the Recherche contains more than a million words, topic modeling can highlight features of the text that are not perceptible during serial reading. I ran the first tome, which contains part I of Du côté de chez Swann, “Combray,” and part II, “Un Amour de Swann,” through Mallet to generate token clusters for ten topics, which reveal some interesting patterns. The command-line output below lists, for each of the ten topics, the nineteen words most strongly associated with it.

  1. chose moment pouvait jamais puis rien esprit pourtant visage savait voulait dire savoir mal trouvait première devait autres instant
  2. dit bien dire air jamais beaucoup tête toujours princesse ami docteur reste choses sais enfin regard répondit jeune entendu
  3. vie amour plaisir souvent celle ainsi gilberte pu pensée besoin donnait tant sorte milieu cause femmes étais connaître joie
  4. après temps jusqu heure pendant allait presque chambre longtemps près seul passer heures penser jour tard souvenir chercher toute
  5. combray côté déjà rue soleil semblait fleurs saint bois place eau ciel petits vers jardin matin champs dessus autour
  6. faisait toutes petite peine seule beau toute sourire donner phrase quelques trouver parfois contraire nature suite musique croire corps
  7. swann odette chez verdurin monde disait gens femme forcheville homme soir effet amis connaissait demander personne cœur cottard
  8. voir faire aller autres jours jour toujours maison venait venir désir grande contre dès autant paris rien lequel bien
  9. grand tante mère père françoise faire bien fille disait parents maman voix partie personne bonne petit mort famille laisser
  10. devant guermantes yeux nom air petit surtout or doute mieux église image fit vue dame tant aussitôt figure lesquelles
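For readers curious about what Mallet is doing under the hood: it fits a latent Dirichlet allocation (LDA) model by Gibbs sampling. The toy collapsed Gibbs sampler below is a minimal sketch of that idea in plain Python, not Mallet's implementation (which is considerably more optimized). The `alpha`, `beta`, and iteration defaults are illustrative assumptions, and each "document" would be one of the text spans described above, such as a fifty-word group.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (a sketch, not Mallet's code).

    docs: list of token lists, e.g. fifty-word spans of the novel.
    Returns one {word: count} dict per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=19):
    """Highest-count words per topic (n=19 mirrors the lists above)."""
    result = []
    for counts in topic_word:
        ranked = sorted(
            ((w, c) for w, c in counts.items() if c > 0),
            key=lambda item: (-item[1], item[0]),
        )
        result.append([w for w, _ in ranked[:n]])
    return result
```

Called as `top_words(lda_gibbs(spans, num_topics=10))`, it returns one ranked word list per topic, comparable in shape to the ten lists above; on real text the results will shift with the random seed and the number of iterations.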

Some of the results are unsurprising, such as topic 7, which clearly derives from the many evening scenes at the Verdurins’ (soir, chez, maison) where Swann courted Odette among their coterie (forcheville, cottard), often heartbroken with jealousy (cœur, désir) as he wondered whether she was seeing other admirers on the sly (demander, connaissait, amis). Other topics reveal interesting patterns that fit with scenes across the entire narrative, such as topic 10. It emphasizes the use and observation of the eyes (yeux, vue) in connection with the Duc and Duchesse de Guermantes, whose mysterious airs and glances are described in several Combray church passages, as well as their association with art and the symbolism of France (image, figure). But what also emerges is the recurrence of the preposition devant (before, in front of), which emphasizes the narrator’s position not only in front of their paintings and their glances but also in front of a church in connection with a woman (dame), a recurrence that we can tease out by reading the database passages from the English translation.

Using a PHP script and MySQL database (graciously provided by Elijah Meeks), we can extract the tokens, word counts, and their connections from the Mallet topic model files into a graph file of nodes and edges, allowing us to view the ten topics as a network model in Gephi.
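Elijah Meeks’s PHP/MySQL script isn’t reproduced here. As a rough stand-in, the Python sketch below turns per-topic word weights into the two tables that Gephi’s spreadsheet importer can read: a node table (`Id`, `Label`, plus an extra `Kind` column) and an edge table (`Source`, `Target`, `Type`, `Weight`). The input format, a dict mapping topic number to `{word: weight}`, and the `min_weight` cutoff are assumptions; parsing Mallet’s own output files is left out.

```python
import csv
import io


def topic_graph_tables(topic_words, min_weight=1):
    """Build bipartite topic-word node and edge tables for Gephi.

    topic_words: {topic_number: {word: weight}} (hypothetical input format).
    Returns (nodes, edges) as lists of CSV-ready rows, headers first.
    """
    nodes = [("Id", "Label", "Kind")]
    edges = [("Source", "Target", "Type", "Weight")]
    seen_words = set()
    for topic, words in sorted(topic_words.items()):
        topic_id = f"topic_{topic}"
        nodes.append((topic_id, f"Topic {topic}", "topic"))
        for word, weight in sorted(words.items()):
            if weight < min_weight:
                continue  # drop weak topic-word links
            word_id = f"word_{word}"
            if word not in seen_words:
                # A word shared by several topics becomes a single node.
                seen_words.add(word)
                nodes.append((word_id, word, "word"))
            edges.append((topic_id, word_id, "Undirected", weight))
    return nodes, edges


def to_csv(rows):
    """Serialize rows as CSV text, ready to save and import into Gephi."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

For example, with invented weights, `topic_graph_tables({6: {"rien": 12, "musique": 30}, 9: {"rien": 9, "mère": 40}})` yields a single node for rien with edges to both topics, the kind of connecting term that links topic clusters in Gephi.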

This entirely computer-generated model of associative networks in tome 1 of the Recherche differs markedly from the static model created by my particular reading of the church motif above, though the two share some consistencies as well as some interesting disparities.

For instance, when we drill down and filter to look more closely at the terms that join the different topics, we see that the word for nothing (rien) is the one that most frequently connects topics 6 and 9, which respectively center on themes of beautiful bodily gestures in music and family/home relationships, while time (temps) joins topic 6 with 3, which is focused on positive terms for love of Gilberte.
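The post doesn’t specify how the connecting terms were ranked. One simple measure, assumed here purely for illustration, scores a word shared by two topics by the smaller of its two weights, so a word ranks high only if it is prominent in both. The input is a hypothetical mapping from topic number to `{word: weight}`.

```python
def bridging_words(topic_words, a, b, n=5):
    """Rank the words shared by topics a and b.

    Score = min(weight in a, weight in b), a hypothetical bridge measure.
    topic_words: {topic_number: {word: weight}}.
    Returns up to n (word, score) pairs, highest score first.
    """
    wa, wb = topic_words[a], topic_words[b]
    shared = set(wa) & set(wb)
    scored = [(min(wa[w], wb[w]), w) for w in shared]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(w, s) for s, w in scored[:n]]
```

With invented weights such as `{6: {"rien": 12, "temps": 7}, 9: {"rien": 9, "temps": 2}}`, `bridging_words(tw, 6, 9)` returns `[("rien", 9), ("temps", 2)]`.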

According to the statistical features of the text, then, the first two parts of Du côté de chez Swann associate the expression of romantic love primarily with time, while the memory of familial love is associated primarily with absence. This perhaps comes as no shock to most readers of Proust, but if we compare this model with a search for the term nothing in the church motif database, we retrieve a number of passages associated predominantly with romantic love. These two fields of data, then, suggest a reading of the church motif as concerned with concepts of absence in romantic love, somewhat against the grain of the rest of the novel. There is not enough space here to deal with the problematics of translation/tutor text comparisons or with the relation of computational algorithms to critical interpretation. But it is clear that domain expertise is just as necessary in digital scholarship as it is in print, as the (illuminating) disparities between a human reading and a machine reading of the text show.

Post Category: data mining, narrative, taxonomy, text mining, visualization

Posted March 17th, 2011 at 11:45am by Jeff Drouin

French Stop Words

I’ve searched for French stop word lists for use in text mining and synthesized my findings here. The result may not be definitive, but it could be useful for anyone looking for a stop list.


à
ah
ai
aie
aient
aies
ait
alors
as
au
aucuns
aurai
auraient
aurais
aurait
auras
auriez
aurions
aussi
autre
aux
avaient
avais
avait
avant
avec
avez
aviez
avions
avoir
avons
ayant
ayez
ayons
bon
car
ce
ceci
cela
celà
celles
celui
ces
cet
cette
ceux
chaque
ci
comme
comment
dans
de
des
du
dedans
dehors
depuis
deux
devrait
doit
donc
dont
dos
droite
début
elle
elles
en
encore
es
essai
est
et
eu
eue
eues
eusse
eusses
eûmes
eurent
eus
eussions
eussiez
eut
eût
eûtes
eux
fait
faites
fois
font
force
fûmes
furent
fus
fusse
fussent
fusses
fussions
fussiez
fut
fût
fûtes
haut
hors
ici
il
ils
je
jusqu’à
juste
la
laquelle
lequel
le
les
leur
leurs
lui

ma
maintenant
mais
me
mes
mine
moi
moins
mon
mot
même
ne
ni
nommés
non
nos
notre
nous
nouveaux
on
ont
ou

par
parce
parole
pas
personnes
peut
peut-être
peu
pièce
plupart
plus
pour
pourquoi
quand
qu
que
quel
quelle
quelles
quelque
quels
qui
sa
sans
se
sera
serai
seraient
serais
serait
seras
serez
seriez
serions
serons
seront
ses
seulement
si
sien
soi
soient
sois
soit
somme
sommes
son
sont
sous
soyez
soyons
suis
sujet
sur
ta
tandis
te
tellement
tels
tes
toi
ton
tous
tout
trop
très
tu
un
une
valeur
voie
voient
vont
vos
votre
vous
vu
ça
étaient
était
étant
état
étions
été
étée
étées
étés
êtes
être

mme
mlle

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
1
2
3
4
5
6
7
8
9
10
~
`
!
@
#
$
%
^
&
*
(
)
_
-
=
+
{
}
[
]


;
:
,
.
/
<
>
?
«
»
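A minimal sketch of applying the list in Python, assuming it has been saved one entry per line in a UTF-8 text file (the filename in the usage note is hypothetical). The tokenizer is a deliberate simplification: it treats anything other than letters and digits as a separator, so elided forms like l’église and qu’il split into l + église and qu + il, and hyphenated peut-être into peut + être, which the single-letter, qu, and other entries above then catch.

```python
import re

# Runs of letters (accented included) and digits; apostrophes, hyphens,
# and punctuation all act as separators.
TOKEN_RE = re.compile(r"[^\W_]+")


def load_stop_words(lines):
    """Read a one-entry-per-line stop list, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_tokens(text, stop_words):
    """Lowercase and tokenize the text, then drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in stop_words]
```

For example, `with open("french_stopwords.txt", encoding="utf-8") as f: stop = load_stop_words(f)`, after which `filter_tokens("Qu’il aille à l’église de Combray", stop)` returns `["aille", "église", "combray"]` with the list above. Entries that contain an apostrophe, such as jusqu’à, will never match once the text is split this way.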

Post Category: data mining, digital humanities, text mining

Posted March 9th, 2011 at 03:40pm by Jeff Drouin

Gorgeous Network Visualizations with Gephi

I just started playing around with Gephi, which produces really amazing interactive graphs for intuitive data analysis. I thought I would post one here. I’m not sure how to read or use it yet, but it shows the entire church motif in the Recherche.

Post Category: visualization

Posted December 3rd, 2010 at 12:11am by Jeff Drouin

A Little Close Reading with Network Analysis Software

I thought I would do some closer “distant reading” of the Recherche. In ORA, a metanetwork visualization can be manipulated in real time to highlight the links among nodes and their related concepts or passages. What this means for the study of Proust is that we can think of the novel (and the novel genre) as a network of nodes consisting of concepts, characters, narrative elements, and any other unit of meaning that might enhance exploration of its text.

For instance, isolating the network constellated around the note “Finding common element of formative impressions” shows that the narrator’s activity of reflecting upon his formative impressions is primarily connected to the associations of Memory, Music, and Literature.

The Memory association in turn connects with “Memory at Grandmother’s deathbed”; Music connects with “End of Mass at Combray church, return home” and “Vinteuil’s sonata played at Swann’s”; and Literature connects with “Charlus berating Marcel,” “Epiphany at Guermantes’ Party,” and “Describing morning routine back in Paris, after second Balbec visit.” When we mouse over the nodes above to show the passages they represent, we see that most of them, drawn from sequences of musical performance or literary discussion all over the novel, refer to the twin steeples of Martinville that formed the subject of the narrator’s first piece of writing as a youth (I.253-257). Thus the steeples form an orientation point for the part of the narrator’s writerly vocation that pertains to the analysis of impressions, filtered primarily through memories of music and literature. These connections are not apparent from searches of the Ecclesiastical Proust Archive database, because it does not provide the simultaneous view of layered networks afforded by ORA. On the surface, then, this particular network within the Recherche seems to form a theory of impressionism based on the structural commonalities between music and literature.
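ORA does this interactively, but the “rings” around a node are simply groups of nodes at the same shortest-path distance from it, which a breadth-first search can compute. In this sketch the edge list is reconstructed by hand from the connections described above (it is not an ORA export), and the `rings` function is a hypothetical helper.

```python
from collections import defaultdict, deque


def rings(edges, center, max_depth=2):
    """Group nodes by shortest-path distance from center (breadth-first search).

    Returns {distance: sorted list of nodes}, for distances 1..max_depth.
    """
    graph = defaultdict(set)  # undirected adjacency sets
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the outermost ring
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    result = defaultdict(list)
    for node, d in dist.items():
        if d > 0:
            result[d].append(node)
    return {d: sorted(nodes) for d, nodes in sorted(result.items())}


# Hand-built from the post's description (illustrative, not exported data).
NOTE = "Finding common element of formative impressions"
EDGES = [
    (NOTE, "Memory"), (NOTE, "Music"), (NOTE, "Literature"),
    ("Memory", "Memory at Grandmother’s deathbed"),
    ("Music", "End of Mass at Combray church, return home"),
    ("Music", "Vinteuil’s sonata played at Swann’s"),
    ("Literature", "Charlus berating Marcel"),
    ("Literature", "Epiphany at Guermantes’ Party"),
    ("Literature", "Describing morning routine back in Paris, after second Balbec visit"),
]
```

`rings(EDGES, NOTE)` puts Memory, Music, and Literature in ring 1 and the six passage notes in ring 2, mirroring the network described above.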

The network for the Time association similarly connects various types of recollection to provide insight into the narrator’s artistic development.

We find Time at the center, ringed by “Contemplation sparked by conversation with M. de Cambremer, at Guermantes party,” “Imagining Florence and Venice (before visit),” “Contemplating experience of Vinteuil’s sonata while jealous of Mlle Vinteuil and Albertine,” “Contemplating women and past,” “Observations at Guermantes party,” and “First visit to Balbec.” The last in turn connects with Narthex and Carqueville. In other words, the primary function of Time as a backward-looking concept is associated with jealousy over women, while the forward-looking passages imagine the reddish domes of Florence and the frescos of Venice. This suggests a deepening of the structure that became apparent in the database searches, where the thought of meeting a future lover in early passages, though not explicitly concerned with the nature of time, took place on the porch of a Gothic cathedral. These nodes presented by ORA show that the church passages that consciously deal with the nature of time occur after the narrator has had the experience of being in love with women. Correspondingly, the architectural element of this ring is the narthex, the entrance area just inside the church or on the threshold of the porch. The narthex was not considered part of the church proper, but it was placed close enough that those not worthy of entry, such as the unbaptized or unconfessed, could still receive instruction from the services. Hence the experience of love has brought the narrator past the porch, but because he is lost in jealousy, he still remains an outsider.

The Truth network presents a very clear view of the novel’s main thematic chains and character developments.

With Truth at the center of the middle network, the first ring comprises “Riding in Dr. Percepied’s carriage” (the moment at which he observed in motion the twin steeples of Martinville), “Reflections on getting the truth about Albertine from Andrée” (in which he had final confirmation of his lack of knowledge about Albertine’s lesbianism, the root of his obsessive jealousy), “Reading Bergotte” (the writer who most influenced his literary sensibility, and who figures so prominently in his appreciation for churches), and “Reflection on Charlus’ perversion” (the unmasking of homosexuality as a major recurring element of the novel’s concern with epistemology). What we also see in the picture above are two micro networks that are not directly connected, yet were placed close to the Truth network because of their conceptual affinity. If we take all three networks into consideration, the second ring around the Truth association comprises Motion, Laws, the Archaic, Beauty, and Knowledge, which further connect with three passages about household habits and the Great War. Taken together, Truth in Proust’s novel can ultimately be understood as a rather stable essence based on the epistemic laws of motion and observation, as well as the aesthetic laws of beauty as evident in old objects. These, too, are a function of Time. While this visualization might not provide much insight that is new in Proust studies, the interface at least allows the reader instantly to access the passages that contribute to a given part of the network.

Post Category: archives, churches / cathedrals, digital, digital humanities, narrative, networked media, novel, proust, textual criticism, visualization

Posted November 30th, 2010 at 01:12pm by Jeff Drouin

Tome I Word Cloud

Generated by Wordle. Wordle recognized the French and stripped out the common words, but many of them, like comme, quand, et, si, etc., still crept in. Interesting, though, that in this visualization of absolute word frequency, the words Swann and Odette are weighted as heavily as, or more heavily than, many prepositions and conjunctions. Given that this tome covers the “Combray” and “Swann in Love” sections, this reflects the narrator’s obsession with the pair in his early childhood, and likewise Swann’s obsession with Odette in the years before the narrator’s birth. I would have expected words like église or fenêtre or mère to carry more weight. Interesting too that the other meaningful words that make it into this cloud are Verdurin (emphasizing the salon and the coterie culture), yeux (where the narrator reads the souls of others), Françoise (who is mentioned, and perhaps valued, more than his mother), and tante (Léonie, the relative in residence at Combray).
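Wordle’s own weighting and layout algorithm isn’t described here, so the following is only a sketch of the basic idea behind a frequency cloud: count absolute word frequencies after removing stop words, then scale each count linearly to a font size. The linear scale, the `top_n` cutoff, and the 10 to 72 point size range are assumptions.

```python
from collections import Counter


def cloud_weights(tokens, stop_words, top_n=50, min_size=10, max_size=72):
    """Map the top_n most frequent non-stop words to font sizes.

    Sizes scale linearly between min_size (least frequent kept word)
    and max_size (most frequent word).
    """
    counts = Counter(t for t in tokens if t not in stop_words)
    common = counts.most_common(top_n)
    if not common:
        return {}
    hi = common[0][1]
    lo = common[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in common
    }
```

With a fuller French stop list, words like comme, quand, et, and si would be removed before sizing rather than creeping into the cloud.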

Post Category: visualization

Posted November 17th, 2010 at 12:47am by Jeff Drouin

Upcoming Lecture — Digital Methods for Literary Criticism: Proust, Illustration, and the Archive

I’m giving a lecture on some of my recent digital research on Proust. The talk will cover methods in text annotation and visualization, with a view toward their theoretical implications for literary criticism. Along the way it will describe some of my experiments with text mining and social network analysis for generating and representing associative paths.

  • Wednesday November 17, 5:00-7:00 pm
  • Lucy Ellis Lounge, 1st floor Foreign Language Building
  • University of Illinois, Urbana-Champaign

Proust Flyer

Post Category: events

Posted November 6th, 2010 at 11:40pm by Jeff Drouin

Proust LOLCat

Just made a LOLCat to explain Proust and churches.

Post Category: churches / cathedrals

Posted October 25th, 2010 at 11:15pm by Jeff Drouin

Social Network Analysis of the Recherche (and The Novel)

Image: Venice

We can run Proust through social network analysis (SNA) software in order to generate network models among its information nodes. The visualization to the left shows the association (basically, an uncategorized tag I use to label church-related passages) of Venice as it is networked among passages (referenced by their ID numbers and pagination codes) and notes on narrative context. When manipulated in real time, the visualization highlights the links to other nodes and their related concepts or passages. What this means for the study of Proust is that we can think of the novel as a network of nodes consisting of concepts, characters, narrative elements, and any other unit of meaning that might enhance exploration of its text.

Image: Narrative Contexts

We can even include various texts and external information for a genetic or contextual study of the novel. In a hypothetical archive containing digitized avant-texte and published variants of the Recherche, we could potentially see, in strikingly visual terms, how, say, the impact of WWI corresponds to the development of different sections. This could provide new insights into Proust’s writing process as the work continually ballooned and changed during and after the war. What kinds of associations received the most development during and immediately after the war, and at what points in the narrative did they occur? Which churches received the most attention, and where are they located in both fictional and real space? There is a wealth of traditional scholarship addressing genetic and contextual issues like these. However, SNA presents an opportunity to view all of the information nodes simultaneously, a much more powerful (and accurate) tool for the study of a book than other print books can be. In that way, we would see and move around in the Recherche as a writerly text that responds to its own inner needs in reference to the war.

Tools like SNA also open new pastures for narratology. If the text is marked up appropriately, a researcher could instantly recall all instances of a particular narrative device or structure and view them in relation to any desired parameters. It would be a supplement even more comprehensive than Barthes and Genette.
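To make that concrete, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`. It assumes a hypothetical markup convention in which each instance of a narrative device is wrapped in a TEI `<seg>` element whose `type` attribute names the device (values such as `prolepsis` are illustrative, not taken from any existing Proust edition); recalling every instance is then a single path query.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_device(tei_xml, device):
    """Return (xml:id, text) for every <seg type=device> in a TEI document.

    Assumes device is a plain name without quote characters, since it is
    inserted directly into the path expression.
    """
    root = ET.fromstring(tei_xml)
    return [
        # itertext() includes text nested in child elements; split/join
        # collapses line breaks and repeated spaces.
        (seg.get(XML_ID), " ".join("".join(seg.itertext()).split()))
        for seg in root.iterfind(f".//tei:seg[@type='{device}']", NS)
    ]
```

Calling `find_device(tei_text, "prolepsis")` returns the `xml:id` and normalized text of each tagged passage, which could then be filtered by any other parameter recorded in the markup.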

All of this goes to say that a more rigorous taxonomy of the Recherche would be necessary for a meaningful SNA application. At present, the interpretive apparatus of the Ecclesiastical Proust Archive consists solely of associations, which are uncategorized tags denoting concepts, themes, important details, architectural elements of the churches described, and so on. It would be far more meaningful to tag separately the characters, churches, architectural elements, themes, concepts, and plot elements that make up the rich density of this novel, as well as the images and other media added here to illustrate it, so that they become individual nodes within the information network. Then a far more rigorous and powerful visualization of the novel would be possible, and new discoveries would almost certainly follow.
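A minimal sketch of what such a taxonomy could feed into, under stated assumptions: each passage carries separately typed tags (the category names and passage IDs in the usage note are hypothetical), and two tagged entities are linked whenever they appear in the same passage, with the edge weight counting shared passages. This is a standard one-mode projection of a passage–entity network, not a feature of the current archive.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(passages):
    """Project passage -> typed tags onto an entity-entity network.

    passages: {passage_id: {category: [tag, ...]}} (hypothetical schema).
    Nodes are (category, tag) pairs, so a church and a theme with the same
    name stay distinct. Edge weight = number of passages sharing both nodes.
    """
    edges = Counter()
    for tags_by_category in passages.values():
        nodes = sorted(
            {(cat, tag) for cat, tags in tags_by_category.items() for tag in tags}
        )
        # Sorted order makes each undirected edge key canonical: (a, b) with a < b.
        for a, b in combinations(nodes, 2):
            edges[(a, b)] += 1
    return edges
```

For instance, two invented passages both tagged with the character Swann and the theme jealousy produce an edge `(("character", "Swann"), ("theme", "jealousy"))` of weight 2, which Gephi or ORA could then lay out.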

But so far, these notions pertain only to my particular study of the church motif. A far richer application would be possible if we took an entire electronic text (or, better, all of the variants and translations) and allowed researchers to mark it up and add media by way of illustration. In that way, the Proust archive would become a collaborative, electronic research and editing environment that takes shape from individuals’ own scholarly pursuits.

Image: Associations

Social network analysis (SNA) software combines a variety of methods commonly used in digital humanities research, such as text mining, visualization, and modeling. SNA software can pore over the data and metadata in the archive’s XML files and generate a network of nodes. It could be trained to recognize and normalize names, or even pseudonyms, and the metadata, provided by readers, would tell it whether a given passage contained the idea of Venice, or the subject/object distinction, or jealousy (or all three). If it had a qualitative analysis component, it could even recognize concepts. And of course scholars would need to be able to add and tag information about the documents in the archive. This is a daunting task, but eminently possible with the aid of text and data mining software.
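Name normalization could start from something as simple as a hand-built alias table mapping variant forms to one canonical character ID, long before any trained recognizer is involved. The aliases below are illustrative and far from complete, the IDs are hypothetical, and plain longest-match string search is a simplifying assumption that would miss ambiguous or unlisted forms.

```python
import re

# Illustrative alias table: variant form -> hypothetical canonical ID.
ALIASES = {
    "M. de Charlus": "charlus",
    "baron de Charlus": "charlus",
    "Palamède": "charlus",
    "Charlus": "charlus",
    "Odette de Crécy": "odette",
    "Odette": "odette",
}

# Longest aliases first, so "Odette de Crécy" wins over plain "Odette";
# \b keeps matches from starting or ending inside another word.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)


def characters_in(passage):
    """Return canonical IDs mentioned in a passage, in order of first mention."""
    seen = []
    for match in _PATTERN.finditer(passage):
        cid = ALIASES[match.group(0)]
        if cid not in seen:
            seen.append(cid)
    return seen
```

Each passage’s list of canonical IDs could then become that passage’s character tags, so that every name variant feeds the same node in the network.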

There are pitfalls, of course. The accuracy of any analytic tool depends on the quality of the data it operates upon. We must always be aware that tools like these, powerful and impressive though they are, only ever represent one state of the information realm. This is no different from traditional, print-based scholarship, but it bears consideration given the sometimes exaggerated hype around digital humanities at the time of this writing.

So, now that AccessTEI has provided us with an XML file with structural TEI markup, I’ll be looking for ways to mine it with text analysis and SNA software. Stay tuned for more updates.
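As a starting point for that kind of mining, the sketch below pulls the text of every `<p>` element out of a TEI file and cuts the running text into fixed-length word spans, the kind of chunks a topic modeler takes as documents. It assumes standard TEI namespacing and that paragraphs are marked with `<p>`; the actual structure of the AccessTEI file isn’t described here, and a span size of fifty words is just an example.

```python
import xml.etree.ElementTree as ET

TEI_P = "{http://www.tei-c.org/ns/1.0}p"


def tei_paragraphs(tei_xml):
    """Yield the whitespace-normalized text of every non-empty TEI <p>."""
    root = ET.fromstring(tei_xml)
    for p in root.iter(TEI_P):
        text = " ".join("".join(p.itertext()).split())
        if text:
            yield text


def word_chunks(paragraphs, size=50):
    """Split the running text into consecutive spans of `size` words.

    The last span is shorter when the word count is not a multiple of size.
    """
    words = [w for para in paragraphs for w in para.split()]
    return [words[i:i + size] for i in range(0, len(words), size)]
```

`word_chunks(tei_paragraphs(xml_text), size=50)` returns a list of fifty-word spans that can be handed to a topic modeler as separate documents.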

As an aside, SNA has tremendous possibilities for the study of modernist magazine culture, which is an actual publishing network. See my post from earlier today at the Magazine Modernisms blog.

Post Category: archives, books, churches / cathedrals, collaboration, community, developing the archive, digital, digital humanities, disciplinarity, editing, literary scholarship, multimedia, narrative, networked media, novel, reading, social software, taxonomy, textual criticism, textual theory, visualization

Posted October 21st, 2010 at 04:22pm by Jeff Drouin
