The first 20 years of this century were focused on developing technologies for managing and using structured information.
This is the information usually found in relational databases, where the variables of each unit of analysis are clearly separated; in other words, the information we normally analyze in Excel!
We are moving into the decade of unstructured data, such as videos, images, sounds and texts. An important part of this information comes from human beings writing or speaking, i.e., using language.
From the standpoint of consumer psychology, language is the most efficient way to access a person's thoughts and internal processes. Through language we can detect unconscious tensions and unresolved needs, which we can then address with product, communication and service strategies.
From the standpoint of consumption anthropology, language is a vehicle for transmitting culture. Through language, myths and rituals are created that condition consumption habits and occasions.
From the standpoint of semiotics, language is a system of signs that acquire meaning depending on the context; these signs condition the way we think and interpret reality. Brands can achieve greater message adoption if they use semiotic paths that are already organically established in consumers' minds.
Content production has grown by 78% across social media, blogs, news, referrals, user comments on e-commerce sites and chats, among others. Behind this information lie immense insights and hidden competitive advantages.
In the last 10 years, a discipline called Natural Language Processing, or NLP, has grown stronger. It is a joint effort between linguistics, computer science and statistics to create models that allow machines to understand, process and produce language.
An NLP project requires a scientific process with the following steps:
Text analysis:
1. Parsing: Taking texts and breaking them down into their syntactic and semantic units, separating the text into verbs, adverbs, entities, etc.
2. Filtering: Removing the text units that do not contribute to the overall meaning. Each unit is scored by its probability of occurrence, and units with low probability are eliminated.
Context analysis:
3. Sequence: Text has meaning based on context, which is detected based on the sequence of words. In a sea of texts, detecting probable sequences is the first step in creating context.
4. Predicate logic: The use of language rules to provide complements to verbs is key in the dynamics of understanding the
Modeling:
5. Unsupervised learning: Classifying texts according to their similarity, style and narrative.
6. Supervised learning: Finding common elements between different groups of texts, differentiating between authors, themes, moments in time, etc.
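
To make these steps more concrete, here is a minimal sketch in plain Python of what parsing, filtering, sequence detection and a text-similarity measure might look like. It is an illustration under simplifying assumptions, not a production pipeline: the function names, the sample reviews and the 0.08 probability threshold are all hypothetical. Splitting into words stands in for full syntactic parsing, and the cosine similarity shown is only the kind of building block that unsupervised grouping relies on. Supervised learning is not covered here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into words (a crude stand-in for parsing)."""
    return re.findall(r"[a-z']+", text.lower())

def filter_rare(docs, min_prob):
    """Drop words whose probability of occurrence in the corpus is below min_prob."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    keep = {word for word, n in counts.items() if n / total >= min_prob}
    return [[word for word in doc if word in keep] for doc in docs]

def bigram_counts(docs):
    """Count adjacent word pairs, the simplest form of sequence detection."""
    counts = Counter()
    for doc in docs:
        counts.update(zip(doc, doc[1:]))
    return counts

def cosine_similarity(words_a, words_b):
    """Similarity of two texts treated as bags of words, from 0.0 to 1.0."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical customer comments, used only for illustration.
reviews = [
    "Great coffee and fast delivery",
    "Fast delivery but the coffee was cold",
    "The app crashes every time I open it",
]

docs = [tokenize(r) for r in reviews]            # step 1: parsing
filtered = filter_rare(docs, min_prob=0.08)      # step 2: filtering
sequences = bigram_counts(docs)                  # step 3: sequence
similarity = cosine_similarity(filtered[0], filtered[1])

print(filtered)
print(sequences.most_common(1))
print(round(similarity, 3))
```

In this toy corpus the first two comments, both about coffee and delivery, score as far more similar to each other than either does to the comment about the app, which is the intuition behind grouping texts by similarity. A real project would typically rely on a dedicated NLP library for parsing and on much larger volumes of text.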