Summary: A new deep learning algorithm is able to identify the gender of a writer based on written text with 80% accuracy.
Source: National Research Nuclear University.
A team of researchers from the National Research Nuclear University MEPhI, the National Research Center Kurchatov Institute and Voronezh State University has developed a new learning algorithm that allows a neural network to identify a writer’s gender from text typed on a computer with up to 80 percent accuracy.
This is a new development in the field of computational linguistics. The research was funded by a Russian Science Foundation grant. The findings were published in the Procedia Computer Science journal.
Many scientific studies show that writing style can reflect certain characteristics of a writer – gender, physiological personality traits, and level of education. Speech patterns are a valuable psycho-diagnostic tool, and are often used by human resources professionals and security services.
By analyzing a person’s speech, researchers can diagnose certain illnesses, such as dementia and depression, and assess the person’s inclination toward suicidal behavior. Demand for identifying characteristics of a writer’s personality is growing as internet communications develop: companies want to know which demographics like their products and services.
Using the numerical values for various parameters in a text, researchers in this area (linguists, psychologists, IT experts) have created mathematical models to identify certain traits in the writer’s personality. Using neural networks, the researchers analyzed the effectiveness of various machine-learning algorithms for text analysis.
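As a rough illustration of what "numerical values for various parameters in a text" can mean, here is a minimal Python sketch that turns a text into a few numbers. The function name `text_features` and the four features it computes are assumptions chosen for illustration; the article does not say which parameters the researchers actually used.

```python
import re
import string


def text_features(text):
    """Turn a text into a few numeric style features.

    Hypothetical feature set for illustration only; it is not the
    feature set used in the study.
    """
    # Runs of letters (Unicode-aware), so "can't" counts as "can" and "t".
    words = re.findall(r"[^\W\d_]+", text)
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = len(text)
    n_words = len(words)
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "exclamation_ratio": text.count("!") / n_chars if n_chars else 0.0,
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }


features = text_features("I am angry! Very angry.")
```

A vector of such numbers, one per text, is the kind of input a conventional classifier could be trained on; deep learning models typically work from the word or character sequence instead.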
During the research, the scientists compared the accuracy of gender identification from text using two types of data-driven modeling: first, conventional machine-learning algorithms (such as support vector machines and gradient boosting), and, second, deep learning neural networks (such as convolutional neural networks and long short-term memory recurrent neural networks).
“Using these advanced neural network models, we have achieved great results in identifying the gender of the writer based on text, under conditions in which the author is not attempting to hide his/her gender,” said Alexander Sboyev, assistant professor at MEPhI. “Our next step is to teach the neural network to identify the gender of a writer who is deliberately trying to hide it.”
Thus, in the following texts, originally published on dating websites, the neural network easily identified the writer’s gender 10 out of 10 times, despite the fact that authors were free to sign their texts with a name typical of the opposite gender.
This text was written by a female: “I am a handsome, fit 30-year-old man. I have a high-paying job at a large oil and gas company. I live in my own flat in Moscow, and also own a small but nice house in an Italian village. I am into sports, mainly football. I love going out on weekends, I can’t stand homebodies. My perfect girl would be modest and beautiful, and would have an attractive body, based on today’s standards. She would share my interests and would not be jealous or try to make me jealous. In the future, I do not plan to be the sole provider in a family, as I believe that when it comes to families, both men and women must earn the money. I would like to have separate budgets as well. I will not tolerate cheating.”
This text was written by a male: “Hello! I am very angry, very! Why do you keep treating us like this?! We are people, too, all of us are equal! Are you sexist? I will not tolerate this anymore! I’m going to smash your car into pieces; I will spray paint all over it. You just wait, you monster. It sucks to be you.”
This research indicated that the approach based on convolutional neural networks and deep learning methods is optimal for identifying a writer’s gender. The team of researchers is currently working on identifying a writer’s age.
About this neuroscience research article
Source: National Research Nuclear University
Publisher: Organized by NeuroscienceNews.com.
Image Source: NeuroscienceNews.com image is in the public domain.
Original Research: Abstract for “Deep Learning neural nets versus traditional machine learning in gender identification of authors of RusProfiling texts” by Alexander Sboev, Ivan Moloshnikov, Dmitry Gudovskikh, Anton Selivanov, Roman Rybka, and Tatiana Litvinova in Procedia Computer Science. Published February 3 2018. doi:10.1016/j.procs.2018.01.065
Cite This NeuroscienceNews.com Article
MLA: National Research Nuclear University “Researchers Teach Neural Network to Identify a Writer’s Gender.” NeuroscienceNews. NeuroscienceNews, 27 April 2018. <https://neurosciencenews.com/ann-gender-id-8904/>.
APA: National Research Nuclear University (2018, April 27). Researchers Teach Neural Network to Identify a Writer’s Gender. NeuroscienceNews. Retrieved April 27, 2018 from https://neurosciencenews.com/ann-gender-id-8904/
Chicago: National Research Nuclear University “Researchers Teach Neural Network to Identify a Writer’s Gender.” https://neurosciencenews.com/ann-gender-id-8904/ (accessed April 27, 2018).
Deep Learning neural nets versus traditional machine learning in gender identification of authors of RusProfiling texts
In this paper we compare the accuracy of gender identification on RusProfiling texts without gender deception using two types of data-driven modeling approaches: on the one hand, well-known conventional machine learning algorithms, such as Support Vector Machine and Gradient Boosting; and, on the other hand, a set of Deep Learning neural networks, such as topologies with convolutional, fully-connected, and Long Short-Term Memory layers. The dependence of the effectiveness of these models on feature selection and feature representation is investigated. The obtained F1-score of 88% establishes the state of the art in the gender identification task with the RusProfiling corpus.
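For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The short Python sketch below computes it for one class from lists of true and predicted labels. The function name `f1_score` and the toy labels are illustrative assumptions; the abstract does not state how the score was averaged across the two classes, and the numbers here are not data from the study.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted as the positive class
        elif pred == positive:
            fp += 1  # predicted positive, actually the other class
        elif truth == positive:
            fn += 1  # positive example that the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration.
true_labels = ["f", "f", "m", "m", "f"]
predicted = ["f", "m", "m", "m", "f"]
f1_female = f1_score(true_labels, predicted, positive="f")
f1_male = f1_score(true_labels, predicted, positive="m")
macro_f1 = (f1_female + f1_male) / 2
```

Unlike plain accuracy, F1 penalizes a model that does well on one class by neglecting the other, which is one reason the two figures reported for the same system can differ.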