Elias Ponvert, Director of Data Science at People Pattern, Presents on Unsupervised Partial Parsing
A few weeks ago, Elias Ponvert, Director of Data Science at People Pattern, was invited to present to an Austin meet-up group for machine learning enthusiasts. Ponvert chose to present his thesis defense, and for 70 minutes he walked the audience through an intricate process that ultimately produced an unsupervised method at or near the state of the art.
The meeting was held in downtown Austin at the BlackLocus headquarters. The crowd was diverse: a fine blend of finance folk and entrepreneurs, data science whiz kids, and those interested in learning a bit more about language and computers. I even saw a few kids swatting a ball around on the ping pong table.
“Even though the problem is considered difficult,” Ponvert said, “I managed to find a solution which outperformed the state-of-the-art, but actually used fairly simple machine learning and hacks.”
For those relatively new to parsing methods and models, parsing is the process of analyzing a string of symbols, in natural or computer languages, according to the rules of a grammar. Very basic forms of parsing are sometimes taught in introductory grammar lessons in grade school. You may remember drawing sentence trees to understand basic sentence structure and get a better grasp of what a sentence actually means. If so, you were engaged in an elementary exercise in parsing.
Now back to unsupervised parsing. Ponvert studied how to unpack sentences and larger “chunks” of text without relying on human-provided information or annotation.
“The trick is to break the problem down: instead of trying to predict full tree structures, start by learning to predict small phrasal structure, basic noun phrases and things like that. A well-known class of machine learning techniques, hidden Markov models, excelled at predicting these small phrases unsupervised.”
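To make the idea of tagging small phrases with a hidden Markov model a bit more concrete, here is a minimal Python sketch. It is not Ponvert's code. It assumes three hidden states (B for the first word of a phrase, I for a word inside one, O for a word outside any phrase), and every probability in it is a hand-picked toy value. In a genuinely unsupervised setting those numbers would be estimated from raw text rather than written by hand; the sketch only shows the decoding step (the Viterbi algorithm) that turns a sentence into phrases once a model exists.

```python
# Toy HMM chunker: all probabilities below are illustrative assumptions,
# not values learned from data and not taken from Ponvert's work.
STATES = ("B", "I", "O")  # Begin phrase, Inside phrase, Outside any phrase

START = {"B": 0.6, "I": 0.0, "O": 0.4}
TRANS = {
    "B": {"B": 0.0, "I": 1.0, "O": 0.0},  # assumed: a phrase has at least two words
    "I": {"B": 0.3, "I": 0.2, "O": 0.5},
    "O": {"B": 0.6, "I": 0.0, "O": 0.4},  # a phrase cannot start with I
}
EMIT = {
    "B": {"the": 0.8, "dog": 0.05, "cat": 0.05, "chased": 0.1},
    "I": {"the": 0.05, "dog": 0.45, "cat": 0.45, "chased": 0.05},
    "O": {"the": 0.1, "dog": 0.1, "cat": 0.1, "chased": 0.7},
}


def viterbi(words):
    """Return the most probable B/I/O tag sequence for a non-empty word list."""
    best = [{s: START[s] * EMIT[s].get(words[0], 0.0) for s in STATES}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            # Pick the previous state that maximizes the path probability into s.
            prev = max(STATES, key=lambda p: best[-1][p] * TRANS[p][s])
            scores[s] = best[-1][prev] * TRANS[prev][s] * EMIT[s].get(word, 0.0)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    tags = [state]
    for pointers in reversed(back):
        state = pointers[state]
        tags.append(state)
    return tags[::-1]


def chunks(words, tags):
    """Group words into phrases according to their B/I/O tags."""
    spans, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:
                spans.append(current)
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


sentence = "the dog chased the cat".split()
tags = viterbi(sentence)
print(tags)
print(chunks(sentence, tags))
```

With these toy numbers, the decoder groups "the dog" and "the cat" into phrases and leaves "chased" outside them. Plain probabilities are multiplied here for readability; a version meant for long sentences would work with log probabilities to avoid underflow.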
So, how did he solve the problem?
“I developed a simple hack to cascade these simple phrasal predictions to construct full trees,” Ponvert explained. “In experiments against human-annotated sentences, this technique performed better than the state of the art, and still is the best performing technique on this problem.”
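The cascading idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ponvert's implementation: `toy_chunker` is a hypothetical stand-in for a learned phrase predictor, the `PAIRS` table is made up, and the choice to let a phrase's last word stand in for the whole phrase at the next level is a simplification made for this example. The point is only the loop: find small phrases, collapse each into a single stand-in token, and run the same phrase finder again on the shorter sequence until nothing new is found.

```python
# Hypothetical word pairs the stand-in chunker will group (illustrative only).
PAIRS = {("the", "dog"), ("the", "cat"), ("chased", "cat")}


def toy_chunker(tokens):
    """Stand-in for a learned chunker: return non-overlapping (start, end) spans."""
    spans, i = [], 0
    while i < len(tokens) - 1:
        if (tokens[i], tokens[i + 1]) in PAIRS:
            spans.append((i, i + 2))
            i += 2
        else:
            i += 1
    return spans


def cascade(words, chunker, max_levels=10):
    """Build a nested-list tree by applying the chunker level after level."""
    # Each item pairs a stand-in token with the subtree it represents.
    items = [(w, w) for w in words]
    for _ in range(max_levels):
        spans = chunker([token for token, _ in items])
        if not spans:
            break  # no new phrases found, so the cascade is finished
        new_items, pos = [], 0
        for start, end in spans:
            new_items.extend(items[pos:start])
            group = items[start:end]
            # Assumption: the last word of a phrase represents it one level up.
            stand_in = group[-1][0]
            new_items.append((stand_in, [tree for _, tree in group]))
            pos = end
        new_items.extend(items[pos:])
        items = new_items
    # Whatever remains becomes the children of the root.
    return [tree for _, tree in items]


tree = cascade("the dog chased the cat".split(), toy_chunker)
print(tree)
```

On the example sentence, the first pass groups "the dog" and "the cat", the second pass groups "chased" with the collapsed "the cat" phrase, and the third pass finds nothing and stops, leaving a two-branch tree.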
Need a moment to process all of that? Click on the image here to take it in steps using the slideshow.