\documentclass[12pt,twoside]{scrartcl}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Meta information:
\newcommand{\trauthor}{Jim Martens}
\newcommand{\trtype}{Proseminar Paper} %{Seminararbeit} %{Proseminararbeit}
\newcommand{\trcourse}{Proseminar Artificial Intelligence}
\newcommand{\trtitle}{Methods for understanding natural language}
\newcommand{\trmatrikelnummer}{6420323}
\newcommand{\tremail}{2martens@informatik.uni-hamburg.de}
\newcommand{\trarbeitsbereich}{Knowledge Technology, WTM}
\newcommand{\trdate}{10.02.2014}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Languages:
% If the paper is written in German:
% \usepackage[german]{babel}
% \usepackage[T1]{fontenc}
% \usepackage[latin1]{inputenc}
% \usepackage[latin9]{inputenc}
% \selectlanguage{german}
% If the thesis is written in English:
\usepackage[english]{babel}
\selectlanguage{english}
\addto{\captionsenglish}{\renewcommand{\refname}{Bibliography}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Bind packages:
\usepackage{acronym} % Acronyms
\usepackage{algorithmic} % Algorithms and Pseudocode
\usepackage{algorithm} % Algorithms and Pseudocode
\usepackage{amsfonts} % AMS Math Packet (Fonts)
\usepackage{amsmath} % AMS Math Packet
\usepackage{amssymb} % Additional mathematical symbols
\usepackage{amsthm}
\usepackage{booktabs} % Nicer tables
%\usepackage[font=small,labelfont=bf]{caption} % Numbered captions for figures
\usepackage{color} % Enables defining of colors via \definecolor
\definecolor{uhhRed}{RGB}{254,0,0} % Official Uni Hamburg Red
\definecolor{uhhGrey}{RGB}{122,122,120} % Official Uni Hamburg Grey
\usepackage{fancybox} % Frame equations
\usepackage{fancyhdr} % Packet for nicer headers
%\usepackage{fancyheadings} % Nicer numbering of headlines
%\usepackage[outer=3.35cm]{geometry} % Type area (size, margins...) !!!Release version
%\usepackage[outer=2.5cm]{geometry} % Type area (size, margins...) !!!Print version
%\usepackage{geometry} % Type area (size, margins...) !!!Proofread version
\usepackage[outer=3.15cm]{geometry} % Type area (size, margins...) !!!Draft version
\geometry{a4paper,body={5.8in,9in}}
\usepackage{graphicx} % Inclusion of graphics
%\usepackage{latexsym} % Special symbols
\usepackage{longtable} % Allow tables over several pages
\usepackage{listings} % Nicer source code listings
\usepackage{multicol} % Content of a table over several columns
\usepackage{multirow} % Content of a table over several rows
\usepackage{rotating} % Allows to rotate text and objects
\usepackage[hang]{subfigure} % Allows to use multiple (partial) figures in a figure
%\usepackage[font=footnotesize,labelfont=rm]{subfig} % Pictures in a floating environment
\usepackage{tabularx} % Tables with fixed width but variable rows
\usepackage{url,xspace,boxedminipage} % Accurate display of URLs
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Configuration:
\hyphenation{whe-ther} % Manually use: "\-" in a word: Staats\-ver\-trag
\hyphenation{spe-ci-fies}
\hyphenation{spe-ci-fi-ca-tion}
%\lstloadlanguages{C} % Set the default language for listings
\DeclareGraphicsExtensions{.pdf,.svg,.jpg,.png,.eps} % first try pdf, then eps, png and jpg
\graphicspath{{./src/}} % Path to a folder where all pictures are located
\pagestyle{fancy} % Use nicer header and footer
% Redefine the environments for floating objects:
\setcounter{topnumber}{3}
\setcounter{bottomnumber}{2}
\setcounter{totalnumber}{4}
\renewcommand{\topfraction}{0.9} %Standard: 0.7
\renewcommand{\bottomfraction}{0.5} %Standard: 0.3
\renewcommand{\textfraction}{0.1} %Standard: 0.2
\renewcommand{\floatpagefraction}{0.8} %Standard: 0.5
% Tables with a nicer padding:
\renewcommand{\arraystretch}{1.2}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Additional 'theorem' and 'definition' blocks:
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
%\newtheorem{theorem}{Satz}[section] % If writing in German.
\newtheorem{axiom}{Axiom}[section]
%\newtheorem{axiom}{Fakt}[chapter] % If writing in German.
%Usage:%\begin{axiom}[optional description]%Main part%\end{axiom}
\theoremstyle{definition}
\newtheorem{definition}{Definition}[section]
%Additional types of axioms:
\newtheorem{lemma}[axiom]{Lemma}
\newtheorem{observation}[axiom]{Observation}
%Additional types of definitions:
\theoremstyle{remark}
%\newtheorem{remark}[definition]{Bemerkung} % If writing in German.
\newtheorem{remark}[definition]{Remark}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Provides TODOs within the margin:
\newcommand{\TODO}[1]{\marginpar{\emph{\small{{\bf TODO: } #1}}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Abbreviations and mathematical symbols
\newcommand{\modd}{\text{ mod }}
\newcommand{\RS}{\mathbb{R}}
\newcommand{\NS}{\mathbb{N}}
\newcommand{\ZS}{\mathbb{Z}}
\newcommand{\dnormal}{\mathit{N}}
\newcommand{\duniform}{\mathit{U}}
\newcommand{\erdos}{Erd\H{o}s}
\newcommand{\renyi}{-R\'{e}nyi}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Document:
\begin{document}
\renewcommand{\headheight}{14.5pt}
\fancyhead{}
\fancyhead[LE]{ \slshape \trauthor}
\fancyhead[LO]{}
\fancyhead[RE]{}
\fancyhead[RO]{ \slshape \trtitle}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Cover Header:
\begin{titlepage}
\begin{flushleft}
Universit\"at Hamburg\\
Department Informatik\\
\trarbeitsbereich\\
\end{flushleft}
\vspace{3.5cm}
\begin{center}
\huge \trtitle\\
\end{center}
\vspace{3.5cm}
\begin{center}
\normalsize\trtype\\
[0.2cm]
\Large\trcourse\\
[1.5cm]
\Large \trauthor\\
[0.2cm]
\normalsize Matr.Nr. \trmatrikelnummer\\
[0.2cm]
\normalsize\tremail\\
[1.5cm]
\Large \trdate
\end{center}
\vfill
\end{titlepage}
%back side of the cover sheet is empty!
\thispagestyle{empty}
\hspace{1cm}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Abstract:
% Abstract gives a brief summary of the main points of a paper:
\section*{Abstract}
Your text here...
% Lists:
\setcounter{tocdepth}{2} % depth of the table of contents (for seminars, 2 is recommended)
\tableofcontents
\pagenumbering{arabic}
\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Content:
% the actual content, usually separated over a number of sections
% each section is assigned a label, in order to be able to put a
% crossreference to it
\section{Introduction}
\label{sec:introduction}
\begin{itemize}
\item two kinds of natural language: spoken language and written language
\item will concentrate on written language
\item definition of syntax, semantics and pragmatics
\item two important methods: syntactic parsing and semantic analysis
\end{itemize}
\section{Evaluation of methods}
\label{sec:evalMethods}
\subsection{Syntactic Parsing}
\label{subSec:syntacticParsing}
Syntactic parsing is used to create parse trees. These can be used for grammar checking in a text editor: ``A sentence that cannot be parsed may have grammatical errors''\footnote{\cite{Jurafsky2009b}, page 461}. More often, however, they ``serve as an important intermediate stage of representation for semantic analysis''\footnote{\cite{Jurafsky2009b}, page 461}. Several algorithms are available to create such trees; the CYK\footnote{named after its inventors John Cocke, Daniel Younger and Tadao Kasami} algorithm is explained further here. Before the algorithm itself is presented, the reason for its existence is outlined.

There are two classical ways of parsing a sentence: bottom-up and top-down. Both approaches have their own advantages and disadvantages, and ambiguity creates additional problems. To implement bottom-up and top-down search in the face of ambiguity, ``an agenda-based backtracking strategy''\footnote{\cite{Jurafsky2009b}, page 468} is used. The problem is that every time the parser recognizes that the current parse tree is wrong, it has to backtrack and re-explore other parts of the sentence. This duplicates a large amount of work and is therefore inefficient.

A solution to these problems is offered by ``dynamic programming parsing methods''\footnote{\cite{Jurafsky2009b}, page 469}. The CYK algorithm is one of several algorithms based on dynamic programming.

CYK only works with grammars in Chomsky Normal Form (CNF). Every context-free grammar can be converted to CNF without loss of expressiveness, so this restriction does no harm but simplifies the parsing. For details on how context-free grammars are converted to CNF, refer to Jurafsky~\cite{Jurafsky2009b}.
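To give a rough idea of how the conversion works (a sketch of the binarization step only; the rule and the new non-terminal $X_{1}$ are invented for illustration), a rule with more than two symbols on its right-hand side, such as
\[
S \rightarrow NP\; VP\; PP,
\]
is replaced by two binary rules that introduce a fresh non-terminal:
\[
S \rightarrow NP\; X_{1}, \qquad X_{1} \rightarrow VP\; PP.
\]
Unit productions and terminals that occur next to non-terminals are removed by similar rewriting steps.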
CYK requires $\mathcal{O}(n^{2}m)$ space for the $P$ table (a table of probabilities), where ``$m$ is the number of nonterminal symbols in the grammar''\footnote{\cite{Russel2010}, page 893}, and uses $\mathcal{O}(n^{3}m)$ time. Since ``$m$ is constant for a particular grammar, [it] is commonly described as $\mathcal{O}(n^{3})$''\footnote{\cite{Russel2010}, page 893}. There is no algorithm with a better worst case than CYK for general context-free grammars~\cite{Russel2010}.

But how does CYK work? CYK does not examine all parse trees; it only examines the most probable one and computes its probability. All other parse trees are present in the $P$ table and could be enumerated with a little work (in exponential time), but the strength and beauty of CYK is that they do not have to be enumerated. CYK defines ``the complete state space defined by the `apply grammar rule' operator''\footnote{\cite{Russel2010}, page 894}, and just a part of this space can be searched with $A^{*}$ search~\cite{Russel2010}. ``With the $A^{*}$ algorithm [...] the first parse found will be the most probable''\footnote{\cite{Russel2010}, page 895}.
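To make the dynamic programming scheme more concrete, the following listing gives a minimal sketch of probabilistic CYK in Python. It is not taken from the cited sources; the grammar representation (two dictionaries with probabilities for lexical rules $A \rightarrow w$ and binary rules $A \rightarrow B\ C$) is an assumption made purely for illustration.
\begin{lstlisting}[language=Python]
# Minimal sketch of probabilistic CYK (illustrative, not from the cited sources).
# lexical[(A, word)] = P(A -> word);  binary[(A, (B, C))] = P(A -> B C)
from collections import defaultdict

def cyk(words, lexical, binary):
    n = len(words)
    # P[i][j][A] = probability of the best parse of words[i:j] rooted in A
    P = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):              # spans of length 1: lexical rules
        for (A, w), prob in lexical.items():
            if w == word and prob > P[i][i + 1][A]:
                P[i][i + 1][A] = prob
    for length in range(2, n + 1):                # combine shorter spans
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):             # split point between the two halves
                for (A, (B, C)), prob in binary.items():
                    cand = prob * P[i][k][B] * P[k][j][C]
                    if cand > P[i][j][A]:
                        P[i][j][A] = cand
    return P[0][n]    # best probability for each root symbol over the whole sentence
\end{lstlisting}
The $A^{*}$ search mentioned above would replace the exhaustive loops over spans and rules by a best-first exploration of the same state space.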
But these probabilities need to be learned from somewhere. This somewhere is usually a ``treebank''\footnote{\cite{Russel2010}, page 895}, which contains a corpus of correctly parsed sentences. The best known is the Penn Treebank~\cite{Russel2010}, which ``consists of 3 million words which have been annotated with part of speech and parse-tree structure, using human labor assisted by some automated tools''\footnote{\cite{Russel2010}, page 895}. The probabilities are then computed by counting and smoothing over the given data~\cite{Russel2010}. There are other, more sophisticated ways to learn the probabilities; for more information refer to Russell~\cite{Russel2010}.
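As a rough sketch of this counting step (the toy data and the add-one smoothing are illustrative choices of mine, not the exact scheme used for the Penn Treebank), rule probabilities can be estimated like this:
\begin{lstlisting}[language=Python]
from collections import Counter

# Toy "treebank": a list of (lhs, rhs) rule occurrences extracted from parse trees.
rules = [("S", ("NP", "VP")), ("S", ("NP", "VP")),
         ("NP", ("Det", "Noun")), ("VP", ("Verb",))]

rule_counts = Counter(rules)
lhs_counts = Counter(lhs for lhs, _ in rules)
expansions = {rhs for _, rhs in rules}

def rule_prob(lhs, rhs):
    # relative frequency with add-one (Laplace) smoothing
    return (rule_counts[(lhs, rhs)] + 1) / (lhs_counts[lhs] + len(expansions))

print(rule_prob("S", ("NP", "VP")))   # 0.6 with this toy data
\end{lstlisting}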
\subsection{Semantic Analysis}
\label{subSec:semanticAnalysis}
There are multiple approaches to semantic analysis. In this paper the approach of ``syntax-driven semantic analysis''\footnote{\cite{Jurafsky2009}, page 617} is explained further. In this approach the output of a parser, the syntactic analysis, ``is passed as input to a semantic analyzer to produce a meaning representation''\footnote{\cite{Jurafsky2009}, page 618}.

To this end, context-free grammar rules are augmented with ``semantic attachments''\footnote{\cite{Jurafsky2009}, page 618}. Every word and syntactic structure in a sentence gets such a semantic attachment. The tree of syntactic components is then traversed bottom-up, and along the way the semantic attachments are combined to finally produce ``First-Order Logic''\footnote{\cite{Jurafsky2009a}, page 589} that can be interpreted in a meaningful way. This procedure has some prerequisites, which are explained first.
The mentioned \textit{First-Order Logic} can be represented by a context-free grammar specification. It is beyond the scope of this paper to describe this specification completely; Jurafsky~\cite{Jurafsky2009a} provides a detailed picture of the specification with all elements in figure 17.3. The most important aspects are explained here. The logic provides terms, which can be functions, constants, or variables. Functions take a term as argument; syntactically they look the same as single-argument predicates, but a function denotes one unique object. Predicates can have multiple terms as arguments. In addition, the logic provides quantifiers ($\forall, \exists$) and connectives ($\wedge, \vee, \Rightarrow$).
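For illustration, a formula built from these elements, in the spirit of the restaurant examples used by Jurafsky~\cite{Jurafsky2009a}, could look like this:
\[
\forall x\, VegetarianRestaurant(x) \Rightarrow Serves(x, VegetarianFood)
\]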
Another prerequisite is the ``lambda notation''\footnote{\cite{Jurafsky2009a}, page 593}. A simple example of this notation is an expression of the following form\footnote{examples taken from Jurafsky~\cite{Jurafsky2009a}, pp. 593--594}:
\[
\lambda x.P(x)
\]
The $\lambda$ can be reduced in a so-called ``$\lambda$-reduction''\footnote{\cite{Jurafsky2009a}, page 593}. The expression above could be reduced in the following way:
\begin{alignat*}{2}
\lambda x.&P(x)&(A) \\
&P(A)&
\end{alignat*}
Those expressions can be extended to $n$ such $\lambda$s. An example is this expression:
\[
\lambda x.\lambda y.Near(x,y)
\]
This expression can be reduced in multiple steps.
\begin{alignat*}{1}
\lambda x.\lambda y.&Near(x,y)(Bacaro) \\
\lambda y.&Near(Bacaro, y)(Centro) \\
&Near(Bacaro, Centro)
\end{alignat*}
This technique is called ``currying''\footnote{\cite{Jurafsky2009a}, page 594} and is used to convert ``a predicate with multiple arguments into a sequence of single-argument predicates''\footnote{\cite{Jurafsky2009a}, page 594}.
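The same idea can be illustrated in a programming language. The following short Python sketch, which is my own illustration and not part of the cited material, mirrors the curried expression $\lambda x.\lambda y.Near(x,y)$:
\begin{lstlisting}[language=Python]
# A curried two-place relation: the arguments are supplied one at a time.
def near(x):
    return lambda y: "Near({}, {})".format(x, y)

print(near("Bacaro")("Centro"))   # prints: Near(Bacaro, Centro)
\end{lstlisting}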
Now that the prerequisites have been explained, it is time to turn to the actual syntax-driven semantic analysis. It is shown with an example provided by Jurafsky. Consider the sentence \textit{Every restaurant closed}. ``The target representation for this example should be the following''\footnote{\cite{Jurafsky2009}, page 621}:
\begin{equation}
\label{eq:tarRep}
\forall x \,Restaurant(x) \Rightarrow \exists e \,Closed(e) \wedge ClosedThing(e,x)
\end{equation}
The first step is to determine what the meaning representation of \textit{Every restaurant} should be. \textit{Every} is responsible for the $\forall$ quantifier, and \textit{restaurant} specifies the category that is quantified over. This is called the ``restriction''\footnote{\cite{Jurafsky2009}, page 622} of the noun phrase. The meaning representation could be $\forall x\,Restaurant(x)$. It is a valid logical formula, but it does not make much sense: ``It says that everything is a restaurant.''\footnote{\cite{Jurafsky2009}, page 622} ``Noun phrases like [this] are [usually] embedded in expressions that [say] something about the universally quantified variable. That is, we're probably trying to \textit{say something} about all restaurants. This notion is traditionally referred to as the \textit{NP}'s nuclear scope''\footnote{\cite{Jurafsky2009}, page 622}. In the given example, the nuclear scope is \textit{closed}. To represent this notion in the target representation, a dummy predicate $Q$ is added, which results in this expression:
\[
\forall x\,Restaurant(x) \Rightarrow Q(x)
\]
To replace $Q$ with something meaningful, the $\lambda$ notation is needed.
\[
\lambda Q.\forall x\,Restaurant(x) \Rightarrow Q(x)
\]
Generalizing one step further, by abstracting over the noun's predicate as well, this is the result:
\[
\lambda P.\lambda Q.\forall x\,P(x) \Rightarrow Q(x)
\]
What happened? The determiner \textit{every} gets this last expression as its semantic attachment, and the noun \textit{restaurant} gets $\lambda x.Restaurant(x)$. When the two are combined by $\lambda$-reduction, the second of the expressions above results. The verb is still missing; therefore the verb \textit{closed} gets the following expression:
\[
\lambda x.\exists e\,Closed(e) \wedge ClosedThing(e,x)
\]
After combining the formulas of the noun phrase and the verb, the previously shown target representation~\eqref{eq:tarRep} is the result.
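Written out, this final combination is a single $\lambda$-reduction; the bound variable of the verb's attachment is renamed to $y$ here only to avoid a clash with $x$:
\begin{alignat*}{1}
&(\lambda Q.\forall x\,Restaurant(x) \Rightarrow Q(x))\,(\lambda y.\exists e\,Closed(e) \wedge ClosedThing(e,y)) \\
&\forall x\,Restaurant(x) \Rightarrow \exists e\,Closed(e) \wedge ClosedThing(e,x)
\end{alignat*}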
This example is just one of many, but it shows how semantic meaning can be attached to syntactic components. Furthermore, it should now be clear how semantic analysis in a syntax-driven approach works.
\section{Critical discussion}
\label{sec:critDiscussion}
Now that both methods have been presented with one selected approach each, it is time to discuss them critically. The CYK algorithm solves many problems, like ambiguity, at least to a certain degree. But it is also problematic because of its restriction to CNF. While in theory every context-free grammar can be converted to CNF, in practice the conversion poses ``some non-trivial problems''\footnote{\cite{Jurafsky2009b}, page 475}. One of these problems can be explored in conjunction with the second presented method, semantic analysis: ``[T]he conversion to CNF will complicate any syntax-driven approach to semantic analysis''\footnote{\cite{Jurafsky2009b}, page 475}. One solution to this problem is a post-processing step in which the trees are converted back to the original grammar~\cite{Jurafsky2009b}. Another option is to use a more complex dynamic programming algorithm that accepts any kind of context-free grammar, such as the ``Earley Algorithm''\footnote{\cite{Jurafsky2009b}, page 477}.

The syntax-driven semantic analysis, as it has been presented here, is a powerful method that is easy to understand. But it has one essential problem: it relies on an existing set of grammar rules with semantic attachments. In a real-world application such a table would contain thousands of grammar rules. While it is relatively easy to compute the final meaning representation with such a table, it is very hard to create the table in the first place. The difficulty of creating it splits into two main issues. The first is that a grammar specification must be found that fits all use cases; this problem applies to syntactic parsing as well. The second is that the semantic attachments for the grammar rules have to be found.

The initial workload to reach a state in which the semantic analysis works is a one-time effort. In a restricted environment with a limited set of words and topics, this workload matters little: even if it takes a month to create such a table, by hand or by computing it, the subsequent analysis of input based on this table is rather quick, and the initial workload is therefore acceptable. But this is only true for restricted environments. If someone tried to use syntax-driven semantic analysis for the complete language of modern English, the creation of such a table would outweigh any possible benefit.

Comparing the complexity of the two methods reveals a mirror image. For parsing, the creation of the grammar is comparatively easy. The presented CYK algorithm works with context-free grammars, which are a very restricted class compared to natural languages. But even within context-free grammars, the texts themselves contain ambiguities, so the creation of the parse trees is the larger problem.

Syntax-driven semantic analysis, on the other hand, requires a considerable amount of work to add semantic attachments to the grammar rules. But once this has been done, it works very fast.

Both methods require one-time work for a specific use case: the grammar creation for parsing, and the extension of the grammar with semantic attachments for semantic analysis. The less restricted the usage environment, the more complex this initial workload becomes. The same holds for the recurring workload within one specific use case.

Judging by the current state of computer technology, parsing still poses a significant challenge once the restricted field of programming languages is left. Semantic analysis, as the second method in the chain, therefore has even more problems to date: as the presented syntax-driven approach only works with parse trees, a semantic analysis can only be undertaken once the syntactic parsing succeeds. The dream of a sentient AI capable of communicating with humans in natural language is therefore far from being reached, judging by the previously described problems alone (leaving aside all other problems related to this dream).
\section{Conclusion}
\label{sec:concl}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Here - at the end of the text - the bibliographic references
% are included
%
% The actual entries are stored in the file
% ``prosem-ki.bib''
%
\clearpage
\bibliography{prosem-ki}
\bibliographystyle{plain}
\addcontentsline{toc}{section}{Bibliography}% Add to the TOC
\end{document}