\pdfoutput=1
\documentclass[11pt]{article}
\usepackage{times}
\usepackage{latexsym}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{microtype}
\usepackage{inconsolata}
\usepackage{bussproofs}
\usepackage{amsmath}
\usepackage{amssymb, mathrsfs}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{subcaption}
\usepackage{tikz-dependency}
\usepackage{hyperref}
\pgfplotsset{compat=1.17}
\usetikzlibrary{positioning}
\newcommand{\singleprop}{s_{p}}
\newcommand{\singlepred}{s_{q}}
\newcommand{\grouppred}{g_{q}}
\newcommand{\groupprop}{g_{p}}
\newcommand{\inference}{\ell_{gsr}}
\newcommand{\singlepropi}[1]{s_{p,#1}}
\newcommand{\implicationpred}{(g_p, s_p, (r_g, r_p))}
\newcommand{\backlinks}{B_\Psi}
\newcommand{\forwardlinks}{\textsc{forward}_\Phi}
\newcommand{\propgraph}{\Phi}
\newcommand{\propgraphs}{\Phi(\singleprop)}
\newcommand{\fnname}{\mathscr{F}}
\newcommand{\argset}{\mathcal{A}}
\newcommand{\argmap}{\left\{(r, a)\right\}}
\newcommand{\andsign}{\textbf{\em and}}
\newcommand{\orsign}{\textsc{Or}}
\newcommand{\constant}[1]{{\bf c}_{#1}}
\newcommand{\variable}[1]{{\bf x}_{#1}}
\newcommand{\type}[1]{\tau_{#1}}
\newcommand{\xvariable}{{\bf x}}
\newcommand{\rvariable}{{\bf r}}
\newcommand{\zvariable}{{\bf z}}
\newcommand{\cvariable}{{\bf c}}
\newcommand{\avariable}{{\bf a}}
\newcommand{\yvariable}{{\bf y}}
\newcommand{\svariable}{{\bf s}}
\newcommand{\pconstant}{{\bf p}}
\newcommand{\pvariable}{{\bf p}}
\newcommand{\tvariable}{{\bf t}}
\newcommand{\nvariable}{{\bf n}}
\newcommand{\pvariableset}{\left\{\pvariable\right\}}
\newcommand{\qvariable}{{\bf q}}
\newcommand{\gvariable}{{\bf g}}
\newcommand{\dset}{{\bf d}}
\newcommand{\aset}{{\bf A}}
\newcommand{\cset}{{\bf C}}
\newcommand{\hvariable}{{\bf h}}
\newcommand{\wvariable}{{\bf w}}
\newcommand{\mvariable}{{\bf m}}
\newcommand{\condsep}{\ |\ }
\newcommand{\varmask}{\textsc{mask}}
\newcommand{\roleset}{\left\{r_s\right\}}
\newcommand{\rolemap}{\left\{r_{\qvariable_a}, r_{\qvariable_c}\right\}}
\newcommand{\xjack}{\xvariable_{jack}}
\newcommand{\xjacka}{\xvariable_{jack1}}
\newcommand{\xjackb}{\xvariable_{jack2}}
\newcommand{\xjill}{\xvariable_{jill}}
\newcommand{\cjack}{\cvariable_{jack}}
\newcommand{\cjill}{\cvariable_{jill}}
\newcommand{\opand}{\textbf{\em and}}
\newcommand{\opor}{\textbf{\em or}}
\newcommand{\opxor}{\textbf{\em xor}}
\newcommand{\psiand}{\Psi_\opand}
\newcommand{\psior}{\Psi_\opor}
\newcommand{\subj}{\textsc{subj}}
\newcommand{\dobj}{\textsc{dobj}}
\newcommand{\iobj}{\textsc{iobj}}
\title{\bf A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic \\ \vspace{25pt} }
\author{ {\Large Greg Coppola} \\ {\em coppola.ai} \\ Research. Develop. Meme. }
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
\noindent Given the emergent reasoning abilities of large language models, \emph{information retrieval} is becoming more complex. Rather than just \emph{retrieve a document}, modern information retrieval systems advertise that they can \emph{synthesize an answer} based on potentially many different documents and conflicting data sources, using \emph{reasoning}. But different kinds of questions have different answers, and different answers have different complexities. In this paper, we introduce a novel framework for analyzing the complexity of answering a question, based on the \emph{natural deduction calculus} as presented in \cite{Prawitz1965}.
Our framework is novel both in that, to our knowledge, no one has used this logic as the basis for a set of complexity classes, and in that no existing complexity classes have been delineated using analogous methods. We identify three decidable fragments in particular, called the \emph{forward}, \emph{query} and \emph{planning} fragments, and we compare these to what would be needed to do proofs in the \emph{complete} first-order calculus, for which theorem-proving has long been known to be undecidable.
\end{abstract}

\tableofcontents

\section{The Forward Fragment}
\label{s:forward}

In this section, we investigate the \emph{Forward Fragment} of the deduction calculus. This is the fragment implemented in \cite{Coppola2024}, and corresponds to a single forward pass through a graphical network, in which the nodes are indexed by sentences in the first-order language.

\subsection{Fragment Definition}

\subsubsection{Horn Clauses}

Consider a first-order language with a set of predicate symbols, function symbols, and constants. Suppose that we have $n$ different predicates $P_i$, each of at most $k$ variables, and suppose that $C$ is a predicate of $k$ open variables. Then, the \emph{Forward} fragment, implemented in \cite{Coppola2024}, involves a system of \emph{quantified Horn clauses} of the form:
\begin{equation}
\forall x_1, \ldots, \forall x_k, \Bigg[ \bigwedge_{i=1}^{n} P_i(x_1, \ldots, x_k) \Bigg] \rightarrow C(x_1, \ldots, x_k)
\label{e:basic_horn}
\end{equation}
We emphasize that every variable mentioned among $x_1, \ldots, x_k$ must appear in $C$. This is called the \emph{safety} restriction of \emph{Datalog}; it is crucial to the distinction between complexity classes, and is further discussed in Section \ref{s:datalog}. An example of an inference that uses this form is the following:
\begin{equation}
\forall x_1, x_2, \Bigg[ like(x_1, x_2) \wedge like(x_2, x_1) \Bigg] \rightarrow friends(x_1, x_2)
\end{equation}
This says that if any $x_1$ and $x_2$ both like \emph{each other}, then they are \emph{friends}.

\subsubsection{Disjunctive Normal Form}

The presence of multiple ways to derive the same predicate amounts to a \emph{disjunction} over the possible ways to reach that conclusion, and thus $m$ statements of the form \ref{e:basic_horn} can be written as a single statement with a disjoined premise as:
\begin{equation}
\forall x_1, \ldots, \forall x_k, \Bigg[ \bigvee_{i=1}^{m} \bigwedge_{j=1}^{n} P_{i,j}(x_1, \ldots, x_k) \Bigg] \rightarrow C(x_1, \ldots, x_k)
\label{e:dnf_form}
\end{equation}
Thus, any sentence in \emph{disjunctive normal form}, apart from the restriction on quantification, can be used as a premise. Since any statement in the propositional calculus can be written in disjunctive normal form \cite{andrews1986introduction}, any statement obeying the restriction on quantification can be a premise to a deduction rule, so this is a very powerful framework.

\subsubsection{Conjoined Conclusions}

Finally, if we have statements of the form \ref{e:dnf_form}, note that in some cases the premise can be the same, and so a database of different statements like \ref{e:dnf_form} can be re-written as a single statement like:
\begin{equation}
\forall x_1, \ldots, \forall x_k, \Bigg[ \bigvee_{i=1}^{m} \bigwedge_{j=1}^{n} P_{i,j}(x_1, \ldots, x_k) \Bigg] \rightarrow \Bigg[ \bigwedge_{l=1}^{c} C_l(x_1, \ldots, x_k) \Bigg]
\label{e:final_horn}
\end{equation}
That is, a set of statements of the form \ref{e:basic_horn} and a single statement of the form \ref{e:final_horn} are equivalent in the proofs that they allow.
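To make the forward pass concrete, the following is a minimal sketch, in Python, of naive forward chaining over a database of \emph{ground} Horn clauses (rules whose variables have already been instantiated with constants). This is only an illustration under our own simplified encoding of facts and rules, not the implementation of \cite{Coppola2024}, and it omits probabilities entirely.

\begin{verbatim}
# Naive forward chaining over ground Horn clauses.
# A fact is a tuple like ("likes", "jack", "jill"); a rule is a pair
# (premises, conclusion), where premises is a set of facts.

def forward_chain(facts, rules):
    """Return the closure of `facts` under `rules`."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived

facts = {("likes", "jack", "jill"), ("likes", "jill", "jack")}
rules = [
    ({("likes", "jack", "jill"), ("likes", "jill", "jack")},
     ("friends", "jack", "jill")),
]
print(forward_chain(facts, rules))  # adds ("friends", "jack", "jill")
\end{verbatim}

This naive loop re-scans all rules on every pass and is quadratic in the worst case; the linear-time bound discussed in the analysis below is achieved by the standard Horn satisfiability procedure, which instead keeps, for each rule, a counter of premises not yet derived.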
\subsection{Datalog's Safety Restriction}
\label{s:datalog}

In this \emph{Forward} fragment, we are incorporating an important restriction on the forms of clauses, which is that the premises $P_i(x_1, \ldots, x_k)$ can only mention variables that are also mentioned in the conclusion $C(x_1, \ldots, x_k)$. That is, in this fragment we cannot introduce new universally quantified variables used in $P_i$ but not in $C$, which means we cannot say that we are \emph{truly} implementing \emph{$\forall$-Elimination}, for example, because the \emph{$\forall$-Elimination} rule in \cite{Prawitz1965}'s system does \emph{not} include this restriction. A Datalog rule is considered to be \textit{safe} if every variable that appears in the head of the rule (the conclusion part) also appears in a positive literal in the body of the rule (the condition part) \cite{abiteboul1995foundations}.

\subsection{Analysis of the Fragment}

This fragment allows all proofs that use only the following rules: \emph{$\land$-Elimination}, \emph{$\land$-Introduction}, \emph{$\lor$-Introduction}, \emph{$\forall$-Elimination} and \emph{$\rightarrow$-Elimination}, subject to the \emph{safety} restriction on quantification discussed above. Because this fragment follows the safety restriction of Datalog, inference in it amounts to \emph{Horn Satisfiability}, a well-known tractable fragment in which inference takes time \emph{linear} in the total size of the theory; this is very efficient compared to the other fragments, as we will see. \cite{Coppola2024} implements this fragment, and shows how we can not only do logical inference, but also assign probabilities, provided we stay within this fragment. That is, we assign probabilities to a fragment similar to the one for which \cite{pereira1987prolog} investigated theorem-proving. This lets us distinguish between theories that are more or less \emph{likely}, and determine a \emph{generative} probability for the data set, so that it can be compressed \cite{SutskeverObservation}.

\section{The Query Fragment}
\label{s:query}

\subsection{Motivation}

Suppose that, instead of \ref{e:basic_horn}, we have an integer $K > k$ and:
\begin{equation}
\forall x_1, \ldots, \forall x_K, \Bigg[ \bigwedge_{i=1}^{n} P_i(x_1, \ldots, x_k, x_{k+1}, \ldots, x_K) \Bigg] \rightarrow C(x_1, \ldots, x_k)
\label{e:unbound_universal}
\end{equation}
This statement is \emph{unsafe} in Datalog terms (see Section \ref{s:datalog}), because the variables $x_{k+1}, \ldots, x_K$ are \emph{not mentioned} in the conclusion, and are thus \emph{unbound}. Such a statement can be useful, for example, in saying that if $x_1$ and $x_2$ both want the same thing $x_3$, then $x_1$ and $x_2$ are \emph{in competition}:
\begin{equation}
\forall x_1, x_2, x_3, \Bigg[ want(x_1, x_3) \wedge want(x_2, x_3) \Bigg] \rightarrow compete(x_1, x_2)
\end{equation}
Thus, the restriction that we imposed in Section \ref{s:datalog} is something that we want to relax.

\subsection{Existential Quantifiers as Queries}

We can re-write \ref{e:unbound_universal} as:
\begin{equation}
\forall x_1, \ldots, \forall x_k, \Bigg[ \exists x_{k+1}, \ldots, x_K, \bigwedge_{i=1}^{n} P_i(x_1, \ldots, x_k, x_{k+1}, \ldots, x_K) \Bigg] \rightarrow C(x_1, \ldots, x_k)
\end{equation}
Each existentially quantified variable $\exists x_{k+i}$ corresponds to a \emph{query} as to whether any element in the domain can be found that satisfies the premise.
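As a concrete illustration of reading an existential premise as a query, the following minimal Python sketch checks the \emph{in competition} rule above against a statically stored set of facts. The fact encoding and function names are our own, purely illustrative choices; a real system might instead issue the query against an indexed database.

\begin{verbatim}
# Treating the existential premise of an unsafe rule as a query:
# want(x1, x3) and want(x2, x3) -> compete(x1, x2), with x3 existential.

facts = {
    ("want", "jack", "prize"),
    ("want", "jill", "prize"),
    ("want", "jill", "vacation"),
}

def wants(person):
    """Everything `person` wants, according to the static fact store."""
    return {obj for (pred, subj, obj) in facts
            if pred == "want" and subj == person}

def compete(x1, x2):
    """True if some x3 witnesses the existential premise."""
    return bool(wants(x1) & wants(x2))

print(compete("jack", "jill"))  # True, witnessed by x3 = "prize"
\end{verbatim}

Here the existential is discharged by a single lookup over stored facts, with no dynamic theorem-proving attempted to establish new $want$ facts; this is the \emph{shallow query} strategy discussed later in this section.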
\subsection{Complexity of the Full Fragment}

If our goal is to do \emph{full, deterministic} theorem-proving, the worst-case bound is very bad, for reasons that we will now investigate.

\paragraph{Quantifiers in the Same Sentence}
Suppose, as a convention, that the $x_i$ are universally quantified and the $y_j$ are existentially quantified; we will still write $\exists$ explicitly for clarity. Now, consider an implication with a single existential:
\begin{equation}
\left[ \exists y_1, P(x_1, y_1) \right] \rightarrow C(x_1)
\end{equation}
If the domain that $\exists y_1$ ranges over is finite, which we assume it is, then we can \emph{implement} $\exists$ using \emph{$\vee$-Introduction}, with a disjunction over the $D$ elements in the domain:
\begin{equation}
\left[ \bigvee_{i=1}^{D} P(x_1, c_i) \right] \rightarrow \exists y_1, P(x_1, y_1)
\end{equation}
This is computable, but quickly becomes a problem from an efficiency perspective. For example, consider a premise with two existentially quantified variables:
\begin{equation}
\left[ \exists y_1, y_2, P(x_1, y_1, y_2) \right] \rightarrow C(x_1)
\end{equation}
This would require $\Theta(D^2)$ simple rules to implement. In general, if we begin with a graph of size $G$ and a domain of size $D$, then implementing $N$ existentials would require work $\Theta(GD^N)$.

\subsection{Useful Best-Effort Fragments}

We must stress that the exponential blow-up in computational cost described above applies only in the \emph{worst case}, assuming \emph{full theorem-proving}. We now discuss several efficient but still useful strategies for the query calculus that we believe can be helpful in practice.

\paragraph{Shallow Queries}
Suppose instead that, when eliminating an $\exists$, we are interested only in facts that are \emph{already in} an existing, \emph{statically} stored database, without requiring any \emph{dynamic} theorem-proving. In other words, we can make a single database query to see if there is any $x_3$ such that $want(x_1, x_3)$ and $want(x_2, x_3)$. This can be done in $O(1)$ time relative to the complexity of the graph, possibly even using a database index.

\paragraph{Probabilistic Ranking}
In traditional, non-probabilistic theorem-proving, if we want to prove $\exists x, P(x)$, we have no a priori way to \emph{order} the candidates $x$ in terms of which would be \emph{most likely} to make $P(x)$ true. Such a ranking could cut down on the search time. We could also use a heuristic like \emph{try only the most probable candidate}, which would eliminate the exponential blow-up in the search space, by allowing only one try per $\exists$.

\paragraph{A* Search}
Otherwise, if we have a \emph{ranking} of different candidates, and we really want to search exhaustively, or as exhaustively as possible, we speculate that we can view this as analogous to searching through a \emph{maze} and use \emph{A* search}, as in \cite{lehnert2024a}.

\section{The Planning Fragment}
\label{s:planning}

The natural deduction rule of \emph{$\lor$-Elimination} is also called \emph{reasoning by cases}; we will now investigate why that is, and see its relation to \emph{planning} under \emph{uncertain conditions}.

\subsection{Example of Reasoning Under Uncertainty}

\paragraph{Set-Up}
Let us consider the example of a party that is planning an \emph{enjoyable excursion}, where the goal at the end is to be \emph{satisfied} with their experience. This particular party $\pvariable$ has their own individual preferences, and they are considering a trip to a particular beach town $\tvariable$.
Suppose the party will be \emph{satisfied} with their excursion $e$ if their excitement level $l$ with the excursion is at least a $7$ out of $10$:
\begin{equation}
\forall e, \Bigg[ \exists l, \ excited(\pvariable, e, l) \wedge l \geq 7 \Bigg] \rightarrow satisfied(\pvariable, e)
\end{equation}
Here, we assume that we can use $\exists$. If the party visits the beach town $\tvariable$, they can go to the beach:
\begin{equation}
\Bigg[ visit(\pvariable, e, \tvariable) \Bigg] \rightarrow visit(\pvariable, e, beach(\tvariable))
\end{equation}
If they visit the beach, and it is sunny, the excitement level will be a $10$:
\begin{equation}
\Bigg[ visit(\pvariable, e, beach(\tvariable)) \wedge sunny(e) \Bigg] \rightarrow excited(\pvariable, e, 10)
\end{equation}
But if, at the beach, it is not sunny, the party will be \emph{very} unhappy:
\begin{equation}
\Bigg[ visit(\pvariable, e, beach(\tvariable)) \wedge \neg sunny(e) \Bigg] \rightarrow excited(\pvariable, e, 1)
\end{equation}
Now, there is a favorite \emph{restaurant} of the party's in $\tvariable$, which they enjoy going to when it rains:
\begin{equation}
\Bigg[ visit(\pvariable, e, restaurant(\tvariable)) \wedge \neg sunny(e) \Bigg] \rightarrow excited(\pvariable, e, 7)
\end{equation}
Going to this restaurant is not as fun as the beach, but it is still fun on a rainy day. However, going to the restaurant when it \emph{is} sunny does not make the party happy, as they will wonder why they are not at the beach:
\begin{equation}
\Bigg[ visit(\pvariable, e, restaurant(\tvariable)) \wedge sunny(e) \Bigg] \rightarrow excited(\pvariable, e, 3)
\end{equation}

\paragraph{Observation}
We are now in a situation where the party actually \emph{can} guarantee that they will be satisfied with their excursion, because they can go to the town $\tvariable$, and if it is sunny, they can go to the beach, and if it rains, they can go to their restaurant, and they will be happy either way. The fact that they will be happy \emph{either way} requires the ability to reason by \emph{cases}, and to evaluate their position \emph{in either case}.

\subsection{Disjunctive Normal Form}

In order to handle the motivating case, and others as well, it suffices to allow, instead of \ref{e:dnf_form}, that a conclusion can now itself contain a disjunction, as in:
\begin{equation}
\forall x_1, \ldots, \forall x_k, \Bigg[ \bigvee_{i=1}^{m} \bigwedge_{j=1}^{n} P_{i,j}(x_1, \ldots, x_k) \Bigg] \rightarrow \Bigg[ \bigvee_{l=1}^{c} C_l(x_1, \ldots, x_k) \Bigg]
\label{e:planning_form}
\end{equation}
Of course, learning $A \lor B$ gives less information than learning $A$, because the latter allows us to conclude $A$ while the former does not, and this is why we must \emph{reason by cases}.

\subsection{Two-Player Games}

In general, interaction between the agent and the environment can be viewed as a two-player game, in which the agent plays as themselves and \emph{reality} plays as the opponent. In an \emph{antagonistic} two-player game, we assume that the \emph{opponent} always does their best to make the life of the \emph{protagonist} as bad as possible. In the case of the \emph{protagonist} against \emph{reality}, \emph{reality} might not always try to make things as bad as possible for the \emph{protagonist}, but rather will react according to a \emph{probability distribution}. In either case, this is similar to a game of \emph{chess} or \emph{Go} \cite{silver2016mastering}.
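To make the case analysis concrete, the following minimal Python sketch returns to the excursion example: the agent commits to a conditional plan, \emph{reality} chooses the weather, and the plan is acceptable only if the excitement threshold of $7$ is met in \emph{every} case. The encoding of the example rules and the function names are our own illustrative choices, not an implementation of any particular system.

\begin{verbatim}
# Or-Elimination as case analysis: a plan guarantees satisfaction only
# if it meets the threshold in every case of sunny(e).

def excitement(destination, sunny):
    """Excitement levels taken from the example rules above."""
    if destination == "beach":
        return 10 if sunny else 1
    if destination == "restaurant":
        return 3 if sunny else 7
    return 0

def conditional_plan(sunny):
    """Proposed plan: beach when sunny, restaurant otherwise."""
    return "beach" if sunny else "restaurant"

def satisfied_in_all_cases(plan, threshold=7):
    return all(excitement(plan(sunny), sunny) >= threshold
               for sunny in (True, False))

print(satisfied_in_all_cases(conditional_plan))   # True: 10 or 7
print(satisfied_in_all_cases(lambda s: "beach"))  # False: only 1 if rainy
\end{verbatim}

The unconditional plan of always going to the beach fails the check, while the conditional plan succeeds, mirroring the observation above that the party can guarantee satisfaction only by reasoning over both cases.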
\subsection{Relevant Empirical Results}

Because Boolean satisfiability is NP-complete \cite{Cook1971}, and all known algorithms for it require exponential time in the worst case, we expect that a \emph{large language model} of finite size will not be able to handle problems corresponding to Boolean satisfiability once the input grows large enough. This has been investigated and shown in \cite{almeekam2023planning, kambhampati2023can}.

\section{Discussion}

\subsection{Review of Studied Fragments}

In the \emph{forward} fragment of Section \ref{s:forward}, we investigate a system that contains the rules:
\begin{itemize}
\item {\bf forward fragment}
\begin{itemize}
\item \emph{$\land$-Introduction}
\item \emph{$\land$-Elimination}
\item \emph{$\lor$-Introduction}
\item \emph{$\rightarrow$-Elimination}
\item \emph{$\forall$-Elimination} (limited by the \emph{safety} requirement on quantification)
\end{itemize}
\end{itemize}
In the \emph{query} fragment of Section \ref{s:query}, we weaken the restriction on quantification and add \emph{$\exists$-Introduction}:
\begin{itemize}
\item {\bf query fragment}
\begin{itemize}
\item all those from the \emph{forward} fragment, plus
\item \emph{$\forall$-Elimination} (without limitation)
\item \emph{$\exists$-Introduction}
\end{itemize}
\end{itemize}
Adding in \emph{reasoning by cases} creates the \emph{planning} fragment of Section \ref{s:planning}:
\begin{itemize}
\item {\bf planning fragment}
\begin{itemize}
\item all those from the \emph{query} fragment, plus
\item \emph{$\vee$-Elimination}
\end{itemize}
\end{itemize}

\subsection{The Remaining Fragments}

The remaining fragments are those that require the rules which \cite{Prawitz1965} called \emph{improper}:
\begin{itemize}
\item {\bf improper rules}
\begin{itemize}
\item \emph{$\lor$-Elimination}
\item \emph{$\rightarrow$-Introduction}
\item \emph{$\forall$-Introduction}
\item \emph{$\exists$-Elimination}
\item \emph{$\bot$-Introduction}
\end{itemize}
\end{itemize}
In \cite{Coppola2024} we implemented the \emph{forward} fragment, and showed how to assign probabilities to conclusions. We said that one feature of that research is that it shows how to \emph{unify logical and probabilistic reasoning}, by providing a \emph{unified model} of the two. This is because, while we can show what happens in a \emph{forward pass} over the \emph{forward} fragment, we can also show how this implemented fragment relates to a \emph{complete} and \emph{consistent} calculus for first-order logic. However, it is important to understand the reason that we can \emph{not} implement an entire logic as a \emph{forward pass} in a graphical network. To do proofs in the unbound fragments, one must implement \emph{partial} strategies for the rules that \cite{Prawitz1965} called \emph{improper}. Any such set of strategies must be incomplete, because full theorem-proving in the first-order calculus is undecidable in general \cite{Church1936, Turing1937}.

\bibliographystyle{apalike}
% \bibliography{bibtex}
\begin{thebibliography}{}

\bibitem[Abiteboul et~al., 1995]{abiteboul1995foundations}
Abiteboul, S., Hull, R., and Vianu, V. (1995).
\newblock {\em Foundations of Databases}.
\newblock Addison-Wesley.

\bibitem[Valmeekam et~al., 2023]{almeekam2023planning}
Valmeekam, K., Marquez, M., Sreedharan, S., and Kambhampati, S. (2023).
\newblock On the planning abilities of large language models -- a critical investigation.
\newblock In {\em Thirty-seventh Conference on Neural Information Processing Systems}.
\newblock URL: \url{https://openreview.net/forum?id=X6dEqXIsEW}.
\bibitem[Andrews, 1986]{andrews1986introduction}
Andrews, P.~B. (1986).
\newblock {\em An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof}.
\newblock Academic Press, Inc., Orlando, San Diego, New York, Austin, Boston, London, Sydney, Tokyo, Toronto.

\bibitem[Church, 1936]{Church1936}
Church, A. (1936).
\newblock An unsolvable problem of elementary number theory.
\newblock {\em American Journal of Mathematics}, 58(2):345--363.

\bibitem[Cook, 1971]{Cook1971}
Cook, S.~A. (1971).
\newblock The complexity of theorem-proving procedures.
\newblock {\em Proceedings of the Third Annual ACM Symposium on Theory of Computing}.

\bibitem[Coppola, 2024]{Coppola2024}
Coppola, G. (2024).
\newblock The quantified Boolean Bayesian network: Theory and experiments with a logical graphical model.

\bibitem[Kambhampati, 2023]{kambhampati2023can}
Kambhampati, S. (2023).
\newblock Can LLMs really reason and plan?
\newblock {\em Communications of the Association for Computing Machinery (CACM) Blog}.
\newblock URL: \url{https://cacm.acm.org/blogs/blog-cacm/276268-can-llms-really-reason-and-plan/fulltext}.

\bibitem[Lehnert et~al., 2024]{lehnert2024a}
Lehnert, L., Sukhbaatar, S., Mcvay, P., Rabbat, M., and Tian, Y. (2024).
\newblock Beyond A*: Better planning with transformers via search dynamics bootstrapping.

\bibitem[Pereira and Shieber, 1987]{pereira1987prolog}
Pereira, F. C.~N. and Shieber, S.~M. (1987).
\newblock {\em Prolog and Natural-Language Analysis}.
\newblock CSLI Lecture Notes 10. Center for the Study of Language and Information -- CSLI.

\bibitem[Prawitz, 1965]{Prawitz1965}
Prawitz, D. (1965).
\newblock {\em Natural Deduction: A Proof-Theoretical Study}.
\newblock Stockholm Studies in Philosophy 3. Almqvist \& Wiksell, Stockholm; Göteborg; Uppsala.
\newblock Acta Universitatis Stockholmiensis.

\bibitem[Silver et~al., 2016]{silver2016mastering}
Silver, D., Huang, A., Maddison, C.~J., Guez, A., Sifre, L., Van Den~Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et~al. (2016).
\newblock Mastering the game of Go with deep neural networks and tree search.
\newblock {\em Nature}, 529(7587):484--489.

\bibitem[Sutskever, 2023]{SutskeverObservation}
Sutskever, I. (2023).
\newblock An observation on generalization.
\newblock Simons Institute, YouTube.
\newblock Accessed: 2024-01-29.

\bibitem[Turing, 1937]{Turing1937}
Turing, A. (1937).
\newblock On computable numbers, with an application to the Entscheidungsproblem.
\newblock {\em Proceedings of the London Mathematical Society}, s2-42(1):230--265.

\end{thebibliography}

\end{document}