Attended a talk on neuro-symbolic AI recently. It was by Kareem Ahmed from UCLA, presented at UTD.

I have been fascinated by neuro-symbolic AI for a while. The fascination started with a talk I watched in 2021 from someone at IBM advocating for symbolic AI within modern AI systems. The argument has held up over the last three years. Because of that appreciation, I sometimes get overly excited when I see ideas coming from that realm (such as AlphaGeometry, which claims to be neuro-symbolic; I have not read the report in depth but am already excited).

The main idea of the talk was to introduce the Semantic Probabilistic Layer (SPL), a layer meant to replace softmax in neural networks, which enforces constraints encoding domain knowledge on the output of the network. The narrative is as follows:

  • As an example, consider a (probabilistic) language model that outputs JSON. He noted that even code LMs still assign non-negligible probability mass to syntactically invalid outputs. One solution is to push the probability of those outputs down to (near) zero (see the first sketch after this list).
  • The domain-knowledge constraints, such as “choosing two of five items such that …”, can be compiled into a “circuit”, which translates into a differentiable computational graph that can be plugged into a neural network (see the second sketch after this list).
  • This approach differs from what ChatGPT is possibly doing, which is to first have the core LLM generate content that may be inappropriate, then place a filtering layer on top to review whether the content is appropriate. The drawback there is that the filtering layer is non-differentiable, so the system is not end-to-end trainable.
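
To make the first two bullets concrete, here is a minimal sketch of the renormalization idea: zero out the probability of any next token the grammar forbids and renormalize over what remains. This is only a per-token approximation of what the talk describes (SPL, as I understand it, renormalizes exactly over entire structured outputs via a compiled circuit), and the `valid` mask below is a made-up stand-in for whatever a real JSON parser state would allow.

```python
import torch

def constrained_softmax(logits: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Next-token distribution renormalized over grammar-valid tokens only.

    logits: (vocab_size,) raw scores from the LM head
    valid:  (vocab_size,) bool, True for tokens the grammar allows next
    """
    # Invalid tokens get probability exactly zero after the softmax.
    masked = logits.masked_fill(~valid, float("-inf"))
    return torch.softmax(masked, dim=-1)

# Toy 5-token vocabulary where only tokens 0 and 3 are grammatical
# continuations (say, the only characters valid in the current JSON state).
logits = torch.randn(5)
valid = torch.tensor([True, False, False, True, False])
print(constrained_softmax(logits, valid))
```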

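For the circuit bullet, here is a hand-rolled example for one specific constraint, “exactly two of five items are chosen”. The dynamic program computes the probability that the constraint holds under the network's per-item probabilities (assumed independent), using only differentiable operations, so its negative log can serve as a semantic-loss-style training objective. This is a toy stand-in for the compiled circuits in the talk, not their actual machinery.

```python
import torch

def prob_exactly_k(p: torch.Tensor, k: int) -> torch.Tensor:
    """Probability that exactly k of n independent Bernoulli variables
    (success probabilities p) are on, via the classic O(n*k) dynamic
    program. Every operation here is differentiable."""
    dp = torch.cat([torch.ones(1), torch.zeros(k)])  # dp[j] = Pr(j items on so far)
    for pi in p:
        off = dp * (1 - pi)                             # current item stays off
        on = torch.cat([torch.zeros(1), dp[:-1] * pi])  # current item turns on
        dp = off + on
    return dp[k]

# “Choose exactly 2 of 5” on top of raw network scores:
logits = torch.randn(5, requires_grad=True)
p = torch.sigmoid(logits)
constraint_prob = prob_exactly_k(p, k=2)

# Semantic-loss-style objective: maximize the probability mass the
# network puts on constraint-satisfying assignments.
loss = -torch.log(constraint_prob)
loss.backward()
print(constraint_prob.item(), logits.grad)
```
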
During the talk there were some interesting references, such as the Perspective API (which takes a piece of content (text?) and outputs the probability that someone might be offended by it) and AutoDiff. Methodologically, I like how he suggests engineering patches but reasons about their soundness mathematically (KL divergence, first-order Taylor expansion, conditional marginals, etc.).