How to make good decisions under uncertainty?

Probabilistic Graphical Models

INTRODUCTION

Content

PROBABILISTIC

Problem

mystery

$P(x)$

prior

culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit

  probs culprit
1   0.2  butler
2   0.8    cook

$P(y|x)$

conditional


culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit

  probs culprit
1   0.2  butler
2   0.8    cook

weapon.culprit = createCPT(list("weapon","culprit"), 
                           probs = c(0.80,0.05,0.10,0.65,0.10,0.30), 
                           levelsList = list(c("pistol","knife","poker"),
                            c("butler","cook")))

weapon.culprit

  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

$P(x,y)=P(x) \times P(y|x)$

factor

$P(x,y)$

joint

  
culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit
  probs culprit
1   0.2  butler
2   0.8    cook

weapon.culprit = createCPT(list("weapon","culprit"), 
                           probs = c(0.80,0.05,0.10,0.65,0.10,0.30), 
                           levelsList = list(c("pistol","knife","poker"),
                                             c("butler","cook")))
weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook
  
productFactor(culprit,weapon.culprit)
  probs culprit weapon
1  0.16  butler pistol
2  0.02  butler  knife
3  0.02  butler  poker
4  0.04    cook pistol
5  0.52    cook  knife  <--MEDIA_HYPE (looks like the cook did it with the knife!)
6  0.24    cook  poker
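
A quick sanity check in plain base R (a minimal sketch that bypasses the course's createCPT/productFactor helpers): each entry of the joint is just prior × conditional.

prior <- c(butler = 0.2, cook = 0.8)          # P(culprit)
conditional <- matrix(c(0.80, 0.05,           # P(weapon | culprit), weapons as rows
                        0.10, 0.65,
                        0.10, 0.30),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("pistol", "knife", "poker"),
                                      c("butler", "cook")))
joint <- sweep(conditional, 2, prior, `*`)    # P(weapon, culprit) = P(culprit) * P(weapon | culprit)
joint
#        butler cook
# pistol   0.16 0.04
# knife    0.02 0.52   <- same 0.52 as the productFactor() row above
# poker    0.02 0.24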

joint_p

factor_1

> murderNet = list("culprit" = culprit, "weapon.culprit" = weapon.culprit )

> murderNet
$culprit
  probs culprit
1   0.2  butler
2   0.8    cook

$weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

$P(x)$ revisited

marginal

marginalize(murderNet, "weapon")

$culprit
  probs culprit
   0.2  butler
   0.8    cook

$P(y)$

marginal_1

> marginalize(murderNet, "culprit")

  weapon probs
  knife  0.54
 pistol  0.20
  poker  0.26
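
Summing the joint over the culprit gives the same marginal; using the numbers from the productFactor() table above:

# P(weapon) = sum over culprit of P(weapon, culprit)
c(pistol = 0.16 + 0.04, knife = 0.02 + 0.52, poker = 0.02 + 0.24)
# pistol  knife  poker 
#   0.20   0.54   0.26 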

$P(x|y)$

post

murderNet
$culprit
  probs culprit
1   0.2  butler
2   0.8    cook

$weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

observe(murderNet, "weapon", "pistol")

$culprit
  probs culprit
1   0.2  butler
2   0.8    cook
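
Note that observe() above appears to clamp the evidence without renormalizing, which is why the culprit table still shows the prior. The posterior P(culprit | weapon = pistol) that the next slides obtain via Bayes' theorem can be computed by hand from the joint: keep the pistol row and renormalize.

pistol_row <- c(butler = 0.16, cook = 0.04)   # P(weapon = pistol, culprit) from productFactor() above
pistol_row / sum(pistol_row)                  # P(culprit | weapon = pistol)
# butler   cook 
#    0.8    0.2 

Observing the pistol reverses the prior: the butler (0.8), not the cook, becomes the prime suspect.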

Go-Back

rb

Bayes' theorem

bt

Rules

(1)
\[P(x) = \sum_y P(x,y) \]
(2)
\[P(x,y) = P(y|x) P(x) \]
(3)
\[P(y|x) = \frac{P(x|y) P(y)}{P(x)} \]
(4)
\[P(x) = \sum_y P(x|y) P(y) \]
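
A quick numeric check of rule (1) with the murder example above (matching the 0.54 marginal for the knife):
\[P(\text{knife}) = \sum_{c} P(\text{knife}, c) = 0.02 + 0.52 = 0.54 \]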

Probability

  • Dealing with uncertainty?
    * Partial knowledge of the world
    * Noisy observations
    * Phenomena not covered by the model
    * Inherent stochasticity
  • … Use probability theory
    * Declarative representation with clear semantics
    * Reasoning patterns, e.g. conditioning
    * Established learning methods

GRAPHICAL

$P(a,b,c)$

decompose
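
By the chain rule, any joint distribution can be decomposed as
\[P(a,b,c) = P(a)\,P(b|a)\,P(c|a,b) \]
and a graphical model keeps only each variable's parents in the conditioning sets.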

Condition on Parents

dag

  • Spec of a BN

      * We need to define its Parameters.
    
      * These Parameters are the Conditional Probabilities
        of each node given
        its parents in the graph.
  • If we consider discrete variables:

    * Root nodes  : Vector of marginal probabilities.
    
    * Other nodes : Conditional Probability Table (CPT) 
                    of the variable given 
                    its parents in the graph.

Conditional Independence

$not - V$

ci1

$\rightarrow$

ci2

$V$

ci3

Bayes' Ball

blocked

What did we do ?

  • Graph Theory

    * Intuitive & compact data structures
    * Efficient reasoning using general purpose algorithms
    * Sparse parameterization
      * feasible elicitations (by hand)
      * learning from data (auto)
  • Combined Probability Theory with Graph Theory

    * New insights into existing models
    * Framework for designing new models
    * Graph-based algorithms for calculation and computation
    * Feynman diagrams in Physics
    * Efficient software implementation

MODELS

model

6 Examples

XyzNet

ci2

GrassNet

grass

LateNet

ci3

CarNet

carnet

(5)
\[p(B = 1) = 0.9 \]
(6)
\[p(F = 1) = 0.9 \]
(7)
\[p(G = 1|B = 1, F = 1) = 0.8 \]
(8)
\[p(G = 1|B = 1, F = 0) = 0.2 \]
(9)
\[p(G = 1|B = 0, F = 1) = 0.2 \]
(10)
\[p(G = 1|B = 0, F = 0) = 0.1 \]

carnet

b = createCPT(list("battery"), probs = c(0.9, 0.1), levelsList = list(c(1, 0)))  # p(B = 1) = 0.9

f = createCPT(list("fuel"), probs = c(0.9, 0.1), levelsList = list(c(1, 0)))     # p(F = 1) = 0.9

# probs gives the four p(G = 1 | B, F) values of equations (7)-(10),
# followed by their complements p(G = 0 | B, F)
g.bf = createCPT(list("gauge", "battery", "fuel"),
                 probs = c(0.8, 0.2, 0.2, 0.1, 0.2, 0.8, 0.8, 0.9),
                 levelsList = list(c(1, 0), c(1, 0), c(1, 0)))

carNet = list("battery" = b, "fuel" = f, "gauge" = g.bf)

Bayes' rule (denominator)

carnet

  • We observe the fuel gauge and discover that it reads empty (G = 0)
(11)
\[p(G = 0) = \sum_{B \in \{0,1\}} \sum_{F \in \{0,1\}} p(G=0|B,F)p(B)p(F) = 0.315 \]
> infer(carNet, c("battery", "fuel"), NULL, NULL)
  gauge probs
1     0 0.315
2     1 0.685

Bayes' rule (numerator)

carnet

  • Calculate the first term of the numerator in Bayes' theorem
(12)
\[p(G = 0|F = 0) = \sum_{B \in \{0,1\}} p(G=0|B,F=0)p(B) = 0.81 \]
> infer(carNet, c("battery"), "fuel", 0)
  probs fuel gauge
1  0.81    0     0
2  0.19    0     1

posterior probability

  • We can use Bayes' theorem to evaluate the posterior probability of the fuel tank being empty.
(13)
\[p(F=0|G=0) = \frac{p(G=0|F=0)p(F=0)}{P(G=0)} = 0.257 \]
> infer(carNet, c("battery"), "gauge", 0)
      probs fuel gauge
1 0.2571429    0     0
2 0.7428571    1     0

Explaining Away

  • The posterior probability that the fuel tank is empty given the observations of both the fuel gauge and the battery state is then given by:
(14)
\[P(F=0|G=0,B=0) = \frac{p(G=0|B=0,F=0) p(F=0)}{\sum_{F\in\{0,1\}} p(G=0|B=0,F)p(F) } \]
> infer(carNet, NULL, c("gauge", "battery"), c(0, 0))
      probs battery gauge fuel
1 0.1111111       0     0    0
2 0.8888889       0     0    1

Variable Elimination

  • 4 steps for inference

    * Build up the factors needed for Bayes' rule; the trick is to rearrange terms
    * Conceptually simple and naive, but works for any possible Bayes net (see the sketch after this list)
  • Multiplication

    productFactor(A, B)
  • Marginalize

    marginalizeFactor(A, margVar)
    marginalize(bayesNet, margVars)
  • Observe

    observe(bayesNet, obsVars, obsVals)
  • Infer

    infer(bayesNet, margVars, obsVars, obsVals)
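
A minimal base-R sketch of the arithmetic these four steps perform for P(fuel | gauge = 0) on carNet, with the elimination of B written out as explicit sums (the course functions manipulate factor tables instead; this only illustrates the rearrangement of terms):

p_b  <- 0.9                                   # p(B = 1)
p_f  <- 0.9                                   # p(F = 1)
p_g1 <- function(b, f) {                      # p(G = 1 | B, F), equations (7)-(10)
  if (b == 1 && f == 1) 0.8 else if (b == 0 && f == 0) 0.1 else 0.2
}

# eliminate B: unnormalized p(F = f, G = 0) = p(f) * sum_b p(G = 0 | b, f) p(b)
unnorm_f1 <- p_f       * ((1 - p_g1(1, 1)) * p_b + (1 - p_g1(0, 1)) * (1 - p_b))
unnorm_f0 <- (1 - p_f) * ((1 - p_g1(1, 0)) * p_b + (1 - p_g1(0, 0)) * (1 - p_b))

# normalize to get the posterior over fuel given gauge = 0
c(fuel1 = unnorm_f1, fuel0 = unnorm_f0) / (unnorm_f1 + unnorm_f0)
#     fuel1     fuel0 
# 0.7428571 0.2571429 

Note how B is summed out inside the brackets before anything is normalized; that rearrangement of terms is the whole trick.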

Formalizing PGM

cold_net

(15)
\[p(x_1, \cdots, x_n) = \prod_{i=1}^{n} p(x_i|pa(x_i)) \]
  • Tables grow exponentially in the number of parents (see the sketch after this list)
    • $2^{\#pa + 1}$ entries for binary variables
    • $N^{\#pa + 1}$ entries for $N$-ary variables
  • Keep the number of arrows to a minimum, if you can,
    • but try to explain the data sufficiently
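
A tiny illustration of that growth, assuming binary variables (2 states per node):

k <- 0:5                                       # number of parents
data.frame(parents = k, cpt_entries = 2^(k + 1))
#   parents cpt_entries
# 1       0           2
# 2       1           4
# 3       2           8
# 4       3          16
# 5       4          32
# 6       5          64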

Formalizing PGM (queries)

cold_net

(16)
\[\text{ variables: } X = \{X_1, \cdots, X_n \} \]
(17)
\[\text{ query: what's the } P(Q | E= E_0) ? \]
(18)
\[\text{ query variables: } Q \subseteq X \]
(19)
\[\text{ evidence: you got ? } E \subseteq X \]
(20)
\[\text{ marginalize: did not ask! : } M = X - (Q \cup E) \]
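
Mapped onto the carNet example from earlier: the query is Q = {fuel}, the evidence is E = {gauge} with E_0 = 0, and M = {battery} gets marginalized out, which is exactly the earlier infer() call:

infer(carNet, c("battery"), "gauge", 0)   # P(fuel | gauge = 0); battery is summed out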

RiskNet

risknet

HandNet

handnet_1 … handnet_14 (figure sequence)

5 Stages of Model-based Learning

  1. Build a model:

    • Joint probability distribution of all of the relevant variables (e.g. as a graph)
  2. Incorporate the observed data

  3. Compute the distributions over the desired variables: inference

  4. Iterate 2 and 3 in real-time applications

  5. Extend model as required

3 Types

  1. Directed or undirected:
    • Type of graph used to represent the dependence relations.
    • Undirected graphs represent symmetric relations,
    • Directed graphs represent relations in which the direction is important.
  2. Static or dynamic
    • If the model represents a set of variables at a certain point in time (static) or across different times (dynamic).
  3. Probabilistic or decisional
    • Probabilistic models only include random variables, while decisional models also include decision and utility variables.

Bayesian networks

  • 2011 Turing Award
  • judaepearl
  • Nothing Bayesian about these networks

Undirected Static PGM

markovnet

mrf_factors

- Clique factorization
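
For reference, the standard clique factorization of a Markov random field, with one potential $\psi_C$ per clique $C$ and partition function $Z$:
\[p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C) \]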

mrf_ci

Image Denoising

mrf_denoise

mrf_formal

Factor graphs
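
A factor graph writes the same joint as an explicit product of factors, one factor node $f_s$ per subset of variables $x_s$ it touches:
\[p(x) = \prod_{s} f_s(x_s) \]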

mrf_factorize

mrf_factorgraphs

mrf_dgtofg

pgmtypes

Source: Advances in Computer Vision and Pattern Recognition ISBN 978-1-4471-6699-3 (eBook) DOI 10.1007/978-1-4471-6699-3

3 main aspects

  1. Representation
  2. Inference
  3. Learning

DETAILS

Representation

  • Basic property of each model
  • Defines which entities constitute it
  • How are these entities related?
    * All PGMs can be represented as graphs G = (V, E)
      that define the structure of the model
    * Local functions that describe its parameters
    * The type of graph and the local functions
      vary across the different types of model

Inference

  • Answering different probabilistic queries
    * based on the model and some evidence
      * Obtaining the `posterior` probability distribution of
        a `variable` or set of `variables`
        given that other variables in the model are known
      * The research problem is how to do this efficiently
      

Inference Algorithms

  • Good News
    1. Efficient (polynomial) algorithms for certain types of structures (singly connected networks);
    2. For other structures it depends on the connectivity of the graph.
    3. In many applications the graphs are sparse, so inference algorithms are very efficient.
  * Pearl's Algorithm
  * Variable Elimination
  * Conditioning
  * Junction Tree
  * Stochastic Simulation
  • Not so good news
    * In the worst case, inference is NP-hard

Learning

  • Build it “by hand” with the aid of domain experts
  • Induce the model from data
    * The emphasis in recent years has been on inducing the models with machine learning
  • Building the model by hand with experts is slow and expensive
    * Obtaining the parameters for the models
      is usually done from data
    * Humans tend to be bad estimators of probabilities

Why PGM?

  1. Separate the inference and learning techniques from the model. (generic)
  2. Knowledge base depends on the particular application. (app specific)
  3. Techniques developed for inference in each class of PGM can be applied directly to different models in a variety of applications.

bigpic

PGM for FLiT

bigpic

  1. What data do we want to use for learning
  2. What do we want to learn
  3. How are we going to learn it
        By Hand (elicitation)
        By Data (statistical methods)
  4. Build PGM
        Root Nodes  : ?
        Other Nodes : ?
        Edges: Probabilistic connections
  5. What do we want to infer
        How -> TensorFlow - Edward lib
        What are the evidence we have - PIN data      

PGM for SDC Detectors

bigpic

  1. What data do we want to use for learning
  2. What do we want to learn
  3. How are we going to learn it
        By Hand (elicitation)
        By Data (statistical methods)
  4. Build PGM
        Root Nodes  : ?
        Other Nodes : ?
        Edges: Probabilistic connections
  5. What do we want to infer
        How -> TensorFlow - Edward lib
        What are the evidence we have - DOE Mini Apps      

PGM + First Order Logic

CS 6961 Class Project ?

mln

  • Proposal for CS 6961 Class Project
    PGM + First Order Logic 
    • A Markov logic network (MLN) is a probabilistic logic that
      applies the ideas of a Markov network to first-order logic,
      enabling uncertain inference.
    • Markov logic networks generalize first-order logic,
      in the sense that, in a certain limit,
      all unsatisfiable statements have a probability of zero,
      and all tautologies have probability one.
    • Work in this area began in 2003 with
      Pedro Domingos and Matt Richardson,
      who use the term MLN to describe it.

WHAT NEXT ?

Other Models

Undirected Graphs (mrf)

APPROXIMATE INFERENCE

Monte Carlo Methods

Markov Chain Monte Carlo mcmc()

Gibbs Sampling gs()

Metropolis Hastings mh()

Hamiltonian Monte Carlo hmc()