How to make good decisions under uncertainty?

Probabilistic Graphical Models

INTRODUCTION

Content

PROBABILISTIC

Problem

mystery

$P(x)$

prior

culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit

  probs culprit
1   0.2  butler
2   0.8    cook

$P(y|x)$

conditional


culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit

  probs culprit
1   0.2  butler
2   0.8    cook

weapon.culprit = createCPT(list("weapon","culprit"), 
                           probs = c(0.80,0.05,0.10,0.65,0.10,0.30), 
                           levelsList = list(c("pistol","knife","poker"),
                            c("butler","cook")))

weapon.culprit

  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

$P(x,y)=P(x) \times P(y|x)$

factor

$P(x,y)$

joint

  
culprit = createCPT(list("culprit"), 
                    probs = c(0.2, 0.8), 
                    levelsList = list(c("butler","cook"))) 
                    
culprit
  probs culprit
1   0.2  butler
2   0.8    cook

weapon.culprit = createCPT(list("weapon","culprit"), 
                           probs = c(0.80,0.05,0.10,0.65,0.10,0.30), 
                           levelsList = list(c("pistol","knife","poker"),
                                             c("butler","cook")))
weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook
  
productFactor(culprit,weapon.culprit)
  probs culprit weapon
1  0.16  butler pistol
2  0.02  butler  knife
3  0.02  butler  poker
4  0.04    cook pistol
5  0.52    cook  knife  <--MEDIA_HYPE (looks like the cook did it with the knife!)
6  0.24    cook  poker
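
A quick sanity check in plain base R (a minimal sketch that bypasses the course's createCPT/productFactor helpers): each entry of the joint is just prior × conditional.

prior <- c(butler = 0.2, cook = 0.8)          # P(culprit)
conditional <- matrix(c(0.80, 0.05,           # P(weapon | culprit), weapons as rows
                        0.10, 0.65,
                        0.10, 0.30),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("pistol", "knife", "poker"),
                                      c("butler", "cook")))
joint <- sweep(conditional, 2, prior, `*`)    # P(weapon, culprit) = P(culprit) * P(weapon | culprit)
joint
#        butler cook
# pistol   0.16 0.04
# knife    0.02 0.52   <- same 0.52 as the productFactor() row above
# poker    0.02 0.24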

joint_p

factor_1

> murderNet = list("culprit" = culprit, "weapon.culprit" = weapon.culprit )

> murderNet
$culprit
  probs culprit
1   0.2  butler
2   0.8    cook

$weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

$P(x)$ revisited

marginal

marginalize(murderNet, "weapon")

$culprit
  probs culprit
   0.2  butler
   0.8    cook

$P(y)$

marginal_1

> marginalize(murderNet, "culprit")

  weapon probs
  knife  0.54
 pistol  0.20
  poker  0.26
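
Summing the joint over the culprit gives the same marginal; using the numbers from the productFactor() table above:

# P(weapon) = sum over culprit of P(weapon, culprit)
c(pistol = 0.16 + 0.04, knife = 0.02 + 0.52, poker = 0.02 + 0.24)
# pistol  knife  poker 
#   0.20   0.54   0.26 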

$P(x|y)$

post

murderNet
$culprit
  probs culprit
1   0.2  butler
2   0.8    cook

$weapon.culprit
  probs weapon culprit
1  0.80 pistol  butler
2  0.05 pistol    cook
3  0.10  knife  butler
4  0.65  knife    cook
5  0.10  poker  butler
6  0.30  poker    cook

observe(murderNet, "weapon", "pistol")

$culprit
  probs culprit
1   0.2  butler
2   0.8    cook
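
Note that observe() above appears to clamp the evidence without renormalizing, which is why the culprit table still shows the prior. The posterior P(culprit | weapon = pistol) that the next slides obtain via Bayes' theorem can be computed by hand from the joint: keep the pistol row and renormalize.

pistol_row <- c(butler = 0.16, cook = 0.04)   # P(weapon = pistol, culprit) from productFactor() above
pistol_row / sum(pistol_row)                  # P(culprit | weapon = pistol)
# butler   cook 
#    0.8    0.2 

Observing the pistol reverses the prior: the butler (0.8), not the cook, becomes the prime suspect.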

Go-Back

rb

Bayes' theorem

bt

Rules

(1)
\[P(x) = \sum_y P(x,y) \]
(2)
\[P(x,y) = P(y|x) P(x) \]
(3)
\[P(y|x) = \frac{P(x|y) P(y)}{P(x)} \]
(4)
\[P(x) = \sum_y P(x|y) P(y) \]
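
A quick numeric check of rule (1) with the murder example above (matching the 0.54 marginal for the knife):
\[P(\text{knife}) = \sum_{c} P(\text{knife}, c) = 0.02 + 0.52 = 0.54 \]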

Probability

  • Dealing with uncertainty?
    * Partial knowledge of the world
    * Noisy observations
    * Phenomena not covered by the model
    * Inherent stochasticity
  • … Use probability theory
    * Declarative representation with clear semantics
    * Reasoning patterns, e.g. conditioning
    * Established learning methods

GRAPHICAL

$P(a,b,c)$

decompose
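
By the chain rule, any joint distribution can be decomposed as
\[P(a,b,c) = P(a)\,P(b|a)\,P(c|a,b) \]
and a graphical model keeps only each variable's parents in the conditioning sets.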

Condition on Parents

dag

  • Spec of a BN

      * We need to define its Parameters.
    
      * These Parameters are the Conditional Probabilities
        of each node given
        its parents in the graph.
  • If we consider discrete variables:

    * Root nodes  : Vector of marginal probabilities.
    
    * Other nodes : Conditional Probability Table (CPT) 
                    of the variable given 
                    its parents in the graph.

Conditional Independence

$not - V$

ci1

$\rightarrow$

ci2

$V$

ci3

Bayes' Ball

blocked

What did we do ?

  • Graph Theory

    * Intuitive & compact data structures
    * Efficient reasoning using general purpose algorithms
    * Sparse parameterization
      * feasible elicitations (by hand)
      * learning from data (auto)
  • Combined Probability Theory with Graph Theory

    * New insights into existing models
    * Framework for designing new models
    * Graph-based algorithms for calculation and computation
    * Feynman diagrams in Physics
    * Efficient software implementation

MODELS

model

6 Examples

XyzNet

ci2

GrassNet

grass

LateNet

ci3

CarNet

carnet

(5)
\[p(B = 1) = 0.9 \]
(6)
\[p(F = 1) = 0.9 \]
(7)
\[p(G = 1|B = 1, F = 1) = 0.8 \]
(8)
\[p(G = 1|B = 1, F = 0) = 0.2 \]
(9)
\[p(G = 1|B = 0, F = 1) = 0.2 \]
(10)
\[p(G = 1|B = 0, F = 0) = 0.1 \]

carnet

b = createCPT(list("battery"), probs = c(0.9, 0.1), levelsList = list(c(1, 0)))  # p(B = 1) = 0.9

f = createCPT(list("fuel"), probs = c(0.9, 0.1), levelsList = list(c(1, 0)))     # p(F = 1) = 0.9

# probs gives the four p(G = 1 | B, F) values of equations (7)-(10),
# followed by their complements p(G = 0 | B, F)
g.bf = createCPT(list("gauge", "battery", "fuel"),
                 probs = c(0.8, 0.2, 0.2, 0.1, 0.2, 0.8, 0.8, 0.9),
                 levelsList = list(c(1, 0), c(1, 0), c(1, 0)))

carNet = list("battery" = b, "fuel" = f, "gauge" = g.bf)

Bayes' rule (denominator)

carnet

  • We observe the fuel gauge and discover that it reads empty (G = 0)
(11)
\[p(G = 0) = \sum_{B \in \{0,1\}} \sum_{F \in \{0,1\}} p(G=0|B,F)p(B)p(F) = 0.315 \]
> infer(carNet, c("battery", "fuel"), NULL, NULL)
  gauge probs
1     0 0.315
2     1 0.685

Bayes' rule (numerator)

carnet

  • Calculate the first term of the numerator in Bayes' theorem
(12)
\[p(G = 0|F = 0) = \sum_{B \in \{0,1\}} p(G=0|B,F=0)p(B) = 0.81 \]
> infer(carNet, c("battery"), "fuel", 0)
  probs fuel gauge
1  0.81    0     0
2  0.19    0     1

posterior probability

  • We can use Bayes' theorem to evaluate the posterior probability of the fuel tank being empty.
(13)
\[p(F=0|G=0) = \frac{p(G=0|F=0)p(F=0)}{P(G=0)} = 0.257 \]
> infer(carNet, c("battery"), "gauge", 0)
      probs fuel gauge
1 0.2571429    0     0
2 0.7428571    1     0

Explaining Away

  • The posterior probability that the fuel tank is empty given the observations of both the fuel gauge and the battery state is then given by:
(14)
\[P(F=0|G=0,B=0) = \frac{p(G=0|B=0,F=0) p(F=0)}{\sum_{F\in\{0,1\}} p(G=0|B=0,F)p(F) } \]
> infer(carNet, NULL, c("gauge", "battery"), c(0, 0))
      probs battery gauge fuel
1 0.1111111       0     0    0
2 0.8888889       0     0    1

Variable Elimination

  • 4 steps for inference

    * Build up the factors needed for Bayes' rule; the trick is to rearrange terms
    * Conceptually simple and naive, but works for any possible Bayes net (see the sketch after this list)
  • Multiplication

    productFactor(A, B)
  • Marginalize

    marginalizeFactor(A, margVar)
    marginalize(bayesNet, margVars)
  • Observe

    observe(bayesNet, obsVars, obsVals)
  • Infer

    infer(bayesNet, margVars, obsVars, obsVals)
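
A minimal base-R sketch of the arithmetic these four steps perform for P(fuel | gauge = 0) on carNet, with the elimination of B written out as explicit sums (the course functions manipulate factor tables instead; this only illustrates the rearrangement of terms):

p_b  <- 0.9                                   # p(B = 1)
p_f  <- 0.9                                   # p(F = 1)
p_g1 <- function(b, f) {                      # p(G = 1 | B, F), equations (7)-(10)
  if (b == 1 && f == 1) 0.8 else if (b == 0 && f == 0) 0.1 else 0.2
}

# eliminate B: unnormalized p(F = f, G = 0) = p(f) * sum_b p(G = 0 | b, f) p(b)
unnorm_f1 <- p_f       * ((1 - p_g1(1, 1)) * p_b + (1 - p_g1(0, 1)) * (1 - p_b))
unnorm_f0 <- (1 - p_f) * ((1 - p_g1(1, 0)) * p_b + (1 - p_g1(0, 0)) * (1 - p_b))

# normalize to get the posterior over fuel given gauge = 0
c(fuel1 = unnorm_f1, fuel0 = unnorm_f0) / (unnorm_f1 + unnorm_f0)
#     fuel1     fuel0 
# 0.7428571 0.2571429 

Note how B is summed out inside the brackets before anything is normalized; that rearrangement of terms is the whole trick.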

Formalizing PGM

cold_net

(15)
\[p(x_1, \cdots, x_n) = \prod_{i=1}^{n} p(x_i|pa(x_i)) \]
  • Tables grow exponentially in the number of parents (see the sketch after this list)
    • $2^{\#pa + 1}$ entries for binary variables
    • $N^{\#pa + 1}$ entries for $N$-ary variables
  • Keep the number of arrows to a minimum, if you can,
    • but try to explain the data sufficiently
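
A tiny illustration of that growth, assuming binary variables (2 states per node):

k <- 0:5                                       # number of parents
data.frame(parents = k, cpt_entries = 2^(k + 1))
#   parents cpt_entries
# 1       0           2
# 2       1           4
# 3       2           8
# 4       3          16
# 5       4          32
# 6       5          64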

Formalizing PGM (queries)

cold_net

(16)
\[\text{ variables: } X = \{X_1, \cdots, X_n \} \]
(17)
\[\text{ query: what's the } P(Q | E= E_0) ? \]
(18)
\[\text{ query variables: } Q \subseteq X \]
(19)
\[\text{ evidence: you got ? } E \subseteq X \]
(20)
\[\text{ marginalize: did not ask! : } M = X - (Q \cup E) \]
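
Mapped onto the carNet example from earlier: the query is Q = {fuel}, the evidence is E = {gauge} with E_0 = 0, and M = {battery} gets marginalized out, which is exactly the earlier infer() call:

infer(carNet, c("battery"), "gauge", 0)   # P(fuel | gauge = 0); battery is summed out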

RiskNet

risknet

HandNet

handnet_1 … handnet_14 (figure sequence)

5 Stages of Model-based Learning

  1. Build a model:

    • Joint probability distribution of all of the relevant variables (e.g. as a graph)
  2. Incorporate the observed data

  3. Compute the distributions over the desired variables: inference

  4. Iterate 2 and 3 in real-time applications

  5. Extend model as required

3 Types

  1. Directed or undirected:
    • Type of graph used to represent the dependence relations.
    • Undirected graphs represent symmetric relations,
    • Directed graphs represent relations in which the direction is important.
  2. Static or dynamic
    • If the model represents a set of variables at a certain point in time (static) or across different times (dynamic).
  3. Probabilistic or decisional
    • Probabilistic models only include random variables, while decisional models also include decision and utility variables.

Bayesian networks

  • 2011 Turing Award
  • judaepearl
  • Nothing Bayesian about these networks

Undirected Static PGM

markovnet

mrf_factors

- Clique factorization
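
For reference, the standard clique factorization of a Markov random field, with one potential $\psi_C$ per clique $C$ and partition function $Z$:
\[p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C) \]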

mrf_ci

Image Denoising

mrf_denoise

mrf_formal

Factor graphs
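
A factor graph writes the same joint as an explicit product of factors, one factor node $f_s$ per subset of variables $x_s$ it touches:
\[p(x) = \prod_{s} f_s(x_s) \]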

mrf_factorize

mrf_factorgraphs

mrf_dgtofg

pgmtypes

Source: Advances in Computer Vision and Pattern Recognition ISBN 978-1-4471-6699-3 (eBook) DOI 10.1007/978-1-4471-6699-3

3 main aspects

  1. Representation
  2. Inference
  3. Learning

DETAILS

Representation

  • Basic property of each model
  • Defines which entities constitute it
  • How are these entities related?
    * All PGMs can be represented as graphs G = (V, E)
      that define the structure of the model
    * Local functions that describe its parameters
    * The type of graph and the local functions
      vary across the different types of model

Inference

  • Answering different probabilistic queries
    * based on the model and some evidence
      * Obtaining the `posterior` probability distribution of
        a `variable` or set of `variables`
        given that other variables in the model are known
      * The research problem is how to do this efficiently
      

Inference Algorithms

  • Good News
    1. Efficient (polynomial) algorithms for certain types of structures (singly connected networks);
    2. For other structures it depends on the connectivity of the graph.
    3. In many applications the graphs are sparse, so inference algorithms are very efficient.
  * Pearl's Algorithm
  * Variable Elimination
  * Conditioning
  * Junction Tree
  * Stochastic Simulation
  • Not so good news
    * In the worst case, inference is NP-hard

Learning

  • Build it “by hand” with the aid of domain experts
  • Induce the model from data
    * The emphasis in recent years has been on inducing the models with machine learning
  • Building the model by hand with experts is slow and expensive
    * Obtaining the parameters for the models
      is usually done from data
    * Humans tend to be bad estimators of probabilities

Why PGM?

  1. Separate the inference and learning techniques from the model. (generic)
  2. Knowledge base depends on the particular application. (app specific)
  3. Techniques developed for inference in each class of PGM can be applied directly to different models in a variety of applications.

bigpic

PGM for FLiT

bigpic

  1. What data do we want to use for learning
  2. What do we want to learn
  3. How are we going to learn it
        By Hand (elicitation)
        By Data (statistical methods)
  4. Build PGM
        Root Nodes  : ?
        Other Nodes : ?
        Edges: Probabilistic connections
  5. What do we want to infer
        How -> TensorFlow - Edward lib
        What are the evidence we have - PIN data      

PGM for SDC Detectors

bigpic

  1. What data do we want to use for learning
  2. What do we want to learn
  3. How are we going to learn it
        By Hand (elicitation)
        By Data (statistical methods)
  4. Build PGM
        Root Nodes  : ?
        Other Nodes : ?
        Edges: Probabilistic connections
  5. What do we want to infer
        How -> TensorFlow - Edward lib
        What are the evidence we have - DOE Mini Apps      

PGM + First Order Logic

CS 6961 Class Project ?

mln

  • Proposal for CS 6961 Class Project
    PGM + First Order Logic 
    • A Markov logic network (MLN) is a probabilistic logic that
      applies the ideas of a Markov network to first-order logic,
      enabling uncertain inference.
    • Markov logic networks generalize first-order logic,
      in the sense that, in a certain limit,
      all unsatisfiable statements have a probability of zero,
      and all tautologies have probability one.
    • Work in this area began in 2003 with
      Pedro Domingos and Matt Richardson,
      who use the term MLN to describe it.

WHAT NEXT ?

Other Models

Undirected Graphs (mrf)

APPROXIMATE INFERENCE

Monte Carlo Methods

Markov Chain Monte Carlo mcmc()

Gibbs Sampling gs()

Metropolis Hastings mh()

Hamiltonian Monte Carlo hmc()