Entropy Calculator NetLogo Model

Produced for the book series "Artificial Intelligence".

Author: W. J. Teahan; Publisher: Ventus Publishing Aps, Denmark.


WHAT IS IT?

This model allows the user to calculate the entropy for a specific probability distribution.

The analogy is that there is an imaginary bag from which a person pulls coloured balls. The set of colours a ball can have is the "alphabet", and each colour is a "symbol". Each symbol has a frequency count, and its probability is found by dividing that count by the sum of the counts for all the symbols. The probabilities therefore sum to 1 and form the probability distribution for the set of balls.

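Entropy measures how much uncertainty there is about which symbol will be drawn next. For a distribution with probabilities P(symbol), it is H = - sum over all symbols of P(symbol) * log2 P(symbol), measured in bits. It is at its maximum, log2 of the alphabet size, when the symbols are all equiprobable. If one of the symbols has a higher count than the others, the entropy is correspondingly lower than this maximum, and it falls to 0 when only one symbol can ever be drawn.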
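As a quick numerical check of this behaviour, here is a minimal Python sketch. It is not part of the NetLogo model, and the function name `entropy` is hypothetical. It assumes the standard Shannon definition H = - sum P(symbol) * log2 P(symbol) and skips zero counts, following the usual convention that 0 * log 0 = 0.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of the distribution given by frequency counts.

    Symbols with a zero count are skipped (convention: 0 * log 0 = 0).
    """
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


equal = entropy([5, 5, 5, 5])    # four equiprobable symbols: log2(4) = 2 bits
skewed = entropy([17, 1, 1, 1])  # one symbol dominates, so there is less uncertainty
certain = entropy([20])          # only one possible symbol: no uncertainty at all

print(f"equal = {equal:.3f}, skewed = {skewed:.3f}, certain = {certain:.3f}")
```

The skewed distribution uses the same alphabet size as the equal one, so the drop in value comes only from the uneven counts.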


WHAT IS ITS PURPOSE?

The purpose of this model is to show how entropy calculations are made given a probability distribution.


HOW IT WORKS

Each symbol is represented by a turtle agent of the symbols breed, which owns a frequency variable holding its count. The entropy is calculated in two steps. First, the frequency counts of all the symbols are summed to find the total count. Then, for each symbol, the negative log (to base 2) of its probability is multiplied by that probability, and these products are summed to give the entropy.
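The two steps can be mirrored outside NetLogo with a short Python sketch. The names `neg_log_prob` and `calculate_entropy` are hypothetical stand-ins for the model's neg-log-prob reporter and calculate-entropy procedure. The sketch assumes each negative log probability is weighted by its probability, as in the standard Shannon definition, and its trace lines only roughly imitate the model's Output.

```python
import math


def neg_log_prob(p, q):
    """Negative base-2 log of the probability p/q (mirrors the model's neg-log-prob)."""
    return -math.log2(p / q)


def calculate_entropy(counts):
    """Return (entropy in bits, trace lines) for a list of symbol frequency counts."""
    dist_total = sum(counts)  # step 1: total frequency count over all symbols
    if dist_total == 0:
        raise ValueError("all counts are zero, so there is no distribution")
    dist_entropy = 0.0
    lines = []
    for who, frequency in enumerate(counts):  # step 2: visit the symbols in order
        if frequency == 0:
            continue  # a zero count contributes nothing (0 * log 0 is taken as 0)
        this_neg_log_prob = neg_log_prob(frequency, dist_total)
        lines.append(f"symbol {who}: probability = {frequency}/{dist_total}, "
                     f"negative log probability = {this_neg_log_prob:.3f}")
        # weight each symbol's negative log probability by its probability
        dist_entropy += (frequency / dist_total) * this_neg_log_prob
    lines.append("")
    lines.append(f"Entropy of distribution = {dist_entropy:.3f}")
    return dist_entropy, lines


entropy_bits, trace = calculate_entropy([1, 1, 2])
print("\n".join(trace))
```

With counts 1, 1 and 2 the probabilities are 1/4, 1/4 and 1/2. Their negative log probabilities are 2, 2 and 1 bits, so the weighted sum is 0.5 + 0.5 + 0.5 = 1.5 bits.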


HOW TO USE IT

First select the size of the alphabet using the alphabet-size slider. Then select the frequency counts and desired colours using the respective sliders for each symbol. The sliders for symbols whose number is greater than or equal to the alphabet size will be ignored.

Then press the setup button followed by the calculate-entropy button.


THE INTERFACE

The model's Interface buttons are defined as follows:

- setup: This clears the environment and then draws one coloured circle for each symbol in the alphabet.

- calculate-entropy: This calculates the entropy of the distribution, labels each circle with its probability and prints the steps of the calculation in the Output area.

The model's Interface sliders are defined as follows:

- alphabet-size: This sets the number of symbols (ball colours) in the distribution.

The remaining sliders follow the naming convention
symbol-<number>-count and symbol-<number>-colour
(for example, symbol-0-count and symbol-0-colour). They set the count and colour for the specified symbol. Symbols whose number is greater than or equal to alphabet-size are ignored in the calculations.

These counts are used to determine the probability for a specific symbol using the following formula:

P(symbol) = C(symbol) / C_total

where C(symbol) is the count for the symbol and C_total is the sum of all counts for all symbols.
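Here is a small worked example of this formula, written as a Python sketch with exact fractions. The counts are hypothetical slider settings, and `probabilities` is an illustrative name that is not part of the model.

```python
from fractions import Fraction


def probabilities(counts):
    """P(symbol) = C(symbol) / C_total for each symbol, as exact fractions."""
    c_total = sum(counts)
    return [Fraction(c, c_total) for c in counts]


counts = [3, 5, 2]  # hypothetical values of symbol-0-count, symbol-1-count, symbol-2-count
probs = probabilities(counts)
for who, (c, p) in enumerate(zip(counts, probs)):
    print(f"P(symbol {who}) = {c}/{sum(counts)} = {p}")
print("sum of probabilities =", sum(probs))
```

Using Fraction keeps the arithmetic exact, so the probabilities can be seen to sum to exactly 1.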

The model's Output is used to show how the entropy value is calculated.


THINGS TO TRY

Try changing the distribution counts in the sliders to see what effect this has on the entropy calculations.

Try to find out the conditions under which the entropy is maximised.

Then try to determine whether there is a minimum entropy value. To do this, edit one of the sliders to increase the maximum frequency count for a particular symbol from 100 to 1000, then set that symbol's count to the new maximum. What does this do to the entropy? Then increase the maximum to 1000000. What happens?
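To check your observations outside NetLogo, this Python sketch (hypothetical, not part of the model) computes the entropy while one symbol's count grows and three other symbols stay at 10. It assumes the standard Shannon definition H = - sum P(symbol) * log2 P(symbol), skipping zero counts.

```python
import math


def entropy(counts):
    """Shannon entropy in bits; zero counts are skipped (0 * log 0 = 0)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical settings: symbol 0's count is raised towards the new slider maximum,
# while three other symbols keep a count of 10 each.
for big in (10, 100, 1000, 1000000):
    print(f"symbol-0-count = {big:>7}: entropy = {entropy([big, 10, 10, 10]):.4f} bits")
```

The first row, with all four counts equal, is the equiprobable case. Compare the later rows against it.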


CREDITS AND REFERENCES

This model was created by William John Teahan.

To refer to this model in publications, please use:

Entropy Calculator NetLogo model.
Teahan, W. J. (2010). Artificial Intelligence. Ventus Publishing Aps.


PROCEDURES

; Entropy Calculator model.
; 
; Calculates the entropy of a probability distribution given by symbol frequency counts.
;
; Copyright 2010 William John Teahan. All Rights Reserved.

extensions [ array ]

breed [symbols symbol]    ;; represents symbols in the probability distribution

symbols-own
[ frequency ]             ;; the frequency count for the symbol

to setup
  ca ;; clear everything
  create-symbols alphabet-size
  [
    set size 8
    set shape "circle"
    setxy (-45 + (10 - alphabet-size) * 5  + who * 10) 0
    if (who = 0) [ set frequency symbol-0-count set color symbol-0-colour ]
    if (who = 1) [ set frequency symbol-1-count set color symbol-1-colour ]
    if (who = 2) [ set frequency symbol-2-count set color symbol-2-colour ]
    if (who = 3) [ set frequency symbol-3-count set color symbol-3-colour ]
    if (who = 4) [ set frequency symbol-4-count set color symbol-4-colour ]
    if (who = 5) [ set frequency symbol-5-count set color symbol-5-colour ]
    if (who = 6) [ set frequency symbol-6-count set color symbol-6-colour ]
    if (who = 7) [ set frequency symbol-7-count set color symbol-7-colour ]
    if (who = 8) [ set frequency symbol-8-count set color symbol-8-colour ]
    if (who = 9) [ set frequency symbol-9-count set color symbol-9-colour ]
  ]
end

to-report neg-log-prob [p q]
;; returns the negative of the log to base 2 of the probability p/q.
  report (- log (p / q) 2)
end

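to calculate-entropy
;; calculates the entropy of the distribution, H = - sum over symbols of P(symbol) * log2 P(symbol),
;; and prints each step of the calculation to the Output area.

  let dist-total sum [frequency] of symbols ;; the total frequency count
  let dist-entropy 0
  let this-neg-log-prob 0

  if dist-total = 0
  [ ; with no counts there is no probability distribution to measure
    output-print "All symbol counts are zero, so the entropy is undefined."
    stop
  ]

  foreach sort symbols
  [ ; calculate the entropy in sorted order of symbols
    ask ?
    [
      set label (word frequency "/" dist-total "    ")
      set label-color white
      ifelse frequency = 0
      [ ; a symbol that never occurs contributes nothing (0 * log 0 is taken as 0),
        ; and taking log 0 would cause an error
        output-print (word "symbol " who ": probability = 0/" dist-total ", contributes 0 to the entropy")
      ]
      [
        set this-neg-log-prob neg-log-prob frequency dist-total

        output-print (word "symbol " who ": probability = " frequency "/" dist-total ", negative log probability = "
          (precision this-neg-log-prob 3))

        ;; weight the symbol's negative log probability by its probability
        set dist-entropy dist-entropy + (frequency / dist-total) * this-neg-log-prob
      ]
    ]
  ]

  output-print ""
  output-print (word "Entropy of distribution = " (precision dist-entropy 3))
end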
;
; Copyright 2010 by William John Teahan.  All rights reserved.
;
; Permission to use, modify or redistribute this model is hereby granted,
; provided that both of the following requirements are followed:
; a) this copyright notice is included.
; b) this model will not be redistributed for profit without permission
;    from William John Teahan.
; Contact William John Teahan for appropriate licenses for redistribution for
; profit.
;
; To refer to this model in publications, please use:
;
; Teahan, W. J. (2010).  Entropy Calculator NetLogo model.
;   Artificial Intelligence. Ventus Publishing Aps.
;