Set Shaping Theory is a relatively new area of information theory that seeks to optimize information transmission and compression.
In this post, we'll look at the foundations of set shaping theory, as well as its benefits and practical uses. We will also talk about the future implications of this theory and how it might alter how we perceive information transmission and compression.
It all starts out with a quote by Riemann, which says:
"Two sets that contain the same number of elements can be interpreted as two points of view that observe the same phenomenon"
Before we get into how this simple quote inspired the theory, lets first look at the foundations of information theory as developed by Claude Shannon.
Information Theory
Information theory is a field of study that deals with the representation, transmission, and processing of information. It was developed by Claude Shannon in 1948 in his famous article "A Mathematical Theory of Communication". One of the fundamental aspects of information theory is data compression. Data compression refers to the process of reducing the size of data while being able to recover the original data. The goal of compression is to maximize the efficiency of the information transmission process.
Let's consider a realworld example to understand how data compression works in information theory.
Let's assume Alice wants to send a message to Bob, and they are both communicating over a channel with limited capacity, in order to make the most of this channel, Alice applies data compression to the message before sending it.
Lets say the message Alice wants to send is "AAAAAABBBCCC"
Our channel, having limited capacity also only has the ability to transmit two symbols 0
and 1
.
If we were to encode the message using ASCII format we get these codewords for each symbol:
 A  65 (1000001)
 B  66 (1000010)
 C  67 (1000011)
So to the original message would become: 100000110000011000001100000110000011000001100001010000101000010100001110000111000011
A total of 84 bits for this short message. One thing to also note is that Bob will not be able to decode the original message not knowing how it was encoded, so Alice needs to send a onetime message to Bob to let them know the decoding scheme, which in this case is ASCII.
This encoding scheme is independent of the message we are trying to transmit, therefore it is impossible to achieve compression here because all codewords will have the same length.
Can we do better than this? Yes and that's what Shannon tries to solve in his treatise, data compression aims to reduce the size of data while being able to recover the original.
When we encode a message we convert the symbols in that message to codewords, and also we develop the coding scheme that can convert the codewords back to the original message. If we want to transmit efficiently we need to represent each symbol in a message with the minimum possible codeword.
Instead of focusing on the specific message, Shannon's approach focuses on the probability distribution of the ensemble
An ensemble
The key insight of Shannon's approach to data compression is that by analyzing the probability distribution of the ensemble X, we are able to determine the most efficient way to represent the information contained in the sequence. This is done by assigning shorter codewords to the symbols that occur more frequently in the message, and longer codewords to the symbols that occur less frequently.
In our original message "AAAAAABBBCCC"
we can see that A
occurs more frequently, followed by B
and C
. So it makes sense to assign a shorter codeword to A
.
We can then do this:
 A  0
 B  10
 C  11
The encoded message would now become 000000101010111111
(18 bits), but now we would need to transmit the coding scheme along with the encoded message whenever we need to send a message. By exploiting redundancies in the message and using messagedependent encoding schemes, Shannon's approach to data compression is able to achieve a more efficient representation of the information contained in the sequence.
Can we do better than this? Well according to Shannon we can't.
For the message Alice tries to send "AAAAAABBBCCC"
:
The probabilities for each symbol is:
 A  6/12 (0.5)
 B  3/12 (0.25)
 C  3/12 (0.25)
We define the ensemble
The entropy of an ensemble
Entropy sets the theoretical limit on the minimum number of bits necessary to represent the information in a sequence. This means that no encoding scheme can achieve a compression ratio lower than the entropy of the ensemble
According to the source coding theorem,
For example, let's consider the message "AAAAAABBBCCC"
. The entropy of the source can be calculated as:
Since the length of this message is
We define the information content (used to measure the amount of information contained in message) of our message as follows:
This means that, in theory, we cannot represent this message using fewer than
But that is where set shaping theory comes in. Set shaping theory takes the concept of data compression a step further by not only considering the probability distribution of the ensemble, but also the structure of the set itself.
Set Shaping Theory
Set shaping explores the idea of finding a better way to encode messages. In this theory, the goal is to find a set of messages, which on average have a lower entropy than the original set, so that they can be encoded more efficiently.
The Set Shaping theory allows us to improve data compression and make the transmission of messages more efficient by finding a subset of all possible messages of a certain length and alphabet with a lower average coding limit. By finding this subset, a bijective function can be created that transforms the original message into a new message with a lower entropy.
Given an ensemble
The bijective function
If we look at the ensemble that generated our message "AAAAAABBBCCC"
:
Set Shaping Theory tells us that we can find a subset of messages in
According to Riemann's intuition, each sequence in the set
So we need to ask, what kind of sequences should the new set
To demonstrate this, lets take
So we get all possible sequences generated by the ensemble
While it may seem that increasing the size of our message will also increase the entropy of the sequences, it turns out that this is not the case. In fact we notice an unexpected result, when
When we sort our new set, we would be able to map our message "AAAAAABBBCCC"
to a new sequence in the set "BAAAAAAABAAAA"
(we can practically do this by getting the index of our original message in
If we measure the information content of this new sequence we find it to be less than our original message. And it can be encoded more efficiently since we could just use an encoding scheme as simple as:
 A  0
 B  1
Then our encoded message would become 1000000010000
(13 bits)
This is a much better result than what we got with Shannon's approach, where we needed
This process of mapping our message to a new message can be reversed and we don't need to transmit the mapping function since it isn't dependent on the message we are trying to send.
Let's demonstrate this with code:
import math
import itertools
# compute the information content of a given string
def I(S):
N = len(S)
prob = [S.count(c)/N for c in string] # probability of each symbol in the string
return round(sum([math.log2(p) for p in prob]), 2)
# generate all possible sequences of length N having symbols A
def generate_sequences(A, N):
seqs = [''.join(i) for i in itertools.product(A, repeat = N)]
return seqs
# example
A = "ABC" # alphabet of our message
N = 12 # length of the message
K = 1 # shaping order
X = generate_sequences(A, N) # contains our original message > (X^N)
Y = generate_sequences(A, N+K) # all possible sequences of length N+K > (Y^{N+K})
Y.sort(key=lambda x: I(x)) # sort in increasing order of entropy
M = 'AAAAAABBBCCC' # our original message
print(f"Original message: {M} (information content = {I(M)})")
# map our original message to a sequence in Y
NM = Y[X.index(M)] # new message
print(f"Transformed message: {NM} (information content = {I(NM)})")
Original message: AAAAAABBBCCC (information content = 18.0)
Transformed message: BAAAAAAABAAAA (information content = 8.05)
And if we wanted to get our original message we can simply do:
M = X[Y.index(NM)]
print(f'Original message: {M}')
Original message: AAAAAABBBCCC
Well you could make the case that we just got lucky with our chosen message, in order to verify this we can run a simulation to see the percentage of messages in
count = 0
for i in range(len(X)):
M = X[i]
NM = Y[i]
if I(NM) < I(M): # entropy of transformed message is less than original message
count += 1
print(f"Probability: {count/len(X) * 100}")
Probability: 54.17798024616091
And this depends on the ensemble
A  N  Pr 

40  80  69% 
50  100  72% 
60  120  80% 
500  1000  88% 
Pr is the probability that the Huffman encoding of the transformed message has a length less than than the information content (coding limit) of the original message.
By using this method, it is possible to compress data using fewer bits than would otherwise be necessary. This could lead to more efficient storage and transmission of data, potentially leading to faster and more efficient communication.
Consequences of the Set Shaping Theory on Shannon's theorem
the result obtained has no implication on Shannon's first theorem but has profound implications on the perception and use of Shannon's first theorem  (Dixon, J. K., [6])
In order for Shannon to define a theory for information transmission, he created hypotheses that would allow him to simplify the mathematical treatment of the problem

Shannon's first hypothesis suggests that instead of transmitting a message, the source that generated it should be encoded and transmitted instead. This concept shifts the focus from the message to the source itself.

The second hypothesis is to assume that the encoder and the decoder both know the information describing the source. This simplifies the mathematical treatment, but it is risky because the decoder will already have some knowledge of the message they are supposed to decode. Therefore, the message must be associated with a source that is not completely independent from the message and that cannot be obtained solely by the decoder.
Shannon's first theorem is often used to represent the general form of data transmission without information loss (lossless encoding), but this assumption is incorrect. Set shaping theory addresses this problem in its most general form, by considering a message transmission process whereby the receiver has no information about the message being transmitted. This is done by representing the message with a unique encoding and a list of codewords. However because Shannon's first theorem is based on certain hypotheses it is not representative of the general problem of data transmission, and cannot be used to derive an absolute limit.
Conclusion
The set shaping theory tells us that we can create a bijective function that transforms our original message into a new message with a lower entropy, making it possible to efficiently encode the message. This technique can be used to improve data compression and make the transmission of messages more efficient.
The practical application of set shaping theory has important implications for the perception and application of Shannon's first theorem. While Shannon's theorem is a subproblem of the more general problem of data transmission, the set shaping theory addresses the problem in its most general form. The application of set shaping theory reveals the limitations of using Shannon's theorem to define the absolute encoding limit for lossless data transmission.
The future implications of set shaping theory are farreaching and will most likely change how we perceive information transmission and compression. As set shaping theory evolves and matures, it will most likely lead to new and innovative approaches to data compression and information transmission, allow for greater levels of efficiency and optimization.
References
 C.E. Shannon, “A Mathematical Theory of Communication”
 C. Schmidt, A. Vdberg, A. Petit, “Practical applications of Set Shaping Theory in Huffman coding”
 Kozlov, S. “Introduction to Set Shaping Theory”
 Dixon, J. K. “Set Shaping Theory (the future of information theory)”
 Koch, A. “As one Riemann idea, it is revolutionizing information theory (Set Shaping Theory, Entropy coding)”
 Dixon, J. K. “Consequences of the practical application of set shaping theory on Shannon's first theorem”)