[aklug] Re: [OT] Re: random bits vs random Hex

From: Doug Davey <doug.davey@gmail.com>
Date: Wed May 29 2013 - 13:17:06 AKDT

Bryan is right: if you have saved the binary data poorly, in an odd way,
there is potential for fluff that would easily be trimmed by a
compression algorithm.
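
To make that concrete, here is a rough sketch (nothing from the thread,
just an illustration using Python's os.urandom and zlib): take some random
bytes, re-save them the odd way Bryan described, one decimal digit per
binary triplet, and see how much of the fluff zlib trims off.

    import os
    import zlib

    # Illustrative sketch only: "save" random binary data in an inflated
    # way, one ASCII decimal digit per 3-bit group, so only '0'..'7' ever
    # appear in the saved text.
    raw = os.urandom(30000)  # 30000 bytes = 240000 bits, divisible by 3

    bits = ''.join(f'{byte:08b}' for byte in raw)
    digits = ''.join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))
    inflated = digits.encode('ascii')  # one full byte per 3 bits of data

    print(len(raw), len(inflated), len(zlib.compress(inflated, 9)))
    # The inflated text is about 2.7x the original, and the compressed
    # size lands much nearer the original 30 kB than the inflated 80 kB:
    # most of the fluff is exactly what a general-purpose compressor trims.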

As for pattern recognition, a random stream should have no patterns to
find, so further compression won't work. If it does, then the stream
wasn't random in the first place.
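
A quick way to convince yourself of that (again just a sketch, using
os.urandom as a stand-in for a truly random stream):

    import os
    import zlib

    # Sketch: a patternless stream should not shrink at all.
    raw = os.urandom(100000)
    packed = zlib.compress(raw, 9)

    print(len(raw), len(packed))
    # The "compressed" output comes out a few dozen bytes *larger* than
    # the input: there are no patterns to exploit, and the container still
    # adds its own framing overhead.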

On Wed, May 29, 2013 at 1:12 PM, <bryanm@acsalaska.net> wrote:

> On Wed, May 29, 2013 12:26 pm, Arthur Corliss wrote:
> > On Wed, 29 May 2013, bryanm@acsalaska.net wrote:
> >
> >> I don't know enough to address entropy, but I can say that changing
> >> from binary triplets to decimal digits leaves some of the pattern space
> >> unused (i.e. 8 and 9). In other words, the same data takes up more
> >> space, leaving open the possibility for an algorithm to compress it
> >> back to close to its original size.
> >
> > If this were true, we'd be able to get great compression on any data,
> > random or not. By your logic, compressing binary data should be awesome,
> > since there are only two choices: 1 or 0.
>
> That's not what I'm saying at all. I'm talking about unused pattern space.
> As an extreme example, imagine representing binary data by letting each
> *byte* represent either a 0 or a 1. Obviously, there would be tremendous
> opportunity for compression. The same thing happens (to a lesser degree)
> in my binary triplet -> decimal digit conversion. In each case, there
> are some possible values for each data element that will *never* be used.
>
> I'm speaking mathematically, and don't claim to know how to implement an
> algorithm to take advantage of this property.
>
> > You can't cheat around the basic problem of pattern recognition by
> > changing how the same data is presented. Choosing to evaluate smaller
> > chunks of data is a zero-sum game, because you either have to inflate
> > your translation maps or look for longer pattern strings than you would
> > in larger chunks. In the end, it's the repeatability of data chunks,
> > regardless of presentation, that will determine compressibility.
>
> The idea of pattern recognition for the purpose of data compression
> intrigues me, though I've never fully researched the details.
>
> --
> Bryan Medsker
> bryanm@acsalaska.net
>
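
For what it's worth, Bryan's extreme byte-per-bit example above is easy
to check the same way (again only a sketch assuming Python and zlib, not
a claim about any particular algorithm):

    import os
    import zlib

    # Sketch of the extreme case: spend a whole ASCII byte ('0' or '1')
    # on every single bit, an 8x inflation of the underlying data.
    raw = os.urandom(10000)
    one_byte_per_bit = ''.join(f'{b:08b}' for b in raw).encode('ascii')

    print(len(raw), len(one_byte_per_bit), len(zlib.compress(one_byte_per_bit, 9)))
    # The 80000-byte text shrinks several-fold, ending up far closer to
    # the 10000 bytes of actual information than to its inflated form,
    # which is the "tremendous opportunity for compression" described above.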

---------
To unsubscribe, send email to <aklug-request@aklug.org>
with 'unsubscribe' in the message body.