Yeah, it's not compression in the sense of compressing data. It's a kind of compression in that it takes fewer resources to encode general rules than to remember the answer for everything.
What the paper said was that the most efficient bits of the network were those that encoded rules rather than remembered data. Somehow those bits gradually took over from the less efficient parts. I'll have to dig around, can't seem to find it right now.