I feel like use of the term "homoiconic" is misguided. It seems like an attempt to turn an incidental attribute of some Lisps into a sort of Platonic ideal. I don't think that's helpful.
I think the property being discussed is more understandable if you just describe it simply: in some Lisps (notably Common Lisp and its direct ancestors) source code is not made of text strings; it's made of symbolic expressions consisting of cons cells and atoms.
The text that you see in "foo.lisp" isn't Lisp source code; it's a serialization of Lisp source code. You could serialize it differently to get a different text file, but the reader would turn it into the same source code. The actual source code is distinct from any specific text serialization of it.
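To make the text-versus-source distinction concrete, here is a minimal sketch in Python rather than Lisp. It assumes a toy reader (the names `tokenize`, `read_form`, and `read` are hypothetical) that models s-expressions as nested Python lists. It ignores strings, quote syntax, symbol case folding, and everything else a real Lisp reader handles. The only point is that two different text files can read to the same structure.

```python
import re

def tokenize(text):
    # Drop ';' comments, then pad parentheses so split() can separate them.
    no_comments = re.sub(r";[^\n]*", "", text)
    return no_comments.replace("(", " ( ").replace(")", " ) ").split()

def read_form(tokens):
    # Consume one form from the front of the token list.
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read_form(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    try:
        return int(token)   # integers become int
    except ValueError:
        return token        # everything else is treated as a symbol (str)

def read(text):
    return read_form(tokenize(text))

one_line = "(defun square (x) (* x x))"
reformatted = """
(defun square (x)   ; same code, different text
  (* x
     x))
"""

# The two serializations differ as text but read to the same structure.
assert one_line != reformatted
assert read(one_line) == read(reformatted)
```

In this model the nested list is the "source code", and the two strings are just two of many possible serializations of it.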
We write programs in the form of text serialization because the reader will convert it for us, and because it's easier and more rewarding to write good and comfortable text editors than to write good and comfortable s-expression editors.
There are of course text editors and addons that attempt to make text editing act more like s-expression editing, but I don't know of many actual s-expression editors. The canonical one, I suppose, is Interlisp's DEdit, which operates on actual s-expression data structures in memory.
From this point of view, what people mean by "homoiconic" is just that source code is all made of convenient arrangements of standard data structures defined by the language that can be conveniently operated on by standard functions defined by the language.
Or, to put it another way, "homoiconic" basically means "convenient", and "non-homoiconic" means "inconvenient".
That's all there is to it, really, but it has far-reaching consequences. In a Lisp designed this way, basic manipulation of source code is trivially easy to do with operations that are all provided for you in advance by the language itself. That makes all sorts of code-processing tools exceptionally easy to write.
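As a rough illustration of how little machinery such a tool needs, here is a sketch in Python that again models s-expressions as nested lists. The function `expand_unless` and the sample form are hypothetical, and the walker is deliberately naive: it would also rewrite quoted data, which a real macroexpander would not. In Common Lisp the same rewrite would be written with the language's ordinary list operations.

```python
def expand_unless(form):
    """Rewrite every (unless test body...) into (if (not test) (progn body...))."""
    if not isinstance(form, list):
        return form  # atoms (symbols, numbers) are returned unchanged
    # Expand sub-forms first, using nothing but ordinary list operations.
    expanded = [expand_unless(sub) for sub in form]
    if len(expanded) >= 2 and expanded[0] == "unless":
        test, *body = expanded[1:]
        return ["if", ["not", test], ["progn", *body]]
    return expanded

# Models: (defun report (x) (unless (null x) (print x) (length x)))
code = ["defun", "report", ["x"],
        ["unless", ["null", "x"],
         ["print", "x"],
         ["length", "x"]]]

rewritten = expand_unless(code)

assert rewritten == ["defun", "report", ["x"],
                     ["if", ["not", ["null", "x"]],
                      ["progn", ["print", "x"], ["length", "x"]]]]
```

The whole "tool" is one recursive function over lists. No parser, tokenizer, or special tree API is involved, because the code is already in the form the language's own functions operate on.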
That's not true in most languages. Take C, for example: sure, a C compiler parses text and turns it into an abstract syntax tree before processing it further in order to eventually yield executable machine code. Is all of that machinery part of the language definition? Can you count on those APIs and data structures to be exposed and documented by any arbitrary C compiler?
No.
In that sense, any programming language could be made "homoiconic" if enough people wanted it. Evidently not enough of them do, since most languages aren't.
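One data point, offered as an illustration rather than as part of the original argument: Python's standard library documents and exposes its parsed representation through the `ast` module. The sketch below assumes Python 3.9 or later (for `ast.unparse`), and the class name `TwoToThree` is hypothetical. Note that the tree is built from dedicated node classes, not from the everyday lists and symbols a Lisp programmer would use, which is roughly the difference in convenience described above.

```python
import ast

source = "total = price * 2"
tree = ast.parse(source)  # text -> documented tree of ast node objects

class TwoToThree(ast.NodeTransformer):
    """Replace every literal 2 in the tree with a literal 3."""
    def visit_Constant(self, node):
        if node.value == 2:
            return ast.copy_location(ast.Constant(value=3), node)
        return node

new_tree = ast.fix_missing_locations(TwoToThree().visit(tree))
new_source = ast.unparse(new_tree)  # tree -> text again

assert new_source == "total = price * 3"
```

The rewrite works, but it goes through a special-purpose visitor API and node types instead of the general-purpose data structures and functions used for everything else in the language.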
But some programmers prefer working with a language implementation that makes it so very easy to manipulate code. So that's what we use.
It's not some Platonic ideal of language design, but it doesn't need to be. It's a pragmatic design decision made by certain implementors in a certain lineage, and it has consequences that a certain fraction of programmers find congenial. Congenial enough that it makes some of us prefer to work with languages and implementations that work that way.
Nice description; it makes me wonder if there are any languages in which code and data have different serialisations, but these are isomorphic in the sense that code and data can be turned into each other losslessly? (we ought to be able to round trip between the two: code->data->code and data->code->data ought to produce equivalent structures to what they started from)
What would prevent them from being used the "wrong" way around, with code written in the data notation and vice versa? When the system prints data it could choose one notation or the other, based on an educated guess as to whether it is code or data.