Also, look! A blog!
As the IRC chat regulars and readers of my last blog know, I'm developing a language. Some of you think I'm crazy! But the Banana God tells me YOU are the crazy ones! Wahahahaha!

The serious issue of mental health aside, I've kept this project a little quiet - not really talking about it or getting into my goals and plans for it. Part of the reason I've been so quiet is that I have a history of starting things and never finishing them. I always get distracted by something else; my hard drive is littered with old projects in a half-dozen languages, all gathering metaphorical dust. I've made A* pathfinders and game engines in everything from ActionScript to Objective-C, but it's been years since I 'published' anything outside of work for someone else. I didn't want to talk about a project I wasn't sure I would follow through on.

Then I realized I simply don't care about making content - I just want to make things: engines, frameworks, tools. And what is a programming language but a tool? I've spent the last four months dripping knowledge about language parsing and virtual machines into my mind. There have been nights when I could not sleep because my mind kept iterating on some small facet, pulling on a thread of an idea until I'd laid bare the consequences. It's the cliche of the mad artist, unable to stop himself from composing - but the composition is of logic, not music. There is a strange beauty in the carefully placed domino sets we call interpreters and compilers, and that is exactly what they are: complex, for sure, but they only do what you tell them to do.

So I'm making a language, and its name is 'Firefly'. I have my reasons, and I almost called it 'Serenity'. Interestingly, the TV show is not the origin of either name.

The language is just one part of a set of things I'm building, but it is the cornerstone, the foundation of the project. It is my magnum opus.
I've become fascinated with the self-hosting and bootstrapping of compilers and interpreters. It's an idea that just feels so right, like the satisfying *click* of the last LEGO piece snapping into place. It's the programmer's version of origami: "see my program, watch it unfold!" You use a simpler grammar, or a restricted subset of a language, to write a compiler that you then use to build a more complicated version of your language. There are several benefits to doing this, chiefly that you can build your language piecemeal, and that the compiler doubles as a test of itself, which is really cool.

One of the most common pieces of feedback I've received is something along the lines of 'how does this improve on existing things?' In other words: yes, I know about the XKCD comic (the one about competing standards). But sometimes you can make valid improvements, even if they are subtle. Sometimes that 15th standard is good enough to switch to.

So what am I trying to improve upon? Mostly, Python.

Python has so many itty bitty teensy weensy annoying quirks, mostly due to the whole 2.x/3.x schism. And, argh, they are everywhere. Things like string formatting, the `except:` syntax being screwy (looking at you, 2 vs 3), bitwise manipulation, decorators and decorator generators being just a wee bit more annoying to mess with than they should be, and the god-fucking-awful module management. I like Python a lot, but sometimes it feels like there are just a few things missing, and I want to remedy this. Basically, I want to bring Ruby's 'principle of least astonishment' to a more compact and syntax-flexible Python. I am also handling things like scope differently.
Python is like this, only you're stubbing your toe on some stupid legacy syntax.

Also, Python has a GIL. I know there are libraries for, and forks of, Python that deal with this, but *stubs toe before he can complete the sentence* "Argh!"
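To make the decorator complaint concrete for readers who haven't hit it: in Python, a decorator that takes its own arguments (a "decorator generator", usually called a decorator factory) needs three nested functions, plus `functools.wraps` so the result keeps the original function's name and docstring. This is a minimal sketch; `repeat` and `greet` are made-up names for illustration, not anything from Firefly.

```python
import functools


def repeat(times):
    """Decorator generator: repeat(3) builds and returns the real decorator."""
    def decorator(func):
        # Without functools.wraps the result would report its name as "wrapper".
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Call the wrapped function `times` times and collect the results.
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(3)
def greet(name):
    return "hi " + name


print(greet("Banana God"))  # ['hi Banana God', 'hi Banana God', 'hi Banana God']
print(greet.__name__)       # greet
```

Three levels of nesting for what is conceptually "wrap this function, with a parameter" is exactly the kind of ceremony being grumbled about, and forgetting `functools.wraps` silently changes `greet.__name__`.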
Now for some language snippets! Here, have a Fizzbuzz test!
```
-- Firefly fizzbuzz test
for n in 1...100:
    if n % 3 == 0:
        print('Woof' if n % 5 == 0 else 'Fizz')
    elif n % 5 == 0:
        print('Buzz')
    else:
        print(n)
```
```
(*module 'interpreter' -> interpreter
  body=[
    (for-loop
      var=(name 'n')
      iterable=(irange
        first=(literal
          value=(int '1')
        )
        second=(literal
          value=(int '100')
        )
      )
      body=[
        (branch
          condition=(infix '=='
            left=(infix '%'
              left=(name 'n')
              right=(literal
                value=(int '3')
              )
            )
            right=(literal
              value=(int '0')
            )
          )
          true-branch=[
            (call
              target=(name 'print')
              args=[
                (ternary-if
                  first=(literal
                    value=(string 'Woof')
                  )
                  second=(infix '=='
                    left=(infix '%'
                      left=(name 'n')
                      right=(literal
                        value=(int '5')
                      )
                    )
                    right=(literal
                      value=(int '0')
                    )
                  )
                  third=(literal
                    value=(string 'Fizz')
                  )
                )
              ]
            )
          ]
          false-branch=(branch
            condition=(infix '=='
              left=(infix '%'
                left=(name 'n')
                right=(literal
                  value=(int '5')
                )
              )
              right=(literal
                value=(int '0')
              )
            )
            true-branch=[
              (call
                target=(name 'print')
                args=[
                  (literal
                    value=(string 'Buzz')
                  )
                ]
              )
            ]
            false-branch=[
              (call
                target=(name 'print')
                args=[
                  (name 'n')
                ]
              )
            ]
          )
        )
      ]
    )
  ]
)
```
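One way to see how a tree like this gets executed is a tree-walking evaluator that dispatches on each node's kind and recurses into its named children. This is a minimal Python sketch, not Firefly's VM: the dict encoding and the helper names (`lit`, `divisible`, `call_print`, `evaluate`) are hypothetical, the `*module` wrapper is left out, literals hold plain Python values instead of `(int '1')` sub-nodes, and `1...100` is assumed to be inclusive (the program output ends at 100).

```python
import operator

# Only the operators the FizzBuzz AST uses.
BINARY_OPS = {"%": operator.mod, "==": operator.eq}


def lit(v):
    return {"kind": "literal", "value": v}


def name(n):
    return {"kind": "name", "value": n}


def infix(op, left, right):
    return {"kind": "infix", "value": op, "left": left, "right": right}


def divisible(k):
    # Shorthand for the repeated (infix '==' (infix '%' n k) 0) subtree.
    return infix("==", infix("%", name("n"), lit(k)), lit(0))


def call_print(arg):
    return {"kind": "call", "target": name("print"), "args": [arg]}


def run_block(stmts, env):
    for stmt in stmts:
        evaluate(stmt, env)


def evaluate(node, env):
    kind = node["kind"]
    if kind == "literal":
        return node["value"]
    if kind == "name":
        return env[node["value"]]
    if kind == "infix":
        op = BINARY_OPS[node["value"]]
        return op(evaluate(node["left"], env), evaluate(node["right"], env))
    if kind == "ternary-if":
        # Same child order as the dump: `first if second else third`.
        chosen = node["first"] if evaluate(node["second"], env) else node["third"]
        return evaluate(chosen, env)
    if kind == "irange":
        # Inclusive on both ends.
        return range(evaluate(node["first"], env), evaluate(node["second"], env) + 1)
    if kind == "call":
        fn = evaluate(node["target"], env)
        return fn(*[evaluate(arg, env) for arg in node["args"]])
    if kind == "branch":
        if evaluate(node["condition"], env):
            run_block(node["true-branch"], env)
        elif isinstance(node["false-branch"], list):
            run_block(node["false-branch"], env)
        else:
            evaluate(node["false-branch"], env)  # nested branch, i.e. an elif
        return None
    if kind == "for-loop":
        for value in evaluate(node["iterable"], env):
            env[node["var"]["value"]] = value
            run_block(node["body"], env)
        return None
    raise ValueError("unknown node kind: " + kind)


fizzbuzz = {
    "kind": "for-loop",
    "var": name("n"),
    "iterable": {"kind": "irange", "first": lit(1), "second": lit(100)},
    "body": [{
        "kind": "branch",
        "condition": divisible(3),
        "true-branch": [call_print({"kind": "ternary-if", "first": lit("Woof"),
                                    "second": divisible(5), "third": lit("Fizz")})],
        "false-branch": {
            "kind": "branch",
            "condition": divisible(5),
            "true-branch": [call_print(lit("Buzz"))],
            "false-branch": [call_print(name("n"))],
        },
    }],
}

output = []
evaluate(fizzbuzz, {"print": lambda x: output.append(str(x))})
print("".join(output))
```

Because `false-branch` holds either a statement list or another `branch`, an `elif` chain needs no node kind of its own, which matches how the dump nests the second `branch` inside the first.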
Output: 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Woof 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 Woof 31 32 Fizz 34 Buzz Fizz 37 38 Fizz Buzz 41 Fizz 43 44 Woof 46 47 Fizz 49 Buzz Fizz 52 53 Fizz Buzz 56 Fizz 58 59 Woof 61 62 Fizz 64 Buzz Fizz 67 68 Fizz Buzz 71 Fizz 73 74 Woof 76 77 Fizz 79 Buzz Fizz 82 83 Fizz Buzz 86 Fizz 88 89 Woof 91 92 Fizz 94 Buzz Fizz 97 98 Fizz Buzz
I had it spit out HTML, of which this is a screenshot! Because lazy and couldn't remember how to embed HTML into 64digits! Also this is the snippet from the last blog!
Congratulations on working on a programming language! I love everything about programming languages and love to see other people like them too. Back at uni I took a whole bunch of courses on them (wrote a compiler) and then taught an interpreters class for several quarters. It seems to be a pretty popular topic lately; god, everybody and their pet is making a new programming language these days.
Making a language looks simple at first. Data structures look simple enough, parsing algorithms are well known, and so are plenty of algorithms for analysis, code generation and optimization. It looks as if it were just a matter of designing the syntax and plugging all the pieces together. I worry that this apparent simplicity makes it all too easy to whip out a new language that just grabs and tweaks the syntax of existing languages without actually innovating in the way we write and reason about programs. It also makes it easy to pretend to be innovating by just piling up a bunch of smaller language features.

I say all that because that's exactly what I did last month. I was tired of Python's dynamic types and found no other language that satisfied me, so I decided to write a new one just for the hell of it. In the end I scrapped it, because I realized I was doing nothing but Python with static types plus a pile of smaller features grabbed from other languages. I figured, if I'm just going to repackage existing concepts, what's the point? What am I actually bringing to the table? And that took all the fun out of it for me… I'm still looking for that 'thing', that new concept that makes me think 'wow, that's an interesting way of doing things'.

Sorry for turning this comment into a miniblog… So, are you familiar with LLVM?

My thoughts on apparent simplicity:
If it looks simple, and is, it is simple.
If it looks simple, and isn't, it is complex.
If it looks complex, it could just be complex, but it is more likely complicated.

Languages are complex. They are information-dense creatures; some of them are succinct (Ruby and Python), and others are absolute beasts (C++). Minimalism and density are part of the mindset of my language. Any feature I consider must integrate naturally, without interrupting the design of the language or creating clutter. Any further syntax sugar can be added later by extending the parser from within the language; that's the point of metaprogramming. Part of my interest in this project lies in exploring language-oriented programming, as that sort of thing could be incredibly useful for me.

That being said, I'm trying to avoid making my project into a frankenlanguage. I'm focused on getting the simplest viable syntax up and running for the virtual machine. For instance, I'm currently agonizing over how I want to handle scope syntax-wise. It is one of the few remaining puzzle pieces before I can lock it all down and begin writing the parser in C.

To answer your end-of-comment question - I'm a little familiar with LLVM, as it was part of my research on virtual machines. I focused more on LuaVM/YARV, as they were closer to what I was looking for. Plus, the temptation of implementing a VM myself was impossible to say no to :P

Mind explaining that AST? Would've assumed a textual AST to be more minimalist, Lisp-looking.
This also allows you to be creative in your execution of the language. Will it be self-compiling, or will it need a separate compiler/interpreter? I know you already pretty much stated that you have a compiler that essentially tests itself upon use, but it's a thought.
I enjoy the theory and ideas behind computer science, but I generally just use programming as a means to an end - I would NOT enjoy writing my own language. :P

@s It's a bit more hefty than a standard AST, I suppose, though for a reason. I'm using a Pratt parser, which affords me a lot of flexibility mid-parse: it means I can do fiddly stuff like recasting, named children, and all sorts of other bits of logic that are more difficult to express in a BNF/EBNF grammar.
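To illustrate the kind of flexibility a Pratt parser gives: each operator carries a binding power, and the parse loop keeps extending the left-hand expression while the next token binds tighter. Because every construct is built by its own small piece of code, it is easy to hang named children on a node (`left`/`right` for infix operators, `first`/`second`/`third` for the ternary), and adding an operator is one table entry. This is a minimal Python sketch of the general technique, not Firefly's parser; the token set, the binding powers, and all names (`tokenize`, `Parser`, `parse`, `INFIX`) are assumptions.

```python
import re

# One token per match: integer, 'quoted string', identifier, or operator.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|'([^']*)'|([A-Za-z_]\w*)|(==|[-+*/%()]))")
KINDS = ("int", "string", "name", "op")  # indexed by capture group number - 1
TERNARY_BP = 5  # the ternary binds more loosely than every infix operator


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append((KINDS[m.lastindex - 1], m.group(m.lastindex)))
        pos = m.end()
    tokens.append(("end", None))
    return tokens


def node(id_, value=None, **children):
    # A symbol has an id, may have a value, and may have named children.
    return {"id": id_, "value": value, **children}


class Parser:
    def __init__(self, src, infix_table):
        self.tokens = tokenize(src)
        self.pos = 0
        self.infix = infix_table  # operator -> left binding power

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.advance()
        if got != tok:
            raise SyntaxError("expected %r, got %r" % (tok, got))

    def left_bp(self, tok):
        if tok[0] == "op" and tok[1] in self.infix:
            return self.infix[tok[1]]
        return TERNARY_BP if tok == ("name", "if") else 0  # ')', 'else', end, unknown ops

    def expression(self, rbp=0):
        kind, value = tok = self.advance()
        # Prefix position: literals, names, parenthesised sub-expressions.
        if kind in ("int", "string"):
            left = node("literal", node(kind, value))
        elif kind == "name":
            left = node("name", value)
        elif tok == ("op", "("):
            left = self.expression()
            self.expect(("op", ")"))
        else:
            raise SyntaxError("unexpected token %r" % (tok,))
        # Infix position: extend `left` while the next token binds tighter than rbp.
        while self.left_bp(self.peek()) > rbp:
            kind, value = self.advance()
            if kind == "name":  # only 'if' reaches here: left if cond else other
                cond = self.expression(TERNARY_BP)
                self.expect(("name", "else"))
                other = self.expression(TERNARY_BP - 1)  # right-associative
                left = node("ternary-if", first=left, second=cond, third=other)
            else:
                right = self.expression(self.infix[value])  # left-associative
                left = node("infix", value, left=left, right=right)
        return left


def parse(src, infix_table):
    parser = Parser(src, infix_table)
    tree = parser.expression()
    if parser.peek() != ("end", None):
        raise SyntaxError("unexpected token %r" % (parser.peek(),))
    return tree


INFIX = {"==": 10, "+": 20, "-": 20, "%": 30}  # assumed binding powers

tree = parse("'Woof' if n % 5 == 0 else 'Fizz'", INFIX)
print(tree["id"], tree["second"]["value"])  # ternary-if ==
try:  # '*' is not in the table yet, so the parse stops early and is rejected
    parse("2 * 3", INFIX)
except SyntaxError as err:
    print("rejected:", err)
INFIX["*"] = 30  # "extending the parser" is one table entry
print(parse("2 * 3", INFIX)["value"])  # *
```

Here the new operator is registered from Python; letting programs do it from inside the language itself would mean exposing something like that operator table to user code.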
These are just the formatting rules for when I print the AST, which doesn't contain every bit of information but would be enough for reconstruction. It's not exactly perfect (missing commas between items in a list, no quotes around namespace items, other symptoms of a work-in-progress). A symbol must have an id, and may have a value.
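A minimal sketch of that rule and a printer for it, assuming a Python representation: `Symbol` and `dump` are hypothetical names, named children live in an insertion-ordered dict, and list-valued children print between `[` and `]`. It reproduces the indentation style of the printed dump (including the lack of commas between list items), but it is not the Firefly printer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Symbol:
    id: str                      # required: the node kind, e.g. 'infix'
    value: Optional[str] = None  # optional: e.g. the operator '=='
    children: Dict[str, Union["Symbol", List["Symbol"]]] = field(default_factory=dict)


def dump(sym, indent=0):
    # The first line is placed by the caller; later lines are padded to `indent`.
    pad = "  " * indent
    head = "(" + sym.id + ("" if sym.value is None else " '%s'" % sym.value)
    if not sym.children:
        return head + ")"  # leaves stay on one line, e.g. (name 'n')
    lines = [head]
    for name, child in sym.children.items():
        if isinstance(child, list):
            lines.append(pad + "  " + name + "=[")
            for item in child:
                lines.append(pad + "    " + dump(item, indent + 2))
            lines.append(pad + "  ]")
        else:
            lines.append(pad + "  " + name + "=" + dump(child, indent + 1))
    lines.append(pad + ")")
    return "\n".join(lines)


call = Symbol("call", children={
    "target": Symbol("name", "print"),
    "args": [Symbol("literal", children={"value": Symbol("string", "Buzz")})],
})
print(dump(call))
```

For this `call` node the output has the same shape as the `print('Buzz')` call in the dump: `target=(name 'print')` on one line, then the `args=[` list with the nested `(literal` node indented inside it.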