Language Design

Posted by Rusky on Feb. 21, 2011, 7:09 p.m.

So, before I begin, how can I update my submissions? Do I just resubmit them or is that not implemented (yet) in v3?

C is a relatively nice high-level assembly language. C++ is the kitchen sink sitting on top of that, so you can do lots of things with it but it can be painful. I'm sure this will be disputed, so I'll get to it in a minute. Java is a simplification of C++, but in some places it oversimplified like with "a.setFoo(a.getFoo() + 3)." Scala and C# move back toward C++, but they're both still on VMs and designed for high level applications.

That leaves us with few options for low-level and/or performance-sensitive projects like game engines, operating system kernels, embedded applications and almost anything else fun. Contemporary, well-supported, easily-accessible and well-enough designed languages include C, C++ and… well, anything else fails to meet our requirements.

It's obviously possible to write large projects in C (e.g. the Linux kernel) but it's not the best-designed of languages. All the little things like type conversions, declaration syntax, scope and namespaces and verbosity make C more painful than it could be. Then larger-scale issues like polymorphism (ad-hoc/inclusion and parametric), higher-level functions and type safety go out the window.

C++ almost had to be based on C to become popular, so it inherits a lot of its problems. In addition, it's just a big kitchen sink bolted on top, so it lacks the elegance of languages like Smalltalk that a lot of its ideas come from. Larger projects have to enforce restrictions on different allowed subsets of features (Google eliminates exceptions, RTTI and multiple inheritance in the interest of compatibility, less fragile code and consistency).

I would like to design a systems-level programming language that can be as efficient as C but can also be much more expressive and easy to use than C++. Rather than designing by committee to build a checklist of features, it would need to have a unifying idea, like Lisp, Smalltalk, Haskell or Go. The systems-level programming world needs something like that.

Rather than starting from C or C++, I'll start from scratch. The goal will be to use a small, easily-learned feature set that will allow a large range of simple, concise designs. Things like Lisp macros, Haskell type classes, Smalltalk/Objective-C message passing, etc. Haskell comes closer to this than Smalltalk, in that its type system is much stronger than C's (although it would need some tweaking to allow the required control over memory layout, etc.), but still allows all the uses of C++ virtual function calls, CLOS multimethods and OOP-style namespaces.

Anyone have any stories of things they'd have liked to be able to express in their favorite programming language but couldn't without resorting to verbosity, repetition or some ugly hack or other?

Comments

Rusky 13 years, 10 months ago

I like a lot of ideas from D. They got rid of the stupid .h model and just get symbols out of object files, like pretty much every language not based on C. They allow forward references as well, so you can write mutually recursive definitions and order declarations sanely. They fixed at least some (not sure about all) declaration syntax. I also like the way they offer a lot of the language at compile time (e.g. static if) rather than using #if gunk. Overall, D looks like a cleaner C++ (half the time by breaking C compatibility) that I might like to use if it were more popular. However, I'm not sure its features justify using it with its relatively small community.

Besides type classes, I've been thinking about dependent types. The idea is to have a type include (depend on) some value. I'm not sure how far I'd want to take this for a low-level language, but for sure it could be used for things like compile-time array bounds checking. For example, when declaring an array of ints you might do something like this:

int[3] array;

That by itself doesn't offer anything (semantically) over C, but because the number 3 is included in the type you can statically enforce that you don't access values outside the array. When I just have some int I pulled out of user input or a file or anywhere that doesn't give me any information about its range, I get a compiler error when doing something like this:

int i = getMeAnInt();

int foo = array;

Instead, you need to check that i is in range:

int i = getMeAnInt();

if (i < lengthof(array))

int foo = array;

That might sound tedious but you really want to check anyway in most situations, and this way you can statically eliminate unnecessary checks. This is different from languages like Java and C# that just always check at runtime, making arrays somewhat heavier (messing up layout with their length) and slower (always checking bounds even when they're provably correct at compile time). For example, a for loop already checks the int's range, so you don't need another runtime check:

for (int i = 0; i < lengthof(array); i++) {

doSomethingWith(array);

}

These examples probably don't have the best syntax- all the ints have some extra information included in their type and the notation doesn't show that. Something like int { n < 3 } maybe? That's a little gross too though. However, one part of the notation I do like would be for passing arrays as arguments. In C you just get a pointer and you have to explicitly pass the length, and in C# et. al. you have to pass the length. Using my version of dependent typing, you might say something like this:

double[n] transform(int[n] array) {

double[n] ofTheJedi;

for (int i = 0; i < n; i++) {

ofTheJedi = array * 3 - 2;

}

return ofTheJedi;

}

// …

int[3] array;

double[n] = transform(array);

The definition of transform has an extra n variable floating around- it's the size of the argument array. It's also used in the return value, which means the function returns the size of its resulting array as well as the values themselves. The scoping here is a bit wonky, since it's not clear who owns the array ofTheJedi or where it's allocated, and it's also not clear how the returned size would work in more complicated functions, but it's enough to get the general idea of implicitly/almost-implicitly passing and returning array sizes.

Undeadragons 13 years, 10 months ago

I like the passing and returning of arrays, treating the case as more or less basic types themselves, or more or less a native extension of functions. That in itself would actually be quite easy to abstract over a pre-existing programming language, but I digress, the only question there would be how it is implemented, into which scope the passing and return arrays belong, how they are allowed to be treated.

Fuck me, this all really whets the proverbial appetite, this could be a fantastic programming language.

Rusky 13 years, 10 months ago

One possibility for array ownership is arrays as first-class values. It's possible now that they have a size, and while a naive implementation would wind up copying the array, it would be possible to use return value optimizations (return by value by simply constructing the return value in a position given by the caller) and reference types to solve that problem, while still keeping the possibility when you need it.

Undeadragons 13 years, 10 months ago

Just looking at D, I like how much nicer the templates look compared to C++. I haven't had that good a look at it yet though.

Still there appears to be some stuff that is a bit iffy with it, but it certainly looks like a step in the right direction compared to C++.

Also something I like in D that one usually only sees in scripting languages, is the foreach statement, it's nice and also reduces to more or less a for statement with a few bounds checks.

Rusky 13 years, 10 months ago

C++0x has a foreach statement too, but I don't quite like the idea of a dedicated language construct (although that might just be vague dislike of change). In a scripting language I would most definitely replace foreach with a .each method that takes a closure, but in a C++-like language closures are not something you want to just toss around. Of course they can usually be optimized away in a situation like this, but hrm.

Speaking of, higher-order functions are integral to Haskell-style programming, which is where type classes originated, but the carrying state around implicitly (and the overhead of indirect calls) is not something you want to do in a C-like. I'm not sure how to solve that, though. A macro system could explicitly inline those functions, but that gets rid of the runtime polymorphism of type classes. Should I make it available and just hope it doesn't turn the language into something people won't use for systems programming?

sirxemic 13 years, 10 months ago

As I am not 100% sober I didn't get all that's been said, so I might be throwing in the following irrelevantly:

Haskell is a functional programming language and you keep mentioning it. Are you thinking of an imperative programming language, a functional one, or a hybrid?

Quietus 13 years, 10 months ago

lmao xemic, that's funny to read when it's 2PM over here =P

sirxemic 13 years, 10 months ago

It was 8:49pm here when I said that, and I just edited it. Is it still funny to read?

Quietus 13 years, 10 months ago

yes lol. =]

i just realized how out of place i am here given that i only barely know GM <_<

Rusky 13 years, 10 months ago

Haskell is a lazy, purely functional, highly theory-based language. It's almost the furthest thing away from a C-like language I can think of, but it's full of ideas that are more or less independent of functional programming. The goal is to take those ideas and put them in a lower level language that's primarily imperative, as it's designed for popular microprocessors. However, that doesn't mean it won't include a few functional aspects as well like better first-class functions and more composable syntax (e.g. no difference between statements and expressions).