I find the direction of zig confusing. Is it supposed to be a simple language or a complex one? Low level or high level? This feature is to me a strange mix of high and low level functionality and quite complex.
The io interface looks like OO but violates the Liskov substitution principle. For me, this does not solve the function color problem, but instead hides it. Every function with an IO interface cannot be reasoned about locally because of unexpected interactions with the io parameter input. This is particularly nasty when IO objects are shared across library boundaries. I now need to understand how the library internally manages io if I share that object with my internal code. Code that worked in one context may surprisingly not work in another context. As a library author, how do I handle an io object that doesn't behave as I expect?
Trying to solve this problem at the language level fundamentally feels like a mistake to me because you can't anticipate in advance all of the potential use cases for something as broad as io. That's not to say that this direction shouldn't be explored, but if it were my project, I would separate this into another package that I would not call standard.
I like Zig a lot, but something about this has been bothering me since it was announced. I can't put my finger on why, I honestly don't have a technical reason, but it just feels like the wrong direction to go.
Hopefully I'm wrong and it's wildly successful. Time will tell I guess.
What exactly makes it unpredictable? The functions in the interface have a fairly well defined meaning: take this input, run an I/O operation, and return results. Some implementations will suspend your code via user-space context switching, some will just directly run the syscall. This is no different from approaches like the virtual thread API in Java, where you use the same APIs for I/O no matter the context. In the Python world, before async/await, this was solved in gevent by monkey patching all the I/O functions in the standard library. This interface just abstracts that part out.
i think you are missing that a proper io interface should encapsulate all abstractions that care about asynchrony and patterns thereof. is that possible? we will find out. It's not unreasonable to be skeptical but can you come up with a concrete example?
> As a library author, how do I handle an io object that doesn't behave as I expect
you ship with tests against the four or five default patterns in the stdlib and if anyone wants to do anything substantially crazier to the point that it doesn't work, that's on them, they can submit a PR and you can curbstomp it if you want.
> function coloring
i recommend reading the function coloring article. there are five criteria that describe what makes up the function coloring problem, it's not just that there are "more than one class of function calling conventions"
First thought is that IO is just hard.
I thought the same, that zig is too low level to have async implemented in the language. It's experimental and probably going to change.
fwiw i thought the previous async based on whole-program analysis and transformation to stackless coroutines was pretty sweet, and similar sorts of features ship in rust and C++ as well
It's more about allowing a-library-fits-all than forcing it. You don't have to ask for io, you just should, if you are writing a library. You can even do it the Rust way and write different libraries, for example for users who do or don't want async, if you really want to.
Couldn't the same thing be said about functions that accept allocators?
I start to want a Reader Monad stack for all the stuff I need to thread through all functions.
Yeah, these kinds of "orthogonal" things that you want to set up "on the outside" and then have affect the "inner" code (like allocators, "io" in this case, and maybe also presence/absence of GC, etc.) all seem to cry out for something like Lisp dynamic variables.
I don't think so, because the result of calling an allocator is either you got memory or you didn't, while the IO here will be "it depends".
I don't get it, what's the difference between "got or don't" vs "it depends"?
This just sounds like a threadpool implementation to me. Sure you can brand it async, but it's not what we generally mean when we say async.
It seems to me that async io struggles whenever people try it.
For instance it is where Rust goes to die because it subverts the stack-based paradigm behind ownership. I used to find it was fun to write little applications like web servers in aio Python, particularly if message queues and websockets were involved, but for ordinary work you're better off using gunicorn. The trouble is that conventional async i/o solutions are all single threaded and in an age where it's common to have a 16 core machine on your desktop it makes no sense. It would be like starting a chess game dumping out all your pieces except for your King.
Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
At the cost of not being able to actually provide the same throughput, latency, or memory usage that lower level languages that don't enforce the same performance-pessimizing abstractions on everything can. Engineering is about tradeoffs but pretending like Java or .NET have solved this is naive.
> At the cost of not being able to actually provide the same throughput, latency, or memory usage
Only memory usage is true with regards to Java in this context (.NET actually doesn't offer a shared thread abstraction; it's Java and Go that do), and even that is often misunderstood. Low-level languages are optimised for minimal memory usage, which is very important on RAM-constrained devices, but it could be wasting CPU on most machines: https://youtu.be/mLNFVNXbw7I
This optimisation for memory footprint also makes it harder for low-level languages to implement user-mode threading as efficiently as high-level languages.
Another matter is that there are two different use cases for asynchronous constructs that may tempt implementors to address them with a single implementation. One is the generator use case. What makes it special is that there are exactly two communicating parties, and both of their state may fit in the CPU cache. The other use case is general concurrency, primarily for IO. In that situation, a scheduler juggles a large number of user-mode threads, and because of that, there is likely a cache miss on every context switch, no matter how efficient it is. However, in the second case, almost all of the performance is due to Little's law rather than context switch time (see my explanation here: https://inside.java/2020/08/07/loom-performance/).
That means that a "stackful" implementation of user-mode threads can have no significant performance penalty for the second use case (which, BTW, I think has much more value than the first), even though a more performant implementation is possible for the first use case. In Java we decided to tackle the second use case with virtual threads, and so far we've not offered something for the first (for which the demand is significantly lower).
What happens in languages that choose to tackle both use cases with the same construct is that in the second and more important use case they gain no more than negligible performance (at best), but they're paying for that with a substantial degradation in user experience.
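To put rough, made-up numbers on the Little's law point above (the in-flight count is just throughput times latency):

    in flight = throughput × latency = 10,000 req/s × 0.05 s = 500 requests

Whether switching to each of those 500 suspended requests costs 100 ns or 1 µs is noise next to the 50 ms each one spends waiting on I/O, which is why context-switch time barely shows up in throughput.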
It sounds like you’re disagreeing, yet no case is made that throughput and latency aren’t worse.
For example, the best frameworks on TechEmpower are all Rust, C and C++ with the best Java coming in at 25% slower on that microbenchmark. My point stands - it is generally true that well written rust/c/c++ outperforms well written Java and .Net and not just with lower memory usage. The “engineering effort per performance” may skew toward Java but that’s different than absolute performance. With rust to me it’s also less clear if that is actually even true.
[1] https://www.techempower.com/benchmarks/#section=data-r23
First, in all benchmarks but two, Java performs just as well as C/C++/Rust, and in one of those two, Go performs as well as the low-level languages. Second, I don't know the details of that one benchmark where the low-level languages indeed perform better than high-level ones, but I don't see any reason to believe it has anything to do with virtual threads.
Modern Java GCs typically offer a boost over more manual memory management. And on latency, even if virtual threads were very inefficient and you'd add a GC pause with Java's new GCs, you'd still be well below 1ms, i.e. not a dominant factor in a networked program.
(Yes, there's still one cause for potential lower throughput in Java, which is the lack of inlined objects in arrays, but that will be addressed soon, and isn't a big factor in most server applications anyway or related to IO)
BTW, writing a program in C++ has always been more or less as easy as writing it in Java/C# etc.; the big cost of C++ is in evolution and refactoring over many years, because in low-level languages local changes to code have a much more global impact, and that has nothing to do with the design of the language but is an essential property of tracking memory management at the code level (unless you use smart pointers, i.e. a refcounting GC for everything, but then things will be really slow, as refcounting does sacrifice performance in its goal of minimising footprint).
Any gc pause is unacceptable if your goal is predictable throughput and latency
Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.
As for c++, std::unique_ptr has no ref counting at all.
shared_ptr does, but that’s why you avoid it at all costs if you need to move things around. you only pay the cost when copying the shared_ptr itself, but you almost never need a shared_ptr and even when you need it, you can always avoid copying in the hot path
Honestly, if these languages are only winning by 25% in microbenchmarks, where I’d expect the difference to be biggest, that’s a strong boost for Java for me. I didn’t realise it was so close, and I hate async programming so I’m definitely not doing it for an, at most, 25% boost.
I didn’t make the claim that it’s worth it. But when it is absolutely needed Java has no solution.
And remember, we’re talking about a very niche and specific I/O microbenchmark. Start looking at things like SIMD (currently - I know Java is working on it) or in general more compute bound and the gap will widen. Java still doesn’t yet have the tools to write really high performance code.
I honestly doubt any of the frameworks in that benchmark are using virtual threads yet. The top one is still using vert.x which is an event loop on native platform threads.
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
First, that would be Java and Go, not Java and .NET, as .NET offers a separate construct (async/await) for high-throughput concurrency.
Second, while "unfashionable" in some sense, I guess, it's no wonder that Java is many times popular than any "fashionable" language. Also, if "fashionable" means "much discussed on HN", then that has historically been a terrible predictor of language success. There's almost an inverse correlation between how much a language is discussed on HN and its long-term success, and that's not surprising, as it's the less commonplace things that are more interesting to talk about. HN is more Vogue magazine than the New York Times.
> but for ordinary work you're better off using gunicorn
I'd like to see some evidence for this. Other than simplicity, IMO there's very little reason to use synchronous Python for a web server these days. Streaming files, websockets, etc. are all areas where asyncio is almost a necessity (in the past you might have used twisted), to say nothing of the performance advantage for typical CRUD workloads. The developer ergonomics are also much better if you have to talk to multiple downstream services or perform actions outside of the request context. Needing to manage a thread pool for this or defer to a system like Celery is a ton more code (and infrastructure, typically).
> async i/o solutions are all single threaded
And your typical gunicorn web server is single threaded as well. Yes you can spin up more workers (processes), but you can also do that with an asgi server and get significantly higher performance per process / for the same memory footprint. You can even use uvicorn as a gunicorn worker type and continue to use it as your process supervisor, though if you're using something like Kubernetes that's not really necessary.
Not many use cases actually need websockets. We're still building new shit in sync python and avoiding the complexity of all the other bullshit.
> It seems to me that async io struggles whenever people try it.
Promises work great in javascript, either in the browser or in node/bun. They're easy to use, and easy to reason about (once you understand them). And the language has plenty of features for using them in lots of ways - for example, Promise.all(), "for await" loops, async generators and so on. I love this stuff. It's fast, simple to use and easy to reason about (once you understand it).
Personally I've always thought the "function coloring problem" was overstated. I'm happy to have some codepaths which are async and some which aren't. Mixing sync and async code willy nilly is a code smell.
Personally I'd be happy to see more explicit effects (function colors) in my languages. For example, I'd like to be able to mark which functions can't panic. Or effects for non-divergence, or capability safety, and so on.
Promises in JS are particularly easy because JS is single-threaded. You can be certain that your execution flow won't be preempted at an arbitrary point. This greatly reduces the need for locks, atomics, etc.
Also task-local variables, which almost all systems other than C-level threads basically give up on despite being widely demanded.
> Promises work great in javascript, either in the browser or in node/bun.
I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).
Personally I think all solutions will come with tradeoffs; you can simply learn them well enough to be productive anyway. But you don't need language-level support for that.
> I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).
These are inconveniences, but not show stoppers. Modern JS engines can "see through" async call stacks. Yes, bugs can result in programs that hang - but that's true in synchronous code too.
But async in rust is way worse:
- Compilation times are horrible. An async hello world in javascript starts instantly. In rust I need to compile and link to tokio or something. Takes ages.
- Rust doesn't have async iterators or async generators. (Or generators in any form.) Rust has no built in way to create or use async streams.
- Rust has 2 different ways to implement futures: the async keyword and impl Future. You need to learn both, because some code is impossible to write with the async keyword. And some code is impossible to write with impl Future. It's incredibly confusing and complicated, and it's difficult to learn properly.
- Rust doesn't have a built in run loop ("executor"). So - best case - your project pulls in tokio or something, which is an entire kitchen sink and all your dependencies use that. Worst case, the libraries you want to use are written for different async executors and ??? shoot me. In JS, everything just works out of the box.
I love rust. But async rust makes async javascript seem simple and beautiful.
If you watched the video closely, you'll have noticed that this design parameterizes the code by an `io` interface, which enables pluggable implementations. Correctly written code in this style can work transparently with evented or threaded runtimes.
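To make "parameterized by an `io` interface" concrete, here's a rough sketch of what calling code looks like in this style. The io.async/await/cancel names follow the demo as discussed elsewhere in this thread, but the exact signatures here are my guess and std.Io is still very much in flux:

    const std = @import("std");

    // The library never decides how I/O happens; it calls through whatever
    // Io implementation the application handed it (blocking, thread pool,
    // evented, ...).
    fn saveBoth(io: std.Io, data: []const u8) !void {
        var a = io.async(saveFile, .{ io, data, "a.txt" });
        var b = io.async(saveFile, .{ io, data, "b.txt" });
        try a.await(io);
        try b.await(io);
    }

    fn saveFile(io: std.Io, data: []const u8, name: []const u8) !void {
        // The real body would write via io's file API; elided here because
        // that surface is still changing.
        _ = io;
        _ = data;
        _ = name;
    }

With the threaded implementation this just fans out to a thread pool; with an evented implementation the same code suspends and resumes, which is the "transparently evented or threaded" claim above.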
Really? Ordinary synchronous code calls an I/O routine which returns. Asynchronous code calls an I/O routine and then gets called back. That’s a fundamental difference and you can only square it by making the synchronous code look like asynchronous code (callback gets called right away) or asynchronous code look like synchronous code (something like async python which breaks up a subroutine into multiple subroutines and has an event loop manage who calls who).
I know that it depends on how much you disentangle your network code from your business logic. The question is the degree. Is it enough, or does it just dull the pain?
If you give your business logic the complete message or send it a stream, then the flow of ownership stays much cleaner. And the unit tests stay substantially easier to write and more importantly, to maintain.
I know too many devs who don't see when they bias their decisions to avoid making changes that will lead to conflict with bad unit tests and declare that our testing strategy is Just Fine. It's easier to show than to debate, but it still takes an open mind to accept the demonstration.
Project Loom makes Java in particular really nice, virtual threads can "block" without blocking the underlying OS thread. No callbacks at all, and you can even use Structured Concurrency to implement all sorts of Go- and Erlang-like patterns.
(I use it from Clojure, where it pairs great with the "thread" version of core.async (i.e. Go-style) channels.)
I haven’t actually seen it in the wild yet, just talked about in technical talks from engineers at different studios, but I’m interested in designs in which there isn’t a traditional main thread anymore.
Instead, everything is a job, and even what is considered the main thread is no longer an orchestration thread, but just another worker: after some nominal setup it scaffolds enough threads, usually one fewer than the core count, to all serve as lockless, work-stealing worker threads.
Conventional async programming relies too heavily on a critical main thread.
I think it’s been so successful though, that unfortunately we’ll be stuck with it for much longer than some of us will want.
It reminds me of how many years of inefficient programming we have been stuck with because cache-unfriendly traditional object-oriented programming was so successful.
One thing I like about the design is it locks in some of the "platforms" concepts seen in other languages (e.g. Roc), but in a way that goes with Zig's "no hidden control flow" mantra.
The downstream effect is that it will be normal to create your own non-posix analogue of `io` for wherever you want code to hook into. Writing a game engine? Let users interact with a set of effectful functions you inject into their scripts.
As a "platform" writer (like the game engine), essentially you get to create a sandbox. The missing piece may be controlling access to calling arbitrary extern C functions - possibly that capability would need to be provided by `io` to create a fool-proof guarantees about what some code you call does. (The debug printing is another uncontrolled effect).
Oh boy. Example 7 is a bit of a mindfuck. You get the returned string in both await and cancel.
Feels like this violates zig “no hidden control flow” principle. I kinda see how it doesn’t. But it sure feels like a violation. But I also don’t see how they can retain the spirit of the principle with async code.
> Feels like this violates zig “no hidden control flow” principle.
A hot take here is that the whole async thing is a hidden control flow. Some people noticed that ever since plain callbacks were touted as a "webscale" way to do concurrency. The sequence of callbacks being executed or canceled forms a hidden, implicit control flow running concurrently with the main control logic. It can be harder to debug and manage than threads.
But that said, unless Zig adds a runtime with its own scheduler and turns into a bytecode VM, there is not much it can do. Co-routines and green threads have been done before in C and C-like languages, but I'm not sure how easily they would fit with Zig and its philosophy.
"no hidden control flow" means control flow only occurs at function boundaries, keywords, short-circuiting operators, or builtins. i believe there is a plan for asyncresume and asyncsuspend builtins that show the actual sites where control flow happens.
It's worth noting that this is not async/await in the sense of essentially every other language that uses those terms.
In other languages, when the compiler sees an async function, it compiles it into a state machine or 'coroutine', where the function can suspend itself at designated points marked with `await`, and be resumed later.
In Zig, the compiler used to support coroutines but this was removed. In the new design, `async` and `await` are just functions. In the threaded implementation used in the demo, `await` just blocks the thread until the operation is done.
To be fair, the bottom of the post explains that there are two other Io implementations being planned.
One of them is "stackless coroutines", which would be similar to traditional async/await. However, from the discussion so far this seems a bit like vaporware. As discussed in [1], andrewrk explicitly rejected the idea of just (re-)adding normal async/await keywords, and instead wants a different design, as tracked in issue 23446. But in issue 23446 the seems to be zero agreement on how the feature would work, how it would improve on traditional async/await, or how it would avoid function coloring.
The other implementation being planned is "stackful coroutines". From what I can tell, this has more of a plan and is more promising, but there are significant unknowns.
The basis of the design is similar to green threads or fibers. Low-level code generation would be identical to normal synchronous code, with no state machine transform. Instead, a library would implement suspension by swapping out the native register state and stack, just like the OS kernel does when switching between OS threads. By itself, this has been implemented many times before, in libraries for C and in the runtimes of languages like Go. But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow. Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.
andrewrk proposes [2] to instead have the compiler calculate the maximum amount of native stack needed by a function and all its callees. In this case, the stack could be sized exactly to fit. In some sense this is similar to async in Rust, where the compiler calculates the size of async function objects based on the amount of state the function and its callees need to store during suspension. But the Zig approach would apply to all function calls rather than treating async as a separate case. As a result, the benefits would extend beyond memory usage in async code. The compiler would statically guarantee the absence of stack overflow, which benefits reliability in all code that uses the feature. This would be particularly useful in embedded where, typically, reliability demands are high and memory available is low. Right now in embedded, people sometimes use a GCC feature ("-fstack-usage") that does a similar calculation, but it's messy enough that people often don't bother. So it would be cool to have this as a first-class feature in Zig.
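As a toy illustration of that calculation (numbers made up): with every call edge known, the required stack is just the deepest path through the call graph, fixed at compile time.

    frame(main)   = 48 bytes, calls parse()
    frame(parse)  = 64 bytes, calls next()
    frame(next)   = 32 bytes, leaf
    worst case    = 48 + 64 + 32 = 144 bytes

Recursion or a call through an arbitrary function pointer breaks the calculation, which is exactly where the caveats below come in.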
But.
There's a reason that stack usage calculators are uncommon. If you want to statically bound stack usage:
First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. Banning recursion is common in embedded code but would be rather annoying for most codebases. Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions - but those languages have a lot of tools that 'normal' languages don't, so it's unclear how ergonomic such a feature could be in Zig. The issue [2] doesn't have much concrete discussion on how it would work.
Second, you have to ban dynamic calls (i.e. calls to function pointers), because if you don't know what function you're calling, you don't know how much stack it will use. This has been the subject of more concrete design in [3] which proposes a "restricted" function pointer type that can only refer to a statically known set of functions. However, it remains to be seen how ergonomic and composable this will be.
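Not the proposal itself, but as an illustration of the "statically known set of functions" idea: you can already get the same effect by hand by dispatching over an enum instead of a raw function pointer, so every possible callee (and its stack usage) is visible to the compiler or to a tool like -fstack-usage.

    const Op = enum { add, mul };

    fn addOp(a: u32, b: u32) u32 {
        return a + b;
    }

    fn mulOp(a: u32, b: u32) u32 {
        return a * b;
    }

    // A closed set of callees instead of `*const fn (u32, u32) u32`:
    // the worst-case stack through apply() is computable.
    fn apply(op: Op, a: u32, b: u32) u32 {
        return switch (op) {
            .add => addOp(a, b),
            .mul => mulOp(a, b),
        };
    }

The open question in [3] is how to package that restriction ergonomically as a function pointer type rather than a hand-written switch.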
Zooming back out:
Personally, I'm glad that Zig is willing to experiment with these things rather than just copying the same async/await feature as every other language. There is real untapped potential out there. On the other hand, it seems a little early to claim victory, when all that works today is a thread-based I/O library that happens to have "async" and "await" in its function names.
Heck, it seems early to finalize an I/O library design if you don't even know how the fancy high-performance implementations will work. Though to be fair, many applications will get away just fine with threaded I/O, and it's nice to see a modern I/O library design that embraces that as a serious option.
> But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow.
With a 64-bit address space you can reserve large contiguous chunks (e.g. 2MB), while only allocating the minimum necessary for the optimistic case. The real problem isn't memory usage, per se, it's all the VMA manipulation and noise. In particular, setting up guard pages requires a separate VMA region for each guard (usually two per stack, above and below). Linux recently got a new madvise feature, MADV_GUARD_INSTALL/MADV_GUARD_REMOVE, which lets you add cheap guard pages without installing a distinct, separate guard page. (https://lwn.net/Articles/1011366/) This is the type of feature that could be used to improve the overhead of stackful coroutines/fibers. In theory fibers should be able to outperform explicit async/await code, because in the non-recursive, non-dynamic call case a fiber's stack can be stack-allocated by the caller, thus being no more costly than allocating a similar async/await call frame, yet in the recursive and dynamic call cases you can avoid dynamic frame bouncing, which in the majority of situations is unnecessary--the poor performance of dynamic frame allocation/deallocation in deep dynamic call chains is the reason Go switched from segmented stacks to moveable stacks.
Another major cost of fibers/threads is context switching--most existing solutions save and restore all registers. But for coroutines (stackless or stackful), there's no need to do this. See, e.g., https://photonlibos.github.io/blog/stackful-coroutine-made-f..., which tweaked clang to erase this cost and bring it in line with normal function calls.
> Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.
The dynamic allocation problem exists the same whether using stackless coroutines, stackful coroutines, etc. Fundamentally, async/await in Rust is just creating a linked-list of call frames, like some mainframes do/did. How many Rust users manually OOM check Boxed dyn coroutine creation? Handling dynamic stack growth is technically a problem even in C, it's just that without exceptions and thread-scoped signal handlers there's no easy way to handle overflow so few people bother. (Heck, few even bother on Windows where it's much easier with SEH.) But these are fixable problems, it just requires coordination up-and-down the OS stack and across toolchains. The inability to coordinate these solutions does not turn ugly compromises (async/await) into cool features.
> First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. [snip]
>
> Second, you have to ban dynamic calls (i.e. calls to function pointers)
Both of which are the case for async/await in Rust; you have to explicitly Box any async call that Rust can't statically size. We might frame this as being transparent and consistent, except it's not actually consistent because we don't treat "ordinary", non-async calls this way, which still use the traditional contiguous stack that on overflow kills the program. Nobody wants that much consistency (too much of a "good" thing?) because treating each and every call as async, with all the explicit management that would entail with the current semantics would be an indefensible nightmare for the vast majority of use cases.
> tracking how many times a function can possibly recurse.
> Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions
Proof languages don't really track how many times a function can possibly recurse, they only care that it will eventually terminate. The amount of recursive steps can even easily depend on the inputs, making it unknown at the moment a function is defined.
Right. They use rules like "a function body must destructure its input and cannot use a constructor" which implies that, since input can't be infinite, the function will terminate. That doesn't mean that the recursion depth is known before running.
have not played with zig for a while, remain in the C world.
with the cleanup attribute (a cheap "defer" for C), and the sanitizers, static analysis tools, memory tagging extension (MTE) for memory safety at hardware level, etc, and a zig 1.0 still probably years away, what's the strong selling point that I need to spend time with zig these days? Asking because I'm unsure if I should re-try it.
I don't think any language that's not well-established, let alone one that isn't stabilised yet, would have a strong selling point if what you're looking for right now is to write production code that you'll maintain for a decade or more to come (I mean, companies do use languages that aren't as established as C in production apps, including Zig, but that's certainly not for everyone).
But if you're open to learning languages for tinkering/education purposes, I would say that Zig has several significant "intrinsic" advantages compared to C.
* It's much more expressive (it's at least as expressive as C++), while still being a very simple language (you can learn it fully in a few days).
* Its cross-compilation tooling is something of a marvel.
* It offers not only spatial memory safety, but protection from other kinds of undefined behaviour, in the form of things like tagged unions.
My 2ct: A much more useful stdlib (that's also currently the most unstable part though, and I don't agree with all the design decisions in the stdlib - but mostly still better than a nearly useless stdlib like C provides - and don't even get me started about the C++ stdlib heh), integrated build system and package manager (even great for pure C/C++ projects), and comptime with all the things it enables (like straightforward reflection and generics).
It also fixes a shitton of tiny design warts that we've become accustomed to in C (also very slowly happening in the C standard, but that will take decades while Zig has those fixes now).
Also, probably the best integration with C code you can get outside of C++. E.g. hybrid C/Zig projects is a regular use case and has close to no friction.
C won't go away for me, but tinkering with Zig is just more fun :)
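A small example of the comptime reflection mentioned above, since it's hard to convey how little ceremony it needs (this uses the current std.meta.fields helper, which like everything pre-1.0 may churn):

    const std = @import("std");

    // Works on any struct: iterate its fields at compile time, print
    // name/value pairs at runtime. No macros, no codegen step.
    fn debugPrintFields(value: anytype) void {
        const T = @TypeOf(value);
        inline for (std.meta.fields(T)) |field| {
            std.debug.print("{s} = {any}\n", .{ field.name, @field(value, field.name) });
        }
    }

    test "print fields" {
        debugPrintFields(.{ .x = @as(i32, 1), .y = @as(f32, 2.5) });
    }

The same trick is what lets you write generic serializers, CLI parsers, etc. as plain functions.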
I really like Zig as a language, but I think the standard library is a huge weakness. I find the design extremely inconsistent. It feels like a collection of small packages, not a coherent unit. And I personally think this std.Io interface is on a similar path. The idea of abstracting out all I/O calls is great, but the actual interface is sketchy, in my opinion.
Many parts of the stdlib don't separate between public interface and internal implementation details. It's possible to accidentally mess with data items which are clearly meant to be implementation-private.
I think this is because many parts of the stdlib are too object-oriented, for instance the containers are more or less C++ style objects, but this style of programming really needs RAII and restricted visibility rules like public/private (now Zig shouldn't get those, but IMHO the stdlib shouldn't pretend that those language features exist).
As a sister comment says, Zig is a great programming language, but the stdlib needs some sort of basic and consistent design philosophy which matches the language capabilities.
Tbf though, C gets around this problem by simply not providing a useful stdlib and delegating all the tricky design questions to library authors ;)
Honestly getting to mess with internal implementation details is my favorite part of using Zig's standard library. I'm working on a multithreaded interpreter right now, where each object has a unique index, so I need effectively a 4GB array to store them all. It's generally considered rude to allocate that all at once, so I'm using 4GB of virtual memory. I can literally just swap out std.MultiArrayList's backing array with vmem, set its capacity, and use all of its features, except now with virtual memory[1].
I would at least like to see some sort of naming convention for implementation-private properties, maybe like:
    const Bla = struct {
        // public access intended
        bla: i32,
        blub: i32,

        // here be dragons
        _private: struct {
            x: i32,
            y: i32,
        },
    };
...that way you can still access and mess up those 'private' items, but at least it's clear now which of the struct items are part of the 'public API contract' and which are considered internal implementation details which may change on a whim.
This is my main frustration with the push back against visibility modifiers. It's treated as an all or nothing approach, that any support for visibility modifiers locks anyone out from touching those fields.
It could just be a compiler error/warning that has to be explicitly opted into to touch those fields. This allows you to say "I know this is normally a footgun to modify these fields, and I might be violating an invariant condition, but I know what I'm doing".
"Visibility" invariably bites you in the ass at some point. See: Rust, newtypes and the Orphan rule, for example.
As such, I'm happy to not have visibility modifiers at all.
I do absolutely agree that "std" needs a good design pass to make it consistent--groveling in ".len" fields instead of ".len()" functions is definitely a bad idea. However, the nice part about Zig is that replacement doesn't need extra compiler support. Anyone can do that pass and everyone can then import and use it.
> This allows you to say "I know this is normally a footgun to modify these fields, and I might be violating an invariant condition, but I know what I'm doing".
Welcome to "Zig will not have warnings." That's why it's all or nothing.
It's the single thing that absolutely grinds my gears about Zig. However, it's also probably the single thing that can be relaxed at a later date and not completely change the language. Consequently, I'm willing to put up with it given the rest of the goodness I get.
Aah, that's a good point. I suppose you could argue that the doc notes indicate whether something is meant to be accessed directly, like when ArrayList mentions "This field is intended to be accessed directly.", but that's really hard to determine at a glance.
How do you cope with the lack of vector operators? I am basically only writing vector code all day and night, and the lack of infix operators for vectors is just as unacceptable as not having them for normal ints and floats would be.
Nobody is asking for Pizza * Weather && (Lizard + Sleep), that strawman argument to justify ints and floats as the only algebraic types is infuriating :(
I'd love to read a blog post about your Zig setup and workflow BTW.
If you want to write really cursed code, you could use comptime to implement a function that takes comptime strings and parses them into vector function calls ;)
I've actually thought about that, and looked a little into canonical/idiomatic ways to implement a DSL in Zig, and didn't find anything small and natural.
I just find it intellectually offensive that this extremely short-sighted line is drawn after ints and floats, any other algebraic/number types aren't similarly dignified.
If people had to write add(2, mul(3, 4)) etc for ints and floats the language would be used by exactly nobody! But just because particular language designers aren't using complex numbers and vectors all day, they interpret the request as wanting stupid abstract Monkey + Banana * Time or whatever. I really wish more Language People appreciated that there's only really one way to do complex numbers, it's worth doing right once and giving proper operators, too. Sure, use dot(a, b) and cross(a, b) etc, that's fine.
The word "number" is literally half of "complex number", and it's not like there are 1024 ways to implement 2D, 3D, 4D vector and complex number addition, subtraction, multiplication, maybe even division. There are many languages one can look to for guidance here, e.g. OpenCL[0] and Odin[1].
Andrew Kelley is one of my all-time favorite technical speakers, and zig is packed full of great ideas. He also seems to be a great model of an open source project leader.
I would say zig is not just packed full of great ideas, there's a whole graveyard of ideas that were thrown out (and a universe of ideas that were rejected out of hand). Zig is full of great ideas that compose well together.
I really hope this is going to be entirely optional, but I know realistically it just won't be. If Rust is any example, a language that has optional async support, async will permeate into the whole ecosystem. That's to be expected with colored functions. The stdlib isn't too bad but last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block.
Async clearly works for many people, I do fully understand people who can't get their heads around threads and prefer async. It's wonderful that there's a pattern people can use to be productive!
For whatever reason, async just doesn't work for me. I don't feel comfortable using it and at this point I've been trying on and off for probably 10+ years now. Maybe it's never going to happen. I'm much more comfortable with threads, mutex locks, channels, Erlang style concurrency, nurseries -- literally ANYTHING but async. All of those are very understandable to me and I've built production systems with all of those.
I hope when Zig reaches 1.0 I'll be able to use it. I started learning it earlier this month and it's been really enjoyable to use.
It's optional to the point that you can write a single-threaded version without any io.async/io.concurrent, but you will need to pass the io parameter around, if you want to do I/O. You are mistaking what is called "async" here for what other languages call async/await. It's a very different concept. Async in this context means just "spawn this function in the background, but if you can't, just run it right now".
Yeah, it always mystifies me when people talk about async vs threads when they are completely orthogonal concepts. It doesn't give me the feeling they understand what they are talking about.
The example code shown in the first few minutes of the video is actually using regular OS threads for running the async code ;)
The whole thing is quite similar to the Zig allocator philosophy. Just like an application already picks a root allocator to pass down into libraries, it now also picks an IO implementation and passes it down. A library in turn doesn't care about how async is implemented by the IO system, it just calls into the IO implementation it got handed from the application.
or you can pick one manually. Or, you can pick more than one and use as needed in different parts of your application. (probably less of a thing for IO than allocator).
You don’t have to hope. Avoiding function colours and being able to write libraries that are agnostic to whether the IO is async or not is one of the top priorities of this new IO implementation.
If you don’t want to use async/await just don’t call functions through io.async.
same. i don't like async. i don't like having to prepend "await" to every line of code. instead, lately (in js), i've been playing more with worker threads, message passing, and the "atomics" api. i get the benefits of concurrency without the extra async/await-everywhere baggage.
I understand threads but I like using async for certain things.
If I had a web service using threads, would I map each request to one thread in a thread pool? It seems like a waste of OS resources when the IO multiplexing can be done without OS threads.
> last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block.
Like what? Even file I/O blocks for large files on slow devices, so something like async tarball handling has a use case.
It's best to write in the sans-IO style and then your threading or async can be a thin layer on top that drives a dumb state machine. But in practice I find that passable sans-IO code is harder to write than passable async. It makes a lot of sense for a deep indirect dependency like an HTTP library, but less sense for an app
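For anyone who hasn't run into the term, a minimal sans-IO sketch (a toy, not from any real library): the protocol logic only ever sees bytes it's handed, so the same state machine can sit under a blocking read loop, a thread pool, or an event loop without changes.

    // Counts newline-terminated lines in whatever chunks the caller reads.
    const LineCounter = struct {
        lines: usize = 0,

        pub fn feed(self: *LineCounter, chunk: []const u8) void {
            for (chunk) |byte| {
                if (byte == '\n') self.lines += 1;
            }
        }
    };

    // The I/O layer is a thin driver; swap it out without touching LineCounter.
    fn countLines(reader: anytype) !usize {
        var counter: LineCounter = .{};
        var buf: [4096]u8 = undefined;
        while (true) {
            const n = try reader.read(&buf);
            if (n == 0) return counter.lines;
            counter.feed(buf[0..n]);
        }
    }

The HTTP-library case in the comment above is the same shape, just with a much bigger state machine.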
I agree, async is way more popular than it should be. At least (AFAIU) Zig doesn't color functions so there won't be a big ecosystem rift between blocking and async libraries.
> I do fully understand people who can't get their heads around threads and prefer async
This is a bizarre remark
Async/await isn't "for when you can't get your head around threads", it's a completely orthogonal concept
Case in point: javascript has async/await, but everything is singlethreaded, there is no parallelism
Async/await is basically just coroutines/generators underneath.
Phrasing async as 'for people who can't get their heads around threads' makes it sound like you're just insecure that you never learned how async works yet, and instead of just sitting down + learning it you would rather compensate
Async is probably a more complex model than threads/fibers for expressing concurrency. It's fine to say that, it's fine to not have learned it if that works for you, but it's silly to put one above the other as if understanding threads makes async/await irrelevant
> The stdlib isn't too bad but last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block
Can you provide an example? I haven't found that to be the case last time I used rust, but I don't use rust a great deal anymore
>Case in point: javascript has async/await, but everything is singlethreaded, there is no parallelism, Async/await is basically just coroutines/generators underneath.
Maybe I just wish Zig didn't call it async and used a different name.
Async-await in JS is sometimes used to swallow exceptions. It's very often used to do 1 thing at a time when N things could be done instead. It serializes the execution a lot when it could be concurrent.
    if (await is_something_true()) {
        // here is_something_true() can be false
    }
And above, the most common mistake.
Similar side-effects happen in other languages that have async-await sugar.
It smells as bad as the Zig file interface with intermediate buffers reading/writing to OS buffers until everything is a buffer 10 steps below.
It's fun for small programs but you really have to be very strict to not have it go wrong (performance, correctness).
That being said, I don't understand your `is_something_true` example.
> It's very often used to do 1 thing at a time when N things could be done instead
That's true, but I don't think e.g. fibres fare any better here. I would say that expressing that type of parallel execution is much more convenient with async/await and Promise.all() or whatever alternative, compared to e.g. raw promises or fibres.
Weird claim since threads were originally introduced as a concurrency primitive, basically a way to make user facing programs more responsive while sharing the same address space and CPU.
The idea of generalizing threads for use in parallel computing/SMP didn't come until at least a decade after the introduction of threads for use as a concurrency tool.
I don't understand, why is allocation not part of IO? This seems like effect oriented programming with a kinda strange grouping: allocation, and the rest (io).
Most effect-tracking languages have a GC and allocations/deallocations are handled implicitly.
Running what we normally call "pure" code requires an allocator but not the `io` object. Code which accepts neither is also allocation-free - something which is a bit of a challenge to enforce in many languages, but just falls out here (from the lack of closures or globally scoped allocating/effectful functions - though I'm not sure whether `io` or an allocator is now required to call arbitrary extern (i.e. C) functions or not, which you'd need for a 100% "sandbox").
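A hedged sketch of that "capabilities as parameters" reading; nothing extra is enforced here, but the signature alone tells you what a function can possibly do:

    const std = @import("std");

    // No allocator, no io: can neither allocate nor touch the outside world.
    fn checksum(data: []const u8) u64 {
        var sum: u64 = 0;
        for (data) |byte| sum +%= byte;
        return sum;
    }

    // Allocator but no io: may allocate, still can't do I/O.
    fn repeat(gpa: std.mem.Allocator, s: []const u8, times: usize) ![]u8 {
        const out = try gpa.alloc(u8, s.len * times);
        for (0..times) |i| @memcpy(out[i * s.len ..][0..s.len], s);
        return out;
    }

    // Anything that touches the outside world asks for io too. Signature only,
    // since the std.Io surface is still in flux:
    // fn send(io: std.Io, msg: []const u8) !void { ... }

The extern-C escape hatch mentioned above is the one hole in treating this as a real sandbox.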
TL;DR. Zig now has an effect system with multiple handlers for alloc and io effects. But the designer doesn't know anything about effect systems and it's not algebraic, probably. Will be interesting to see how this develops.
I find the direction of zig confusing. Is it supposed to be a simple language or a complex one? Low level or high level? This feature is to me a strange mix of high and low level functionality and quite complex.
The io interface looks like OO but violates the Liskov substitution principle. For me, this does not solve the function color problem, but instead hides it. Every function with an IO interface cannot be reasoned about locally because of unexpected interactions with the io parameter input. This is particularly nasty when IO objects are shared across library boundaries. I now need to understand how the library internally manages io if I share that object with my internal code. Code that worked in one context may surprisingly not work in another context. As a library author, how do I handle an io object that doesn't behave as I expect?
Trying to solve this problem at the language level fundamentally feels like a mistake to me because you can't anticipate in advance all of the potential use cases for something as broad as io. That's not to say that this direction shouldn't be explored, but if it were my project, I would separate this into another package that I would not call standard.
I like Zig a lot, but something about this has been bothering me since it was announced. I can't put my finger on why, I honestly don't have a technical reason, but it just feels like the wrong direction to go.
Hopefully I'm wrong and it's wildly successful. Time will tell I guess.
What exactly makes it unpredictable? The functions in the interface have a fairly well defined meaning, take this input, run I/O operation and return results. Some implementation will suspend your code via user-space context switching, some implementation will just directly run the syscall. This is not different than approaches like the virtual thread API in Java, where you use the same APIs for I/O no matter the context. In Python world, before async/await, this was solves in gevent by monkey patching all the I/O functions in the standard library. This interface just abstracts that part out.
i think you are missing that a proper io interface should encapsulate all abstractions that care about asynchrony and patterns thereof. is that possible? we will find out. It's not unreasonable to be skeptical but can you come up with a concrete example?
> As a library author, how do I handle an io object that doesn't behave as I expect
you ship with tests against the four or five default patterns in the stdlib and if anyone wants to do anything substantially crazier to the point that it doesnt work, thats on them, they can submit a PR and you can curbstomp it if you want.
> function coloring
i recommend reading the function coloring article. there are five criteria that describe what make up the function coloring problem, it's not just that there are "more than one class of function calling conventions"
First thought is that IO is just hard
I thought the same, that zig is too low level to have async implemented in the language. It's experimental and probably going to change
fwiw i thought the previous async based on whole-program analysis and transformation to stackless coroutines was pretty sweet, and similar sorts of features ship in rust and C++ as well
It's more about allowing a-library-fits-all than forcing it. You don't have to ask for io, you just should, if you are writing a library. You can even do it the Rust way and write different libraries for example for users who want or don't want async if you really want to.
Couldn't the same thing be said about functions that accept allocators?
I start to want a Reader Monad stack for all the stuff I need to thread through all functions.
Yeah, these kinds of "orthogonal" things that you want to set up "on the outside" and then have affect the "inner" code (like allocators, "io" in this case, and maybe also presence/absence of GC, etc.) all seem to cry out for something like Lisp dynamic variables.
I don't think so because the result of calling an allocator is either you got memory or you don’t, while the IO here will be “it depends”
I don't get it, what's the difference between "got or don't" vs "it depends"?
This just sounds like a threadpool implementation to me. Sure you can brand it async, but it's not what we generally mean when we say async.
It seems to me that async io struggles whenever people try it.
For instance it is where Rust goes to die because it subverts the stack-based paradigm behind ownership. I used to find it was fun to write little applications like web servers in aio Python, particularly if message queues and websockets were involved, but for ordinary work you're better off using gunicorn. The trouble is that conventional async i/o solutions are all single threaded and in an age where it's common to have a 16 core machine on your desktop it makes no sense. It would be like starting a chess game dumping out all your pieces except for your King.
Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
At the cost of not being able to actually provide the same throughput, latency, or memory usage that lower level languages that don't enforce the same performance pessimizing abstractions on everything can. Engineering is about tradeoffs but pretending like Java or .NET have solved this is naiive.
> At the cost of not being able to actually provide the same throughput, latency, or memory usage
Only memory usage is true with regards to Java in this context (.NET actually doesn't offer a shared thread abstraction; it's Java and Go that do), and even that is often misunderstood. Low-level languages are optimised for minimal memory usage, which is very important on RAM-constrained devices, but is could be wasting CPU on most machines: https://youtu.be/mLNFVNXbw7I
This optimisation for memory footprint also makes it harder for low-level languages to implement user-mode threading as efficiently as high-level languages.
Another matter is that there are two different use cases for asynchronous constructs that may tempt implementors to address them with a single implementation. One is the generator use case. What makes it special is that there are exactly two communicating parties, and both of their state may fit in the CPU cache. The other use case is general concurrency, primarily for IO. In that situation, a scheduler juggles a large number of user-mode threads, and because of that, there is likely a cache miss on every context switch, no matter how efficient it is. However, in the second case, almost all of the performance is due to Little's law rather than context switch time (see my explanation here: https://inside.java/2020/08/07/loom-performance/). That means that a "stackful" implementation of user-mode threads can have no significant performance penalty for the second use case (which, BTW, I think has much more value than the first), even though a more performant implementation is possible for the first use case. In Java we decided to tackle the second use case with virtual threads, and so far we've not offered something for the first (for which the demand is significantly lower). What happens in languages that choose to tackle both use cases with the same construct is that in the second and more important use case they gain no more than negligible performance (at best), but they're paying for that with a substantial degradation in user experience.
It sounds like you’re disagreeing yet no case is made that throughout and latency isn’t worse.
For example, the best frameworks on TechEmpower are all Rust, C and C++ with the best Java coming in at 25% slower on that microbenchmark. My point stands - it is generally true that well written rust/c/c++ outperforms well written Java and .Net and not just with lower memory usage. The “engineering effort per performance” maybe skews to Java but that’s different than absolute performance. With rust to me it’s also less clear if that is actually even true.
[1] https://www.techempower.com/benchmarks/#section=data-r23
First, in all benchmarks but two, Java performs just as well as C/C++/Rust, and in one of those two, Go performs as well as the low-level languages. Second, I don't know the details of that one benchmark where the low-level languages indeed perform better than high-level ones, but I don't see any reason to believe it has anything to do with virtual threads.
Modern Java GCs typically offer a boost over more manual memory management. And on latency, even if virtual were very inefficient and you'd add a GC pause with Java's new GCs, you'd still be well below 1ms, i.e. not a dominant factor in a networked program.
(Yes, there's still one cause for potential lower throughput in Java, which is the lack of inlined objects in arrays, but that will be addressed soon, and isn't a big factor in most server applications anyway or related to IO)
BTW, writing a program in C++ has always been more or less as easy as writing it in Java/C# etc.; the big cost of C++ is in evolution and refactoring over many years, because in low-level languages local changes to code have a much more global impact, and that has nothing to do with the design of the language but is an essential property of tracking memory management at the code level (unless you use smart pointers, i.e. a refcounting GC for everything, but then things will be really slow, as refcounting does sacrifice performance in its goal of minimising footprint).
Any gc pause is unacceptable if your goal is predictable throughput and latency
Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.
As for c++, std::unique_ptr has no ref counting at all.
shared_ptr does, but that’s why you avoid it at all costs if you need to move things around. you only pay the cost when copying the shared_ptr itself, but you almost never need a shared_ptr and even when you need it, you can always avoid copying in the hot path
Honestly, if these languages are only winning by 25% in microbenchmarks, where I’d expect the difference to be biggest, that’s a strong boost for Java for me. I didn’t realise it was so close, and I hate async programming so I’m definitely not doing it for an, at most, 25% boost.
I didn’t make the claim that it’s worth it. But when it is absolutely needed Java has no solution.
And remember, we’re talking about a very niche and specific I/O microbenchmark. Start looking at things like SIMD (currently - I know Java is working on it) or in general more compute bound and the gap will widen. Java still doesn’t yet have the tools to write really high performance code.
I honestly doubt any of the frameworks in that benchmark are using virtual threads yet. The top one is still using vert.x which is an event loop on native platform threads.
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.
First, that would be Java and Go, not Java and .NET, as .NET offers a separate construct (async/await) for high-throughput concurrency.
Second, while "unfashionable" in some sense, I guess, it's no wonder that Java is many times popular than any "fashionable" language. Also, if "fashionable" means "much discussed on HN", then that has historically been a terrible predictor of language success. There's almost an inverse correlation between how much a language is discussed on HN and its long-term success, and that's not surprising, as it's the less commonplace things that are more interesting to talk about. HN is more Vogue magazine than the New York Times.
> but for ordinary work you're better off using gunicorn
I'd like to see some evidence for this. Other than simplicity, IMO there's very little reason to use synchronous Python for a web server these days. Streaming files, websockets, etc. are all areas where asyncio is almost a necessity (in the past you might have used twisted), to say nothing of the performance advantage for typical CRUD workloads. The developer ergonomics are also much better if you have to talk to multiple downstream services or perform actions outside of the request context. Needing to manage a thread pool for this or defer to a system like Celery is a ton more code (and infrastructure, typically).
> async i/o solutions are all single threaded
And your typical gunicorn web server is single threaded as well. Yes you can spin up more workers (processes), but you can also do that with an asgi server and get significantly higher performance per process / for the same memory footprint. You can even use uvicorn as a gunicorn worker type and continue to use it as your process supervisor, though if you're using something like Kubernetes that's not really necessary.
Not many use cases actually need websockets. We're still building new shit in sync python and avoiding the complexity of all the other bullshit
> It seems to me that async io struggles whenever people try it.
Promises work great in javascript, either in the browser or in node/bun. They're easy to use, and easy to reason about (once you understand them). And the language has plenty of features for using them in lots of ways - for example, Promise.all(), "for await" loops, async generators and so on. I love this stuff. It's fast, simple to use and easy to reason about (once you understand it).
Personally I've always thought the "function coloring problem" was overstated. I'm happy to have some codepaths which are async and some which aren't. Mixing sync and async code willy nilly is a code smell.
Personally I'd be happy to see more explicit effects (function colors) in my languages. For example, I'd like to be able to mark which functions can't panic. Or effects for non-divergence, or capability safety, and so on.
Promises in JS are particularly easy because JS is single-threaded. You can be certain that your execution flow won't be preempted at an arbitrary point. This greatly reduces the need for locks, atomics, etc.
Also task-local variables, which almost all systems other than C-level threads basically give up on despite being widely demanded.
> Promises work great in javascript, either in the browser or in node/bun.
I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).
Personally I think all solutions will come with tradeoffs; you can simply learn them well enough to be productive anyway. But you don't need language-level support for that.
> I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).
These are inconveniences, but not show stoppers. Modern JS engines can "see through" async call stacks. Yes, bugs can result in programs that hang - but that's true in synchronous code too.
But async in rust is way worse:
- Compilation times are horrible. An async hello world in javascript starts instantly. In rust I need to compile and link to tokio or something. Takes ages.
- Rust doesn't have async iterators or async generators. (Or generators in any form.) Rust has no built in way to create or use async streams.
- Rust has 2 different ways to implement futures: the async keyword and impl Future. You need to learn both, because some code is impossible to write with the async keyword, and some code is impossible to write with impl Future. It's incredibly confusing and complicated, and it's difficult to learn properly.
- Rust doesn't have a built in run loop ("executor"). So - best case - your project pulls in tokio or something, which is an entire kitchen sink and all your dependencies use that. Worst case, the libraries you want to use are written for different async executors and ??? shoot me. In JS, everything just works out of the box.
I love rust. But async rust makes async javascript seem simple and beautiful.
If you watched the video closely, you'll have noticed that this design parameterizes the code by an `io` interface, which enables pluggable implementations. Correctly written code in this style can work transparently with evented or threaded runtimes.
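Roughly what that looks like in code - a sketch only, since the exact std.Io method names and signatures are still in flux, so treat these as approximate:

    const std = @import("std");
    const Io = std.Io;

    // Library code: it only knows it was handed *an* io. Whether io.async
    // spawns a thread, queues work on an event loop, or just runs the callee
    // right here is decided by whoever constructed the Io, not by this code.
    fn saveBoth(io: Io, data: []const u8) !void {
        var a = io.async(saveFile, .{ io, data, "a.txt" });
        var b = io.async(saveFile, .{ io, data, "b.txt" });
        try a.await(io);
        try b.await(io);
    }

    fn saveFile(io: Io, data: []const u8, name: []const u8) !void {
        // stand-in body; the point is only that all I/O goes through `io`
        _ = io;
        _ = data;
        _ = name;
    }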
Really? Ordinary synchronous code calls an I/O routine which returns. Asynchronous code calls an I/O routine and then gets called back. That's a fundamental difference, and you can only square it by making the synchronous code look like asynchronous code (the callback gets called right away) or asynchronous code look like synchronous code (something like async Python, which breaks a subroutine up into multiple subroutines and has an event loop manage who calls whom).
I know that it depends on how much you disentangle your network code from your business logic. The question is the degree. Is it enough, or does it just dull the pain?
If you give your business logic the complete message or send it a stream, then the flow of ownership stays much cleaner. And the unit tests stay substantially easier to write and more importantly, to maintain.
I know too many devs who don't notice when they bias their decisions to avoid changes that would conflict with bad unit tests, and who then declare that our testing strategy is Just Fine. It's easier to show than to debate, but it still takes an open mind to accept the demonstration.
Project Loom makes Java in particular really nice, virtual threads can "block" without blocking the underlying OS thread. No callbacks at all, and you can even use Structured Concurrency to implement all sorts of Go- and Erlang-like patterns.
(I use it from Clojure, where it pairs great with the "thread" version of core.async (i.e. Go-style) channels.)
I haven’t actually seen it in the wild yet, just talked about in technical talks from engineers at different studios, but I’m interested in designs in which there isn’t a traditional main thread anymore.
Instead, everything is a job, and even what is considered the main thread is no longer an orchestration thread but just another worker: after some nominal setup it scaffolds enough threads (usually the core count minus one) to all serve as lockless, work-stealing workers.
Conventional async programming relies too heavily on a critical main thread.
I think it’s been so successful though, that unfortunately we’ll be stuck with it for much longer than some of us will want.
It reminds me of how many years of inefficient programming we have been stuck with because cache-unfriendly traditional object-oriented programming was so successful.
Personally I find this really cool.
One thing I like about the design is it locks in some of the "platforms" concepts seen in other languages (e.g. Roc), but in a way that goes with Zig's "no hidden control flow" mantra.
The downstream effect is that it will be normal to create your own non-posix analogue of `io` for wherever you want code to hook into. Writing a game engine? Let users interact with a set of effectful functions you inject into their scripts.
As a "platform" writer (like the game engine), essentially you get to create a sandbox. The missing piece may be controlling access to calling arbitrary extern C functions - possibly that capability would need to be provided by `io` to create a fool-proof guarantees about what some code you call does. (The debug printing is another uncontrolled effect).
Oh boy. Example 7 is a bit of a mindfuck. You get the returned string in both await and cancel.
Feels like this violates zig “no hidden control flow” principle. I kinda see how it doesn’t. But it sure feels like a violation. But I also don’t see how they can retain the spirit of the principle with async code.
> Feels like this violates zig “no hidden control flow” principle.
A hot take here is that the whole async thing is a hidden control flow. Some people noticed that ever since plain callbacks were touted as a "webscale" way to do concurrency. The sequence of callbacks being executed or canceled forms a hidden, implicit control flow running concurrently with the main control logic. It can be harder to debug and manage than threads.
But that said, unless Zig adds a runtime with its own scheduler and turns into a bytecode VM, there is not much it can do. Coroutines and green threads have been done before in C and C-like languages, but I'm not sure how easily they would fit with Zig and its philosophy.
no hidden control flow means no control flow occurs outside of function boundaries, keywords, short-circuiting operators, or builtins. i believe there is a plan for asyncresume and asyncsuspend builtins that show the actual sites where control flow happens.
It's worth noting that this is not async/await in the sense of essentially every other language that uses those terms.
In other languages, when the compiler sees an async function, it compiles it into a state machine or 'coroutine', where the function can suspend itself at designated points marked with `await`, and be resumed later.
In Zig, the compiler used to support coroutines but this was removed. In the new design, `async` and `await` are just functions. In the threaded implementation used in the demo, `await` just blocks the thread until the operation is done.
To be fair, the bottom of the post explains that there are two other Io implementations being planned.
One of them is "stackless coroutines", which would be similar to traditional async/await. However, from the discussion so far this seems a bit like vaporware. As discussed in [1], andrewrk explicitly rejected the idea of just (re-)adding normal async/await keywords, and instead wants a different design, as tracked in issue 23446. But in issue 23446 the seems to be zero agreement on how the feature would work, how it would improve on traditional async/await, or how it would avoid function coloring.
The other implementation being planned is "stackful coroutines". From what I can tell, this has more of a plan and is more promising, but there are significant unknowns.
The basis of the design is similar to green threads or fibers. Low-level code generation would be identical to normal synchronous code, with no state machine transform. Instead, a library would implement suspension by swapping out the native register state and stack, just like the OS kernel does when switching between OS threads. By itself, this has been implemented many times before, in libraries for C and in the runtimes of languages like Go. But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow. Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.
andrewrk proposes [2] to instead have the compiler calculate the maximum amount of native stack needed by a function and all its callees. In this case, the stack could be sized exactly to fit. In some sense this is similar to async in Rust, where the compiler calculates the size of async function objects based on the amount of state the function and its callees need to store during suspension. But the Zig approach would apply to all function calls rather than treating async as a separate case. As a result, the benefits would extend beyond memory usage in async code. The compiler would statically guarantee the absence of stack overflow, which benefits reliability in all code that uses the feature. This would be particularly useful in embedded where, typically, reliability demands are high and memory available is low. Right now in embedded, people sometimes use a GCC feature ("-fstack-usage") that does a similar calculation, but it's messy enough that people often don't bother. So it would be cool to have this as a first-class feature in Zig.
But.
There's a reason that stack usage calculators are uncommon. If you want to statically bound stack usage:
First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. Banning recursion is common in embedded code but would be rather annoying for most codebases. Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions - but those languages have a lot of tools that 'normal' languages don't, so it's unclear how ergonomic such a feature could be in Zig. The issue [2] doesn't have much concrete discussion on how it would work.
Second, you have to ban dynamic calls (i.e. calls to function pointers), because if you don't know what function you're calling, you don't know how much stack it will use. This has been the subject of more concrete design in [3] which proposes a "restricted" function pointer type that can only refer to a statically known set of functions. However, it remains to be seen how ergonomic and composable this will be.
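To make both restrictions concrete, these are the two shapes of code such an analysis has to either reject or demand extra annotations for (my own illustration, not code from the proposal):

    const Node = struct { left: ?*const Node, right: ?*const Node };

    // 1. Recursion: the worst-case depth depends on runtime data, so there is
    //    no finite stack bound without extra information from the programmer.
    fn depth(node: ?*const Node) usize {
        const n = node orelse return 0;
        return 1 + @max(depth(n.left), depth(n.right));
    }

    // 2. Dynamic calls: the callee - and therefore its stack usage - is not
    //    known until runtime, unless the pointer type restricts the call set.
    fn apply(f: *const fn (i32) i32, x: i32) i32 {
        return f(x);
    }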
Zooming back out:
Personally, I'm glad that Zig is willing to experiment with these things rather than just copying the same async/await feature as every other language. There is real untapped potential out there. On the other hand, it seems a little early to claim victory, when all that works today is a thread-based I/O library that happens to have "async" and "await" in its function names.
Heck, it seems early to finalize an I/O library design if you don't even know how the fancy high-performance implementations will work. Though to be fair, many applications will get away just fine with threaded I/O, and it's nice to see a modern I/O library design that embraces that as a serious option.
[1] https://github.com/ziglang/zig/issues/6025#issuecomment-3072...
[2] https://github.com/ziglang/zig/issues/157
[3] https://github.com/ziglang/zig/issues/23367
> But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow.
With a 64-bit address space you can reserve large contiguous chunks (e.g. 2MB), while only allocating the minimum necessary for the optimistic case. The real problem isn't memory usage, per se, it's all the VMA manipulation and noise. In particular, setting up guard pages requires a separate VMA region for each guard (usually two per stack, above and below). Linux recently got a new madvise feature, MADV_GUARD_INSTALL/MADV_GUARD_REMOVE, which lets you add cheap guard pages without installing a distinct, separate guard page. (https://lwn.net/Articles/1011366/) This is the type of feature that could be used to improve the overhead of stackful coroutines/fibers. In theory fibers should be able to outperform explicit async/await code, because in the non-recursive, non-dynamic call case a fiber's stack can be stack-allocated by the caller, thus being no more costly than allocating a similar async/await call frame, yet in the recursive and dynamic call cases you can avoid dynamic frame bouncing, which in the majority of situations is unnecessary--the poor performance of dynamic frame allocation/deallocation in deep dynamic call chains is the reason Go switched from segmented stacks to moveable stacks.
Another major cost of fibers/threads is context switching--most existing solutions save and restore all registers. But for coroutines (stackless or stackful), there's no need to do this. See, e.g., https://photonlibos.github.io/blog/stackful-coroutine-made-f..., which tweaked clang to erase this cost and bring it in line with normal function calls.
> Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.
The dynamic allocation problem exists the same whether using stackless coroutines, stackful coroutines, etc. Fundamentally, async/await in Rust is just creating a linked-list of call frames, like some mainframes do/did. How many Rust users manually OOM check Boxed dyn coroutine creation? Handling dynamic stack growth is technically a problem even in C, it's just that without exceptions and thread-scoped signal handlers there's no easy way to handle overflow so few people bother. (Heck, few even bother on Windows where it's much easier with SEH.) But these are fixable problems, it just requires coordination up-and-down the OS stack and across toolchains. The inability to coordinate these solutions does not turn ugly compromises (async/await) into cool features.
> First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. [snip]
> Second, you have to ban dynamic calls (i.e. calls to function pointers)
Both of which are the case for async/await in Rust; you have to explicitly Box any async call that Rust can't statically size. We might frame this as being transparent and consistent, except it's not actually consistent because we don't treat "ordinary", non-async calls this way, which still use the traditional contiguous stack that on overflow kills the program. Nobody wants that much consistency (too much of a "good" thing?) because treating each and every call as async, with all the explicit management that would entail with the current semantics would be an indefensible nightmare for the vast majority of use cases.
> If you allocate too much stack in advance, you end up being not much cheaper than OS threads;
Maybe. A smart event loop could track how many frames are in flight at any given time and reuse preallocated frames as their tasks finish.
> tracking how many times a function can possibly recurse.
> Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions
Proof languages don't really track how many times a function can possibly recurse, they only care that it will eventually terminate. The number of recursive steps can easily depend on the inputs, making it unknown at the moment a function is defined.
Right. They use rules like "a function body must destructure its input and cannot use a constructor" which implies that, since input can't be infinite, the function will terminate. That doesn't mean that the recursion depth is known before running.
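A tiny Lean 4 example of what those rules buy you (and what they don't):

    -- Accepted because every recursive call peels one constructor off the
    -- input (structural recursion), so termination is guaranteed.
    -- The recursion depth is still the runtime length of the list, so
    -- "provably terminating" is not the same as "statically bounded stack".
    def sumList : List Nat → Nat
      | []      => 0
      | x :: xs => x + sumList xs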
Really, really hope they make it easy and ergonomic to integrate with a single-threaded cooperative scheduling paradigm like seastar or glommio.
I wrote a library that I use for this but it would be really nice to be able to cleanly integrate it into async/await.
https://github.com/steelcake/csio
I wrote a library that does single-threaded cooperative scheduling and async I/O; from the ground up, it was designed to implement this interface.
https://github.com/lalinsky/zio
If I know anything about anything it is that a new language debuting a new async/await plan will be well received by casual and expert alike.
have not played with zig for a while, remain in the C world.
with the cleanup attribute (a cheap "defer" for C), the sanitizers, static analysis tools, the memory tagging extension (MTE) for memory safety at the hardware level, etc, and a zig 1.0 still probably years away, what's the strong selling point that means I need to spend time with zig these days? Asking because I'm unsure if I should re-try it.
I don't think any language that's not well-established, let alone one that isn't stabilised yet, would have a strong selling point if what you're looking for right now is to write production code that you'll maintain for a decade or more to come (I mean, companies do use languages that aren't as established as C in production apps, including Zig, but that's certainly not for everyone).
But if you're open to learning languages for tinkering/education purposes, I would say that Zig has several significant "intrinsic" advantages compared to C.
* It's much more expressive (it's at least as expressive as C++), while still being a very simple language (you can learn it fully in a few days).
* Its cross-compilation tooling is something of a marvel.
* It offers not only spatial memory safety, but protection from other kinds of undefined behaviour, in the form of things like tagged unions.
My 2ct: A much more useful stdlib (that's also currently the most unstable part though, and I don't agree with all the design decisions in the stdlib - but mostly still better than a nearly useless stdlib like C provides - and don't even get me started about the C++ stdlib heh), an integrated build system and package manager (even great for pure C/C++ projects), and comptime with all the things it enables, like straightforward reflection and generics (small sketch at the end of this comment).
It also fixes a shitton of tiny design warts that we've become accustomed to in C (also very slowly happening in the C standard, but that will take decades while Zig has those fixes now).
Also, probably the best integration with C code you can get outside of C++. E.g. hybrid C/Zig projects is a regular use case and has close to no friction.
C won't go away for me, but tinkering with Zig is just more fun :)
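For anyone who hasn't seen the comptime reflection point in practice, here's a small self-contained example - plain userland code, nothing stdlib-specific beyond std.meta:

    const std = @import("std");

    // Works for any struct type: the field loop is unrolled at compile time.
    fn dumpStruct(value: anytype) void {
        const T = @TypeOf(value);
        inline for (std.meta.fields(T)) |field| {
            std.debug.print("{s} = {any}\n", .{ field.name, @field(value, field.name) });
        }
    }

    pub fn main() void {
        const Point = struct { x: i32, y: i32 };
        dumpStruct(Point{ .x = 3, .y = 4 });
    }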
I really like Zig as a language, but I think the standard library is a huge weakness. I find the design extremely inconsistent. It feels like a collection of small packages, not a coherent unit. And I personally think this std.Io interface is on a similar path. The idea of abstracting out all I/O calls is great, but the actual interface is sketchy, in my opinion.
https://github.com/ziglang/zig/issues/1629
Could you expand on which design decisions you disapprove of? You've got me curious
Many parts of the stdlib don't separate between public interface and internal implementation details. It's possible to accidentally mess with data items which are clearly meant to be implementation-private (small example at the end of this comment).
I think this is because many parts of the stdlib are too object-oriented, for instance the containers are more or less C++ style objects, but this style of programming really needs RAII and restricted visibility rules like public/private (now Zig shouldn't get those, but IMHO the stdlib shouldn't pretend that those language features exist).
As a sister comment says, Zig is a great programming language, but the stdlib needs some sort of basic and consistent design philosophy which matches the language's capabilities.
Tbf though, C gets around this problem by simply not providing a useful stdlib and delegating all the tricky design questions to library authors ;)
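To make the "accidentally mess with private data" point concrete (container names from a recent Zig; details shift between versions):

    const std = @import("std");

    test "nothing is private" {
        var list: std.ArrayListUnmanaged(u8) = .empty; // `.{}` on older versions
        defer list.deinit(std.testing.allocator);
        try list.append(std.testing.allocator, 42);

        // Compiles without complaint: `items` and `capacity` are plain public
        // fields, so callers can bypass append() and leave the container in a
        // bogus state. The "new" element below is uninitialized memory.
        list.items.len += 1;
    }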
Honestly getting to mess with internal implementation details is my favorite part of using Zig's standard library. I'm working on a multithreaded interpreter right now, where each object has a unique index, so I need effectively a 4GB array to store them all. It's generally considered rude to allocate that all at once, so I'm using 4GB of virtual memory. I can literally just swap out std.MultiArrayList's backing array with vmem, set its capacity, and use all of its features, except now with virtual memory[1].
[1] https://github.com/smj-edison/zicl/blob/bacb08153305d5ba97fc...
I would at least like to see some sort of naming convention for implementation-private properties, maybe like:
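    // (illustrative sketch only - any agreed-upon marker would do,
    //  e.g. a leading underscore)
    const Queue = struct {
        // public API contract
        len: usize,

        // internal implementation details, may change on a whim
        _items: []u8,
        _head: usize,
    };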
...that way you can still access and mess up those 'private' items, but at least it's clear which of the struct items are part of the 'public API contract' and which are internal implementation details that may change on a whim.

This is my main frustration with the push back against visibility modifiers. It's treated as an all-or-nothing approach, as if any support for visibility modifiers locks everyone out from touching those fields.
It could just be a compiler error/warning that has to be explicitly opted into to touch those fields. This allows you to say "I know this is normally a footgun to modify these fields, and I might be violating an invariant condition, but I know what I'm doing".
"Visibility" invariably bites you in the ass at some point. See: Rust, newtypes and the Orphan rule, for example.
As such, I'm happy to not have visibility modifiers at all.
I do absolutely agree that "std" needs a good design pass to make it consistent--groveling in ".len" fields instead of ".len()" functions is definitely a bad idea. However, the nice part about Zig is that replacement doesn't need extra compiler support. Anyone can do that pass and everyone can then import and use it.
> This allows you to say "I know this is normally a footgun to modify these fields, and I might be violating an invariant condition, but I know what I'm doing".
Welcome to "Zig will not have warnings." That's why it's all or nothing.
It's the single thing that absolutely grinds my gears about Zig. However, it's also probably the single thing that can be relaxed at a later date and not completely change the language. Consequently, I'm willing to put up with it given the rest of the goodness I get.
Aah, that's a good point. I suppose you could argue that the doc notes indicate whether something is meant to be accessed directly, like when ArrayList mentions "This field is intended to be accessed directly.", but that's really hard to determine at a glance.
How do you cope with the lack of vector operators? I am basically only writing vector code all day and night, and the lack of infix operators for vectors is just as unacceptable as not having them for normal ints and floats would be.
Nobody is asking for Pizza * Weather && (Lizard + Sleep), that strawman argument to justify ints and floats as the only algebraic types is infuriating :(
I'd love to read a blog post about your Zig setup and workflow BTW.
If you want to write really cursed code, you could use comptime to implement a function that takes comptime strings and parses them into vector function calls ;)
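Not a full expression parser, but a minimal sketch of that direction - dispatch on a comptime operator string (Vec2 and v() are made-up names):

    const std = @import("std");

    const Vec2 = struct { x: f32, y: f32 };

    // "Poor man's infix": the operator is a comptime string, so the dispatch
    // below resolves at compile time into a direct call.
    fn v(a: Vec2, comptime op: []const u8, b: Vec2) Vec2 {
        if (comptime std.mem.eql(u8, op, "+")) {
            return .{ .x = a.x + b.x, .y = a.y + b.y };
        } else if (comptime std.mem.eql(u8, op, "-")) {
            return .{ .x = a.x - b.x, .y = a.y - b.y };
        } else if (comptime std.mem.eql(u8, op, "*")) {
            return .{ .x = a.x * b.x, .y = a.y * b.y };
        } else {
            @compileError("unsupported operator: " ++ op);
        }
    }

    test "poor man's infix" {
        const c = v(v(.{ .x = 1, .y = 2 }, "+", .{ .x = 3, .y = 4 }), "*", .{ .x = 2, .y = 2 });
        try std.testing.expectEqual(@as(f32, 8), c.x);
    }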
I've actually thought about that, and looked a little into canonical/idiomatic ways to implement a DSL in Zig, and didn't find anything small and natural.
I just find it intellectually offensive that this extremely short-sighted line is drawn after ints and floats, and that other algebraic/number types aren't similarly dignified.
If people had to write add(2, mul(3, 4)) etc for ints and floats the language would be used by exactly nobody! But just because particular language designers aren't using complex numbers and vectors all day, they interpret the request as wanting stupid abstract Monkey + Banana * Time or whatever. I really wish more Language People appreciated that there's only really one way to do complex numbers, it's worth doing right once and giving proper operators, too. Sure, use dot(a, b) and cross(a, b) etc, that's fine.
The word "number" is literally half of "complex number", and it's not like there are 1024 ways to implement 2D, 3D, 4D vector and complex number addition, subtraction, multiplication, maybe even division. There are many languages one can look to for guidance here, e.g. OpenCL[0] and Odin[1].
[0] OpenCL Quick Reference Card, masterpiece IMO: https://www.khronos.org/files/opencl-1-2-quick-reference-car...
[1] Odin language, specifically the operators: https://odin-lang.org/docs/overview/#operators
Andrew Kelley is one of my all-time favorite technical speakers, and zig is packed full of great ideas. He also seems to be a great model of an open source project leader.
I would say zig is not just packed full of great ideas, there's a whole graveyard of ideas that were thrown out (and a universe of ideas that were rejected out of hand). Zig is full of great ideas that compose well together.
One that was tossed in 0.15 was `usingnamespace` which was a bit rough to refactor away from.
I really hope this is going to be entirely optional, but I know realistically it just won't be. If Rust (a language that has optional async support) is any example, async will permeate the whole ecosystem. That's to be expected with colored functions. The stdlib isn't too bad, but last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block.
Async clearly works for many people, I do fully understand people who can't get their heads around threads and prefer async. It's wonderful that there's a pattern people can use to be productive!
For whatever reason, async just doesn't work for me. I don't feel comfortable using it and at this point I've been trying on and off for probably 10+ years now. Maybe it's never going to happen. I'm much more comfortable with threads, mutex locks, channels, Erlang style concurrency, nurseries -- literally ANYTHING but async. All of those are very understandable to me and I've built production systems with all of those.
I hope when Zig reaches 1.0 I'll be able to use it. I started learning it earlier this month and it's been really enjoyable to use.
It's optional to the point that you can write a single-threaded version without any io.async/io.concurrent, but you will need to pass the io parameter around if you want to do I/O. You are mistaking what is called "async" here for what other languages call async/await. It's a very different concept. Async in this context just means "spawn this function in the background, but if you can't, just run it right now".
> I do fully understand people who can't get their heads around threads and prefer async.
Those are independent of each other. You can have async with and without threads. You can have threads with and without async.
Yeah, it always mystifies me when people talking about async vs threads when they are completely orthogonal concepts. It doesn't give me the feeling they understand what they are talking about.
> I'm much more comfortable with threads
The example code shown in the first few minutes of the video is actually using regular OS threads for running the async code ;)
The whole thing is quite similar to the Zig allocator philosophy. Just like an application already picks a root allocator to pass down into libraries, it now also picks an IO implementation and passes it down. A library in turn doesn't care about how async is implemented by the IO system, it just calls into the IO implementation it got handed from the application.
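In code, the parallel looks roughly like this. The allocator half is ordinary Zig today; the Io half is a guess at names, since the concrete implementation type is still moving around, and fetchAndStore is a made-up library function:

    const std = @import("std");

    const some_lib = struct {
        fn fetchAndStore(allocator: std.mem.Allocator, io: std.Io, url: []const u8) !void {
            _ = allocator;
            _ = io;
            _ = url; // stand-in for real library code
        }
    };

    pub fn main() !void {
        // The application root picks the concrete allocator...
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const allocator = gpa.allocator();

        // ...and, in the new world, also the concrete Io implementation
        // (type name hypothetical):
        var threaded: std.Io.Threaded = .init(allocator);
        defer threaded.deinit();
        const io = threaded.io();

        // Libraries receive both and never care what's behind them.
        try some_lib.fetchAndStore(allocator, io, "https://example.com");
    }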
Or you can pick one manually. Or you can pick more than one and use them as needed in different parts of your application (probably less of a thing for IO than for allocators).
You don’t have to hope. Avoiding function colours and being able to write libraries that are agnostic to whether the IO is async or not is one of the top priorities of this new IO implementation.
If you don’t want to use async/await just don’t call functions through io.async.
> can't get their heads around threads and prefer async
Wow. Do you expect anyone to continue reading after a comment like that?
same. i don't like async. i don't like having to prepend "await" to every line of code. instead, lately (in js), i've been playing more with worker threads, message passing, and the "atomics" api. i get the benefits of concurrency without the extra async/await-everywhere baggage.
lol it’s just a very different tradeoff. Especially in js those approaches are far more of a “baggage” than async/await
probably, but i'm petty like that. i just really don't like async/await. i'm looking forward to eventually being punished for that opinion!
I understand threads but I like using async for certain things.
If I had a web service using threads, would I map each request to one thread in a thread pool? It seems like a waste of OS resources when the IO multiplexing can be done without OS threads.
> last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block.
Like what? Even file I/O blocks for large files on slow devices, so something like async tarball handling has a use case.
It's best to write in the sans-IO style and then your threading or async can be a thin layer on top that drives a dumb state machine. But in practice I find that passable sans-IO code is harder to write than passable async. It makes a lot of sense for a deep indirect dependency like an HTTP library, but less sense for an app
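For anyone who hasn't seen the sans-IO shape, a rough sketch (the framing protocol and names are made up; the reader parameter assumes the classic anytype reader with a read() method):

    const std = @import("std");

    // Protocol core: knows nothing about sockets, threads, async or `io`.
    // You feed it bytes; it tells you when a complete frame has arrived.
    const FrameMachine = struct {
        frame_len: usize,
        buf: std.ArrayListUnmanaged(u8) = .empty,

        fn feed(self: *FrameMachine, allocator: std.mem.Allocator, bytes: []const u8) !?[]const u8 {
            try self.buf.appendSlice(allocator, bytes);
            if (self.buf.items.len >= self.frame_len)
                return self.buf.items[0..self.frame_len];
            return null; // need more input
        }
    };

    // The thin IO layer on top. A blocking, threaded or event-loop driver all
    // look like this loop; only the source of `read` changes.
    fn pump(machine: *FrameMachine, allocator: std.mem.Allocator, reader: anytype) !?[]const u8 {
        var tmp: [256]u8 = undefined;
        while (true) {
            const n = try reader.read(&tmp);
            if (n == 0) return null; // EOF before a full frame
            if (try machine.feed(allocator, tmp[0..n])) |frame| return frame;
        }
    }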
I agree, async is way more popular than it should be. At least (AFAIU) Zig doesn't color functions so there won't be a big ecosystem rift between blocking and async libraries.
> I do fully understand people who can't get their heads around threads and prefer async
This is a bizarre remark
Async/await isn't "for when you can't get your head around threads", it's a completely orthogonal concept
Case in point: javascript has async/await, but everything is single-threaded; there is no parallelism
Async/await is basically just coroutines/generators underneath.
Phrasing async as 'for people who can't get their heads around threads' makes it sound like you're just insecure that you never learned how async works yet, and instead of just sitting down + learning it you would rather compensate
Async is probably a more complex model than threads/fibers for expressing concurrency. It's fine to say that, it's fine to not have learned it if that works for you, but it's silly to put one above the other as if understanding threads makes async/await irrelevant
> The stdlib isn't too bad but last time I checked a lot of crates.io is filled with async functions for stuff that doesn't actually block
Can you provide an example? I haven't found that to be the case last time I used rust, but I don't use rust a great deal anymore
> This is a bizarre remark
> makes it sound like you're just insecure
> instead of just sitting down + learning it you would rather compensate
Can you please edit out swipes like these from your HN posts? This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.
Your comment would be fine without those bits.
>Case in point: javascript has async/await, but everything is singlethreaded, there is no parallelism, Async/await is basically just coroutines/generators underneath.
Maybe I just wish Zig didn't call it async and used a different name.
It's more about how the code ends up evolving.
Async-await in JS is sometimes used to swallow exceptions. It's very often used to do 1 thing at a time when N things could be done instead. It serializes the execution a lot when it could be concurrent.
And above, the most common mistake. Similar side-effects happen in other languages that have async-await sugar.
It smells as bad as the Zig file interface with intermediate buffers reading/writing to OS buffers until everything is a buffer 10 steps below.
It's fun for small programs but you really have to be very strict to not have it go wrong (performance, correctness).
I think you replied to the wrong person.
That being said, I don't understand your `is_something_true` example.
> It's very often used to do 1 thing at a time when N things could be done instead
That's true, but I don't think e.g. fibres fare any better here. I would say that expressing that type of parallel execution is much more convenient with async/await and Promise.all() or whatever alternative, compared to e.g. raw promises or fibres.
It sounds like your trouble with async is mistaking concurrency for parallelism.
Weird claim since threads were originally introduced as a concurrency primitive, basically a way to make user facing programs more responsive while sharing the same address space and CPU.
The idea of generalizing threads for use in parallel computing/SMP didn't come until at least a decade after the introduction of threads for use as a concurrency tool.
> Weird claim since threads were originally introduced as a concurrency primitive
Wasn't this only true when CPUs were single-core? Only when multi-core CPUs arrived could true parallelism happen (outside of using multiple CPUs).
I almost fell out of my chair at 22:06 from laughter :D
due to broken audio?
Not broken, that's Charlie Brown Teacher.
I guess they didn't get a release from the question asker and so they edited it out?
The problem here though is that the presenter didn't repeat the question for the audience which is a rookie mistake.
text version: https://andrewkelley.me/post/zig-new-async-io-text-version.h...
Thanks - we'll switch the URL to that above and re-up the thread.
I don't understand, why is allocation not part of IO? This seems like effect oriented programming with a kinda strange grouping: allocation, and the rest (io).
Most effect-tracking languages have a GC and allocations/deallocations are handled implicitly.
Writing what we normally call "pure" code requires an allocator but not the `io` object. Code which accepts neither is also allocation-free - something which is a bit of a challenge to enforce in many languages, but it just falls out here (from the lack of closures or globally scoped allocating/effectful functions - though I'm not sure whether `io` or an allocator is now required to call arbitrary extern (i.e. C) functions, which you'd need for a 100% "sandbox").
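Concretely, the signature tells you the effect budget - something like this (the `io`-taking signature is hypothetical since that API is still settling):

    const std = @import("std");

    // No allocator, no io: can neither allocate nor touch the outside world.
    fn clamp01(x: f32) f32 {
        return @min(1.0, @max(0.0, x));
    }

    // Allocator but no io: may allocate, still can't do I/O.
    fn repeat(allocator: std.mem.Allocator, s: []const u8, n: usize) ![]u8 {
        const out = try allocator.alloc(u8, s.len * n);
        for (0..n) |i| @memcpy(out[i * s.len ..][0..s.len], s);
        return out;
    }

    // Only a function that also receives `io` gets to do actual I/O
    // (hypothetical signature):
    // fn fetch(allocator: std.mem.Allocator, io: std.Io, url: []const u8) ![]u8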
https://andrewkelley.me/post/zig-new-async-io-text-version.h...
Text version.
The desynced video makes it a bit painful to watch.
Yup, we'll use the text version and keep the video link in the toptext.
TL;DR. Zig now has an effect system with multiple handlers for alloc and io effects. But the designer doesn't know anything about effect systems and it's not algebraic, probably. Will be interesting to see how this develops.