I’ve just finished building new infrastructure for Dartdocs, and Seth asked me to share my experience with Dart there. Obviously, you’d expect infrastructure for generating Dart docs for all the packages on pub to be written in Dart, though it could be built in any programming language. In a nutshell, all it needs to do is run the dartdoc shell command, which actually generates the documentation, and then upload the generated files somewhere.
This is not my first Dart project. We use Dart at Mixbook (where I work) pretty heavily, mostly for various client-side projects, but we also have some microservices on the backend written in Dart. I love Dart, but like with any technology, using it is a tradeoff. It has its own pros (great async support, amazing tooling and ecosystem, statically typed, but interpreted – no compile time) and cons (the type system is not strict enough, IMHO, and also a bit conservative for a modern language, i.e. where are my non-nullable types, method generics and immutable value objects?! :)), but for me the pros outweigh the cons, so I use it a lot, and I’m actually pretty happy with it.
The requirements for http://www.dartdocs.org were pretty simple – the infrastructure for generating docs for the Dart pub ecosystem, which is:
- Efficient – should use the available computing resources efficiently, avoid unnecessary work
- Scalable – allows regenerating the documentation for the whole Dart ecosystem (which is thousands and thousands of Dart packages) quickly, within several hours, in case there is a new version of the dartdoc tool.
- Reliable – should work unattended, restore itself in case of failures, and should be easy to debug in case of failures.
These requirements describe pretty much any web service you usually write :) So, below I’ll describe several tricks I used in dartdocs to achieve these.
As I already said, Dart has really nice async support; for me this is one of its killer features. Futures and Streams were added to the SDK in its very early versions, which means all the packages just use the SDK’s ones, and nobody tries to reinvent their own implementations of Futures or Streams. And this is a big deal! A lot of packages are built with concurrency support as well: methods return Futures and Streams, and that makes these packages composable with each other, since every package uses the same implementation of Futures and Streams. Web frameworks, database drivers, unit test packages, file system tools, loggers, HTTP libraries, socket handling, etc. – all of them usually support concurrency and non-blocking operations. This allows us to build event-based, fully concurrent web services pretty easily (especially after async/await support was added in Dart 1.9!).
So, a workflow for the main dartdocs.org script is:
- Download all the existing package names and versions from pub (or refresh the list if already downloaded)
- Download (or refresh) the metadata for already generated packages from Google Cloud Datastore (was it successfully generated or not, generation datetime, etc)
- Figure out the next batch of packages to generate
- Actually generate the docs
- Upload the docs to Google Cloud Storage
- Update the metadata for the newly generated packages on Google Cloud Datastore
Mostly, all of these are network calls (except the actual generation of docs), so it’d be dumb to do them sequentially. But thanks to all the things about async and concurrency I described above, it can be done pretty easily! The HTTP lib and the libs for working with Google Cloud services all return Futures, of course, so we can group them and then handle these groups in parallel. E.g., you could implement uploading to Google Cloud Storage in the following way:
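A minimal sketch of this grouping idea, using only `dart:async`. The `uploadFile` function is a hypothetical stand-in for a real Google Cloud Storage call, and the group size of 20 is taken from the discussion that follows; none of the names come from the dartdocs.org code.

```dart
import 'dart:async';
import 'dart:math';

/// Hypothetical stand-in for a real Google Cloud Storage upload call.
Future<void> uploadFile(String path) async {
  // Simulates network latency; a real implementation would call the storage API.
  await Future<void>.delayed(const Duration(milliseconds: 10));
}

/// Uploads [paths] in groups of [groupSize]. Files within a group are uploaded
/// concurrently; the next group starts only after the current one finishes.
/// Returns the number of uploaded files.
Future<int> uploadInGroups(
  List<String> paths,
  Future<void> Function(String path) upload, {
  int groupSize = 20,
}) async {
  var uploaded = 0;
  for (var start = 0; start < paths.length; start += groupSize) {
    final end = min(start + groupSize, paths.length);
    final group = paths.sublist(start, end);
    // Future.wait completes when every upload in the group has completed.
    await Future.wait(group.map(upload));
    uploaded += group.length;
  }
  return uploaded;
}

Future<void> main() async {
  final paths = List.generate(45, (i) => 'docs/file_$i.html');
  final count = await uploadInGroups(paths, uploadFile);
  print('Uploaded $count files');
}
```

If any upload in a group fails, `Future.wait` completes with that error, so the caller decides whether to retry or give up.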
Pretty simple, and easy to read and reason about. There is some room for improvement (you could queue all the uploads, and just make sure you handle 20 at a time, using some task queue, for example), but this example is simple enough to demonstrate the use case.
This non-blocking nature may not be so important for dartdocs.org, but it becomes way more important when you use it in web services with an HTTP front-end. Usually, you don’t spend a lot of time and CPU on crunching numbers in a request; instead you just make a lot of network calls to various services and databases, combine the data together and return it to the requester. Something like this:
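As a rough illustration of such a request handler, here is a sketch built on `dart:async` alone. The `fetchUser` and `fetchOrders` functions are hypothetical placeholders for real database or HTTP calls, and the response format is made up for the example.

```dart
import 'dart:async';

// Hypothetical non-blocking service calls; each one stands in for a real
// database query or HTTP request.
Future<Map<String, String>> fetchUser(int id) async {
  await Future<void>.delayed(const Duration(milliseconds: 20));
  return {'id': '$id', 'name': 'user$id'};
}

Future<List<String>> fetchOrders(int userId) async {
  await Future<void>.delayed(const Duration(milliseconds: 20));
  return ['order-a', 'order-b'];
}

/// Handles one request: starts both calls first so they overlap, then awaits
/// the results and combines them into a response body.
Future<String> handleRequest(int userId) async {
  final userFuture = fetchUser(userId);
  final ordersFuture = fetchOrders(userId);
  final user = await userFuture;
  final orders = await ordersFuture;
  return '${user['name']}: ${orders.join(', ')}';
}

Future<void> main() async {
  // Several requests are in flight at once on a single thread, because the
  // handler only waits and never blocks.
  final responses = await Future.wait([1, 2, 3].map(handleRequest));
  responses.forEach(print);
}
```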
If every piece is non-blocking, the app spends very little computation time on each request and is free to process other concurrent requests. So you can process a lot of requests concurrently in a single thread. Which is great!
Scalability in this context means – if we want to finish regenerating docs faster, we just need to launch more instances in the cloud. So, we need a way to split the work between them, and rebalance the work when we add or remove the instances.
Let’s observe some properties of the pub Dart packages – each package is uniquely identified by its name and version. It’s immutable – the source code of the package never changes. Also, packages are never deleted; all their versions will stay in pub forever. Developers can only add new packages, so the list is always growing and never shrinking.
I hosted the infrastructure on Google Cloud, on Google Compute Engine (GCE) instances. GCE has the ability to create instance groups, where you can specify how many instances should run within that group. Each instance within the group has a unique name, and you can get a list of all the instance names within a group. So, splitting the work in this case is pretty simple – after finishing generating docs for another batch of packages, we ask for the list of currently existing GCE instances within the group, sort it, find the index of the current instance within the list, and depending on that retrieve the next batch of packages from the whole list. E.g., say we have 5 instances in the group – foo1, foo2, foo3, foo4, and foo5 – and the current instance name is foo2. The total number of unhandled packages is 10000, and the batch size is 20 packages. Each instance then owns a slice of 10000 / 5 = 2000 packages, and foo2 is second in the sorted list, so we’ll take the range 2001-2020 from the list of all unhandled packages as the next batch.
Sometimes, when we increase or reduce the number of instances in the group, we may end up with 2 instances generating docs for the same package, but that’s fine – they will produce the same result, and it happens rarely enough that we can ignore it.
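The batch selection can be sketched as a small pure function. This is an assumption about the scheme inferred from the worked example (equal contiguous slices per instance, ordered by sorted instance name); the function name and signature are hypothetical.

```dart
import 'dart:math';

/// Returns the next batch for [instanceName]. The unhandled packages are
/// divided into equal contiguous slices, one per instance (in sorted name
/// order), and the batch is the first [batchSize] items of this instance's slice.
List<String> nextBatch(
  List<String> unhandled,
  List<String> instanceNames,
  String instanceName, {
  int batchSize = 20,
}) {
  final sorted = [...instanceNames]..sort();
  final index = sorted.indexOf(instanceName);
  if (index < 0) {
    throw ArgumentError('$instanceName is not in the instance group');
  }
  final sliceSize = unhandled.length ~/ sorted.length;
  final start = index * sliceSize;
  final end = min(start + batchSize, unhandled.length);
  return unhandled.sublist(start, end);
}

void main() {
  // The numbers from the example: 5 instances, 10000 packages, batches of 20.
  final packages = List.generate(10000, (i) => 'package-${i + 1}');
  final instances = ['foo3', 'foo1', 'foo5', 'foo2', 'foo4'];
  final batch = nextBatch(packages, instances, 'foo2');
  print('${batch.first} .. ${batch.last}'); // package-2001 .. package-2020
}
```

Because every instance recomputes its slice from the current instance list after each batch, adding or removing instances rebalances the work without any coordination between them.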
The scripts should work unattended, and should try to restore themselves after failures. Also, there are a lot of network calls happening during the workflow of the script, so it should be tolerant of network failures and external service failures. And in case of a failure, we need to know why the failure happened. We can achieve that with logging, timeouts and retries, and if we are out of retries, we just fail completely and let something like monit get us up and running again.
It’s actually very simple to do – let’s define this function:
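One possible shape for such a function is sketched below. The parameter names and default delays are assumptions, and it deliberately returns `Future<dynamic>` to stay close to the untyped version discussed in the text.

```dart
import 'dart:async';

/// Runs [body] (sync or async). If it throws, waits for the next duration in
/// [delays] and tries again; once the delays are used up, the last error is
/// rethrown. The total number of attempts is `delays.length + 1`.
Future<dynamic> retry(
  FutureOr<dynamic> Function() body, {
  List<Duration> delays = const [
    Duration(seconds: 1),
    Duration(seconds: 5),
    Duration(seconds: 15),
  ],
}) async {
  for (var attempt = 0;; attempt++) {
    try {
      return await body();
    } catch (error) {
      if (attempt >= delays.length) rethrow;
      print('Attempt ${attempt + 1} failed: $error, '
          'retrying in ${delays[attempt]}');
      await Future<void>.delayed(delays[attempt]);
    }
  }
}

Future<void> main() async {
  var calls = 0;
  // Hypothetical flaky call: fails twice, then succeeds.
  final result = await retry(() async {
    calls++;
    if (calls < 3) throw StateError('temporary network failure');
    return 'uploaded';
  }, delays: const [Duration(milliseconds: 10), Duration(milliseconds: 20)]);
  print('$result after $calls attempts');
}
```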
Now, with any sync or async body(), in case it throws an exception, it will be retried several times after the specified durations, and only then fail. We wrap every single network call into that retry.
Unfortunately, we lose the return type of body() in this case; this would be a great use case for method generics (there is a ticket though).
Every network call in the dartdocs.org scripts is wrapped with retry(), and it greatly reduces the number of failures that happen just because the network had problems or some service had an occasional internal server error.
It’s good practice to specify meaningful timeouts for things that are not under your control (like network calls or external shell command runs). There is the Future.timeout() method, which returns a future that completes with a TimeoutException if the original future does not complete within the specified duration.
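A short sketch of `Future.timeout()` in use; `slowCall` is a hypothetical stand-in for a network call.

```dart
import 'dart:async';

/// Hypothetical slow operation standing in for a network call.
Future<String> slowCall() async {
  await Future<void>.delayed(const Duration(milliseconds: 200));
  return 'done';
}

/// Returns the call's result, or 'timed out' if it takes longer than [limit].
/// Note that the underlying operation is not cancelled; only the waiting stops.
Future<String> callWithTimeout(Duration limit) async {
  try {
    return await slowCall().timeout(limit);
  } on TimeoutException {
    return 'timed out';
  }
}

Future<void> main() async {
  print(await callWithTimeout(const Duration(milliseconds: 50))); // timed out
  print(await callWithTimeout(const Duration(seconds: 1))); // done
}
```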
I used this approach everywhere at first, but then figured out that it doesn’t really work for shell scripts – if the shell script hangs, the timeout fires and the Dart script continues to run, but the shell script is not killed; it still hangs. So, a better approach is to use the timeout program, which kills the script and exits with a non-zero status after the specified timeout.
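A sketch of that approach with `dart:io`, assuming the GNU coreutils `timeout` program is on the PATH (it exits with status 124 when it had to kill the command). The `runWithTimeout` helper is a hypothetical name, and `sleep` stands in for a long-running command such as dartdoc.

```dart
import 'dart:io';

/// Runs [command] under the coreutils `timeout` program, which kills it after
/// [seconds] and then exits with status 124.
Future<ProcessResult> runWithTimeout(
  int seconds,
  String command,
  List<String> args,
) {
  return Process.run('timeout', ['$seconds', command, ...args]);
}

Future<void> main() async {
  // `sleep 5` is a stand-in for a command that hangs.
  final result = await runWithTimeout(1, 'sleep', ['5']);
  if (result.exitCode == 124) {
    print('Command timed out and was killed');
  } else {
    print('Command exited with status ${result.exitCode}');
  }
}
```

Since the child process is killed by `timeout` itself, nothing is left hanging when the Dart side moves on.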
Sometimes things go wrong, and we need to know why. To figure out why we got some exception, it’s usually not enough to have just that exception and the stack trace; you usually need to know what happened before that, and that’s why we need logging.
The most popular logging package in Dart is named (surprisingly!) logging. The nice thing about it is that it exposes a stream with all the log records as part of its API, so you can subscribe to that stream and do whatever you want with it – write it to STDOUT, to a file, etc.
E.g., if we just want to print every log record into STDOUT, it may look something like this:
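A minimal sketch, assuming the `logging` package is listed in `pubspec.yaml`. The logger name `dartdocs`, the messages and the output format are made up for the example.

```dart
import 'package:logging/logging.dart';

void main() {
  // Let every level through, and print each record to STDOUT as it arrives.
  Logger.root.level = Level.ALL;
  Logger.root.onRecord.listen((LogRecord record) {
    print('[${record.level.name}] ${record.time} '
        '${record.loggerName}: ${record.message}');
  });

  // 'dartdocs' is a hypothetical logger name.
  final log = Logger('dartdocs');
  log.info('Generating docs for the next batch');
  log.warning('Upload failed, retrying');
}
```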
Since the code is async from top to bottom, what would you expect to see in the stack trace in case of a failure? E.g. for this code:
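A small sketch of such code, with hypothetical function names. It uses `then` callbacks, so the failing closure is invoked from the event loop rather than from its logical caller; the exact frames printed depend on the Dart version.

```dart
import 'dart:async';

/// Parses [raw] after an asynchronous gap; the callback is invoked later from
/// the event loop, not directly by the caller of this function.
Future<int> parseVersionCount(String raw) {
  return Future<void>.delayed(const Duration(milliseconds: 10))
      .then((_) => int.parse(raw));
}

/// One more asynchronous layer on top, to make the call chain longer.
Future<int> loadPackage(String raw) {
  return parseVersionCount(raw).then((count) => count + 1);
}

Future<void> main() async {
  try {
    await loadPackage('not a number');
  } catch (error, stackTrace) {
    print(error);
    print(stackTrace);
  }
}
```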
Probably just a few frames: the line that threw, followed by frames from the async machinery and the event loop.
The first line shows where the exception happened, but we have no idea where we came from when we reached that line, making the stacktrace useless. This is what stack traces usually look like in heavily async programs, and that’s one of the reasons why it’s hard to debug them.
Thankfully, the stack_trace package gives you the tools to solve that problem. If you wrap everything into Chain.capture, you’ll get way nicer stack traces, which include the chain of asynchronous calls that led to the failing line.
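A minimal sketch of wrapping a program in `Chain.capture`, assuming the `stack_trace` package is listed in `pubspec.yaml`. The `failLater` function is a hypothetical failing operation.

```dart
import 'dart:async';

import 'package:stack_trace/stack_trace.dart';

/// Hypothetical operation that fails after an asynchronous gap.
Future<void> failLater() {
  return Future<void>.delayed(const Duration(milliseconds: 10)).then((_) {
    throw StateError('something went wrong');
  });
}

void main() {
  // Everything started inside the callback runs in a zone that records a
  // stack trace at each asynchronous step.
  Chain.capture(() async {
    await failLater();
  }, onError: (Object error, Chain chain) {
    print(error);
    // `terse` folds frames from core libraries to keep the output short.
    print(chain.terse);
  });
}
```

Unhandled errors inside the captured zone arrive in `onError` together with the whole chain, so a single handler at the top of the script is enough.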
It adds some performance and memory overhead, but it’s negligible for the dartdocs.org scripts, and it simplifies debugging a lot.
All in all, it was a pretty straightforward project, and Dart didn’t give me any unpleasant surprises, everything works just fine and as expected, and generates documentation for the new packages every day. Check it out, if you haven’t yet!