Qubyte Codes

Parsing input from stdin to structures in Rust

Published

Last updated

I've been working on this post for a while, but about a month ago my firstborn arrived, and he's been getting the lion's share of my attention. I start work again tomorrow, so I decided to just publish this post and be done with it. Apologies if it's a little rough around the edges!

In a post a while back I described in depth my solution to an Advent of Code challenge. This year I took part again, but rather than using Node.js to write programs to solve challenges I decided to use Rust.

The input for these challenges fell into one of three broad categories; input short enough to provide as an argument to the solver program, longer input as a single line in a file, and input over many lines in a file. In the last case, each line usually represents an individual structure of some kind in a collection. In all cases, I chose to keep the input separate. I'll be focussing on how I approached the last case for this article.

Rust has excellent IO support, but parsing of strings into structures is more difficult than for a language like JavaScript. As a meta-challenge, I found parsing input interesting, and I learnt a lot about iterators, the Option and Result types, and type casting in the process. I'll use the challenge of day 10 for illustrative purposes.

To be unix-y, I decided to read input from stdin rather than reading from a file.

cargo run < input.txt

On day 10 the data looked like

position=<-10427, -42253> velocity=< 1,  4>
position=< 21343,  42515> velocity=<-2, -4>
position=<-10417, -52846> velocity=< 1,  5>

Each line represents what I will call a particle, including its initial position and velocity.

Rust ships with some nice functionality for working with input from stdin. The main function for my solution to Day 10 looks like

use std::io::{stdin, BufRead};

// ...

fn main() {
  let mut particles: Vec<Particle> = stdin()
    .lock()
    .lines()
    .filter_map(|line_result| line_result.ok()) // #1
    .filter_map(|line| line.parse().ok())       // #2
    .collect();                                 // #3

  // use particles
}

I like the readability of this chain. stdin().lock().lines() returns an iterator which yields lines, each of which is a Result containing either a string or an error. The rest of this chain filters out errors (#1), parses line strings into a custom Particle type (#2), and finally collects elements into a vector of particles (#3)`.

Steps #1 and #2 both make use of filter_map. This is a method of iterators which takes a closure. The closure takes an element and returns an Option. When the option is None the element is filtered out. When it returns Some(value) the element is mapped to value in the ongoing stream. Since elements of the line iterator are Results, I call ok on them, which returns an Option type which discards an Err(error) for a None. filter_map then unwraps the value for the next step in the chain, or filters out the None (so line errors are effectively ignored).

Early on in my journey into Rust I found Results and Options quite annoying. I wrote a lot of ifs and unwraps to force values out of them. After I while I came across things like filter_map, if let, and match for working with them more fluently, I realised that they're really neat.

The next step (#2) is to parse the line string into a Particle structure wrapped in a result, and again discard errors and unwrap. Finally (#3) the results are collected into a vector for easy use in the rest of the program.

You might have noticed that I've skipped a big chunk of what's going on in #2. How do I parse lines into a structure which I've defined? By implementing std::str::FromStr for the Particle structure, parse gains the ability to parse to Particle. In the chain above, the type of particles is Vec<Particle>, which is enough for Rust to infer that the call to parse means to parse to Particle. This type annotation is also used by collect to determine that we want Particles to be collected into a vector.

Here's the Particle definition:

struct Particle {
  position_x: isize,
  position_y: isize,
  velocity_x: isize,
  velocity_y: isize
}

There's not a lot to see here. It's really just a wrapper for x and y positions and velocities as read from the input. I assumed that isize will be sufficient to contain any values read (though this probably isn't the correct integer type to use).

It's possible for a string to not parse to a Particle. For this, a new type of error needs to be defined. I'm just going to put it below here since there's a lot I don't understand about defining errors in Rust yet.

use std::{
  fmt::{self, Display, Formatter},
  error::Error,
  num::ParseIntError
};

#[derive(Debug, Clone)]
struct ParseParticleError;

impl Display for ParseParticleError {
  fn fmt(&self, f: &mut Formatter) -> fmt::Result {
    write!(f, "Unable to parse to a Particle.")
  }
}

impl Error for ParseParticleError {
  fn description(&self) -> &str {
    "Unable to parse to a Particle."
  }

  fn cause(&self) -> Option<&Error> {
    None
  }
}

impl From<ParseIntError> for ParseParticleError {
  fn from(_error: ParseIntError) -> Self {
    ParseParticleError
  }
}

The one part I will dwell on is the final implementation. It allows ParseIntError errors to be cast to ParseParticleError. This becomes handy in the FromStr implementation:

use regex::Regex;
use lazy_static::lazy_static;
use std::str::FromStr;

const PARTICLE_REGEX: &str = r"<\s*(-?\d+),\s*(-?\d+)>.*<\s*(-?\d+),\s*(-?\d)>";

impl FromStr for Particle {
  type Err = ParseParticleError;

  fn from_str(particle_str: &str) -> Result<Self, Self::Err> {
    lazy_static! {
      static ref regex: Regex = Regex::new(PARTICLE_REGEX).unwrap();
    }

    regex.captures(particle_str)
      .ok_or(ParseParticleError)
      .and_then(|cap| Ok(Particle {
        position_x: cap[1].parse()?,
        position_y: cap[2].parse()?,
        velocity_x: cap[3].parse()?,
        velocity_y: cap[4].parse()?
      }))
  }
}

While Rust doesn't come with a library for working with regular expressions, there is the excellent regex crate on crates.io. I used this and lazy_static to match and extract numbers (as strings) from each line of the input. This is a quick-and-dirty solution to parsing input. Rust has other crates for parsing input which may be a better fit for other purposes. I found their APIs difficult to comprehend and laborious though and given that the source of the input here is low risk and the regular expression unsophisticated I decided it was a reasonable course.

The creation of the regular expression is wrapped in lazy_static, which means that this only occurs once, and subsequent calls to from_str will reuse it. The documentation for the regex library suggests the use of lazy_static, so there was no ingenuity on my part.

The regular expression matches lines which look like lines of the above input sample, capturing the numbers (including a leading - if there is one).

Onto the chain! The regular expression is applied to particle_str. The call to captures returns an option. The ok_or step in the chain turns a None into a ParseParticleError. When the regular expression finds no match in a string then it cannot be parsed to a particle!

The next step is an and_then. This unwraps Some(value) to value and passes the value to its closure. Errors are waved past by this step. The return value of the closure is a Result. There are four integers to parse. The regular expression already filters out most issues, but an integer might still be too large to be represented as an isize. The ? operator unwraps result so that a Particle structure can be composed. If any isize parse results in an error, the error is immediately returned (and cast to a ParseParticleError). Since the return value of this closure must be a Result, the particle is wrapped in an Ok.

All of this effort to implement string parsing for Particle might seem redundant given that errors are filtered out by the main function anyway, but I enjoyed exploring this functionality of Rust. If you're interested in the full code of my solution to day 10, you can find it here.

Update 2019-04-22

I was told today about an interesting crate which automates a lot of the parsing work in my code through use of a macro. The recap crate neatly avoids the need to manually apply a regular expression and extract elements from it, parsing each individually.

Adding parsing of lines to particle structures now looks like:

use recap::Recap;
use serde::Deserialize;

#[derive(Debug, Clone, Deserialize, Recap)]
#[recap(regex = r"<\s*(?P<position_x>-?\d+),\s*(?P<position_y>-?\d+)>.*<\s*(?P<velocity_x>-?\d+),\s*(?P<velocity_y>-?\d)>")]
struct Particle {
    position_x: isize,
    position_y: isize,
    velocity_x: isize,
    velocity_y: isize
}

That's it! Errors produced (or lack thereof) might be quite different, I've not checked it out in detail yet. For the purposes of the task though, it works well. Named capture groups isolate elements by name, and the corresponding type of the field in the particle struct is used to parse each. A gist containing the complete updated solution can be found here.