Parsing input from stdin to structures in Rust
Published
Last updated
I've been working on this post for a while, but about a month ago my firstborn arrived, and he's been getting the lion's share of my attention. I start work again tomorrow, so I decided to just publish this post and be done with it. Apologies if it's a little rough around the edges!
In a post a while back I described in depth my solution to an Advent of Code challenge. This year I took part again, but rather than using Node.js to write programs to solve challenges I decided to use Rust.
The input for these challenges fell into one of three broad categories; input short enough to provide as an argument to the solver program, longer input as a single line in a file, and input over many lines in a file. In the last case, each line usually represents an individual structure of some kind in a collection. In all cases, I chose to keep the input separate. I'll be focussing on how I approached the last case for this article.
Rust has excellent IO support, but parsing of strings into structures is more
difficult than for a language like JavaScript. As a meta-challenge, I found
parsing input interesting, and I learnt a lot about iterators, the Option
and
Result
types, and type casting in the process. I'll use the challenge of
day 10 for illustrative purposes.
To be unix-y, I decided to read input from stdin rather than reading from a file.
cargo run < input.txt
On day 10 the data looked like
position=<-10427, -42253> velocity=< 1, 4>
position=< 21343, 42515> velocity=<-2, -4>
position=<-10417, -52846> velocity=< 1, 5>
Each line represents what I will call a particle, including its initial position and velocity.
Rust ships with some nice functionality for working with input from stdin. The
main
function for my solution to Day 10 looks like
use std::io::{stdin, BufRead};
// ...
fn main() {
let mut particles: Vec<Particle> = stdin()
.lock()
.lines()
.filter_map(|line_result| line_result.ok()) // #1
.filter_map(|line| line.parse().ok()) // #2
.collect(); // #3
// use particles
}
I like the readability of this chain. stdin().lock().lines()
returns an
iterator which yields lines, each of which is a Result
containing
either a string or an error. The rest of this chain filters out errors (#1
),
parses line strings into a custom Particle
type (#2
), and finally collects
elements into a vector of particles (#3
)`.
Steps #1
and #2
both make use of filter_map. This is a method of
iterators which takes a closure. The closure takes an element and returns an
Option
. When the option is None
the element is filtered out. When it returns
Some(value)
the element is mapped to value
in the ongoing stream. Since
elements of the line iterator are Result
s, I call ok
on them, which returns
an Option
type which discards an Err(error)
for a None
.
filter_map
then unwraps the value for the next step in the chain, or filters
out the None
(so line errors are effectively ignored).
Early on in my journey into Rust I found Result
s and Options
quite annoying.
I wrote a lot of if
s and unwraps
to force values out of them. After I while
I came across things like filter_map
, if let
, and match
for working with
them more fluently, I realised that they're really neat.
The next step (#2
) is to parse the line string into a Particle
structure
wrapped in a result, and again discard errors and unwrap. Finally (#3
) the
results are collected into a vector for easy use in the rest of the program.
You might have noticed that I've skipped a big chunk of what's going on in #2
.
How do I parse lines into a structure which I've defined? By implementing
std::str::FromStr
for the Particle
structure, parse
gains the
ability to parse to Particle
. In the chain above, the type of particles
is
Vec<Particle>
, which is enough for Rust to infer that the call to parse
means to parse to Particle
. This type annotation is also used by collect to
determine that we want Particles
to be collected into a vector.
Here's the Particle definition:
struct Particle {
position_x: isize,
position_y: isize,
velocity_x: isize,
velocity_y: isize
}
There's not a lot to see here. It's really just a wrapper for x
and y
positions and velocities as read from the input. I assumed that isize
will
be sufficient to contain any values read (though this probably isn't the correct
integer type to use).
It's possible for a string to not parse to a Particle
. For this, a new type
of error needs to be defined. I'm just going to put it below here since there's
a lot I don't understand about defining errors in Rust yet.
use std::{
fmt::{self, Display, Formatter},
error::Error,
num::ParseIntError
};
#[derive(Debug, Clone)]
struct ParseParticleError;
impl Display for ParseParticleError {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
write!(f, "Unable to parse to a Particle.")
}
}
impl Error for ParseParticleError {
fn description(&self) -> &str {
"Unable to parse to a Particle."
}
fn cause(&self) -> Option<&Error> {
None
}
}
impl From<ParseIntError> for ParseParticleError {
fn from(_error: ParseIntError) -> Self {
ParseParticleError
}
}
The one part I will dwell on is the final implementation. It allows
ParseIntError
errors to be cast to ParseParticleError
. This becomes handy in
the FromStr
implementation:
use regex::Regex;
use lazy_static::lazy_static;
use std::str::FromStr;
const PARTICLE_REGEX: &str = r"<\s*(-?\d+),\s*(-?\d+)>.*<\s*(-?\d+),\s*(-?\d)>";
impl FromStr for Particle {
type Err = ParseParticleError;
fn from_str(particle_str: &str) -> Result<Self, Self::Err> {
lazy_static! {
static ref regex: Regex = Regex::new(PARTICLE_REGEX).unwrap();
}
regex.captures(particle_str)
.ok_or(ParseParticleError)
.and_then(|cap| Ok(Particle {
position_x: cap[1].parse()?,
position_y: cap[2].parse()?,
velocity_x: cap[3].parse()?,
velocity_y: cap[4].parse()?
}))
}
}
While Rust doesn't come with a library for working with regular expressions,
there is the excellent regex
crate on crates.io. I used this
and lazy_static
to match and extract numbers (as strings) from each line
of the input. This is a quick-and-dirty solution to parsing input. Rust has
other crates for parsing input which may be a better fit for other purposes. I
found their APIs difficult to comprehend and laborious though and given that
the source of the input here is low risk and the regular expression
unsophisticated I decided it was a reasonable course.
The creation of the regular expression is wrapped in lazy_static
, which means
that this only occurs once, and subsequent calls to from_str
will reuse it.
The documentation for the regex
library suggests the use of lazy_static
, so
there was no ingenuity on my part.
The regular expression matches lines which look like lines of the above input
sample, capturing the numbers (including a leading -
if there is one).
Onto the chain! The regular expression is applied to particle_str
. The call
to captures
returns an option. The ok_or
step in the chain turns a None
into a ParseParticleError
. When the regular expression finds no match in a
string then it cannot be parsed to a particle!
The next step is an and_then
. This unwraps Some(value)
to value
and passes
the value to its closure. Errors are waved past by this step. The return value
of the closure is a Result
. There are four integers to parse. The regular
expression already filters out most issues, but an integer might still be too
large to be represented as an isize
. The ?
operator unwraps result so
that a Particle
structure can be composed. If any isize parse results in an
error, the error is immediately returned (and cast to a ParseParticleError
).
Since the return value of this closure must be a Result
, the particle is
wrapped in an Ok
.
All of this effort to implement string parsing for Particle
might seem
redundant given that errors are filtered out by the main function anyway, but I
enjoyed exploring this functionality of Rust. If you're interested in the full
code of my solution to day 10, you can find it here.
Update 2019-04-22
I was told today about an interesting crate which automates a lot of the parsing work in my code through use of a macro. The recap crate neatly avoids the need to manually apply a regular expression and extract elements from it, parsing each individually.
Adding parsing of lines to particle structures now looks like:
use recap::Recap;
use serde::Deserialize;
#[derive(Debug, Clone, Deserialize, Recap)]
#[recap(regex = r"<\s*(?P<position_x>-?\d+),\s*(?P<position_y>-?\d+)>.*<\s*(?P<velocity_x>-?\d+),\s*(?P<velocity_y>-?\d)>")]
struct Particle {
position_x: isize,
position_y: isize,
velocity_x: isize,
velocity_y: isize
}
That's it! Errors produced (or lack thereof) might be quite different, I've not
checked it out in detail yet. For the purposes of the task though, it works
well. Named capture groups isolate elements by name, and the corresponding type
of the field in the particle struct
is used to parse each. A gist containing
the complete updated solution can be found here.