Serde's derive is fantastic but if you want humans to read and write files, you can't just depend on the standard derive for everything.
Having your values, even the composite ones, be written as simple strings is often more convenient.
Why
Skip to the How if the motivation and design are clear and you just want to copy paste some snippets.
Here's a real key-binding configuration:
"keybindings": {
    "w": "toggle-wrap",
    "n": "job:nextest",
    "ctrl-u": "scroll-pages(-1)",
    "ctrl-e": "export:analysis"
}
As you see, it's easy to read and modify.
If you know bacon, the meaning of those bindings is obvious. If not, let's say that:
- the w key toggles wrapping lines (it's an "internal" action in bacon)
- the n key triggers the "nextest" job
- the ctrl-u combination triggers scrolling one page up
- the ctrl-e combination triggers the "analysis" export
Bound actions are defined in this enum:
/// An action that can be mapped to a key
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Action {
    Export(String),
    Internal(Internal),
    Job(JobRef),
}
JobRef and Internal are both enums too, with some variants holding structs.
I won't bore you with the dozen involved types.
Now, let's suppose we use the derive based default serialization for actions:
"keybindings": {
    "w": {
        "Internal": "ToggleWrap"
    },
    "n": {
        "Job": {
            "Concrete": {
                "name_or_alias": {
                    "Name": "nextest"
                }
            }
        }
    },
    "ctrl-u": {
        "Internal": {
            "Scroll": {
                "Pages": -1
            }
        }
    },
    "ctrl-e": {
        "Export": "analysis"
    }
}
Eeek. The implementation is leaking, and it's ugly.
And it would be worse if keys were also purely derive-based (I use crokey here).
And worse again with a language that makes structures harder to decipher, like TOML instead of JSON.
If you want humans to be able to read or write those files, you need the first version.
That is, in many cases, you want values to be written as simple strings.
The difference is especially striking when enums are involved.
Choose a format
This can't be automatic: instead of having Serde ensure a standard symmetric (de)serialization, you have to know the value space, and how it will evolve.
If the value space is too uncertain, don't do this: stick to the derive and improve what you can with derive attributes. Let's now assume you know well enough what your values are.
You have to decide on an obvious logic that makes it possible to recognize variants (e.g. Internal or JobRef) and to specify parameters (e.g. the job name or the number of pages to scroll).
Patterns I find convenient are:
- prefixing values by type (as in job:nextest)
- parameterizing with parentheses (as in scroll-pages(-1))
(in reality, those aren't so different, they may even be interchangeable)
There's no problem with being lenient when reading, for example by making your format case insensitive. Different cases will be more natural in JSON, TOML, etc.
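To make the prefix pattern concrete, here's a minimal sketch of a prefixed format with a case-insensitive prefix. It is not bacon's actual code: the Action type below is a simplified stand-in where internal actions and job references are plain names, and the rule that an unprefixed value is an internal action is an assumption made for the example.

```rust
use std::{fmt, str::FromStr};

// Simplified stand-in: in this sketch, internal actions and job references
// are plain names (an assumption made to keep the example short).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Action {
    Export(String),
    Internal(String),
    Job(String),
}

impl fmt::Display for Action {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Export(name) => write!(f, "export:{}", name),
            Self::Job(name) => write!(f, "job:{}", name),
            // Assumed rule: internal actions are written without prefix
            Self::Internal(name) => write!(f, "{}", name),
        }
    }
}

impl FromStr for Action {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        if s.is_empty() {
            return Err("empty action");
        }
        // No colon: assume an internal action
        let (prefix, name) = match s.split_once(':') {
            Some(parts) => parts,
            None => return Ok(Self::Internal(s.to_string())),
        };
        if name.is_empty() {
            return Err("missing name after the prefix");
        }
        // Prefixes are matched case-insensitively, names are kept as written
        if prefix.eq_ignore_ascii_case("export") {
            Ok(Self::Export(name.to_string()))
        } else if prefix.eq_ignore_ascii_case("job") {
            Ok(Self::Job(name.to_string()))
        } else {
            Err("unknown action prefix")
        }
    }
}
```

With this sketch, an internal name containing a colon would not survive a round trip, which is exactly the kind of constraint on the value space you have to settle when choosing the format.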
Start with FromStr/Display
Using the FromStr and Display traits as a basis is most often convenient because
- those are standard traits in Rust
- the parse() function provided by FromStr clearly shows the intent
- you'll sometimes use the string representation as display or as input without involving a serialization language (if the difference isn't obvious, remember that a string converted to JSON is wrapped in double quotes)
Here's the ScrollCommand held by the Scroll variant of Internal, with its string representation:
use {
    lazy_regex::*,
    serde::{de, Deserialize, Deserializer, Serialize, Serializer},
    std::{fmt, str::FromStr},
};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ScrollCommand {
    Top,
    Bottom,
    Lines(i32),
    Pages(i32),
}

impl fmt::Display for ScrollCommand {
    fn fmt(
        &self,
        f: &mut fmt::Formatter,
    ) -> fmt::Result {
        match self {
            Self::Top => write!(f, "scroll-to-top"),
            Self::Bottom => write!(f, "scroll-to-bottom"),
            Self::Lines(n) => write!(f, "scroll-lines({n})"),
            Self::Pages(n) => write!(f, "scroll-pages({n})"),
        }
    }
}

impl FromStr for ScrollCommand {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        regex_switch!(s,
            "^scroll[-_]?to[-_]?top$"i => Self::Top,
            "^scroll[-_]?to[-_]?bottom$"i => Self::Bottom,
            r#"^scroll[-_]?lines?\((?<n>[+-]?\d{1,4})\)$"#i => Self::Lines(
                n.parse().unwrap() // can't fail because [+-]?\d{1,4}
            ),
            r#"^scroll[-_]?pages?\((?<n>[+-]?\d{1,4})\)$"#i => Self::Pages(
                n.parse().unwrap() // can't fail because [+-]?\d{1,4}
            ),
        )
        .ok_or("not a valid scroll command")
    }
}

#[test]
fn test_scroll_command_string_round_trip() {
    let commands = [
        ScrollCommand::Lines(3),
        ScrollCommand::Lines(-12),
        ScrollCommand::Pages(1),
        ScrollCommand::Pages(-2),
        ScrollCommand::Top,
        ScrollCommand::Bottom,
    ];
    for command in commands {
        assert_eq!(command.to_string().parse(), Ok(command));
    }
}

#[test]
fn test_scroll_command_string_alternative_writings() {
    assert_eq!("SCROLL-TO-TOP".parse(), Ok(ScrollCommand::Top));
    assert_eq!("SCROLL_LINES(5)".parse(), Ok(ScrollCommand::Lines(5)));
    assert_eq!("scroll-lines(+12)".parse(), Ok(ScrollCommand::Lines(12)));
    assert_eq!("scroll_pages(-2)".parse(), Ok(ScrollCommand::Pages(-2)));
}
This implementation starts with the Display implementation, so that it documents the rest.
The regex_switch! macro makes the code symmetric and simple, while regular expressions enable lenient parsing (if necessary).
This macro comes with the lazy-regex crate.
Lenient parsing helps users who have other habits, but you usually don't have to go to such an extent. I increased the leniency here for illustration.
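When that much leniency isn't needed, the same format can be parsed with the standard library alone. The following is only a sketch under that assumption: it accepts just the exact lowercase, hyphenated writings, and parse_arg is a hypothetical helper introduced for the example. Unlike the regex version, it accepts any number fitting in an i32 rather than at most four digits. The enum is repeated so that the snippet stands alone.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ScrollCommand {
    Top,
    Bottom,
    Lines(i32),
    Pages(i32),
}

/// Extract the `n` of `name(n)` if `s` has exactly that shape.
/// (hypothetical helper, not part of any crate)
fn parse_arg(s: &str, name: &str) -> Option<i32> {
    s.strip_prefix(name)?
        .strip_prefix('(')?
        .strip_suffix(')')?
        .parse() // i32 parsing accepts an optional leading + or -
        .ok()
}

impl FromStr for ScrollCommand {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Parameterless commands are matched literally
        match s {
            "scroll-to-top" => return Ok(Self::Top),
            "scroll-to-bottom" => return Ok(Self::Bottom),
            _ => {}
        }
        // Parameterized commands are tried one after the other
        if let Some(n) = parse_arg(s, "scroll-lines") {
            return Ok(Self::Lines(n));
        }
        if let Some(n) = parse_arg(s, "scroll-pages") {
            return Ok(Self::Pages(n));
        }
        Err("not a valid scroll command")
    }
}
```

The trade-off is the usual one: fewer dependencies and a stricter format, at the cost of rejecting writings such as SCROLL_LINES(5) that the regex version tolerates.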
Notice how I added tests to my example.
Testing the round trip should always be done.
Such a test is easy to write when you write the FromStr implementation (or before), and it's easy to maintain.
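If many types get a string representation, the round-trip check can be shared. Here's one possible sketch of a generic helper relying only on the standard traits. The name assert_round_trip is hypothetical, not something provided by Serde or any other crate.

```rust
use std::{
    fmt::{Debug, Display},
    str::FromStr,
};

/// Check that a value is unchanged after going through Display then FromStr.
/// (hypothetical helper name, introduced for this example)
pub fn assert_round_trip<T>(value: T)
where
    T: Display + FromStr + PartialEq + Debug,
    T::Err: Debug,
{
    // Write the value as a string...
    let s = value.to_string();
    // ...read it back, failing loudly with the offending string...
    let parsed: T = s
        .parse()
        .unwrap_or_else(|e| panic!("{:?} failed to parse: {:?}", s, e));
    // ...and compare with what we started from
    assert_eq!(parsed, value, "round trip through {:?} changed the value", s);
}
```

Any type implementing both traits can be checked this way, so the loop of a round-trip test could simply call assert_round_trip(command) on each command.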
Add Serde implementations
To base serialization and deserialization on the string representation we just defined, some boilerplate is needed:
impl Serialize for ScrollCommand {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error> where S: Serializer {
        serializer.serialize_str(&self.to_string())
    }
}

impl<'de> Deserialize<'de> for ScrollCommand {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de> {
        let s = String::deserialize(deserializer)?;
        Self::from_str(&s).map_err(de::Error::custom)
    }
}
If you don't want to add this boilerplate to many types, you may also use serde_with and replace those impl blocks with two derives:
#[derive(
    Debug, Clone, Copy, PartialEq, Eq, Hash,
    serde_with::DeserializeFromStr, serde_with::SerializeDisplay,
)]
pub enum ScrollCommand {
Those derives are so simple and obvious that they should be in serde itself, but unfortunately they aren't.
Not just configuration
The example in this article shouldn't let you think this approach is reserved for configuration.
I find it even more important for data exchange as soon as the data has to be understood by humans.
This is the case, for example, for any serious REST/JSON API, for which you'll have lots of examples in documentation, some of them generated. API users read those examples and play with curl-like tools to prepare the calls their application will make.
There, you really want values to be easy to read and write.