String oriented serialization in Rust with Serde

6 minute read Published: 2024-10-13

Serde's derive is fantastic, but if you want humans to read and write files, you can't rely on the standard derive for everything.

Having your values, even the composite ones, be written as simple strings is often more convenient.

Why

Skip to the How if the motivation and design are clear and you just want to copy-paste some snippets.

Here's a real key-binding configuration:

"keybindings": {
   "w": "toggle-wrap",
   "n": "job:nextest",
   "ctrl-u": "scroll-pages(-1)",
   "ctrl-e": "export:analysis"
}

As you can see, it's easy to read and modify.

If you know bacon, the meaning of those bindings is obvious. If not, here's the gist: bound actions are defined in this enum:

/// An action that can be mapped to a key
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Action {
    Export(String),
    Internal(Internal),
    Job(JobRef),
}

JobRef and Internal are both enums too, with some variants holding structs. I won't bore you with the dozen types involved.

Now, let's suppose we use the default, derive-based serialization for actions:

"keybindings": {
   "w": {
     "Internal": "ToggleWrap"
   },
   "n": {
     "Job": {
       "Concrete": {
         "name_or_alias": {
           "Name": "nextest"
         }
       }
     }
   },
   "ctrl-u": {
     "Internal": {
       "Scroll": {
         "Pages": -1
       }
     }
   },
   "ctrl-e": {
     "Export": "analysis"
   }
}

Eek. The implementation is leaking, and it's ugly.

And it would be even worse if keys were also derive-based (I use crokey here).

And worse still in a format where nested structures are harder to decipher, like TOML instead of JSON.

If you want humans to be able to read or write those files, you need the first version.

That is, in many cases, you want values to be written as simple strings.

The difference is especially striking when enums are involved.

Choose a format

This can't be automatic: instead of relying on Serde to ensure a standard, symmetric (de)serialization, you have to know the value space and how it will evolve.

If the value space is too uncertain, don't do this: stick to the derive and improve what you can with derive attributes. Let's now assume you know your values well enough.

You have to decide on an obvious syntax that makes it possible to recognize variants (e.g. Internal or JobRef) and to specify parameters (e.g. the job name or the number of pages to scroll).

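Patterns I find convenient are the ones visible in the configuration above: a function-call syntax like scroll-pages(-1), and a prefix syntax like job:nextest or export:analysis (in reality, those aren't so different; they may even be interchangeable).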
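To make the prefix:value pattern of job:nextest and export:analysis concrete, here is a minimal, std-only sketch. It assumes a hypothetical, simplified Action where Internal and JobRef are reduced to plain strings (the real bacon types are richer enums); the point is only that the text before the first colon selects the variant, and that Display writes back exactly what FromStr reads.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical, simplified stand-in for bacon's Action:
/// the real Internal and JobRef are enums, reduced here to strings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Export(String),
    Internal(String),
    Job(String),
}

impl fmt::Display for Action {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Export(name) => write!(f, "export:{name}"),
            Self::Job(name) => write!(f, "job:{name}"),
            // internal actions are written without any prefix
            Self::Internal(name) => f.write_str(name),
        }
    }
}

impl FromStr for Action {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.split_once(':') {
            // "prefix:value": the prefix selects the variant
            Some((prefix, value)) => {
                if value.is_empty() {
                    return Err(format!("missing value after {prefix:?}"));
                }
                match prefix {
                    "export" => Ok(Self::Export(value.to_string())),
                    "job" => Ok(Self::Job(value.to_string())),
                    _ => Err(format!("unknown prefix {prefix:?}")),
                }
            }
            // no colon: an internal action like "toggle-wrap" or "scroll-pages(-1)"
            None if !s.is_empty() => Ok(Self::Internal(s.to_string())),
            None => Err("empty action".to_string()),
        }
    }
}
```

With this, each of the four bindings of the configuration above parses into an Action whose to_string() gives back the original string. A real implementation would then delegate the Internal text to Internal's own FromStr instead of storing it as a String.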

There's no problem with being lenient when reading, for example by making your format case-insensitive: different casings may feel more natural in JSON, TOML, etc.
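One way to get that leniency without regular expressions is to normalize the input before matching, so that case and the choice between - and _ stop mattering. This is a std-only sketch; the normalize helper, the Internal variants and parse_internal are hypothetical, not bacon's code.

```rust
/// Hypothetical helper: trims, lowercases (ASCII) and maps '_' to '-',
/// so "Toggle_Wrap", "TOGGLE-WRAP" and "toggle-wrap" compare equal.
fn normalize(s: &str) -> String {
    s.trim()
        .chars()
        .map(|c| if c == '_' { '-' } else { c.to_ascii_lowercase() })
        .collect()
}

/// Hypothetical internal actions, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Internal {
    ToggleWrap,
    Quit,
}

fn parse_internal(s: &str) -> Option<Internal> {
    // Match on the normalized form only, so each arm lists
    // just the canonical kebab-case spelling.
    match normalize(s).as_str() {
        "toggle-wrap" => Some(Internal::ToggleWrap),
        "quit" => Some(Internal::Quit),
        _ => None,
    }
}
```

Writing stays canonical (kebab-case here) while reading accepts the variants; only the parser needs to know about the alternatives.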

Start with FromStr/Display

Using the FromStr and Display traits as a basis is most often convenient because

Here's the ScrollCommand held by the Scroll variant of Internal, with its string representation:

use {
    lazy_regex::*,
    serde::{de, Deserialize, Deserializer, Serialize, Serializer},
    std::{fmt, str::FromStr},
};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ScrollCommand {
    Top,
    Bottom,
    Lines(i32),
    Pages(i32),
}
impl fmt::Display for ScrollCommand {
    fn fmt(
        &self,
        f: &mut fmt::Formatter,
    ) -> fmt::Result {
        match self {
            Self::Top => write!(f, "scroll-to-top"),
            Self::Bottom => write!(f, "scroll-to-bottom"),
            Self::Lines(n) => write!(f, "scroll-lines({n})"),
            Self::Pages(n) => write!(f, "scroll-pages({n})"),
        }
    }
}
impl std::str::FromStr for ScrollCommand {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        regex_switch!(s,
            "^scroll[-_]?to[-_]?top$"i => Self::Top,
            "^scroll[-_]?to[-_]?bottom$"i => Self::Bottom,
            r#"^scroll[-_]?lines?\((?<n>[+-]?\d{1,4})\)$"#i => Self::Lines(
                n.parse().unwrap() // can't fail because [+-]?\d{1,4}
            ),
            r#"^scroll[-_]?pages?\((?<n>[+-]?\d{1,4})\)$"#i => Self::Pages(
                n.parse().unwrap() // can't fail because [+-]?\d{1,4}
            ),
        )
        .ok_or("not a valid scroll command")
    }
}
#[test]
fn test_scroll_command_string_round_trip() {
    let commands = [
        ScrollCommand::Lines(3),
        ScrollCommand::Lines(-12),
        ScrollCommand::Pages(1),
        ScrollCommand::Pages(-2),
        ScrollCommand::Top,
        ScrollCommand::Bottom,
    ];
    for command in commands {
        assert_eq!(command.to_string().parse(), Ok(command));
    }
}
#[test]
fn test_scroll_command_string_alternative_writings() {
    assert_eq!("SCROLL-TO-TOP".parse(), Ok(ScrollCommand::Top));
    assert_eq!("SCROLL_LINES(5)".parse(), Ok(ScrollCommand::Lines(5)));
    assert_eq!("scroll-lines(+12)".parse(), Ok(ScrollCommand::Lines(12)));
    assert_eq!("scroll_pages(-2)".parse(), Ok(ScrollCommand::Pages(-2)));
}

This implementation starts with Display, so that it documents the rest.

The regex_switch! macro, which comes with the lazy-regex crate, keeps the code symmetric and simple, while regular expressions make lenient parsing possible when needed.

Lenient parsing helps users who have other habits, but you usually don't have to go to such lengths. I increased the leniency here for illustration.
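If you'd rather not add a regex dependency, the same grammar can be parsed with std string methods only. This is a sketch, not bacon's implementation: like the regexes above it ignores case and accepts -, _ or nothing as separators, but it is slightly more lenient (it drops any number of separators), and it doesn't reproduce the \d{1,4} digit limit, relying instead on i32 parsing, which accepts an optional leading + or - and rejects overflow.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ScrollCommand {
    Top,
    Bottom,
    Lines(i32),
    Pages(i32),
}

impl fmt::Display for ScrollCommand {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Top => write!(f, "scroll-to-top"),
            Self::Bottom => write!(f, "scroll-to-bottom"),
            Self::Lines(n) => write!(f, "scroll-lines({n})"),
            Self::Pages(n) => write!(f, "scroll-pages({n})"),
        }
    }
}

impl FromStr for ScrollCommand {
    type Err = &'static str;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split "name(arg)" into the name and an optional argument.
        let (name, arg) = match s.split_once('(') {
            Some((name, rest)) => {
                let arg = rest.strip_suffix(')').ok_or("missing closing parenthesis")?;
                (name, Some(arg))
            }
            None => (s, None),
        };
        // Lenient name: ignore ASCII case and drop '-' and '_' separators,
        // so "scroll-to-top", "SCROLL_TO_TOP" and "scrolltotop" all match.
        let name: String = name
            .chars()
            .filter(|c| *c != '-' && *c != '_')
            .map(|c| c.to_ascii_lowercase())
            .collect();
        match (name.as_str(), arg) {
            ("scrolltotop", None) => Ok(Self::Top),
            ("scrolltobottom", None) => Ok(Self::Bottom),
            // i32 parsing accepts an optional leading '+' or '-'
            ("scrollline" | "scrolllines", Some(n)) => {
                n.parse::<i32>().map(Self::Lines).map_err(|_| "invalid line count")
            }
            ("scrollpage" | "scrollpages", Some(n)) => {
                n.parse::<i32>().map(Self::Pages).map_err(|_| "invalid page count")
            }
            _ => Err("not a valid scroll command"),
        }
    }
}
```

Compared with regex_switch!, this gives up the one-pattern-per-variant symmetry in exchange for zero dependencies; which one reads better is mostly a matter of taste.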

Notice that I included tests in the example. You should always test the round trip. Such a test is easy to write along with the FromStr implementation (or before it), and it's easy to maintain.
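When several types follow this approach, the round-trip loop can be factored into a generic helper. This is a sketch; assert_round_trip is a hypothetical name, and it works for any type implementing Display, FromStr, PartialEq and Debug.

```rust
use std::fmt::{Debug, Display};
use std::str::FromStr;

/// Hypothetical test helper: checks that parsing `value.to_string()`
/// gives back `value`, for every value in the slice.
fn assert_round_trip<T>(values: &[T])
where
    T: Display + FromStr + PartialEq + Debug,
    <T as FromStr>::Err: Debug,
{
    for value in values {
        let written = value.to_string();
        let read: T = written
            .parse()
            .unwrap_or_else(|e| panic!("failed to parse {written:?}: {e:?}"));
        assert_eq!(&read, value, "round trip changed {written:?}");
    }
}
```

With the ScrollCommand above, a call like assert_round_trip(&[ScrollCommand::Lines(-12), ScrollCommand::Top]) would replace the loop of test_scroll_command_string_round_trip.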

Add Serde implementations

To base serialization and deserialization on the string representation we just defined, some boilerplate is needed:

impl Serialize for ScrollCommand {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error> where S: Serializer {
        serializer.serialize_str(&self.to_string())
    }
}
impl<'de> Deserialize<'de> for ScrollCommand {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de> {
        let s = String::deserialize(deserializer)?;
        Self::from_str(&s).map_err(de::Error::custom)
    }
}

If you don't want to add this boilerplate to many types, you can use serde_with instead and replace those impl blocks with two derives:

#[derive(
    Debug, Clone, Copy, PartialEq, Eq, Hash,
    serde_with::DeserializeFromStr, serde_with::SerializeDisplay,
)]
pub enum ScrollCommand {
    Top,
    Bottom,
    Lines(i32),
    Pages(i32),
}

Those derives are so simple and obvious that they should be in Serde itself, but unfortunately they aren't.

Not just configuration

The example in this article shouldn't make you think this approach is reserved for configuration.

I find it even more important for data exchange as soon as the data has to be understood by humans.

This is, for example, the case of any serious REST/JSON API: its documentation will contain lots of examples, some of them generated, and API users read those examples and play with curl-like tools to prepare the calls their application will make.

There, you really want values to be easy to read and write.