Why broot doesn't enter zip archives

7 minute read Published: 2024-07-08

Broot lets you navigate your filesystem, gives you an overview of any directory, and previews file content. It does so by trimming folders in a balanced way, clearly outlining the structure of the tree.

Entering zip archives and navigating them the same way looks like a natural extension, doesn't it?

That's what I thought at first.

An entry in a Zip archive looks a lot like a node in your regular filesystems: it has a name, and either a list of children or a file content.

So I made a working prototype.

Design of the zip navigating prototype

The idea was to

  1. keep having a path for every node (e.g. /home/dys/my-archive.zip/folderA/B/some-file.txt)
  2. make every path in the application be accompanied by a btype.

The btype is light but lets any part of broot know how to list children, to access the content, etc.

(the next few paragraphs will be easier to read if you know Rust)

Here's the BType:

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BType {
    RealFile,
    RealDir,
    BrokenSymlink(String),
    Symlink {
        final_is_dir: bool,
        final_target: PathBuf,
    },
    Trash,
    TrashItem {
        id: OsString,
    },
    ZipRoot,
    ZipEntryFile {
        root_path_len: usize,
    },
    ZipEntryDir {
        root_path_len: usize,
    },
}

As you may have noticed, there are also btypes aimed at managing the trash. And more were envisioned, because exploring trees à la Broot is attractive enough to make you want to do it on other trees (another example: a list of paths given on stdin).

The BType is small enough, so that having it in memory next to a path isn't really a problem.

(a little more Rust ahead)

To deal with this, I wrote a trait, BPathTrait (sorry for the name), an owned struct, BPathBuf, and a ref struct, BPath.

The owned struct is obvious:

#[derive(Debug, Clone)]
pub struct BPathBuf {
    pub btype: BType,
    pub path: PathBuf,
}

The BPath struct can be built from the BPathBuf and is what's visible in most APIs:

#[derive(Debug, Clone, Copy)]
pub struct BPath<'s> {
    pub btype: &'s BType,
    pub path: &'s Path,
}
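
As a minimal sketch of how the borrowed form can be obtained from the owned one: the `as_bpath` method name is an assumption made for illustration, and the BType is cut down to two variants, so the real code may well differ.

```rust
use std::path::{Path, PathBuf};

// Cut-down BType, just enough for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BType {
    RealFile,
    ZipEntryFile { root_path_len: usize },
}

#[derive(Debug, Clone)]
pub struct BPathBuf {
    pub btype: BType,
    pub path: PathBuf,
}

#[derive(Debug, Clone, Copy)]
pub struct BPath<'s> {
    pub btype: &'s BType,
    pub path: &'s Path,
}

impl BPathBuf {
    /// Hypothetical helper: borrows both fields, so nothing is allocated
    /// and the path isn't copied.
    pub fn as_bpath(&self) -> BPath<'_> {
        BPath {
            btype: &self.btype,
            path: self.path.as_path(),
        }
    }
}
```

Because BPath only holds two references, it is Copy and can be passed by value through the APIs.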

And the trait contains all the functions:

pub trait BPathTrait {
    fn btype(&self) -> &BType;
    fn path(&self) -> &Path;

    // about 20 functions with implementation here
}

The implementations tell you, for a BPath, how to list its children, how to access its content, etc.
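
Here is a hedged sketch of how such a trait can carry the logic: two required accessors, and provided methods that only look at the btype. The methods `can_have_children` and `is_zip_entry` are hypothetical names chosen for this example, and the BType is reduced to a few variants.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BType {
    RealFile,
    RealDir,
    ZipRoot,
    ZipEntryFile { root_path_len: usize },
    ZipEntryDir { root_path_len: usize },
}

pub trait BPathTrait {
    fn btype(&self) -> &BType;
    fn path(&self) -> &Path;

    /// Hypothetical provided method: can this node have children?
    fn can_have_children(&self) -> bool {
        matches!(
            self.btype(),
            BType::RealDir | BType::ZipRoot | BType::ZipEntryDir { .. }
        )
    }

    /// Hypothetical provided method: is this node stored inside a zip archive?
    fn is_zip_entry(&self) -> bool {
        matches!(
            self.btype(),
            BType::ZipEntryFile { .. } | BType::ZipEntryDir { .. }
        )
    }
}

pub struct BPathBuf {
    pub btype: BType,
    pub path: PathBuf,
}

pub struct BPath<'s> {
    pub btype: &'s BType,
    pub path: &'s Path,
}

// Both the owned and the borrowed struct only provide the two accessors;
// everything else comes from the trait's provided methods.
impl BPathTrait for BPathBuf {
    fn btype(&self) -> &BType {
        &self.btype
    }
    fn path(&self) -> &Path {
        &self.path
    }
}

impl BPathTrait for BPath<'_> {
    fn btype(&self) -> &BType {
        self.btype
    }
    fn path(&self) -> &Path {
        self.path
    }
}
```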

To determine the children of a path whose type is ZipEntryDir, you cut the path (basically a string) at root_path_len bytes: the left part is the path to the zip archive, and the right part is the path of the zip entry in the archive.

For example, if you have this bpath:

BPath {
    path: "/home/dys/my-archive.zip/folderA/B",
    btype: ZipEntryDir {
        root_path_len: 24,
    },
}

then to list children, you open the archive found by taking the first 24 bytes of the path (i.e. "/home/dys/my-archive.zip") and you filter the entries to get the children of "folderA/B". Then you close the archive and you don't keep it in memory, so that you can explore a filesystem with hundreds of thousands of zip archives without holding them all in memory.
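
The two steps (cut the path, then filter the entries) can be sketched as follows. This is an illustration under assumptions, not the prototype's code: `split_zip_path` and `direct_children` are hypothetical names, the path is assumed to be valid UTF-8, and the entry names are passed in as plain strings instead of being read from a real archive.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Cut a combined path at `root_path_len` bytes: the left part is the archive
/// on disk, the right part is the entry path inside the archive.
fn split_zip_path(path: &Path, root_path_len: usize) -> Option<(PathBuf, String)> {
    let s = path.to_str()?; // assumption: UTF-8 paths only
    if root_path_len > s.len() || !s.is_char_boundary(root_path_len) {
        return None;
    }
    let (archive, entry) = s.split_at(root_path_len);
    Some((
        PathBuf::from(archive),
        entry.trim_start_matches('/').to_string(),
    ))
}

/// Keep only the direct children of `dir` among the archive's entry names.
/// A set is used because an archive may or may not store explicit entries
/// for its directories.
fn direct_children(entry_names: &[&str], dir: &str) -> Vec<String> {
    let prefix = if dir.is_empty() {
        String::new()
    } else {
        format!("{}/", dir.trim_end_matches('/'))
    };
    let mut children = BTreeSet::new();
    for name in entry_names {
        if let Some(rest) = name.strip_prefix(prefix.as_str()) {
            // First path component below `dir`; empty for the dir's own entry.
            if let Some(first) = rest.split('/').next() {
                if !first.is_empty() {
                    children.insert(first.to_string());
                }
            }
        }
    }
    children.into_iter().collect()
}
```

Nothing here keeps the archive open: the entry names would be read, filtered, and dropped on each listing.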

I checked that this design works well enough, making it possible to browse my disks with hundreds of zip archives, search by name, and not stop at zip archive boundaries (except for a zip archive inside a zip archive, which wasn't opened).

And it can be extended to other kinds of tree nodes.

This was quite sexy.

The problems

As I said, I implemented search on names (and paths). I stopped before implementing searches on file content, and most previews (I still implemented previewing images).

The problem with the content of files which aren't real files on your disks but zip entries, is that you can't do random access unless you load them wholly in memory, which is a problem when it's a big file. Reading the content of a zip entry isn't terribly slow but you can't easily access the end of the file without reading it entirely.

Without random access, search by content is slower, and many kinds of previews are impossible (reminder: broot displays and searches your 2GB log file without breaking a sweat).
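
To make the difference concrete, here is a small sketch using only the standard library. `tail_seek` and `tail_stream` are hypothetical helpers: the first works on anything seekable, such as a real file, while the second only assumes sequential reads, which is what you typically get from a reader that decompresses an archive entry.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Last `n` bytes of a seekable source: jump close to the end and read
/// only what is needed.
fn tail_seek<R: Read + Seek>(src: &mut R, n: u64) -> io::Result<Vec<u8>> {
    let len = src.seek(SeekFrom::End(0))?;
    src.seek(SeekFrom::Start(len.saturating_sub(n)))?;
    let mut buf = Vec::new();
    src.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Last `n` bytes of a sequential-only source: every byte has to go through
/// the reader, even though only the last `n` are kept.
fn tail_stream<R: Read>(src: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut tail: Vec<u8> = Vec::new();
    let mut chunk = [0u8; 8192];
    loop {
        match src.read(&mut chunk) {
            Ok(0) => break,
            Ok(read) => {
                tail.extend_from_slice(&chunk[..read]);
                if tail.len() > n {
                    // Drop the oldest bytes so that at most `n` remain.
                    let excess = tail.len() - n;
                    tail.drain(..excess);
                }
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(tail)
}
```

Both functions return the same bytes, but the cost of the second grows with the size of the whole entry, not with the size of the part you want to display.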

My initial thought with this was that it wasn't a problem: most often you wouldn't automatically enter zip archives thanks to the special paths mechanism, and users would surely understand that sometimes displaying a previewed file was a little different or slower when in a zip archive? So I put that aside.

Then I had a look at other properties of files (permissions, date, etc.). I made those optional and put that little difference aside: users would understand not seeing them on some nodes. This made everything a little more complex, but that was hidden in code and manageable.

What really made a mess of everything was the verbs. Especially user-defined verbs, that is the actions you configure in broot to act on files.

Because for almost any verb, the kind of file matters a lot.

Imagine something as simple as a mv: it really doesn't work the same if the source or destination is a trash item, or an entry in a zip archive. Both the implementation and the ergonomics become more complex.
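
A purely illustrative sketch of that explosion, with three kinds of nodes only. Everything here is hypothetical (`Kind`, `MovePlan`, `plan_move`, and the choice of which combinations are supported); it only shows that a single verb has to decide what to do for each of the nine source/destination pairs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Real,
    TrashItem,
    ZipEntry,
}

#[derive(Debug, PartialEq, Eq)]
enum MovePlan {
    /// Plain filesystem rename.
    Rename,
    /// Take the item out of the trash.
    RestoreFromTrash,
    /// Extract the entry, then rewrite the archive without it.
    ExtractAndRewriteArchive,
    /// Rewrite the archive with the new entry, then delete the source.
    AddToArchive,
    /// No sensible meaning, or not worth implementing.
    Unsupported,
}

/// Hypothetical dispatch for a `mv` verb: 3 kinds already mean 9 cases.
fn plan_move(src: Kind, dst: Kind) -> MovePlan {
    match (src, dst) {
        (Kind::Real, Kind::Real) => MovePlan::Rename,
        (Kind::TrashItem, Kind::Real) => MovePlan::RestoreFromTrash,
        (Kind::ZipEntry, Kind::Real) => MovePlan::ExtractAndRewriteArchive,
        (Kind::Real, Kind::ZipEntry) => MovePlan::AddToArchive,
        _ => MovePlan::Unsupported,
    }
}
```

A user writing their own verb would face the same table, either through filters or through surprising failures.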

I thought about solving this by having both filters for user-defined verbs and heavy specific implementations for basic file managing operations. But this introduced a border between core operations and what a user can do, which made broot less open.

And this complexity doesn't really make sense for users: adding a burden of filters and heavier semantics for every verb definition doesn't look reasonable when you use broot to explore and manage your files.

Thinking again, it was more and more obvious that the shiny new feature was a regression, and was breaking the spirit of Broot.

What I realized

The difficulty in making bpaths work for verbs wasn't an implementation problem, it was a problem of essence.

Paths in your real filesystems are much more interesting than just any kind of tree nodes.

And if you try to limit yourself to their basic properties, most of the value of Broot disappears.

Being able to assume that you can efficiently read the content of a file (to search, jump to a part of a file, etc.) and that you can do basic operations like copying or editing a file has a lot of value.

Making it easy and obvious for users to add verbs without thinking about use cases that they don't have themselves also has a lot of value.

Broot is open and clear, and must stay so.

Conclusion

So this is the story of a failure.

I designed an abstraction making the novel trimmed-tree exploration applicable to any kind of tree. I was happy with the design. I started implementing it. Then I realized it was a bad idea. And I scrapped it.

An interesting thing is how inconsequential this was. I wasn't even sad.

I hadn't announced my intention. I had no boss or team to explain to why the work I had previously promoted was in fact useless. I didn't spend weeks working on a doomed feature.

This once more proved how important it is to be able to toy with new ideas, experiment, and drop them gracefully when they don't turn out to be good.