Libxo: Easy way to generate text, XML, JSON, and HTML

JNRowe · on July 14, 2023

Every time I see this I'm hoping that it has taken off, because it feels like such an obvious improvement. Instead of that we have some support for it in FreeBSD, and a homegrown solution in some packages like util-linux. Yeah, there are some concerns, but the concept seems sound and the implementation can be iterated upon.

For instance, years later it still isn't packaged in Debian. If nothing among the tens of thousands of Debian packages depends on it, there presumably must be a good reason.

It strikes me as one of those libtermkey¹/libvterm things where Leonerd pushed it for years before anybody really used it, despite it being a seemingly obvious improvement over the status quo.

¹ https://www.leonerd.org.uk/code/libtermkey/

ComputerGuru · on July 14, 2023

> Instead of that we have some support for it in FreeBSD

My Google-fu is failing me right now, but FreeBSD also has a shared library for reading and parsing config files, providing a common (or universal) DSL for all conf files that use the library. This is one of the benefits of using an OS instead of a distribution: all the tools are developed holistically, and across-the-board refactors, such as providing a shared, universal input or output format or sandboxing everything with Capsicum, are much more feasible.

EDIT

Remembered it. Surprised at how bad Google was at finding this, though!

UCL - Universal Configuration Language [0]. Introduced in a paper by Allan Jude in 2015 [1]. Man page: libucl(3) [2].

[0]: https://github.com/vstakhov/libucl/

[1]: https://papers.freebsd.org/2015/bsdcan/allanjude-ucl/

[2]: https://man.freebsd.org/cgi/man.cgi?query=libucl&sektion=3&f...

loeg · on July 15, 2023

I wouldn't say ucl or capsicum has seen especially wide adoption inside FreeBSD. The monorepo helps, but the FreeBSD dev community just isn't that big and there isn't a huge incentive to work on using these things.

sam_bristow · on July 14, 2023

I would love having structured output from shell commands, but for now I'd settle for people using stdout and stderr correctly.

moody__ · on July 14, 2023

I see people say they would like this quite often. There is a command-line shell that does exactly this: PowerShell is structured in this way. But no one is throwing away their sh/bash/tcsh/ksh for PowerShell. I think this is just a classic example of the grass being greener on the other side.

biugbkifcjk · on July 15, 2023

I'd take powershell with proper system integration over bash any day.

moody__ · on July 15, 2023

The "with proper system integration" is the kicker. PowerShell does run on Linux[0], and there is nothing stopping you or anyone else from creating this "proper integration", but it has yet to be done. I find this telling about what the real value proposition of this kind of shell is in the first place.

I do sympathize with you over bash; I have seen people wield it in quite horrific ways. In general I find that people underutilize functions. For example, functions in most shell languages have stdin, stdout, and stderr just like the programs you call, but I often see people miss this and thus miss the ability to compose said functions.

I switched over to using the plan9 rc[1] for my scripts years ago and never looked back.

[0] https://learn.microsoft.com/en-us/powershell/scripting/insta...

[1] http://doc.cat-v.org/plan_9/4th_edition/papers/rc

biugbkifcjk · on July 15, 2023

Yea, I guess I could create that integration, but the key would be getting it included as the default on Linux installs. I like to use what is generally available, so that I'm not completely lost when I remote into an unfamiliar server.

boris · on July 15, 2023

> I'd settle for people using stdout and stderr correctly.

I use this as a litmus test when looking at an unfamiliar codebase: if the author couldn't care less about writing diagnostics to stderr instead of stdout, there is little chance they cared to get more tricky stuff right.

Another test that goes hand in hand with this one is to check whether error messages use a consistent style: for example, all start with a capital letter or all with a lowercase one, and all end with a period or none do. Bonus points for a consistent quoting style.
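A minimal sketch of the stdout/stderr split in C, for illustration only. The helper name `report_size` and its stream parameters are made up for this example; a real tool would simply pass `stdout` and `stderr`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the bytes in `path`.
 * Results go to `out`, diagnostics go to `err`.
 * Returns 0 on success, 1 on failure. */
int report_size(const char *path, FILE *out, FILE *err)
{
    assert(path != NULL && out != NULL && err != NULL);

    FILE *in = fopen(path, "rb");
    if (in == NULL) {
        /* Diagnostic: never mixed into the data stream. */
        fprintf(err, "report_size: cannot open '%s': %s\n",
                path, strerror(errno));
        return 1;
    }

    unsigned long bytes = 0;
    while (fgetc(in) != EOF)
        bytes++;
    fclose(in);

    /* Result: the only thing a downstream pipe should see. */
    fprintf(out, "%lu %s\n", bytes, path);
    return 0;
}
```

Called as `report_size(path, stdout, stderr)`, a failure leaves the data stream empty, so the next program in a pipeline never receives the error text as input.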

ary · on July 14, 2023

This, at least in concept, looks like a potential successor to printf() et al. The general accessibility of it is lacking given that it's a C-only API at the moment (there don't appear to be bindings for other languages), and I'm left questioning whether format strings are the best way. Perhaps worse is better in this case.

When thinking about this problem I've not been able to get beyond the decision of "should it be done with something like a builder pattern and a graph of objects/structures" or "should it be done with a DSL" (which is what I consider the format strings approach to be). A DSL is more immediately convenient when creating output, but when you want to understand the structure you're emitting it seems better to have code that is explicit and imperative.

loeg · on July 14, 2023

libxo is in practice a poor approach to generating structured output from unix utilities. There are at least a few problems. The format strings do not easily replace existing formatted prints, so it is not straightforward to adopt. For anything more complicated than simple row records, you have to change the structure of your program significantly and might as well just use a different path for formatting structured output. It is unaware of locales, and as a result, butchers text in non ASCII/UTF-8 encodings. Finally, a separate-binary-with-structured-text-output is a poor library interface to quite a lot of these utilities -- a callable C API would be more broadly useful.

zokier · on July 14, 2023

Seeing that this originates from the FreeBSD ecosystem, did it actually get adopted widely in the FreeBSD base system? At least I interpreted that to have been the goal: https://juniper.github.io/libxo/libxo-manual.html#can-you-sh...

tedunangst · on July 14, 2023

It's used inconsistently. df uses it, but not du. ps, but not ls.

yyyk · on July 14, 2023

Libxo is used in FreeBSD. I can't say I'm a fan of the approach though.

Typical printf usage is imperative and additive:

    if (enter)
        printf("Hello ");
    else
        printf("Goodbye ");
    printf("World!\n");

Using the format string forces the programmer to keep implicit state (the document format) all over the place or get an inconsistent document. For example, imagine the first printf calls the column 'Text' and the others call it 'Output'. We can easily do this for a single format, but the complexity will get higher the more we add.

If you do this properly (emit to an object and render from that), the result is trivially consistent. The difficulty here is to get streaming; this is, however, not always required, and it can be achieved with a little effort.
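A rough sketch of the "emit to an object and render from that" idea, reusing the Hello/Goodbye example. The `greeting` struct, `make_greeting`, and the `render_*` functions are hypothetical names for this illustration, not part of libxo. Because the field names appear only inside the renderers, the 'Text' versus 'Output' mismatch cannot happen.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical document object: built once, rendered many ways. */
struct greeting {
    const char *text;    /* "Hello" or "Goodbye" */
    const char *target;  /* who is being addressed */
};

/* The program logic only fills in the object; it prints nothing. */
struct greeting make_greeting(int enter)
{
    struct greeting g = { enter ? "Hello" : "Goodbye", "World" };
    return g;
}

/* Plain-text renderer. Returns the length snprintf reports. */
int render_text(char *buf, size_t size, const struct greeting *g)
{
    assert(buf != NULL && size > 0 && g != NULL);
    return snprintf(buf, size, "%s %s!\n", g->text, g->target);
}

/* JSON renderer. No escaping is done: the values in this sketch
 * are fixed ASCII words. */
int render_json(char *buf, size_t size, const struct greeting *g)
{
    assert(buf != NULL && size > 0 && g != NULL);
    return snprintf(buf, size, "{\"text\":\"%s\",\"target\":\"%s\"}\n",
                    g->text, g->target);
}
```

The cost is the one noted above: nothing is written until the object is complete, so streaming needs extra work.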

mpweiher · on July 14, 2023

> The difficulty here is to get streaming

Polymorphic Write Streams do this.

ACM DL: https://dl.acm.org/doi/10.1145/3359619.3359748

pdf: http://www.hirschfeld.org/writings/media/WeiherHirschfeld_20...

Code: https://github.com/mpw/MPWFoundation/tree/master/Streams.sub...

Fast JSON parsing using this approach: https://blog.metaobject.com/2020/04/somewhat-less-lethargic-...

Presentation (DLS '19): https://www.youtube.com/watch?v=DG5MtsMojgI

lelanthran · on July 14, 2023

> Using the format string forces the programmer to keep implicit state (the document format) all over the place or get an inconsistent document.

I'm not understanding your objection[1]; surely you would only define the libxo format string once, and then reuse it everywhere? Without libxo you'd need to duplicate your code everywhere for every output format you want to support.

IOW, you'd have to construct your libxo format string using a string concatenation library, something like this:

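   /* libxo field descriptors separate the name from the format with '/' */
   const char *s1 = "{:Text/%7ju}";
   const char *s2 = "{:Output/%7ju}";
   char final[64];

   /* Choose the first field, then append the second. snprintf (from
      <stdio.h>) stands in for the non-standard strconcat and avoids
      leaking a strdup copy. */
   snprintf (final, sizeof final, "%s%s", enter ? s1 : s2, s2);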

Isn't that a better mechanism than printf?

[1] It's late, I've the flu and feeling a little stupid right now. Also, this is the first I've seen this project.

yyyk · on July 14, 2023

>surely you would only define the libxo format string once, and then reuse it everywhere

I would be rather scared of using a variable for a format string. IMHO, these types of format strings are an antipattern for a different reason: we don't see the format at the point of use, and if we make some all-too-easy mistakes, we get a crash or a CVE*. I think nowadays there are tools to do some verification**, and I guess we could use an IDE (but most C programmers don't?), but I am unfamiliar with any such tool that supports libxo-style format strings.

* https://en.wikipedia.org/wiki/Format_string_attack

** IIRC, GCC/Clang eventually added a verifier? But it doesn't apply to all cases or scanf? I don't recall.
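For plain printf-style strings, the compiler check alluded to here can be sketched as follows. GCC and Clang accept a `format` function attribute that lets `-Wformat` check calls to user-defined printf-like wrappers; this sketch covers printf conventions only and says nothing about libxo field descriptors. `log_line`, `log_untrusted`, and the `PRINTF_LIKE` macro are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The attribute is a GCC/Clang extension; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Argument 3 is the format string, arguments from 4 on are checked
 * against it at compile time when -Wformat is enabled. */
int log_line(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_line(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Untrusted text is passed as data behind a fixed "%s",
 * never as the format string itself. */
int log_untrusted(char *buf, size_t size, const char *text)
{
    return log_line(buf, size, "%s", text);
}
```

With the attribute in place, a call such as `log_line(buf, n, "%d files", "three")` should draw a `-Wformat` warning, while conversion specifiers inside the argument to `log_untrusted` are copied through as ordinary characters.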

mananaysiempre · on July 14, 2023

> Typical printf usage is imperative and additive:

> if (enter) printf("Hello "); else printf("Goodbye "); printf("World!\n");

And unless you want your translator to hate your guts, you really, really mustn’t do this in user-facing output.

(OK, you can if you really want to and if you’re ready to give them the same tools[1], but it won’t be simple. Although I’m unaware of any professional translators supporting this either—most use a CAT, and this approach is deeply incompatible with all existing ones.)

[1] https://projectfluent.org/

yyyk · on July 15, 2023

Well, the issue still exists if we used regular sentences ending with a newline, allowing nice stuff for translators like printf positional arguments. The problem is that you have implicit state - the layout of the console or the columns of a json/xml - and no decent way to maintain it in a format string (because C tooling is poor, because this interface isn't entirely typesafe, etc.).
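For reference, the positional arguments mentioned here look roughly like this. The `%n$` form is a POSIX extension rather than ISO C, so this sketch assumes a POSIX-style libc such as glibc or FreeBSD's; `greet` and its `name_first` flag are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the call site always passes (greeting, name) in
 * the same order; the format string alone decides the word order, which
 * is what lets a translated string reorder the sentence. */
int greet(char *buf, size_t size, int name_first,
          const char *greeting, const char *name)
{
    assert(buf != NULL && size > 0);
    if (name_first)
        return snprintf(buf, size, "%2$s, %1$s!\n", greeting, name);
    return snprintf(buf, size, "%1$s, %2$s!\n", greeting, name);
}
```

This helps the translator, but it does not address the implicit state described above: the layout still lives in scattered format strings.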

ComputerGuru · on July 14, 2023

Serde can kind of do this for rust projects, but you're usually constrained to outputs that are "identical but for the syntax/format" (i.e. same field names though perhaps with different naming conventions).

I've used that to convert configuration files from one language to the other, such as this json2toml and toml2json tool [0].

[0]: https://github.com/neosmart/toml2json

mananaysiempre · on July 14, 2023

> Serde can kind of do this for rust projects, but you're usually constrained to outputs that are "identical but for the syntax/format" (i.e. same field names though perhaps with different naming conventions).

Libxo’s distinguishing feature (IMO) is that its schema specifications are plaintext output with markup specifying which parts are data to be extracted into structured formats, so that you can port your usual Unix tool to it and not ruin its original ad-hoc output. I don’t know of anything positioned as a serialization library that can do this with comparable grace, Serde included.

Related: the section on marking up plaintext output in the Ivo essay[1].

[1] https://web.archive.org/web/20111204021526/http://lubutu.com... (discussed at the time at https://news.ycombinator.com/item?id=3300264)
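To make the "plaintext with markup" idea concrete, here is a toy renderer. It is not libxo's syntax or implementation: the `{name}` placeholder form, the `field` struct, and `render` are all invented for this sketch, and values are not JSON-escaped. One template yields either the ad-hoc text line or a structured record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value pair supplied by the caller. */
struct field {
    const char *name;
    const char *value;
};

/* Append len bytes of s to buf, truncating if buf is full. */
static void append(char *buf, size_t size, size_t *used,
                   const char *s, size_t len)
{
    while (len-- > 0 && *used + 1 < size)
        buf[(*used)++] = *s++;
    buf[*used] = '\0';
}

/* Look up a field by (unterminated) name; unknown names render as "". */
static const char *lookup(const struct field *f, size_t n,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(f[i].name) == len && strncmp(f[i].name, name, len) == 0)
            return f[i].value;
    return "";
}

/* Text mode: copy literal characters and substitute each {name}.
 * JSON mode: drop the literal text and keep only the marked fields. */
void render(char *buf, size_t size, const char *tmpl,
            const struct field *f, size_t n, int as_json)
{
    size_t used = 0;
    int first = 1;

    assert(buf != NULL && size > 0 && tmpl != NULL);
    buf[0] = '\0';
    if (as_json)
        append(buf, size, &used, "{", 1);

    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *end = (*p == '{') ? strchr(p, '}') : NULL;
        if (end == NULL) {                 /* ordinary character */
            if (!as_json)
                append(buf, size, &used, p, 1);
            continue;
        }
        const char *name = p + 1;
        size_t len = (size_t)(end - name);
        const char *value = lookup(f, n, name, len);
        if (as_json) {
            append(buf, size, &used, first ? "\"" : ",\"", first ? 1 : 2);
            append(buf, size, &used, name, len);
            append(buf, size, &used, "\":\"", 3);
            append(buf, size, &used, value, strlen(value));
            append(buf, size, &used, "\"", 1);
            first = 0;
        } else {
            append(buf, size, &used, value, strlen(value));
        }
        p = end;                           /* loop increment skips '}' */
    }
    if (as_json)
        append(buf, size, &used, "}", 1);
}
```

With the template `{lines} lines in {file}\n`, text mode keeps the familiar line intact, while JSON mode emits an object with only the `lines` and `file` members.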

ComputerGuru · on July 14, 2023

Yup. libxo is much more free-form, while anything going through a serialize-deserialize process is going to necessarily have to be more regular.

kristopolous · on July 14, 2023

I bet you could do some pretty clever heuristic hacks to wrap a bunch of programs in this, especially if you attach to the process and, say, clobber printf. I'm thinking in the spirit of rlwrap.

It's certainly more of a Game Genie approach, but it might occasionally be awesome.

Norfair · on July 14, 2023

In Haskell we use autodocodec for this.

38 · on July 14, 2023

you can already do this in other languages. For example, here is Go:

    package main
    
    import (
       "encoding/json"
       "encoding/xml"
       "os"
    )
    
    type wc struct {
       File []file 
    }
    
    type file struct {
       Lines      int 
       Words      int 
       Characters int
       Filename   string
    }
    
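    func main() {
       // One record, using unkeyed fields in declaration order:
       // lines, words, characters, filename.
       etcMotd := wc{
          []file{
             {25, 165, 1140, "/etc/motd"},
          },
       }
       // The same value is rendered twice; only the encoder changes.
       json.NewEncoder(os.Stdout).Encode(etcMotd)
       xml.NewEncoder(os.Stdout).Encode(etcMotd)
    }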

paulddraper · on July 14, 2023

Gee thanks mister!

38 · on July 14, 2023

get outta here kid.