Emacs Lisp shorthands as namespacing system

In Emacs version 28 Emacs developers introduced so-called read symbol shorthands. If you’re interested in the rationale, feel free to search the Emacs developer mailing list for the discussion. However, it does seem that not everyone likes the idea of shorthands as a substitution for namespaces (or packages, if you’re coming from Common Lisp). Neither did I. And recently, a branch was set up that implements Common Lisp-style packages for Emacs. In the discussion Richard Stallman, however, notes:

CL packages are the wrong way to implement packages in Lisp. As I explained in a discussion two years ago, packages implemented using obarrays (or equivalent) don’t work reliably.

We have a much better basis for Lisp packages in the shorthands mechanism. It only needs to be completed.

Richard Stallman, in the “CL packages landed” thread

I’ve read the discussion further and wanted to express some of my thoughts on the namespacing problem and how shorthands may help solve it in Emacs.

While I’m not entirely sure how CL packages work underneath, I more or less know how namespaces in Clojure work (and to my knowledge these namespace/package systems are quite similar), and I think that this is how I want namespaces to work in general. As far as I can understand, Richard Stallman doesn’t like the idea of namespaces in general - he does mention it in some subsequent emails when he talks about obarrays. Another quote from the mailing list:

The reason namespaces systems do not fit well into Lisp is that they have to operate in ‘read’, in the choice of which symbol object you get.

Richard Stallman, in the “Emacs Lisp's future” thread

Later in the same message, he mentions that Clojure is not a Lisp, but we will not go there.

Assuming the above, I guess read-symbol-shorthands are going to be the way to do namespaces in Emacs in the future. So let’s figure out how you actually use them and what they really do.

`read-symbol-shorthands`

According to the Emacs manual, shorthands are symbols renamed at read-time. Because the shorthands are read-time, we need to set them in the local variables’ section at the end of the file, which, as far as I understand, is processed before the read step happens. An example from the manual:

(defun snu-lines (s)
  "Split string S into a list of strings on newline characters."
  (snu-split "\\(\r\n\\|[\n\r]\\)" s))

;; Local Variables:
;; read-symbol-shorthands: (("snu-" . "some-nice-string-utils-"))
;; End:

When the Emacs Lisp reader reads the name snu-lines it will substitute the snu- prefix with some-nice-string-utils- and instead return the name some-nice-string-utils-lines as if it was written like this.
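The substitution can also be observed without a file, by let-binding read-symbol-shorthands around a call to the reader. This is a minimal sketch, assuming Emacs 28.1 or newer; the helper my-read-with-shorthands is a made-up name for illustration and not part of Emacs.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+, where the reader consults `read-symbol-shorthands'.

(defun my-read-with-shorthands (string shorthands)
  "Read one form from STRING with SHORTHANDS in effect.
SHORTHANDS is an alist of (SHORT-PREFIX . LONG-PREFIX) strings.
Hypothetical helper, for illustration only."
  (let ((read-symbol-shorthands shorthands))
    (car (read-from-string string))))

(my-read-with-shorthands
 "(snu-split \"x\" s)"
 '(("snu-" . "some-nice-string-utils-")))
;; The reader hands back the long name:
;; (some-nice-string-utils-split "x" s)
```

Note that only the symbol with the matching prefix changes; the string and the symbol s come back as written.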

I guess this kinda addresses one complaint I often hear about Emacs Lisp - you always have to write full symbol names. And in Common Lisp or Clojure, you don’t need to do that, because the package/namespace system takes care of it. I’m not really that good with CL, but Clojure has the same approach, so I’ll use it to demonstrate:

(ns some-nice-string-utils)

(defn lines
  "Split string `s` into a list of strings on newline characters."
  [s]
  (clojure.string/split-lines s))

Code Snippet 1: I believe in Common Lisp this would be something like cl-user:split-lines, if there was such a function there

This code defines the function lines which can be used as (lines "foo\nbar") if you’re currently in the some-nice-string-utils namespace. If you’re in some other namespace, you have to use the full name: (some-nice-string-utils/lines "foo\nbar") (after you’ve required the namespace), or provide an explicit alias.

So, similarly to Emacs Lisp, we can use the shortened name in the current file, but provide a less common name for the users. And I can see how this can be seen as namespaces from Emacs’ standpoint, but it’s not that simple.

My point here is that it’s not enough for an Emacs package author to use some prefix for their library - it would be better if it contained some author identification, so we could distinguish libraries that try to do similar things and share the name. Instead, we get the situation where there can be only one auto-complete.el package, despite the fact that “auto-complete” is a very generic name. Other packages have to come up with different names, like corfu or company, because there’s no other way to distinguish packages other than by their name.

So let’s have a look at how this is solved in other systems.

Namespaces in other systems

We, as programmers, live in quite big ecosystems, meaning we need ways to avoid confusion, and any programming language is one such ecosystem. Once the popularity of a certain language grows to a certain point there is a possibility of name clashes - when someone made a function that is named exactly like yours but does an entirely different thing. So language designers wanted to fix this problem, and a lot of languages have some notion of namespaces as a result. I don’t really like Java, but I think it has a decent enough way of avoiding the problem of a name clash.

Java libraries are written as packages that belong to a certain organization or individual, which is then used in the package name. When you’re creating a library project, a Java IDE will ask you about your organization, and the default value is com.company. When I first saw this I didn’t understand why it was needed, but after some time it clicked for me.

Even if some other company (or individual) creates a library that is named the same as mine, it will still be possible to distinguish them. This works because Java widely uses reverse domain names to group dependencies, and hence there’s very little possibility of a clash. Library repositories, such as Maven, then group artifacts by their group ID, which is exactly the com.company thing, so it’s easy to distinguish libraries.

So, for example, if I were to use this blog as a way to represent my identity in the Java libraries world, and I were to make the mentioned some-nice-string-utils library, it would be used by others as io.gitlab.andreyorst.some-nice-string-utils. And if some other person wanted to make their own version of this library it could be com.github.someuser.some-nice-string-utils, and everybody could still use it alongside mine without name clashes.

This gives quite a long name to type, but the catch is that the user of the library doesn’t have to write full names, like io.gitlab.andreyorst.some-nice-string-utils/lines. They can import names directly into the current scope, and just use lines, or define a shorthand, like snu/lines, in case of Clojure.

This still pollutes the global namespace of the runtime (kinda), but it is done in a much more controllable way: possibilities of name clashing are minimized, and ways to recover from them are provided. The renaming just doesn’t happen at read time, as in Emacs Lisp, but at load time, when the runtime modifies its global namespace registry to hold new namespaces.

There’s another approach for namespacing, which is a bit simpler and doesn’t need a global namespace for packages at all. Here’s what a typical Lua package looks like:

local function lines(s)
  local lines = {}
  for line in s:gmatch("([^\n]+)") do
    lines[#lines+1] = line
  end
  return lines
end

return {lines=lines}

Code Snippet 2: some-nice-string-utils.lua

This file creates a function which is only visible in this file, so it doesn’t pollute the global namespace in any way. Then at the end of the file, a table is returned, with one key lines set to the definition of the function lines. So it is essentially a hash table object with one key representing the function name, and its value is the function object.

You use this library in another file by simply requiring it into another local variable:

local snu = require "some-nice-string-utils"
print(snu.lines("foo\nbar"))

Again, the library is imported into a local variable, because the library is essentially just a hash table object, holding closures. Then you index this hash table like snu.lines which obtains the function object, which is then executed.

It is a really simple system, that doesn’t require anything from the language except hash tables and first-class functions. In fact, global scope in Lua is also just a table, named _G - we could make it so that there’s no way to add stuff into it and the system would still work. The beauty of it is that you can put your library file into a directory, like my/snu.lua and require it as require("my.snu"), and if someone else has the same named library it doesn’t matter, as the file system takes care of the problem already. This comes with the overhead of the repeated table lookup, but you can always create another local lines = snu.lines and use lines without the additional lookup. And LuaJIT actually optimizes repeated lookups away.

Both systems solve the problem of the name clash, just a bit differently, and programmers can build their applications without worrying about overriding some other function. But there’s still one more problem.

Library versioning

Neither the Emacs Lisp approach to packages nor Java’s approach to libraries really solves one problem - version clash. Lua’s approach kinda allows you to solve it, as you can install a library into a versioned directory and use both versions, but it requires some manual work. You may wonder why you would want to use two versions of the same library, and while I agree that it is strange, it’s still not uncommon: when you depend on several libraries, they can have dependencies too, and it is possible that they depend on the same thing but in different versions.

For example, let’s imagine that you have a project that depends on library A, and library B. You describe it like this:

{:deps {com.company.A {:version 1.1.0}
        io.gitlab.user.B {:version 2022-10-01-patch135}}}

These libraries are not connected to each other in any way whatsoever, but each of them specifies its own dependencies:

{:deps {org.important.C {:version 1.0.0}}}

Code Snippet 3: library A dependencies

{:deps {org.important.C {:version 2.4.0}}}

Code Snippet 4: library B dependencies

This creates a situation where both libraries require the same library C, but the versions are incompatible. In your project you just use A and B and don’t even know about C, so here’s the question: which version of C will you get in your project after you’ve built it?

The answer: it depends on the build system, language, alignment of the stars, and possibly other things. Maven picks the version that is nearest to your project in the dependency tree (on a tie, the one declared first), which is not necessarily the most recent one; other build tools have other strategies, e.g. Gradle prefers the highest requested version. Fennel, a language that compiles down to Lua, can include both dependencies, and it will work just fine without any clashes if you’ve used versioned paths in your require calls.

Clojure author Rich Hickey once said¹ that each major version of the library should just have a different artifact name, because if something broke it is a different thing now.

I think that there may be other ways around this, not because the proposed way isn’t useful, but because we already have a version of the artifact. It’s there, we just need a system that respects it in the actual code, especially since Fennel can already do it. File systems are great at this: in Maven’s case, it is possible to have multiple versions of the same library available on the file system. It’s just that most languages for some reason omit the version from the code completely, and Java is no different.

And Emacs’ shorthand system actually allows us to do exactly this, even with the same artifact name. So let’s get back to Emacs’ side of things.

Shorthands as a solution to both name and version clashing

Let’s create a library package that will be useless, but helps demonstrate the usefulness of shorthands:

;;; foobar.el --- Useless library -*- lexical-binding: t -*-
;;
;; Author: Andrey Listopadov
;; Homepage: https://gitlab.com/andreyorst/foobar.1.0.0.el
;; Package-Requires: ((emacs "28.1"))
;; Keywords: lisp
;; Prefix: foobar
;; Version: 1.0.0
;;; Commentary:
;; no comments
;;; Code:

(defun foobar-foo ()
  "..."
  ;; ...
  )

(defun foobar-bar ()
  "..."
  ;; ...
  )

(provide 'foobar)
;;; foobar.el ends here

;; Local Variables:
;; read-symbol-shorthands: (("foobar" . "io.gitlab.andreyorst.foobar.1.0.0"))
;; End:

Code Snippet 5: foobar.el

We use a shorthand to transform our short package prefix into a versioned one with some author identity included. To my knowledge, this should work absolutely fine - all foobar entries are replaced with io.gitlab.andreyorst.foobar.1.0.0, even the one in the provide call at the end of the file. Upon loading this file, Emacs has a feature named io.gitlab.andreyorst.foobar.1.0.0, and two functions with the same prefix: io.gitlab.andreyorst.foobar.1.0.0-foo and io.gitlab.andreyorst.foobar.1.0.0-bar.
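To check what the reader makes of individual forms from foobar.el, they can be read from a string with the same shorthand bound. A small sketch, assuming Emacs 28.1 or newer; my-read-in-foobar is a hypothetical helper, not something the library or Emacs provides.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+.

(defun my-read-in-foobar (string)
  "Read one form from STRING as if it appeared in foobar.el.
Hypothetical helper: it binds the same shorthand that the file
sets in its Local Variables section."
  (let ((read-symbol-shorthands
         '(("foobar" . "io.gitlab.andreyorst.foobar.1.0.0"))))
    (car (read-from-string string))))

(my-read-in-foobar "(defun foobar-foo ())")
;; reads as (defun io.gitlab.andreyorst.foobar.1.0.0-foo ())

(my-read-in-foobar "(provide 'foobar)")
;; reads as (provide 'io.gitlab.andreyorst.foobar.1.0.0)
```

One thing to keep in mind: when require is given only a feature name, it looks for a file named after that feature, so the long feature name alone would not locate foobar.el. Passing the file name as the second argument to require, or loading the file some other way first, is needed for the lookup to succeed.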

Now, let’s say we’ve introduced our breaking change, and removed the foobar-bar function. What should we do?

We just change the shorthand to io.gitlab.andreyorst.foobar.2.0.0, and the version header to 2.0.0. That’s it, we can ship it. How will it affect our users?

Let’s create two more packages that require our useless one (I’ll omit all stuff that is not necessary for this to “work” to keep it short):

;; Package-Requires: ((emacs "28.1") (foobar "1.0.0"))
(require 'foobar) ; requires the 'io.gitlab.andreyorst.foobar.1.0.0 feature

(defun some-project-a-function (&rest args)
  (foobar-bar args))

(provide 'some-project-a)
;; Local Variables:
;; read-symbol-shorthands: (("foobar" . "io.gitlab.andreyorst.foobar.1.0.0"))
;; End:

Code Snippet 6: some-project-a.el

The first package pulls in our foobar package of version 1.0.0 as a dependency and uses its foobar-bar function. At the end of the file, we specify that foobar-bar is actually io.gitlab.andreyorst.foobar.1.0.0-bar and our require also pulls in the correct feature (if the foobar.el file was loaded). Then there’s another package:

;; Package-Requires: ((emacs "28.1") (foobar "2.0.0"))
(require 'foobar) ; requires the 'io.gitlab.andreyorst.foobar.2.0.0 feature

(defun some-project-b-function (&rest args)
  (foobar-foo args))

(provide 'some-project-b)
;; Local Variables:
;; read-symbol-shorthands: (("foobar" . "io.gitlab.andreyorst.foobar.2.0.0"))
;; End:

Code Snippet 7: some-project-b.el

Again, we pull in foobar as a dependency, but we use the 2.0.0 version. As you may remember, there’s no function foobar-bar in this version, but there’s still foobar-foo, and that’s what this package uses. And we know that it uses the correct one, because again at the end of the file, we specify that foobar is actually io.gitlab.andreyorst.foobar.2.0.0, and our require also pulls in the correct feature.

Neither package is affected, because each uses a different feature. And both features can co-exist, without any name clashes.
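The co-existence can be simulated in a single session by reading and evaluating forms under two different shorthands. This is only a sketch, assuming Emacs 28.1 or newer: my-eval-with-shorthands, my-v1 and my-v2 are made-up names, and evaluating strings stands in for loading real files with their own Local Variables sections.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+.

(defun my-eval-with-shorthands (string shorthands)
  "Read one form from STRING under SHORTHANDS, then evaluate it.
Hypothetical stand-in for loading a file that sets
`read-symbol-shorthands' in its Local Variables section."
  (let ((form (let ((read-symbol-shorthands shorthands))
                (car (read-from-string string)))))
    (eval form t)))

(defconst my-v1 '(("foobar" . "io.gitlab.andreyorst.foobar.1.0.0")))
(defconst my-v2 '(("foobar" . "io.gitlab.andreyorst.foobar.2.0.0")))

;; "Load" both versions of the library; each defines its own foobar-foo.
(my-eval-with-shorthands "(defun foobar-foo () :v1)" my-v1)
(my-eval-with-shorthands "(defun foobar-foo () :v2)" my-v2)

;; Each "client" calls the version its own shorthand points at.
(my-eval-with-shorthands "(foobar-foo)" my-v1) ; :v1
(my-eval-with-shorthands "(foobar-foo)" my-v2) ; :v2
```

Both definitions live in the same global obarray under their long names, which is why they do not overwrite each other.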

Problem solved, right?

Problems

There are, of course, some problems with this approach.

The first one is that we’re replacing a common prefix that doesn’t end with a - character, so if any of the projects using our library had defined something like foobarbaz, it may be a source of errors. In a language with proper namespaces, defining something outside the current namespace, which is essentially what happens here, is either impossible or a compile-time error.

While this can be ignored in most cases, it may cause problems in a multi-file project. If the shorthand exists in only one file, the reader replaces foobarbaz there with some other name, foobar-expanded-shorthand-baz. If another file that requires the previous one doesn’t have the same shorthand, it will look for foobarbaz and be unable to find it. Multi-file projects therefore have to maintain the same shorthands in all files they require internally, which can be a burden.
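The prefix capture is easy to reproduce with the reader directly. A minimal sketch, assuming Emacs 28.1 or newer; my-expand-name is a hypothetical helper used only for this demonstration.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+.

(defun my-expand-name (name shorthands)
  "Return the symbol name the reader produces for NAME under SHORTHANDS.
Hypothetical helper, for illustration only."
  (let ((read-symbol-shorthands shorthands))
    (symbol-name (car (read-from-string name)))))

;; A prefix without the trailing dash also captures unrelated names:
(my-expand-name "foobarbaz"
                '(("foobar" . "io.gitlab.andreyorst.foobar.1.0.0")))
;; => "io.gitlab.andreyorst.foobar.1.0.0baz"

;; With the dash included in both prefixes, foobarbaz is left alone:
(my-expand-name "foobarbaz"
                '(("foobar-" . "io.gitlab.andreyorst.foobar.1.0.0-")))
;; => "foobarbaz"
```

The trade-off of the dashed prefix is that the bare symbol foobar, as used in (provide 'foobar), is no longer renamed, so the feature name would have to be written out in full.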

Another problem is that we have to maintain the package version in two separate places - one in the package header, and another in the shorthand. And we have to keep the version in the shorthand to allow several versions of the same feature to coexist. This is crucial in a system such as Emacs, where there are tons of old or unmaintained packages that require older dependencies, many of which have already been updated in an incompatible way, while another package may require the newer version.

This can lead to problems when the package is too large to fit on the screen: a contributor fixes a bug in one of the functions and updates the header, but is completely unaware that there’s a shorthand defined down below. Tests will not be able to catch this unless tools like flymake learn to detect versioned shorthands. And while this problem may seem strange to mention - we’re all people, so I can easily see it happening.
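A check for this kind of drift does not have to be complicated. The following is a rough sketch of what such a check could look like, not an existing flymake backend: the function name is made up, and it assumes that the Version header and the read-symbol-shorthands entry each sit on a single comment line, as in foobar.el.

```elisp
;;; -*- lexical-binding: t -*-
(require 'seq)

(defun my-shorthand-version-consistent-p ()
  "Return non-nil if some longhand prefix ends with the header version.
Looks at the current buffer and compares the `Version:' header with
the longhand prefixes on the `read-symbol-shorthands' line.
Hypothetical check, for illustration only."
  (save-excursion
    (let ((version (progn
                     (goto-char (point-min))
                     (and (re-search-forward
                           "^;+ *Version: *\\([0-9.]+\\)" nil t)
                          (match-string 1))))
          (shorthands (progn
                        (goto-char (point-min))
                        (and (re-search-forward
                              "^;+ *read-symbol-shorthands: *\\(.+\\)$" nil t)
                             (car (read-from-string (match-string 1)))))))
      (and version
           shorthands
           ;; Only one longhand has to match: other entries may be
           ;; shorthands for dependencies with their own versions.
           (seq-some (lambda (pair)
                       (string-suffix-p version (cdr pair)))
                     shorthands)))))
```

Calling it in a buffer visiting foobar.el would return non-nil as long as the header and the shorthand agree, and nil once only one of them has been bumped.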

Yet another thing to remember is that the order of shorthands is important. You should position more specific shorthands first, and less specific last. For example:

(defvar a-foo 42)
(defvar a-b-bar 1337)
;; Local Variables:
;; read-symbol-shorthands: (("a-" . "less-specific-") ("a-b-" . "more-specific-"))
;; End:

In this case, both vars will transform to less-specific-foo and less-specific-b-bar. Changing read-symbol-shorthands to (("a-b-" . "more-specific-") ("a-" . "less-specific-")) will produce less-specific-foo and more-specific-bar. Perhaps, Emacs could sort shorthands in such a way that more specific ones come first, but this may have unexpected effects from the user’s perspective. With a more advanced namespace system, this won’t be an issue at all.
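The ordering effect, and the sorting idea, can be sketched in a few lines. This assumes Emacs 28.1 or newer; my-expand-name and my-sort-shorthands are hypothetical helpers, and sorting by prefix length is just one possible policy.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+.

(defun my-expand-name (name shorthands)
  "Return the symbol name the reader produces for NAME under SHORTHANDS.
Hypothetical helper, for illustration only."
  (let ((read-symbol-shorthands shorthands))
    (symbol-name (car (read-from-string name)))))

(defun my-sort-shorthands (shorthands)
  "Return a copy of SHORTHANDS with longer (more specific) prefixes first."
  (sort (copy-sequence shorthands)
        (lambda (a b) (> (length (car a)) (length (car b))))))

(let ((unsorted '(("a-" . "less-specific-") ("a-b-" . "more-specific-"))))
  (list (my-expand-name "a-b-bar" unsorted)
        (my-expand-name "a-b-bar" (my-sort-shorthands unsorted))))
;; => ("less-specific-b-bar" "more-specific-bar")
```

The first matching prefix wins, so putting the longest prefix first is enough to make the more specific shorthand apply.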

And a final problem - package managers. Emacs’ builtin package manager package.el can work with this kind of system because when you install a package, it ends up in a versioned directory, e.g. <user-emacs-directory>/elpa/<package>-<version>/package.el. So in this case, when we’re updating such a package, it will be installed with no problem, and packages that require it via shorthand will still work.

However, there are package managers such as straight.el which work by pulling packages from git repositories, and as far as I know, straight doesn’t support package versioning. There are other package managers, like borg, el-get, etc., which I’m not familiar with; maybe they support versioning, maybe not - it is a thing to be concerned about when suggesting symbol shorthands as a namespace alternative.

Finally, though, not really a problem, but it is still a concerning point - older Emacs releases don’t support shorthands. If I were to adopt the scheme I’ve described above for my packages, I would have to abandon users of anything below Emacs 28. According to the unofficial Emacs survey, in the year 2020 57% of participants used Emacs 27, and another 20% were on Emacs 26, so I imagine that there will be users who don’t have shorthands for a couple of years.
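A library that depends on the scheme could at least fail loudly on older releases instead of silently defining the short names. A minimal sketch of such a guard; the function name is made up, and the check relies on read-symbol-shorthands being a variable that only exists from Emacs 28.1 on.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-shorthands-supported-p ()
  "Return non-nil if this Emacs knows about `read-symbol-shorthands'."
  (boundp 'read-symbol-shorthands))

;; Near the top of a library that relies on shorthands:
(unless (my-shorthands-supported-p)
  (error "This library relies on read-symbol-shorthands (Emacs 28.1+)"))
```

Declaring (emacs "28.1") in Package-Requires covers the package manager side; the runtime check covers files that are loaded manually.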

All of this can be fixed though.

So why don’t we like shorthands again?

When I was talking about our three packages in Emacs Lisp, I had to write all names starting with a unique prefix that then can be distinguished by the reader. And I believe this is the main problem here - usually, we don’t want to write the full, versioned name of the feature for each function or var, yet we want them to stay unique, and reader shorthands are a hacky way to achieve that. The key word here is hacky.

It requires the programmer to create and manage shorthands in a special comment section instead of providing a way of doing it with regular code. In my opinion, we could still do things at read time, but give it a more general interface, instead of just the text substitution, and that is what actually leads us to CL-style packages or Clojure-like namespace systems.

I’m not sure if it is possible in Emacs Lisp, but if we could define a special form that would affect the reader, similarly to how the read-symbol-shorthands setting in the local variables comment does, we could have a system pretty close to the CL one. E.g. the reader could automatically prepend the package name while reading code, so that definitions like defun or defvar would not need any prefixes. And when reading a free symbol², it could automatically intern it in the current package, by prepending the package name while reading its name:

(package 'io.gitlab.andreyorst.foobar.3.0.0) ; configure reader at the top of the file

(defun lines ()          ; reads (defun io.gitlab.andreyorst.foobar.3.0.0:lines ...)
  ;; ...
  )

(defun some-other-fn (s) ; reads
                         ; (defun io.gitlab.andreyorst.foobar.3.0.0:some-other-fn ...)
  (list 1 2 3)           ; reads (emacs.28.c-core:list 1 2 3)
                         ; because there was no definition of the list
                         ; in this package, and no prefix is used

  (lines s)              ; reads
                         ; (io.gitlab.andreyorst.foobar.3.0.0:lines s)
                         ; because it's already interned
  ;; ...
  )

And when the library requires such a package, we could again set up the reader with a special syntax to do the shorthands stuff:

(require-package '(io.gitlab.andreyorst.foobar.3.0.0 :as fb))

(fb:lines "...")  ; reads (io.gitlab.andreyorst.foobar.3.0.0:lines "...")
                  ; because of the configured prefix alias

Where require-package is another special form that sets up shorthands for the reader and expands to an actual require call.
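The alias part of this idea can be approximated today with a toy read-eval loop that adjusts read-symbol-shorthands between forms. This is a sketch under heavy assumptions, not how load or the byte compiler work: my-eval-source is a made-up driver, it only understands the (require-package '(LONG :as SHORT)) shape from the draft above, it does not call require at all, and it does nothing about automatic prefixing of definitions. It needs Emacs 28.1 or newer.

```elisp
;;; -*- lexical-binding: t -*-
;; Requires Emacs 28.1+.  A toy read-eval loop, not a real `load'.

(defun my-eval-source (source)
  "Read and evaluate the forms in string SOURCE; return the last value.
A top-level (require-package \\='(LONG :as SHORT)) form is intercepted
instead of evaluated: it registers the shorthand \"SHORT:\" for
\"LONG:\", which then applies to the forms read after it."
  (let ((read-symbol-shorthands nil)
        (pos 0)
        (result nil))
    (condition-case nil
        (while t
          (pcase-let ((`(,form . ,next) (read-from-string source pos)))
            (setq pos next)
            (pcase form
              (`(require-package (quote (,long :as ,short)))
               (push (cons (format "%s:" short) (format "%s:" long))
                     read-symbol-shorthands))
              (_ (setq result (eval form t))))))
      ;; `read-from-string' signals `end-of-file' once SOURCE is exhausted.
      (end-of-file result))))

(my-eval-source "
(defun io.gitlab.andreyorst.foobar.3.0.0:lines (s)
  (split-string s \"\\n\"))
(require-package '(io.gitlab.andreyorst.foobar.3.0.0 :as fb))
(fb:lines \"foo\\nbar\")")
;; => ("foo" "bar")
```

The point of the sketch is only that the alias lives in regular code rather than in a trailing comment; the renaming underneath is still the same prefix substitution.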

This is just a quick draft of what it could look like if, instead of a comment, we had a small set of special forms that did the setup for us based on what we actually require or provide. Again, Richard Stallman considers the Common Lisp style package system broken; maybe the automatic interning of unknown symbols is one of the reasons, I don’t know. But this is still pretty much the same implementation as with the shorthands. It is just more automated.

Alternatives

In the already mentioned Spec-ulation Keynote talk, Rich Hickey points out that actual dependencies are not artifacts but the code itself. Names, if you will. If a function A/foo only needs the B/bar function, and in our project we only use the function A/foo, our dependencies are not A and B; our dependencies are A/foo and B/bar. It’s just that our packaging systems require us to pull all the code from the library, not just the definitions we use. Of course, there are languages that do tree shaking, excluding unused code from the final application build, but this doesn’t affect our dependencies - we still pull libraries as a whole.

There’s an interesting language, called Unison. It is quite different from other languages because they implemented the idea of code-level dependencies:

Here’s the big idea behind Unison, which we’ll explain along with some of its benefits:

Each Unison definition is identified by a hash of its syntax tree. Put another way, Unison code is content-addressed.

This essentially means that each definition has a unique ID, so if you depend on some definition you can refer to its hash to obtain the exact thing you’re dependent on. It also means that there’s no need for namespaces at all. If you have written a definition that does one thing, it will have a unique hash, which the rest of the world will use to refer to your definition - it will never collide. You can give it an appropriate and descriptive name in your program, but this name will never affect other people’s code.
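The idea can be imitated in Emacs Lisp on a very small scale. The sketch below is only an analogy: Unison hashes the syntax tree of a definition, while this code hashes the printed text of a lambda form, and all names in it (my-definitions, my-store-definition, my-call-by-hash) are invented for the example.

```elisp
;;; -*- lexical-binding: t -*-

(defvar my-definitions (make-hash-table :test #'equal)
  "Hypothetical store mapping a content hash to a function definition.")

(defun my-store-definition (form)
  "Store lambda FORM under the SHA-256 of its printed text; return the hash.
Hashing the printed form is a crude stand-in for hashing a syntax tree."
  (let ((hash (secure-hash 'sha256 (prin1-to-string form))))
    (puthash hash form my-definitions)
    hash))

(defun my-call-by-hash (hash &rest args)
  "Call the definition stored under HASH with ARGS."
  (apply (eval (gethash hash my-definitions) t) args))

;; Two people "publish" the same code under different local names.
(defvar my-lines
  (my-store-definition '(lambda (s) (split-string s "\n"))))
(defvar their-split
  (my-store-definition '(lambda (s) (split-string s "\n"))))

(equal my-lines their-split)           ; t: same content, same identity
(my-call-by-hash my-lines "foo\nbar")  ; ("foo" "bar")
```

The local variable names never reach the store, which is the property the quote describes: a name is a private label, and the hash is what other code depends on.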

Emacs is way past the point where such a system could be introduced, but I think it is still worth mentioning.

Can shorthands be used as a namespacing system?

So this is the main question I wanted to answer here.

I think yes, symbol shorthands can be used for namespacing, and if used properly, they can also solve versioning problems, at least to some degree. However, this is really a poor man’s namespacing system, as in reality, it’s just a text substitution, and we’re still acting in one giant global namespace, it’s just the fact that the name clash is made harder.

A key point of namespacing systems is to be able to reuse the same name across different namespaces, and that’s what shorthands in their current form don’t provide at all. Using a short enough prefix is a workaround, though.

As much as I would like to have a better system, I guess I’ll have to stick to what Emacs developers decide to use in the future. Richard Stallman mentions that “It only needs to be completed”, so I guess we will see the improvements in the future. But again, it may delay the ability to actually use these namespaces for another few years from that point.

¹ Spec-ulation Keynote (29:53). I highly recommend watching this talk as a whole, as it raises a lot of points about versioning changes. In addition, this talk has a much more comprehensive description of the problem I’m touching on here.
² I can see how this may be problematic at read-time, but we can assume that everything undefined or without a prefix, or not required with an explicit alias, is a free symbol.
