Howard Abrams 0d7d7bb668 Changing the license to be GPL

This should match the Emacs source code more closely.

2024-08-17 21:53:59 -07:00

Literate Programming in Org

Introduction
- Background
- Advantages of LP
Getting Started
Working with Python
Calling out to the Shell
Creating Illustrations
- Graphviz
- PlantUML
- Pikchr
Tips and Tricks
- Yasnippet Templates
- Hacking

This is a book on Literate Programming in Emacs using Org Mode.

Use literate programming as a style to aid in discovery, exploration and clarity of code.

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

—Donald Knuth in "Literate Programming", The Computer Journal 27 (1984), p. 97. (Reprinted in Literate Programming, 1992, p. 99.)

Introduction

In a computer program (no matter what the computer language), we write code as a first-class citizen, without ornamentation, and set off the comments with some sort of marker: a pair of symbols to signify the start and end, like /* and */, or a single symbol, like # or //, to mark the rest of the line.

Literate programming is a style of coding that inverts this paradigm: what would normally be the comments becomes the focus, and the code is what gets ornamented. When Donald Knuth originally proposed the idea in 1984, text editing was still in its infancy, and writing LP was clunky. However, with modern editors like Emacs (can I really claim, with a straight face, that Emacs is modern?), literate programming in Org files can be smooth.

We assume the reader of this book is fairly proficient with Emacs keybindings and has at least a passing familiarity with editing Org Mode files, but we don’t assume you’ve grokked the literate programming features of Org.

As you probably know, Org is large, and the features for writing, evaluating and connecting blocks of source code in a document are extensive, so documenting them all is a daunting task. This book attempts to both guide and inspire a programmer to enjoy coding in a literate way.

Background

Donald Knuth invented Literate Programming in the 1980s in an attempt to emphasize communication. Playing with the idea that a "program" shouldn't be only computer instructions, but more like literature, he called his approach literate programming. In his 1984 essay "Literate Programming" (republished by CSLI, 1992, p. 99), he wrote:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming." … The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Wanting programs to be written for human understanding, with the order based on the logic of the problem and not constrained by deficiencies in the programming language, we create a literate programming document that generates both a document for people and the source code files.

The idea is to invert code peppered with comments to prose interjected with code. Originally, a pre-processing program would then write the code blocks out into a source code file (called tangling) and create a published document of both the prose and the code formatted for reading (called weaving).
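To make tangling concrete, here is a minimal sketch in Python. It is not how Org does it (Org has its own tangling command, org-babel-tangle, which also honors header arguments such as :tangle); the function name tangle, the regular expression and the sample document are my own assumptions for illustration. It collects the body of each source block and joins the bodies per language, in document order.

```python
import re
from collections import defaultdict

# Matches an Org source block: "#+begin_src <language> ..." up to "#+end_src".
# Org keywords are case-insensitive, hence re.IGNORECASE.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

def tangle(org_text):
    """Collect the code of every source block, grouped by language."""
    code_by_language = defaultdict(list)
    for language, body in SRC_BLOCK.findall(org_text):
        code_by_language[language].append(body)
    return {lang: "".join(bodies) for lang, bodies in code_by_language.items()}

# A tiny, made-up literate document: prose with two Python blocks.
document = """\
We start with a greeting.

#+begin_src python
greeting = "Hello"
#+end_src

Then we print it.

#+begin_src python
print(greeting)
#+end_src
"""

# Prints the two block bodies, one after the other, without the prose.
print(tangle(document)["python"])
```

Weaving goes the other way: it keeps the prose and formats the code for reading, which in Org is the job of the exporters.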

What happened to his concept and why don’t we program this way?

After introducing the concept in a white paper, he expanded the idea by publishing an example of how the source code would be written in Jon Bentley’s “Programming Pearls” column [Communications of the ACM 29, 5 (May 1986), 364-369]. Doug McIlroy added a rebuttal where he boiled Knuth’s code into a single (now famous) shell command:

  tr -cs A-Za-z '\n' |
  tr A-Z a-z |
  sort |
  uniq -c |
  sort -rn |
  sed ${1}q

McIlroy invented the shell pipe as well as many of those command line tools. He’s quoted as saying:

A wise engineering solution would produce—or better, exploit—reusable parts.

His example proved his point.
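If the pipeline reads as line noise, the following Python sketch walks through the same stages. The function name top_words is hypothetical, and the order of words with equal counts may differ from what sort -rn produces, so treat it as an approximation of the shell command, not a replacement.

```python
import re
from collections import Counter

def top_words(text, count):
    """Return the `count` most frequent words in `text` with their counts."""
    # tr -cs A-Za-z '\n' : keep only runs of letters, one word per item
    words = re.findall(r"[A-Za-z]+", text)
    # tr A-Z a-z : fold everything to lower case
    lowered = [word.lower() for word in words]
    # sort | uniq -c | sort -rn | sed ${1}q : count, order by frequency, keep the top
    return Counter(lowered).most_common(count)

print(top_words("The the THE cat, cat... hat!", 2))
```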

Perhaps this process was a bit too much writing for most engineers, who view code comments as unnecessary, over-sized baggage requiring maintenance. Isn’t our goal to write readable code? While the source code tangled from a literate programming document may look the same as a source file coded directly, the idea did not significantly change our industry.

Some projects, like:

Javadoc for Java
Sphinx for Python
Doxygen for C and other languages

can extract API documentation from the comments in the source code, and could be viewed as a step toward literate programming. Haskell has a partial implementation built into the compiler so that it doesn't require a special comment syntax or an external macro system.

In most of the systems listed above, the code, not the logic, drives the presentation order. For instance, many languages require imports, variable definitions and functions to be declared before use, and one can’t imagine literature beginning in such a way. Knuth's original "WEB" program allowed a code block to refer to (include) another code block, allowing the author to describe the code in any order that made the most sense. This ended the debate about top-down vs. bottom-up.
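Here is a rough sketch, in Python, of what referring to another block means. I use the <<name>> notation that Org borrows from noweb; the function expand, the block names and their bodies are all made up for illustration, and this is neither WEB nor Org's actual implementation. The point is that "main" can be written first, in whatever order reads best, and the pieces are stitched together afterwards.

```python
import re

# A reference looks like <<block-name>>.
REFERENCE = re.compile(r"<<([^<>\n]+)>>")

def expand(name, blocks, seen=()):
    """Return the body of block `name` with each reference replaced by its expansion."""
    if name in seen:
        raise ValueError(f"circular reference: {name}")

    def substitute(match):
        # Expand the referenced block, remembering the path that led to it.
        return expand(match.group(1), blocks, seen + (name,))

    return REFERENCE.sub(substitute, blocks[name])

# Hypothetical blocks, described top-down: the overview comes first.
blocks = {
    "main": "<<imports>>\n<<report>>",
    "report": "print(math.sqrt(16))",
    "imports": "import math",
}

print(expand("main", blocks))
```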

Knuth's original literate programming approach had minimal editor support, as he only wrote the WEB program as a pre-processor to create (weave) the documentation and write (tangle) the source code.

From my perspective, literate programming can only be useful with help from an editor. For example, many scientists use IPython's notebook, now expanded as the Jupyter Project. However, unlike the notebook's storage of files in JSON format, I think a literate file should be readable text. As Carsten Dominik, the creator of Org, wrote:

“In the third millennium, does it still make sense to work with text files? Text files are the only truly portable format for files. The data will never get lost.”

An Org file, with its readable syntax, and amazing support from Emacs, gives a programmer a good environment to discover, explore and clarify complex code in a literate way.

Further Reading:

Introduction to org-mode's Babel Project for teaching Emacs to do Literate Programming
Reference material for the Babel Project
More Shell, Less Egg is a good historical essay on this subject
Where have all the Literate Programmers gone?

Advantages of LP

Some of the advantages of literate programming for your code include:

Clarification of your thoughts about complicated situations
Better documentation for your source code
Great for team communication for issues and problems
Inter-language facility for using the best tool for the job (for instance, querying a database and then manipulating it with a general purpose language)

The advantages of literate programming in Org are the advantages of Org itself:

Text formatting, like emphasized text and lists
Org's organizational features, like embedded heading sections marking subtrees
Task management, like Agendas, embedded with your code
Note-oriented REPL for investigating new libraries and APIs

I made this last point as part of my essays on Literate Devops. Briefly, REPLs can be a wonderful approach to discovering features of libraries and modules, as one types expressions, and sees the results. You can view a shell running in a terminal as a REPL. A problem arises when the programmer needs to return to the results of past commands and expressions in the transient environment of a terminal.

With LP in Org, you can still type and evaluate an expression, but Emacs embeds the output (the P in REPL) back into your file buffer. As an added bonus, you can name the results, and use them as input variables to other blocks of code (and these code blocks can be written in a different computer language).

However, if you are reading this book, you probably see the advantages, so let’s begin a short journey to master this tool yourself.

Getting Started

Since Emacs comes with Org, and Org comes with the ability to write literate programs, if you have a running Emacs instance, you begin your journey by opening a file with an extension of .org (or any text file with org-mode enabled). This guide assumes basic familiarity with both Org and Emacs.

Since Emacs comes with Lisp, this Getting Started guide will use that language for our examples. In subsequent chapters, we will describe how to use different languages.

Create a File

Create or open an Org file, and type the following:

#+begin_src emacs-lisp
  "Hello World"
#+end_src

Next, type C-c C-c (Control-c twice) and Emacs asks if you want to evaluate this code. To see the results of evaluating that expression inserted back into your buffer after the marker, RESULTS, type yes.

While a classic, that is not a very good example. Let’s try again with the following code block:

#+begin_src emacs-lisp
  (truncate (* (sin 0.438) 100))
#+end_src

Now type C-c C-c again. Notice the answer to the Great Question of Life, the Universe, and Everything appears as the RESULTS of evaluating your amazing Lisp code.

That, my friend, is the beginning of your adventure.

A few points. First, you typed a lot of stuff to see a number or string. We’ll start to file down such rough edges in your workflow. This book contains a lot of tips, and you’ll see that programming literately can be just as fast as regular programming.

Second, the emacs-lisp part names the language or subsystem to call to evaluate a code block. We’ll show how you can use your favorite language, or even systems to generate images, call web services, and update tables in a database.

Creating src Blocks Quickly

You don’t have to type the entire text for src blocks, as Org comes with a feature for this, called Structure Templates. Type C-c C-, and a buffer appears allowing you to type s to have the bulk of the src code block inserted into your buffer.

Another approach is to use org-tempo, a template expansion feature. To kick-start this feature, press M-S-; (that is, M-:) and type the following:

  (require 'org-tempo)

Note: If you are reading this in an Emacs buffer, you can also place your cursor at the end of that parenthesized s-expression and type C-x C-e to evaluate it.

At this point, you can begin a line with <s and hit TAB to have a src block expanded, with the cursor left at the end of the first line, allowing you to type emacs-lisp.

This is Emacs, so you probably have your favorite template expansion, like TempEL or Yasnippet, and any system that can generate your text works fine. Unlike other notebook applications, the magic isn’t hidden in markers, but shines plainly in the text itself.

I use Yasnippet, and I wrote snippets that combine the src block with a function definition. See Yasnippet Templates for details.

Editing src Blocks

I find editing prose in an Org file quite nice…editing code? Not so much. Many techniques you expect to use aren’t available, like manipulating s-expressions in a Lisp with paredit or smart-parens, or jumping to code definitions with the Xref system.

With your cursor anywhere in the block (on the header line, anywhere in the body, or on the end_src line), type C-c ' to edit the contents of that block in a dedicated buffer, with a mode based on the language on the header line.

Whew, now you can edit your code in a manner you expect. Type C-c ' again to save and return to your full org buffer, or C-c C-k to cancel your changes.

Can we have a little chat, just between us? You know, author to reader? This book is about literate programming, and as such, we can use just about any computer programming language we want. But if I want to show examples, I have to choose one.

I’m afraid you won’t like whatever I choose. I love Lisp, but many find it hard to read (and the next paragraph doesn’t help my case). I might rework this book, so we have versions in all the languages, but until then, I decided on Python. If you want to follow along, please install it.

Now, let’s add more languages to what is available, by running the following code:

  (org-babel-do-load-languages
   'org-babel-load-languages
   '((emacs-lisp . t)
     (shell . t)       ; This probably won't work well on Windows
     (python . t)))

If you aren’t familiar with Lisp, don’t let that magic incantation concern you too much. Just follow the pattern to substitute your favorite language for python. You can add or remove more (use t to include a language, and nil to remove it). See Babel for details on languages that can be supported.

Note: If you don’t see your language’s pretty colors for its syntax, run this code (and put it in your init file):

  (setq org-src-fontify-natively t)

Now we are ready to rock n’ roll.

Evaluating src Blocks

Let’s write something more interesting. Type the following block, and evaluate it with C-c C-c:

  #+BEGIN_SRC python
    return [n * n for n in range(1, 10)]
  #+END_SRC

  #+RESULTS:
  | 1 | 4 | 9 | 16 | 25 | 36 | 49 | 64 | 81 |

Note that since our code returned an array, Org shows us the results as a table. We can use the :results header argument to state that we want the results formatted as a list. While we are at it, let’s give this block a name:

  #+NAME: nine-squares
  #+BEGIN_SRC python :results list
    return [n * n for n in range(1, 10)]
  #+END_SRC

  #+RESULTS: nine-squares
  - 1
  - 4
  - 9
  - 16
  - 25
  - 36
  - 49
  - 64
  - 81

Next, let’s use the results in another block:

  #+NAME: nine-quads
  #+BEGIN_SRC python :var squares=nine-squares :results list
    return [n * n for n in squares]
  #+END_SRC

  #+RESULTS: nine-quads
  - 1
  - 16
  - 81
  - 256
  - 625
  - 1296
  - 2401
  - 4096
  - 6561

What have we learned?

Giving a block a name allows you to access its results
The :results header argument allows us to format the results
The :var header argument allows us to assign the results of one block to a variable available in another block

I bet you have questions. Let’s answer some short ones, and then I will add some details to what we’ve learned in the following sections.

Can’t we just write a function? Of course.
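For comparison, here is roughly what the two named blocks above look like as plain Python functions. The function names simply mirror the block names; nothing here is Org-specific.

```python
def nine_squares():
    """The body of the nine-squares block."""
    return [n * n for n in range(1, 10)]

def nine_quads(squares):
    """The body of the nine-quads block; `squares` plays the role of the :var argument."""
    return [n * n for n in squares]

print(nine_quads(nine_squares()))
```

What the literate version adds is that each intermediate result stays visible in the document, right under the code that produced it.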

Naming src Blocks

  #+NAME: <name>
  #+BEGIN_SRC <language> <switches> <header arguments>
    <body>
  #+END_SRC

The #+NAME keyword associates a name with the block, which you can reference later in your document. For instance:

  #+NAME: the-answer
  #+BEGIN_SRC emacs-lisp
    (truncate (* (1- (expt 2 3)) (sqrt 36)))
  #+END_SRC

  #+RESULTS: the-answer
  : 42

The named value, the-answer, isn’t a variable, but in the following sections, we’ll show how it can be used as one.

Following the language, you can add switches and other header arguments (that we will discuss ad nauseam in the following sections). Sometimes, we have too many arguments to comfortably fit on a line, so we can break those long lines:

  #+NAME: <name>
  #+HEADER: <more header arguments>
  #+BEGIN_SRC <language> <switches> <header arguments>
    <body>
  #+END_SRC

You can have as many #+HEADER lines as you want.

Evaluating src Blocks

As described above, typing C-c C-c in a block evaluates it and displays the results.

Why doesn’t the following work?

  #+BEGIN_SRC python
    print("Hello World")
  #+END_SRC

  #+RESULTS:
  : None

The reason is that, by default, Org collects the value of a Python block, not what it prints, and the print function returns None. To see the output from a block, we need to add a :results header argument:

  #+BEGIN_SRC python :results output
    print("Hello World")
  #+END_SRC

  #+RESULTS:
  : Hello World

Some languages, like sh (for your shell), default to output, so the following works as expected:

  #+BEGIN_SRC sh
    pwd
  #+END_SRC

  #+RESULTS:
  : /home/howard/Documents/literate-programming

If a language, like Python, can return either a value or its standard output, you can set :results to value or output respectively.
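If the difference between value and output still feels fuzzy, this small Python sketch models it. It is a simplified model and not Org's real implementation: the function evaluate and its results parameter are hypothetical. The body is wrapped in a function (which is why return is legal in the examples above); value mode keeps what the function returns, and output mode keeps what it printed.

```python
import contextlib
import io

def evaluate(body, results="value"):
    """Run `body` as a function body; return its value or its printed output."""
    # Indent each line of the body so it sits inside a function definition.
    indented = "".join("    " + line + "\n" for line in body.splitlines())
    namespace = {}
    exec("def block():\n" + indented, namespace)

    # Capture anything the body prints while it runs.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        value = namespace["block"]()

    return captured.getvalue() if results == "output" else value

print(evaluate('print("Hello World")'))            # the value of the block: None
print(evaluate('print("Hello World")', "output"))  # what the block printed
```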

Displaying Results as Lists

Our previous examples returned a single number or string.

  #+BEGIN_SRC emacs-lisp :results list
    (number-sequence 1 5)
  #+END_SRC

  #+RESULTS:
  - 1
  - 2
  - 3
  - 4
  - 5

In the previous section, we used :results to specify whether we wanted the value of the code or its standard output. The :results header argument can take multiple arguments, though. For instance, all of the following are reasonable:

:results value
:results list
:results value list

Displaying Results as Tables

The :results header argument accepts the table parameter, which is helpful for two-dimensional arrays.

  #+BEGIN_SRC emacs-lisp :results table
    (defun rando (limit) (random limit))

    (defun lotso-randos ()
      (seq-map 'rando (number-sequence 1 25)))

    (seq-partition (lotso-randos) 5)
  #+END_SRC

  #+RESULTS:
  |  0 |  0 |  0 | 3 |  1 |
  |  4 |  3 |  6 | 6 |  3 |
  |  2 | 10 |  9 | 6 |  4 |
  |  7 | 13 |  0 | 0 | 10 |
  | 20 | 20 | 17 | 3 |  4 |
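To see how a two-dimensional array lines up with the table above, here is a sketch in Python that splits a flat list into rows and formats them as Org table text. Both partition (standing in for seq-partition) and org_table are hypothetical helpers of mine; Org does this formatting for you, and its exact alignment rules may differ.

```python
def partition(items, size):
    """Split items into consecutive rows of `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def org_table(rows):
    """Format rows of equal length as Org table text, right-aligning each column."""
    # The width of a column is the width of its widest cell.
    widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        cells = [str(cell).rjust(width) for cell, width in zip(row, widths)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

print(org_table(partition(list(range(1, 11)), 5)))
```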

Shall we talk about state?

Displaying Results as Code

Working with Python

Calling out to the Shell

Can we do all of Bash, Fish and PowerShell?

Creating Illustrations

Graphviz

PlantUML

Pikchr

Tips and Tricks

This final chapter contains optional information you might find useful.

Oh, and please let me know if you have a tip or trick that you think should be included.

Yasnippet Templates

If you use Yasnippet, and include the yasnippet-snippets project, you can type <s in an Org buffer to expand into a src block, and type def in an Emacs Lisp buffer to get a function definition, but why not have both?

In an Org file, type M-x yas-new-snippet and replace the default text with:

  # key: #slf
  # name: emacs-lisp-defun
  # --
  #+BEGIN_SRC emacs-lisp
  (defun ${1:fun} (${2:args})
    "${3:docstring}"
    ${4:(interactive${5: "${6:P}"})}
    $0)
  #+END_SRC

Type C-c C-c to save and install the snippet with Org.

Let’s make one for variables:

  # key: slv
  # name: emacs-lisp-defvar
  # --
  #+BEGIN_SRC emacs-lisp
  (defvar ${1:symbol} ${2:initvalue} "${3:docstring}")
  #+END_SRC

And maybe another one for defining unit tests?

  # -*- mode: snippet -*-
  # name: ert-deftest
  # key: edt
  # --
  #+BEGIN_SRC emacs-lisp :tangle no
  (ert-deftest $1-test ()
    (should (= $0)))
  #+END_SRC

Hacking

Note the function org-in-block-p, which we can use to see if we are in a “src” block.
