Editing Data Files
A literate programming file for configuring Emacs to edit files of data.
Introduction
Once upon a time, I gave a talk at EmacsConf 2019 about an interesting idea I called emacs-piper. I still like the idea of sometimes treating the entire contents of an Emacs buffer as if it were a data file. This file contains what I feel are the best functions for that… oh, and a leader to call them (instead of the original Hydra).
(ha-leader
"d" '(:ignore t :which-key "data")
"d |" '("pipe to shell" . replace-buffer-with-shell-command)
"d r" '("replace buffer" . ha-vr-replace-all)
"d y" '("copy to clipboard" . ha-yank-buffer-contents))
Global Replacements
The string replacement functions operate at the current point, which means I need to jump to the beginning before calling something like vr/replace.
(defun ha-vr-replace-all (regexp replace start end)
"Regexp-replace entire buffer with live visual feedback."
(interactive
(vr--interactive-get-args 'vr--mode-regexp-replace 'vr--calling-func-replace))
(vr/replace regexp replace (point-min) (point-max)))
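To illustrate why the starting position matters, here is a minimal sketch of a whole-buffer replacement built only from Emacs primitives. It is not how visual-regexp works internally, and the names `ha-example-replace-all` and `ha-example-replace-demo` are hypothetical, made up for this example.

```elisp
;; Hypothetical helper for illustration; not part of the original configuration.
(defun ha-example-replace-all (regexp replacement)
  "Replace every match of REGEXP with REPLACEMENT in the whole buffer.
Assumes REGEXP never matches the empty string."
  (save-excursion
    ;; Start from the top so matches before point are not skipped.
    (goto-char (point-min))
    (while (re-search-forward regexp nil t)
      (replace-match replacement t))))

(defun ha-example-replace-demo ()
  "Run `ha-example-replace-all' with point left at the end of the buffer."
  (with-temp-buffer
    (insert "one two\none two\n")
    ;; Point is now at the end; a search from here would find nothing.
    (ha-example-replace-all "two" "2")
    (buffer-string)))
```

Evaluating `(ha-example-replace-demo)` returns the text with both lines changed, even though point sat at the end of the buffer when the replacement started.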
Line-Oriented Functions
These functions focus on the data in the buffer as a series of lines:
(ha-leader
"d l" '(:ignore t :which-key "on lines")
"d l d" '("flush lines" . flush-lines)
"d l k" '("keep lines" . keep-lines)
"d l s" '("sort lines" . ha-sort-lines)
"d l f" '("sort fields" . ha-sort-fields)
"d l n" '("sort field num" . ha-sort-fields-numerically)
"d l r" '("reverse lines" . ha-reverse-lines)
"d l u" '("unique lines" . delete-duplicate-lines)
"d l b" '("flush blanks" . flush-blank-lines))
One issue I have is that keep-lines operates on the lines starting from the point, not on the entire buffer. Let’s fix that:
(defun call-function-at-buffer-beginning (orig-fun &rest args)
"Call ORIG-FUN after moving point to beginning of buffer.
Point restored after completion. Good for advice."
(save-excursion
(goto-char (point-min))
(apply orig-fun args)))
(advice-add 'keep-lines :around #'call-function-at-buffer-beginning)
(advice-add 'flush-lines :around #'call-function-at-buffer-beginning)
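Since an `:around` advice function receives the original function and its arguments, it can also be called directly, which makes the effect easy to see. The sketch below repeats the wrapper so it runs on its own, then compares a plain `keep-lines` call with a wrapped one. The name `ha-example-keep-from-line-3` is hypothetical, and the plain call behaves as described only in a session where the advice above is not installed.

```elisp
;; Copied from the definition above so this sketch is self-contained.
(defun call-function-at-buffer-beginning (orig-fun &rest args)
  "Call ORIG-FUN with ARGS after moving point to beginning of buffer."
  (save-excursion
    (goto-char (point-min))
    (apply orig-fun args)))

;; Hypothetical demo function, not part of the original configuration.
(defun ha-example-keep-from-line-3 (use-wrapper)
  "Keep lines matching \"an\" with point on the third line.
If USE-WRAPPER is non-nil, go through `call-function-at-buffer-beginning'."
  (with-temp-buffer
    (insert "apple\nbanana\ncherry\nberry\n")
    (goto-char (point-min))
    (forward-line 2)                    ; point is now on "cherry"
    (if use-wrapper
        (call-function-at-buffer-beginning #'keep-lines "an")
      (keep-lines "an"))
    (buffer-string)))
```

Without the wrapper, only the lines from point onward are examined, so "apple" survives even though it does not match. With the wrapper, only "banana" remains.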
The sort-lines function is useful, but insists on an active region. Let’s make a collection of data-focused versions that work on either the region (if it is active) or the entire buffer, regardless of the position of the cursor.
(dolist (tuple '((ha-sort-lines sort-lines "P")
                 (ha-sort-fields sort-fields "p")
                 (ha-sort-fields-numerically sort-numeric-fields "p")))
  (cl-destructuring-bind (func orig-func spec) tuple
    (eval `(defun ,func (prefix)
             ,(format "Call `%s' with all lines in a buffer or region (if active).
Passes PREFIX to the function." orig-func)
             ;; `sort-lines' takes a reverse flag (raw prefix), while the
             ;; field sorters need a field number (numeric prefix, default 1).
             (interactive ,spec)
             (save-excursion
               (if (region-active-p)
                   (,orig-func prefix (region-beginning) (region-end))
                 (,orig-func prefix (point-min) (point-max))))))))
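Generating functions with `eval` and a backquoted form can be hard to read, so here is roughly what the loop defines for the `sort-lines` entry, written out by hand. The names carry an `ha-example-` prefix to mark them as hypothetical and to avoid clobbering the generated functions.

```elisp
;; Hand-written equivalent of the generated sort-lines wrapper (illustrative).
(defun ha-example-sort-lines (prefix)
  "Call `sort-lines' on the region if active, else on the whole buffer.
A non-nil PREFIX sorts in reverse order."
  (interactive "P")
  (save-excursion
    (if (region-active-p)
        (sort-lines prefix (region-beginning) (region-end))
      (sort-lines prefix (point-min) (point-max)))))

;; Hypothetical demo: sort three lines in a scratch buffer.
(defun ha-example-sort-demo (prefix)
  "Return sample text sorted by `ha-example-sort-lines' with PREFIX."
  (with-temp-buffer
    (insert "cherry\napple\nbanana\n")
    (ha-example-sort-lines prefix)
    (buffer-string)))
```

Calling `(ha-example-sort-demo nil)` sorts ascending, and `(ha-example-sort-demo t)` sorts in reverse, which mirrors what a prefix argument does interactively.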
Getting rid of blank lines seems somewhat useful:
(defun flush-blank-lines ()
"Delete all empty lines in the buffer. See `flush-lines'."
(interactive)
(save-excursion
(goto-char (point-min))
(flush-lines (rx line-start (zero-or-more space) line-end))))
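As a quick check of the regular expression, this small sketch runs the same `flush-lines` call on a scratch buffer containing both an empty line and a whitespace-only line. The function name `ha-example-flush-blank-demo` is hypothetical.

```elisp
;; Hypothetical demo, not part of the original configuration.
(defun ha-example-flush-blank-demo ()
  "Return sample text with its empty and whitespace-only lines removed."
  (with-temp-buffer
    (insert "a\n\n  \nb\n")
    (goto-char (point-min))
    ;; Same pattern as above: a line holding nothing but whitespace.
    (flush-lines (rx line-start (zero-or-more space) line-end))
    (buffer-string)))
```

Both blank lines disappear, leaving only the lines with content.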
Table-Oriented Functions
These functions focus on the data in the buffer as a table consisting of columns and rows of some sort.
(ha-leader
"d t" '(:ignore t :which-key "on tables")
"d t f" '("sort by columns" . ha-sort-fields)
"d t n" '("sort by columns numerically" . ha-sort-fields-numerically)
"d t k" '("keep columns" . keep-columns)
"d t d" '("flush columns" . flush-columns))
Each of the table functions requires a table separator (for instance, the | character) and often the columns to operate on.
The keep-columns function removes all text except the indexed columns between the separators:
(defun keep-columns (separator columns)
"Keep tabular text columns, deleting the rest in buffer or region.
Define columns as text between SEPARATOR, not numerical
position. Note that text _before_ the separator is column 0.
For instance, given the following table:
apple : avocado : apricot
banana : blueberry : bramble
cantaloupe : cherry : courgette : cucumber
data : durian
Calling this function with a `:' character, as columns: `0, 2'
results in the buffer text:
apple : apricot
banana : bramble
cantaloupe : courgette
data"
(interactive "sSeparator: \nsColumns to Keep: ")
(operate-columns separator columns t))
The flush-columns function is similar, except that it deletes the given columns.
(defun flush-columns (separator columns)
"Delete tabular text columns in buffer or region.
Define columns as text between SEPARATOR, not numerical
position. Note that text _before_ the separator is column 0.
For instance, given the following table:
apple : avocado : apricot
banana : blueberry : bramble
cantaloupe : cherry : courgette : cucumber
data : durian
Calling this function with a `:' character, as columns: `1'
(remember the columns are 0-indexed),
results in the buffer text:
apple : apricot
banana : bramble
cantaloupe : courgette : cucumber
data"
(interactive "sSeparator: \nsColumns to Delete: ")
(operate-columns separator columns nil))
Both functions are similar, and their behavior comes from operate-columns, which walks through the buffer, line-by-line:
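(defun operate-columns (separator columns-str keep?)
  "Call `operate-columns-on-line' for each line in buffer or region.
First, convert string COLUMNS-STR to a list of numbers, then
search for SEPARATOR. Keep the columns if KEEP? is non-nil,
delete them otherwise."
  (let ((columns (numbers-to-number-list columns-str)))
    (save-excursion
      ;; Undo the narrowing below once we are done.
      (save-restriction
        (when (region-active-p)
          (narrow-to-region (region-beginning) (region-end)))
        (goto-char (point-min))
        (while (re-search-forward (rx (literal separator)) nil t)
          ;; Pass KEEP? along so `flush-columns' deletes instead of keeps.
          (operate-columns-on-line separator columns keep?)
          ;; `next-line' is for interactive use; `forward-line' is safe here.
          (forward-line))))))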
For each line, operate-columns calls this function:
(defun operate-columns-on-line (separator columns keep?)
"Replace current line after keeping or deleting COLUMNS.
Keep the COLUMNS if KEEP? is non-nil, delete otherwise.
Define columns as the text between SEPARATOR."
(cl-labels ((keep-oper (idx it) (if keep?
(when (member idx columns) it)
(unless (member idx columns) it))))
(let* ((start (line-beginning-position))
(end (line-end-position))
(line (buffer-substring start end))
(parts (thread-last (split-string line (regexp-quote separator))
(--map-indexed (keep-oper it-index it))
(-remove 'null)))
(nline (string-join parts separator)))
(delete-region start end)
(insert nline))))
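The per-line logic is easier to experiment with as a pure function on strings. The following is a sketch under a few assumptions: it uses `cl-loop` instead of the dash macros, it treats the separator as literal text by quoting it, and the name `ha-example-filter-columns` is hypothetical.

```elisp
(require 'cl-lib)
(require 'subr-x)   ; `string-join' on older Emacs versions

;; Hypothetical pure-string variant of the per-line column filter.
(defun ha-example-filter-columns (line separator columns keep)
  "Return LINE with COLUMNS kept (KEEP non-nil) or deleted (KEEP nil).
Columns are the 0-indexed pieces of LINE between literal SEPARATOR."
  (let ((parts (split-string line (regexp-quote separator))))
    (string-join
     (cl-loop for part in parts
              for idx from 0
              ;; Collect listed columns when keeping, unlisted when deleting.
              when (if keep
                       (memq idx columns)
                     (not (memq idx columns)))
              collect part)
     separator)))

(ha-example-filter-columns "apple : avocado : apricot" ":" '(0 2) t)
;; → "apple : apricot"
```

Quoting the separator matters for characters such as `|`, which would otherwise be read as a regular expression operator by `split-string`.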
I like the idea of the shell command, cut, where you can have an arbitrary character as a separator, and then either delete or keep the data between them, as columns. But I need a function that can convert a string of “columns”, for instance "1, 4-7 9", to a list of numbers, like '(1 4 5 6 7 9):
(defun numbers-to-number-list (input)
"Convert the string, INPUT, to a list of numbers.
For instance: `1, 4-7 9' returns `(1 4 5 6 7 9)'"
(let* ((separator (rx (* space) (or "," space) (* space)))
(dashed (rx (* space) "-" (* space)))
(ranged (rx (group (+ digit)) (regexp dashed) (group (+ digit))))
(str-list (split-string input separator t)))
(--reduce-from (append acc (if (string-match ranged it)
(number-sequence
(string-to-number (match-string 1 it))
(string-to-number (match-string 2 it)))
(list (string-to-number it))))
() str-list)))
Does this work?
(ert-deftest numbers-to-number-list-test ()
(should (equal (numbers-to-number-list "2") '(2)))
(should (equal (numbers-to-number-list "1, 2 3") '(1 2 3)))
(should (equal (numbers-to-number-list "1, 4-7 9") '(1 4 5 6 7 9))))
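For comparison, the same parsing can be sketched with built-in functions only, in case the dash library is not loaded. This is an alternative written for illustration, not the author's implementation, and `ha-example-parse-columns` is a hypothetical name. Like the original, it assumes ranges are written without spaces around the dash.

```elisp
;; Hypothetical dash-free variant of the column-list parser.
(defun ha-example-parse-columns (input)
  "Convert INPUT, such as \"1, 4-7 9\", to a list of numbers."
  (let ((ranged (rx string-start
                    (group (+ digit)) "-" (group (+ digit))
                    string-end)))
    (mapcan (lambda (token)
              (if (string-match ranged token)
                  ;; Expand \"4-7\" into (4 5 6 7).
                  (number-sequence
                   (string-to-number (match-string 1 token))
                   (string-to-number (match-string 2 token)))
                (list (string-to-number token))))
            ;; Split on any run of commas and spaces, dropping empty tokens.
            (split-string input (rx (+ (any ", "))) t))))

(ha-example-parse-columns "1, 4-7 9")
;; → (1 4 5 6 7 9)
```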
The sort-fields function does a good job if the table is space separated, but if we separate by some other character(s), it doesn’t work. Can we write a function that does this? Here we build a regular expression of grouped fields and hand it to the sort-regexp-fields function.
(defun ha-sort-table-by-column (separator field)
"Sort the active region or entire buffer by column, FIELD.
Columns are denoted by a regular expression, SEPARATOR, which
could be a single character. For instance, given a buffer:
d, a, c, b
x, y
e, f, g
i, m, a, o
Calling this function with a `,' separator, and `2' for the
column, would result in:
d, a, c, b
e, f, g
i, m, a, o
x, y"
(interactive "sSeparator: \nnSorting field: ")
;; Create a regular expression of grouped fields, separated
;; by the separator sequence, for commas, this would be e.g.
;; \\(.*\\),\\(.*\\),\\(.*\\),\\(.*\\)
(let* ((rx-list (mapconcat (lambda (x) (rx (group (zero-or-more any))))
(number-sequence 1 field)
separator))
;; Prepend the beginning of line to the regular expression:
(regexp (concat (rx bol) rx-list))
(start (if (region-active-p) (region-beginning) (point-min)))
(end (if (region-active-p) (region-end) (point-max))))
(save-excursion
(sort-regexp-fields nil regexp (format "\\%d" field) start end))))
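To see what regular expression gets built, this sketch pulls the construction into a helper of its own and adds a second helper that reports which text a given group captures. Both names are hypothetical, and the sketch uses `nonl` (any character except newline) for the group contents.

```elisp
;; Hypothetical helper: build the grouped-fields regexp for FIELD columns.
(defun ha-example-column-regexp (separator field)
  "Return a regexp with FIELD groups joined by regexp SEPARATOR."
  (concat (rx bol)
          (mapconcat (lambda (_) (rx (group (zero-or-more nonl))))
                     (number-sequence 1 field)
                     separator)))

;; Hypothetical helper: show the sort key the regexp extracts from a line.
(defun ha-example-field-key (line separator field)
  "Return the text captured as group FIELD in LINE, or nil if no match."
  (when (string-match (ha-example-column-regexp separator field) line)
    (match-string field line)))

(ha-example-column-regexp "," 3)
;; → "^\\(.*\\),\\(.*\\),\\(.*\\)"

(ha-example-field-key "d, a, c, b" "," 2)
;; → " b"
```

The second result is worth noticing: because `.*` is greedy, the earlier groups absorb as much as they can, so on lines with more separators than requested fields the final group captures the text after the last separator, not strictly the Nth column.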
Buffer-Oriented Functions
If there is no specific function, but you can think of a shell command that will work, then pipe the buffer (or the region) through it:
(defun replace-buffer-with-shell-command (command)
"Replaces the contents of the buffer, or the contents of the
selected region, with the output from running an external
executable, COMMAND.
This is a wrapper around `shell-command-on-region'."
(interactive "sCommand: ")
(save-excursion
(save-restriction
(when (region-active-p)
(narrow-to-region (region-beginning) (region-end)))
(shell-command-on-region (point-min) (point-max) command nil t))))
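As a usage illustration, the sketch below pipes some text through an external command in a scratch buffer, using the same `shell-command-on-region` call with its REPLACE argument set. It assumes a POSIX-like system where `sort` is on the path, and `ha-example-pipe-text` is a hypothetical name.

```elisp
;; Hypothetical demo; assumes a POSIX shell with the `sort' utility.
(defun ha-example-pipe-text (text command)
  "Return TEXT after piping it through the shell COMMAND."
  (with-temp-buffer
    (insert text)
    ;; OUTPUT-BUFFER is nil and REPLACE is t, so the output
    ;; replaces the buffer contents in place.
    (shell-command-on-region (point-min) (point-max) command nil t)
    (buffer-string)))

(ha-example-pipe-text "banana\napple\ncherry\n" "sort")
```

Any filter that reads standard input and writes standard output fits this pattern, which is what makes the "d |" binding a handy escape hatch.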