#+title: Editing Data Files #+author: Howard X. Abrams #+date: 2022-10-14 #+tags: emacs A literate programming file for configuring Emacs to edit files of data. #+begin_src emacs-lisp :exports none ;;; ha-data --- edit data files. -*- lexical-binding: t; -*- ;; ;; © 2022-2023 Howard X. Abrams ;; Licensed under a Creative Commons Attribution 4.0 International License. ;; See http://creativecommons.org/licenses/by/4.0/ ;; ;; Author: Howard X. Abrams ;; Maintainer: Howard X. Abrams ;; Created: October 14, 2022 ;; ;; While obvious, GNU Emacs does not include this file or project. ;; ;; *NB:* Do not edit this file. Instead, edit the original literate file at: ;; ~/src/hamacs/ha-data.org ;; And tangle the file to recreate this one. ;; ;;; Code: #+end_src * Introduction Once upon a time, I [[https://www.youtube.com/watch?v=HKJMDJ4i-XI][gave a talk]] to EmacsConf 2019, about [[http://howardism.org/Technical/Emacs/piper-presentation-transcript.html][an interesting idea]] I called [[https://gitlab.com/howardabrams/emacs-piper][emacs-piper]]. I still like the idea of sometimes editing an Emacs buffer on the entire contents, as if it were a data file. This file contains what I feel are the best functions for that… oh, and a leader to call it (instead of the original Hydra). #+begin_src emacs-lisp (ha-leader "d" '(:ignore t :which-key "data") "d |" '("pipe to shell" . replace-buffer-with-shell-command) "d r" '("replace buffer" . ha-vr-replace-all) "d y" '("copy to clipboard" . ha-yank-buffer-contents)) #+end_src ** Global Replacements The string replacement functions operate at the current point, which means I need to jump to the beginning before calling something like [[help:vr/replace][vr/replace]]. #+begin_src emacs-lisp (defun ha-vr-replace-all (regexp replace start end) "Regexp-replace entire buffer with live visual feedback." (interactive (vr--interactive-get-args 'vr--mode-regexp-replace 'vr--calling-func-replace)) (vr/replace regexp replace (point-min) (point-max))) #+end_src * Line-Oriented Functions These functions focus on the data in the buffer as a series of lines: #+begin_src emacs-lisp (ha-leader "d l" '(:ignore t :which-key "on lines") "d l d" '("flush lines" . flush-lines) "d l k" '("keep lines" . keep-lines) "d l s" '("sort lines" . ha-sort-lines) "d l f" '("sort fields" . ha-sort-fields) "d l n" '("sort field num" . ha-sort-fields-numerically) "d l r" '("reverse lines" . ha-reverse-lines) "d l u" '("unique lines" . delete-duplicate-lines) "d l b" '("flush blanks" . flush-blank-lines)) #+end_src One issue I have is [[help:keep-lines][keep-lines]] operate on the lines /starting with the point/, not on the entire buffer. Let’s fix that: #+begin_src emacs-lisp (defun call-function-at-buffer-beginning (orig-fun &rest args) "Call ORIG-FUN after moving point to beginning of buffer. Point restored after completion. Good for advice." (save-excursion (goto-char (point-min)) (apply orig-fun args))) (advice-add 'keep-lines :around #'call-function-at-buffer-beginning) (advice-add 'flush-lines :around #'call-function-at-buffer-beginning) #+end_src The [[help:sort-lines][sort-lines]] is useful, but insists on an /active/ region. Let’s make a collection of data-focused versions that work on both a region (if it is active) or the entire buffer, regardless of the position of the cursor. #+begin_src emacs-lisp (dolist (tuple '((ha-sort-lines sort-lines) (ha-sort-fields sort-fields) (ha-sort-fields-numerically sort-numeric-fields))) (cl-destructuring-bind (func orig-func) tuple (eval `(defun ,func (prefix) ,(format "Call `%s' with all lines in a buffer or region (if active). Passes PREFIX to the function." orig-func) (interactive "P") (save-excursion (if (region-active-p) (,orig-func prefix (region-beginning) (region-end)) (,orig-func prefix (point-min) (point-max)))))))) #+end_src Getting rid of blank lines seems somewhat useful: #+begin_src emacs-lisp (defun flush-blank-lines () "Delete all empty lines in the buffer. See `flush-lines'." (interactive) (save-excursion (goto-char (point-min)) (flush-lines (rx line-start (zero-or-more space) line-end)))) #+end_src * Table-Oriented Functions These functions focus on the data in the buffer as a table consisting of columns and rows of some sort. #+begin_src emacs-lisp (ha-leader "d t" '(:ignore t :which-key "on tables") "d t f" '("sort by columns" . ha-sort-fields) "d t n" '("sort by columns numerically" . ha-sort-fields-numerically) "d t k" '("keep columns" . keep-columns) "d t f" '("flush columns" . flush-columns)) #+end_src Each of the /table/ functions require a /table separator/ (for instance the =|= character) and often the columns to operate on. The =keep-columns= removes all text, except for /indexed/ text between the separator: #+begin_src emacs-lisp (defun keep-columns (separator columns) "Keep tabular text columns, deleting the rest in buffer or region. Defined columns as text between SEPARATOR, not numerical position. Note that text _before_ the separator is column 0. For instance, given the following table: apple : avocado : apricot banana : blueberry : bramble cantaloupe : cherry : courgette : cucumber data : durian Calling this function with a `:' character, as columns: `0, 2' results in the buffer text: apple : apricot banana : bramble cantaloupe : courgette data : durian" (interactive "sSeparator: \nsColumns to Keep: ") (operate-columns separator columns t)) #+end_src The =flush-columns= is similar, except that is deletes the given columns. #+begin_src emacs-lisp (defun flush-columns (separator columns) "Delete tabular text columns in buffer or region. Defined columns as text between SEPARATOR, not numerical position. Note that text _before_ the separator is column 0. For instance, given the following table: apple : avocado : apricot banana : blueberry : bramble cantaloupe : cherry : courgette : cucumber data : durian Calling this function with a `:' character, as columns: `1' (remember the colums are 0-indexed), results in the buffer text: apple : avocado : apricot banana : blueberry : bramble cantaloupe : cherry : courgette : cucumber data : durian apple : apricot banana : bramble cantaloupe : courgette data : durian" (interactive "sSeparator: \nsColumns to Delete: ") (operate-columns separator columns nil)) #+end_src Both functions are similar, and their behavior comes from =operate-columns=, which walks through the buffer, line-by-line: #+begin_src emacs-lisp (defun operate-columns (separator columns-str keep?) "Call `operate-columns-on-line' for each line in buffer. First, convert string COLUMNS-STR to a list of number, then search for SEPARATOR." (let ((columns (numbers-to-number-list columns-str))) (save-excursion (when (region-active-p) (narrow-to-region (region-beginning) (region-end))) (goto-char (point-min)) (while (re-search-forward (rx (literal separator)) nil t) (operate-columns-on-line separator columns t) (next-line))))) #+end_src For each line, the =operate-columns= calls this function: #+begin_src emacs-lisp (defun operate-columns-on-line (separator columns keep?) "Replace current line after keeping or deleting COLUMNS. Keep the COLUMNS if KEEP? is non-nil, delete otherwise. Defined columns as the text between SEPARATOR." (cl-labels ((keep-oper (idx it) (if keep? (when (member idx columns) it) (unless (member idx columns) it)))) (let* ((start (line-beginning-position)) (end (line-end-position)) (line (buffer-substring start end)) (parts (thread-last (split-string line separator) (--map-indexed (keep-oper it-index it)) (-remove 'null))) (nline (string-join parts separator))) (delete-region start end) (insert nline)))) #+end_src I like the idea of the shell command, =cut=, where you can have an arbitrary character as a separator, and then either delete or keep the data between them, as columns. But I need a function that can convert a string of “columns”, for instance ="1, 4-7 9"= to an list of numbers, like ='(1 4 5 6 7 9)=: #+begin_src emacs-lisp (defun numbers-to-number-list (input) "Convert the string, INPUT, to a list of numbers. For instance: `1, 4-7 9' returns `(1 4 5 6 7 9)'" (let* ((separator (rx (* space) (or "," space) (* space))) (dashed (rx (* space) "-" (* space))) (ranged (rx (group (+ digit)) (regexp dashed) (group (+ digit)))) (str-list (split-string input separator t))) (--reduce-from (append acc (if (string-match ranged it) (number-sequence (string-to-number (match-string 1 it)) (string-to-number (match-string 2 it))) (list (string-to-number it)))) () str-list))) #+end_src Does this work? #+begin_src emacs-lisp :tangle no (ert-deftest numbers-to-number-list-test () (should (equal (numbers-to-number-list "2") '(2))) (should (equal (numbers-to-number-list "1, 2 3") '(1 2 3))) (should (equal (numbers-to-number-list "1, 4-7 9") '(1 4 5 6 7 9)))) #+end_src The [[help:sort-fields][sort-fields]] function does a good job if the table is space separated, but if we separate by some other character(s), it doesn’t work. Can we write a function that does this? Here we make a /helper/ to some regular expression for the [[help:sort-regexp-fields][sort-regexp-fields]] function. #+begin_src emacs-lisp (defun ha-sort-table-by-column (separator field) "Sort the active region or entire buffer by column, FIELD. Columns are denoted by a regular expression, SEPARATOR, which could be a single character. For instance, given a buffer: d, a, c, b x, y e, f, g i, m, a, o Calling this function with a `,' separator, and `2' for the column, would result in: d, a, c, b e, f, g i, m, a, o x, y" (interactive "sSeparator: \nnSorting field: ") ;; Create a regular expression of grouped fields, separated ;; by the separator sequence, for commas, this would be e.g. ;; \\(.*\\),\\(.*\\),\\(.*\\),\\(.*\\) (let* ((rx-list (mapconcat (lambda (x) (rx (group (zero-or-more any)))) (number-sequence 1 field) separator)) ;; Prepend the beginning of line to the regular expression: (regexp (concat (rx bol) rx-list)) (start (if (region-active-p) (region-beginning) (point-min))) (end (if (region-active-p) (region-end) (point-max)))) (save-excursion (sort-regexp-fields nil regexp (format "\\%d" field) start end)))) #+end_src * Buffer-Oriented Functions If there is no specific function, but you can think of a shell command that will work, then #+begin_src emacs-lisp (defun replace-buffer-with-shell-command (command) "Replaces the contents of the buffer, or the contents of the selected region, with the output from running an external executable, COMMAND. This is a wrapper around `shell-command-on-region'." (interactive "sCommand: ") (save-excursion (save-restriction (when (region-active-p) (narrow-to-region (region-beginning) (region-end))) (shell-command-on-region (point-min) (point-max) command nil t)))) #+end_src * Technical Artifacts :noexport: Let's =provide= a name so we can =require= this file: #+begin_src emacs-lisp :exports none (provide 'ha-data) ;;; ha-data.el ends here #+end_src #+description: configuring Emacs to edit files of data. #+property: header-args:sh :tangle no #+property: header-args:emacs-lisp :tangle yes #+property: header-args :results none :eval no-export :comments no mkdirp yes #+options: num:nil toc:t todo:nil tasks:nil tags:nil date:nil #+options: skip:nil author:nil email:nil creator:nil timestamp:nil #+infojs_opt: view:nil toc:t ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js