hamacs/ha-data.org
Howard Abrams 988c0dac2e Decided I like lowercase headers better
Oh, and let's fix the FILETAGS. Thank goodness for woccurrrrrr.
2024-03-06 20:02:25 -08:00

#+title: Editing Data Files
#+author: Howard X. Abrams
#+date: 2022-10-14
#+tags: emacs
A literate programming file for configuring Emacs to edit files of data.
#+begin_src emacs-lisp :exports none
;;; ha-data --- edit data files. -*- lexical-binding: t; -*-
;;
;; © 2022-2023 Howard X. Abrams
;; Licensed under a Creative Commons Attribution 4.0 International License.
;; See http://creativecommons.org/licenses/by/4.0/
;;
;; Author: Howard X. Abrams <http://gitlab.com/howardabrams>
;; Maintainer: Howard X. Abrams
;; Created: October 14, 2022
;;
;; While obvious, GNU Emacs does not include this file or project.
;;
;; *NB:* Do not edit this file. Instead, edit the original literate file at:
;; /Users/howard.abrams/other/hamacs/ha-data.org
;; And tangle the file to recreate this one.
;;
;;; Code:
#+end_src
* Introduction
Once upon a time, I [[https://www.youtube.com/watch?v=HKJMDJ4i-XI][gave a talk]] at EmacsConf 2019 about [[http://howardism.org/Technical/Emacs/piper-presentation-transcript.html][an interesting idea]] I called [[https://gitlab.com/howardabrams/emacs-piper][emacs-piper]]. I still like the idea of sometimes treating the entire contents of an Emacs buffer as if it were a data file, and editing it as a whole. This file contains what I feel are the best functions for that… oh, and a leader to call them (instead of the original Hydra).
#+begin_src emacs-lisp
(ha-leader
  "d"   '(:ignore t :which-key "data")
  "d |" '("pipe to shell"     . replace-buffer-with-shell-command)
  "d r" '("replace buffer"    . ha-vr-replace-all)
  "d y" '("copy to clipboard" . ha-yank-buffer-contents))
#+end_src
** Global Replacements
The string replacement functions operate at the current point, which means I need to jump to the beginning before calling something like [[help:vr/replace][vr/replace]].
#+begin_src emacs-lisp
(defun ha-vr-replace-all (regexp replace start end)
  "Regexp-replace entire buffer with live visual feedback."
  (interactive
   (vr--interactive-get-args 'vr--mode-regexp-replace 'vr--calling-func-replace))
  (vr/replace regexp replace (point-min) (point-max)))
#+end_src
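As a non-interactive sketch of the same idea (jump to the top, then replace everything), a plain =re-search-forward= loop would do. Note that =ha-example-replace-all= and its demo are hypothetical names for illustration, and that this uses none of the visual-regexp machinery:
#+begin_src emacs-lisp :tangle no
(defun ha-example-replace-all (regexp replacement)
  "Replace each match of REGEXP with REPLACEMENT in the entire buffer.
Hypothetical helper for illustration; not part of visual-regexp."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward regexp nil t)
      ;; FIXEDCASE is t, so insert REPLACEMENT exactly as given:
      (replace-match replacement t))))

(defun ha-example-replace-all-demo ()
  "Return the text of a scratch buffer after replacing foo with baz."
  (with-temp-buffer
    (insert "foo bar foo")
    ;; Point is at the end of the buffer, yet both matches change:
    (ha-example-replace-all "foo" "baz")
    (buffer-string)))
#+end_src
Calling =(ha-example-replace-all-demo)= returns ="baz bar baz"=, even though the point sat at the end of the buffer.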
* Line-Oriented Functions
These functions focus on the data in the buffer as a series of lines:
#+begin_src emacs-lisp
(ha-leader
  "d l"   '(:ignore t :which-key "on lines")
  "d l d" '("flush lines"    . flush-lines)
  "d l k" '("keep lines"     . keep-lines)
  "d l s" '("sort lines"     . ha-sort-lines)
  "d l f" '("sort fields"    . ha-sort-fields)
  "d l n" '("sort field num" . ha-sort-fields-numerically)
  "d l r" '("reverse lines"  . ha-reverse-lines)
  "d l u" '("unique lines"   . delete-duplicate-lines)
  "d l b" '("flush blanks"   . flush-blank-lines))
#+end_src
One issue I have is that [[help:keep-lines][keep-lines]] (like [[help:flush-lines][flush-lines]]) operates on the lines /starting with the point/, not on the entire buffer. Let's fix that:
#+begin_src emacs-lisp
(defun call-function-at-buffer-beginning (orig-fun &rest args)
  "Call ORIG-FUN after moving point to beginning of buffer.
Point restored after completion. Good for advice."
  (save-excursion
    (goto-char (point-min))
    (apply orig-fun args)))

(advice-add 'keep-lines :around #'call-function-at-buffer-beginning)
(advice-add 'flush-lines :around #'call-function-at-buffer-beginning)
#+end_src
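To see why the advice matters, this small sketch runs =keep-lines= from either end of a scratch buffer. It assumes an Emacs /without/ the advice above installed, and =ha-example-keep-lines-from= is a hypothetical name:
#+begin_src emacs-lisp :tangle no
(defun ha-example-keep-lines-from (position)
  "Return the lines left after `keep-lines' for \"^a\" runs from POSITION.
POSITION is the symbol `start' or `end' of a scratch buffer.
Hypothetical helper, for illustration."
  (with-temp-buffer
    (insert "apple\nbanana\navocado\n")
    (goto-char (if (eq position 'start) (point-min) (point-max)))
    ;; Called from Lisp without a region, this works from point onward:
    (keep-lines "^a")
    (split-string (buffer-string) "\n" t)))
#+end_src
With the point at the end, =(ha-example-keep-lines-from 'end)= leaves all three lines alone, while =(ha-example-keep-lines-from 'start)= returns =("apple" "avocado")=.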
The [[help:sort-lines][sort-lines]] function is useful, but insists on an /active/ region. Let's make a collection of data-focused versions that work on either the region (if it is active) or the entire buffer, regardless of the position of the cursor.
#+begin_src emacs-lisp
;; `sort-lines' takes a raw prefix (reverse or not), but the field
;; sorters need a number, so each tuple carries its `interactive' code:
(dolist (tuple '((ha-sort-lines              sort-lines          "P")
                 (ha-sort-fields             sort-fields         "p")
                 (ha-sort-fields-numerically sort-numeric-fields "p")))
  (cl-destructuring-bind (func orig-func spec) tuple
    (eval `(defun ,func (prefix)
             ,(format "Call `%s' with all lines in a buffer or region (if active).
Passes PREFIX to the function." orig-func)
             (interactive ,spec)
             (save-excursion
               (if (region-active-p)
                   (,orig-func prefix (region-beginning) (region-end))
                 (,orig-func prefix (point-min) (point-max))))))))
#+end_src
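Since the loop hides the generated code, here is roughly what it produces for =sort-lines=, written out by hand. The names =ha-example-sort-lines= and =ha-example-sort-lines-demo= are hypothetical, chosen so they don't clash with the real functions:
#+begin_src emacs-lisp :tangle no
(defun ha-example-sort-lines (prefix)
  "Call `sort-lines' on the active region, or else the whole buffer.
Passes PREFIX to the function. Hypothetical hand-written twin of
what the loop generates for `sort-lines'."
  (interactive "P")
  (save-excursion
    (if (region-active-p)
        (sort-lines prefix (region-beginning) (region-end))
      (sort-lines prefix (point-min) (point-max)))))

(defun ha-example-sort-lines-demo ()
  "Return the sorted lines of a scratch buffer that has no region."
  (with-temp-buffer
    (insert "cherry\napple\nbanana\n")
    ;; No active region here, so the whole buffer is sorted:
    (ha-example-sort-lines nil)
    (split-string (buffer-string) "\n" t)))
#+end_src
Calling =(ha-example-sort-lines-demo)= returns =("apple" "banana" "cherry")=.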
Getting rid of blank lines seems somewhat useful:
#+begin_src emacs-lisp
(defun flush-blank-lines ()
  "Delete all empty lines in the buffer. See `flush-lines'."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (flush-lines (rx line-start (zero-or-more space) line-end))))
#+end_src
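A quick sketch of what counts as /blank/ here: lines that are empty, or hold nothing but whitespace. The demo function name is hypothetical, and it calls =flush-lines= directly with the same =rx= expression:
#+begin_src emacs-lisp :tangle no
(defun ha-example-flush-blanks-demo ()
  "Return the lines that survive flushing blanks from a scratch buffer."
  (with-temp-buffer
    (insert "alpha\n\n   \nbeta\n\t\ngamma\n")
    (goto-char (point-min))
    ;; The same regular expression as `flush-blank-lines' uses:
    (flush-lines (rx line-start (zero-or-more space) line-end))
    (split-string (buffer-string) "\n" t)))
#+end_src
Calling =(ha-example-flush-blanks-demo)= returns =("alpha" "beta" "gamma")=.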
* Table-Oriented Functions
These functions focus on the data in the buffer as a table consisting of columns and rows of some sort.
#+begin_src emacs-lisp
(ha-leader
  "d t"   '(:ignore t :which-key "on tables")
  "d t f" '("sort by columns" . ha-sort-fields)
  "d t n" '("sort by columns numerically" . ha-sort-fields-numerically)
  "d t k" '("keep columns" . keep-columns)
  ;; Use "d" for flushing, like "d l d" for lines, as "d t f" sorts:
  "d t d" '("flush columns" . flush-columns))
#+end_src
Each of the /table/ functions requires a /table separator/ (for instance, the =|= character) and often the columns to operate on.
The =keep-columns= function removes all text, except for the /indexed/ text between the separators:
#+begin_src emacs-lisp
(defun keep-columns (separator columns)
  "Keep tabular text columns, deleting the rest in buffer or region.
Columns are the text between SEPARATOR, not numerical
positions. Note that text _before_ the first separator is column 0.

For instance, given the following table:

    apple : avocado : apricot
    banana : blueberry : bramble
    cantaloupe : cherry : courgette : cucumber
    data : durian

Calling this function with a `:' character, and columns `0, 2'
results in the buffer text (the last row has no column 2):

    apple : apricot
    banana : bramble
    cantaloupe : courgette
    data"
  (interactive "sSeparator: \nsColumns to Keep: ")
  (operate-columns separator columns t))
#+end_src
The =flush-columns= function is similar, except that it deletes the given columns.
#+begin_src emacs-lisp
(defun flush-columns (separator columns)
  "Delete tabular text columns in buffer or region.
Columns are the text between SEPARATOR, not numerical
positions. Note that text _before_ the first separator is column 0.

For instance, given the following table:

    apple : avocado : apricot
    banana : blueberry : bramble
    cantaloupe : cherry : courgette : cucumber
    data : durian

Calling this function with a `:' character and columns `1' (remember
the columns are 0-indexed) results in the buffer text:

    apple : apricot
    banana : bramble
    cantaloupe : courgette : cucumber
    data"
  (interactive "sSeparator: \nsColumns to Delete: ")
  (operate-columns separator columns nil))
#+end_src
Both functions are similar, and their behavior comes from =operate-columns=, which walks through the buffer, line-by-line:
#+begin_src emacs-lisp
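(defun operate-columns (separator columns-str keep?)
  "Call `operate-columns-on-line' for each line in buffer.
First, convert string COLUMNS-STR to a list of numbers, then
search for SEPARATOR. Keep the columns if KEEP? is non-nil,
delete them otherwise."
  (let ((columns (numbers-to-number-list columns-str)))
    (save-excursion
      ;; Undo the narrowing to the region when finished:
      (save-restriction
        (when (region-active-p)
          (narrow-to-region (region-beginning) (region-end)))
        (goto-char (point-min))
        (while (re-search-forward (rx (literal separator)) nil t)
          ;; Pass KEEP? along, so that `flush-columns' deletes columns:
          (operate-columns-on-line separator columns keep?)
          ;; Unlike `next-line', `forward-line' is meant for Lisp code,
          ;; and doesn't signal an error on the last line:
          (forward-line 1))))))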
#+end_src
For each line, =operate-columns= calls this function:
#+begin_src emacs-lisp
(defun operate-columns-on-line (separator columns keep?)
  "Replace current line after keeping or deleting COLUMNS.
Keep the COLUMNS if KEEP? is non-nil, delete otherwise.
Columns are the text between SEPARATOR."
  (cl-labels ((keep-oper (idx it) (if keep?
                                      (when (member idx columns) it)
                                    (unless (member idx columns) it))))
    (let* ((start (line-beginning-position))
           (end   (line-end-position))
           (line  (buffer-substring start end))
           (parts (thread-last (split-string line separator)
                               (--map-indexed (keep-oper it-index it))
                               (-remove 'null)))
           (nline (string-join parts separator)))
      (delete-region start end)
      (insert nline))))
#+end_src
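The per-line transformation is easier to poke at on a plain string. This is a sketch only: =ha-example-filter-columns= is a hypothetical, dash-free cousin of =operate-columns-on-line= that returns a new string instead of editing the buffer:
#+begin_src emacs-lisp :tangle no
(defun ha-example-filter-columns (line separator columns keep?)
  "Return LINE with only some of its SEPARATOR-delimited columns.
Keep the 0-indexed COLUMNS if KEEP? is non-nil, drop them otherwise.
Hypothetical, string-only cousin of `operate-columns-on-line'."
  (let ((idx -1)
        parts)
    (dolist (part (split-string line separator))
      (setq idx (1+ idx))
      ;; Collect PART when its membership in COLUMNS agrees with KEEP?:
      (when (if keep?
                (memq idx columns)
              (not (memq idx columns)))
        (push part parts)))
    (mapconcat #'identity (nreverse parts) separator)))
#+end_src
For instance, =(ha-example-filter-columns "apple : avocado : apricot" ":" '(0 2) t)= returns ="apple : apricot"=, and dropping column =1= from the cantaloupe line with =(ha-example-filter-columns "cantaloupe : cherry : courgette : cucumber" ":" '(1) nil)= returns ="cantaloupe : courgette : cucumber"=.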
I like the idea of the shell command, =cut=, where you can have an arbitrary character as a separator, and then either delete or keep the data between them, as columns. But I need a function that can convert a string of “columns”, for instance ="1, 4-7 9"=, to a list of numbers, like ='(1 4 5 6 7 9)=:
#+begin_src emacs-lisp
(defun numbers-to-number-list (input)
  "Convert the string, INPUT, to a list of numbers.
For instance: `1, 4-7 9' returns `(1 4 5 6 7 9)'"
  (let* ((separator (rx (* space) (or "," space) (* space)))
         (dashed    (rx (* space) "-" (* space)))
         (ranged    (rx (group (+ digit)) (regexp dashed) (group (+ digit))))
         (str-list  (split-string input separator t)))
    (--reduce-from (append acc (if (string-match ranged it)
                                   (number-sequence
                                    (string-to-number (match-string 1 it))
                                    (string-to-number (match-string 2 it)))
                                 (list (string-to-number it))))
                   () str-list)))
#+end_src
Does this work?
#+begin_src emacs-lisp :tangle no
(ert-deftest numbers-to-number-list-test ()
  (should (equal (numbers-to-number-list "2") '(2)))
  (should (equal (numbers-to-number-list "1, 2 3") '(1 2 3)))
  (should (equal (numbers-to-number-list "1, 4-7 9") '(1 4 5 6 7 9))))
#+end_src
The [[help:sort-fields][sort-fields]] function does a good job if the table is space separated, but if we separate by some other character(s), it doesn't work. Can we write a function that does this? Here we build a regular expression as a /helper/ for the [[help:sort-regexp-fields][sort-regexp-fields]] function.
#+begin_src emacs-lisp
(defun ha-sort-table-by-column (separator field)
  "Sort the active region or entire buffer by column, FIELD.
Columns are denoted by a regular expression, SEPARATOR, which
could be a single character. For instance, given a buffer:

    d, a, c, b
    x, y
    e, f, g
    i, m, a, o

Calling this function with a `,' separator, and `2' for the
column, would result in:

    d, a, c, b
    e, f, g
    i, m, a, o
    x, y"
  (interactive "sSeparator: \nnSorting field: ")
  ;; Create a regular expression of FIELD grouped fields, separated
  ;; by the separator sequence. The groups are non-greedy, so each
  ;; stops at the next separator. For commas and field 2, this is:
  ;;     ^\\(.*?\\),\\(.*?\\)\\(?:,.*\\)?$
  (let* ((rx-list (mapconcat (lambda (_) (rx (group (*? any))))
                             (number-sequence 1 field)
                             separator))
         ;; Anchor to the beginning of the line, and let the rest of
         ;; the line (if any) follow the last group, so that each
         ;; sorted record is a whole line:
         (regexp (concat (rx bol) rx-list
                         "\\(?:" separator ".*\\)?" (rx eol)))
         (start (if (region-active-p) (region-beginning) (point-min)))
         (end   (if (region-active-p) (region-end) (point-max))))
    (save-excursion
      (sort-regexp-fields nil regexp (format "\\%d" field) start end))))
#+end_src
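To make the regular expression less mysterious, this sketch builds one as a plain string and hands it to =sort-regexp-fields= in a scratch buffer. Both function names are hypothetical, and the non-greedy groups are my assumption about how to make group N line up with exactly the Nth column:
#+begin_src emacs-lisp :tangle no
(defun ha-example-column-regexp (separator field)
  "Return a regexp whose group FIELD is the FIELD-th column of a line.
Columns are separated by the regexp SEPARATOR. Hypothetical helper."
  (concat "^"
          ;; One non-greedy group per column, up to FIELD:
          (mapconcat (lambda (_) "\\(.*?\\)")
                     (number-sequence 1 field)
                     separator)
          ;; Optionally swallow the rest, so the record is a whole line:
          "\\(?:" separator ".*\\)?$"))

(defun ha-example-sort-by-column-demo ()
  "Return the lines of a scratch buffer sorted by the second column."
  (with-temp-buffer
    (insert "a, z, b\nc, y, x\n")
    ;; The key, \\2, is the second group, i.e. the second column:
    (sort-regexp-fields nil (ha-example-column-regexp "," 2) "\\2"
                        (point-min) (point-max))
    (split-string (buffer-string) "\n" t)))
#+end_src
Calling =(ha-example-sort-by-column-demo)= returns =("c, y, x" "a, z, b")=, since =y= sorts before =z=, whatever the third column holds.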
* Buffer-Oriented Functions
If there is no specific function, but you can think of a shell command that will work, then pipe the buffer (or the active region) through it:
#+begin_src emacs-lisp
(defun replace-buffer-with-shell-command (command)
  "Replaces the contents of the buffer, or the contents of the
selected region, with the output from running an external
executable, COMMAND.
This is a wrapper around `shell-command-on-region'."
  (interactive "sCommand: ")
  (save-excursion
    (save-restriction
      (when (region-active-p)
        (narrow-to-region (region-beginning) (region-end)))
      (shell-command-on-region (point-min) (point-max) command nil t))))
#+end_src
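As a sketch of the underlying call, this pipes a scratch buffer through =sort=. The function name is hypothetical, and it assumes a POSIX-like shell with =sort= on the =PATH=:
#+begin_src emacs-lisp :tangle no
(defun ha-example-pipe-through-sort ()
  "Return the lines of a scratch buffer after piping it through sort(1).
Hypothetical demo; assumes a POSIX shell with `sort' on the PATH."
  (with-temp-buffer
    (insert "pear\napple\nfig\n")
    ;; A nil OUTPUT-BUFFER with a non-nil REPLACE argument puts the
    ;; command's output in place of the text it received:
    (shell-command-on-region (point-min) (point-max) "sort" nil t)
    (split-string (buffer-string) "\n" t)))
#+end_src
On such a system, =(ha-example-pipe-through-sort)= should return the three fruits in alphabetical order.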
* Technical Artifacts :noexport:
Let's =provide= a name so we can =require= this file:
#+begin_src emacs-lisp :exports none
(provide 'ha-data)
;;; ha-data.el ends here
#+end_src
#+description: configuring Emacs to edit files of data.
#+property: header-args:sh :tangle no
#+property: header-args:emacs-lisp :tangle yes
#+property: header-args :results none :eval no-export :comments no :mkdirp yes
#+options: num:nil toc:t todo:nil tasks:nil tags:nil date:nil
#+options: skip:nil author:nil email:nil creator:nil timestamp:nil
#+infojs_opt: view:nil toc:t ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js