Source: understanding-ob-shell.org
# -*- time-stamp-start: "^\\#\\+[Dd]+[Aa]+[Tt]+[Ee]+:[ ]+"; time-stamp-end: "\\\\?$"; time-stamp-format:"%Y-%02m-%02d"; -*-
#+title: Understanding ob-shell.el
#+date: 2024-01-06
#+type: post
# created: October 28, 2023
=org-babel= enables embedding executable source code within a
document. Babel differs from other "notebook" applications in that the
source code, execution parameters, and results have the same
representation as the rest of the document. It's all text. Delimiters
surround the source code and distinguish it from non-code text, as well
as provide options for execution. When run, source code is isolated
from the non-code text, executed in a subprocess, and some result
inserted into the document. Source code is a "first class citizen".
Code, prose, and results are interchangeable.
=ob-shell.el= defines functions and variables that manage the
execution of several shell languages.
It appears that a high level consideration of shell evaluation hasn't
happened in at least a decade. Instead, the library has progressed by
piecemeal fixes. This has resulted in local optimizations.
Abstractions and interfaces are not always cohesive. I suspect this
is a source of bugs and that it deters contributions. There have been
several recent calls for maintainers. My hope is that this document
and the changes it inspires will encourage new people to join the
project as well as make Org more reliable.
Updating =ob-shell.el= is no trivial matter. It's a classic case of
Chesterton's fence,
#+begin_quote
"In the matter of reforming things, as distinct from deforming them,
there is one plain and simple principle; a principle which will
probably be called a paradox. There exists in such a case a certain
institution or law; let us say, for the sake of simplicity, a fence or
gate erected across a road. The more modern type of reformer goes
gaily up to it and says, 'I don't see the use of this; let us clear it
away.' To which the more intelligent type of reformer will do well to
answer: 'If you don't see the use of it, I certainly won't let you
clear it away. Go away and think. Then, when you can come back and
tell me that you do see the use of it, I may allow you to destroy it."
#+end_quote
This document is a record of "going away and thinking." It presents
an understanding of why things are the way they currently are, where
"current" means org-mode git commit [[https://git.savannah.gnu.o9183e3c72rg/cgit/emacs/org-mode.git/commit/?id=9183e3c723b812360d1042196416d521db590e9f][9183e3c72]]. Each section of code
is analyzed, questions are asked, and recommendations given. See [[*
Validation][Validation]] within the source file for how to compare a
checked out version of the code to the code reviewed in this document.
# Guiding principals:
# - *Sustainability* What changes will ease maintenance so that ob-shell
# and org-babel remain community assets?
# - *Stability* How might we organize the code so that adding features
# doesn't disrupt current functionality?
# - *Consistency* What can we do to make the Org Babel code base more
# consistent? My impression is that a high level consideration of shell
# evaluation hasn't happened in at least a decade. Instead, my
# impression is that the library has progressed by piecemeal fixes
# resulting local optimizations. My hypothesis is that this is a major
# source of edge cases.
Source file: https://excalamus.com/understanding-ob-shell-raw.html
* Table of Contents
:PROPERTIES:
#+TOC: headlines 1
:END:
* Meta :noexport:
:PROPERTIES:
#+TOC: headlines 2 local
:END:
** Setup
Checkout commit [[https://git.savannah.gnu.o9183e3c72rg/cgit/emacs/org-mode.git/commit/?id=9183e3c723b812360d1042196416d521db590e9f][9183e3c72]] of the Org source.
#+begin_src sh :async t :session *understanding-ob-shell* :exports both :eval never-export
if [ -d "/tmp/org-mode" ]; then rm -rf "/tmp/org-mode"; fi
cd /tmp
git clone https://git.savannah.gnu.org/git/emacs/org-mode.git
cd org-mode
git -c advice.detachedHead=false checkout 9183e3c723b812360d1042196416d521db590e9f
#+end_src
#+RESULTS:
: org_babel_sh_prompt> org_babel_sh_prompt> org_babel_sh_prompt> Cloning into 'org-mode'...
: remote: Counting objects: 144462, done.
: remote: Compressing objects: 100% (31560/31560), done.
: remote: Total 144462 (delta 113225), reused 143823 (delta 112760)
: Receiving objects: 100% (144462/144462), 100.63 MiB | 5.54 MiB/s, done.
: Resolving deltas: 100% (113225/113225), done.
: org_babel_sh_prompt> HEAD is now at 9183e3c72 * lisp/org-crypt.el: Fix checkdoc warnings
Set up a working environment. Generate TAGS file so that the checkout
is navigable. Override xref so that we stay within the checkout when
navigating from a checkout file.
#+begin_src emacs-lisp :results none :tangle no :lexical t :eval never-export
;; we want xref to navigate within the checkout directory, not within
;; the currently loaded Org mode (which is not the checkout).
(funcall
(let* ((androidp (file-exists-p "/data/data/org.gnu.emacs/files"))
(org-root (if androidp
"/data/data/org.gnu.emacs/files/org-mode"
"/tmp/org-mode")))
(lambda ()
;; generate TAGS on non-android system
(unless androidp
;; See xc/build-tags:
;; https://codeberg.org/excalamus/.emacs.d/src/commit/749edc6093b355f54e6258c99818a4657eab76e9/init.el#L340
(call-interactively
#'(lambda ()
"Basically calls 'ctags -eR [DIR]' on the given directory."
(interactive)
(xc/build-tags org-root nil nil t))))
;; load TAGS
(visit-tags-table (concat org-root "/TAGS"))
;; make xref use TAGS
(add-hook 'xref-after-jump-hook
(lambda ()
(if (string-match-p org-root (buffer-file-name))
(xref-etags-mode 1))))
;; open the file of interest
(find-file-other-window (concat org-root "/lisp/ob-shell.el"))
;; Bootstrap the TAGS referencing
(with-current-buffer "ob-shell.el"
(xref-etags-mode 1)))))
#+end_src
Disable xref override.
#+begin_src emacs-lisp :results none :tangle no :lexical t :eval never-export
(funcall
(let* ((androidp (file-exists-p "/data/data/org.gnu.emacs/files"))
(org-root (if androidp
"/data/data/org.gnu.emacs/files/org-mode"
"/tmp/org-mode")))
(lambda ()
(remove-hook 'xref-after-jump-hook
(lambda ()
(if (string-match-p org-root (buffer-file-name))
(xref-etags-mode 1)))))))
#+end_src
** Validation
Run the following block to check if all of the original source is
represented in this document.
#+begin_src emacs-lisp :results none :tangle no :eval never-export
;; tangle all emacs-lisp source blocks
(org-babel-tangle nil (concat temporary-file-directory "ob-shell-review.el") "emacs-lisp")
;; compare tangled sources with original (ignoring whitespace)
(let ((tangled-hash
(with-temp-buffer
(insert-file (concat temporary-file-directory "ob-shell-review.el"))
(secure-hash
'sha256
(string-trim (buffer-substring-no-properties (buffer-end 0) (buffer-end 1))))))
(source-hash
(with-temp-buffer
;; (insert-file "/data/data/org.gnu.emacs/files/org-mode/lisp/ob-shell.el")
(insert-file "/tmp/org-mode/lisp/ob-shell.el")
(whitespace-cleanup)
(secure-hash
'sha256
(string-trim (buffer-substring-no-properties (buffer-end 0) (buffer-end 1)))))))
(if (string= source-hash tangled-hash)
(message "Annotated and original MATCH")
(message "Annotated and original DO NOT MATCH")))
#+end_src
Run the following to manually compare the original source and the
source from this document.
#+begin_src emacs-lisp :results none :tangle no :eval never-export
;; Manual comparison
;; (ediff "/tmp/org-mode/lisp/ob-shell.el" "/tmp/ob-shell-review.el")
(ediff "/data/data/org.gnu.emacs/files/org-mode/lisp/ob-shell.el" (concat temporary-file-directory "ob-shell-review.el"))
#+end_src
The original has mixed tabs and spaces. Run the following to manually
compare the original source with tabs converted to spaces and the
source from this document.
#+begin_src emacs-lisp :results none :tangle no :eval never-export
;; Manual comparison with tabs converted to spaces (checkout contains
;; mixed tabs and spaces)
(progn
(with-temp-buffer
(insert-file "/tmp/org-mode/lisp/ob-shell.el")
(whitespace-cleanup)
(write-file "/tmp/ob-shell-tabs-to-spaces.el")
(message "Wrote %s" "/tmp/ob-shell-tabs-to-spaces.el"))
(ediff "/tmp/ob-shell-tabs-to-spaces.el" "/tmp/ob-shell-review.el"))
#+end_src
The order of definitions as given in the checkout is not good for
comprehension. This document rearranges definitions in a logical
order. The validation block below places definitions in the same
order as the checkout.
#+name: validation
#+begin_src emacs-lisp :eval never :noweb yes :tangle "/tmp/ob-shell-review.el"
<<top-matter>>
<<requires>>
<<function-forward-declares>>
<<org-babel-default-header-args:shell>>
<<org-babel-shell-names-1>>
<<org-babel-shell-set-prompt-commands>>
<<org-babel-shell-initialize>>
<<org-babel-shell-names-2>>
<<org-babel-shell-results-defaults-to-output>>
<<org-babel-execute:shell>>
<<org-babel-prep-session:shell>>
<<org-babel-load-session:shell>>
;;; Helper functions
<<org-babel--variable-assignments:sh-generic>>
<<org-babel--variable-assignments:fish>>
<<org-babel--variable-assignments:bash_array>>
<<org-babel--variable-assignments:bash_assoc>>
<<org-babel--variable-assignments:bash>>
<<org-babel-variable-assignments:shell>>
<<org-babel-sh-var-to-sh>>
<<org-babel-sh-var-to-string>>
<<org-babel-sh-eoe-indicator>>
<<org-babel-sh-eoe-output>>
<<org-babel-sh-prompt>>
<<org-babel-sh-initiate-session>>
<<ob-shell-async-indicator>>
<<ob-shell-async-chunk-callback>>
<<org-babel-sh-evaluate>>
<<org-babel-sh-strip-weird-long-prompt>>
(provide 'ob-shell)
;;; ob-shell.el ends here
#+end_src
* org-babel
:PROPERTIES:
#+TOC: headlines 2 local
:END:
Org Babel started as a package called "[[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=7c51afb64215258cbc03f5e0a80a78b718385ae5][litorgy]]" around February
22, 2009. Support for shells, [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/diff/litorgy/litorgy-shell.el?id=e7701000029f9c71ddcec6fd469e483e99a542d4][litorgy-shell.el]], was added in commit
[[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=e7701000029f9c71ddcec6fd469e483e99a542d4][e7701000]] on March 31, 2009. The current file [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/ob-shell.el?id=583e7ab1757f54a656acb52ef60c6069d060cbe1][lisp/ob-shell.el]] was
created in commit [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=583e7ab1757f54a656acb52ef60c6069d060cbe1][583e7ab1]] on December 13, 2013.
An incomplete history of ob-shell is here:
| Date | Commit | Action |
|-------------------+----------+------------------------------------------------------|
| March 31, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=e7701000029f9c71ddcec6fd469e483e99a542d4][e7701000]] | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/litorgy/litorgy-shell.el?id=e7701000029f9c71ddcec6fd469e483e99a542d4][litorgy-shell.el]] created |
| May 24, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=2b9b2ee7a3d3e2c48e317fc1143a28c7271c9530][2b9b2ee7]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/org-babel/org-babel-shell.el?id=2b9b2ee7a3d3e2c48e317fc1143a28c7271c9530][org-babel/org-babel-shell.el]] |
| May 24, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=e82fb2566cb573f5e9e38d6e63aef535a1b936f5][e82fb256]] | (parent) |
| May 24, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=8e9d3a51b763d1ec6fc57ad6440963d8e89ccbb1][8e9d3a51]] | (parent) |
| May 24, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=9247200bb0ac8fb52770951a8ea66b8e16aa7daf][9247200b]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/org-babel-shell.el?id=9247200bb0ac8fb52770951a8ea66b8e16aa7daf][lisp/org-babel-shell.el]] |
| June 12, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=b64be813631b91d6d18b9b14542fd95f79ca6f8b][b64be813]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/langs/org-babel-shell.el?id=b64be813631b91d6d18b9b14542fd95f79ca6f8b][lisp/langs/org-babel-shell.el]] |
| June 14, 2009 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=4fd1abecc5e373d04e8069cd4c9f1567eb1e871a][4fd1abec]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/langs/org-babel-sh.el?id=4fd1abecc5e373d04e8069cd4c9f1567eb1e871a][lisp/langs/org-babel-sh.el]] |
| June 17, 2010 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=e0e4d76094f269f1786dc5339b939f7900c1a5e6][e0e4d760]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/babel/langs/ob-sh.el?id=e0e4d76094f269f1786dc5339b939f7900c1a5e6][lisp/babel/langs/ob-sh.el]] |
| June 17, 2010 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=04152d3a06bfe131b6e0c9cfcc12d8de5314f627][04152d3a]] | renamed to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/babel/langs/ob-sh.el?id=e0e4d76094f269f1786dc5339b939f7900c1a5e6][lisp/babel/langs/ob-sh.el]] (from contrib?) |
| December 13, 2013 | [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=583e7ab1757f54a656acb52ef60c6069d060cbe1][583e7ab1]] | renamed from lisp/ob-sh.el to [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/ob-shell.el?id=583e7ab1757f54a656acb52ef60c6069d060cbe1][lisp/ob-shell.el]] |
** ON-HOLD Difference between a "shell" and a "terminal"
On-hold until I can find the brain power to dig into kernel drivers
and terminal I/O more.
The following is Linux centric because that's what information is
available. Other resources included MINIX. All of these are
inspired by UNIX. Maybe it's more accurate to say it leans POSIX.
A "shell" is a command driven process launcher. It reads input,
evaluates it, prints results, and loops back to reading input. The
following demonstrates how a simple shell operates. It's easy to
imagine the example extended to receive input in batch. Such batch
input to a shell is a "script."
#+begin_src C :eval never
/*
* https://stevens.netmeister.org/631/
*/
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sysexits.h>
#include <unistd.h>
char *
getinput (char *buffer, size_t buflen)
{
printf ("$$ ");
return fgets (buffer, buflen, stdin);
}
int
main (int argc, char **argv)
{
char buf[BUFSIZ];
pid_t pid;
int status;
/* cast to void to silence compiler warnings */
(void) argc;
(void) argv;
while (getinput (buf, sizeof (buf)))
{
buf[strlen (buf) - 1] = '\0';
if ((pid = fork ()) == -1)
{
fprintf (stderr, "shell: can't fork: %s\n", strerror (errno));
continue;
}
else if (pid == 0)
{ /* child */
execlp (buf, buf, (char *) 0);
fprintf (stderr, "shell: couldn't exec %s: %s\n", buf,
strerror (errno));
exit (EX_UNAVAILABLE);
}
/* parent waits */
if ((pid = waitpid (pid, &status, 0)) < 0)
{
fprintf (stderr, "shell: waitpid error: %s\n", strerror (errno));
}
}
exit (EX_OK);
}
#+end_src
POSIX defines a shell as:
#+begin_quote
A program that interprets sequences of text input as
commands. It may operate on an input stream or it may interactively
prompt and read commands from a terminal.
-- https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_347
#+end_quote
A "terminal" is an interface to a shell. A shell simply "interprets
sequences of text" for the purpose of starting new processes. A
terminal manages the sequences of text sent and received from a
shell. The terms "console" and "command-line" are also used.
Historically, a terminal was a physical device. Early computers
repurposed Teletypes, electrical mechanical devices for sending and
receiving telegrams, for input and output. As computers developed,
so did terminals. Gradually, terminals went from recycled devices
resembling typewriters to simple computers with memory, screens, and
basic computational ability. Still, the primary purpose of a
terminal is preparing text for a shell and outputting text from the
shell to the user.
Terminals prepare text in one of two ways: character-wise or
line-wise. For example, if the user inputs "dsta", realizes their
mistake, presses backspace three times, and then inserts "ata"
before pressing enter, the shell receives 11 characters. If the
terminal instead handled this as a line, then only the 5 characters
are necessary. Which approach to use depends on the hardware and
requirements. Yet even within this simple example is hidden
complexity. What constitutes the end of a line? A carriage return?
A line feed? Both? If both, must the user type both? This and
many other tasks like echoes, signal handling, jobs control, and
special characters processing are managed by the terminal. Such
tasks constitute what's called "line discipline."
Terminals act as an interface in other ways. The POSIX standard has
this gem of a definition for "terminal,"
#+begin_quote
A character special file that obeys the specifications of the
general terminal interface.
-- https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_401
A "character special file" is a file that refers to a device (such
as a terminal device file) or that has special properties (such as
/dev/null).
-- https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html
#+end_quote
So, according to POSIX, a terminal is a file that refers to a device
such as a terminal. What that means is a terminal a file, as well
as a physical device. Unfortunately, several "slights of hand"
happen with the words "file", "device", and "terminal."
POSIX is a standardization of operating systems based on UNIX. If
modular design is "the UNIX philosophy", then UNIX's motto must be
"everything is a file." More accurately, "everything has a file
descriptor." This means that (most) objects in the system have a
numeric reference like 0, 1, or 2, which allow them to work with the
system calls "open", "close", "read", and "write" as a stream of
bytes, just like a file. Saying something is a "file" means it has
the same interface as a file regardless of whether it's an
electronic document, a keyboard, a mouse, or a terminal.
Any device we attach to our computer must communicate with the
system. Data must flow from the device into our system and the
system may send data back. Processes related to this I/O are often
called "device drivers." Typically, there is one process, or
driver, per device. UNIX-like operating systems drivers are
represented by special files. The fact they're not typical files
like a document makes them special; they're an interface. So,
"device", "driver", and "file" often refer to the same thing: the
interface the operating system presents for a connected device.
We've already seen that a file is another way to refer to a stream
of bytes. Different devices group data within a stream in different
ways. Some devices, such as a keyboard, operate one byte at a time.
Other devices, like hard disks, use blocks of data. Since one byte
corresponds to a character, such devices are called "character
devices." Connecting our terms, "a character special file" is
another way to say, "a driver associated with a device that sends
and receives one byte at a time." Generally, devices which send and
receive one byte at a time are called "serial" devices.
Physical terminals were some of the first devices ever connected to
a computer. Indeed, the famous picture of Ken Thompson and Dennis
Richie in front of a PDP-11 shows Ken at a Teletype Model 33 ASR
teleprinter (probably interacting with UNIX). It's no surprise then
that the device file used to represent a Teletype in UNIX is named
"tty," a common abbreviation for (T)ele(ty)pes. TTY files have come
to mean any character device.
In Linux, processes are grouped into sets of "sessions" in order to
facilitate process management. Every session is tied to a terminal,
called the "controlling terminal," from which processes in the
session get their input and to which they send their output. A TTY
device file corresponds to any device that could act as a
controlling terminal. These are either serial devices or virtual
consoles (full-screen terminal displays on the system video
monitor). There are various naming conventions for these
devices/files. However, they always correspond to a serial device
or virtual console.
Sometimes it's desirable to enforce a line discipline without a
physical device. For example, we may want to automate shell
interaction using a program or we may want to connect Emacs to a
shell. Linux provides a generalization of TTY drivers called
pseudoterminals, abbreviated by PTY.
The man page for "pty" says it best,
#+begin_quote
A pseudoterminal (sometimes abbreviated "pty") is a pair of virtual
character devices that provide a bidirectional communication
channel. One end of the channel is called the master; the other end
is called the slave.
The slave end of the pseudoterminal provides an interface that
behaves exactly like a classical terminal. A process that expects
to be connected to a terminal, can open the slave end of a
pseudoterminal and then be driven by a program that has opened the
master end. Anything that is written on the master end is provided
to the process on the slave end as though it was input typed on a
terminal. For example, writing the interrupt character (usually
control-C) to the master device would cause an interrupt signal
(SIGINT) to be generated for the foreground process group that is
connected to the slave. Conversely, anything that is written to the
slave end of the pseudoterminal can be read by the process that is
connected to the master end.
Data flow between master and slave is handled asynchronously, much
like data flow with a physical terminal. Data written to the slave
will be available at the master promptly, but may not be available
immediately. Similarly, there may be a small processing delay
between a write to the master, and the effect being visible at the
slave.
-- https://www.man7.org/linux/man-pages/man7/pty.7.html
#+end_quote
Regurgitating some information, on a PTY, the master end sends it's
input to the slave's output:
#+begin_example
PTY
+--------+-------+
input | | | output
-------->==============+----->
| | |
e.g. emacs | master | slave | e.g. shell
| | |
<-----+==============<--------
output | | | input
+--------+-------+
#+end_example
I don't actually know what this diagram means.
For C examples of working with a PTY, see:
- https://www.rkoucha.fr/tech_corner/pty_pdip.html
- [[https://archive.org/details/The_Linux_Programming_Interface/page/1374/mode/2up][The Linux Programming Interface, Kerrisk - Chapter 64]]
rkoucha's example is good yet it's interactive and not easily
embedded within an org document. I could not automate it and must
return to it.
PTYs are compared to bidirectional pipes. Here's an example of
implementing bidirectional pipes (AFAIU) between two processes.
#+begin_src C :results output :eval never-export
/* gcc -o pipe3-example pipe3-example.c && ./pipe3-example
This example creates a "bidirectional" pipe.
A pipe enables unidirectional data channel between processes.
Using two pipes allows two processes to each send and receive data.
The pipe() Linux system command (man 2 pipe) says,
"The array pipefd is used to return two file descriptors referring
to the ends of the pipe. pipefd[0] refers to the read end of the
pipe. pipefd[1] refers to the write end of the pipe. Data written
to the write end of the pipe is buffered by the kernel until it is
read from the read end of the pipe."
fd[0] fd[1]
read --------> write
in out
Reading happens been writing. Something is put in the pipe before
it is taken out.
So, given two pipes, A and B, and two processes (c)hild and
(p)arent,
A
cread a[0]<------->a[1] pwrite
pread b[0]<------->b[1] cwrite
B
The child reads from a[0] after the parent writes to a[1].
The parent reads from b[0] after the child writes to b[1].
,*/
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
int
main ()
{
int pipe_A[2];
int pipe_B[2];
ssize_t size_parent_read;
ssize_t size_child_read;
char parent_buffer[100];
memset (parent_buffer, '\0', 100);
char child_buffer[100];
memset (child_buffer, '\0', 100);
if (pipe (pipe_A) == 0 && pipe (pipe_B) == 0)
{
pid_t pid = fork ();
if (pid == (pid_t) - 1)
{
fprintf (stderr, "Fork failed");
exit (EXIT_FAILURE);
}
else if (pid > (pid_t) 0)
{ // Parent
printf ("Parent writing...\n");
write (pipe_A[1], "hello", sizeof("hello"));
size_parent_read = read (pipe_B[0], parent_buffer, 99);
printf ("Parent received: '%s'\n", parent_buffer);
}
else if (pid == (pid_t) 0)
{ // Child
size_child_read = read (pipe_A[0], child_buffer, 99);
printf ("Child received: '%s'\n", child_buffer);
printf ("Child writing...\n");
write (pipe_B[1], "world", sizeof("world"));
}
}
exit (EXIT_SUCCESS);
}
#+end_src
#+RESULTS:
: Child received: 'hello'
: Child writing...
: Parent writing...
: Parent received: 'world'
- https://unix.stackexchange.com/a/4132
- https://tldp.org/LDP/lpg/node10.html#SECTION00721000000000000000
- https://www.youtube.com/watch?v=07Q9oqNLXB4&pp=ygUZYnJpYW4gd2lsbCB0ZXJtaW5hbHMgdW5peA%3D%3D
- Operating Systems Design and Implementation, Tanenbaum 1997
- http://www.haifux.org/lectures/86-sil/kernel-modules-drivers/node10.html
- https://web.archive.org/web/20231230182026/https://raw.githubusercontent.com/torvalds/linux/master/Documentation/admin-guide/devices.rst
- https://web.archive.org/web/20231230204511/https://raw.githubusercontent.com/torvalds/linux/master/Documentation/admin-guide/devices.txt
- https://web.archive.org/web/20231208020059/https://www.informit.com/articles/article.aspx?p=397655&seqNum=6
- https://www.rkoucha.fr/tech_corner/pty_pdip.html
** How blocks execute to obtain results
Shell blocks have two basic execution models depending on whether it's
a session or not (that is, a persistent environment or a temporary
environment). In each case, a subprocess call is made to the
corresponding shell program.
For non-sessions, the final lisp call is to =call-process= (via
=process-file=). This runs the shell in a synchronous process. Stdin
and stderr are output to a buffer when the process completes. The
output is scraped, cleaned, and returned for the block results.
For sessions, the final lisp call is to =process-send-string= (via
=comint-send-string=). When the first session block is run, a process
buffer is created to collect output from a synchronous process (see
[[https://www.gnu.org/software/emacs/manual/html_node/elisp/Output-from-Processes.html][ Output from Processes]]). Block source is sent to the buffer process
along with command to echo delimiters. Output is filtered until the
ending delimiter appears. When the ending delimiter is found, output
is scraped, cleaned, and returned for the block results.
*NOTE* Sessions are currently entangled with async! See [[* "session" and "non-session" vs. "synchronous" and "asynchronous"]["session" and
"non-session" vs. "synchronous" and "asynchronous"]].
| header | final lisp call |
|---------+---------------------|
| shebang | call-process |
| cmdline | call-process |
| stdin | call-process |
| session | process-send-string |
| async | process-send-string |
| default | call-process |
See
https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00242.html
** "session" and "non-session" vs. "synchronous" and "asynchronous"
"Session" means a shell environment is "persistent." Each call is
executed in the same environment. State exists between calls. In the
early-history of Babel, this was called the "imperative" style.
"Non-session" means a shell environment is "temporary." Each call is
executed in an independent environment. State does not exist between
calls. In the early-history of Babel, this was called the
"functional" style.
"synchronous" means that execution prevents the user from editing
the document while results are obtained.
"asynchronous" means that execution does not prevent the user from
editing the document while results are obtained.
These concepts are conflated (or entangled) in org-babel.
+ Why can't non-session blocks run asynchronously?
+ Why not make asynchronous the default?
** Framework versus implementation
Currently, ob-shell and org-babel provide a set of functions and
variables that implement subprocess calls. org-babel, along with
ob-template, imply a framework for creating new integrations with
different applications. However, in practice, are those functions
and variables set up to be composed? Not really.
Asynchronous execution was introduced by ob-python and adopted by
ob-shell. Why can't ob-C execute asynchronously? Because the
implementation is not set up as a framework.
The following hangs Emacs:
#+begin_src C :eval never
while (1) {}
#+end_src
** TODO How many different ways are there to execute a shell block?
Executing a shell block requires starting a [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Processes.html][process]].
Processes are synchronous or asynchronous.
Three primitives exist for making processes:
1. make-process (asynchronous)
2. call-process (synchronous)
3. call-process-region (synchronous)
Output from processes typically go to a buffer. This may be changed
and handled by a filter. =call-process= has an option to directly
send output to a file.
Subprocesses inherent the =default-directory= and the environment
from Emacs. The environment may be changed using
=process-environment=.
There are two types of asynchronous connections: "pty" and "pipe".
"pty" means "pseudoterminal", sometimes called a "pseudotty" (where
"tty" means "teletype"). This is a
pty connections allow for job control
(=C-c=, =C-z=, etc.).
Shell blocks have three forms:
1. a command pipeline
2. shell scripting
3. shebangs
Files paths have two forms:
1. absolute
2. relative ("expanded" (converted to absolute) using =exec-path=)
File paths are converted to absolute paths ("expanded") using
=exec-path=. =exec-path= is initialized using PATH and serves a
similar role within Emacs.
Files may be local or remote.
- By directly passing it to the shell
- Putting the source in a file and running
Processes have several return values:
- success or failure of the process
- stdout of the process
- stderr of the process
*** [0/2] Questions
- [ ] How might viewing ob-shell as a terminal change our approach to
its design?
According to [[* Difference between a "shell" and a "terminal"][definitions of "shell" and "terminal"]], ob-shell is a
terminal.
- [ ] Why does the pipe3-example returns results in a different order
than when run in xfce4-terminal?
* ob-shell
:PROPERTIES:
#+TOC: headlines 2 local
:END:
** [2/9] General obshell: Questions
- [ ] How is the word "shell" used within the =ob-shell= API?
"shell" is the Babel package name. Babel packages have the form
=ob-<language>=. However, "<language>" isn't always one-to-one.
For example, =ob-C= defines functionality for D and C++.
#+begin_example
#+begin_src shell
echo "hi"
#+end_src
#+end_example
It's confusing because it's also possible to call "sh". However,
"sh" is a specific shell application, the Bourne shell!
Unfortunately, "sh" is often linked to bash which confuses this
point.
The "shell" word within the =ob-shell= source code is used as a
common form in which specific shell APIs are based. For example,
=org-babel-shell-initialize= creates functions of the form
"org-babel-execute:template" where "template" is a specific shell.
The specific execute functions call the generic
=org-babel-execute:shell= function.
- [ ] Should =ob-shell= and =ob-eshell= be separate packages?
- [ ] What is the execution path for inline versus non-inline blocks?
Looking at =org-babel-default-header-args= (which is apparently only
for inline blocks) and =org-babel-default-header-args:template=, the
two seem separate. Should they be separate? Could they be
combined? How does this affect things like
=org-babel-shell-results-defaults-to-output=?
- [ ] What is "posh"?
I think it's supposed to mean "Powershell". I don't use it but I've
*never* heard it referred to that way, nor does MicroSoft seem to
use it. If it's something that's in the code base (and therefore
something we "support"), then we should use the correct name for it
(or get rid of it).
NNNNNNOOOOOO....according to the Worg page, it stands for
"Policy-compliant Ordinary SHell".
https://packages.qa.debian.org/p/posh.html
However, =org-babel-shell-set-prompt-commands= has a comment saying
it's Powershell. So which is it?
- [ ] Would it make sense for sessions to have the process buffer
exist without a prompt? Would this fix problems we see of prompts
being returned in the results?
Such a change would likely require updating
=org-babel-comint-with-output=.
- [X] What is meant by "support"?
Not all functionality is consistent nor available for some shells.
This is not necessarily because of technical limitations of the
shell. The biggest culprit is probably Powershell. Obviously, the
software comes with no warranty or guarantee.
However, I think it would be nice to provide as consistent behavior
as possible. So, I would say "support" means "we'll try to make
things consistent but no guarantee :)"
- [ ] Is there a better way to handle the "variable quoting"
1. Convert to string
2. Quote
3. Use quoted version to define a shell variable
This seems reasonable.
- [X] How do (or even do) :hlines and :sep or :separator work with ob-shell?
These are strewn throughout the code base. However, none of them
are documented, nor is their use clear. They should be fully
implemented and documented or removed.
There is *no* mention of the :separator keyword anywhere in the
manual. There is mention of a :sep keyword in the documentation for
Texinfo export. The string ":sep" appears twice in the ob-shell.el
source:
#+begin_example
2 matches for ":sep" in buffer: ob-shell.el
211: (let ((sep (cdr (assq :separator params)))
241: (orgtbl-to-generic var (list :sep (or sep "\t") :fmt echo-var
#+end_example
These appear to perform the same role: specify how to separate items
in a table. However, the syntax is different and (as we shall see)
the functionality not quite implemented.
:hlines twice in in the manual. First, with respect to columnview
blocks and again with Python blocks. columnview blocks are not
source blocks, so while the concept is probably similar, it's not
clear how it translates (if at all) to source blocks. Regarding,
Python, the manual says,
#+begin_quote
In-between each table row or below the table headings, sometimes
results have horizontal lines, which are also known as "hlines".
The 'hlines' argument with the default 'no' value strips such lines
from the input table. For most code, this is desirable, or else
those 'hline' symbols raise unbound variable errors. A 'yes'
accepts such lines, as demonstrated in the following example.
#+end_quote
Adapting the (only) example from the info manual, I can sort of get
an answer:
#+begin_example
#+NAME: many-cols
| a | b | c |
|---+---+---|
| d | e | f |
|---+---+---|
| g | h | i |
#+BEGIN_SRC sh :var tab=many-cols :hlines yes
echo $tab
#+END_SRC
#+RESULTS:
: a b c hline d e f hline g h i
#+BEGIN_SRC sh :var tab=many-cols :hlines yes
echo "$tab"
#+END_SRC
#+RESULTS:
| a | b | c |
| hline | | |
| d | e | f |
| hline | | |
| g | h | i |
#+end_example
Looking at the source code and playing around with it more, here is
an example using :separator and :hlines. When :hlines is "yes", it
requires the use of :hline-string to specify what the hline looks
like, otherwise it defaults to "hline" (as we saw above).
#+begin_example
#+NAME: many-cols
| a | b | c |
|---+---+---|
| d | e | f |
|---+---+---|
| g | h | i |
#+BEGIN_SRC sh :var tab=many-cols :separator -- :hline-string ++ :hlines yes
echo "$tab"
#+END_SRC
#+RESULTS:
| a--b--c |
| ++ |
| d--e--f |
| ++ |
| g--h--i |
#+end_example
If the :separator is a pipe, then the result is the usual table:
#+begin_example
#+NAME: many-cols
| a | b | c |
|---+---+---|
| d | e | f |
|---+---+---|
| g | h | i |
#+BEGIN_SRC sh :var tab=many-cols :separator | :hline-string ++ :hlines yes
echo "$tab"
#+END_SRC
#+RESULTS:
| a | b | c |
| ++ | | |
| d | e | f |
| ++ | | |
| g | h | i |
#+end_example
So, we see that the :hlines and :separator keywords produce some
result. However, result leaves a lot to desire and it appears like
the functionality is not fully implemented. The results are not
meaningful and :hlines requires an undocumented header argument
:hline-string.
- [ ] Do people use :separator outside of Texinfo exports?
** [1/8] General ob-shell: Refactor
- [ ] Make all symbols follow the convention of =org-babel-X= where X
is the specific language. "shell" is the generic name whereas
something like "bash" or "sh" refers to a specific shell
implementation.
There are three names used:
+ org-babel-shell
+ org-babel-sh
+ ob-shell
This module was originally called =ob-sh= and later changed to
=ob-shell=. When this happened, the meaning of the =sh= changed.
Before, it meant a generic shell. "shell" now means a generic
"shell" changing the meaning of "sh". "sh" refers now to a specific
shell (=/bin/sh=) since that's the call that's actually made when
executing. Are "sh" functions specific to the "sh" shell or are
they generic?
It looks like when I submitted my async changes, I kept the
=ob-shell= prefix. My preference is for =ob-shell= since to matches
the filename. However, it shouldn't be there if it's not consistent
with the rest of the file (or babel system). The problem with
introducing =ob-shell= as a prefix is that none of the other babel
files use that convention.
It looks like =ob-C= uses =org-babel-X= where X is the language.
This is the only other Babel module to support multiple languages in
a single library.
- [X] Add file local variable for nameless
-*- nameless-current-name: "org-babel-shell"; -*-
This should be handled by a .dirlocals. It's not proper to have
third party code referenced in the mainline repo. The .dirlocals
should also be local and not part of the repo.
- [ ] Fixed mixed tabs/spaces and remove trailing whitespace
Both issues make preparing and reviewing commits needlessly complex.
Executive decision: use spaces and remove trailing whitespace.
We're not doing embedded work, so there's no point to space saving
with tabs. It should be one or the other. Since the rest of the
code is indented by spaces and the gnu standard is two spaces, I
think we should enforce only spaces.
- [ ] Rearrange definitions. The order of definitions doesn't assist
the process of learning what this library does. Aside from the
entangled =org-babel-shell-initialize= / =org-babel-shell-names=,
order doesn't matter for execution. So, we should optimize the
order for understanding.
*Helper terms defined after usage* Organization of the =ob-shell=
hinders understanding. "Helper functions" are defined at the end of
the file. This makes reading the code difficult; each function is
defined in terms only introduced later on.
*Solution: Reorder function declarations*
1. org-babel-sh-eoe-output
2. org-babel-sh-eoe-indicator
3. org-babel-shell-names
4. org-babel-default-header-args:shell
5. org-babel-shell-results-defaults-to-output
6. org-babel-sh-initiate-session
7. org-babel-sh-var-to-string
8. org-babel-sh-var-to-sh
9. org-babel--variable-assignments:sh-generic
10. org-babel--variable-assignments:bash_array
11. org-babel--variable-assignments:bash_assoc
12. org-babel--variable-assignments:bash
13. org-babel--variable-assignments:fish
14. org-babel-variable-assignments:shell
15. org-babel-prep-session:shell
16. org-babel-load-session:shell
17. org-babel-sh-strip-weird-long-prompt
18. org-babel-sh-evaluate
19. org-babel-execute:shell
20. org-babel-shell-initialize
- [ ] Remove needless helper functions
The way the code is split up doesn't seem to really warrant being
split up. The subdivisions are too small. Most of the helper
functions are only used once. If that's the case, is it really
helpful to break them out separately? My opinion is no. That might
be the case if the function names and docstrings helped
understanding the use and context. In my opinion, they don't. If
they were instead not broken out and appeared in the context they
are used, the context would be clear. The role of the function names
could be satisfied by comments. This may be worth refactoring,
considering it took me several hours to make sense of things.
- [ ] Implement a logical separation of concerns
Aside from the useless helper functions, it's not clear why
functions like =org-babel-execute:shell= and =org-babel-sh-evaluate=
do exactly what they do. There's initiating the process, getting
the inputs, dispatching, and then providing output. All of this
appears to split, ad hoc, between (at least) these two functions.
Review and consider a different break down of functionality. See
the individual functions for details on how this might be done.
- [ ] Fix confusion about what "posh" is
- [ ] Remove :sep, :separator, and :hlines code
First, they are not documented. I doubt they are, or ever have
been, used.
Second, they do not appear to be implemented correctly. Specifying
:hlines along with :hline-string produces a result that doesn't
appear useful.
One option would be to implement these. Doing this would require 1)
understanding the tooling that currently exists and 2) integrating
it into the shell context.
Another option is to remove it until someone requests it.
Given all the problems that exist currently with shells and the
refactoring that needs to happen to make a consistent framework, I
would prefer to simply remove this code.
* ob-shell code analysis
:PROPERTIES:
#+TOC: headlines 2 local
:END:
** DONE Top matter
No commentary, just a standard Emacs header. Included for validation.
#+name: top-matter
#+begin_src emacs-lisp :eval never :tangle no
;;; ob-shell.el --- Babel Functions for Shell Evaluation -*- lexical-binding: t; -*-
;; Copyright (C) 2009-2023 Free Software Foundation, Inc.
;; Author: Eric Schulte
;; Maintainer: Matthew Trzcinski <matt@excalamus.com>
;; Keywords: literate programming, reproducible research
;; URL: https://orgmode.org
;; This file is part of GNU Emacs.
;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
;;; Commentary:
;; Org-Babel support for evaluating shell source code.
;;; Code:
#+end_src
** DONE Requires
#+name: requires
#+begin_src emacs-lisp :eval never :tangle no
(require 'org-macs)
(org-assert-version)
(require 'ob)
(require 'org-macs)
(require 'shell)
(require 'cl-lib)
#+end_src
*** [2/2] Questions
- [X] What are each of the requires for and are they necessary?
It appears like =ob= and =shell= are the only requires
needed.
Notice that =ob-macs= is given twice. The first require for
=org-macs= is likely for =org-assert-version=. The second is just
extra as far as I can tell.
=ob= loads features from other Org Babel files. In fact, it loads
=org-macs= and calls =org-assert-version=. This makes =org-macs=
even more redundant.
=shell= is needed for the =shell= command used in
=org-babel-sh-initiate-session=.
=cl-lib= appears unused. It was introduced to =ob-shell.el= in 2016
with commit 0f7df327, replacing the deprecated =cl= package. =cl= was
fully removed from Emacs in version 27.
#+begin_src sh :var emacs_etc_dir=(identity data-directory) :eval never
grep -r "'cl' package is now officially deprecated" "$emacs_etc_dir" | cut -f 2 -d ":"
#+end_src
#+RESULTS:
: ** The 'cl' package is now officially deprecated in favor of 'cl-lib'.
Regarding backwards compatiblity, it appears that =cl= was never used
(except for, perhaps, a brief period (that I could not find)).
If =ob-shell.el= relied on =cl=, we would expect it was updated
appropriately that when =cl= was deprecated. Since Emacs no longer
contains the code for =cl=, if =ob-shell.el= required it, then it
should fail. Since it doesn't, it seems safe to assume that all =cl=
calls were updated to use =cl-lib=. The =cl-lib= uses a "cl-" prefix.
However, we don't see that anywhere in =ob-shell.el=. We only see the
require itself.
#+begin_src sh :results output :eval never
grep "cl-" /tmp/org-mode/lisp/ob-shell.el
#+end_src
#+RESULTS:
: (require 'cl-lib)
It appears that =cl-lib= is not needed.
Fun fact: the =cl-lib= has only ever been required (never used,
according to the prefix).
#+begin_src sh :results output :eval never
cd /tmp/org-mode/
git grep "cl-" $(git rev-list --all -- lisp/ob-shell.el) -- lisp/ob-shell.el
#+end_src
#+RESULTS:
#+begin_example
1a1f45d2368b97dc6ea206edcf4202a2d2f359cc:lisp/ob-shell.el:(require 'cl-lib)
c42cdcda4789122c3c8ed477365b9369bdf0af87:lisp/ob-shell.el:(require 'cl-lib)
6d85f851b3cf47abaf5197fe07bd793b5cf0d5dc:lisp/ob-shell.el:(require 'cl-lib)
f5467b53ec9be02ceaca8494e58090b3972fe2ac:lisp/ob-shell.el:(require 'cl-lib)
39de4a1848d12b1be929853bf884ec04e121d9f0:lisp/ob-shell.el:(require 'cl-lib)
f7aa8c19f5170dbf09538686fb569f9b60acbd6c:lisp/ob-shell.el:(require 'cl-lib)
ecb62e2e317b1a4b5b8a6c0f111ed7ef18413040:lisp/ob-shell.el:(require 'cl-lib)
80d1bc63fff8bb0f92ab4dab9c3b534ccb4d4d69:lisp/ob-shell.el:(require 'cl-lib)
93339de71b3e2aaa4f0cbaf13ed8bcbc3fa448f3:lisp/ob-shell.el:(require 'cl-lib)
96a402780c0fd06ca015b6a31a96909d3ab11d23:lisp/ob-shell.el:(require 'cl-lib)
e0815d75457e4a86f4940631729c98e318bc8231:lisp/ob-shell.el:(require 'cl-lib)
e8ceb4a2cb3e4901320bc0d577f6bf6fab237879:lisp/ob-shell.el:(require 'cl-lib)
801c93638ab4b6616567035c6438ad229bc14c1d:lisp/ob-shell.el:(require 'cl-lib)
8151d52574f525fd922aadc43544688004cb7d14:lisp/ob-shell.el:(require 'cl-lib)
e81a094383a2e06a80e6dfb5ed0ca1ece44026f2:lisp/ob-shell.el:(require 'cl-lib)
e36c3cc21b8b1471e1f7928a118de693819c3f12:lisp/ob-shell.el:(require 'cl-lib)
a35d163685908386833a3d549ed110931bf3915a:lisp/ob-shell.el:(require 'cl-lib)
13d97ee18c3bd23ccd04b21e0e1cd78070874cdb:lisp/ob-shell.el:(require 'cl-lib)
f9ea6c61ed716184a17a04399f408c9b922dbd53:lisp/ob-shell.el:(require 'cl-lib)
5a229cbc44e67ac323b7c5fd46a422089838a6f1:lisp/ob-shell.el:(require 'cl-lib)
99eafe3787e03ac31ab12b3cc28f7832ef8b0987:lisp/ob-shell.el:(require 'cl-lib)
07c6b11258e4ce83b78a023e63412183cf9c4c9f:lisp/ob-shell.el:(require 'cl-lib)
612f4db0907e6eb0e332c83704d19eb78943123e:lisp/ob-shell.el:(require 'cl-lib)
2f53429413983106aec252e18a8b018649e6b1e0:lisp/ob-shell.el:(require 'cl-lib)
08428fed7813f25afa9c2df9a0553eb95e264bf0:lisp/ob-shell.el:(require 'cl-lib)
0c02928eb25c3ec867e01adb05b481cddb3992a6:lisp/ob-shell.el:(require 'cl-lib)
4533d783c23f89232f7078e6c761541180a70126:lisp/ob-shell.el:(require 'cl-lib)
6a1f6ee1f80cd9b4fe7d813105577a12fac06c38:lisp/ob-shell.el:(require 'cl-lib)
ff5fc050d36db9d0cfad208e7626320278d26c0a:lisp/ob-shell.el:(require 'cl-lib)
9c611fd8ac3483576a5c5871f05aecad1afafa5c:lisp/ob-shell.el:(require 'cl-lib)
f584d37a67c7e199957c040973dd85e9606e9469:lisp/ob-shell.el:(require 'cl-lib)
dcf179663626872bae88ff36f58ad1c745f96b45:lisp/ob-shell.el:(require 'cl-lib)
4afb7f747b16ff32086bc3ac937380e774d6e54c:lisp/ob-shell.el:(require 'cl-lib)
bb035512464fcb7306d7360b3ac2ba2c2f2e8f23:lisp/ob-shell.el:(require 'cl-lib)
f57df8fc74df1b76aca35bcf0315636b4d3071f3:lisp/ob-shell.el:(require 'cl-lib)
b289a65be71cc5ec8df393928f84df90787ced55:lisp/ob-shell.el:(require 'cl-lib)
3e1641ef0aa01ae39f90a3cb532136484de617bb:lisp/ob-shell.el:(require 'cl-lib)
ff0dcf52a5e258af82f3eaf1f8ec3b7cd022cb6b:lisp/ob-shell.el:(require 'cl-lib)
b8df40eccc6ce19455f2bdce50f6d6f3f913d544:lisp/ob-shell.el:(require 'cl-lib)
713f785017e908333caddd244fcc685745e78539:lisp/ob-shell.el:(require 'cl-lib)
93f90c8412d61930aa367d8ca61e10ff44f4090a:lisp/ob-shell.el:(require 'cl-lib)
140aacbf2f57e207a33417bb446060de52a4b312:lisp/ob-shell.el:(require 'cl-lib)
79650ffbbdac17e9dc016571d3a25f7f32737fd6:lisp/ob-shell.el:(require 'cl-lib)
6b52bc6a2153a8c60d23d0915246a60d3ee37a52:lisp/ob-shell.el:(require 'cl-lib)
0dc3811a7a3f441564db35ec6b068751222c6544:lisp/ob-shell.el:(require 'cl-lib)
250304bd2eb2449bb1fccd80b8efb6e25c6aa901:lisp/ob-shell.el:(require 'cl-lib)
0f7df32711170906a47594cb2a397c6e5d9c46b7:lisp/ob-shell.el:(require 'cl-lib)
#+end_example
- [X] Should comments tell what the requires are for?
I feel like yes for the =org-assert-version= in =ob=, but no for
=shell=. Maybe just no since =ob= is used for things other than
=org-assert-version=.
*** [0/2] Refactoring
- [ ] Remove all requires except =ob= and =shell=.
- [ ] Add comment that =ob= should be listed first in order to check
version
** DONE Function forward declares
#+name: function-forward-declares
#+begin_src emacs-lisp :eval never :tangle no
(declare-function org-babel-comint-in-buffer "ob-comint" (buffer &rest body)
t)
(declare-function org-babel-comint-wait-for-output "ob-comint" (buffer))
(declare-function org-babel-comint-buffer-livep "ob-comint" (buffer))
(declare-function org-babel-comint-with-output "ob-comint" (meta &rest body)
t)
(declare-function orgtbl-to-generic "org-table" (table params))
#+end_src
These declarations exist to satisfy the byte compiler (or to help it
find the definitions of functions defined in other files).
*** [0/0] Questions
*** [0/0] Refactor
** DONE defvar org-babel-sh-eoe-output
#+name: org-babel-sh-eoe-output
#+begin_src emacs-lisp :eval never :tangle no
(defvar org-babel-sh-eoe-output "org_babel_sh_eoe"
"String to indicate that evaluation has completed.")
#+end_src
Token used by =org-babel-comint-with-output= (within
=org-babel-sh-evaluate=) to indicate that execution has completed.
Used only by sessions. See [[* How blocks execute to obtain results][How blocks execute to obtain results]].
*** [1/3] Questions
- [ ] Should this be renamed to indicate that it's only used by
session? Maybe update the docstring at least?
- [X] What is "eoe"?
Probably "end of execution" or "end of evaluation".
- [ ] Why is it called "output"?
Maybe because this string is what's sent to the shell process
(output)? However, that's not entirely true, as
=org-babel-sh-eoe-indicator= is also sent. Maybe because it's what
=org-babel-comint-with-output= searches for in the output?
*** [0/1] Refactor
- [ ] Rename this
It needs to have "sh" removed. EOE should be spelled out or a
different term used. "output" should be made more clear.
** DONE defvar org-babel-sh-eoe-indicator
#+name: org-babel-sh-eoe-indicator
#+begin_src emacs-lisp :eval never :tangle no
(defvar org-babel-sh-eoe-indicator "echo 'org_babel_sh_eoe'"
"String to indicate that evaluation has completed.")
#+end_src
Shell command used by =org-babel-comint-with-output= within
=org-babel-sh-evaluate=. It echoes =org-babel-sh-eoe-output= within
the comint, that being the token used to indicate that execution has
completed. See [[* How blocks execute to obtain results][How blocks execute to obtain results]]. It is used only
with sessions.
#+begin_example
,#+begin_src sh :session *example*
echo "hello, world"
,#+end_src
sh-5.1$ PROMPT_COMMAND=;PS1="org_babel_sh_prompt> ";PS2=
org_babel_sh_prompt> echo "hello, world" <------ This is the org-babel-sh-eoe-indicator
echo 'org_babel_sh_eoe'
hello, world
org_babel_sh_prompt> org_babel_sh_eoe
org_babel_sh_prompt>
#+end_example
*** [2/3] Questions
- [ ] What would it look like to explicitly define
=org-babel-sh-eoe-indicator= in terms of =org-babel-sh-eoe-output=?
If it's just (format "echo '%s'" org-babel-sh-eoe-output), then that
avoids the (minor) problem of a typo creating a problem.
- [X] Is this string the same for all the supported shells?
No. Posh (powershell) uses something like "Write-Host". See
https://stackoverflow.com/a/707666
- [X] Does it matter that posh uses a different command to print to
the prompt?
Yes, it does if we want to support powershell. Personally, I'd
rather support cmd.exe than powershell.
*** [0/2] Refactor
- [ ] Rename this
Remove "sh" and "eoe"
- [ ] Fix docstring
This is *not* the "string to indicate that evaluation has
completed". It is the command used to call the string used to
indicate that evaluation has completed.
** DONE defvar org-babel-sh-prompt
#+name: org-babel-sh-prompt
#+begin_src emacs-lisp :eval never :tangle no
(defvar org-babel-sh-prompt "org_babel_sh_prompt> "
"String to set prompt in session shell.")
#+end_src
Prompt for shell sessions. This is set during
=org-babel-sh-initiate-session= using the
=org-babel-shell-set-prompt-commands= corresponding to the block
language.
In practice, the prompt is given by the =comint-prompt-regexp= which
looks like:
#+begin_example
(concat "^" (regexp-quote org-babel-sh-prompt) " *")
(concat "^" (regexp-quote "org_babel_sh_prompt> ") " *")
"^org_babel_sh_prompt> *"
#+end_example
=comint-prompt-regexp= is required by =org-babel-comint-with-output=.
*** [3/4] Questions
- [X] Why do we set the prompt?
Because of how =org-babel-comint-with-output= uses
=comint-prompt-regexp=.
- [X] Many issues relate back to prompt filtering. Would it benefit
us to make the prompt empty?
Possibly. However, the implementation of
=org-babel-comint-with-output= limits us. It searches on the
=comint-prompt-regexp=. If that's empty, then it will probably
search more than we want to (not per line or per call).
- [X] Does =org-babel-comint-with-output= use a filter? If not, why
not?
=org-babel-comint-with-output= is a deep rabbit hole. It was
created in 2009 by Eric Schulte. It appears that originally plain
filters were used. The =org-babel-comint-with-output= macro was
introduced to manage the filters. The macro was simple then. It is
no longer simple (although Ihor has recently simplified it (thank
you!)). This will require more research.
- [ ] How does =org-babel-comint-with-output= work?
Why is it a macro? Does it need to be a macro?
*** [0/1] Refactor
- [ ] Indicate that this is required by =org-babel-comint-with-output=
Or better yet, change how =org-babel-comint-with-output= functions
so that this is not necessary.
** DONE defvar ob-shell-async-indicator
#+name: ob-shell-async-indicator
#+begin_src emacs-lisp :eval never :tangle no
(defconst ob-shell-async-indicator "echo 'ob_comint_async_shell_%s_%s'"
"Session output delimiter template.
See `org-babel-comint-async-indicator'.")
#+end_src
Similar to =org-babel-sh-eoe-indicator=.
Used by =org-babel-comint-async-delete-dangling-and-eval= within
=org-babel-sh-evaluate=.
Template for a shell command. Used to construct strings that delimit
results within the comint process. Used only with async sessions.
See [[* How blocks execute to obtain results][How blocks execute to obtain results]].
=org-babel-sh-evaluate= formats two indicators, a "start" and an
"end", along with a [[https://en.wikipedia.org/wiki/Universally_unique_identifier][Universally Unique IDentifier (UUID)]]. The UUID is
a placeholder used until results are available. When the process
returns, the UUID is replaced with the results.
*** [0/0] Questions
*** [0/1] Refactor
- [ ] Change the "namespace" from "ob-shell-"
** DONE defcustom org-babel-shell-names
#+name: org-babel-shell-names-2
#+begin_src emacs-lisp :eval never :tangle no
(defcustom org-babel-shell-names
'("sh" "bash" "zsh" "fish" "csh" "ash" "dash" "ksh" "mksh" "posh")
"List of names of shell supported by babel shell code blocks.
Call `org-babel-shell-initialize' when modifying this variable
outside the Customize interface."
:group 'org-babel
:type '(repeat (string :tag "Shell name: "))
:set (lambda (symbol value)
(set-default-toplevel-value symbol value)
(org-babel-shell-initialize)))
#+end_src
Used to define shell languages supported by =ob-shell=. Each item in
the list corresponds to the shell binary.
*** [0/1] Questions
- [ ] What would it take to add Windows cmd to this list?
Probably being maintainer or contributor to cmdproxy.c :)
*** [1/1] Refactor
- [X] See Refactor for =org-babel-shell-initialize=.
As a long term personal goal, I'd like to support Windows cmd. This
would be helpful for me at work. It would also be a good pathway for
someone to free software.
** DONE Variable declarations
The following three forms automate variable definitions for the
various supported shells.
#+name: org-babel-default-header-args:shell
#+begin_src emacs-lisp :eval never :tangle no
(defvar org-babel-default-header-args:shell '())
#+end_src
This comes from [[https://git.sr.ht/~bzg/worg/tree/master/item/org-contrib/babel/ob-template.el#L83][=ob-template=]]. The only documentation for it says,
"optionally declare default header arguments for this language". The
variable is not used anywhere in the ob-shell code base. Maybe it's
intended for end-users rather than developers? Searching for
"org-babel-default-header-args" within the Org source, I see that
other languages use it in their code base. This seems like something
that should stay and something that should be documented.
Later, =org-babel-shell-initialize= will define a similar variable for
each of the other supported shells.
#+name: org-babel-shell-names-1
#+begin_src emacs-lisp :eval never :tangle no
(defvar org-babel-shell-names)
#+end_src
This is a variable to hold the list of shell names supported by
=ob-shell=. =org-babel-shell-names= is actually defined later on.
This definition acts as a forward declaration so that
=org-babel-initialize= works.
#+name: org-babel-shell-set-prompt-commands
#+begin_src emacs-lisp :eval never :tangle no
(defconst org-babel-shell-set-prompt-commands
'(;; Fish has no PS2 equivalent.
("fish" . "function fish_prompt\n\techo \"%s\"\nend")
;; prompt2 is like PS2 in POSIX shells.
("csh" . "set prompt=\"%s\"\nset prompt2=\"\"")
;; PowerShell, similar to fish, does not have PS2 equivalent.
("posh" . "function prompt { \"%s\" }")
;; PROMPT_COMMAND can override PS1 settings. Disable it.
;; Disable PS2 to avoid garbage in multi-line inputs.
(t . "PROMPT_COMMAND=;PS1=\"%s\";PS2="))
"Alist assigning shells with their prompt setting command.
Each element of the alist associates a shell type from
`org-babel-shell-names' with a template used to create a command to
change the default prompt. The template is an argument to `format'
that will be called with a single additional argument: prompt string.
The fallback association template is defined in (t . \"template\")
alist element.")
#+end_src
Introduced as a result of the issue [[https://list.orgmode.org/CKK9TULBP2BG.2UITT31YJV03J@laptop/T/#mc8e3ca2f5f1b9a94040a68b4c6201234b209041c][babel output seems to drop
anything before % (in session)]].
This variable is only used with blocks containing the :session header.
It is used in =org-babel-sh-initiate-session= with
=org-babel-sh-prompt= to reset the prompt. Each command in
=org-babel-shell-set-prompt-commands= removes the prompt and replaces
it with whatever is in =org-babel-sh-prompt=, for the first two levels
of prompts (if the shell supports multiple shell level prompts).
Running the commands alone looks like this:
#+begin_example
ahab@pequod ~ $ guix shell fish
ahab@pequod ~ [env]$ fish
ahab@pequod ~> function fish_prompt\n\techo \"%s\"\nend
ahab@pequod ~> function fish_prompt
echo "%s"
end
%secho "hi"
hi
%s
#+end_example
#+begin_example
ahab@pequod ~ $ guix shell tsch
ahab@pequod ~ [env]$ tcsh
> set prompt="%s"
set prompt2=""
exit
#+end_example
It replaces the prompt with "%s". The "%s" is a format expression.
When =org-babel-shell-set-prompt-commands= is run by
=org-babel-sh-initiate-session=, the command for corresponding shell
gets fed into the lisp =format= command. For example, the default
(which corresponds to bash) is
#+begin_example
"PROMPT_COMMAND=;PS1=\"%s\";PS2="
#+end_example
this is used by format:
#+begin_example
(format "PROMPT_COMMAND=;PS1=\"%s\";PS2=" org-babel-sh-prompt)
#+end_example
this results in the following being called in a shell:
#+begin_example
"PROMPT_COMMAND=;PS1=\"org_babel_sh_prompt> \";PS2="
#+end_example
*** [2/2] Questions
- [X] What is the required form for =org-babel-default-header-args:shell=?
See =org-babel-default-header-args=:
#+begin_quote
Default arguments to use when evaluating a source block.
This is a list in which each element is an alist. Each key
corresponds to a header argument, and each value to that header's
value. The value can either be a string or a closure that evaluates
to a string.
#+end_quote
- [X] Why is =org-babel-shell-names= defined with defvar only to be
later defined with defcustom?
It looks to me like this is the reason: =org-babel-shell-initialize=
(explained below) requires =org-babel-shell-names=. However,
=org-babel-shell-initialize= isn't called until
=org-babel-shell-names= is actually defined using defcustom. When
that happens, defcustom calls =org-babel-shell-initialize=
(according to its :set feature).
*** [0/4] Refactor
- [ ] Document =org-babel-default-header-args:shell= and the other
=org-babel-default-header-args:template= variables. Maybe say that
it is different for inline source blocks, too (see
=org-babel-default-header-args=).
- [ ] Find a better implementation than the =org-babel-shell-names=
defined twice, along with =org-babel-shell-initialize=.
This looks like a problem to me. First, it's not documented. It
should be documented. My guess as why it's not documented is
because, even though defvar supports docstrings, the later call to
defcustom clobbers it. At the very least, there should be a lisp
comment explaining what's going on.
Since this is all tied up with =org-babel-shell-initialize=, see the
comments there.
- [ ] Better document =org-babel-shell-set-prompt-commands=. The way
it's written doesn't explain why it exists or what it's for. It
explains a little about how it's used (the implementation (which may
change)). The comments within it are helpful, but only if you
understand what the list is for. Otherwise, it seems really to deal
with setting PS2.
- [ ] Make =org-babel-shell-set-prompt-commands= private. This is not
something end-users should modify.
** DONE defcustom org-babel-shell-results-defaults-to-output
#+name: org-babel-shell-results-defaults-to-output
#+begin_src emacs-lisp :eval never :tangle no
(defcustom org-babel-shell-results-defaults-to-output t
"Let shell execution defaults to \":results output\".
When set to t, use \":results output\" when no :results setting
is set. This is especially useful for inline source blocks.
When set to nil, stick to the convention of using :results value
as the default setting when no :results is set, the \"value\" of
a shell execution being its exit code."
:group 'org-babel
:type 'boolean
:package-version '(Org . "9.4"))
#+end_src
Used to define the results header default.
I'm confused by this because it feels like there are several places in
the codebase which define "defaults" rather than one. This concerns
me because it's not clear how they're incorporated into the broader
system. The concern is basically having a state machine spread across
the system and running into issues where one clobbers the other.
*** [0/2] Questions
- [ ] How does this differ from =org-babel-default-header-args:shell=?
The =org-babel-default-header-args:shell= isn't used directly
anywhere in =ob-shell=. =org-babel-default-header-args:<language>=
is used within =org-babel-get-src-block-info=.
- [ ] Is this necessary or is it duplicating
=org-babel-default-header-args:shell=?
*** [0/1] Refactor
- [ ] Fix grammar of docstring.
#+begin_example
"Make shell execution default to \":results output\""
#+end_example
** DONE defun org-babel-sh-initiate-session
#+name: org-babel-sh-initiate-session
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-sh-initiate-session (&optional session _params)
"Initiate a session named SESSION according to PARAMS."
(when (and session (not (string= session "none")))
(save-window-excursion
(or (org-babel-comint-buffer-livep session)
(progn
(shell session)
;; Set unique prompt for easier analysis of the output.
(org-babel-comint-wait-for-output (current-buffer))
(org-babel-comint-input-command
(current-buffer)
(format
(or (cdr (assoc (file-name-nondirectory shell-file-name)
org-babel-shell-set-prompt-commands))
(alist-get t org-babel-shell-set-prompt-commands))
org-babel-sh-prompt))
(setq-local comint-prompt-regexp
(concat "^" (regexp-quote org-babel-sh-prompt)
" *"))
;; Needed for Emacs 23 since the marker is initially
;; undefined and the filter functions try to use it without
;; checking.
(set-marker comint-last-output-start (point))
(get-buffer (current-buffer)))))))
#+end_src
The main purpose of this function is to start a [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Process-Buffers.html][Process Buffer]] and set
it up so that we can scrape output from it.
When a initiating a new session, create a new process buffer with
=shell=. Loop until the prompt appears. When the prompt appears, the
shell is ready to receive input. Then, change the shell prompt and
update the =comint-prompt-regexp=. Return the buffer associated with
process.
Note that for Emacs 23 compatibility, we must manually set
=comint-last-output-start=.
*** [1/2] Questions
- [X] Why doesn't the calling function decide to execute this
function?
When a block is executed, =org-babel-execute-src-block= is called.
The language is parsed from the header line and the appropriate
dispatch function called (for example =org-babel-execute:bash=).
All dispatch functions call =org-babel-execute:shell=. The first
thing it does is check for whether to use a session. However, the
logic for this is strange: the check happens in the initiate
function =org-babel-sh-initiate-session=, not the calling function,
=org-babel-execute:shell=. Should it happen in the caller?
One way to consider which is the better way to call the function is
to consider the information used to make the decision. The caller
will have all the necessary information. For the callee to make the
decision, it will need that information. Therefore, having the
callee decide whether or not to run requires passing all the
necessary information to it so that it can make that decision. This
seems unnecessary: there is no obvious benefit to passing that
information around. Passing information requires memory, it
requires manually writing it in as an argument. Which information
is necessary? What if more information is needed yet isn't there?
When information is passed to the callee to make the decision to run
or not, it may lead to different data structures or in passing
around "the world". I think this is what we see with the _params
parameter (which is a "world" variable that isn't used.)
Fundamentally, having the function decide whether or not to run
makes no sense. If you're calling it, you've decided it should run.
Why have the function second-guess you? Doing this splits a single
logical choice into two lexical locations.
- [ ] Why is _params a parameter?
It's not used. Was it used in a previous version?
*** [0/2] Refactor
- [ ] Move the decision to run the function to the caller
- [ ] Remove the _params parameter
** DONE defun org-babel-sh-var-to-string
#+name: org-babel-sh-var-to-string
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-sh-var-to-string (var &optional sep hline)
"Convert an elisp value to a string."
(let ((echo-var (lambda (v) (if (stringp v) v (format "%S" v)))))
(cond
((and (listp var) (or (listp (car var)) (eq (car var) 'hline)))
(orgtbl-to-generic var (list :sep (or sep "\t") :fmt echo-var
:hline hline)))
((listp var)
(mapconcat echo-var var "\n"))
(t (funcall echo-var var)))))
#+end_src
Convert a "value" to a string. First define a helper function
=echo-var= to keep strings as strings and convert anything else using
the "%S" option of =format=. You can read the documentation for
those, but it basically boils down to return primitive types like
numbers as-is and evaluate symbols first.
=org-babel-sh-var-to-string= behaves like follows:
#+begin_example
,#+name: my-table
| col1 | col2 |
|-------+--------|
| test1 | test 2 |
,#+begin_src sh :stdin my-table :results output :eval never-export
cat
,#+end_src
,#+RESULTS:
: col1 col2
: test1 test 2
#+end_example
For a table, =var= comes in as:
#+begin_example
(("col1" "col2") hline ("test1" "test 2"))
#+end_example
This is the result of =org-babel-ref-resolve=. Unfortunately,
=org-babel-ref-resolve= doesn't document what that value will look
like. This is what a table apparently looks like. A raw Org table is
converted to a list of lists such that rows are nested lists or the
symbol 'hline.
The first condition of the cond checks for a table by way of "is it a
list and is the first element another list or the symbol 'hline?"
When true, the table contents are passed to =orgtbl-to-generic= which
converts it to a string of some sort according to some given
parameters. In our case, it's hard coded as tab delimited strings and
no hlines.
The second condition checks for a list:
#+begin_example
#+NAME: my-list
- simple
- list
- without
- nesting
#+begin_src sh :results output :stdin my-list :eval never-export
cat
#+end_src
#+RESULTS:
: simple
: list
: without
: nesting
#+end_example
In the case of a list, =var= comes through as a list of items:
#+begin_example
("simple" "list" "without" "nesting")
#+end_example
=org-babel-sh-var-to-string= concatenates each element of the list
together with a newline.
Finally, the third condition is a catch-all that simply does the
conversion:
#+begin_example
#+name: text
The third case
#+begin_src sh :results output :stdin text :eval never-export
cat
#+end_src
#+RESULTS:
: The third case
#+end_example
Used in =org-babel-execute:shell= for stdin (to allow named tables as
inputs). Also used in =org-babel-sh-var-to-sh=.
*** [0/1] Questions
- [ ] Should the default separator be a tab or four spaces?
*** [0/6] Refactor
- [ ] Remove extra space between 'var' and the list of parameters in
the call to =orgtbl-to-generic=.
- [ ] :hline parameter not used and defaults to nil. Kill from parameter list.
- [ ] Change "var" to "val" (or something better)
Function converts a "value" to a string, per the docstring. This is
true. It converts some lisp expression (which is a value). Why then
is it called 'var'? It appears to have nothing to do with a 'var'!
- [ ] Change 'echo-var' to something better
It doesn't echo. And what is var? Maybe call it 'convert-to-string'.
- [ ] Document the conditional cases
Say something about the expected form. Say that one is a table, the
other a list, the other a value. Say that it is expected to be used
with a name block.
- [ ] Document arguments
** DONE defun org-babel-sh-var-to-sh
#+name: org-babel-sh-var-to-sh
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-sh-var-to-sh (var &optional sep hline)
"Convert an elisp value to a shell variable.
Convert an elisp var into a string of shell commands specifying a
var of the same value."
(concat "'" (replace-regexp-in-string
"'" "'\"'\"'"
(org-babel-sh-var-to-string var sep hline))
"'"))
#+end_src
This function quotes its argument. The docstring inaccurately says,
"Convert an elisp value to a shell variable." This is not true. Yes,
it's used in a process of converting a lisp value to a shell variable.
However, that's not what this function does.
This function does two things: 1) each ' quote in the argument is
replaced by '"'"' and 2) the whole result is surrounded in single
quotes. Step 1 is done to prevent single-quotes contained in the
argument from breaking the outer quotes added by step 2. Step 2 is
done to ensure the quoted argument is interpreted as a literal.
This is easiest to see with a non-trivial example:
#+begin_example
(org-babel-sh-var-to-sh "it'd") -> 'it'\"'\"'d'
#+end_example
See how the single quote is replaced by '"'"'? In this case, the
double-quotes are escaped with backslashes. It's maybe easier to see
if we separate the quoted single-quote with spaces:
#+begin_example
'it '\"'\"' d'
#+end_example
All of this is made even more confusing because when these values are
viewed in Emacs, double-quotes are added around it (again to represent
literal).
Most likely this quoting is shell specific. I assume it only applies
to Bourne-like shells. In truth, I'm not sure.
This function is a helper used in the following:
- =org-babel--variable-assignments:sh-generic=
- =org-babel--variable-assignments:fish=
- =org-babel--variable-assignments:bash_array=
- =org-babel--variable-assignments:bash_assoc=
#+name:pointless-sep-hline
| this | is | a | test |
|------+----+-----+---------|
| of | a | ref | resolve |
#+begin_src emacs-lisp :tangle no :eval never
(org-babel-sh-var-to-sh (org-babel-ref-resolve "pointless-sep-hline") "--" "++")
#+end_src
#+RESULTS:
: 'this--is--a--test
: ++
: of--a--ref--resolve'
*** [2/3] Questions
- [ ] Is this quoting shell specific?
- [X] Does Emacs provide a function for this already?
It seems not.
- [X] How do :hlines and :separator work? Do they even work?
Yes, but not well. See [[General obshell: Questions][General obshell: Questions]].
*** [0/2] Refactor
- [ ] Rename
Quoting is fiddly and this exact process appears necessary for
several different shells. It makes sense to define it once to avoid
needing to make multiple updates or need to debug multiple
functions.
- [ ] Document arguments
** DONE defun org-babel--variable-assignments:sh-generic
#+name: org-babel--variable-assignments:sh-generic
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--variable-assignments:sh-generic
(varname values &optional sep hline)
"Return a list of statements declaring the values as a generic variable."
(format "%s=%s" varname (org-babel-sh-var-to-sh values sep hline)))
#+end_src
Create a string like "%s=%s" for use in creating variables. See [[https://www.gnu.org/software/bash/manual/html_node/Shell-Parameters.html][Shell
Parameters]].
*** [0/2] Questions
- [ ] What is meant by "generic variable"?
According to the docstring, "generic" doesn't appear to mean
"across" shells. Instead, it means something else. The word
"generic" only shows up twice in the Bash manual and only then in
the GNU Free Documentation License.
If this is intended for use across shells, does that work? Looking
at you, Powershell (if that's even supported).
- [ ] Should variables be first 'unset'?
'Unset' is called first for
=org-babel--variable-assignments:bash_assoc=. Why is it not called
here?
*** [0/2] Refactor
- [ ] Remove sep and hline arguments
See arguments elsewhere.
- [ ] Document arguments
** DONE defun org-babel--variable-assignments:bash_array
#+name: org-babel--variable-assignments:bash_array
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--variable-assignments:bash_array
(varname values &optional sep hline)
"Return a list of statements declaring the values as a bash array."
(format "unset %s\ndeclare -a %s=( %s )"
varname varname
(mapconcat
(lambda (value) (org-babel-sh-var-to-sh value sep hline))
values
" ")))
#+end_src
Create a string for defining Bash indexed arrays.
Indexed arrays are lists of values accessed by their position in the
list. See [[https://www.gnu.org/software/bash/manual/html_node/Arrays.html][Arrays]].
=unset= is a Bash builtin to destroy arrays. Use =declare -a myarray=
to create a new array. There are two ways to define elements of an
array. One of them is ~myarray[index]=value~. Another is
~NAME=(VALUE1 VALUE2 ... )~ where each ~VALUE~ may be of the form
~'[SUBSCRIPT]='STRING~.
Given a 'varname' for the array name and a list of pairs, construct an
the string:
#+begin_example
unset VARNAME
declare -a VARNAME=( VARNAME[key1]=value1 VARNAME[key2]=value2 ... VARNAME[keyN]=valueN )
#+end_example
This function is called when using the =:var= header argument.
#+begin_src bash :results output :var myfriends='("alice" "bob" "carol") :eval never-export
echo ${myfriends[2]} is a friend of mine
#+end_src
#+RESULTS:
: carol is a friend of mine
*** [0/1] Questions
- [ ] Does this work with named references, like a table?
*** [0/1] Refactor
- [ ] Document arguments
** DONE defun org-babel--variable-assignments:bash_assoc
#+name: org-babel--variable-assignments:bash_assoc
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--variable-assignments:bash_assoc
(varname values &optional sep hline)
"Return a list of statements declaring the values as bash associative array."
(format "unset %s\ndeclare -A %s\n%s"
varname varname
(mapconcat
(lambda (items)
(format "%s[%s]=%s"
varname
(org-babel-sh-var-to-sh (car items) sep hline)
(org-babel-sh-var-to-sh (cdr items) sep hline)))
values
"\n")))
#+end_src
Create a string for constructing an associative array in Bash syntax.
Bash supports "associative" arrays. Associative arrays are key-value
stores (sometimes called "dictionaries"). See [[https://www.gnu.org/software/bash/manual/html_node/Arrays.html][Arrays]].
=unset= is a Bash builtin to destroy arrays. Use =declare -A myarray=
to create a new array. There are two ways to define elements of an
array. One of them is ~myarray[key]=value~.
Given a 'varname' for the array name and a list of pairs, construct an
the string:
#+begin_example
unset VARNAME
declare -A VARNAME
VARNAME[key1]=value1
VARNAME[key2]=value2
...
VARNAME[keyN]=valueN
#+end_example
This function is called when using the =:var= header argument.
#+begin_src bash :results output :var myarray='(("apple" 3) ("banana" 4) ("cherry" 5)) :eval never-export
echo I have ${myarray[apple]} apples
#+end_src
#+RESULTS:
: I have 3 apples
*** [0/3] Questions
- [ ] What are downsides to off-loading this kind of functionality to
the user?
The problem: for every syntax, for every shell, we need to create a
command that maps to various shell constructs. For example,
defining associative arrays uses different syntax in different
shells:
#+begin_example
mksh$ typeset -a may[Germany]=Berlin
mksh$ echo ${may[Germany]}
Berlin
#+end_example
I don't think there's a way around needing to construct a string
that's sent to the subprocess. So, for each construct, we'll need
helper functions like =org-babel--variable-assignments:bash_assoc=.
This could be done 1:1, like with
=org-babel--variable-assignments:bash_assoc=. That is for the shell
functionality, there is a separate function to handle it. A concern
is that as these accrue, someone will come along and say, "Hey, this
syntax is all very similar. Why don't we 'simplify' the code and
generate all the similar cases?" Then we'll be in the same
situation as =org-babel-shell-initialize= where simple tasks become
complicated code generators that have hidden dependencies and extra
code needs to exist to handle the edge cases.
By "off-loading" to the end-user, I mean, provide an API that makes
this straight forward for end-users to implement. For example,
something like hook functions.
- [ ] How is this function used? Why is it provided in the first
place?
At first glance, this seems convenient. However, why does it exist?
Why can't the user "simply" write a block that does this?
Without looking into it, my guess is to convert an Org table to an
array.
- [ ] How are 'sep' and 'hline' supposed to work with this function?
As far as I understand how =org-babel-sh-var-to-sh= works, these
make no sense.
*** [0/2] Refactor
- [ ] Document the parameters
'varname' is what to name the array.
'values' is a list of pairs
- [ ] The worg documentation is wrong on this topic
** DONE defun org-babel--variable-assignments:bash
#+name: org-babel--variable-assignments:bash
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--variable-assignments:bash (varname values &optional sep hline)
"Represent the parameters as useful Bash shell variables."
(pcase values
(`((,_ ,_ . ,_) . ,_) ;two-dimensional array
(org-babel--variable-assignments:bash_assoc varname values sep hline))
(`(,_ . ,_) ;simple list
(org-babel--variable-assignments:bash_array varname values sep hline))
(_ ;scalar value
(org-babel--variable-assignments:sh-generic varname values sep hline))))
#+end_src
Call appropriate assignment function:
- org-babel--variable-assignments:bash_assoc
- org-babel--variable-assignments:bash_array
- org-babel--variable-assignments:sh-generic
Uses =pcase= which does pattern matching on the form of a value and
performs an action accordingly.
*** [1/2] Questions
- [X] Is =pcase= the best way to do this?
=pcase= is a little weird to read at first. To match the nested
list appears needing to understand cons cells. This could be a
problem for someone not familiar with them. It also requires
quasi-quote. So, that's two levels of knowledge needed.
+ [X] Can it be used with a different pattern?
The pattern doesn't make sense to me, too. The pattern is this:
#+begin_example
`((,_ ,_ . ,_) . ,_)
#+end_example
Why not use a simpler form?
#+begin_example
`((,_ . ,_) . ,_)
#+end_example
Or maybe one that's literally the form we want?
#+begin_example
`((,_ . (,_ . nil)) . ,_)
#+end_example
As cons cells, a list of pairs looks like:
#+begin_src emacs-lisp :tangle no :results verbatim
'((3 . (4 . nil)) . nil)
#+end_src
#+RESULTS:
: ((3 4))
or with a second pair:
#+begin_src emacs-lisp :tangle no :results verbatim
'((3 . (4 . nil)) . ((5 . (6 . nil))))
#+end_src
#+RESULTS:
: ((3 4) (5 6))
It looks like it's combining the first cons together.
#+begin_src emacs-lisp :tangle no :results verbatim
'((3 4 . nil) . nil)
#+end_src
#+RESULTS:
: ((3 4))
or with a second pair:
#+begin_src emacs-lisp :tangle no :results verbatim
'((3 4 . nil) . ((5 6 . nil))))
#+end_src
#+RESULTS:
: ((3 4) (5 6))
Honestly, I'm getting annoyed at parsing this out. Why are we using
cons cells here anyway? It seems like we need to use them in order
capture the "rest" of the pairs.
So why not something like this since we're really only checking that
the first element is a pair?
#+begin_example
`((,_ ,_) . ,_)
#+end_example
It works:
#+begin_src emacs-lisp :tangle no
(let ((values '(("apple" 3) ("banana" 4) ("cherry" 5))))
(pcase values
(`((,_ ,_) . ,_)
(message "Simpler form catches"))
(`((,_ ,_ . ,_) . ,_)
(message "Simple form failed"))))
#+end_src
#+RESULTS:
: Simpler form catches
#+begin_src emacs-lisp :tangle no
(let ((values '("alice" "bob" "carol")))
(pcase values
(`((,_ ,_) . ,_)
(message "Simpler form catches"))
(`((,_ ,_ . ,_) . ,_)
(message "Simple form failed"))
(_ (message "Neither form catches (as desired)"))))
#+end_src
#+RESULTS:
: Neither form catches (as desired)
#+begin_src emacs-lisp :tangle no
(let ((values 42))
(pcase values
(`((,_ ,_) . ,_)
(message "Simpler form catches"))
(`((,_ ,_ . ,_) . ,_)
(message "Simple form failed"))
(_ (message "Neither form catches (as desired)"))))
#+end_src
#+RESULTS:
: Neither form catches (as desired)
+ [X] How could this same task be done without =pcase=?
Clearly, =pcase= can get confusing and fiddly. It seems to me a
good idea if we could avoid using it all together.
We're using the form of the parameters to determine the action.
Why? Because the parameters derive from the =:var= header which
can take an arbitrary form.
Have I been calling =:var= "incorrectly"?
It seems like =:var= is expected to be called multiple times! The
docstring for =org-babel--get-vars= says that PARAMS "may contain
multiple entries for the key :var ".
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--get-vars (params)
"Return the babel variable assignments in PARAMS.
PARAMS is a quasi-alist of header args, which may contain
multiple entries for the key `:var'. This function returns a
list of the cdr of all the `:var' entries."
(mapcar #'cdr
(cl-remove-if-not (lambda (x) (eq (car x) :var)) params)))
#+end_src
If you're not careful, it looks like this should work:
#+begin_src bash :results output :var myarray["apple"]=3 :var myarray["banana"]=42 :eval never-export
echo I have ${myarray["banana"]} bananas
#+end_src
#+RESULTS:
: I have 42 bananas
But, NO, actually it doesn't...apple is never assigned.
#+begin_src bash :results output :var myarray["apple"]=3 :var myarray["banana"]=42 :eval never-export
echo $myarray
echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples
#+end_src
#+RESULTS:
: 42
: I have 42 bananas and 42 apples
What happens is this:
Each :var is evaluated. The varname comes through as something
that Bash should recognize, "myarray\[\"apple\"\]". However, the
value is just "3", so the =pcase= sees it as an atom and uses
=org-babel--variable-assignments:sh-generic=. This means bash
doesn't make the "define" call necessary to create an array.
Inneed, through this and through the use of =pcase=, the
expectation appears to be that a lisp expression is passed to
:var. So, =:var= can be called with any form.
How else might we differentiate forms?
The params coming into =org-babel--get-vars= look like:
#+begin_example
((:results . "output replace")
(:exports . "code")
(:var . "myarray[\"apple\"]=3")
(:var . "myarray[\"banana\"]=42")
(:tangle . "no")
(:hlines . "no")
(:noweb . "no")
(:cache . "no")
(:session . "none"))
#+end_example
Rather than check the values parameter, check the varname. That
is check for array syntax. Should we be parsing for Bash code?
This is the weeds. What does it look like stepping back?
We're wanting to use the :var header to define variables in a Bash
shell. In addition to regular variables, we want to define
indexed arrays and associative arrays. How might an end user
indicate that these need to be defined? Currently, they can give
a lisp expression that gets translated. How else might an end
user indicate and provide the necessary information for defining
an indexed array or associative array?
- Separate headers for indexed and associative arrays
ACTUALLY... :var is intended to work with Org tables and lists
(the following don't work here because of the indentation, but it
does work).
#+begin_example
#+name:fruit
| apple | 3 |
| banana | 42 |
#+begin_src bash :results output :var myarray=fruit :eval never-export
echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples
#+end_src
#+RESULTS:
: I have 42 bananas and 3 apples
#+end_example
- [ ] Should this function exist?
It's only delegating to other functions and is only called
by =org-babel-variable-assignments:shell=.
The reason why it's broken out looks like it may be because
=org-babel-variable-assignments:shell= gets a list of params that is
then mapped using a lambda. Breaking the delegation to a separate
function makes the lambda code shorter.
I'm curious if restructuring its logic would make it easier to
understand and remove the need for these separate functions.
*** [0/3] Refactor
- [ ] Change docstring
It's not very clear. It doesn't follow Emacs Lisp [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Documentation-Tips.html][Documentation
Tips]]:
#+begin_quote
For a function, the first line should briefly answer the question,
"What does this function do?"
The first line should mention all the important arguments of the
function, and should mention them in the order that they are
written in a function call.
#+end_quote
What parameters? What does represent mean? Why?
It takes a variable name and...
How should we describe "values"? It's not a list because it could
be an atom.
It takes a variable name and the corresponding value (or values).
Variable name must be a string and a valid Bash variable name.
Values must be a list of pairs, a list, or an atom.
- [ ] Reword comments
+ Regarding "scalar":
It's not for a "scalar", it's for an atom, since we're checking the
form of =values=, not what values represents in Bash (and Bash
doesn't have scalars). Further, the value of the variable is not
necessarily a number:
#+begin_src bash :results output :var X="$Y" :eval never-export
Y="six";
eval echo I have "$X" apples
#+end_src
#+RESULTS:
: I have six apples
And still yet, the underscore matches anything.
+ Regarding "two-dimensional array":
The comment is misleading. It does *not* mean to say that
"bash_assoc" is a 2D-array, that associative arrays are 2D. They're
not. What it means to say is that the form of =values= is a "list
of lists". That =values= is a 2D list.
It's confusing because bash_assoc makes a Bash *array*. So, it's
easy to confuse what this is talking about (especially if you're not
familiar with Bash details).
Further, Lisp lists are not arrays! So, it's technically incorrect.
#+begin_src emacs-lisp :tangle no :results value
(let ((values '(("apple" 3) ("banana" 4) ("cherry" 5))))
(if (arrayp values)
(message "The list is an array")
(message "Lists are not arrays")))
#+end_src
#+RESULTS:
: Lists are not arrays
- [ ] Create better worg documentation for :var
First, the worg documentation is misleading. :var is intended for
use mainly with named tables and lists. By happenstance (it
appears), it also works with lisp expressions.
This is how :var is expected to be used:
#+name:fruit
| apple | 5 |
| banana | 7 |
#+begin_src bash :results output :var myarray=fruit :eval never-export
echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples
#+end_src
#+RESULTS:
: I have 7 bananas and 5 apples
And this works with arrays as expected.
#+name:capitals
| Berlin |
| Washington DC |
#+begin_src bash :results output :var myarray=capitals :eval never-export
echo ${myarray[0]} is the first capital by alphabetical order
echo ${myarray[1]} comes last in alphabetical order
#+end_src
#+RESULTS:
: Berlin is the first capital by alphabetical order
: Washington DC comes last in alphabetical order
** DONE defun org-babel--variable-assignments:fish
#+name: org-babel--variable-assignments:fish
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel--variable-assignments:fish
(varname values &optional sep hline)
"Return a list of statements declaring the values as a fish variable."
(format "set %s %s" varname (org-babel-sh-var-to-sh values sep hline)))
#+end_src
This is basically the same function as
=org-babel--variable-assignments= but for fish's syntax.
*** [0/0] Questions
*** [0/1] Refactor
- [ ] Update docstring to explain the arguments
** DONE defun org-babel-variable-assignments:shell
#+name: org-babel-variable-assignments:shell
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-variable-assignments:shell (params)
"Return list of shell statements assigning the block's variables."
(let ((sep (cdr (assq :separator params)))
(hline (when (string= "yes" (cdr (assq :hlines params)))
(or (cdr (assq :hline-string params))
"hline"))))
(mapcar
(lambda (pair)
(if (string-suffix-p "bash" shell-file-name)
(org-babel--variable-assignments:bash
(car pair) (cdr pair) sep hline)
(if (string-suffix-p "fish" shell-file-name)
(org-babel--variable-assignments:fish
(car pair) (cdr pair) sep hline)
(org-babel--variable-assignments:sh-generic
(car pair) (cdr pair) sep hline))))
(org-babel--get-vars params))))
#+end_src
Call appropriate variable assignment function depending on shell type
(i.e. suffix is bash or not). Each element of a :var header arg is
processed.
*** [1/1] Questions
- [X] How do (or even do) :hlines and :sep or :separator work with ob-shell?
See same question in [[General obshell: Questions][General obshell: Questions]]
*** [0/1] Refactor
- [ ] Change nested ifs to a single cond
** DONE defun org-babel-prep-session:shell
#+name: org-babel-prep-session:shell
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-prep-session:shell (session params)
"Prepare SESSION according to the header arguments specified in PARAMS."
(let* ((session (org-babel-sh-initiate-session session))
(var-lines (org-babel-variable-assignments:shell params)))
(org-babel-comint-in-buffer session
(mapc (lambda (var)
(insert var) (comint-send-input nil t)
(org-babel-comint-wait-for-output session))
var-lines))
session))
#+end_src
Cruft left over from the original implementation of ob-sh. It was
used to start and return the comint associated with the session.
Originally, the comint associated with a session was actually started
by a function (that no longer exists) called
=org-babel-sh-initiate-session-by-key=. The
=org-babel-sh-initiate-session= returned only the buffer after it was
created:
#+begin_src emacs-lisp :tangle no
;; from commit 0fa68a53
(defvar org-babel-sh-buffers '(:default . nil))
(defun org-babel-sh-session-buffer (session)
(cdr (assoc session org-babel-sh-buffers)))
(defun org-babel-sh-initiate-session-by-key (&optional session)
"If there is not a current inferior-process-buffer in SESSION
then create. Return the initialized session."
(save-window-excursion
(let* ((session (if session (intern session) :default))
(sh-buffer (org-babel-sh-session-buffer session))
(newp (not (org-babel-comint-buffer-livep sh-buffer))))
(if (and sh-buffer (get-buffer sh-buffer) (not (buffer-live-p sh-buffer)))
(setq sh-buffer nil))
(shell sh-buffer)
(when newp
(setq sh-buffer (current-buffer))
(org-babel-comint-wait-for-output sh-buffer))
(setq org-babel-sh-buffers (cons (cons session sh-buffer)
(assq-delete-all session org-babel-sh-buffers)))
session)))
(defun org-babel-sh-initiate-session (&optional session)
(unless (string= session "none")
(org-babel-sh-session-buffer (org-babel-sh-initiate-session-by-key session))))
#+end_src
Over time, the functionality of creating the comint process was moved
into =org-babel-sh-initiate-session=. Variable assignment was moved
from =org-babel-prep-session:sh= to
=org-babel-sh-variable-assignments= in 8e151c06 October 14, 2010. It
has been untouched since then (for 13 years).
As you can see, =org-babel-prep-session= duplicates that of
=org-babel-sh-initiate-session-by-key= (and the modern
=org-babel-sh-initiate-session=). That is, it starts a comint.
*** [0/0] Questions
*** [0/1] Refactor
- [ ] Remove this function
See Refactor section of =org-babel-load-session=.
** DONE defun org-babel-load-session:shell
#+name: org-babel-load-session:shell
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-load-session:shell (session body params)
"Load BODY into SESSION."
(save-window-excursion
(let ((buffer (org-babel-prep-session:shell session params)))
(with-current-buffer buffer
(goto-char (process-mark (get-buffer-process (current-buffer))))
(insert (org-babel-chomp body)))
buffer)))
#+end_src
This is cruft from the original development. I highly doubt it's been
used in the 13 years since it was last changed. Everything this
function does has been reimplemented in other functions.
It starts a process if none exists and inserts the body into the
associated process buffer. Initiate a comint, if needed. Define any
variables that are given in the header args. Return buffer associated
with the session.
It's intended for use with the (largely undocumented) =org-metaup=.
*** [1/1] Questions
- [X] What is this used for?
Searching for all instances of "org-babel-load-session:shell", we
see that it's used in ob-core.
#+begin_src sh :results output
rg -n --type=elisp -e org-babel-load-session\: /tmp/org-mode | rg -v defun
#+end_src
#+RESULTS:
: /tmp/org-mode/lisp/ob-core.el:1091: (cmd (intern (concat "org-babel-load-session:" lang))))
This is within the =org-babel-load-in-session= function.
#+begin_src sh :results output
rg -n --type=elisp -e org-babel-load-in-session /tmp/org-mode | rg -v defun
#+end_src
#+RESULTS:
: /tmp/org-mode/lisp/ob-core.el:392:then run `org-babel-load-in-session'."
: /tmp/org-mode/lisp/ob-core.el:395: (org-babel-load-in-session current-prefix-arg)))
: /tmp/org-mode/lisp/ob-core.el:397:(add-hook 'org-metaup-hook 'org-babel-load-in-session-maybe)
: /tmp/org-mode/lisp/org-keys.el:888: ("\C-l" . org-babel-load-in-session)
: /tmp/org-mode/lisp/org-keys.el:889: ("l" . org-babel-load-in-session)
This is used by =org-babel-load-in-session-maybe= which is added as
a hook to the =org-metaup-hook=.
#+begin_src sh :results output
rg -n --type=elisp -e org-metaup-hook /tmp/org-mode
#+end_src
#+RESULTS:
: /tmp/org-mode/lisp/ob-core.el:397:(add-hook 'org-metaup-hook 'org-babel-load-in-session-maybe)
: /tmp/org-mode/lisp/org.el:16833:(defvar org-metaup-hook nil
: /tmp/org-mode/lisp/org.el:17154:This function runs the functions in `org-metaup-hook' one by one
: /tmp/org-mode/lisp/org.el:17160: ((run-hook-with-args-until-success 'org-metaup-hook))
It looks like =org-metaup-hook= is only set up (by default) with
=org-babel-load-in-session-maybe= (which calls
=org-babel-load-in-session=).
The only other place this stuff is used in code is in =org-keys=
(seen above in the =rg= output). The =org-keys.el= commentary says,
#+begin_quote
;; This library adds bindings for Org mode buffers. It also
;; implements both Speed keys and Babel speed keys. See manual for
;; details.
#+end_quote
Looking at the manual, indeed, =org-babel-load-in-session= is
mentioned in [[https://www.gnu.org/software/emacs/manual/html_node/org/Key-bindings-and-Useful-Functions.html][Key bindings and Useful Functions]]:
#+begin_example
Active key bindings in code blocks:
Key binding Function
--------------------------------------------------------
'C-c C-c' 'org-babel-execute-src-block'
'C-c C-o' 'org-babel-open-src-block-result'
'M-<UP>' 'org-babel-load-in-session'
'M-<DOWN>' 'org-babel-pop-to-session'
...
'C-c C-v l' or 'C-c C-v C-l' 'org-babel-load-in-session'
#+end_example
No mention of =org-babel-load-in-session= is given related to the
speed commands. Speed commands creates a shortcut for =org-metaup=,
but does not explain (or in any way connect) that
=org-babel-load-in-session= exists.
Back to the original question, what is this used for?
=org-babel-load-session:shell= is connected to an undocumented
feature. It's unclear what this feature is. Try calling =M-x
org-metaup= on the following block:
#+begin_src shell :session *test*
echo "hi"
echo "world"
#+end_src
It starts a session and takes you to it:
#+begin_example
ahab@pequod ~/Projects/excalamus.com/src/draft/emacs$ PROMPT_COMMAND=;PS1="org_babel_sh_prompt> ";PS2=
org_babel_sh_prompt> echo "hi"
echo "world"
#+end_example
*** [0/1] Refactor
- [ ] Remove all traces of =org-babel-load-session=
I would be shocked if this feature has been used by anyone in the
past 13 years. It requires a largely undocumented feature:
=org-metaup=.
The function of it is to take you to the comint buffer. Something
like that might be useful if Org could capture the input and output.
However, it doesn't. This defeats the whole point of literate
programming.
Unless there's a way to capture the input and output of session
interaction happening *outside* the Org block, I don't see a point
in keeping this.
We're also in the fortunate position of no one likely knowing about
this, making it easy for us to kill it.
** DONE defun org-babel-sh-strip-weird-long-prompt
#+name: org-babel-sh-strip-weird-long-prompt
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-sh-strip-weird-long-prompt (string)
"Remove prompt cruft from a string of shell output."
(while (string-match "^% +[\r\n$]+ *" string)
(setq string (substring string (match-end 0))))
string)
#+end_src
Used only in =org-babel-sh-evaluate= and only for sessions. Removes
leading white space for strings starting with "%".
The name concerns me. It sounds an awful lot like the problem it tries
to solve wasn't fully understood. That is, it's a work around, not a
solution.
Introduced in commit [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/org-babel.org?id=9623b169140f5ef79fbe437df79fc6e6b22c5c67][9623b169]].
*** [3/3] Questions
- [X] What is the actual regexp?
The regexp is:
#+begin_example
"^% +[\r\n$]+ *"
#+end_example
This says,
- ^ starting at the beginning of the line
- % match the percent sign
- + match one or more spaces
- [\r\n$]+ match one or more line endings
- * match zero or more white spaces
- [X] Does this regexp actually do anything?
I don't think so.
Here's how it works:
#+begin_src emacs-lisp :tangle no :eval never
(let ((string "% \n\n\r\n\rhello world\ngood cruel world x_x"))
(while (string-match "^% +[\r\n$]+ *" string)
(setq string (substring string (match-end 0))))
string)
#+end_src
#+RESULTS:
: hello world
: good cruel world x_x
It will not match unless the string starts with a "%":
#+begin_src emacs-lisp :tangle no :eval never
(let ((string "\n\n\r\n\rhello world\ngood cruel world x_x"))
(while (string-match "^% +[\r\n$]+ *" string)
(setq string (substring string (match-end 0))))
string)
#+end_src
#+RESULTS:
:
:
: ^M
: ^Mhello world
: good cruel world x_x
The only place =org-babel-sh-strip-weird-long-prompt= is ever called
is in =org-babel-sh-evaluate=. The captured session text is passed
in.
For the following, =#("The value of x is 7" 0 19 (field output))= is
the string passed in:
#+begin_src bash :results output :session *test* :eval never
x=7
echo "The value of x is $x"
#+end_src
I assume the "%" corresponded to the prompt, but I can't tell.
Currently, the string passed in is the result of the (several times)
trimmed result of evaluation.
#+begin_src emacs-lisp :tangle no :eval never
;; org-babel-sh-evaluate
(mapconcat
#'org-babel-sh-strip-weird-long-prompt
(mapcar
#'org-trim
(butlast ; Remove eoe indicator
(org-babel-comint-with-output
(session org-babel-sh-eoe-output t body)
(insert (org-trim body) "\n"
org-babel-sh-eoe-indicator)
(comint-send-input nil t))
;; Remove `org-babel-sh-eoe-indicator' output line.
1))
"\n")))
#+end_src
- [X] Is it necessary to setq?
The while loop continues through the string in order to find all
matches. When a match is found, the string is replaced. In this way,
each iteration of the loop is checking a new string. This allows for
=(match-end 0)= to always be used for the next match.
Yes, the setq is necessary if the goal is to return a new string.
*** [0/1] Refactor
- [ ] Remove this function
I'm leaning toward removing this. It doesn't seem like it's called
anymore. I have no clue how to reproduce the problem it seems to
solve and there's no record I could find of what the problem even was.
It also appears like the original problem was never fully understood.
If we remove it and a bug shows up, then we'll have a real test case,
have a chance to figure out what's actually happening, and maybe find
a better solution. We could remove the element of "weirdness." It
would negatively affect users in the short term by outputting extra
characters. The output would still exist, however, and users would be
able to keep using the Babel. In the long term, we'd be able to
better support users by actually understanding our code base.
By removing it, we avoid:
1. Having code
2. Not understanding our code
3. Doing extra processing
** DONE defun ob-shell-async-chunk-callback
#+name: ob-shell-async-chunk-callback
#+begin_src emacs-lisp :eval never :tangle no
(defun ob-shell-async-chunk-callback (string)
"Filter applied to results before insertion.
See `org-babel-comint-async-chunk-callback'."
(replace-regexp-in-string comint-prompt-regexp "" string))
#+end_src
As the docstring says, the filter applied to results before insertion.
Replace the =comint-prompt-regexp= ("^org_babel_sh_prompt> *") with
nothing.
Used in =org-babel-sh-evaluate=. Provided as the CHUNK-CALLBACK
argument of =org-babel-comint-async-register=. Called after the
=ob-shell-async-indicator= is found and the results scraped from the
comint buffer.
*** [0/0] Questions
*** [0/1] Refactor
- [ ] Document argument
** DONE [10/10] defun org-babel-sh-evaluate
This is where all the functionality of =ob-shell= lives. Understand
this and you'll have the bulk of what =ob-shell= is.
#+name: org-babel-sh-evaluate
#+begin_src emacs-lisp :eval never :tangle no :noweb yes
<<org-babel-sh-evaluate-varlist-shebang>>
<<org-babel-sh-evaluate-varlist-async>>
<<org-babel-sh-evaluate-varlist-results-params>>
<<org-babel-sh-evaluate-varlist-value-is-exit-status>>
<<org-babel-sh-evaluate-varlist-results>>
<<org-babel-sh-evaluate-varlist-results-stdin-cmdline>>
<<org-babel-sh-evaluate-varlist-results-session>>
<<org-babel-sh-evaluate-varlist-results-shebang>>
<<org-babel-sh-evaluate-varlist-results-general>>
<<org-babel-sh-evaluate-varlist-return>>
#+end_src
The =org-babel-sh-evaluate= function has two parts, calculating the
results and formatting the results. Calculating the results makes up
most of the function. It's a conditional for each case where each
case corresponds to possible header arguments. Formatting is
more-or-less off-loaded to core Org functionality.
There are several ways in which a shell block may evaluate. Many of
these overlap and influence each other. Each, more-or-less,
corresponds to a case in the conditional.
These are the current cases:
1. stdin
2. cmdline
3. shebang
4. session
5. async
6. general
- [X] *defun and org-babel-sh-evaluate-varlist-shebang*
#+name: org-babel-sh-evaluate-varlist-shebang
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-sh-evaluate (session body &optional params stdin cmdline)
"Pass BODY to the Shell process in BUFFER.
If RESULT-TYPE equals `output' then return a list of the outputs
of the statements in BODY, if RESULT-TYPE equals `value' then
return the value of the last statement in BODY."
(let* ((shebang (cdr (assq :shebang params)))
#+end_src
The arguments are several parameters (session, stdin, cmdline, async,
etc) and the source block body. For some reason, the parameters are
split between being explicitly passed and passed within the params
list. See Questions and Refactor for discussion.
Params is an alist of data. condition See "What is params?" in [[*
defun org-babel-execute:shell][defun org-babel-execute:shell]] for
more information.
A shebang is a line like "#!/bin/bash" which does two things:
1. Allows a script file to execute as a command
2. Specifies which binary to use when executing
This let statement binds the =:shebang= header argument.
- [X] *org-babel-sh-evaluate-varlist-async*
#+name: org-babel-sh-evaluate-varlist-async
#+begin_src emacs-lisp :eval never :tangle no
(async (org-babel-comint-use-async params))
#+end_src
Determine whether to use async.
Async depends on 1) =:session= not being "none" and 2) the =:async=
header not being "no". See Questions section below on toggle
(in)consistencies. =org-babel-use-async= moves this logic to a
separate module for other async implementations to use.
- [X] *org-babel-sh-evaluate-varlist-results-params*
#+name: org-babel-sh-evaluate-varlist-results-params
#+begin_src emacs-lisp :eval never :tangle no
(results-params (cdr (assq :result-params params)))
#+end_src
The =:result-params= key in =params= holds the header values for the
=:results= header.
It looks something like,
#+begin_example
(:result-params "replace" "output")
#+end_example
or
#+begin_example
(:result-params "replace" "value" "table")
#+end_example
Results are presentable in several formats and =:result-params= holds
the keywords for deciding what the output should look like.
- [X] *org-babel-sh-evaluate-varlist-value-is-exit-status*
#+name: org-babel-sh-evaluate-varlist-value-is-exit-status
#+begin_src emacs-lisp :eval never :tangle no
(value-is-exit-status
(or (and
(equal '("replace") results-params)
(not org-babel-shell-results-defaults-to-output))
(member "value" results-params)))
#+end_src
Block results are stdout, stderr, or an exit code. Typically, an exit
code of 0 means success, non-0 means failure. =value-is-exit-status=
helps figure out which result to return.
This is something that needs refactoring. See Refactoring notes.
- [X] *org-babel-sh-evaluate-varlist-results*
#+name: org-babel-sh-evaluate-varlist-results
#+begin_src emacs-lisp :eval never :tangle no
(results
(cond
#+end_src
Results are calculated using a =cond=. Various combinations of headers
make up the different cases. See Question asking whether all
combinations are represented.
- [X] *org-babel-sh-evaluate-varlist-results-stdin-cmdline*
#+name: org-babel-sh-evaluate-varlist-results-stdin-cmdline
#+begin_src emacs-lisp :eval never :tangle no
((or stdin cmdline) ; external shell script w/STDIN
(let ((script-file (org-babel-temp-file "sh-script-"))
(stdin-file (org-babel-temp-file "sh-stdin-"))
(padline (not (string= "no" (cdr (assq :padline params))))))
(with-temp-file script-file
(when shebang (insert shebang "\n"))
(when padline (insert "\n"))
(insert body))
(set-file-modes script-file #o755)
(with-temp-file stdin-file (insert (or stdin "")))
(with-temp-buffer
(with-connection-local-variables
(apply #'process-file
(if shebang (file-local-name script-file)
shell-file-name)
stdin-file
(current-buffer)
nil
(if shebang (when cmdline (list cmdline))
(list shell-command-switch
(concat (file-local-name script-file) " " cmdline)))))
(buffer-string))))
#+end_src
This condition handles the stdin, cmdline, and (sometimes) shebangs
headers. Note: there's a separate branch condition for shebang
that's only called when there's no stdin or cmdline header. So,
this is for:
stdin
cmdline
stdin, shebang
cmdline, shebang
stdin, cmdline, shebang
A shebang is a line at the top of a file like "#!/bin/bash" which
allows a shell script to run as a command. The name "shebang" comes
from combining the names of the first two characters, the "hash" (#)
and the "bang" (!).
It works by calling =process-file= which has form:
#+begin_example
(process-file PROGRAM &optional INFILE BUFFER DISPLAY &rest ARGS)
#+end_example
The stdin header is put into a temporary file whose path is passed
in as INFILE.
The block code is also put into a temporary file. The block code is
either executed as a shebang or called as part of the argument to
the shell binary. The shebang line is inserted when a =:shebang= is
given.
The temporary script file is made executable with 755 permissions.
This corresponds to =chmod a+rwx,o-w= which sets permissions so that
the group and owner can read, can write and can execute. Others can
read and execute, but can't write.
When the block script is executed, it's run using =process-file= in
a temporary buffer. The results of the process are placed in the
buffer and, finally, the contents of the buffer given for the
results.
For some reason, a newline is inserted if =:padline= is given. See
Questions.
- [X] *org-babel-sh-evaluate-varlist-results-session*
#+name: org-babel-sh-evaluate-varlist-results-session
#+begin_src emacs-lisp :eval never :tangle no
(session ; session evaluation
(if async
(progn
(let ((uuid (org-id-uuid)))
(org-babel-comint-async-register
session
(current-buffer)
"ob_comint_async_shell_\\(.+\\)_\\(.+\\)"
'ob-shell-async-chunk-callback
nil)
(org-babel-comint-async-delete-dangling-and-eval
session
(insert (format ob-shell-async-indicator "start" uuid))
(comint-send-input nil t)
(insert (org-trim body))
(comint-send-input nil t)
(insert (format ob-shell-async-indicator "end" uuid))
(comint-send-input nil t))
uuid))
(mapconcat
#'org-babel-sh-strip-weird-long-prompt
(mapcar
#'org-trim
(butlast ; Remove eoe indicator
(org-babel-comint-with-output
(session org-babel-sh-eoe-output t body)
(insert (org-trim body) "\n"
org-babel-sh-eoe-indicator)
(comint-send-input nil t))
;; Remove `org-babel-sh-eoe-indicator' output line.
1))
"\n")))
#+end_src
The "session" condition has two parts: async and non-async.
Async handling comes first. The async API requires "registering"
certain information: the session buffer, the org buffer the block
lives in, a regexp for matching the beginning and end of evaluation,
and a user-defined filter function. The registration process adds a
special (non-user-defined) filter function to the session buffer
(=org-babel-comint-async-filter=). This filter looks at output sent
to the session buffer from the process for the end of evaluation
regex.
After the registration is made, the "start" indicator is sent, the
block code sent, and finally the "end" indicator. Because this is
all sent to a shell session, it runs in the background until the
filter finds the "end" indicator. Until that happens, the result of
the async branch is the uuid and that's what appears for the user as
the result. (If the "end" indicator is found, the uuid is located
and replaced with the text occurring between the "start" and "end"
indicators. But that's all handled by things in ob-comint.el.)
Non-async sessions are handled by the next bit of code. Basically,
it sends the block code to the session and waits for the
=org-babel-sh-eoe-indicator=. How this happens is determined by
code in ob-comint.el.
- [X] *org-babel-sh-evaluate-varlist-results-shebang*
#+name: org-babel-sh-evaluate-varlist-results-shebang
#+begin_src emacs-lisp :eval never :tangle no
;; External shell script, with or without a predefined
;; shebang.
((org-string-nw-p shebang)
(let ((script-file (org-babel-temp-file "sh-script-"))
(padline (not (equal "no" (cdr (assq :padline params))))))
(with-temp-file script-file
(insert shebang "\n")
(when padline (insert "\n"))
(insert body))
(set-file-modes script-file #o755)
(if (file-remote-p script-file)
;; Run remote script using its local path as COMMAND.
;; The remote execution is ensured by setting
;; correct `default-directory'.
(let ((default-directory (file-name-directory script-file)))
(org-babel-eval (file-local-name script-file) ""))
(org-babel-eval script-file ""))))
#+end_src
Check if shebang is non-empty; check for padline
Write shebang and body to temp file
The temporary script file is made executable with 755 permissions.
This corresponds to =chmod a+rwx,o-w= which sets permissions so that
the group and owner can read, can write and can execute. Others can
read and execute, but can't write.
Check if the script file is remote. This is because, ultimately,
=process-file= is called. Two things must happen:
1. strip the remote information from the filename
2. make sure the file runs on the remote
=process-file= says to use =file-local-name= for remote files. This
strips the remote information from the filename. However, without
this information, =process-file= will execute the file on the local
machine. We change this behavior by setting the =default-directory=
to make the file relative to the remote machine.
For example, when processing a remote file, the =script-file=
variable looks like:
"/ssh:ahab@localhost:/tmp/sh-script-3qPNNc"
The =file-local-name= strips the "ssh" stuff off the front to make
it: ~/tmp/sh-script-3qPNNc~. Originally, the file was relative to
"~/ssh:ahab@localhost:/tmp/~". After =file-local-name=, it looks
like it's relative to "/tmp" (on the local machine).
=org-babel-eval= then tries to run the result of =file-local-name=.
Without setting the =default-directory=, it uses the current
=default-directory=. This is the the *local* =default-directory=
which is "/tmp". When this happens, the script can't be found. We
must set the =default-directory= to the remote
(~/ssh:ahab@localhost:/tmp/~). Then it's found.
- [X] *org-babel-sh-evaluate-varlist-results-general*
#+name: org-babel-sh-evaluate-varlist-results-general
#+begin_src emacs-lisp :eval never :tangle no
(t (org-babel-eval shell-file-name (org-trim body))))))
#+end_src
This handles any of the cases not covered.
It trims the body and runs it through =process-file=. See "How
Shell works at a deep level" in the [[* Mailing list items][Mailing list items]].
- [X] *org-babel-sh-evaluate-varlist-return*
#+name: org-babel-sh-evaluate-varlist-return
#+begin_src emacs-lisp :eval never :tangle no
(when (and results value-is-exit-status)
(setq results (car (reverse (split-string results "\n" t)))))
(when results
(let ((result-params (cdr (assq :result-params params))))
(org-babel-result-cond result-params
results
(let ((tmp-file (org-babel-temp-file "sh-")))
(with-temp-file tmp-file (insert results))
(org-babel-import-elisp-from-file tmp-file)))))))
#+end_src
Shells can return two types of values, exit codes and standard
output.
The Babel results are the exit code when using ":results value":
#+begin_src sh :results value
echo "hi"
#+end_src
The results look like:
#+begin_example
"hi
0
"
#+end_example
#+begin_src emacs-lisp :tangle no
(car (reverse (split-string "hi\n0" "\n" t)))
#+end_src
#+RESULTS:
: 0
The final branch takes the result-params (which control output
formatting) and
#+begin_src sh :results table
echo "hello"
echo "world"
#+end_src
#+RESULTS:
| hello |
| world |
=org-table-result-cond= handles the different result cases. For a
table, it needs a sexp. It gets this from
=org-babel-import-elisp-from-file=:
#+begin_example
(("hello") ("world"))
#+end_example
This is what's returned to the caller. Why? That's not exactly
clear. But there you have it.
*** [3/17] Questions
- [ ] Why does this function exist?
- [ ] What combinations exist and are they represented?
Possible headers:
- stdin
- cmdline
- stdin
- session
- async
- shebang
- default
#+begin_src emacs-lisp :results table :tangle no :eval never
;; org-sort
;; delete-duplicate-lines
(defun powerset (x)
(if (null x)
(list '())
(let ((elt (car x))
(p (power-set (cdr x))))
(append (mapc (lambda (s) (cons elt s)) p) p))))
(powerset
'(stdin
cmdline
stdin
session
async
shebang
))
#+end_src
#+RESULTS:
| async | | | | |
| async | shebang | | | |
| cmdline | | | | |
| cmdline | shebang | | | |
| cmdline | async | | | |
| cmdline | async | shebang | | |
| cmdline | session | | | |
| cmdline | session | shebang | | |
| cmdline | session | async | | |
| cmdline | session | async | shebang | |
| cmdline | stdin | | | |
| cmdline | stdin | shebang | | |
| cmdline | stdin | async | | |
| cmdline | stdin | async | shebang | |
| cmdline | stdin | session | | |
| cmdline | stdin | session | shebang | |
| cmdline | stdin | session | async | |
| cmdline | stdin | session | async | shebang |
| session | | | | |
| session | shebang | | | |
| session | async | | | |
| session | async | shebang | | |
| shebang | | | | |
| stdin | | | | |
| stdin | shebang | | | |
| stdin | async | | | |
| stdin | async | shebang | | |
| stdin | session | | | |
| stdin | session | shebang | | |
| stdin | session | async | | |
| stdin | session | async | shebang | |
- [ ] What is the use of running a block as a shebang?
The sole purpose (AFAIU) of a shebang is to run a script as a
command. The purpose of Babel is for literate programming. Does
literate programming have any overlap with running a script as a
command?
A command is a way to execute code. In Babel, executing a block is
the way to execute the code. A shebang, AFAIU, doesn't have any
present inherent change in execution. The shebang header *does*
affect output within Babel because of implementation details.
However, that's not a part of shebangs themselves.
What are the different was to execute a script?
1. run as a command (shebang)
2. passing the script as an argument (=sh my-script.sh=)
3. passing it as a command string (=bash -c "code stuff"=)
4. sourcing it
What are the differences between them?
AFAIU, (1) and (2) are the same. (3) doesn't work well with
multiple lines. (4) modifies the current environment.
What am I missing? Because (1) and (2) are the same, it seems to me
that the shebang option could be deprecated and removed.
- [X] Is a block actually run as a shebang/command?
Yes. The call made is:
#+begin_src emacs-lisp :tangle no :eval never
(apply #'process-file
(if shebang (file-local-name script-file)
shell-file-name)
...
#+end_src
If the =:shebang= header is specified, then the temporary script
file name is passed in as the PROGRAM argument for =process-file=.
- [ ] What is required to make the toggles for options consistent?
Async depends on 1) =:sessions= not being "none" and 2) the =:async=
header not being "no". The following is run async:
#+begin_src bash :results output :session no :async none :eval never
echo "hello"
sleep 3
echo "world"
#+end_src
#+RESULTS:
: hello
: world
I think ob-python uses "yes". What are the toggles for other
languages?
- [ ] How does the logic for =value-is-exit-status= work?
There was big discussion in 2020 about the meaning of output and
value and about which should be default (if any).
See
<https://orgmode.org/list/CA+A2iZaziAfMeGpBqL6qGrzrWEVvLvC0DUw++T4gCF3NGuW-DQ@mail.gmail.com>
At that time, in commit 07c6b1125, the current logic was put in
place:
#+begin_src diff
- (value-is-exit-status (or (equal '("replace") results-params)
- (member "value" results-params)))
+ (value-is-exit-status
+ (or (and
+ (equal '("replace") results-params)
+ (not org-babel-shell-results-defaults-to-output))
+ (member "value" results-params)))
#+end_src
It says,
#+begin_example
If "replace" and not default to output -> t
Elseif value appears -> t
Else nil
#+end_example
- [ ] Why only check "default to output" for "replace" in
=value-is-exit-status=?
What is =org-babel-shell-results-defaults-to-output=?
#+begin_quote
org-babel-shell-results-defaults-to-output is a variable defined in 'ob-shell.el'.
Let shell execution defaults to ":results output".
When set to t, use ":results output" when no :results setting
is set. This is especially useful for inline source blocks.
When set to nil, stick to the convention of using :results value
as the default setting when no :results is set, the "value" of
a shell execution being its exit code.
#+end_quote
What is "replace?"
#+begin_quote
'replace'
Default. Insert results in the Org buffer. Remove previous
results. Usage example: ':results output replace'.
#+end_quote
We want "value" to correspond to "exit code" and "output" to
correspond to "stdout".
The default Org Babel behavior is "value". However, "value" is not
a good default for ob-shell. So, the setting
=org-babel-shell-results-defaults-to-output= is set to t, making
"output" the default.
So, when =org-babel-shell-results-defaults-to-output= is nil,
results should be "value."
Why is "replace" checked at all? It's not clear to me, but the
results aren't right if it's removed. The logic for this needs to
be cleaned up...
- [ ] How does =with-connection-local-variables= work?
Within the stdin/cmdline branch, what does
=with-connection-local-variables= do?
It applies "connection-local-variables", yes. I suppose those are
set elsewhere?
- [ ] Should other branches use =with-connection-local-variables=?
Might this help with some of the remote file handling? Or is this
just a convenience? Should other branches get the same convenience?
- [ ] When stdin/cmdline branch called without shebang, does
=shell-command-switch= always work?
It looks like =shell-command-switch= is always set to "-c". This is
bash specific.
=shell-command-switch= is only called in this branch. So, it's
being used for *all* shells. This should be set in the initialize
closure for each shell (if such a flag exists).
- [X] Is a check done on the =:shebang=?
No. It appears to just run and fail silently.
On Android, at least, "Code block produced no output."
#+begin_example
#+begin_src sh :shebang #!/banana :cmdline first second
echo "$2"
#+end_src
#+RESULTS:
#+end_example
Note stdin or cmdline must be given in order for the stdin/cmdline
branch to apply.
- [ ] Are the file permissions appropriate?
For Stdin/Cmdline, they are made executable with 755 permissions.
This corresponds to =chmod a+rwx,o-w= which sets permissions so that
the group and owner can read, can write and can execute. Others can
read and execute, but can't write.
- [ ] Why is =:padline= checked in stdin/cmdline?
The manual says,
#+begin_quote
The 'padline' header argument controls insertion of newlines to pad
source code in the tangled file.
#+end_quote
So, why is there any handling for padlines here?
The padline is only used after inserting the shebang. Is a newline
sometimes (but not always) needed after a shebang?
- [ ] Why does the async user define the uuid and call the start/end
indicators?
This is really a question for ob-comint.el.
- [ ] Why does the shebang branch check if the script file is remote?
Is it for when the Org buffer/file is being run on a remote machine?
Are there other places this should be done?
- [ ] For shebang branch, could the remote file checking logic clean
up?
Maybe.
This logic was introduced in [[https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=39de4a1848d12b1be929853bf884ec04e121d9f0][39de4a18]]:
#+begin_quote
commit 39de4a1848d12b1be929853bf884ec04e121d9f0 (HEAD)
Author: Ihor Radchenko <yantar92@posteo.net>
Date: Sun Jul 23 17:38:09 2023 +0300
org-babel-sh-evaluate: Fix edge case when :shebang is set and :dir is remote
* lisp/ob-shell.el (org-babel-sh-evaluate): Pass remote local file
name as command when executing script with :shebang. `org-babel-eval'
will fail when SCRIPT-FILE is TRAMP file name.
Link: https://www.reddit.com/r/orgmode/comments/14zh2yi/orgbabel_shebang_not_working_on_with_tramp_dir/
#+end_quote
The following is a function I made that probably isn't necessary.
#+begin_src emacs-lisp :tangle no
(defun my-file-name-directory (filename)
"Like `file-name-directory' but works with remote files."
(interactive)
(let* ((remote-connection (file-remote-p filename))
(end (if remote-connection
(match-end
(string-match remote-connection filename)))))
(file-name-directory (substring filename end))))
(my-file-name-directory
;; "/ssh:ahab@localhost:~/"
"/tmp/org-mode/lisp/ox.el"
;; "~/delete-me"
)
#+end_src
- [X] Is =org-babel-import-elisp-from-file= necessary for results?
No, there are other ways. I was not able to find anything as simple
as "convert-results-to-table(txt)". Everything seems to require
gymnastics.
=org-babel-import-elisp-from-file= requires writing to disk simply
to convert a the results into a sexp.
For the following,
#+begin_src sh :results table
echo "hello"
echo "world"
#+end_src
the "results" look like,
#+begin_example
"hello
world
"
#+end_example
It's not clear to me if there's a function to take the results
string and convert it to a table. There are several functions that
kind of do something like that,
- =org-table-convert-region=
- =org-table-create-or-convert-from-region=
- the whole =org-babel-import-elisp-from-file= dance =ob-shell= does
The following works:
#+begin_src emacs-lisp :tangle no :eval never-export
(with-temp-buffer
(insert "hello\nworld\n")
(org-table-convert-region (point-min) (point-max))
(buffer-substring-no-properties (point-min) (point-max)))
#+end_src
Notice that it does the transformation in place and therefore
requires manually grabbing the result and returning it.
Maybe this should be a function?
Fun fact, there are two "create-table-or-convert-from-region"
functions in =org-table.el=:
#+begin_src emacs-lisp :eval never
(defun org-table-create-or-convert-from-region (arg)
"Convert region to table, or create an empty table.
If there is an active region, convert it to a table, using the function
`org-table-convert-region'. See the documentation of that function
to learn how the prefix argument is interpreted to determine the field
separator.
If there is no such region, create an empty table with `org-table-create'."
(interactive "P")
(if (org-region-active-p)
(org-table-convert-region (region-beginning) (region-end) arg)
(org-table-create arg)))
(defun orgtbl-create-or-convert-from-region (_arg)
"Create table or convert region to table, if no conflicting binding.
This installs the table binding `C-c |', but only if there is no
conflicting binding to this key outside `orgtbl-mode'."
(interactive "P")
(let* (orgtbl-mode (cmd (key-binding "\C-c|")))
(if cmd
(call-interactively cmd)
(call-interactively 'org-table-create-or-convert-from-region))))
#+end_src
It seems like this might be because there are different table types?
#+begin_quote
;;; Commentary:
;; Watch out: Here we are talking about two different kind of tables.
;; Most of the code is for the tables created with the Org mode table editor.
;; Sometimes, we talk about tables created and edited with the table.el
;; Emacs package. We call the former org-type tables, and the latter
;; table.el-type tables.
#+end_quote
It feels like this is typical for the Org code base. Functions take
weird arguments and work in ways that seem based simply on the
functions that were available, regardless of whether it made any
sense. They're not designed to be composable. So, you end up
needing to do a bunch of weird transformations and understand the
mechanics of how the underlying system works. Basically, there's no
abstraction.
My impression is that this makes maintenance harder. For example, I
want to focus on shell functionality. My brain is filled with the
Chapter 18 level of kernel driver books dealing with terminal I/O
and here I am having to also deal with table rendering when there's
no fundamental reason why it can't be a function that takes a
string. That makes me grumpy.
This makes it harder to get maintainers because maintaining one
section is not possible. You need a deep understanding of each part
of the code base.
*** [0/7] Refactor
- [ ] Get rid of this function
It makes no sense. The necessary variables (session, cmdline,
async, etc.) are randomly grabbed within this function and it's
caller. The caller, =org-babel-execute:shell= is the one needed by
the Org Babel API. Both functions do similar things, like prep for
the type of execution and preparing the results. There is no clear
reason for either to exist. It's not like one does the subprocess
call and the other assembles results. No! Both do a bit of that.
- [ ] Document the arguments
- [ ] Make the arguments consistent
For some reason the arguments are inconsistent. Some are provided
directly (session, stdin, and cmdline), others through the params.
Both of the direct ones come from params, yet are obtained in the
caller function, =org-babel-execute:shell=, and are only used to
call =org-babel-sh-evaluate=.
- [ ] Fix =value-is-exit-status=
This is part of the "Get rid of this function" argument.
Basically,
1. Don't split the functionality of getting the exit code between
two functions
2. Don't use only "echo $?" to get the exit status (what about
"unsupported" shells?)
=value-is-exit-status= is a toggle to return the exit code. It
works like this:
A block is executed and =org-babel-execute:shell= is called.
=value-is-exit-status= is determined. When it's true, "\necho $?"
is appended to the block body. Then =org-babel-sh-evaluate= is
executed. =value-is-exit-status= is again figured out. When it's
true, then the result is parsed for the last value (which is the
exit code).
Maybe I need to think more, but I don't see why this needs to be
split between two functions.
Also, the exit code is obtained using "\necho $?". I'm pretty sure
that's only valid for Bourne-like shells (I have not checked).
- [ ] Fix docstring for =org-babel-shell-results-defaults-to-output=
Says "Let shell execution defaults to ":results output"."
Should be something like
"Let shell execution default to ":results output".
or
"Set default execution to ":results output""
- [ ] Make the paths for stdin, cmdline, and shebang more clear
- [ ] Move functionality of =org-babel-sh-evaluate-varlist-return= elsewhere
The evaluation should happen separately from the formatting of the
results of the evaluation. Right now, the logic for formatting is
split between two functions (and that logic appears unnecessarily
complex).
** DONE defun org-babel-execute:shell
#+name: org-babel-execute:shell
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-execute:shell (body params)
"Execute Shell BODY according to PARAMS.
This function is called by `org-babel-execute-src-block'."
(let* ((session (org-babel-sh-initiate-session
(cdr (assq :session params))))
(stdin (let ((stdin (cdr (assq :stdin params))))
(when stdin (org-babel-sh-var-to-string
(org-babel-ref-resolve stdin)))))
(results-params (cdr (assq :result-params params)))
(value-is-exit-status
(or (and
(equal '("replace") results-params)
(not org-babel-shell-results-defaults-to-output))
(member "value" results-params)))
(cmdline (cdr (assq :cmdline params)))
(full-body (concat
(org-babel-expand-body:generic
body params (org-babel-variable-assignments:shell params))
(when value-is-exit-status "\necho $?"))))
(org-babel-reassemble-table
(org-babel-sh-evaluate session full-body params stdin cmdline)
(org-babel-pick-name
(cdr (assq :colname-names params)) (cdr (assq :colnames params)))
(org-babel-pick-name
(cdr (assq :rowname-names params)) (cdr (assq :rownames params))))))
#+end_src
The fundamental entry point for =ob-shell= when a source block is
evaluated.
When a user executes a block, the =ob-core= function
=org-babel-execute-src-block= is called. It dynamically binds
"org-babel-execute:lang" to a variable "cmd" where "lang" corresponds
to whatever follows "#+begin_src" in the block header ("lang"
corresponds to what we've been calling "template").
#+begin_src emacs-lisp :eval never :tangle no
;; org-babel-execute-src-block
...
d))))
(cmd (intern (concat "org-babel-execute:" lang))) ; dynamically bind symbol
result exec-start-time)
...
(setq exec-start-time (current-time)
result
(let ((r (save-current-buffer (funcall cmd body params)))) ; call org-babel-execute:lang
(if (and (eq (cdr (assq :result-type params)) 'value)
...
#+end_src
Recall that =org-babel-shell-initialize= generated functions called
"org-babel-execute:template", one for each of the shells given in
=org-babel-shell-names=. Each of these calls
=org-babel-execute:shell=.
BODY is the inside of the source block. PARAMS is an alist (see
below) of header arguments and other information.
This is function doesn't have a clear purpose. It just happens to be
the entry point.
It parses the block header and passes some of these to is the eventual
execution function =org-babel-sh-evaluate=. However, it also passes
the full params which is re-parsed out again.
It sort of handles assembling results. For example, it reassembles
the table. However, some of that is also done in
=org-babel-sh-evaluate=.
*** [1/3] Questions
- [ ] The =org-babel-execute:template= functions and
=org-babel-execute:shell= are all related to dispatch. How might we
dispatch differently and are any of the alternatives better?
- [X] Why does the docstring say "This function is called by
`org-babel-execute-src-block'."?
It seems to me that it says this because it's trying to explain how
the Org Babel API functions. Unfortunately, how the Org Babel API
functions is rather opaque.
- [ ] What is params?
Params is a data structure whose specific structure depends on the
context. PARAMS, as seen within =org-babel-execute:shell=, stores
information taken from the source block header as an [[info:elisp#Association Lists][alist]]:
#+begin_src emacs-lisp :eval never :tangle no
;; What's stored in the 'params' parameter:
((:colname-names)
(:rowname-names)
(:result-params "replace")
(:result-type . value)
(:results . "replace")
(:exports . "code")
(:session . "none")
(:cache . "no")
(:noweb . "no")
(:hlines . "no")
(:tangle . "no"))
;; Typical access pattern:
(cdr (assq :results
'((:colname-names)
(:rowname-names)
(:result-params "replace")
(:result-type . value)
(:results . "replace")
(:exports . "code")
(:session . "none")
(:cache . "no")
(:noweb . "no")
(:hlines . "no")
(:tangle . "no"))
)) ; "replace"
#+end_src
Note that PARAMS is defined differently elsewhere! Some of these
docstring descriptions, such as for
=org-babel-variable-assignments=, are probably incorrect in calling
it a plist.
According to =org-babel--get-vars=,
#+begin_quote
PARAMS is a quasi-alist of header args, which may contain multiple
entries for the key `:var'. This function returns a list of the cdr
of all the `:var' entries."
#+end_quote
According to =org-babel--file-desc=,
#+begin_quote
PARAMS is header argument values.
#+end_quote
According to =org-list--to-generic-plain-list=,
#+begin_quote
PARAMS is a plist used to tweak the behavior of the transcoder.
#+end_quote
According to =org-babel-tangle--unbracketed-link=,
#+begin_quote
The PARAMS are the 3rd element of the info for the same src block.
#+end_quote
According to =org-babel-comint-use-async=,
#+begin_quote
PARAMS are the header arguments as passed to
`org-babel-execute:lang'.
#+end_quote
According to =org-babel-variable-assignments:plantuml=,
#+begin_quote
PARAMS is a property list of source block parameters, which may
contain multiple entries for the key `:var'. `:var' entries in
PARAMS are expected to be scalar variables."
#+end_quote
*** [0/2] Refactor
- [ ] Remove the statement "This function is called by
`org-babel-execute-src-block'." from the docstring. Although a true
statement, it's also true that this function is called by all of the
=org-babel-execute:template= functions.
- [ ] Give this function a clear division of responsiblity
** DONE defun org-babel-shell-initialize
#+name: org-babel-shell-initialize
#+begin_src emacs-lisp :eval never :tangle no
(defun org-babel-shell-initialize ()
"Define execution functions associated to shell names.
This function has to be called whenever `org-babel-shell-names'
is modified outside the Customize interface."
(interactive)
(dolist (name org-babel-shell-names)
(let ((fname (intern (concat "org-babel-execute:" name))))
(defalias fname
(lambda (body params)
(:documentation
(format "Execute a block of %s commands with Babel." name))
(let ((shell-file-name name))
(org-babel-execute:shell body params))))
(put fname 'definition-name 'org-babel-shell-initialize))
(defalias (intern (concat "org-babel-variable-assignments:" name))
#'org-babel-variable-assignments:shell
(format "Return list of %s statements assigning to the block's \
variables."
name))
(funcall (if (fboundp 'defvar-1) #'defvar-1 #'set) ;Emacs-29
(intern (concat "org-babel-default-header-args:" name))
nil)))
#+end_src
Several different shells are supported.
Each needs the following functions:
1. An alias =org-babel-execute:template= for =org-babel-execute:shell=
2. An alias =org-babel-variable-assignments:name= for
=org-babel-variable-assignments:shell=
3. =org-babel-default-header-args:name=
These allow people to use any of the =org-babel-shell-names= as the
source block language. For example, the =ob-core= functionality that
runs the source block (=org-babel-execute-src-block=) expects a
"org-babel-execute:template" function.
The way it works is this. It iterates through the
=org-babel-shell-names= list. For each element, it defines a symbol
for =org-babel-execute:template= according to the current list item
(name). It makes this symbol an alias for a function that calls
=org-babel-execute:shell= such that =shell-file-name= is assigned to
the current list item. For example, when processing "fish", a call to
=org-babel-execute:fish= will call =org-babel-execute:shell= yet any
call to an inferior shell process will use "fish" rather than the
shell stored in the SHELL environment variable at start up. (It's
assumed that the command, such as "fish", is accessible through PATH).
A "put" call makes =org-babel-execute:template= accessible from the built
in help. It next creates an alias for
=org-babel-variable-assignments:name= corresponding to
=org-babel-variable-assignments:shell=. Finally, it creates a symbol
for =org-babel-default-header-args:name=, using a different function
(either =set= or =defvar-1=) depending on the Emacs version.
*** [1/1] Questions
- [X] Is there a better way to do this?
This sets off my alarms because it feels too clever.
- probably complex for someone coming from a non-lispy background,
creates a closure to modify =shell-file-name=, meta-programming
- creates (what feels like) unnecessary dependencies between
=org-babel-shell-names= and =org-babel-shell-initialize=
- hints at being an end-user function when it shouldn't be used that
way
- assumes that the things it's automating are all the same
tl;dr Make =org-babel-shell-initialize= take a list parameter,
rather than operating on =org-babel-shell-names= as a side effect.
It has immediate problems when learning the codebase, as written
about above for =org-babel-shell-names=. It's feels convoluted to
me. Understanding =org-babel-shell-names= requires knowing how
=org-babel-shell-initialize= works. =org-babel-initialize= forces
=org-babel-shell-names= to be defined twice.
It seems like this function is trying to provide end-users with an
easy way to update =ob-shell= with a new shell during run-time.
That's cool and all...but that should be the job of the =ob-shell=
maintainer. It's not simple supporting different shells, even if
they're similar. What benefit is there in giving end-users a
half-baked way to hang themselves?
Further, I doubt many end-users are using =ob-shell= with a custom
shell not supported. There's one exception: cmd.exe on Windows.
However, adding "cmd.exe" or "cmdproxy.exe" to
=org-babel-shell-names= doesn't really work well. Nor is it
necessary: you can just use "shell".
I think it's the automation part that concerns me most. It's the
fear of "What if there's an exception that doesn't fit into the
automation". My non-fear brain says, "Has that been a problem since
this was implemented? How hard would it be to change it when a
problem arises that requires such a change?" Leaving the
implementation as is costs nothing (but conceptual energy).
Re-working it will likely be easier later, if and when, a problem
appears since there will be a clear goal.
Part of me thinks it would be better to define
=org-babel-shell-names= once, populated with all the shells and then
to manually calling =org-babel-shell-initialize= to define the
execution functions, rather than forward declare
=org-babel-shell-names= and using the defcustom call to trigger the
initialization.
Maybe make =org-babel-shell-initialize= take a parameter which is a
list of shells to initialize? In this way, it would break the
dependency on =org-babel-shell-names=. Then we could declare
=org-babel-shell-names= once and have it call the initialize
function.
1. Make =org-babel-shell-initialize= take a parameter which a list
of shells to initialize
2. Define the list =org-babel-shell-names=
3. Set =org-babel-shell-names= to call =org-babel-shell-initialize=
when set.
+ Can defcustom refer to itself during the set call? Yes! It
seems so.
*** [0/3] Refactor
- [ ] Make =org-babel-shell-initialize= private. No one but the
maintainer should be running that.
- [ ] Document the interaction between =org-babel-shell-initialize=
and =org-babel-shell-names=
- [ ] Make =org-babel-shell-initialize= functional
1. Make =org-babel-shell-initialize= take a parameter which is a
list of shells to initialize
2. Define the list =org-babel-shell-names= using defcustom
3. Set =org-babel-shell-names= to call =org-babel-shell-initialize=
when set.
For example,
#+begin_src emacs-lisp :results none :tangle no
(defun my-initialize (initialization-list)
(interactive)
(dolist (name initialization-list)
(message "%s" (concat "my-automatically-created-symbol:" name))))
(defcustom my-name-list
'("banana" "Rama")
"List of names to be created by `my-initialize'"
:group 'my-test
:type '(repeat (string :tag "Name to create: "))
:set (lambda (symbol value)
(set-default-toplevel-value symbol value)
(my-initialize my-name-list)))
#+end_src
* ob-shell undocumented behaviors
:PROPERTIES:
#+TOC: headlines 2 local
:END:
** Problem: Undocumented functionality
Some of =ob-shell.el= is dedicated to undocumented behavior. The
=org-babel-load-in-session= function which is bound to the
=org-metaup-hook=, M-up. The source body will be inserted in a new
session process buffer.
For this is the code path (M-up):
- =ob-core:org-babel-load-in-session=
- =ob-shell:org-babel-prep-session:shell=
Now documented, yet a good example of this, is the =:dir= header.
There have been many questions on the mailing list regarding shell
blocks and ssh. The =:dir= header works with TRAMP to create a remote
connection. This works seamlessly with Babel. However, it was not
clearly documented (the only example showed an R code block and is
located in a separate part of the manual).
* Addendum
:PROPERTIES:
#+TOC: headlines 2 local
:END:
** History
In the early days of Babel, collaboration notes were kept. Eric makes
a distinction between (what he calls) "functional" and "imperative".
It's not clear to me what's meant by these terms. AFAICT,
"functional" means "non-persistent" and "imperative" means
"persistent". "persistent" means here that when you run a block
twice, you always get the same result with a "functional" block.
See:
- https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/contrib/babel/org-babel.org?id=c0554c775344207e6adaf191258312fb8d1b6a15
- https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/org-babel.org?id=b1c103890c1523f99e380d88ed684454d902414e
** Mailing list items
Discussion on refactoring all non-session calls to be run as scripts,
including bugs to fix
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00137.html
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00142.html
How Shell works at a deep level
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00242.html
Bug report and patches for :cmdline failings.
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00263.html
Handling ANSI characters
- https://list.orgmode.org/orgmode/87msvcgjgv.fsf@gmail.com/
Misc:
- [[https://list.orgmode.org/CKK9TULBP2BG.2UITT31YJV03J@laptop/T/#mc8e3ca2f5f1b9a94040a68b4c6201234b209041c][babel output seems to drop anything before % (in session)]].
- [[https://orgmode.org/list/CA+A2iZaziAfMeGpBqL6qGrzrWEVvLvC0DUw++T4gCF3NGuW-DQ@mail.gmail.com][:results value vs. :results output]]
* File Variables :noexport:
# Local Variables:
# eval: (add-hook 'before-save-hook 'time-stamp nil t)
# End: