Understanding ob-shell.el
org-babel
enables embedding executable source code within a
document. Babel differs from other "notebook" applications in that the
source code, execution parameters, and results have the same
representation as the rest of the document. It's all text. Delimiters
surround the source code and distinguish it from non-code text, as well
as provide options for execution. When run, source code is isolated
from the non-code text, executed in a subprocess, and some result
inserted into the document. Source code is a "first class citizen".
Code, prose, and results are interchangeable.
ob-shell.el
defines functions and variables that manage the
execution of several shell languages.
It appears that a high level consideration of shell evaluation hasn't happened in at least a decade. Instead, the library has progressed by piecemeal fixes. This has resulted in local optimizations. Abstractions and interfaces are not always cohesive. I suspect this is a source of bugs and that it deters contributions. There have been several recent calls for maintainers. My hope is that this document and the changes it inspires will encourage new people to join the project as well as make Org more reliable.
Updating ob-shell.el
is no trivial matter. It's a classic case of
Chesterton's fence,
"In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, 'I don't see the use of this; let us clear it away.' To which the more intelligent type of reformer will do well to answer: 'If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it."
This document is a record of "going away and thinking." It presents an understanding of why things are the way they currently are, where "current" means org-mode git commit 9183e3c72. Each section of code is analyzed, questions are asked, and recommendations given. See [BROKEN LINK: * Validation] within the source file for how to compare a checked out version of the code to the code reviewed in this document.
Source file: https://excalamus.com/understanding-ob-shell-raw.html
Table of Contents
Topics
Make asynchronous execution the default
Why isn't async the default?
As far as I can tell, it's because Carsten used
shell-command-on-region
in the original ob-eval.el
.
(defun org-eval-run (cmd code) (with-temp-buffer (insert code) (shell-command-on-region (point-min) (point-max) cmd nil 'replace) (buffer-string)))
In the early days of Babel, collaboration notes were kept. This
started as rorg.org
and was later renamed to org-babel.org
.
The rorg.org in commit e770100 mentions development for an
ob-eval-light
on March 31, 2009:
org-eval was written by Carsten. It lives in the org/contrib/lisp directory because it is too dangerous to include in the base. Unlike org-eval-light org-eval evaluates all source blocks in an org-file when the file is first opened, which could be a security nightmare for example if someone emailed you a pernicious file.
All evaluations at this time happen with shell-command-on-region
:
- org-eval-run May 6, 2008 commit 62fc2c1
- org-eval-light-run March 31, 2009 commit e7701000
- all the "execute:lang"-type functions (see How blocks execute to obtain results)
The last known update to rorg.org
(later renamed to org-babel.org
) 2d2a2a19
says,
,*** TODO evaluation of shell code as background process? After C-c C-c on an R code block, the process may appear to block, but C-g can be used to reclaim control of the .org buffer, without interrupting the R evalution. However I believe this is not true of bash/sh evaluation. [Haven't tried other languages] Perhaps a solution is just to background the individual shell commands.
The other languages (aside from emacs lisp) are run through the shell, so if we find a shell solution it should work for them as well.
Adding an ampersand seems to be a supported way to run commands in the background (see external-commands). Although a more extensible solution may involve the use of the call-process-region function.
Going to try this out in a new file org-babel-proc.el. This should contain functions for asynchronously running generic shell commands in the background, and then returning their input.
I can find no record of org-babel-proc.el
:
cd /tmp/org-mode && pwd && git log --all -- '*org-babel-proc.el*' cd /tmp/org-mode && pwd && git log --all --full-history -- org-babel-proc.el cd /tmp/org-mode && pwd && git log --all --full-history -- **/org-babel-proc.* cd /tmp/org-mode && pwd && git log --all --full-history -- *org-babel-proc* cd /tmp/org-mode && pwd && git log --all --full-history -- org-babel-proc*
/tmp/org-mode /tmp/org-mode /tmp/org-mode /tmp/org-mode /tmp/org-mode
org-babel.org (bb2969ef) (quoted above) suggests "a more extensible
solution may involve the use of the call-process-region
function".
The thought was sap that "adding an ampersand seems to be a supported
way to run commands in the background." This appears based on
information from the EmacsWiki.
Unfortunately, call-process-region
is synchronous and the suggestion
is appears ill-informed (although that may simply be the fog of time).
See How many different ways are there to execute a shell block?
Today, process-file
is used instead of call-process-on-region
.
Yet process-file
is a wrapper around call-process
and is still
synchronous.
I think an investigation was made, no clear answer or path figured out, and focus shifted elsewhere. After the initial development push, async has come up a few times on the mailing list:
- (February 29, 2012) babel, executing code in background process
- (September 4, 2013) Re: asynchronous code evaluation
- (November 18, 2015) long running processes
- (November 20, 2015) asynchronous python code blocks in org-mode
- (April 6, 2016) [PATCH] new :async feature for org-babel-clojure
- (July 2, 2019) Asynchronous session evaluation
- (March 20, 2023) ob-async
- (October 25, 2020) [PATCH] Async session eval (2nd attempt)
- (September 26, 2021) [PATCH] async process in R
- (November 22, 2021) [PATCH] ob-shell-test, test-ob-shell and introduction
ob-shell (or org-babel) as a terminal emulator
Ultimately, I would like ob-shell to be more robust and have more functionality. It's limited presently by ob-comint.el and ob-eval.el.
Code cleanup
CANCELED Untangle asynchronous execution and sessions
This is an ob-comint/ob-eval issue.
Async depends on make-process
not on comints. Currently, sessions
are the only way to run something asynchronously. That doesn't make
sense for some languages. That doesn't mean they can't work
asynchronously.
CANCELED Standardize the org-babel interface
This is an ob-comint/ob-eval/ob-core issue.
The only documentation is ob-template.el
.
org-babel notes
Org Babel started as a package called "litorgy" around February 22, 2009. Support for shells, litorgy-shell.el, was added in commit e7701000 on March 31, 2009. The current file lisp/ob-shell.el was created in commit 583e7ab1 on December 13, 2013.
An incomplete history of ob-shell is here:
Date | Commit | Action |
---|---|---|
March 31, 2009 | e7701000 | litorgy-shell.el created |
May 24, 2009 | 2b9b2ee7 | renamed to org-babel/org-babel-shell.el |
May 24, 2009 | e82fb256 | (parent) |
May 24, 2009 | 8e9d3a51 | (parent) |
May 24, 2009 | 9247200b | renamed to lisp/org-babel-shell.el |
June 12, 2009 | b64be813 | renamed to lisp/langs/org-babel-shell.el |
June 14, 2009 | 4fd1abec | renamed to lisp/langs/org-babel-sh.el |
June 17, 2010 | e0e4d760 | renamed to lisp/babel/langs/ob-sh.el |
June 17, 2010 | 04152d3a | renamed to lisp/babel/langs/ob-sh.el (from contrib?) |
December 13, 2013 | 583e7ab1 | renamed from lisp/ob-sh.el to lisp/ob-shell.el |
Difference between a "shell" and a "terminal"
On-hold until I can find the brain power to dig into kernel drivers and terminal I/O more.
The following is Linux centric because that's what information is available. Other resources included MINIX. All of these are inspired by UNIX. Maybe it's more accurate to say it leans POSIX.
A "shell" is a command driven process launcher. It reads input, evaluates it, prints results, and loops back to reading input. The following demonstrates how a simple shell operates. It's easy to imagine the example extended to receive input in batch. Such batch input to a shell is a "script."
/* * https://stevens.netmeister.org/631/ */ #include <sys/types.h> #include <sys/wait.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sysexits.h> #include <unistd.h> char * getinput (char *buffer, size_t buflen) { printf ("$$ "); return fgets (buffer, buflen, stdin); } int main (int argc, char **argv) { char buf[BUFSIZ]; pid_t pid; int status; /* cast to void to silence compiler warnings */ (void) argc; (void) argv; while (getinput (buf, sizeof (buf))) { buf[strlen (buf) - 1] = '\0'; if ((pid = fork ()) == -1) { fprintf (stderr, "shell: can't fork: %s\n", strerror (errno)); continue; } else if (pid == 0) { /* child */ execlp (buf, buf, (char *) 0); fprintf (stderr, "shell: couldn't exec %s: %s\n", buf, strerror (errno)); exit (EX_UNAVAILABLE); } /* parent waits */ if ((pid = waitpid (pid, &status, 0)) < 0) { fprintf (stderr, "shell: waitpid error: %s\n", strerror (errno)); } } exit (EX_OK); }
POSIX defines a shell as:
A program that interprets sequences of text input as commands. It may operate on an input stream or it may interactively prompt and read commands from a terminal.
– https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_347
A "terminal" is an interface to a shell. A shell simply "interprets sequences of text" for the purpose of starting new processes. A terminal manages the sequences of text sent and received from a shell. The terms "console" and "command-line" are also used.
Historically, a terminal was a physical device. Early computers repurposed Teletypes, electrical mechanical devices for sending and receiving telegrams, for input and output. As computers developed, so did terminals. Gradually, terminals went from recycled devices resembling typewriters to simple computers with memory, screens, and basic computational ability. Still, the primary purpose of a terminal is preparing text for a shell and outputting text from the shell to the user.
Terminals prepare text in one of two ways: character-wise or line-wise. For example, if the user inputs "dsta", realizes their mistake, presses backspace three times, and then inserts "ata" before pressing enter, the shell receives 11 characters. If the terminal instead handled this as a line, then only the 5 characters are necessary. Which approach to use depends on the hardware and requirements. Yet even within this simple example is hidden complexity. What constitutes the end of a line? A carriage return? A line feed? Both? If both, must the user type both? This and many other tasks like echoes, signal handling, jobs control, and special characters processing are managed by the terminal. Such tasks constitute what's called "line discipline."
Terminals act as an interface in other ways. The POSIX standard has this gem of a definition for "terminal,"
A character special file that obeys the specifications of the general terminal interface.
– https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_401
A "character special file" is a file that refers to a device (such as a terminal device file) or that has special properties (such as /dev/null).
– https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html
So, according to POSIX, a terminal is a file that refers to a device such as a terminal. What that means is a terminal a file, as well as a physical device. Unfortunately, several "slights of hand" happen with the words "file", "device", and "terminal."
POSIX is a standardization of operating systems based on UNIX. If modular design is "the UNIX philosophy", then UNIX's motto must be "everything is a file." More accurately, "everything has a file descriptor." This means that (most) objects in the system have a numeric reference like 0, 1, or 2, which allow them to work with the system calls "open", "close", "read", and "write" as a stream of bytes, just like a file. Saying something is a "file" means it has the same interface as a file regardless of whether it's an electronic document, a keyboard, a mouse, or a terminal.
Any device we attach to our computer must communicate with the system. Data must flow from the device into our system and the system may send data back. Processes related to this I/O are often called "device drivers." Typically, there is one process, or driver, per device. UNIX-like operating systems drivers are represented by special files. The fact they're not typical files like a document makes them special; they're an interface. So, "device", "driver", and "file" often refer to the same thing: the interface the operating system presents for a connected device.
We've already seen that a file is another way to refer to a stream of bytes. Different devices group data within a stream in different ways. Some devices, such as a keyboard, operate one byte at a time. Other devices, like hard disks, use blocks of data. Since one byte corresponds to a character, such devices are called "character devices." Connecting our terms, "a character special file" is another way to say, "a driver associated with a device that sends and receives one byte at a time." Generally, devices which send and receive one byte at a time are called "serial" devices.
Physical terminals were some of the first devices ever connected to a computer. Indeed, the famous picture of Ken Thompson and Dennis Richie in front of a PDP-11 shows Ken at a Teletype Model 33 ASR teleprinter (probably interacting with UNIX). It's no surprise then that the device file used to represent a Teletype in UNIX is named "tty," a common abbreviation for (T)ele(ty)pes. TTY files have come to mean any character device.
In Linux, processes are grouped into sets of "sessions" in order to facilitate process management. Every session is tied to a terminal, called the "controlling terminal," from which processes in the session get their input and to which they send their output. A TTY device file corresponds to any device that could act as a controlling terminal. These are either serial devices or virtual consoles (full-screen terminal displays on the system video monitor). There are various naming conventions for these devices/files. However, they always correspond to a serial device or virtual console.
Sometimes it's desirable to enforce a line discipline without a physical device. For example, we may want to automate shell interaction using a program or we may want to connect Emacs to a shell. Linux provides a generalization of TTY drivers called pseudoterminals, abbreviated by PTY.
The man page for "pty" says it best,
A pseudoterminal (sometimes abbreviated "pty") is a pair of virtual character devices that provide a bidirectional communication channel. One end of the channel is called the master; the other end is called the slave.
The slave end of the pseudoterminal provides an interface that behaves exactly like a classical terminal. A process that expects to be connected to a terminal, can open the slave end of a pseudoterminal and then be driven by a program that has opened the master end. Anything that is written on the master end is provided to the process on the slave end as though it was input typed on a terminal. For example, writing the interrupt character (usually control-C) to the master device would cause an interrupt signal (SIGINT) to be generated for the foreground process group that is connected to the slave. Conversely, anything that is written to the slave end of the pseudoterminal can be read by the process that is connected to the master end.
Data flow between master and slave is handled asynchronously, much like data flow with a physical terminal. Data written to the slave will be available at the master promptly, but may not be available immediately. Similarly, there may be a small processing delay between a write to the master, and the effect being visible at the slave.
Regurgitating some information, on a PTY, the master end sends it's input to the slave's output:
PTY +--------+-------+ input | | | output -------->==============+-----> | | | e.g. emacs | master | slave | e.g. shell | | | <-----+==============<-------- output | | | input +--------+-------+
I don't actually know what this diagram means.
For C examples of working with a PTY, see:
- https://www.rkoucha.fr/tech_corner/pty_pdip.html
- The Linux Programming Interface, Kerrisk - Chapter 64
rkoucha's example is good yet it's interactive and not easily embedded within an org document. I could not automate it and must return to it.
PTYs are compared to bidirectional pipes. Here's an example of implementing bidirectional pipes (AFAIU) between two processes.
/* gcc -o pipe3-example pipe3-example.c && ./pipe3-example This example creates a "bidirectional" pipe. A pipe enables unidirectional data channel between processes. Using two pipes allows two processes to each send and receive data. The pipe() Linux system command (man 2 pipe) says, "The array pipefd is used to return two file descriptors referring to the ends of the pipe. pipefd[0] refers to the read end of the pipe. pipefd[1] refers to the write end of the pipe. Data written to the write end of the pipe is buffered by the kernel until it is read from the read end of the pipe." fd[0] fd[1] read --------> write in out Reading happens before writing. Something is put in the pipe before it is taken out. So, given two pipes, A and B, and two processes (c)hild and (p)arent, A cread a[0]<------->a[1] pwrite pread b[0]<------->b[1] cwrite B The child reads from a[0] after the parent writes to a[1]. The parent reads from b[0] after the child writes to b[1]. */ #include <unistd.h> #include <stdlib.h> #include <stdio.h> #include <string.h> #include <errno.h> int main () { int pipe_A[2]; int pipe_B[2]; ssize_t size_parent_read; ssize_t size_child_read; char parent_buffer[100]; memset (parent_buffer, '\0', 100); char child_buffer[100]; memset (child_buffer, '\0', 100); if (pipe (pipe_A) == 0 && pipe (pipe_B) == 0) { pid_t pid = fork (); if (pid == (pid_t) - 1) { fprintf (stderr, "Fork failed"); exit (EXIT_FAILURE); } else if (pid > (pid_t) 0) { // Parent printf ("Parent writing...\n"); write (pipe_A[1], "hello", sizeof("hello")); size_parent_read = read (pipe_B[0], parent_buffer, 99); printf ("Parent received: '%s'\n", parent_buffer); } else if (pid == (pid_t) 0) { // Child size_child_read = read (pipe_A[0], child_buffer, 99); printf ("Child received: '%s'\n", child_buffer); printf ("Child writing...\n"); write (pipe_B[1], "world", sizeof("world")); } } exit (EXIT_SUCCESS); }
- https://unix.stackexchange.com/a/4132
- https://tldp.org/LDP/lpg/node10.html#SECTION00721000000000000000
- https://www.youtube.com/watch?v=07Q9oqNLXB4&pp=ygUZYnJpYW4gd2lsbCB0ZXJtaW5hbHMgdW5peA%3D%3D
- Operating Systems Design and Implementation, Tanenbaum 1997
- http://www.haifux.org/lectures/86-sil/kernel-modules-drivers/node10.html
- https://web.archive.org/web/20231230182026/https://raw.githubusercontent.com/torvalds/linux/master/Documentation/admin-guide/devices.rst
- https://web.archive.org/web/20231230204511/https://raw.githubusercontent.com/torvalds/linux/master/Documentation/admin-guide/devices.txt
- https://web.archive.org/web/20231208020059/https://www.informit.com/articles/article.aspx?p=397655&seqNum=6
- https://www.rkoucha.fr/tech_corner/pty_pdip.html
How blocks execute to obtain results
Shell blocks have two basic execution models depending on whether it's a session or not (that is, a persistent environment or a temporary environment). In each case, a subprocess call is made to the corresponding shell program.
For non-sessions, the final lisp call is to call-process
(via
process-file
). This runs the shell in a synchronous process. Stdin
and stderr are output to a buffer when the process completes. The
output is scraped, cleaned, and returned for the block results.
For sessions, the final lisp call is to process-send-string
(via
comint-send-string
). When the first session block is run, a process
buffer is created to collect output from a synchronous process (see
Output from Processes). Block source is sent to the buffer process
along with command to echo delimiters. Output is filtered until the
ending delimiter appears. When the ending delimiter is found, output
is scraped, cleaned, and returned for the block results.
NOTE Sessions are currently entangled with async! See "session" and "non-session" vs. "synchronous" and "asynchronous".
header | final lisp call |
---|---|
shebang | call-process |
cmdline | call-process |
stdin | call-process |
session | process-send-string |
async | process-send-string |
default | call-process |
See https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00242.html
"session" and "non-session" vs. "synchronous" and "asynchronous"
"Session" means a shell environment is "persistent." Each call is executed in the same environment. State exists between calls. In the early-history of Babel, this was called the "imperative" style.
"Non-session" means a shell environment is "temporary." Each call is executed in an independent environment. State does not exist between calls. In the early-history of Babel, this was called the "functional" style.
"synchronous" means that execution prevents the user from editing the document while results are obtained.
"asynchronous" means that execution does not prevent the user from editing the document while results are obtained.
These concepts are conflated (or entangled) in org-babel.
- Why can't non-session blocks run asynchronously?
- Why not make asynchronous the default?
Framework versus implementation
Currently, ob-shell and org-babel provide a set of functions and variables that implement subprocess calls. org-babel, along with ob-template, imply a framework for creating new integrations with different applications. However, in practice, are those functions and variables set up to be composed? Not really.
Asynchronous execution was introduced by ob-python and adopted by ob-shell. Why can't ob-C execute asynchronously? Because the implementation is not set up as a framework.
The following hangs Emacs:
while (1) {}
How many different ways are there to execute a shell block?
Executing a shell block requires starting a process.
Processes are synchronous or asynchronous.
Three primitives exist in Emacs for making processes:
- make-process (asynchronous)
- call-process (synchronous)
- call-process-region (synchronous)
There exist several convenience wrappers for these.
Output from a process typically goes to a buffer. This may be
changed and instead handled by a filter. call-process
has an
option to directly send output to a file.
Subprocesses inherent the default-directory
and the environment
from Emacs. The environment may be changed using
process-environment
.
There are two types of asynchronous connections: "pty"
("pseudoterminal") and "pipe". The main difference is that "pty"
provides a terminal-like connection which allows for things like
job control (C-c
, C-z
, etc.). See Difference between a "shell" and a
"terminal".
Shells have two modes. A third mode may exist that blends these.
- (required) command pipeline
- (required) shell scripting
- (optional) shebangs (scripts executed as commands)
Shebangs are not available for all shells (such as cmd.exe)
Files paths have two forms:
- absolute
- relative ("expanded" (converted to absolute) using
exec-path
)
File paths may convert to absolute paths (be "expanded"). In
Emacs, this happens using exec-path
. exec-path
is initialized
using PATH and serves a similar role to PATH within Emacs.
Files may be local or remote.
Executing a block can happen by:
- via pty or pipe
- passing it to the shell as an argument (e.g.
bash -c
) - putting the source in a file and running it (e.g.
bash my-file.sh
)
Commands may execute in the current environment or in a child environment.
- source environment
- subshell
Processes have several return values:
- success or failure of the process
- stdout of the process
- stderr of the process
org buffer +-------+ process buffer pty shell process | | +------+ +------+ +------+ | | | | | | | | | | <---> | | <--> | | <--> | | | | | | | | | | | | +------+ +------+ +------+ +-------+ org buffer +-------+ shell process | | +------+ | | pipe | | | | <--------> | | | | | | | | +------+ +-------+
[1/4]
org-babel notes: Questions
[ ]
How might viewing ob-shell as a terminal change our approach to its design?
According to definitions of "shell" and "terminal", we can think of ob-shell as an asynchronous terminal emulator.
Since editing happens with Emacs, it operates in "line mode". All edits happen before being sent over the wire.
Challenges include job control and signal handling.
If we could reliably identify source blocks (or a part of them), we could adjust the commands that work at that moment for that connection.
- identification (which shell are we communicating with?)
That is, the issues of job control and signal handling are issues of shell connection identification. Currently, synchronous blocks have a one-to-one association with the shell. Asynchronous blocks are associated by the ":session" header for input and the UUID for output.
Another challenge is interactivity.
- How to handle "read"?
- Can we update results asynchronously?
[ ]
Why does the pipe3-example returns results in a different order than when run in xfce4-terminal?[ ]
Are there differences between command-line versus script?
It depends on the shell.
Consider cmd.exe:
- The command-line uses "%A" whereas scripts use "%%A"
- https://web.archive.org/web/20231127170136/https://ss64.com/nt/syntax-percent.html
Considering Bash:
- Interactive shells run startup and log-off scripts
- There exist different options that may change the behavior of one the other. The manual usually mentions options that make an interactive shell exit.
- "Aliases are not expanded when the shell is not interactive, unless the expandaliases shell option is set using shopt"
- There are differences in signal handling.
- "When executing interactively, bash displays the primary prompt PS1 when it is ready to read a command, and the secondary prompt PS2 when it needs more input to complete a command."
- "autocd If set, a command name that is the name of a directory is executed as if it were the argument to the cd command. This option is only used by interactive shells."
- "cdspell If set, minor errors in the spelling of a directory component in a cd command will be corrected. The errors checked for are transposed characters, a missing charac‐ ter, and one character too many. If a correction is found, the corrected filename is printed, and the command proceeds. This option is only used by interactive shells."
- https://tldp.org/LDP/abs/html/intandnonint.html
[X]
How is shell.el different from term.el
Both are terminal emulators
shell operates in line mode
term has the ability to also operate in character mode
shell appears to have been created first.
term.el uses shell's "explicit-shell-file-name" variable.
shell-mode and comint-mode were created side-by-side and initialized in the same commit be9b65ac336e5ff60f39b55beb8f63e6fd7c65f7.
comint.el Mon Nov 5 10:06:02 1990 (David Lawrence) shell.el Mon Nov 5 10:06:02 1990 (David Lawrence) term.el Thu Oct 13 06:30:49 1994 (RMS)
ob-shell notes
[8/9]
General obshell: Questions
[X]
How is the word "shell" used within the
ob-shell
API?"shell" is the Babel package name. Babel packages have the form
ob-<language>
. However, "<language>" isn't always one-to-one. For example,ob-C
defines functionality for D and C++.#+begin_src shell echo "hi" #+end_src
It's confusing because it's also possible to call "sh". However, "sh" is a specific shell application, the Bourne shell! Unfortunately, "sh" is often linked to bash which confuses this point.
The "shell" word within the
ob-shell
source code is used as a common form in which specific shell APIs are based. For example,org-babel-shell-initialize
creates functions of the form "org-babel-execute:template" where "template" is a specific shell. The specific execute functions call the genericorg-babel-execute:shell
function.[X]
Should
ob-shell
andob-eshell
be separate packages?In some sense they represent two different execution paths. Personally, I think so. However, this will require effort.
[X]
What is the execution path for inline versus non-inline blocks?
Ultimately,
org-babel-execute-src-block
is the dispatcher for the appropriateorg-babel-execute:template
function.These functions call it:
org-babel-execute-buffer
org-ctrl-c-ctrl-c
org-babel-exp-results
org-babel-lob-execute-maybe
[X]
What is "posh"?
See https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/
[X]
Would it make sense for sessions to have the process buffer exist without a prompt? Would this fix problems we see of prompts being returned in the results?
This is an issue for ob-comint.
Such a change would require updating
org-babel-comint-with-output
.See ob-blub.
[X]
What is meant by "support"?
Not all functionality is consistent nor available for some shells. This is not necessarily because of technical limitations of the shell. The biggest culprit is probably Powershell. Obviously, the software comes with no warranty or guarantee.
However, I think it would be nice to provide as consistent behavior as possible. So, I would say "support" means "we'll try to make things consistent but no guarantee :)"
[X]
Is there a better way to handle the "variable quoting"
- Convert to string
- Quote
- Use quoted version to define a shell variable
This seems reasonable.
Looking at this months after it was written, I don't know what it's referring to. Marking as done.
[X]
How do (or even do) :hlines and :sep or :separator work with ob-shell?
These are strewn throughout the code base. However, none of them are documented, nor is their use clear. They should be fully implemented and documented or removed.
There is no mention of the :separator keyword anywhere in the manual. There is mention of a :sep keyword in the documentation for Texinfo export. The string ":sep" appears twice in the ob-shell.el source:
2 matches for ":sep" in buffer: ob-shell.el 211: (let ((sep (cdr (assq :separator params))) 241: (orgtbl-to-generic var (list :sep (or sep "\t") :fmt echo-var
These appear to perform the same role: specify how to separate items in a table. However, the syntax is different and (as we shall see) the functionality not quite implemented.
:hlines twice in in the manual. First, with respect to columnview blocks and again with Python blocks. columnview blocks are not source blocks, so while the concept is probably similar, it's not clear how it translates (if at all) to source blocks. Regarding, Python, the manual says,
In-between each table row or below the table headings, sometimes results have horizontal lines, which are also known as "hlines". The 'hlines' argument with the default 'no' value strips such lines from the input table. For most code, this is desirable, or else those 'hline' symbols raise unbound variable errors. A 'yes' accepts such lines, as demonstrated in the following example.
Adapting the (only) example from the info manual, I can sort of get an answer:
#+NAME: many-cols | a | b | c | |---+---+---| | d | e | f | |---+---+---| | g | h | i | #+BEGIN_SRC sh :var tab=many-cols :hlines yes echo $tab #+END_SRC #+RESULTS: : a b c hline d e f hline g h i #+BEGIN_SRC sh :var tab=many-cols :hlines yes echo "$tab" #+END_SRC #+RESULTS: | a | b | c | | hline | | | | d | e | f | | hline | | | | g | h | i |
Looking at the source code and playing around with it more, here is an example using :separator and :hlines. When :hlines is "yes", it requires the use of :hline-string to specify what the hline looks like, otherwise it defaults to "hline" (as we saw above).
#+NAME: many-cols | a | b | c | |---+---+---| | d | e | f | |---+---+---| | g | h | i | #+BEGIN_SRC sh :var tab=many-cols :separator -- :hline-string ++ :hlines yes echo "$tab" #+END_SRC #+RESULTS: | a--b--c | | ++ | | d--e--f | | ++ | | g--h--i |
If the :separator is a pipe, then the result is the usual table:
#+NAME: many-cols | a | b | c | |---+---+---| | d | e | f | |---+---+---| | g | h | i | #+BEGIN_SRC sh :var tab=many-cols :separator | :hline-string ++ :hlines yes echo "$tab" #+END_SRC #+RESULTS: | a | b | c | | ++ | | | | d | e | f | | ++ | | | | g | h | i |
So, we see that the :hlines and :separator keywords produce some result. However, result leaves a lot to desire and it appears like the functionality is not fully implemented. The results are not meaningful and :hlines requires an undocumented header argument :hline-string.
[ ]
Do people use :separator outside of Texinfo exports?
[2/8]
General ob-shell: Refactor
[ ]
Make all symbols follow the convention of
org-babel-X
where X is the specific language. "shell" is the generic name whereas something like "bash" or "sh" refers to a specific shell implementation.There are three names used:
- org-babel-shell
- org-babel-sh
- ob-shell
This module was originally called
ob-sh
and later changed toob-shell
. When this happened, the meaning of thesh
changed. Before, it meant a generic shell. "shell" now means a generic "shell" changing the meaning of "sh". "sh" refers now to a specific shell (/bin/sh
) since that's the call that's actually made when executing. Are "sh" functions specific to the "sh" shell or are they generic?It looks like when I submitted my async changes, I kept the
ob-shell
prefix. My preference is forob-shell
since to matches the filename. However, it shouldn't be there if it's not consistent with the rest of the file (or babel system). The problem with introducingob-shell
as a prefix is that none of the other babel files use that convention.It looks like
ob-C
usesorg-babel-X
where X is the language. This is the only other Babel module to support multiple languages in a single library.See https://list.orgmode.org/87ttndl8wk.fsf@localhost/T/#t
[ ]
non-private functions and variables need alias or made obsolete[ ]
can change variable names in main yet not released[ ]
API functions- org-babel-default-header-args:<lang>
- org-babel-execute:<lang>
- org-babel-expand-body:<lang>
- org-babel-variable-assignments:<lang>
- org-babel-header-args:<lang>
- org-babel-load-session:<lang>
- org-babel-<lang>-initiate-session
- org-babel-prep-session:<lang>
- org-babel-edit-prep:<lang>
- org-babel-<lang>-associate-session
Org needs to work for Emacs 29, 28, and 27.
Third-party code that references public symbols in the current stable release needs to continue working for the next release. The subsequent release may break. Child must be stable. Grand-child can break.
"Some Org components also depend on third-party packages available through package archives. Org is only guaranteed to be compatible with the latest stable versions of these third-party packages." https://orgmode.org/worg/org-maintenance.html#emacs-compatibility
How to handle strict renaming?
Problem: If the new release doesn't contain old names, then third-party code that references the old name will break when the Org version is updated.
Solution: Make a 1:1 mapping from new name to old name
Emacs Org 29.1 9.6.6 28.2 9.5.5 27.2 9.4.4 How to handle change in API, functionality, or use?
Problem: If the name is the same yet the arguments
Seems like aliases are placed in
org-compat.el
.define-obsolete-function-alias
define-obsolete-variable-alias
make-obsolete
Consider
org-babel-sh-eoe-indicator
. The new name would beorg-babel-shell-eoe-indicator
.;;; need to make alias/obsolete (defvar org-babel-sh-eoe-indicator "echo 'org_babel_sh_eoe'" (defvar org-babel-sh-eoe-output "org_babel_sh_eoe" (defvar org-babel-sh-prompt "org_babel_sh_prompt> " (defun org-babel-sh-var-to-sh (var &optional sep hline) (defun org-babel-sh-var-to-string (var &optional sep hline) (defun org-babel-sh-initiate-session (&optional session _params) (defun org-babel-sh-evaluate (session body &optional params stdin cmdline) (defun org-babel-sh-strip-weird-long-prompt (string) ;;; Change ;; private (defun org-babel--variable-assignments:sh-generic (defun org-babel--variable-assignments:fish (defun org-babel--variable-assignments:bash_array (defun org-babel--variable-assignments:bash_assoc (defun org-babel--variable-assignments:bash (varname values &optional sep hline) ;; not released in 9.6.17 ;; make private? (defconst ob-shell-async-indicator "echo 'ob_comint_async_shell_%s_%s'" (defun ob-shell-async-chunk-callback (string) ;;; do not touch! ;; Babel API (defvar org-babel-default-header-args:shell '()) (defun org-babel-variable-assignments:shell (params) (defun org-babel-execute:shell (body params) (defun org-babel-prep-session:shell (session params) (defun org-babel-load-session:shell (session body params) ;; Already fits convention (defconst org-babel-shell-set-prompt-commands (defcustom org-babel-shell-names (defcustom org-babel-shell-results-defaults-to-output t (defvar org-babel-shell-names) (defun org-babel-shell-initialize ()
(define-obsolete-variable-alias 'org-babel-sh-eoe-indicator 'org-babel-shell--end-of-output-indicator "9.7") (define-obsolete-variable-alias 'org-babel-sh-eoe-output 'org-babel-shell--end-of-output "9.7") (define-obsolete-variable-alias 'org-babel-sh-prompt 'org-babel-shell--prompt "9.7")
[X]
Add file local variable for nameless
-- nameless-current-name: "org-babel-shell"; --
This should be handled by a .dirlocals. It's not proper to have third party code referenced in the mainline repo. The .dirlocals should also be local and not part of the repo.
[ ]
Fixed mixed tabs/spaces and remove trailing whitespace
Both issues make preparing and reviewing commits needlessly complex. Executive decision: use spaces and remove trailing whitespace. We're not doing embedded work, so there's no point to space saving with tabs. It should be one or the other. Since the rest of the code is indented by spaces and the gnu standard is two spaces, I think we should enforce only spaces.
[ ]
Rearrange definitions. The order of definitions doesn't assist the process of learning what this library does. Aside from the entangled
org-babel-shell-initialize
/org-babel-shell-names
, order doesn't matter for execution. So, we should optimize the order for understanding.Helper terms defined after usage Organization of the
ob-shell
hinders understanding. "Helper functions" are defined at the end of the file. This makes reading the code difficult; each function is defined in terms only introduced later on.Solution: Reorder function declarations
- org-babel-sh-eoe-output
- org-babel-sh-eoe-indicator
- org-babel-shell-names
- org-babel-default-header-args:shell
- org-babel-shell-results-defaults-to-output
- org-babel-sh-initiate-session
- org-babel-sh-var-to-string
- org-babel-sh-var-to-sh
- org-babel–variable-assignments:sh-generic
- org-babel–variable-assignments:basharray
- org-babel–variable-assignments:bashassoc
- org-babel–variable-assignments:bash
- org-babel–variable-assignments:fish
- org-babel-variable-assignments:shell
- org-babel-prep-session:shell
- org-babel-load-session:shell
- org-babel-sh-strip-weird-long-prompt
- org-babel-sh-evaluate
- org-babel-execute:shell
- org-babel-shell-initialize
[ ]
Remove needless helper functions
The way the code is split up doesn't seem to really warrant being split up. The subdivisions are too small. Most of the helper functions are only used once. If that's the case, is it really helpful to break them out separately? My opinion is no. That might be the case if the function names and docstrings helped understanding the use and context. In my opinion, they don't. If they were instead not broken out and appeared in the context they are used, the context would be clear. The role of the function names could be satisfied by comments. This may be worth refactoring, considering it took me several hours to make sense of things.
[ ]
Implement a logical separation of concerns
Aside from the useless helper functions, it's not clear why functions like
org-babel-execute:shell
andorg-babel-sh-evaluate
do exactly what they do. There's initiating the process, getting the inputs, dispatching, and then providing output. All of this appears to split, ad hoc, between (at least) these two functions. Review and consider a different break down of functionality. See the individual functions for details on how this might be done.[X]
Fix confusion about what "posh" is[ ]
Remove :sep, :separator, and :hlines code
First, they are not documented. I doubt they are, or ever have been, used.
Second, they do not appear to be implemented correctly. Specifying :hlines along with :hline-string produces a result that doesn't appear useful.
One option would be to implement these. Doing this would require 1) understanding the tooling that currently exists and 2) integrating it into the shell context.
Another option is to remove it until someone requests it.
Given all the problems that exist currently with shells and the refactoring that needs to happen to make a consistent framework, I would prefer to simply remove this code.
ob-shell code analysis
- DONE Top matter
- DONE Requires
- DONE Function forward declares
- DONE defvar org-babel-sh-eoe-output
- DONE defvar org-babel-sh-eoe-indicator
- DONE defvar org-babel-sh-prompt
- DONE defvar ob-shell-async-indicator
- DONE defcustom org-babel-shell-names
- DONE Variable declarations
- DONE defcustom org-babel-shell-results-defaults-to-output
- DONE defun org-babel-sh-initiate-session
- DONE defun org-babel-sh-var-to-string
- DONE defun org-babel-sh-var-to-sh
- DONE defun org-babel–variable-assignments:sh-generic
- DONE defun org-babel–variable-assignments:basharray
- DONE defun org-babel–variable-assignments:bashassoc
- DONE defun org-babel–variable-assignments:bash
- DONE defun org-babel–variable-assignments:fish
- DONE defun org-babel-variable-assignments:shell
- DONE defun org-babel-prep-session:shell
- DONE defun org-babel-load-session:shell
- DONE defun org-babel-sh-strip-weird-long-prompt
- DONE defun ob-shell-async-chunk-callback
- DONE
[10/10]
defun org-babel-sh-evaluate - DONE defun org-babel-execute:shell
- DONE defun org-babel-shell-initialize
DONE Top matter
No commentary, just a standard Emacs header. Included for validation.
;;; ob-shell.el --- Babel Functions for Shell Evaluation -*- lexical-binding: t; -*- ;; Copyright (C) 2009-2023 Free Software Foundation, Inc. ;; Author: Eric Schulte ;; Maintainer: Matthew Trzcinski <matt@excalamus.com> ;; Keywords: literate programming, reproducible research ;; URL: https://orgmode.org ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version. ;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. ;;; Commentary: ;; Org-Babel support for evaluating shell source code. ;;; Code:
DONE Requires
(require 'org-macs) (org-assert-version) (require 'ob) (require 'org-macs) (require 'shell) (require 'cl-lib)
[2/2]
Requires: Questions
[X]
What are each of the requires for and are they necessary?
It appears like
ob
andshell
are the only requires needed.Notice that
ob-macs
is given twice. The first require fororg-macs
is likely fororg-assert-version
. The second is just extra as far as I can tell.ob
loads features from other Org Babel files. In fact, it loadsorg-macs
and callsorg-assert-version
. This makesorg-macs
even more redundant.shell
is needed for theshell
command used inorg-babel-sh-initiate-session
.cl-lib
appears unused. It was introduced toob-shell.el
in 2016 with commit 0f7df327, replacing the deprecatedcl
package.cl
was fully removed from Emacs in version 27.grep -r "'cl' package is now officially deprecated" "$emacs_etc_dir" | cut -f 2 -d ":"
Regarding backwards compatiblity, it appears that
cl
was never used (except for, perhaps, a brief period (that I could not find)).If
ob-shell.el
relied oncl
, we would expect it was updated appropriately that whencl
was deprecated. Since Emacs no longer contains the code forcl
, ifob-shell.el
required it, then it should fail. Since it doesn't, it seems safe to assume that allcl
calls were updated to usecl-lib
. Thecl-lib
uses a "cl-" prefix. However, we don't see that anywhere inob-shell.el
. We only see the require itself.grep "cl-" /tmp/org-mode/lisp/ob-shell.el
It appears that
cl-lib
is not needed.Fun fact: the
cl-lib
has only ever been required (never used, according to the prefix).cd /tmp/org-mode/ git grep "cl-" $(git rev-list --all -- lisp/ob-shell.el) -- lisp/ob-shell.el
[X]
Should comments tell what the requires are for?
I feel like yes for the
org-assert-version
inob
, but no forshell
. Maybe just no sinceob
is used for things other thanorg-assert-version
.
[0/2]
Requires: Refactor
[ ]
Remove all requires exceptob
andshell
.[ ]
Add comment thatob
should be listed first in order to check version
DONE Function forward declares
(declare-function org-babel-comint-in-buffer "ob-comint" (buffer &rest body) t) (declare-function org-babel-comint-wait-for-output "ob-comint" (buffer)) (declare-function org-babel-comint-buffer-livep "ob-comint" (buffer)) (declare-function org-babel-comint-with-output "ob-comint" (meta &rest body) t) (declare-function orgtbl-to-generic "org-table" (table params))
These declarations exist to satisfy the byte compiler (or to help it find the definitions of functions defined in other files).
[0/0]
Function forward declares: Questions
[0/0]
Function forward declares: Refactor
DONE defvar org-babel-sh-eoe-output
(defvar org-babel-sh-eoe-output "org_babel_sh_eoe" "String to indicate that evaluation has completed.")
Token used by org-babel-comint-with-output
(within
org-babel-sh-evaluate
) to indicate that execution has completed.
Used only by sessions. See How blocks execute to obtain results.
[1/3]
defvar org-babel-sh-eoe-output: Questions
[ ]
Should this be renamed to indicate that it's only used by session? Maybe update the docstring at least?[X]
What is "eoe"?
Probably "end of execution" or "end of evaluation".
[ ]
Why is it called "output"?
Maybe because this string is what's sent to the shell process (output)? However, that's not entirely true, as
org-babel-sh-eoe-indicator
is also sent. Maybe because it's whatorg-babel-comint-with-output
searches for in the output?
[0/1]
defvar org-babel-sh-eoe-output: Refactor
[ ]
Rename this
It needs to have "sh" removed. EOE should be spelled out or a different term used. "output" should be made more clear.
DONE defvar org-babel-sh-eoe-indicator
(defvar org-babel-sh-eoe-indicator "echo 'org_babel_sh_eoe'" "String to indicate that evaluation has completed.")
Shell command used by org-babel-comint-with-output
within
org-babel-sh-evaluate
. It echoes org-babel-sh-eoe-output
within
the comint, that being the token used to indicate that execution has
completed. See How blocks execute to obtain results. It is used only
with sessions.
#+begin_src sh :session *example* echo "hello, world" #+end_src sh-5.1$ PROMPT_COMMAND=;PS1="org_babel_sh_prompt> ";PS2= org_babel_sh_prompt> echo "hello, world" <------ This is the org-babel-sh-eoe-indicator echo 'org_babel_sh_eoe' hello, world org_babel_sh_prompt> org_babel_sh_eoe org_babel_sh_prompt>
[2/3]
defvar org-babel-sh-eoe-indicator: Questions
[ ]
What would it look like to explicitly define
org-babel-sh-eoe-indicator
in terms oforg-babel-sh-eoe-output
?If it's just (format "echo '%s'" org-babel-sh-eoe-output), then that avoids the (minor) problem of a typo creating a problem.
[X]
Is this string the same for all the supported shells?
No. Posh (powershell) uses something like "Write-Host". See https://stackoverflow.com/a/707666
[X]
Does it matter that posh uses a different command to print to the prompt?
Yes, it does if we want to support powershell. Personally, I'd rather support cmd.exe than powershell.
[0/2]
defvar org-babel-sh-eoe-indicator: Refactor
[ ]
Rename this
Remove "sh" and "eoe"
[ ]
Fix docstring
This is not the "string to indicate that evaluation has completed". It is the command used to call the string used to indicate that evaluation has completed.
DONE defvar org-babel-sh-prompt
(defvar org-babel-sh-prompt "org_babel_sh_prompt> " "String to set prompt in session shell.")
Prompt for shell sessions. This is set during
org-babel-sh-initiate-session
using the
org-babel-shell-set-prompt-commands
corresponding to the block
language.
In practice, the prompt is given by the comint-prompt-regexp
which
looks like:
(concat "^" (regexp-quote org-babel-sh-prompt) " *") (concat "^" (regexp-quote "org_babel_sh_prompt> ") " *") "^org_babel_sh_prompt> *"
comint-prompt-regexp
is required by org-babel-comint-with-output
.
[4/4]
defvar org-babel-sh-prompt: Questions
[X]
Why do we set the prompt?
Because of how
org-babel-comint-with-output
usescomint-prompt-regexp
.[X]
Many issues relate back to prompt filtering. Would it benefit us to make the prompt empty?
Possibly. However, the implementation of
org-babel-comint-with-output
limits us. It searches on thecomint-prompt-regexp
. If that's empty, then it will probably search more than we want to (not per line or per call).[X]
Does
org-babel-comint-with-output
use a filter? If not, why not?org-babel-comint-with-output
is a deep rabbit hole. It was created in 2009 by Eric Schulte. It appears that originally plain filters were used. Theorg-babel-comint-with-output
macro was introduced to manage the filters. The macro was simple then. It is no longer simple (although Ihor has recently simplified it (thank you!)). This will require more research.[X]
How does
org-babel-comint-with-output
work?Why is it a macro? Does it need to be a macro? This is not a (direct) ob-shell issue.
[0/1]
defvar org-babel-sh-prompt: Refactor
[ ]
Indicate that this is required by
org-babel-comint-with-output
Or better yet, change how
org-babel-comint-with-output
functions so that this is not necessary.
DONE defvar ob-shell-async-indicator
(defconst ob-shell-async-indicator "echo 'ob_comint_async_shell_%s_%s'" "Session output delimiter template. See `org-babel-comint-async-indicator'.")
Similar to org-babel-sh-eoe-indicator
.
Used by org-babel-comint-async-delete-dangling-and-eval
within
org-babel-sh-evaluate
.
Template for a shell command. Used to construct strings that delimit results within the comint process. Used only with async sessions. See How blocks execute to obtain results.
org-babel-sh-evaluate
formats two indicators, a "start" and an
"end", along with a Universally Unique IDentifier (UUID). The UUID is
a placeholder used until results are available. When the process
returns, the UUID is replaced with the results.
[0/0]
ob-shell-async-indicator: Questions
[0/1]
ob-shell-async-indicator: Refactor
[ ]
Change the "namespace" from "ob-shell-"
DONE defcustom org-babel-shell-names
(defcustom org-babel-shell-names '("sh" "bash" "zsh" "fish" "csh" "ash" "dash" "ksh" "mksh" "posh") "List of names of shell supported by babel shell code blocks. Call `org-babel-shell-initialize' when modifying this variable outside the Customize interface." :group 'org-babel :type '(repeat (string :tag "Shell name: ")) :set (lambda (symbol value) (set-default-toplevel-value symbol value) (org-babel-shell-initialize)))
Used to define shell languages supported by ob-shell
. Each item in
the list corresponds to the shell binary.
[1/1]
defcustom org-babel-shell-names: Questions
[X]
What would it take to add Windows cmd to this list?
Probably being maintainer or contributor to cmdproxy.c :)
[1/1]
defcustom org-babel-shell-names: Refactor
[X]
See Refactor fororg-babel-shell-initialize
.
As a long term personal goal, I'd like to support Windows cmd. This would be helpful for me at work. It would also be a good pathway for someone to free software.
DONE Variable declarations
The following three forms automate variable definitions for the various supported shells.
(defvar org-babel-default-header-args:shell '())
This comes from ob-template
. The only documentation for it says,
"optionally declare default header arguments for this language". The
variable is not used anywhere in the ob-shell code base. Maybe it's
intended for end-users rather than developers? Searching for
"org-babel-default-header-args" within the Org source, I see that
other languages use it in their code base. This seems like something
that should stay and something that should be documented.
Later, org-babel-shell-initialize
will define a similar variable for
each of the other supported shells.
(defvar org-babel-shell-names)
This is a variable to hold the list of shell names supported by
ob-shell
. org-babel-shell-names
is actually defined later on.
This definition acts as a forward declaration so that
org-babel-initialize
works.
(defconst org-babel-shell-set-prompt-commands '(;; Fish has no PS2 equivalent. ("fish" . "function fish_prompt\n\techo \"%s\"\nend") ;; prompt2 is like PS2 in POSIX shells. ("csh" . "set prompt=\"%s\"\nset prompt2=\"\"") ;; PowerShell, similar to fish, does not have PS2 equivalent. ("posh" . "function prompt { \"%s\" }") ;; PROMPT_COMMAND can override PS1 settings. Disable it. ;; Disable PS2 to avoid garbage in multi-line inputs. (t . "PROMPT_COMMAND=;PS1=\"%s\";PS2=")) "Alist assigning shells with their prompt setting command. Each element of the alist associates a shell type from `org-babel-shell-names' with a template used to create a command to change the default prompt. The template is an argument to `format' that will be called with a single additional argument: prompt string. The fallback association template is defined in (t . \"template\") alist element.")
Introduced as a result of the issue babel output seems to drop anything before % (in session).
This variable is only used with blocks containing the :session header.
It is used in org-babel-sh-initiate-session
with
org-babel-sh-prompt
to reset the prompt. Each command in
org-babel-shell-set-prompt-commands
removes the prompt and replaces
it with whatever is in org-babel-sh-prompt
, for the first two levels
of prompts (if the shell supports multiple shell level prompts).
Running the commands alone looks like this:
ahab@pequod ~ $ guix shell fish ahab@pequod ~ [env]$ fish ahab@pequod ~> function fish_prompt\n\techo \"%s\"\nend ahab@pequod ~> function fish_prompt echo "%s" end %secho "hi" hi %s
ahab@pequod ~ $ guix shell tsch ahab@pequod ~ [env]$ tcsh > set prompt="%s" set prompt2="" exit
It replaces the prompt with "%s". The "%s" is a format expression.
When org-babel-shell-set-prompt-commands
is run by
org-babel-sh-initiate-session
, the command for corresponding shell
gets fed into the lisp format
command. For example, the default
(which corresponds to bash) is
"PROMPT_COMMAND=;PS1=\"%s\";PS2="
this is used by format:
(format "PROMPT_COMMAND=;PS1=\"%s\";PS2=" org-babel-sh-prompt)
this results in the following being called in a shell:
"PROMPT_COMMAND=;PS1=\"org_babel_sh_prompt> \";PS2="
[2/2]
Variable declarations: Questions
[X]
What is the required form for
org-babel-default-header-args:shell
?See
org-babel-default-header-args
:Default arguments to use when evaluating a source block.
This is a list in which each element is an alist. Each key corresponds to a header argument, and each value to that header's value. The value can either be a string or a closure that evaluates to a string.
[X]
Why is
org-babel-shell-names
defined with defvar only to be later defined with defcustom?It looks to me like this is the reason:
org-babel-shell-initialize
(explained below) requiresorg-babel-shell-names
. However,org-babel-shell-initialize
isn't called untilorg-babel-shell-names
is actually defined using defcustom. When that happens, defcustom callsorg-babel-shell-initialize
(according to its :set feature).
[0/4]
Variable declarations: Refactor
[ ]
Documentorg-babel-default-header-args:shell
and the otherorg-babel-default-header-args:template
variables. Maybe say that it is different for inline source blocks, too (seeorg-babel-default-header-args
).[ ]
Find a better implementation than the
org-babel-shell-names
defined twice, along withorg-babel-shell-initialize
.This looks like a problem to me. First, it's not documented. It should be documented. My guess as why it's not documented is because, even though defvar supports docstrings, the later call to defcustom clobbers it. At the very least, there should be a lisp comment explaining what's going on.
Since this is all tied up with
org-babel-shell-initialize
, see the comments there.[ ]
Better documentorg-babel-shell-set-prompt-commands
. The way it's written doesn't explain why it exists or what it's for. It explains a little about how it's used (the implementation (which may change)). The comments within it are helpful, but only if you understand what the list is for. Otherwise, it seems really to deal with setting PS2.[ ]
Makeorg-babel-shell-set-prompt-commands
private. This is not something end-users should modify.
DONE defcustom org-babel-shell-results-defaults-to-output
(defcustom org-babel-shell-results-defaults-to-output t "Let shell execution defaults to \":results output\". When set to t, use \":results output\" when no :results setting is set. This is especially useful for inline source blocks. When set to nil, stick to the convention of using :results value as the default setting when no :results is set, the \"value\" of a shell execution being its exit code." :group 'org-babel :type 'boolean :package-version '(Org . "9.4"))
Used to define the results header default.
I'm confused by this because it feels like there are several places in the codebase which define "defaults" rather than one. This concerns me because it's not clear how they're incorporated into the broader system. The concern is basically having a state machine spread across the system and running into issues where one clobbers the other.
[0/2]
org-babel-shell-results-defaults-to-output: Questions
[ ]
How does this differ from
org-babel-default-header-args:shell
?The
org-babel-default-header-args:shell
isn't used directly anywhere inob-shell
.org-babel-default-header-args:<language>
is used withinorg-babel-get-src-block-info
.[ ]
Is this necessary or is it duplicatingorg-babel-default-header-args:shell
?
[0/1]
org-babel-shell-results-defaults-to-output: Refactor
[ ]
Fix grammar of docstring.
"Make shell execution default to \":results output\""
DONE defun org-babel-sh-initiate-session
(defun org-babel-sh-initiate-session (&optional session _params) "Initiate a session named SESSION according to PARAMS." (when (and session (not (string= session "none"))) (save-window-excursion (or (org-babel-comint-buffer-livep session) (progn (shell session) ;; Set unique prompt for easier analysis of the output. (org-babel-comint-wait-for-output (current-buffer)) (org-babel-comint-input-command (current-buffer) (format (or (cdr (assoc (file-name-nondirectory shell-file-name) org-babel-shell-set-prompt-commands)) (alist-get t org-babel-shell-set-prompt-commands)) org-babel-sh-prompt)) (setq-local comint-prompt-regexp (concat "^" (regexp-quote org-babel-sh-prompt) " *")) ;; Needed for Emacs 23 since the marker is initially ;; undefined and the filter functions try to use it without ;; checking. (set-marker comint-last-output-start (point)) (get-buffer (current-buffer)))))))
The main purpose of this function is to start a Process Buffer and set it up so that we can scrape output from it.
When a initiating a new session, create a new process buffer with
shell
. Loop until the prompt appears. When the prompt appears, the
shell is ready to receive input. Then, change the shell prompt and
update the comint-prompt-regexp
. Return the buffer associated with
process.
Note that for Emacs 23 compatibility, we must manually set
comint-last-output-start
.
[1/2]
defun org-babel-sh-initiate-session: Questions
[X]
Why doesn't the calling function decide to execute this function?
When a block is executed,
org-babel-execute-src-block
is called. The language is parsed from the header line and the appropriate dispatch function called (for exampleorg-babel-execute:bash
). All dispatch functions callorg-babel-execute:shell
. The first thing it does is check for whether to use a session. However, the logic for this is strange: the check happens in the initiate functionorg-babel-sh-initiate-session
, not the calling function,org-babel-execute:shell
. Should it happen in the caller?One way to consider which is the better way to call the function is to consider the information used to make the decision. The caller will have all the necessary information. For the callee to make the decision, it will need that information. Therefore, having the callee decide whether or not to run requires passing all the necessary information to it so that it can make that decision. This seems unnecessary: there is no obvious benefit to passing that information around. Passing information requires memory, it requires manually writing it in as an argument. Which information is necessary? What if more information is needed yet isn't there? When information is passed to the callee to make the decision to run or not, it may lead to different data structures or in passing around "the world". I think this is what we see with the _params parameter (which is a "world" variable that isn't used.)
Fundamentally, having the function decide whether or not to run makes no sense. If you're calling it, you've decided it should run. Why have the function second-guess you? Doing this splits a single logical choice into two lexical locations.
[ ]
Why is _params a parameter?
It's not used. Was it used in a previous version?
[0/2]
defun org-babel-sh-initiate-session: Refactor
[ ]
Move the decision to run the function to the caller[ ]
Remove the _params parameter
DONE defun org-babel-sh-var-to-string
(defun org-babel-sh-var-to-string (var &optional sep hline) "Convert an elisp value to a string." (let ((echo-var (lambda (v) (if (stringp v) v (format "%S" v))))) (cond ((and (listp var) (or (listp (car var)) (eq (car var) 'hline))) (orgtbl-to-generic var (list :sep (or sep "\t") :fmt echo-var :hline hline))) ((listp var) (mapconcat echo-var var "\n")) (t (funcall echo-var var)))))
Convert a "value" to a string. First define a helper function
echo-var
to keep strings as strings and convert anything else using
the "%S" option of format
. You can read the documentation for
those, but it basically boils down to return primitive types like
numbers as-is and evaluate symbols first.
org-babel-sh-var-to-string
behaves like follows:
#+name: my-table | col1 | col2 | |-------+--------| | test1 | test 2 | #+begin_src sh :stdin my-table :results output :eval never-export cat #+end_src #+RESULTS: : col1 col2 : test1 test 2
For a table, var
comes in as:
(("col1" "col2") hline ("test1" "test 2"))
This is the result of org-babel-ref-resolve
. Unfortunately,
org-babel-ref-resolve
doesn't document what that value will look
like. This is what a table apparently looks like. A raw Org table is
converted to a list of lists such that rows are nested lists or the
symbol 'hline.
The first condition of the cond checks for a table by way of "is it a
list and is the first element another list or the symbol 'hline?"
When true, the table contents are passed to orgtbl-to-generic
which
converts it to a string of some sort according to some given
parameters. In our case, it's hard coded as tab delimited strings and
no hlines.
The second condition checks for a list:
#+NAME: my-list - simple - list - without - nesting #+begin_src sh :results output :stdin my-list :eval never-export cat #+end_src #+RESULTS: : simple : list : without : nesting
In the case of a list, var
comes through as a list of items:
("simple" "list" "without" "nesting")
org-babel-sh-var-to-string
concatenates each element of the list
together with a newline.
Finally, the third condition is a catch-all that simply does the conversion:
#+name: text The third case #+begin_src sh :results output :stdin text :eval never-export cat #+end_src #+RESULTS: : The third case
Used in org-babel-execute:shell
for stdin (to allow named tables as
inputs). Also used in org-babel-sh-var-to-sh
.
[0/1]
defun org-babel-sh-var-to-string: Questions
[ ]
Should the default separator be a tab or four spaces?
[0/6]
defun org-babel-sh-var-to-string: Refactor
[ ]
Remove extra space between 'var' and the list of parameters in the call toorgtbl-to-generic
.[ ]
:hline parameter not used and defaults to nil. Kill from parameter list.[ ]
Change "var" to "val" (or something better)
Function converts a "value" to a string, per the docstring. This is true. It converts some lisp expression (which is a value). Why then is it called 'var'? It appears to have nothing to do with a 'var'!
[ ]
Change 'echo-var' to something better
It doesn't echo. And what is var? Maybe call it 'convert-to-string'.
[ ]
Document the conditional cases
Say something about the expected form. Say that one is a table, the other a list, the other a value. Say that it is expected to be used with a name block.
[ ]
Document arguments
DONE defun org-babel-sh-var-to-sh
(defun org-babel-sh-var-to-sh (var &optional sep hline) "Convert an elisp value to a shell variable. Convert an elisp var into a string of shell commands specifying a var of the same value." (concat "'" (replace-regexp-in-string "'" "'\"'\"'" (org-babel-sh-var-to-string var sep hline)) "'"))
This function quotes its argument. The docstring inaccurately says, "Convert an elisp value to a shell variable." This is not true. Yes, it's used in a process of converting a lisp value to a shell variable. However, that's not what this function does.
This function does two things: 1) each ' quote in the argument is replaced by '"'"' and 2) the whole result is surrounded in single quotes. Step 1 is done to prevent single-quotes contained in the argument from breaking the outer quotes added by step 2. Step 2 is done to ensure the quoted argument is interpreted as a literal.
This is easiest to see with a non-trivial example:
(org-babel-sh-var-to-sh "it'd") -> 'it'\"'\"'d'
See how the single quote is replaced by '"'"'? In this case, the double-quotes are escaped with backslashes. It's maybe easier to see if we separate the quoted single-quote with spaces:
'it '\"'\"' d'
All of this is made even more confusing because when these values are viewed in Emacs, double-quotes are added around it (again to represent literal).
Most likely this quoting is shell specific. I assume it only applies to Bourne-like shells. In truth, I'm not sure.
This function is a helper used in the following:
org-babel--variable-assignments:sh-generic
org-babel--variable-assignments:fish
org-babel--variable-assignments:bash_array
org-babel--variable-assignments:bash_assoc
this | is | a | test |
---|---|---|---|
of | a | ref | resolve |
(org-babel-sh-var-to-sh (org-babel-ref-resolve "pointless-sep-hline") "--" "++")
[2/3]
defun org-babel-sh-var-to-sh: Questions
[ ]
Is this quoting shell specific?[X]
Does Emacs provide a function for this already?
It seems not.
[X]
How do :hlines and :separator work? Do they even work?
Yes, but not well. See General obshell: Questions.
[0/2]
defun org-babel-sh-var-to-sh: Refactor
[ ]
Rename
Quoting is fiddly and this exact process appears necessary for several different shells. It makes sense to define it once to avoid needing to make multiple updates or need to debug multiple functions.
[ ]
Document arguments
DONE defun org-babel–variable-assignments:sh-generic
(defun org-babel--variable-assignments:sh-generic (varname values &optional sep hline) "Return a list of statements declaring the values as a generic variable." (format "%s=%s" varname (org-babel-sh-var-to-sh values sep hline)))
Create a string like "%s=%s" for use in creating variables. See Shell Parameters.
[0/2]
defun org-babel–variable-assignments:sh-generic: Questions
[ ]
What is meant by "generic variable"?
According to the docstring, "generic" doesn't appear to mean "across" shells. Instead, it means something else. The word "generic" only shows up twice in the Bash manual and only then in the GNU Free Documentation License.
If this is intended for use across shells, does that work? Looking at you, Powershell (if that's even supported).
[ ]
Should variables be first 'unset'?
'Unset' is called first for
org-babel--variable-assignments:bash_assoc
. Why is it not called here?
[0/2]
defun org-babel–variable-assignments:sh-generic: Refactor
[ ]
Remove sep and hline arguments
See arguments elsewhere.
[ ]
Document arguments
DONE defun org-babel–variable-assignments:basharray
(defun org-babel--variable-assignments:bash_array (varname values &optional sep hline) "Return a list of statements declaring the values as a bash array." (format "unset %s\ndeclare -a %s=( %s )" varname varname (mapconcat (lambda (value) (org-babel-sh-var-to-sh value sep hline)) values " ")))
Create a string for defining Bash indexed arrays.
Indexed arrays are lists of values accessed by their position in the list. See Arrays.
unset
is a Bash builtin to destroy arrays. Use declare -a myarray
to create a new array. There are two ways to define elements of an
array. One of them is myarray[index]=value
. Another is
NAME=(VALUE1 VALUE2 ... )
where each VALUE
may be of the form
'[SUBSCRIPT]='STRING
.
Given a 'varname' for the array name and a list of pairs, construct an the string:
unset VARNAME declare -a VARNAME=( VARNAME[key1]=value1 VARNAME[key2]=value2 ... VARNAME[keyN]=valueN )
This function is called when using the :var
header argument.
echo ${myfriends[2]} is a friend of mine
[0/1]
defun org-babel–variable-assignments:basharray: Questions
[ ]
Does this work with named references, like a table?
[0/1]
defun org-babel–variable-assignments:basharray: Refactor
[ ]
Document arguments
DONE defun org-babel–variable-assignments:bashassoc
(defun org-babel--variable-assignments:bash_assoc (varname values &optional sep hline) "Return a list of statements declaring the values as bash associative array." (format "unset %s\ndeclare -A %s\n%s" varname varname (mapconcat (lambda (items) (format "%s[%s]=%s" varname (org-babel-sh-var-to-sh (car items) sep hline) (org-babel-sh-var-to-sh (cdr items) sep hline))) values "\n")))
Create a string for constructing an associative array in Bash syntax.
Bash supports "associative" arrays. Associative arrays are key-value stores (sometimes called "dictionaries"). See Arrays.
unset
is a Bash builtin to destroy arrays. Use declare -A myarray
to create a new array. There are two ways to define elements of an
array. One of them is myarray[key]=value
.
Given a 'varname' for the array name and a list of pairs, construct an the string:
unset VARNAME declare -A VARNAME VARNAME[key1]=value1 VARNAME[key2]=value2 ... VARNAME[keyN]=valueN
This function is called when using the :var
header argument.
echo I have ${myarray[apple]} apples
[0/3]
defun org-babel–variable-assignments:bashassoc: Questions
[ ]
What are downsides to off-loading this kind of functionality to the user?
The problem: for every syntax, for every shell, we need to create a command that maps to various shell constructs. For example, defining associative arrays uses different syntax in different shells:
mksh$ typeset -a may[Germany]=Berlin mksh$ echo ${may[Germany]} Berlin
I don't think there's a way around needing to construct a string that's sent to the subprocess. So, for each construct, we'll need helper functions like
org-babel--variable-assignments:bash_assoc
.This could be done 1:1, like with
org-babel--variable-assignments:bash_assoc
. That is for the shell functionality, there is a separate function to handle it. A concern is that as these accrue, someone will come along and say, "Hey, this syntax is all very similar. Why don't we 'simplify' the code and generate all the similar cases?" Then we'll be in the same situation asorg-babel-shell-initialize
where simple tasks become complicated code generators that have hidden dependencies and extra code needs to exist to handle the edge cases.By "off-loading" to the end-user, I mean, provide an API that makes this straight forward for end-users to implement. For example, something like hook functions.
[ ]
How is this function used? Why is it provided in the first place?
At first glance, this seems convenient. However, why does it exist? Why can't the user "simply" write a block that does this?
Without looking into it, my guess is to convert an Org table to an array.
[ ]
How are 'sep' and 'hline' supposed to work with this function?
As far as I understand how
org-babel-sh-var-to-sh
works, these make no sense.
[0/2]
defun org-babel–variable-assignments:bashassoc: Refactor
[ ]
Document the parameters
'varname' is what to name the array. 'values' is a list of pairs
[ ]
The worg documentation is wrong on this topic
DONE defun org-babel–variable-assignments:bash
(defun org-babel--variable-assignments:bash (varname values &optional sep hline) "Represent the parameters as useful Bash shell variables." (pcase values (`((,_ ,_ . ,_) . ,_) ;two-dimensional array (org-babel--variable-assignments:bash_assoc varname values sep hline)) (`(,_ . ,_) ;simple list (org-babel--variable-assignments:bash_array varname values sep hline)) (_ ;scalar value (org-babel--variable-assignments:sh-generic varname values sep hline))))
Call appropriate assignment function:
- org-babel–variable-assignments:bashassoc
- org-babel–variable-assignments:basharray
- org-babel–variable-assignments:sh-generic
Uses pcase
which does pattern matching on the form of a value and
performs an action accordingly.
[1/2]
defun org-babel–variable-assignments:bash: Questions
[X]
Is
pcase
the best way to do this?pcase
is a little weird to read at first. To match the nested list appears needing to understand cons cells. This could be a problem for someone not familiar with them. It also requires quasi-quote. So, that's two levels of knowledge needed.[X]
Can it be used with a different pattern?
The pattern doesn't make sense to me, too. The pattern is this:
`((,_ ,_ . ,_) . ,_)
Why not use a simpler form?
`((,_ . ,_) . ,_)
Or maybe one that's literally the form we want?
`((,_ . (,_ . nil)) . ,_)
As cons cells, a list of pairs looks like:
'((3 . (4 . nil)) . nil)
or with a second pair:
'((3 . (4 . nil)) . ((5 . (6 . nil))))
It looks like it's combining the first cons together.
'((3 4 . nil) . nil)
or with a second pair:
'((3 4 . nil) . ((5 6 . nil))))
Honestly, I'm getting annoyed at parsing this out. Why are we using cons cells here anyway? It seems like we need to use them in order capture the "rest" of the pairs.
So why not something like this since we're really only checking that the first element is a pair?
`((,_ ,_) . ,_)
It works:
(let ((values '(("apple" 3) ("banana" 4) ("cherry" 5)))) (pcase values (`((,_ ,_) . ,_) (message "Simpler form catches")) (`((,_ ,_ . ,_) . ,_) (message "Simple form failed"))))
(let ((values '("alice" "bob" "carol"))) (pcase values (`((,_ ,_) . ,_) (message "Simpler form catches")) (`((,_ ,_ . ,_) . ,_) (message "Simple form failed")) (_ (message "Neither form catches (as desired)"))))
(let ((values 42)) (pcase values (`((,_ ,_) . ,_) (message "Simpler form catches")) (`((,_ ,_ . ,_) . ,_) (message "Simple form failed")) (_ (message "Neither form catches (as desired)"))))
[X]
How could this same task be done without
pcase
?Clearly,
pcase
can get confusing and fiddly. It seems to me a good idea if we could avoid using it all together.We're using the form of the parameters to determine the action. Why? Because the parameters derive from the
:var
header which can take an arbitrary form.Have I been calling
:var
"incorrectly"?It seems like
:var
is expected to be called multiple times! The docstring fororg-babel--get-vars
says that PARAMS "may contain multiple entries for the key :var ".(defun org-babel--get-vars (params) "Return the babel variable assignments in PARAMS. PARAMS is a quasi-alist of header args, which may contain multiple entries for the key `:var'. This function returns a list of the cdr of all the `:var' entries." (mapcar #'cdr (cl-remove-if-not (lambda (x) (eq (car x) :var)) params)))
If you're not careful, it looks like this should work:
echo I have ${myarray["banana"]} bananas
But, NO, actually it doesn't…apple is never assigned.
echo $myarray echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples
What happens is this:
Each :var is evaluated. The varname comes through as something that Bash should recognize, "myarray\[\"apple\"\]". However, the value is just "3", so the
pcase
sees it as an atom and usesorg-babel--variable-assignments:sh-generic
. This means bash doesn't make the "define" call necessary to create an array.Inneed, through this and through the use of
pcase
, the expectation appears to be that a lisp expression is passed to :var. So,:var
can be called with any form.How else might we differentiate forms?
The params coming into
org-babel--get-vars
look like:((:results . "output replace") (:exports . "code") (:var . "myarray[\"apple\"]=3") (:var . "myarray[\"banana\"]=42") (:tangle . "no") (:hlines . "no") (:noweb . "no") (:cache . "no") (:session . "none"))
Rather than check the values parameter, check the varname. That is check for array syntax. Should we be parsing for Bash code?
This is the weeds. What does it look like stepping back?
We're wanting to use the :var header to define variables in a Bash shell. In addition to regular variables, we want to define indexed arrays and associative arrays. How might an end user indicate that these need to be defined? Currently, they can give a lisp expression that gets translated. How else might an end user indicate and provide the necessary information for defining an indexed array or associative array?
- Separate headers for indexed and associative arrays
ACTUALLY… :var is intended to work with Org tables and lists (the following don't work here because of the indentation, but it does work).
#+name:fruit | apple | 3 | | banana | 42 | #+begin_src bash :results output :var myarray=fruit :eval never-export echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples #+end_src #+RESULTS: : I have 42 bananas and 3 apples
[ ]
Should this function exist?
It's only delegating to other functions and is only called by
org-babel-variable-assignments:shell
.The reason why it's broken out looks like it may be because
org-babel-variable-assignments:shell
gets a list of params that is then mapped using a lambda. Breaking the delegation to a separate function makes the lambda code shorter.I'm curious if restructuring its logic would make it easier to understand and remove the need for these separate functions.
[0/3]
defun org-babel–variable-assignments:bash: Refactor
[ ]
Change docstring
It's not very clear. It doesn't follow Emacs Lisp Documentation Tips:
For a function, the first line should briefly answer the question, "What does this function do?"
The first line should mention all the important arguments of the function, and should mention them in the order that they are written in a function call.
What parameters? What does represent mean? Why?
It takes a variable name and…
How should we describe "values"? It's not a list because it could be an atom.
It takes a variable name and the corresponding value (or values). Variable name must be a string and a valid Bash variable name. Values must be a list of pairs, a list, or an atom.
[ ]
Reword comments
- Regarding "scalar":
It's not for a "scalar", it's for an atom, since we're checking the form of
values
, not what values represents in Bash (and Bash doesn't have scalars). Further, the value of the variable is not necessarily a number:Y="six"; eval echo I have "$X" apples
And still yet, the underscore matches anything.
- Regarding "two-dimensional array":
The comment is misleading. It does not mean to say that "bashassoc" is a 2D-array, that associative arrays are 2D. They're not. What it means to say is that the form of
values
is a "list of lists". Thatvalues
is a 2D list.It's confusing because bashassoc makes a Bash array. So, it's easy to confuse what this is talking about (especially if you're not familiar with Bash details).
Further, Lisp lists are not arrays! So, it's technically incorrect.
(let ((values '(("apple" 3) ("banana" 4) ("cherry" 5)))) (if (arrayp values) (message "The list is an array") (message "Lists are not arrays")))
[ ]
Create better worg documentation for :var
First, the worg documentation is misleading. :var is intended for use mainly with named tables and lists. By happenstance (it appears), it also works with lisp expressions.
This is how :var is expected to be used:
apple 5 banana 7 echo I have ${myarray["banana"]} bananas and ${myarray["apple"]} apples
And this works with arrays as expected.
Berlin Washington DC echo ${myarray[0]} is the first capital by alphabetical order echo ${myarray[1]} comes last in alphabetical order
DONE defun org-babel–variable-assignments:fish
(defun org-babel--variable-assignments:fish (varname values &optional sep hline) "Return a list of statements declaring the values as a fish variable." (format "set %s %s" varname (org-babel-sh-var-to-sh values sep hline)))
This is basically the same function as
org-babel--variable-assignments
but for fish's syntax.
[0/0]
defun org-babel–variable-assignments:fish: Questions
[0/1]
defun org-babel–variable-assignments:fish: Refactor
[ ]
Update docstring to explain the arguments
DONE defun org-babel-variable-assignments:shell
(defun org-babel-variable-assignments:shell (params) "Return list of shell statements assigning the block's variables." (let ((sep (cdr (assq :separator params))) (hline (when (string= "yes" (cdr (assq :hlines params))) (or (cdr (assq :hline-string params)) "hline")))) (mapcar (lambda (pair) (if (string-suffix-p "bash" shell-file-name) (org-babel--variable-assignments:bash (car pair) (cdr pair) sep hline) (if (string-suffix-p "fish" shell-file-name) (org-babel--variable-assignments:fish (car pair) (cdr pair) sep hline) (org-babel--variable-assignments:sh-generic (car pair) (cdr pair) sep hline)))) (org-babel--get-vars params))))
Call appropriate variable assignment function depending on shell type (i.e. suffix is bash or not). Each element of a :var header arg is processed.
[1/1]
defun org-babel-variable-assignments:shell: Questions
[X]
How do (or even do) :hlines and :sep or :separator work with ob-shell?
See same question in General obshell: Questions
[0/1]
defun org-babel-variable-assignments:shell: Refactor
[ ]
Change nested ifs to a single cond
DONE defun org-babel-prep-session:shell
(defun org-babel-prep-session:shell (session params) "Prepare SESSION according to the header arguments specified in PARAMS." (let* ((session (org-babel-sh-initiate-session session)) (var-lines (org-babel-variable-assignments:shell params))) (org-babel-comint-in-buffer session (mapc (lambda (var) (insert var) (comint-send-input nil t) (org-babel-comint-wait-for-output session)) var-lines)) session))
Cruft left over from the original implementation of ob-sh. It was used to start and return the comint associated with the session.
Originally, the comint associated with a session was actually started
by a function (that no longer exists) called
org-babel-sh-initiate-session-by-key
. The
org-babel-sh-initiate-session
returned only the buffer after it was
created:
;; from commit 0fa68a53 (defvar org-babel-sh-buffers '(:default . nil)) (defun org-babel-sh-session-buffer (session) (cdr (assoc session org-babel-sh-buffers))) (defun org-babel-sh-initiate-session-by-key (&optional session) "If there is not a current inferior-process-buffer in SESSION then create. Return the initialized session." (save-window-excursion (let* ((session (if session (intern session) :default)) (sh-buffer (org-babel-sh-session-buffer session)) (newp (not (org-babel-comint-buffer-livep sh-buffer)))) (if (and sh-buffer (get-buffer sh-buffer) (not (buffer-live-p sh-buffer))) (setq sh-buffer nil)) (shell sh-buffer) (when newp (setq sh-buffer (current-buffer)) (org-babel-comint-wait-for-output sh-buffer)) (setq org-babel-sh-buffers (cons (cons session sh-buffer) (assq-delete-all session org-babel-sh-buffers))) session))) (defun org-babel-sh-initiate-session (&optional session) (unless (string= session "none") (org-babel-sh-session-buffer (org-babel-sh-initiate-session-by-key session))))
Over time, the functionality of creating the comint process was moved
into org-babel-sh-initiate-session
. Variable assignment was moved
from org-babel-prep-session:sh
to
org-babel-sh-variable-assignments
in 8e151c06 October 14, 2010. It
has been untouched since then (for 13 years).
As you can see, org-babel-prep-session
duplicates that of
org-babel-sh-initiate-session-by-key
(and the modern
org-babel-sh-initiate-session
). That is, it starts a comint.
[0/0]
defun org-babel-prep-session:shell: Questions
[0/1]
defun org-babel-prep-session:shell: Refactor
[ ]
Remove this function
See Refactor section of
org-babel-load-session
.
DONE defun org-babel-load-session:shell
(defun org-babel-load-session:shell (session body params) "Load BODY into SESSION." (save-window-excursion (let ((buffer (org-babel-prep-session:shell session params))) (with-current-buffer buffer (goto-char (process-mark (get-buffer-process (current-buffer)))) (insert (org-babel-chomp body))) buffer)))
This is cruft from the original development. I highly doubt it's been used in the 13 years since it was last changed. Everything this function does has been reimplemented in other functions.
It starts a process if none exists and inserts the body into the associated process buffer. Initiate a comint, if needed. Define any variables that are given in the header args. Return buffer associated with the session.
It's intended for use with the (largely undocumented) org-metaup
.
[1/1]
defun org-babel-load-session:shell: Questions
[X]
What is this used for?
Searching for all instances of "org-babel-load-session:shell", we see that it's used in ob-core.
rg -n --type=elisp -e org-babel-load-session\: /tmp/org-mode | rg -v defun
This is within the
org-babel-load-in-session
function.rg -n --type=elisp -e org-babel-load-in-session /tmp/org-mode | rg -v defun
This is used by
org-babel-load-in-session-maybe
which is added as a hook to theorg-metaup-hook
.rg -n --type=elisp -e org-metaup-hook /tmp/org-mode
It looks like
org-metaup-hook
is only set up (by default) withorg-babel-load-in-session-maybe
(which callsorg-babel-load-in-session
).The only other place this stuff is used in code is in
org-keys
(seen above in therg
output). Theorg-keys.el
commentary says,;; This library adds bindings for Org mode buffers. It also ;; implements both Speed keys and Babel speed keys. See manual for ;; details.
Looking at the manual, indeed,
org-babel-load-in-session
is mentioned in Key bindings and Useful Functions:Active key bindings in code blocks: Key binding Function -------------------------------------------------------- 'C-c C-c' 'org-babel-execute-src-block' 'C-c C-o' 'org-babel-open-src-block-result' 'M-<UP>' 'org-babel-load-in-session' 'M-<DOWN>' 'org-babel-pop-to-session' ... 'C-c C-v l' or 'C-c C-v C-l' 'org-babel-load-in-session'
No mention of
org-babel-load-in-session
is given related to the speed commands. Speed commands creates a shortcut fororg-metaup
, but does not explain (or in any way connect) thatorg-babel-load-in-session
exists.Back to the original question, what is this used for?
org-babel-load-session:shell
is connected to an undocumented feature. It's unclear what this feature is. Try callingM-x org-metaup
on the following block:echo "hi" echo "world"
It starts a session and takes you to it:
ahab@pequod ~/Projects/excalamus.com/src/draft/emacs$ PROMPT_COMMAND=;PS1="org_babel_sh_prompt> ";PS2= org_babel_sh_prompt> echo "hi" echo "world"
[0/1]
defun org-babel-load-session:shell: Refactor
[ ]
Remove all traces of
org-babel-load-session
I would be shocked if this feature has been used by anyone in the past 13 years. It requires a largely undocumented feature:
org-metaup
.The function of it is to take you to the comint buffer. Something like that might be useful if Org could capture the input and output. However, it doesn't. This defeats the whole point of literate programming.
Unless there's a way to capture the input and output of session interaction happening outside the Org block, I don't see a point in keeping this.
We're also in the fortunate position of no one likely knowing about this, making it easy for us to kill it.
DONE defun org-babel-sh-strip-weird-long-prompt
(defun org-babel-sh-strip-weird-long-prompt (string) "Remove prompt cruft from a string of shell output." (while (string-match "^% +[\r\n$]+ *" string) (setq string (substring string (match-end 0)))) string)
Used only in org-babel-sh-evaluate
and only for sessions. Removes
leading white space for strings starting with "%".
The name concerns me. It sounds an awful lot like the problem it tries to solve wasn't fully understood. That is, it's a work around, not a solution.
Introduced in commit 9623b169.
[3/3]
defun org-babel-sh-strip-weird-long-prompt: Questions
[X]
What is the actual regexp?
The regexp is:
"^% +[\r\n$]+ *"
This says,
- ^ starting at the beginning of the line
- % match the percent sign
- + match one or more spaces
- [\r\n$]+ match one or more line endings
- * match zero or more white spaces
[X]
Does this regexp actually do anything?
I don't think so.
Here's how it works:
(let ((string "% \n\n\r\n\rhello world\ngood cruel world x_x")) (while (string-match "^% +[\r\n$]+ *" string) (setq string (substring string (match-end 0)))) string)
It will not match unless the string starts with a "%":
(let ((string "\n\n\r\n\rhello world\ngood cruel world x_x")) (while (string-match "^% +[\r\n$]+ *" string) (setq string (substring string (match-end 0)))) string)
The only place
org-babel-sh-strip-weird-long-prompt
is ever called is inorg-babel-sh-evaluate
. The captured session text is passed in.For the following,
#("The value of x is 7" 0 19 (field output))
is the string passed in:x=7 echo "The value of x is $x"
I assume the "%" corresponded to the prompt, but I can't tell. Currently, the string passed in is the result of the (several times) trimmed result of evaluation.
;; org-babel-sh-evaluate (mapconcat #'org-babel-sh-strip-weird-long-prompt (mapcar #'org-trim (butlast ; Remove eoe indicator (org-babel-comint-with-output (session org-babel-sh-eoe-output t body) (insert (org-trim body) "\n" org-babel-sh-eoe-indicator) (comint-send-input nil t)) ;; Remove `org-babel-sh-eoe-indicator' output line. 1)) "\n")))
[X]
Is it necessary to setq?
The while loop continues through the string in order to find all matches. When a match is found, the string is replaced. In this way, each iteration of the loop is checking a new string. This allows for
(match-end 0)
to always be used for the next match.Yes, the setq is necessary if the goal is to return a new string.
[0/1]
defun org-babel-sh-strip-weird-long-prompt: Refactor
[ ]
Remove this function
I'm leaning toward removing this. It doesn't seem like it's called anymore. I have no clue how to reproduce the problem it seems to solve and there's no record I could find of what the problem even was. It also appears like the original problem was never fully understood.
If we remove it and a bug shows up, then we'll have a real test case, have a chance to figure out what's actually happening, and maybe find a better solution. We could remove the element of "weirdness." It would negatively affect users in the short term by outputting extra characters. The output would still exist, however, and users would be able to keep using the Babel. In the long term, we'd be able to better support users by actually understanding our code base.
By removing it, we avoid:
- Having code
- Not understanding our code
- Doing extra processing
DONE defun ob-shell-async-chunk-callback
(defun ob-shell-async-chunk-callback (string) "Filter applied to results before insertion. See `org-babel-comint-async-chunk-callback'." (replace-regexp-in-string comint-prompt-regexp "" string))
As the docstring says, the filter applied to results before insertion.
Replace the comint-prompt-regexp
("orgbabelshprompt> *") with
nothing.
Used in org-babel-sh-evaluate
. Provided as the CHUNK-CALLBACK
argument of org-babel-comint-async-register
. Called after the
ob-shell-async-indicator
is found and the results scraped from the
comint buffer.
[0/0]
defun ob-shell-async-chunk-callback: Questions
[0/1]
defun ob-shell-async-chunk-callback: Refactor
[ ]
Document argument
DONE [10/10]
defun org-babel-sh-evaluate
This is where all the functionality of ob-shell
lives. Understand
this and you'll have the bulk of what ob-shell
is.
(defun org-babel-sh-evaluate (session body &optional params stdin cmdline) "Pass BODY to the Shell process in BUFFER. If RESULT-TYPE equals `output' then return a list of the outputs of the statements in BODY, if RESULT-TYPE equals `value' then return the value of the last statement in BODY." (let* ((shebang (cdr (assq :shebang params))) (async (org-babel-comint-use-async params)) (results-params (cdr (assq :result-params params))) (value-is-exit-status (or (and (equal '("replace") results-params) (not org-babel-shell-results-defaults-to-output)) (member "value" results-params))) (results (cond ((or stdin cmdline) ; external shell script w/STDIN (let ((script-file (org-babel-temp-file "sh-script-")) (stdin-file (org-babel-temp-file "sh-stdin-")) (padline (not (string= "no" (cdr (assq :padline params)))))) (with-temp-file script-file (when shebang (insert shebang "\n")) (when padline (insert "\n")) (insert body)) (set-file-modes script-file #o755) (with-temp-file stdin-file (insert (or stdin ""))) (with-temp-buffer (with-connection-local-variables (apply #'process-file (if shebang (file-local-name script-file) shell-file-name) stdin-file (current-buffer) nil (if shebang (when cmdline (list cmdline)) (list shell-command-switch (concat (file-local-name script-file) " " cmdline))))) (buffer-string)))) (session ; session evaluation (if async (progn (let ((uuid (org-id-uuid))) (org-babel-comint-async-register session (current-buffer) "ob_comint_async_shell_\\(.+\\)_\\(.+\\)" 'ob-shell-async-chunk-callback nil) (org-babel-comint-async-delete-dangling-and-eval session (insert (format ob-shell-async-indicator "start" uuid)) (comint-send-input nil t) (insert (org-trim body)) (comint-send-input nil t) (insert (format ob-shell-async-indicator "end" uuid)) (comint-send-input nil t)) uuid)) (mapconcat #'org-babel-sh-strip-weird-long-prompt (mapcar #'org-trim (butlast ; Remove eoe indicator (org-babel-comint-with-output (session org-babel-sh-eoe-output t body) (insert (org-trim body) "\n" org-babel-sh-eoe-indicator) (comint-send-input nil t)) ;; Remove `org-babel-sh-eoe-indicator' output line. 1)) "\n"))) ;; External shell script, with or without a predefined ;; shebang. ((org-string-nw-p shebang) (let ((script-file (org-babel-temp-file "sh-script-")) (padline (not (equal "no" (cdr (assq :padline params)))))) (with-temp-file script-file (insert shebang "\n") (when padline (insert "\n")) (insert body)) (set-file-modes script-file #o755) (if (file-remote-p script-file) ;; Run remote script using its local path as COMMAND. ;; The remote execution is ensured by setting ;; correct `default-directory'. (let ((default-directory (file-name-directory script-file))) (org-babel-eval (file-local-name script-file) "")) (org-babel-eval script-file "")))) (t (org-babel-eval shell-file-name (org-trim body)))))) (when (and results value-is-exit-status) (setq results (car (reverse (split-string results "\n" t))))) (when results (let ((result-params (cdr (assq :result-params params)))) (org-babel-result-cond result-params results (let ((tmp-file (org-babel-temp-file "sh-"))) (with-temp-file tmp-file (insert results)) (org-babel-import-elisp-from-file tmp-file)))))))
The org-babel-sh-evaluate
function has two parts, calculating the
results and formatting the results. Calculating the results makes up
most of the function. It's a conditional for each case where each
case corresponds to possible header arguments. Formatting is
more-or-less off-loaded to core Org functionality.
There are several ways in which a shell block may evaluate. Many of these overlap and influence each other. Each, more-or-less, corresponds to a case in the conditional.
These are the current cases:
- stdin
- cmdline
- shebang
- session
- async
- general
[X]
defun and org-babel-sh-evaluate-varlist-shebang
(defun org-babel-sh-evaluate (session body &optional params stdin cmdline) "Pass BODY to the Shell process in BUFFER. If RESULT-TYPE equals `output' then return a list of the outputs of the statements in BODY, if RESULT-TYPE equals `value' then return the value of the last statement in BODY." (let* ((shebang (cdr (assq :shebang params)))
The arguments are several parameters (session, stdin, cmdline, async, etc) and the source block body. For some reason, the parameters are split between being explicitly passed and passed within the params list. See Questions and Refactor for discussion.
Params is an alist of data. condition See "What is params?" in defun org-babel-execute:shell for more information.
A shebang is a line like "#!/bin/bash" which does two things:
- Allows a script file to execute as a command
- Specifies which binary to use when executing
This let statement binds the
:shebang
header argument.[X]
org-babel-sh-evaluate-varlist-async
(async (org-babel-comint-use-async params))
Determine whether to use async.
Async depends on 1)
:session
not being "none" and 2) the:async
header not being "no". See Questions section below on toggle (in)consistencies.org-babel-use-async
moves this logic to a separate module for other async implementations to use.[X]
org-babel-sh-evaluate-varlist-results-params
(results-params (cdr (assq :result-params params)))
The
:result-params
key inparams
holds the header values for the:results
header.It looks something like,
(:result-params "replace" "output")
or
(:result-params "replace" "value" "table")
Results are presentable in several formats and
:result-params
holds the keywords for deciding what the output should look like.[X]
org-babel-sh-evaluate-varlist-value-is-exit-status
(value-is-exit-status (or (and (equal '("replace") results-params) (not org-babel-shell-results-defaults-to-output)) (member "value" results-params)))
Block results are stdout, stderr, or an exit code. Typically, an exit code of 0 means success, non-0 means failure.
value-is-exit-status
helps figure out which result to return.This is something that needs refactoring. See Refactoring notes.
[X]
org-babel-sh-evaluate-varlist-results
(results (cond
Results are calculated using a
cond
. Various combinations of headers make up the different cases. See Question asking whether all combinations are represented.[X]
org-babel-sh-evaluate-varlist-results-stdin-cmdline
((or stdin cmdline) ; external shell script w/STDIN (let ((script-file (org-babel-temp-file "sh-script-")) (stdin-file (org-babel-temp-file "sh-stdin-")) (padline (not (string= "no" (cdr (assq :padline params)))))) (with-temp-file script-file (when shebang (insert shebang "\n")) (when padline (insert "\n")) (insert body)) (set-file-modes script-file #o755) (with-temp-file stdin-file (insert (or stdin ""))) (with-temp-buffer (with-connection-local-variables (apply #'process-file (if shebang (file-local-name script-file) shell-file-name) stdin-file (current-buffer) nil (if shebang (when cmdline (list cmdline)) (list shell-command-switch (concat (file-local-name script-file) " " cmdline))))) (buffer-string))))
This condition handles the stdin, cmdline, and (sometimes) shebangs headers. Note: there's a separate branch condition for shebang that's only called when there's no stdin or cmdline header. So, this is for:
stdin cmdline stdin, shebang cmdline, shebang stdin, cmdline, shebang
A shebang is a line at the top of a file like "#!/bin/bash" which allows a shell script to run as a command. The name "shebang" comes from combining the names of the first two characters, the "hash" (#) and the "bang" (!).
It works by calling
process-file
which has form:(process-file PROGRAM &optional INFILE BUFFER DISPLAY &rest ARGS)
The stdin header is put into a temporary file whose path is passed in as INFILE.
The block code is also put into a temporary file. The block code is either executed as a shebang or called as part of the argument to the shell binary. The shebang line is inserted when a
:shebang
is given.The temporary script file is made executable with 755 permissions. This corresponds to
chmod a+rwx,o-w
which sets permissions so that the group and owner can read, can write and can execute. Others can read and execute, but can't write.When the block script is executed, it's run using
process-file
in a temporary buffer. The results of the process are placed in the buffer and, finally, the contents of the buffer given for the results.For some reason, a newline is inserted if
:padline
is given. See Questions.[X]
org-babel-sh-evaluate-varlist-results-session
(session ; session evaluation (if async (progn (let ((uuid (org-id-uuid))) (org-babel-comint-async-register session (current-buffer) "ob_comint_async_shell_\\(.+\\)_\\(.+\\)" 'ob-shell-async-chunk-callback nil) (org-babel-comint-async-delete-dangling-and-eval session (insert (format ob-shell-async-indicator "start" uuid)) (comint-send-input nil t) (insert (org-trim body)) (comint-send-input nil t) (insert (format ob-shell-async-indicator "end" uuid)) (comint-send-input nil t)) uuid)) (mapconcat #'org-babel-sh-strip-weird-long-prompt (mapcar #'org-trim (butlast ; Remove eoe indicator (org-babel-comint-with-output (session org-babel-sh-eoe-output t body) (insert (org-trim body) "\n" org-babel-sh-eoe-indicator) (comint-send-input nil t)) ;; Remove `org-babel-sh-eoe-indicator' output line. 1)) "\n")))
The "session" condition has two parts: async and non-async.
Async handling comes first. The async API requires "registering" certain information: the session buffer, the org buffer the block lives in, a regexp for matching the beginning and end of evaluation, and a user-defined filter function. The registration process adds a special (non-user-defined) filter function to the session buffer (
org-babel-comint-async-filter
). This filter looks at output sent to the session buffer from the process for the end of evaluation regex.After the registration is made, the "start" indicator is sent, the block code sent, and finally the "end" indicator. Because this is all sent to a shell session, it runs in the background until the filter finds the "end" indicator. Until that happens, the result of the async branch is the uuid and that's what appears for the user as the result. (If the "end" indicator is found, the uuid is located and replaced with the text occurring between the "start" and "end" indicators. But that's all handled by things in ob-comint.el.)
Non-async sessions are handled by the next bit of code. Basically, it sends the block code to the session and waits for the
org-babel-sh-eoe-indicator
. How this happens is determined by code in ob-comint.el.[X]
org-babel-sh-evaluate-varlist-results-shebang
;; External shell script, with or without a predefined ;; shebang. ((org-string-nw-p shebang) (let ((script-file (org-babel-temp-file "sh-script-")) (padline (not (equal "no" (cdr (assq :padline params)))))) (with-temp-file script-file (insert shebang "\n") (when padline (insert "\n")) (insert body)) (set-file-modes script-file #o755) (if (file-remote-p script-file) ;; Run remote script using its local path as COMMAND. ;; The remote execution is ensured by setting ;; correct `default-directory'. (let ((default-directory (file-name-directory script-file))) (org-babel-eval (file-local-name script-file) "")) (org-babel-eval script-file ""))))
Check if shebang is non-empty; check for padline Write shebang and body to temp file
The temporary script file is made executable with 755 permissions. This corresponds to
chmod a+rwx,o-w
which sets permissions so that the group and owner can read, can write and can execute. Others can read and execute, but can't write.Check if the script file is remote. This is because, ultimately,
process-file
is called. Two things must happen:- strip the remote information from the filename
- make sure the file runs on the remote
process-file
says to usefile-local-name
for remote files. This strips the remote information from the filename. However, without this information,process-file
will execute the file on the local machine. We change this behavior by setting thedefault-directory
to make the file relative to the remote machine.For example, when processing a remote file, the
script-file
variable looks like:"/ssh:ahab@localhost:/tmp/sh-script-3qPNNc"
The
file-local-name
strips the "ssh" stuff off the front to make it:/tmp/sh-script-3qPNNc
. Originally, the file was relative to "/ssh:ahab@localhost:/tmp/
". Afterfile-local-name
, it looks like it's relative to "/tmp" (on the local machine).org-babel-eval
then tries to run the result offile-local-name
. Without setting thedefault-directory
, it uses the currentdefault-directory
. This is the the localdefault-directory
which is "/tmp". When this happens, the script can't be found. We must set thedefault-directory
to the remote (/ssh:ahab@localhost:/tmp/
). Then it's found.[X]
org-babel-sh-evaluate-varlist-results-general
(t (org-babel-eval shell-file-name (org-trim body))))))
This handles any of the cases not covered.
It trims the body and runs it through
process-file
. See "How Shell works at a deep level" in the Mailing list items.[X]
org-babel-sh-evaluate-varlist-return
(when (and results value-is-exit-status) (setq results (car (reverse (split-string results "\n" t))))) (when results (let ((result-params (cdr (assq :result-params params)))) (org-babel-result-cond result-params results (let ((tmp-file (org-babel-temp-file "sh-"))) (with-temp-file tmp-file (insert results)) (org-babel-import-elisp-from-file tmp-file)))))))
Shells can return two types of values, exit codes and standard output.
The Babel results are the exit code when using ":results value":
echo "hi"
The results look like:
"hi 0 "
(car (reverse (split-string "hi\n0" "\n" t)))
The final branch takes the result-params (which control output formatting) and
echo "hello" echo "world"
org-table-result-cond
handles the different result cases. For a table, it needs a sexp. It gets this fromorg-babel-import-elisp-from-file
:(("hello") ("world"))
This is what's returned to the caller. Why? That's not exactly clear. But there you have it.
[3/17]
defun org-babel-sh-evaluate: Questions
[ ]
Why does this function exist?[ ]
What combinations exist and are they represented?
Possible headers:
- stdin
- cmdline
- stdin
- session
- async
- shebang
- default
;; org-sort ;; delete-duplicate-lines (defun powerset (x) (if (null x) (list '()) (let ((elt (car x)) (p (power-set (cdr x)))) (append (mapc (lambda (s) (cons elt s)) p) p)))) (powerset '(stdin cmdline stdin session async shebang ))
[ ]
What is the use of running a block as a shebang?
The sole purpose (AFAIU) of a shebang is to run a script as a command. The purpose of Babel is for literate programming. Does literate programming have any overlap with running a script as a command?
A command is a way to execute code. In Babel, executing a block is the way to execute the code. A shebang, AFAIU, doesn't have any present inherent change in execution. The shebang header does affect output within Babel because of implementation details. However, that's not a part of shebangs themselves.
What are the different was to execute a script?
- run as a command (shebang)
- passing the script as an argument (
sh my-script.sh
) - passing it as a command string (
bash -c "code stuff"
) - sourcing it
What are the differences between them?
AFAIU, (1) and (2) are the same. (3) doesn't work well with multiple lines. (4) modifies the current environment.
What am I missing? Because (1) and (2) are the same, it seems to me that the shebang option could be deprecated and removed.
[X]
Is a block actually run as a shebang/command?
Yes. The call made is:
(apply #'process-file (if shebang (file-local-name script-file) shell-file-name) ...
If the
:shebang
header is specified, then the temporary script file name is passed in as the PROGRAM argument forprocess-file
.[ ]
What is required to make the toggles for options consistent?
Async depends on 1)
:sessions
not being "none" and 2) the:async
header not being "no". The following is run async:echo "hello" sleep 3 echo "world"
I think ob-python uses "yes". What are the toggles for other languages?
[ ]
How does the logic for
value-is-exit-status
work?There was big discussion in 2020 about the meaning of output and value and about which should be default (if any).
See https://orgmode.org/list/CA+A2iZaziAfMeGpBqL6qGrzrWEVvLvC0DUw++T4gCF3NGuW-DQ@mail.gmail.com
At that time, in commit 07c6b1125, the current logic was put in place:
- (value-is-exit-status (or (equal '("replace") results-params) - (member "value" results-params))) + (value-is-exit-status + (or (and + (equal '("replace") results-params) + (not org-babel-shell-results-defaults-to-output)) + (member "value" results-params)))
It says,
If "replace" and not default to output -> t Elseif value appears -> t Else nil
[ ]
Why only check "default to output" for "replace" in
value-is-exit-status
?What is
org-babel-shell-results-defaults-to-output
?org-babel-shell-results-defaults-to-output is a variable defined in 'ob-shell.el'.
Let shell execution defaults to ":results output".
When set to t, use ":results output" when no :results setting is set. This is especially useful for inline source blocks.
When set to nil, stick to the convention of using :results value as the default setting when no :results is set, the "value" of a shell execution being its exit code.
What is "replace?"
'replace' Default. Insert results in the Org buffer. Remove previous results. Usage example: ':results output replace'.
We want "value" to correspond to "exit code" and "output" to correspond to "stdout".
The default Org Babel behavior is "value". However, "value" is not a good default for ob-shell. So, the setting
org-babel-shell-results-defaults-to-output
is set to t, making "output" the default.So, when
org-babel-shell-results-defaults-to-output
is nil, results should be "value."Why is "replace" checked at all? It's not clear to me, but the results aren't right if it's removed. The logic for this needs to be cleaned up…
[ ]
How does
with-connection-local-variables
work?Within the stdin/cmdline branch, what does
with-connection-local-variables
do?It applies "connection-local-variables", yes. I suppose those are set elsewhere?
[ ]
Should other branches use
with-connection-local-variables
?Might this help with some of the remote file handling? Or is this just a convenience? Should other branches get the same convenience?
[ ]
When stdin/cmdline branch called without shebang, does
shell-command-switch
always work?It looks like
shell-command-switch
is always set to "-c". This is bash specific.shell-command-switch
is only called in this branch. So, it's being used for all shells. This should be set in the initialize closure for each shell (if such a flag exists).[X]
Is a check done on the
:shebang
?No. It appears to just run and fail silently.
On Android, at least, "Code block produced no output."
#+begin_src sh :shebang #!/banana :cmdline first second echo "$2" #+end_src #+RESULTS:
Note stdin or cmdline must be given in order for the stdin/cmdline branch to apply.
[ ]
Are the file permissions appropriate?
For Stdin/Cmdline, they are made executable with 755 permissions. This corresponds to
chmod a+rwx,o-w
which sets permissions so that the group and owner can read, can write and can execute. Others can read and execute, but can't write.[ ]
Why is
:padline
checked in stdin/cmdline?The manual says,
The 'padline' header argument controls insertion of newlines to pad source code in the tangled file.
So, why is there any handling for padlines here?
The padline is only used after inserting the shebang. Is a newline sometimes (but not always) needed after a shebang?
[ ]
Why does the async user define the uuid and call the start/end indicators?
This is really a question for ob-comint.el.
[ ]
Why does the shebang branch check if the script file is remote?
Is it for when the Org buffer/file is being run on a remote machine?
Are there other places this should be done?
[ ]
For shebang branch, could the remote file checking logic clean up?
Maybe.
This logic was introduced in 39de4a18:
commit 39de4a1848d12b1be929853bf884ec04e121d9f0 (HEAD) Author: Ihor Radchenko <yantar92@posteo.net> Date: Sun Jul 23 17:38:09 2023 +0300
org-babel-sh-evaluate: Fix edge case when :shebang is set and :dir is remote
- lisp/ob-shell.el (org-babel-sh-evaluate): Pass remote local file
name as command when executing script with :shebang. `org-babel-eval' will fail when SCRIPT-FILE is TRAMP file name.
Link: https://www.reddit.com/r/orgmode/comments/14zh2yi/orgbabel_shebang_not_working_on_with_tramp_dir/
The following is a function I made that probably isn't necessary.
(defun my-file-name-directory (filename) "Like `file-name-directory' but works with remote files." (interactive) (let* ((remote-connection (file-remote-p filename)) (end (if remote-connection (match-end (string-match remote-connection filename))))) (file-name-directory (substring filename end)))) (my-file-name-directory ;; "/ssh:ahab@localhost:~/" "/tmp/org-mode/lisp/ox.el" ;; "~/delete-me" )
[X]
Is
org-babel-import-elisp-from-file
necessary for results?No, there are other ways. I was not able to find anything as simple as "convert-results-to-table(txt)". Everything seems to require gymnastics.
org-babel-import-elisp-from-file
requires writing to disk simply to convert a the results into a sexp.For the following,
echo "hello" echo "world"
the "results" look like,
"hello world "
It's not clear to me if there's a function to take the results string and convert it to a table. There are several functions that kind of do something like that,
org-table-convert-region
org-table-create-or-convert-from-region
- the whole
org-babel-import-elisp-from-file
danceob-shell
does
The following works:
(with-temp-buffer (insert "hello\nworld\n") (org-table-convert-region (point-min) (point-max)) (buffer-substring-no-properties (point-min) (point-max)))
Notice that it does the transformation in place and therefore requires manually grabbing the result and returning it.
Maybe this should be a function?
Fun fact, there are two "create-table-or-convert-from-region" functions in
org-table.el
:(defun org-table-create-or-convert-from-region (arg) "Convert region to table, or create an empty table. If there is an active region, convert it to a table, using the function `org-table-convert-region'. See the documentation of that function to learn how the prefix argument is interpreted to determine the field separator. If there is no such region, create an empty table with `org-table-create'." (interactive "P") (if (org-region-active-p) (org-table-convert-region (region-beginning) (region-end) arg) (org-table-create arg))) (defun orgtbl-create-or-convert-from-region (_arg) "Create table or convert region to table, if no conflicting binding. This installs the table binding `C-c |', but only if there is no conflicting binding to this key outside `orgtbl-mode'." (interactive "P") (let* (orgtbl-mode (cmd (key-binding "\C-c|"))) (if cmd (call-interactively cmd) (call-interactively 'org-table-create-or-convert-from-region))))
It seems like this might be because there are different table types?
;;; Commentary:
;; Watch out: Here we are talking about two different kind of tables. ;; Most of the code is for the tables created with the Org mode table editor. ;; Sometimes, we talk about tables created and edited with the table.el ;; Emacs package. We call the former org-type tables, and the latter ;; table.el-type tables.
It feels like this is typical for the Org code base. Functions take weird arguments and work in ways that seem based simply on the functions that were available, regardless of whether it made any sense. They're not designed to be composable. So, you end up needing to do a bunch of weird transformations and understand the mechanics of how the underlying system works. Basically, there's no abstraction.
My impression is that this makes maintenance harder. For example, I want to focus on shell functionality. My brain is filled with the Chapter 18 level of kernel driver books dealing with terminal I/O and here I am having to also deal with table rendering when there's no fundamental reason why it can't be a function that takes a string. That makes me grumpy.
This makes it harder to get maintainers because maintaining one section is not possible. You need a deep understanding of each part of the code base.
[0/7]
defun org-babel-sh-evaluate: Refactor
[ ]
Get rid of this function
It makes no sense. The necessary variables (session, cmdline, async, etc.) are randomly grabbed within this function and it's caller. The caller,
org-babel-execute:shell
is the one needed by the Org Babel API. Both functions do similar things, like prep for the type of execution and preparing the results. There is no clear reason for either to exist. It's not like one does the subprocess call and the other assembles results. No! Both do a bit of that.[ ]
Document the arguments[ ]
Make the arguments consistent
For some reason the arguments are inconsistent. Some are provided directly (session, stdin, and cmdline), others through the params. Both of the direct ones come from params, yet are obtained in the caller function,
org-babel-execute:shell
, and are only used to callorg-babel-sh-evaluate
.[ ]
Fix
value-is-exit-status
This is part of the "Get rid of this function" argument.
Basically,
- Don't split the functionality of getting the exit code between two functions
- Don't use only "echo $?" to get the exit status (what about "unsupported" shells?)
value-is-exit-status
is a toggle to return the exit code. It works like this:A block is executed and
org-babel-execute:shell
is called.value-is-exit-status
is determined. When it's true, "\necho $?" is appended to the block body. Thenorg-babel-sh-evaluate
is executed.value-is-exit-status
is again figured out. When it's true, then the result is parsed for the last value (which is the exit code).Maybe I need to think more, but I don't see why this needs to be split between two functions.
Also, the exit code is obtained using "\necho $?". I'm pretty sure that's only valid for Bourne-like shells (I have not checked).
[ ]
Fix docstring for
org-babel-shell-results-defaults-to-output
Says "Let shell execution defaults to ":results output"."
Should be something like
"Let shell execution default to ":results output".
or
"Set default execution to ":results output""
[ ]
Make the paths for stdin, cmdline, and shebang more clear[ ]
Move functionality of
org-babel-sh-evaluate-varlist-return
elsewhereThe evaluation should happen separately from the formatting of the results of the evaluation. Right now, the logic for formatting is split between two functions (and that logic appears unnecessarily complex).
DONE defun org-babel-execute:shell
(defun org-babel-execute:shell (body params) "Execute Shell BODY according to PARAMS. This function is called by `org-babel-execute-src-block'." (let* ((session (org-babel-sh-initiate-session (cdr (assq :session params)))) (stdin (let ((stdin (cdr (assq :stdin params)))) (when stdin (org-babel-sh-var-to-string (org-babel-ref-resolve stdin))))) (results-params (cdr (assq :result-params params))) (value-is-exit-status (or (and (equal '("replace") results-params) (not org-babel-shell-results-defaults-to-output)) (member "value" results-params))) (cmdline (cdr (assq :cmdline params))) (full-body (concat (org-babel-expand-body:generic body params (org-babel-variable-assignments:shell params)) (when value-is-exit-status "\necho $?")))) (org-babel-reassemble-table (org-babel-sh-evaluate session full-body params stdin cmdline) (org-babel-pick-name (cdr (assq :colname-names params)) (cdr (assq :colnames params))) (org-babel-pick-name (cdr (assq :rowname-names params)) (cdr (assq :rownames params))))))
The fundamental entry point for ob-shell
when a source block is
evaluated.
When a user executes a block, the ob-core
function
org-babel-execute-src-block
is called. It dynamically binds
"org-babel-execute:lang" to a variable "cmd" where "lang" corresponds
to whatever follows "#+beginsrc" in the block header ("lang"
corresponds to what we've been calling "template").
;; org-babel-execute-src-block ... d)))) (cmd (intern (concat "org-babel-execute:" lang))) ; dynamically bind symbol result exec-start-time) ... (setq exec-start-time (current-time) result (let ((r (save-current-buffer (funcall cmd body params)))) ; call org-babel-execute:lang (if (and (eq (cdr (assq :result-type params)) 'value) ...
Recall that org-babel-shell-initialize
generated functions called
"org-babel-execute:template", one for each of the shells given in
org-babel-shell-names
. Each of these calls
org-babel-execute:shell
.
BODY is the inside of the source block. PARAMS is an alist (see below) of header arguments and other information.
This is function doesn't have a clear purpose. It just happens to be the entry point.
It parses the block header and passes some of these to is the eventual
execution function org-babel-sh-evaluate
. However, it also passes
the full params which is re-parsed out again.
It sort of handles assembling results. For example, it reassembles
the table. However, some of that is also done in
org-babel-sh-evaluate
.
[1/3]
defun org-babel-execute:shell: Questions
[ ]
Theorg-babel-execute:template
functions andorg-babel-execute:shell
are all related to dispatch. How might we dispatch differently and are any of the alternatives better?[X]
Why does the docstring say "This function is called by `org-babel-execute-src-block'."?
It seems to me that it says this because it's trying to explain how the Org Babel API functions. Unfortunately, how the Org Babel API functions is rather opaque.
[ ]
What is params?
Params is a data structure whose specific structure depends on the context. PARAMS, as seen within
org-babel-execute:shell
, stores information taken from the source block header as an alist:;; What's stored in the 'params' parameter: ((:colname-names) (:rowname-names) (:result-params "replace") (:result-type . value) (:results . "replace") (:exports . "code") (:session . "none") (:cache . "no") (:noweb . "no") (:hlines . "no") (:tangle . "no")) ;; Typical access pattern: (cdr (assq :results '((:colname-names) (:rowname-names) (:result-params "replace") (:result-type . value) (:results . "replace") (:exports . "code") (:session . "none") (:cache . "no") (:noweb . "no") (:hlines . "no") (:tangle . "no")) )) ; "replace"
Note that PARAMS is defined differently elsewhere! Some of these docstring descriptions, such as for
org-babel-variable-assignments
, are probably incorrect in calling it a plist.According to
org-babel--get-vars
,PARAMS is a quasi-alist of header args, which may contain multiple entries for the key `:var'. This function returns a list of the cdr of all the `:var' entries."
According to
org-babel--file-desc
,PARAMS is header argument values.
According to
org-list--to-generic-plain-list
,PARAMS is a plist used to tweak the behavior of the transcoder.
According to
org-babel-tangle--unbracketed-link
,The PARAMS are the 3rd element of the info for the same src block.
According to
org-babel-comint-use-async
,PARAMS are the header arguments as passed to `org-babel-execute:lang'.
According to
org-babel-variable-assignments:plantuml
,PARAMS is a property list of source block parameters, which may contain multiple entries for the key `:var'. `:var' entries in PARAMS are expected to be scalar variables."
[0/2]
defun org-babel-execute:shell: Refactor
[ ]
Remove the statement "This function is called by `org-babel-execute-src-block'." from the docstring. Although a true statement, it's also true that this function is called by all of theorg-babel-execute:template
functions.[ ]
Give this function a clear division of responsiblity
DONE defun org-babel-shell-initialize
(defun org-babel-shell-initialize () "Define execution functions associated to shell names. This function has to be called whenever `org-babel-shell-names' is modified outside the Customize interface." (interactive) (dolist (name org-babel-shell-names) (let ((fname (intern (concat "org-babel-execute:" name)))) (defalias fname (lambda (body params) (:documentation (format "Execute a block of %s commands with Babel." name)) (let ((shell-file-name name)) (org-babel-execute:shell body params)))) (put fname 'definition-name 'org-babel-shell-initialize)) (defalias (intern (concat "org-babel-variable-assignments:" name)) #'org-babel-variable-assignments:shell (format "Return list of %s statements assigning to the block's \ variables." name)) (funcall (if (fboundp 'defvar-1) #'defvar-1 #'set) ;Emacs-29 (intern (concat "org-babel-default-header-args:" name)) nil)))
Several different shells are supported.
Each needs the following functions:
- An alias
org-babel-execute:template
fororg-babel-execute:shell
- An alias
org-babel-variable-assignments:name
fororg-babel-variable-assignments:shell
org-babel-default-header-args:name
These allow people to use any of the org-babel-shell-names
as the
source block language. For example, the ob-core
functionality that
runs the source block (org-babel-execute-src-block
) expects a
"org-babel-execute:template" function.
The way it works is this. It iterates through the
org-babel-shell-names
list. For each element, it defines a symbol
for org-babel-execute:template
according to the current list item
(name). It makes this symbol an alias for a function that calls
org-babel-execute:shell
such that shell-file-name
is assigned to
the current list item. For example, when processing "fish", a call to
org-babel-execute:fish
will call org-babel-execute:shell
yet any
call to an inferior shell process will use "fish" rather than the
shell stored in the SHELL environment variable at start up. (It's
assumed that the command, such as "fish", is accessible through PATH).
A "put" call makes org-babel-execute:template
accessible from the built
in help. It next creates an alias for
org-babel-variable-assignments:name
corresponding to
org-babel-variable-assignments:shell
. Finally, it creates a symbol
for org-babel-default-header-args:name
, using a different function
(either set
or defvar-1
) depending on the Emacs version.
[1/1]
defun org-babel-shell-initialize: Questions
[X]
Is there a better way to do this?
This sets off my alarms because it feels too clever.
- probably complex for someone coming from a non-lispy background,
creates a closure to modify
shell-file-name
, meta-programming - creates (what feels like) unnecessary dependencies between
org-babel-shell-names
andorg-babel-shell-initialize
- hints at being an end-user function when it shouldn't be used that way
- assumes that the things it's automating are all the same
tl;dr Make
org-babel-shell-initialize
take a list parameter, rather than operating onorg-babel-shell-names
as a side effect.It has immediate problems when learning the codebase, as written about above for
org-babel-shell-names
. It's feels convoluted to me. Understandingorg-babel-shell-names
requires knowing howorg-babel-shell-initialize
works.org-babel-initialize
forcesorg-babel-shell-names
to be defined twice.It seems like this function is trying to provide end-users with an easy way to update
ob-shell
with a new shell during run-time. That's cool and all…but that should be the job of theob-shell
maintainer. It's not simple supporting different shells, even if they're similar. What benefit is there in giving end-users a half-baked way to hang themselves?Further, I doubt many end-users are using
ob-shell
with a custom shell not supported. There's one exception: cmd.exe on Windows. However, adding "cmd.exe" or "cmdproxy.exe" toorg-babel-shell-names
doesn't really work well. Nor is it necessary: you can just use "shell".I think it's the automation part that concerns me most. It's the fear of "What if there's an exception that doesn't fit into the automation". My non-fear brain says, "Has that been a problem since this was implemented? How hard would it be to change it when a problem arises that requires such a change?" Leaving the implementation as is costs nothing (but conceptual energy). Re-working it will likely be easier later, if and when, a problem appears since there will be a clear goal.
Part of me thinks it would be better to define
org-babel-shell-names
once, populated with all the shells and then to manually callingorg-babel-shell-initialize
to define the execution functions, rather than forward declareorg-babel-shell-names
and using the defcustom call to trigger the initialization.Maybe make
org-babel-shell-initialize
take a parameter which is a list of shells to initialize? In this way, it would break the dependency onorg-babel-shell-names
. Then we could declareorg-babel-shell-names
once and have it call the initialize function.- Make
org-babel-shell-initialize
take a parameter which a list of shells to initialize - Define the list
org-babel-shell-names
- Set
org-babel-shell-names
to callorg-babel-shell-initialize
when set.- Can defcustom refer to itself during the set call? Yes! It seems so.
- probably complex for someone coming from a non-lispy background,
creates a closure to modify
[0/3]
defun org-babel-shell-initialize: Refactor
[ ]
Makeorg-babel-shell-initialize
private. No one but the maintainer should be running that.[ ]
Document the interaction betweenorg-babel-shell-initialize
andorg-babel-shell-names
[ ]
Makeorg-babel-shell-initialize
functional- Make
org-babel-shell-initialize
take a parameter which is a list of shells to initialize - Define the list
org-babel-shell-names
using defcustom Set
org-babel-shell-names
to callorg-babel-shell-initialize
when set.For example,
(defun my-initialize (initialization-list) (interactive) (dolist (name initialization-list) (message "%s" (concat "my-automatically-created-symbol:" name)))) (defcustom my-name-list '("banana" "Rama") "List of names to be created by `my-initialize'" :group 'my-test :type '(repeat (string :tag "Name to create: ")) :set (lambda (symbol value) (set-default-toplevel-value symbol value) (my-initialize my-name-list)))
- Make
ob-shell undocumented behaviors
Problem: Undocumented functionality
Some of ob-shell.el
is dedicated to undocumented behavior. The
org-babel-load-in-session
function which is bound to the
org-metaup-hook
, M-up. The source body will be inserted in a new
session process buffer.
For this is the code path (M-up):
ob-core:org-babel-load-in-session
ob-shell:org-babel-prep-session:shell
Now documented, yet a good example of this, is the :dir
header.
There have been many questions on the mailing list regarding shell
blocks and ssh. The :dir
header works with TRAMP to create a remote
connection. This works seamlessly with Babel. However, it was not
clearly documented (the only example showed an R code block and is
located in a separate part of the manual).
Addendum
History
In the early days of Babel, collaboration notes were kept. Eric makes a distinction between (what he calls) "functional" and "imperative". It's not clear to me what's meant by these terms. AFAICT, "functional" means "non-persistent" and "imperative" means "persistent". "persistent" means here that when you run a block twice, you always get the same result with a "functional" block.
See:
- https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/contrib/babel/org-babel.org?id=c0554c775344207e6adaf191258312fb8d1b6a15
- https://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/org-babel.org?id=b1c103890c1523f99e380d88ed684454d902414e
On September 11, 2009, org-babel.org
was moved to contrib.
org-babel.org
was deleted in eec9c235 and its contents put into a
new file babel.org
. It was deleted the next day 2d2a2a19.
Org Babel started as a package called "litorgy" around February 22, 2009. Support for shells, litorgy-shell.el, was added in commit e7701000 on March 31, 2009. The current file lisp/ob-shell.el was created in commit 583e7ab1 on December 13, 2013.
An incomplete history of ob-shell is here:
Date | Commit | Action |
---|---|---|
March 31, 2009 | e7701000 | litorgy-shell.el created |
May 24, 2009 | 2b9b2ee7 | renamed to org-babel/org-babel-shell.el |
May 24, 2009 | e82fb256 | (parent) |
May 24, 2009 | 8e9d3a51 | (parent) |
May 24, 2009 | 9247200b | renamed to lisp/org-babel-shell.el |
June 12, 2009 | b64be813 | renamed to lisp/langs/org-babel-shell.el |
June 14, 2009 | 4fd1abec | renamed to lisp/langs/org-babel-sh.el |
June 17, 2010 | e0e4d760 | renamed to lisp/babel/langs/ob-sh.el |
June 17, 2010 | 04152d3a | renamed to lisp/babel/langs/ob-sh.el (from contrib?) |
December 13, 2013 | 583e7ab1 | renamed from lisp/ob-sh.el to lisp/ob-shell.el |
Mailing list items
Backwards compatibility: https://list.orgmode.org/8734ur6dk4.fsf@localhost/T/#t
Discussion on refactoring all non-session calls to be run as scripts, including bugs to fix
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00137.html
- https://lists.gnu.org/archive/html/emacs-orgmode/2023-11/msg00142.html
How Shell works at a deep level
Bug report and patches for :cmdline failings.
Handling ANSI characters
Misc:
Refactoring tips
Don't rename right away
If you rename things right away, then your notes will be outdated. Save renaming until the very last moment.
Start at the lowest level, not the highest
Operate at the function or variable level, not the file level.