Vim Tip #29: Working with Syntax Selection Strings

2020-06-13(Sat)

tags: Vim


Syntax highlighting is a nice feature of Vim, but what if you want to highlight a non-standard file? This can happen in plenty of circumstances: you use a standard programming language with a non-standard file extension, or you use an obscure language that Vim doesn't have a syntax file for, among many others. In my case, I keep detailed notes about every computer I maintain: what software has been installed, and what changes have been made to configurations. When something misbehaves a week or a month later, I can see what's been changed. It eventually occurred to me that it would be really useful to have syntax highlighting pick out the dates and commands.

Before I go any further, I'll point out that the authoritative source on this is Steve Losh's outstanding book Learn Vimscript the Hard Way, which is available online at https://learnvimscriptthehardway.stevelosh.com/. Look particularly at chapter 44 ("Detecting Filetypes") and chapters 45 through 47 on syntax highlighting. Although it's available online, he also sells it, and I would encourage you to buy a copy: it's the best-written technical book I've ever read. (No "full disclosure" needed here: I endorse the book because it's a great book; I don't know him and I'm not paid.) Nearly everything I say in the rest of this entry I learned from that book, and he probably explains it better.

It's also important to know that you'll need a basic understanding of regular expressions (regexes) to create syntax highlighting files.

Please understand that I'm not putting this forward as a glowing (or even remotely complete) example of how to do this: I know the regexes shown here are sloppy, and some of the choices are poor. Again, if you want to be properly educated on this, take the time to read the chapters mentioned above. This is just an example of how I tackled a small syntax highlighting problem.


To start highlighting a particular filetype, we need to tell Vim what filetype we're talking about. If it doesn't already exist, create the folder ~/.vim/ftdetect/ and put a file for your filetype inside it. The name should be descriptive, but isn't critical: Vim sources every file in this folder at start-up regardless of name (so long as it ends in .vim), so the name is mostly for your benefit.

" I called this file Installtxt.vim because my files are Install.<machine-name>.txt
au BufRead,BufNewFile Install*.txt set filetype=Installtxt

My linter (vim-vimlint) complains because I haven't wrapped this statement in an "autocommand group." That would normally be good advice, but Losh's book says "Vim automatically wraps the contents of ftdetect/*.vim files in autocommand groups for you, so you don't need to worry about it."

With this change in place, Vim's behaviour when you open an "Installtxt" file won't be noticeably different, because we haven't set up any syntax highlighting yet. There is one small but important difference, though: if you type :set filetype? Vim should now reply filetype=Installtxt.
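If :set filetype? doesn't report Installtxt, a couple of standard Ex commands can help narrow down why. These are typed at Vim's command line rather than saved in a file, and the comments describe what each one is for:

```vim
" Ask Vim where 'filetype' was last set; it should name your ftdetect file.
:verbose set filetype?

" Show whether filetype detection is on (ftdetect files are only read when it is).
:filetype
```

If the first command doesn't mention your ftdetect file, the autocommand probably didn't fire: check that the file name actually matches the Install*.txt pattern.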

The next step is to set up the syntax file. If it doesn't exist, create a folder ~/.vim/syntax/. In that folder, create a file with a name that matches the filetype:

" ~/.vim/syntax/Installtxt.vim
" Language: Install*.txt files
" Maintainer: Giles Orr

if exists('b:current_syntax')
    finish
endif

" syntax commands go here

let b:current_syntax = 'Installtxt'

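This doesn't do anything useful yet, but it's a good outline for a syntax file. The if block at the top is the standard guard: if a syntax has already been loaded for this buffer, the file stops immediately, and the let at the bottom records that this one has now been loaded. It may seem odd to include "Maintainer: Giles Orr" (isn't that unnecessarily obvious?), but I've ended up with syntax files by other people in this folder, and I think it's worth noting which ones I maintain.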

syntax match date  "^[2-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]:*"

Dates in these files are always left-aligned and of the form "2020-06-13:". That date regex could be a lot cleaner, but it works and it's "good enough," so I've left it as is. I'm also not terribly picky about what colour it ends up, or even which highlight group it's linked to, so I just picked "Delimiter" and ran with it:

highlight link date   Delimiter

Linking a date to "Delimiter" is semantically sloppy, but for a simple personal setup like this it really doesn't matter.
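For comparison, here is a tidier way to write the same date rule. It's a sketch that should match exactly the same text: [2-9]\d\{3} is Vim's counted-repetition form of [2-9][0-9][0-9][0-9], and \d matches a single digit. The highlight line uses highlight default link (often abbreviated hi def link), the form Vim's bundled syntax files usually use: it only creates the link if the group hasn't already been given highlighting, so a colour scheme or your vimrc can override it.

```vim
" Same pattern as the original date rule, written more compactly:
"   ^            the date must start the line
"   [2-9]\d\{3}  four-digit year beginning with 2-9
"   -\d\d-\d\d   two-digit month and day
"   :*           zero or more trailing colons, as before
syntax match date "^[2-9]\d\{3}-\d\d-\d\d:*"

" Only link if nothing (a colour scheme, your vimrc) has already
" set highlighting for the 'date' group.
highlight default link date Delimiter
```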

Here's the more challenging part: I wanted to highlight every command I put into the file. This is where it helps to be consistent (and/or a programmer): in these files, quotations (what a document says, or what a person says) are surrounded by double quotes, but commands (things like 'git fsck') are always surrounded by single quotes. That means anything in single quotes can be treated as a command. This is the first, simplistic attempt:

syntax region Command start=/'/ end=/'/

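Note that we're using the region keyword now rather than match. Briefly, a match is a single pattern, while a region runs from a start pattern to an end pattern; I'm not going into the details of the distinction here, because that's not what this entry is about.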

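This works ... after a fashion. The first thing I noticed was that it matched across paragraphs, which was wrong, so I added end=/\n\n/ (a region can have multiple start= and end= patterns). The next thing I noticed was that it matched on contractions like "I'm". The solution to that was to refine the original start= and end= patterns to start=/[^a-zA-Z]'/ and end=/'[^a-zA-Z]/, so that a quote only counts if it isn't touching a letter. Because those patterns need a character on the far side of the quote, I also added start=/^'/ and end=/'$/ to catch commands that begin or end a line. Finally, I added a refinement recommended by Steve Losh: skip=/\\./, which skips over a backslash followed by any character, so a backslash-escaped single quote doesn't end the region. The final product looks like this: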

syntax region Command start=/[^a-zA-Z]'/ start=/^'/ skip=/\\./ end=/'[^a-zA-Z]/ end=/'$/ end=/\n\n/
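If you'd rather not list separate start= and end= patterns for the beginning and end of a line, Vim's lookaround atoms can fold them together. This is a sketch rather than a tested replacement; it assumes a Vim recent enough to support \@<! (negative lookbehind) and \@! (negative lookahead), which any reasonably modern Vim should. A side benefit is that the start and end patterns match only the quote itself, so the neighbouring space or punctuation isn't pulled into the highlight.

```vim
" Hypothetical alternative to the final Command region:
"   \a\@<!'     a single quote NOT preceded by a letter
"               (this also matches at the start of a line)
"   '\a\@!      a single quote NOT followed by a letter
"               (this also matches at the end of a line)
"   skip=/\\./  a backslash plus any character never ends the region
"   end=/\n\n/  a blank line still ends the region
syntax region Command start=/\a\@<!'/ skip=/\\./ end=/'\a\@!/ end=/\n\n/
```

Contractions behave as before: the quote in "don't" has a letter on both sides, so it can neither start nor end a command.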

This works reasonably well. But I also sometimes put a command (usually a longer or more complex one) on a line by itself, for example:

$ git update-ref HEAD 57dc883015ae780c0ea877e3f566521b4323468e

I added another definition for "Command":

syntax region Command start=/^ *[$] / end=/$/

I quickly realized this wasn't enough. I use "#" to indicate a command executed as root and "$" for one run as a regular user, and sometimes I precede the # or $ with a directory name when the location matters to the command. So the final version looks like this:

syntax region Command start=/^ *[a-z/]*[#$] / end=/$/ end=/\n\n/

Notice this won't work with directories that have spaces or even upper-case letters in their names, and since the character class is only [a-z/], it also misses ~, digits, dots and hyphens. I've gotten away with this so far, but I use a Mac often enough that I'll have to fix it at some point.
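One possible starting point for that fix is to widen the character class. This is an untested sketch that would replace the final rule above: it accepts upper case, digits, ~, dots, underscores and hyphens in the directory prefix, but still doesn't handle spaces. It uses + instead of / as the pattern delimiter, a common choice in Vim syntax files, so the / inside the class can't be mistaken for the end of the pattern.

```vim
" Widened prompt pattern (a sketch, not a tested fix):
"   ^ *                 optional leading spaces
"   [~a-zA-Z0-9/._-]*   optional directory prefix
"   [#$]                root (#) or user ($) prompt, followed by a space
" Directories containing spaces still won't match.
syntax region Command start=+^ *[~a-zA-Z0-9/._-]*[#$] + end=+$+ end=+\n\n+
```

The trade-off is that the wider the prefix class gets, the more ordinary lines risk matching by accident: a note starting with something like "C# " would now be treated as a command.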

Again, we want to highlight the text, so I chose "String" this time:

highlight link Command String
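For reference, here is how the pieces above fit together in a single ~/.vim/syntax/Installtxt.vim. It's simply the rules from this entry assembled in order, with comments added; b:current_syntax is set to the filetype name, Installtxt, which is the usual convention (the guard at the top only checks that the variable exists, so the value doesn't change how it works).

```vim
" ~/.vim/syntax/Installtxt.vim
" Language: Install*.txt files
" Maintainer: Giles Orr

" Stop if a syntax has already been loaded for this buffer.
if exists('b:current_syntax')
    finish
endif

" Dates at the start of a line, e.g. 2020-06-13:
syntax match date  "^[2-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]:*"

" Inline commands wrapped in single quotes, e.g. 'git fsck'
syntax region Command start=/[^a-zA-Z]'/ start=/^'/ skip=/\\./ end=/'[^a-zA-Z]/ end=/'$/ end=/\n\n/

" Commands on a line of their own, after an optional directory and a # or $ prompt
syntax region Command start=/^ *[a-z/]*[#$] / end=/$/ end=/\n\n/

highlight link date    Delimiter
highlight link Command String

" Record that this syntax has been loaded.
let b:current_syntax = 'Installtxt'
```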

This is a basic and very incomplete introduction to the topic: I hope it helps someone.