The article title is somewhat deceptive, since what you will actually be doing is "hacking" a lexer.
A lexer is a state machine that scans the program text and marks specific sequences of characters with predefined tokens. If you were writing a compiler, the tokens would typically be used to build an Abstract Syntax Tree, a tree representation of the program, which would then be used to generate the compiler output. In Pygments' case, [from what I understand] the tokens are consumed directly by the formatter to generate the highlighted output.
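To make the token idea concrete, here is a minimal sketch (assuming Pygments is installed; the one-line input program is just an illustration) that prints the raw token stream PythonLexer produces:

```python
from pygments.lexers import PythonLexer

# get_tokens() yields (token_type, value) pairs for the input text;
# the formatter consumes exactly this stream to produce highlighted output.
for token_type, value in PythonLexer().get_tokens("x = 1"):
    print(token_type, repr(value))
```

Running this shows each character sequence paired with its token type, e.g. the identifier `x` comes out as a Name token and `1` as a Number token.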
In general, the lexer you are planning to write will fit into one of the following three categories:
- More builtins for an existing language
This would be useful for embedded languages such as Blender's Python.
- Custom extensions to a standard language
For lack of a better example, this type of lexer would be useful for something like J++, the polluted Java dialect Microsoft released in the late 1990s.
- A new language
Because the Pygments lexer hacking page is a very good resource, for now I will only provide you with information on adding builtins, below. If you need to do something more complicated, refer to the Pygments documentation.
Adding some extra builtins to your favorite language is by far the easiest thing you can do; it is basically a copy/paste job. It is done by subclassing the parent language's lexer and marking the correct 'keywords' as keyword tokens.
The following example from http://pygments.org/docs/lexerdevelopment/ illustrates the point very well:
from pygments.lexers.agile import PythonLexer
from pygments.token import Name, Keyword

class MyPythonLexer(PythonLexer):
    EXTRA_KEYWORDS = ['foo', 'bar', 'foobar', 'barfoo', 'spam', 'eggs']

    def get_tokens_unprocessed(self, text):
        for index, token, value in PythonLexer.get_tokens_unprocessed(self, text):
            if token is Name and value in self.EXTRA_KEYWORDS:
                yield index, Keyword.Pseudo, value
            else:
                yield index, token, value
Just add your keywords to the EXTRA_KEYWORDS list and you are set.
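To convince yourself the subclass works, you can compare its token stream against the stock PythonLexer. The snippet below is a minimal, self-contained sketch; `MyPythonLexer` and the single `spam` builtin are just illustrative names, not anything Pygments ships:

```python
from pygments.lexers import PythonLexer
from pygments.token import Name, Keyword

class MyPythonLexer(PythonLexer):
    # Hypothetical subclass with one extra builtin, 'spam', for illustration.
    EXTRA_KEYWORDS = ['spam']

    def get_tokens_unprocessed(self, text):
        for index, token, value in PythonLexer.get_tokens_unprocessed(self, text):
            if token is Name and value in self.EXTRA_KEYWORDS:
                yield index, Keyword.Pseudo, value
            else:
                yield index, token, value

# The stock lexer reports 'spam' as a plain Name token;
# the subclass reclassifies it as Keyword.Pseudo.
for _, token, value in MyPythonLexer().get_tokens_unprocessed("spam = 1"):
    print(token, repr(value))
```

Any formatter will then style `spam` the same way it styles other pseudo-keywords, with no further changes needed.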