License: New BSD license
Owner: Theodore Test

IBM COBOL Pygments Lexer

In Brief: A Pygments lexer for syntax highlighting of IBM COBOL, a modern COBOL dialect. This is the IBM COBOL lexer used by Siafoo.
import re

from pygments.lexer import RegexLexer, include
from pygments.token import Text, Comment, Keyword, Name, String, Number, Punctuation


class IBM_COBOLLexer(RegexLexer):
    name = 'IBM_COBOL'
    aliases = ['ibm_cobol', 'IBM_COBOL']
    filenames = ['*.cbl', '*.CBL']
    mimetypes = []

    # COBOL is case-insensitive. RegexLexer's default flags are re.MULTILINE,
    # which the '^' in the comment rule relies on, so keep that flag as well.
    flags = re.IGNORECASE | re.MULTILINE

    # Really only five major sorts of highlighting --
    # Reserved Words, Comments, Numbers, Strings, and Punctuation.
    # Due to COBOL implementation details, even what modern languages consider 'operators'
    # are treated as Reserved Words.

    tokens = {
        'root': [
            # Whole-line comment: '*' is the first non-blank character of the line
            # (column 7 in fixed format, optionally after a 6-digit sequence number).
            # A bare '*' elsewhere is multiplication, not a comment.
            (r'^(?:\d{6})?[ \t]*\*.*\n', Comment),
            # Inline '*>' comment.
            (r'\*>.*\n', Comment),
            include('strings'),
            include('core'),
            include('nums'),
            # Operators, highlighted as reserved words (see the note above).
            (r'\*\*|>=|<=|[-+*/=<>]', Keyword),
            # Newlines are matched on their own so that the next token starts
            # at the beginning of a line, where the comment rule's '^' can match.
            (r'[^\S\n]+', Text),
            (r'\n', Text),
        ],
        'core': [
            # Lookarounds instead of \b: '-' counts as part of a word, so END-IF
            # matches as one keyword and a name like WS-COUNT is not split.
            (r'(?<![\w-])(FOREGROUND-COLOR|ROLLING|INDEX-3|END-UNSTRING|DB-DATA-NAME|REFERENCE-MONITOR|REPLACING|'
             r'SEGMENT-LIMIT|PROCEDURE-POINTER|DB-STATUS|GET|END-PERFORM|FROM|RIGHT|'
             r'RECURSIVE|RESERVE|REEL|COPY|ZERO-FILL|AT|RELATION|'
             r'ALPHANUMERIC-EDITED|QUOTES|TIMES|VALUE|NONE|KANJI|MEMORY|'
             r'I-O-CONTROL|SUPPRESS|NUMBER|SPACE|PROCEED|TYPEDEF|DIVISION|'
             r'STATUS|END-ADD|ASSIGN|THAN|BIT|DECLARATIVES|'
             r'TEST|COMMA|PROCESSING|ALSO|FIND|WHEN|ARITHMETIC|'
             r'ADDRESS|END-EVALUATE|ROUNDED|SORT-MERGE|SUB-SCHEMA|TRAILING|'
             r'NULL|SPACES|PREFIX|LOCAL-STORAGE|REMOVAL|EQUAL|CHARACTER|'
             r'LOCALE|DEPENDING|INVOKE|SECURE|USE|ZEROES|OBJECT-COMPUTER|'
             r'USING|KEY|PRESENT|EXIT|BOOLEAN|UNDERLINE|'
             r'SKIP3|GREATER|FALSE|PROGRAM-ID|WAIT|TRUE|END-XML|'
             r'EMPTY-CHECK|CONVERTING|CLOSE|EBCDIC|EMPTY|ADD|END-DIVIDE|'
             r'WRITE|USAGE|B-LESS|CRT-UNDER|DUPLICATES|PROMPT|LENGTH-CHECK|'
             r'INDEX-8|BITS|CONFIGURATION|COMMIT|TRAILING-SIGN|STORE|COMPUTATIONAL-0|'
             r'INDICATE|ASCENDING|DBCS|PRIOR|COLUMN|NO|INITIALIZE|'
             r'CRT|SET|DEBUGGING|REALM|DISPLAY|OPEN|REPLACE|'
             r'HIGH-VALUE|HIGHLIGHT|COMP-6|REFERENCES|RETURN-CODE|DBCS-EDITED|STOP|'
             r'SECTION|FOREGROUND-COLOUR|DEFAULT|INDEX|DEBUG-LINE|DESCRIBED|'
             r'DATE-COMPILED|END-START|LEFT-JUSTIFY|TIME|THROUGH|SEQUENTIAL|END-INVOKE|'
             r'INITIATE|I-O|UP|LOW-VALUES|ATTRIBUTE|START|REFERENCE|'
             r'FILES|SIZE|END-ACCEPT|BACKGROUND-COLOUR|ERROR|CONTAINED|UPDATE|'
             r'SYMBOLIC|ALTERNATE|JUST|END|FUNCTION|ADVANCING|INTO|'
             r'BLANK|DB-FORMAT-NAME|ORDER|LEADING|ALPHABETIC-UPPER|CONNECT|DB-ACCESS-CONTROL-KEY|'
             r'BLINK|DECIMAL-POINT|DYNAMIC|REQUIRED|TO|FILLER|'
             r'END-IF|CURSOR|RIGHT-JUSTIFY|STRING|BLOCK|OVERFLOW|EXCEEDS|'
             r'DOWN|IDENTIFICATION|ACCEPT|MODULES|COMP-1|SORT-RETURN|DELETE|'
             r'OR|OUTPUT|READY|VARYING|END-REWRITE|LENGTH|PROCESS|'
             r'PARSE|CLASS|REVERSE-VIDEO|CONTAINS|OF|CURRENT|BINARY|'
             r'END-CALL|BEFORE|DISPLAY-9|DISPLAY-8|DISPLAY-7|DISPLAY-6|SYSIN|'
             r'DISPLAY-4|REWRITE|DELIMITER|DISPLAY-1|COMPUTATIONAL-5|COMPUTATIONAL-4|COMPUTATIONAL-7|'
             r'END-READ|COMPUTATIONAL-1|REVERSED|COMPUTATIONAL-3|COMPUTATIONAL-2|ON|EJECT|'
             r'COMPUTE|GLOBAL|RECORD-NAME|DB-SET-NAME|LINKAGE|NULL-KEY-MAP|'
             r'SUBFILE|END-DISPLAY|EVALUATE|END-OF-PAGE|SUBSTITUTE|EVERY|KEEP|'
             r'PIC|RUN|BOTTOM|BY|SUBTRACT|LINE|UNEQUAL|'
             r'MULTIPLY|CONTROL-AREA|FILE|INDEX-1|PROTECTED|DUPLICATE|INDEX-5|'
             r'INDEX-4|WHEN-COMPILED|RERUN|SEPARATE|DEBUG-NAME|MULTIPLE|'
             r'TRANSACTION|RESET|DISPLAY-5|DISPLAY-2|POINTER|OMITTED|ARE|'
             r'CALL|DISPLAY-3|ACCESS|SAME|NEGATIVE|EXCLUSIVE|TAPE|'
             r'MERGE|ALPHABETIC-LOWER|IF|ID|INSTALLATION|CANCEL|IN|'
             r'INVALID|ZEROS|OTHER|ALIAS|IS|AUTO|FINISH|'
             r'PICTURE|CONSOLE|RANDOM|COMP-8|END-SUBTRACT|SCREEN|WITHIN|'
             r'FREE|PROGRAM|MODE|CONTINUE|RECORD|COMP-7|INDIC|'
             r'COMP-5|COMP-4|COMP-3|COMP-2|EXTERNALLY-DESCRIBED-KEY|COMP-0|RENAMES|'
             r'LD|BEEP|CONTROL|DATA|NULL-MAP|DELIMITED|DATE|'
             r'CHARACTERS|CONTROLS|COMPUTATIONAL-9|REPEATED|LINAGE|DB-EXCEPTION|COMPUTATIONAL-8|'
             r'LINES|TOP|REPOSITORY|EXCEPTION|MOVE|SD|'
             r'PAGE|TYPE|VALID|OWNER|REWIND|DEBUG-CONTENTS|INSPECT|'
             r'HIGH-VALUES|LIKE|CORRESPONDING|ALL|RETRIEVAL|DESCENDING|'
             r'ZERO|CONTENT|STANDARD|SPACE-FILL|COMMITMENT|LOCK|CLOCK-UNITS|'
             r'SORT|LEFT|AUTO-SKIP|OCCURS|DEBUG-SUB-3|DEBUG-SUB-2|ORGANIZATION|'
             r'WORDS|BACKGROUND-COLOR|ERASE|COLLATING|MEMBER|UNTIL|AREA|'
             r'USAGE-MODE|DROP|ALPHABET|RETURNING|DAY-OF-WEEK|PACKED-DECIMAL|END-COMPUTE|'
             r'NATIVE|COL|METACLASS|COMP-9|FD|STANDARD-1|END-STRING|'
             r'STANDARD-2|NUMERIC-EDITED|WORKING-STORAGE|READ|TITLE|END-WRITE|B-NOT|'
             r'THEN|COMP|REMAINDER|FIRST|NOT|TERMINAL|INPUT-OUTPUT|'
             r'ACQUIRE|POSITIVE|GIVING|SPECIAL-NAMES|DB-RECORD-NAME|AND|UNIT|'
             r'GENERATE|OPTIONAL|TALLYING|INPUT|INDEXED|EXTERNAL|END-MULTIPLY|'
             r'EQUALS|ALPHANUMERIC|ELSE|B-EXOR|NULLS|DB|SEQUENCE|'
             r'FOR|DIVIDE|ROLLBACK|END-DELETE|COUNT|STARTING|VLR|'
             r'SECURITY|POSITION|INDICATORS|COMMON|FILE-CONTROL|RECORDS|ALTER|'
             r'LIBRARY|MODIFIED|SOURCE-COMPUTER|WITH|AUTHOR|END-SEARCH|'
             r'SEARCH|LINAGE-COUNTER|SELECT|JUSTIFIED|PROCEDURES|REDEFINES|SKIP1|'
             r'CURRENCY|SKIP2|LABEL|SYNC|UPON|INITIAL|GOBACK|'
             r'BELL|SYNCHRONIZED|INDEX-2|RETURN|CODE|CODE-SET|LOCALLY|'
             r'AREAS|B-AND|SHARED|ALPHABETIC|VALUES|INDEX-7|ONLY|'
             r'PADDING|LOW-VALUE|INDEX-6|INDEX-9|AFTER|FILE-STREAM|COMPUTATIONAL|'
             r'COMPUTATIONAL-6|PROCEDURE|AUTOMATIC|ENTRY|GO|ENTER|MODIFY|'
             r'RELATIVE|SIGN|FOOTING|NATIONAL|DISCONNECT|SENTENCE|DAY|'
             r'OFF|FULL|END-RETURN|DEBUG-ITEM|B-OR|THRU|RETAINING|'
             r'RECONNECT|TENANT|FETCH|PRINTING|NEXT|NUMERIC|LESS|'
             r'QUOTE|LAST|DATE-WRITTEN|DEBUG-SUB-1|UNSTRING|VALIDATE|SYSOUT|'
             r'EOP|ENVIRONMENT|RELEASE|OBJECT|NO-ECHO|PERFORM|FORMAT|'
             r'EXTEND|CORR|INDICATOR)(?![\w-])', Keyword),
            # User-defined names may contain hyphens (WS-COUNT).
            (r'[a-z][\w-]*', Name),
            # '(' ')' appear in PIC clauses and subscripts, ':' in reference modification.
            (r'[.,;:()]', Punctuation),
        ],

        'strings': [
            # COBOL escapes a quote inside a literal by doubling it: "IT""S".
            (r'"(?:[^"]|"")*"', String.Double),
            (r"'(?:[^']|'')*'", String.Single),
        ],

        'nums': [
            # Floats first. A '.' with no digit after it is the sentence
            # terminator, so "VALUE 10." lexes as Integer then Punctuation.
            (r'[+-]?\d*\.\d+(?:E[-+]?\d+)?', Number.Float),
            (r'[+-]?\d+', Number.Integer),
        ],
    }
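In COBOL, both reserved words (END-IF) and user-defined names (WS-COUNT) contain hyphens, and a regex `\b` treats `-` as a word break. The sketch below uses only Python's `re` module and a small, hypothetical subset of keywords to compare a `\b`-bounded alternation with hyphen-aware lookarounds, and to check COBOL's quote-doubling rule for string literals. It is an illustration, not part of the lexer above.

```python
import re

# A hypothetical handful of reserved words; the real list is much longer.
KEYWORDS = ['END', 'END-IF', 'IF', 'MOVE', 'TO', 'COUNT']
alternation = '|'.join(re.escape(k) for k in KEYWORDS)

# \b sees a boundary on each side of '-', so END-IF splits into END and IF,
# and the tail of the user name WS-COUNT is flagged as the keyword COUNT.
naive = re.compile(r'\b(?:%s)\b' % alternation, re.IGNORECASE)

# Hyphen-aware boundaries: a keyword may not touch a word character or '-'.
aware = re.compile(r'(?<![\w-])(?:%s)(?![\w-])' % alternation, re.IGNORECASE)


def keywords(pattern, line):
    """Return the substrings that `pattern` would highlight as keywords."""
    return [m.group(0) for m in pattern.finditer(line)]


line = 'IF WS-COUNT > 0 MOVE 1 TO WS-FLAG END-IF.'
print(keywords(naive, line))  # → ['IF', 'COUNT', 'MOVE', 'TO', 'END', 'IF']
print(keywords(aware, line))  # → ['IF', 'MOVE', 'TO', 'END-IF']

# COBOL escapes a quote inside a literal by doubling it, not with a backslash.
string_dq = re.compile(r'"(?:[^"]|"")*"')
print(string_dq.match('"IT""S" DISPLAY').group(0))  # → "IT""S"
```

With the lookaround version, the alternation order no longer matters: when END is followed by `-`, the lookahead fails and the engine backtracks to try END-IF.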


Comments

08 Aug 2008 at 03:26 AM by Stou S.
Woah, did you write this?
08 Aug 2008 at 09:16 AM by Theodore Test
Though to be completely accurate, technically I wrote a script which in turn wrote these COBOL lexers by parsing a page of various sorts of COBOL keywords into mildly-customized Pygments boilerplate. `The aforementioned page was here <http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref.doc/c0925395697.htm>`_
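The generator script itself isn't posted here. As a rough illustration of the idea only, here is a minimal sketch, assuming the keywords have already been scraped into a whitespace-separated string. The template and the helper names `keyword_alternation` and `generate_lexer_source` are hypothetical, not taken from the original script.

```python
import re
from string import Template

# Hypothetical boilerplate; the original script's template was not published.
LEXER_TEMPLATE = Template('''\
class ${cls}(RegexLexer):
    name = '${name}'
    aliases = ['${alias}']
    flags = re.IGNORECASE | re.MULTILINE
    tokens = {
        'root': [
            (r'(?<![\\w-])(${keywords})(?![\\w-])', Keyword),
            (r'\\s+', Text),
        ],
    }
''')


def keyword_alternation(raw):
    """Turn scraped, whitespace-separated keywords into one regex alternation."""
    # Deduplicate, then sort longest-first so END-IF is tried before END.
    words = sorted(set(raw.split()), key=lambda w: (-len(w), w))
    # re.escape protects keywords that contain regex metacharacters.
    return '|'.join(re.escape(w) for w in words)


def generate_lexer_source(cls, name, alias, raw):
    """Fill the boilerplate template and return Python source text."""
    return LEXER_TEMPLATE.substitute(
        cls=cls, name=name, alias=alias, keywords=keyword_alternation(raw))


source = generate_lexer_source('DemoCobolLexer', 'DEMO_COBOL', 'demo_cobol',
                               'IF END-IF END\nMOVE TO IF')
print(source)
```

Sorting longest-first keeps the generated alternation safe even if a later edit swaps the lookarounds for a plain `\b`, since END-IF is then tried before END.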
08 Aug 2008 at 09:05 AM by Theodore Test
Yup. As usual, Pygments makes lexer-writing pretty dang easy. Three quarters C/V work and one quarter regex common sense.

I haven't really tested it seriously, though, so there might be hidden blips.
08 Aug 2008 at 11:17 AM by Stou S.
Yup, if you find the right format data, you can easily automate the process. You should post your script and probably add some info to my lame "writing pygments lexers"... "article".

I'll play around with the COBOL lexer when I come back from DC16 and integrate it into the next build (or maybe sooner).
08 Aug 2008 at 05:35 PM by Theodore Test
Have fun at DC16 -- try and stay off the WoS! :)

Oh, and I caught a missing backslash escape in the COBOL lexers. Hopefully that'll be the only major bug.
08 Aug 2008 at 11:30 PM by Stou S.
I am tethered to my phone... which is sloooow but safe... and either way, if someone is 31337 enough to pull my data off the cell tower, they probably deserve to have it.
15 Aug 2008 at 04:16 PM by Stou S.
The Cobol lexers are now part of Siafoo... but the lexer names are "cobol", "cobol_ibm", "cobol_ile" since I felt it was more consistent.