Introduction
Columns++ is a plugin for Notepad++ which offers features for working with text and data arranged in columns, including an implementation of elastic tabstops, enhanced searching and sorting, column alignment and numeric calculations. Like Notepad++, Columns++ is released under the GNU General Public License (either version 3 of the License, or, at your option, any later version). Columns++ was first released by Randall Joseph Fellmy in 2023; you can find the source code on GitHub.
Columns++ uses the C++ Mathematical Expression Toolkit Library (ExprTk) by Arash Partow, which is released under the MIT license; JSON for Modern C++ by Niels Lohmann, which is released under the MIT license; the Boost.Regex library, which is released under the Boost Software License, Version 1.0; and some files from the Unicode Character Database, which is released under the Unicode License V3.
Purpose and limitations
Columns++ is designed to provide some helpful functions for editing text or data that is lined up visually in columns, so that you can make a rectangular selection of the column(s) you want to process.
The integrated implementation of Elastic tabstops works to line up columns when tabs are used as logical separators, including tab-separated values data files as well as any ordinary text or code document containing sections in which you want to line up columns easily using tabs. You can use this feature on its own or with the other functions in Columns++.
Columns++ is optimized for use with Elastic tabstops. It also works with files that use traditional, fixed tabs for alignment, or no tabs at all; however, you should ordinarily select only one column at a time in files that don’t use Elastic tabstops.
Columns++ is generally not helpful when columns do not line up visually, such as in comma-separated values files. However, Columns++ can convert between delimiter-separated values and tabbed presentation; and there are some features, particularly Search using numeric formulas in regular expression replacement strings and Sorting with custom criteria, which may be useful in documents that are not column-oriented.
Elastic tabstops can cause loading and editing to be slow for large files. By default, Elastic tabstops is automatically turned off for files over 1000 KB or 5000 lines. You can change these limits.
Elastic tabstops
Columns++ includes a new implementation of Nick Gravgaard’s Elastic tabstops. (Please note that as of this writing I have not communicated with Mr. Gravgaard about my implementation of his proposal, and no endorsement on his part is implied. — RJF)
The first item of the Columns++ menu enables or disables Elastic tabstops. Elastic tabstops stretches tabs so that columns line up to fit their content, using only a single tab to separate one column from the next.
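The stretching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plugin's implementation: it counts characters instead of measuring rendered text, it treats text after the last tab on a line as not belonging to a column, and the names `elastic_layout` and `min_gap` are invented for the example.

```python
def elastic_layout(lines, min_gap=2):
    """Replace tabs with spaces so tab-terminated cells on adjacent lines line up."""
    rows = [line.split("\t") for line in lines]
    # widths[r][c] will hold the padded width of tab-terminated cell c in row r.
    # The text after the last tab is not a cell, hence len(cells) - 1.
    widths = [[0] * (len(cells) - 1) for cells in rows]
    max_cols = max((len(w) for w in widths), default=0)
    for c in range(max_cols):
        r = 0
        while r < len(rows):
            if c >= len(widths[r]):
                r += 1  # this row has no cell in column c; it interrupts the column
                continue
            start = r
            # Collect the run of adjacent rows that all have a cell in column c.
            while r < len(rows) and c < len(widths[r]):
                r += 1
            block_width = max(len(rows[i][c]) for i in range(start, r)) + min_gap
            for i in range(start, r):
                widths[i][c] = block_width
    out = []
    for cells, row_widths in zip(rows, widths):
        padded = [cell.ljust(width) for cell, width in zip(cells, row_widths)]
        out.append("".join(padded) + cells[-1])
    return out
```

Because each column is sized per run of adjacent lines, a line with fewer tabs interrupts the column, and the lines after it are sized independently.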
This implementation of Elastic tabstops includes some options that were not part of the original proposal. These options can be accessed by using the Profile... menu option. There are three “built-in” profiles:
Classic | endeavors to reproduce precisely the behavior described in the proposal linked above. |
---|---|
General | ensures that leading tabs are always used for indentation, and are not lined up with elastic tabstops. |
Tabular | is suitable for tab-separated values files, in which the entire file is a single table with the values in each row separated by single tabs. |
You can select a profile from the drop-down box in the Elastic tabstops profile dialog. You can also change individual settings; choose some options to automatically enable a profile or disable elastic tabstops for different types of files; and save, rename or delete profiles.
Settings in an elastic tabstops profile
Along with the enabled or disabled status of elastic tabstops, the settings in an elastic tabstops profile are kept independently for each document you have open. These settings, which are available in the Elastic tabstops profile dialog, are:
Use leading tabs for indentation only; don't make them elastic. | When checked, this option treats tabs which occur at the beginning of a line, before the first non-tab character, as ordinary fixed-width tabs instead of elastic tabs. Without this option, a line with a tab used to line up a column of data cannot be followed by a line that uses tabs for indentation without an intervening blank line; otherwise, the first leading tab will expand to line up with the tab on the previous line. The disadvantage is that if you want an empty column at the beginning of a line, you must place a space before the first tab to make it line up with the next column. |
---|---|
Line up elastic tabstops throughout the entire document. | Normally elastic tabstops are positioned independently whenever a column is interrupted; that is, tabstops created by tabs that appear on adjacent lines are lined up, but they don’t “project through” lines with fewer (or no) tabs. This option indicates that a single set of tabstops is to be used for the entire document, so that columns line up even when intervening lines have fewer columns. |
Do not allow text following the last tab on a line to span columns. | Normally, text following the last tab on a line is not treated as belonging to a “column” at all. This makes sense for documents that mix text and tables. However, for documents that are entirely tabular but have omitted tabs at the end of lines where the final columns are blank, this option (along with the one above) is needed to keep things lined up properly. |
Override default/language tab size (used for indent or minimum): | Elastic tabstops uses the “tab size” in different ways depending on whether Use leading tabs for indentation only is checked: when checked, the tab size represents the number of spaces each leading tab indents, and it is otherwise ignored; when unchecked, it is the minimum space between any two tabstops (that is, the width of the intervening column plus the space between columns). When the Override tab size box is unchecked, Columns++ uses the tab size set in Notepad++; when checked, the spin box to the right specifies the size (in spaces) to be used. |
Minimum space between elastic columns: | This spin box specifies the size (in spaces) occupied by the tab following the longest span of text in a column. |
Apply monospaced font optimizations: | When Use DirectWrite is not enabled in Notepad++ MISC. settings, responsiveness with elastic tabstops enabled is greatly improved if it is possible to calculate the width of text by counting characters rather than by measuring. This only works if the fonts in use are monospaced (typewriter-like fonts in which every character has the same width; also called fixed pitch fonts), and all the fonts used by the styles in the current language have the same width. Monospaced font optimizations do not appear to offer any benefit (and might make things slower) when DirectWrite is enabled. Usually it’s best to let Columns++ determine whether to use monospaced font optimizations, but there can be exceptional cases. Columns++ checks the width of a space and a capital letter W in each font assigned to a style in the current language; if these are all the same, it uses monospace optimizations if DirectWrite is not enabled. In some cases, a language might define styles which inhibit optimization but are never applied in a particular file; for large files, the performance gain from forcing monospaced font optimizations may be considerable. Conversely, a font might use monospaced characters in the ASCII range but wider characters outside that range; in this case, monospaced font optimizations can cause processing to be much slower than necessary, since each line in which text overflows the expected width in any column forces additional measurement and layout of text. If you want to use elastic tabstops with a large file, but response is sluggish and the best estimate chosen by Columns++ seems wrong, it’s worth trying the opposite setting. |
These settings are only applied when you click the OK button near the bottom right of the dialog.
Saving, renaming and deleting profiles
You can save the settings in a profile by clicking the Save As... button to the right of the profile selection drop-down box. You can give the profile any name that does not begin with an asterisk or an open parenthesis and is not one of the three built-in profiles (“Classic,” “General” and “Tabular”). You can use the additional options from the drop-down menu at the right of the Save As... button to rename or delete a profile. If you have made changes to an existing profile that is not a built-in profile, you can save the changes without having to type the profile name again by using the Save option.
Automatically enabling or disabling elastic tabstops
By default, Columns++ uses whatever settings were in effect for the last active tab when you open a file or a new tab. You can change this behavior with the remaining options on the Elastic tabstops profile dialog.
The checkbox under the profile selection dropdown labeled Automatically enable this profile when opening type files. is available when a built-in or saved profile is selected (and Disable... when opening type files. in the bottom section of the dialog, which will be explained later, is not checked). Checking this box assigns the selected built-in or saved profile to be enabled whenever you open a file of the same type as the one you are currently viewing. The Type can be existing files with the same extension, existing files with no extension, or new files. This option is only applied when you click the OK button near the bottom right of the dialog.
The options in the box labeled When opening an existing file without an explicit rule for its extension allow you to choose what happens when opening existing files for which you haven’t set either Automatically enable this profile... or Disable... when opening...:
keep the same settings as the last viewed tab. | This is the default behavior: each existing file you open begins with the same elastic tabstops settings you had previously. Note that this setting does not affect the default for new files; if you want a profile enabled, or elastic tabstops disabled, whenever you open a new tab with File|New you must set that behavior specifically using one of the when opening new files options in the Elastic tabstops profile dialog opened when viewing a new file. |
---|---|
disable elastic tabstops. | Elastic tabstops will be turned off when opening any existing file unless you’ve specifically set a rule to turn it on for that file’s extension. |
enable this profile: | You can select any built-in or saved profile, which will be enabled when opening any existing file unless you’ve set a different rule for that file’s extension. |
The options in the box labeled Disable elastic tabstops (applies to all profiles) allow you to choose specific conditions under which elastic tabstops should always be disabled:
when opening type files. | If you always want elastic tabstops disabled when you open the type of file in the current tab, check this box. |
---|---|
when opening files over ____ KB. | Elastic tabstops can cause loading and editing to be slow for large files. These options disable elastic tabstops when loading files over the specified limits, regardless of any other settings. The default values disable elastic tabstops for files over 1000 KB or 5000 lines. |
when opening files over ____ lines. |
Note that although the options for automatically enabling or disabling elastic tabstops do not affect the tab you have open, they are only applied when you click the OK button near the bottom right of the dialog.
Rectangular selections
Most of the commands from the Columns++ menu operate on rectangular selections.
You can select a single column or multiple columns separated by tabs. Since each tab is interpreted as a column separator, this works as expected when elastic tabstops are used. The results with traditional fixed tabs are not likely to be obvious or expected when sequences of multiple fixed tabs are included in the selection, since Columns++ interprets each tab as starting a new “logical” column without regard to physical placement.
When selecting one or more columns in a document using tabs, you should generally include the tab that ends the rightmost selected column in your selection. Unless all the entries in the last column are the same width, it is often difficult or impossible to get a complete selection without including the final tabs; in any case, Columns++ will process the trailing tabs intelligently.
When you invoke a command that requires a rectangular selection and the current selection is not a non-zero-width rectangular selection, Columns++ will inform you of this and, if possible, offer reasonable options to create a rectangular selection based on the current selection or cursor position.
You can enable specific “implicit” rectangular selections in the Options dialog if you would prefer that Columns++ make those selections without prompting you.
Selection submenu and keyboard shortcuts
The Selection submenu contains commands to create a rectangular selection based on the current selection.
Command | Keyboard shortcut | Action |
---|---|---|
Select Up | Ctrl+Alt+Up | Makes a rectangular selection which encloses the current selection and extends upward to reach the first line of the document. |
Select Left | Ctrl+Alt+Left | Makes a rectangular selection which encloses the current selection and extends to the left edge of the document. |
Select Right | Ctrl+Alt+Right | Makes a rectangular selection which encloses the current selection and extends to the right far enough to include the rightmost character in each line of which it includes any part. |
Select Down | Ctrl+Alt+Down | Makes a rectangular selection which encloses the current selection and extends downward to reach the last line of the document; except that if the last line is empty (that is, the document ends with a line return), the selection extends to the next-to-last line. |
Enclose Selection | Ctrl+Alt+Home | Makes the smallest rectangular selection which includes every part of the current selection. |
Extend Selection | Ctrl+Alt+End | If the selection is contained in a single line, makes a rectangular selection which encloses the current selection and extends from the first line of the document to the last line of the document; except that if the last line is empty (that is, the document ends with a line return), the selection extends to the next-to-last line. If the selection is an empty selection (a zero-width rectangular selection or a multiple selection in which each selection is empty) spanning more than one line, makes a rectangular selection which encloses the current selection and extends from the left edge of the document far enough to the right to include the rightmost character in each line of which it includes any part. Otherwise, makes the smallest rectangular selection which includes every part of the current selection. |
The keyboard shortcuts are defined by default, but you can change or remove them using the Shortcut Mapper from the Settings menu in Notepad++. These commands do not always work sensibly when word-wrapped lines are included in the selection range.
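Two of the rules in the table above can be expressed as small Python functions. This is a sketch only: it assumes a selection is given as a list of (line, left column, right column) pieces measured in visual columns, and that the document is a list of line strings such as `text.split("\n")` produces. The function names are invented for the example.

```python
def enclose_selection(parts):
    """Return (top, bottom, left, right) of the smallest rectangle covering all parts.

    parts is a list of (line, left_column, right_column) tuples,
    one for each selected piece.
    """
    lines = [p[0] for p in parts]
    left = min(p[1] for p in parts)
    right = max(p[2] for p in parts)
    return (min(lines), max(lines), left, right)


def last_line_for_select_down(document_lines):
    """Return the index of the line at which Select Down stops.

    If the document ends with a line ending, the final (empty) line
    is skipped and the next-to-last line is used instead.
    """
    last = len(document_lines) - 1
    if last > 0 and document_lines[last] == "":
        last -= 1
    return last
```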
Regular expressions
Several commands in Columns++ can use regular expressions for matching character strings. Columns++ uses the same regular expression engine, Boost.Regex, used in Notepad++, so the syntax and behavior are mostly the same. Some considerations unique to Columns++ are described below.
Within a rectangular selection, the selection in each row is matched independently of the surrounding text. The ^ assertion matches the beginning of the selection within a row, the $ assertion matches the end of the selection, and lookahead and lookbehind assertions cannot examine text past the boundaries of the selection. (Lookbehind assertions in Notepad++ can examine all text back to the beginning of the document, even when counting or replacing in a selection.) When using the Search in indicated region dialog, the region to be searched can be made up of one or more separate segments, each of which is searched independently. When a rectangular selection initializes the search region, each row of the selection becomes a separate segment of the search region.
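The effect of matching each row independently can be imitated with Python's `re` module by searching only a slice of each line. This sketch is an analogy, not the plugin's code: Python's `re` is a different engine from Boost.Regex, the example assumes the lines contain no tabs (so character positions equal visual columns), and `search_rectangular` is an invented name.

```python
import re


def search_rectangular(lines, first_col, last_col, pattern):
    """Find matches of pattern within lines[i][first_col:last_col], row by row.

    Returns a list of (row, column, matched_text) tuples, with the column
    given relative to the start of the full line.
    """
    regex = re.compile(pattern)
    results = []
    for row, line in enumerate(lines):
        # Text outside the slice is invisible to the regex, so ^ and $
        # anchor at the slice boundaries and lookarounds cannot see past them.
        segment = line[first_col:last_col]
        for match in regex.finditer(segment):
            results.append((row, first_col + match.start(), match.group()))
    return results
```

With the lines `id=12 x=34` and `id=56 x=78`, searching columns 3 to 5 for `(?<=id=)\d+` finds nothing, because the lookbehind cannot see `id=` to the left of the selection; widening the selection to columns 0 to 5 finds the number in each row.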
A rectangular selection enclosing a series of lines, or the entire document, is not the same as an ordinary selection encompassing the same series of lines, or the entire document. Rectangular selections do not include line endings, and each line is a separate selection when matching. This applies to search regions created from such selections for use in the Search in indicated region dialog, where it can be important to distinguish these two cases (which, unfortunately, appear the same visually).
Matches for regular expressions using the \K directive are never replaced when performing stepwise Find and Replace in Notepad++. In the Search in indicated region dialog in Columns++, such matches can be replaced if you do not click outside the dialog between finding the match and replacing it.
Unicode
There are some significant differences in the way Columns++ and Notepad++ match regular expressions in Unicode documents. Columns++ treats the document text as a series of Unicode code points (i.e., UTF-32) while Notepad++ treats it as UTF-16. Most of the differences are related to this change.
There are no surrogate pairs in Columns++ regular expressions; each code point matches as a single character. To enter any Unicode character in hexadecimal notation, use the full code point; for example, enter 🙂 as \x{1f642}. (The surrogate pair, \x{d83d}\x{de42}, which must be used in Notepad++ search, will not match in Columns++.)
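The relationship between a full code point and its UTF-16 surrogate pair is standard arithmetic, shown here as a short Python sketch. The function names are invented for the example; only the escape notations come from the text above.

```python
def surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points from 0x10000 to 0x10FFFF use surrogate pairs")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def full_code_point_escape(code_point):
    """Format a code point in the \\x{...} notation using the full code point."""
    return f"\\x{{{code_point:x}}}"


def surrogate_pair_escape(code_point):
    """Format a code point as two \\x{...} escapes, one per surrogate."""
    high, low = surrogate_pair(code_point)
    return f"\\x{{{high:x}}}\\x{{{low:x}}}"
```

For 🙂 (code point 0x1f642), the first formatter gives the single escape to use in Columns++ and the second gives the two-escape form that Notepad++ search requires.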
Scintilla, the display control used in Notepad++, represents Unicode internally as UTF-8. (This is true whether the file containing the document is UTF-8, UTF-16 or anything else other than “ANSI.”) When displaying Unicode documents that contain invalid UTF-8, Scintilla shows each byte that cannot be decoded as a hexadecimal code in reversed colors. You can match any of these bytes with \i; to match a specific byte, use the hexadecimal code Scintilla displays as a symbolic character name, e.g., [[.xF7.]]. (When matching a regular expression, Columns++ treats each of these error bytes as if it were the Unicode code point formed by adding 0xdc00 to the invalid byte. These code points are in the surrogate range and are invalid as UTF-32 code units.)
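Python's `surrogateescape` error handler applies the same arithmetic (0xdc00 plus the undecodable byte), so it can illustrate the mapping. This is an analogy only; it is an assumption that Python's decoder and Columns++ agree on exactly which bytes of every malformed sequence count as invalid.

```python
def code_points_with_error_bytes(raw):
    """Decode UTF-8, mapping each undecodable byte b to code point 0xDC00 + b."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return [ord(ch) for ch in text]


# The byte 0xF7 cannot begin a valid UTF-8 sequence, so it becomes 0xDCF7.
points = code_points_with_error_bytes(b"A\xf7B")
```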
The period (.) matches any one code point except the characters which end lines in Scintilla: carriage return (\x0d or \r) and newline (also called line feed, \x0a or \n). This corresponds to the documented behavior of the period, but not the actual behavior in Notepad++ (where there are several other control characters it does not match). Use \X to match a character including any combining code points (marks) which follow it. (In Notepad++ search, . and \X do not work as expected when the code points involved are outside the basic multilingual plane, that is, 0x10000 or greater.)
In addition to the Unicode character classes, these escape sequences, their negations (which match characters not in the class) and the equivalent character classes are added:
escape | negation | character class | meaning |
---|---|---|---|
\i | \I | [[:invalid:]] | a byte in an invalid UTF-8 sequence |
\o | \O | [[:ascii:]] | an ASCII character, code points 0 through 127 |
\y | \Y | [[:defined:]] | any Unicode code point that is assigned and is not a surrogate or a private use character |
Some symbolic names for collating elements have been added, including all the abbreviations used to display invisible characters; all can be entered in any mix of upper and lower case. This is the full list:
expression | code pt | common name |
---|---|---|
[[.nul.]] | 00 | null |
[[.soh.]] | 01 | start of heading |
[[.stx.]] | 02 | start of text |
[[.etx.]] | 03 | end of text |
[[.eot.]] | 04 | end of transmission |
[[.enq.]] | 05 | enquiry |
[[.ack.]] | 06 | acknowledge |
[[.bel.]] [[.alert.]] | 07 | bell |
[[.bs.]] [[.backspace.]] | 08 | backspace |
[[.ht.]] [[.tab.]] | 09 | horizontal tab |
[[.lf.]] [[.newline.]] | 0a | line feed, new line |
[[.vt.]] [[.vertical-tab.]] | 0b | line tabulation, vertical tab |
[[.ff.]] [[.form-feed.]] | 0c | form feed |
[[.cr.]] [[.carriage-return.]] | 0d | carriage return |
[[.so.]] | 0e | shift out |
[[.si.]] | 0f | shift in |
[[.dle.]] | 10 | data link escape |
[[.dc1.]] | 11 | device control one |
[[.dc2.]] | 12 | device control two |
[[.dc3.]] | 13 | device control three |
[[.dc4.]] | 14 | device control four |
[[.nak.]] | 15 | negative acknowledge |
[[.syn.]] | 16 | synchronous idle |
[[.etb.]] | 17 | end of transmission block |
[[.can.]] | 18 | cancel |
[[.em.]] | 19 | end of medium |
[[.sub.]] | 1a | substitute |
[[.esc.]] | 1b | escape |
[[.fs.]] [[.IS4.]] | 1c | information separator four |
[[.gs.]] [[.IS3.]] | 1d | information separator three |
[[.rs.]] [[.IS2.]] | 1e | information separator two |
[[.us.]] [[.IS1.]] | 1f | information separator one |
[[.space.]] | 20 | |
[[.exclamation-mark.]] | 21 | ! |
[[.quotation-mark.]] | 22 | " |
[[.number-sign.]] | 23 | # |
[[.dollar-sign.]] | 24 | $ |
[[.percent-sign.]] | 25 | % |
[[.ampersand.]] | 26 | & |
[[.apostrophe.]] | 27 | ' |
[[.left-parenthesis.]] | 28 | ( |
[[.right-parenthesis.]] | 29 | ) |
[[.asterisk.]] | 2a | * |
[[.plus-sign.]] | 2b | + |
[[.comma.]] | 2c | , |
[[.hyphen.]] | 2d | - |
[[.period.]] | 2e | . |
[[.slash.]] | 2f | / |
[[.zero.]] | 30 | 0 |
[[.one.]] | 31 | 1 |
[[.two.]] | 32 | 2 |
[[.three.]] | 33 | 3 |
[[.four.]] | 34 | 4 |
[[.five.]] | 35 | 5 |
[[.six.]] | 36 | 6 |
[[.seven.]] | 37 | 7 |
[[.eight.]] | 38 | 8 |
[[.nine.]] | 39 | 9 |
[[.colon.]] | 3a | : |
[[.semicolon.]] | 3b | ; |
[[.less-than-sign.]] | 3c | < |
[[.equals-sign.]] | 3d | = |
[[.greater-than-sign.]] | 3e | > |
[[.question-mark.]] | 3f | ? |
[[.commercial-at.]] | 40 | @ |
[[.left-square-bracket.]] | 5b | [ |
[[.backslash.]] | 5c | \ |
[[.right-square-bracket.]] | 5d | ] |
[[.circumflex.]] | 5e | ^ |
[[.underscore.]] | 5f | _ |
[[.grave-accent.]] | 60 | ` |
[[.left-curly-bracket.]] | 7b | { |
[[.vertical-line.]] | 7c | | |
[[.right-curly-bracket.]] | 7d | } |
[[.tilde.]] | 7e | ~ |
[[.del.]] | 7f | delete |
[[.pad.]] | 80 | padding character |
[[.hop.]] | 81 | high octet preset |
[[.bph.]] | 82 | break permitted here |
[[.nbh.]] | 83 | no break here |
[[.ind.]] | 84 | index |
[[.nel.]] | 85 | next line |
[[.ssa.]] | 86 | start of selected area |
[[.esa.]] | 87 | end of selected area |
[[.hts.]] | 88 | character (horizontal) tabulation set |
[[.htj.]] | 89 | character (horizontal) tabulation with justification |
[[.lts.]] | 8a | line (vertical) tabulation set |
[[.pld.]] | 8b | partial line forward (down) |
[[.plu.]] | 8c | partial line backward (up) |
[[.ri.]] | 8d | reverse line feed (index) |
[[.ss2.]] | 8e | single-shift two |
[[.ss3.]] | 8f | single-shift three |
[[.dcs.]] | 90 | device control string |
[[.pu1.]] | 91 | private use one |
[[.pu2.]] | 92 | private use two |
[[.sts.]] | 93 | set transmit state |
[[.cch.]] | 94 | cancel character |
[[.mw.]] | 95 | message waiting |
[[.spa.]] | 96 | start of protected area |
[[.epa.]] | 97 | end of protected area |
[[.sos.]] | 98 | start of string |
[[.sgci.]] | 99 | single graphic character introducer |
[[.sci.]] | 9a | single character introducer |
[[.csi.]] | 9b | control sequence introducer |
[[.st.]] | 9c | string terminator |
[[.osc.]] | 9d | operating system command |
[[.pm.]] | 9e | private message |
[[.apc.]] | 9f | application program command |
[[.nbsp.]] | a0 | no-break space |
[[.shy.]] | ad | soft hyphen |
[[.alm.]] | 061c | arabic letter mark |
[[.sam.]] | 070f | syriac abbreviation mark |
[[.ospm.]] | 1680 | ogham space mark |
[[.mvs.]] | 180e | mongolian vowel separator |
[[.nqsp.]] | 2000 | en quad |
[[.mqsp.]] | 2001 | em quad |
[[.ensp.]] | 2002 | en space |
[[.emsp.]] | 2003 | em space |
[[.3/msp.]] | 2004 | three-per-em space |
[[.4/msp.]] | 2005 | four-per-em space |
[[.6/msp.]] | 2006 | six-per-em space |
[[.fsp.]] | 2007 | figure space |
[[.psp.]] | 2008 | punctuation space |
[[.thsp.]] | 2009 | thin space |
[[.hsp.]] | 200a | hair space |
[[.zwsp.]] | 200b | zero-width space |
[[.zwnj.]] | 200c | zero-width non-joiner |
[[.zwj.]] | 200d | zero-width joiner |
[[.lrm.]] | 200e | left-to-right mark |
[[.rlm.]] | 200f | right-to-left mark |
[[.ls.]] | 2028 | line separator |
[[.ps.]] | 2029 | paragraph separator |
[[.lre.]] | 202a | left-to-right embedding |
[[.rle.]] | 202b | right-to-left embedding |
[[.pdf.]] | 202c | pop directional formatting |
[[.lro.]] | 202d | left-to-right override |
[[.rlo.]] | 202e | right-to-left override |
[[.nnbsp.]] | 202f | narrow no-break space |
[[.mmsp.]] | 205f | medium mathematical space |
[[.wj.]] | 2060 | word joiner |
[[.(fa).]] | 2061 | function application |
[[.(it).]] | 2062 | invisible times |
[[.(is).]] | 2063 | invisible separator |
[[.(ip).]] | 2064 | invisible plus |
[[.lri.]] | 2066 | left-to-right isolate |
[[.rli.]] | 2067 | right-to-left isolate |
[[.fsi.]] | 2068 | first strong isolate |
[[.pdi.]] | 2069 | pop directional isolate |
[[.iss.]] | 206a | inhibit symmetric swapping |
[[.ass.]] | 206b | activate symmetric swapping |
[[.iafs.]] | 206c | inhibit arabic form shaping |
[[.aafs.]] | 206d | activate arabic form shaping |
[[.nads.]] | 206e | national digit shapes |
[[.nods.]] | 206f | nominal digit shapes |
[[.idsp.]] | 3000 | ideographic space |
[[.zwnbsp.]] | feff | zero-width no-break space |
[[.iaa.]] | fff9 | interlinear annotation anchor |
[[.ias.]] | fffa | interlinear annotation separator |
[[.iat.]] | fffb | interlinear annotation terminator |
[[.sflo.]] | 1bca0 | shorthand format letter overlap |
[[.sfco.]] | 1bca1 | shorthand format continuing overlap |
[[.sfds.]] | 1bca2 | shorthand format down step |
[[.sfus.]] | 1bca3 | shorthand format up step |
[[.x80.]]–[[.xff.]] | | invalid UTF-8 bytes |
Search
Columns++ offers the ability to find and replace within a region marked by an indicator. Notepad++ uses several indicators, including the 1st to 5th Styles from the Search menu and the Find Mark Style used by the Mark function. Columns++ lets you use any one of those six indicators, or a custom indicator (the default), to define the region for searching.
By default, if there is a rectangular selection or a multiple selection when you begin a search, Columns++ will use it to set the search region. If a search region is not already set:
- if nothing is selected, the search region will be set to the entire document;
- if an ordinary selection covering more than a single line is present, the search region will be set to that selection;
- if a selection on a single line is present, Columns++ will show a dialog offering to create a rectangular selection based on the current selection, from which it will then set the search region.
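The default rules above amount to a small decision function. The Python sketch below restates them; the selection labels and returned descriptions are invented for the example, and it models only the default behavior described in this passage.

```python
def initial_search_region(selection_kind, region_defined):
    """Describe how the search region is chosen when a search begins.

    selection_kind is one of "none", "single-line", "multi-line",
    "rectangular" or "multiple" (labels made up for this sketch).
    """
    if selection_kind in ("rectangular", "multiple"):
        # A rectangular or multiple selection always sets the region.
        return "set region from selection"
    if region_defined:
        return "keep existing region"
    if selection_kind == "none":
        return "set region to entire document"
    if selection_kind == "multi-line":
        return "set region to selection"
    if selection_kind == "single-line":
        return "offer to create a rectangular selection"
    raise ValueError(f"unknown selection kind: {selection_kind}")
```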
The Search... item on the Columns++ menu opens the Search in indicated region dialog. Many of the options on this dialog are similar to the corresponding Notepad++ search options. You can drag the left or right edge of this dialog to make it wider, and you can leave it open while you edit the document in other ways.
Find what
The Find what box specifies the text, extended text or regular expression for which to search. Contiguous segments of the indicated region are searched sequentially (forward or backward), one at a time. It is not possible for a single match to span multiple segments. (E.g., for a region derived from a rectangular selection, each match must be contained in a single row.) When no more occurrences of the search string can be found, Columns++ gives a status message to that effect; if focus remains on the search dialog, the next search will resume from the beginning of the region.
When using regular expressions, the circumflex (^) and the dollar sign ($) match the beginning and end (respectively) of contiguous segments of the search region as well as their usual match to the beginning and ending of a line. This is particularly useful when the search region comes from a column selection, since ^ and $ will match the left and right ends of the selection in each row.
Replace with
The Replace with box specifies the text, extended text or regular expression replacement for search and replace operations.
When using regular expressions, the replacement can contain formulas, specified as:
(?=formula) or (?=format:formula)
within the replacement string; the Formulas section describes them in detail.
Technical note: The results of formulas are enclosed in parentheses and substituted into the regular expression replacement string before it is processed by the regular expression engine. Ordinarily this results in expected behavior and needs no consideration by the user. However, formulas in Columns++ support the return call described in section 20 of the ExprTk documentation. Using this call, it is possible to substitute an arbitrary character string; since the substitution occurs before regular expression replacement processing, characters substituted in this way will be interpreted as part of the regular expression replacement. For example, (?=return ['$', reg(1)]) would be replaced by the capture group whose number is given by the first capture group.
Extended search syntax
The syntax for extended find and replace strings is almost the same as in Notepad++ searches, with these exceptions:
- \U (note the capital U) can be followed by one to six hexadecimal digits to specify any valid Unicode character.
- \b, \d, \o and \x specify Unicode code points in Unicode files and bytes in non-Unicode files. In Notepad++ search they always specify Unicode code points.
- \b, \d, \o, \u and \x can be followed by any number of digits of the appropriate type up to the maximum for each (8 binary, 3 decimal, 3 octal, 4 hexadecimal and 2 hexadecimal, respectively); the first non-digit, or the maximum number of digits, terminates the sequence. In Notepad++ search these sequences must contain exactly the required number of digits.
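The variable-length rule (stop at the first non-digit or at the maximum number of digits) can be sketched in Python. This illustration covers only the decimal, octal and hexadecimal forms, always produces Unicode code points (it does not model the byte interpretation used in non-Unicode files), and leaves every other backslash sequence untouched. The names are invented for the example.

```python
# escape letter -> (numeric base, maximum number of digits)
ESCAPES = {"d": (10, 3), "o": (8, 3), "x": (16, 2), "u": (16, 4), "U": (16, 6)}
DIGITS = "0123456789abcdef"


def expand_numeric_escapes(text):
    """Replace \\d, \\o, \\x, \\u and \\U escapes with the characters they name."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            base, max_digits = ESCAPES[text[i + 1]]
            j = i + 2
            # Consume digits until a non-digit or the maximum count is reached.
            while (j < len(text) and j - (i + 2) < max_digits
                   and text[j].lower() in DIGITS[:base]):
                j += 1
            if j > i + 2:
                value = int(text[i + 2:j], base)
                if value <= 0x10FFFF:  # only valid code point values are expanded
                    out.append(chr(value))
                    i = j
                    continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

For instance, `\x414` yields `A4` because `\x` takes at most two hexadecimal digits, while `\x4g` yields the character with code 4 followed by `g`.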
Search actions
The Find, Count, Replace and Replace All buttons work like those in Notepad++ search, except that they search in the indicated region, and there is no Wrap around option. Instead, Find and Replace show a message at the bottom of the dialog when the end of the search region is reached and subsequently restart from the beginning of the search region, while Count and Replace All always make use of the entire search region. The downward-pointing arrowheads on the right side of the Count and Replace All buttons open menus of additional options.
From the Count button menu:
- Select All creates a multiple selection including one selection for each match in the search region.
- Count Before counts matches in the search region preceding the selection or caret. Matches which overlap the selection or caret are not counted.
- Count After counts matches in the search region following the selection or caret. Matches which overlap the selection or caret are not counted.
- Select Before creates a multiple selection including one selection for each match in the search region preceding the selection or caret. Matches which overlap the selection or caret are not selected.
- Select After creates a multiple selection including one selection for each match in the search region following the selection or caret. Matches which overlap the selection or caret are not selected.
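The Before and After variants partition the matches around the selection or caret and skip anything that overlaps it. A minimal Python sketch of that partition follows. It treats the whole string as the search region, uses Python's `re` in place of Boost.Regex, and assumes a match that merely touches the selection boundary does not overlap it; that last point is a guess, not documented behavior.

```python
import re


def split_matches(text, pattern, sel_start, sel_end):
    """Return (before, after): spans of matches on either side of the selection.

    For a caret with no selection, pass the same position for
    sel_start and sel_end.
    """
    before, after = [], []
    for match in re.finditer(pattern, text):
        if match.end() <= sel_start:
            before.append(match.span())
        elif match.start() >= sel_end:
            after.append(match.span())
        # Any other match overlaps the selection or caret and is skipped.
    return before, after
```

Count Before and Count After correspond to the lengths of the two lists, and Select Before and Select After to the spans themselves.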
From the Replace All button menu:
- Replace Before replaces matches in the search region preceding the selection or caret. Matches which overlap the selection or caret are not replaced.
- Replace After replaces matches in the search region following the selection or caret. Matches which overlap the selection or caret are not replaced.
- Clear History is shown when the Replace button has been used to make replacements using formula substitutions. Ordinarily the Replace functions will continue incrementing counters and referencing capture group and calculation history until after a Replace All/Before/After action is performed or the Replace with string is changed. This menu item clears the counters and the capture group and calculation history immediately.
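The Before and After variants can be pictured as a partition of the matches around the selection. The following Python sketch is only an illustration and not the plugin's code: partition_matches and its parameters are hypothetical names, the search region is simplified to a single string, and the selection is given as a pair of character offsets (equal offsets model a caret with nothing selected).

```python
import re

def partition_matches(pattern, text, sel_start, sel_end):
    """Split matches into those entirely before and entirely after a selection.

    sel_start == sel_end models a caret with nothing selected.
    """
    before, after = [], []
    for m in re.finditer(pattern, text):
        if m.end() <= sel_start:
            before.append(m.span())
        elif m.start() >= sel_end:
            after.append(m.span())
        # Anything else overlaps the selection or caret and is left out.
    return before, after

# The selection (offsets 4..5) overlaps the second "ab", so it is skipped.
before, after = partition_matches(r"ab", "ab ab ab ab", 4, 5)
print(len(before), len(after))  # prints: 1 2
```

In these terms, Count Before and Count After report the sizes of the two lists, while Select Before/After and Replace Before/After act on the matches in them.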
Other controls in the Search dialog
When you initiate a search action, the region used for the search is determined by whether a region is already defined using the selected indicator, what is currently selected, and the Auto set check box in the Selection -> Region group box:
no search region defined | search region defined | |||
---|---|---|---|---|
Auto set checked | Auto set not checked | Auto set checked | Auto set not checked | |
nothing selected | The search region is set to the entire document. | Columns++ will prompt you to make a rectangular selection based on the cursor position. | The search region is not changed. Stepwise searching (Find or Replace) begins or continues from the position of the cursor or past the selection. | The search region is not changed. Stepwise searching begins or continues from the position of the cursor or past the selection. For rectangular and multiple selections, the primary selection (the one containing the cursor) determines where stepwise searching begins or continues. |
selection within a single line | Columns++ will prompt you to make a rectangular selection based on the current selection. | |||
multi-line selection | The search region is set to the selection. | |||
rectangular or multiple selection | The search region is set to the selection. |
Backward direction, Match whole word only, Match case and the Search Mode options (Normal, Extended and Regular expression) have the same meanings as in the Notepad++ search dialogs.
Selection -> Region | |
---|---|
Set | sets the indicated region to the current selection. If nothing is selected, sets the indicated region to the entire document. |
Add | adds the current selection to the indicated region. |
Remove | removes from the indicated region any text that is within the current selection. |
Auto set | Automatically sets the search region if a rectangular or multiple selection is present, or if nothing is selected and no search region is set, when one of the search buttons is clicked, as described above. |
Auto clear | De-selects the current selection after it is used to set or modify the indicated region. Even if this box is not checked, if Auto set is checked, a rectangular or multiple selection will be de-selected after adding it to or removing it from an existing region to prevent the region from being set to the selection when you perform a search. |
Indicator | |
drop-down | selects which indicator is used to identify the search region. |
Clear on close | determines whether an existing indicated region is cleared when the dialog is closed. This option is set to checked when you choose the Custom Style indicator and to unchecked when you choose any other indicator; however, you may change it after you select an indicator, and your choice remains in effect until you change the indicator again. |
Clear | clears the current search indicator, so that there is no search region indicated. |
Calculation
Columns++ can add or average columns of numbers or perform calculations across rows; these are explained below. See Number formats for details about how Columns++ recognizes numbers, how numbers are represented internally in calculations, and some limitations on the accuracy of calculations.
Calculating in columns
The Add numbers... and Average numbers... items on the Columns++ menu perform calculations on a rectangular selection of a column of numbers, or of multiple columns separated by elastic tabstops. (These commands can be used on selections that include traditional fixed tabs; but the results may not be as expected, since they treat tabs as logical separators, ignoring physical positioning.)
Columns++ shows a dialog to present the results of the calculation, which offers the following options:
Thousands separator | Select a Thousands separator option (None, Comma/Period, Apostrophe or Blank) to control how numeric results are formatted. |
---|---|
Decimal places | If Automatic is checked, Columns++ chooses the number of decimal places in the result based on the data. For Add numbers, the result uses the fewest decimal places needed to avoid losing precision; for Average numbers, the result shows three more than the greatest number of decimal places in any input value. If Automatic is not checked, choose the number of decimal places (0-16) to which to round results. If Suppress trailing zeros is checked, zeros at the end of the decimal portion of numbers are omitted; if this box is not checked, exactly the number of decimal places selected is included. |
Format as time | If Automatic is checked, Columns++ uses the time formatting rules set in the Time formats dialog if at least one of the input numbers is expressed as a time; otherwise, the result is formatted as a simple number. If Automatic is not checked, choose the format (1-4 segments) to be used for the results. |
Insert... | Check to insert the results into the document, at the end of the rectangular selection, when you close the dialog. If the last line of the rectangular selection is empty (spaces, tabs and/or virtual space), the option will be Insert these results in the last line of the selection; otherwise, it will be Insert a line containing these results following the last line of the selection. |
Copy these results to the clipboard? | Close the dialog with the Yes button to copy the results to the clipboard, or use No to leave the clipboard unchanged. |
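The Automatic rule for decimal places can be illustrated with a short Python sketch. This is one reading of the rule and not the plugin's implementation: the function names are hypothetical, inputs are plain decimal strings, "fewest decimal places needed" is taken to mean dropping trailing zeros from the exact sum, and round-half-up is an assumed rounding mode.

```python
from decimal import Decimal, ROUND_HALF_UP

def decimals(text):
    """Number of digits after the decimal point in a plain decimal string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)

def add_numbers(values):
    """Exact sum, shown with the fewest decimal places that lose nothing."""
    total = sum((Decimal(v) for v in values), Decimal(0))
    text = f"{total:f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text

def average_numbers(values):
    """Mean, shown with three more decimal places than the widest input."""
    places = max(decimals(v) for v in values) + 3
    mean = sum((Decimal(v) for v in values), Decimal(0)) / len(values)
    quantum = Decimal(1).scaleb(-places)
    return f"{mean.quantize(quantum, rounding=ROUND_HALF_UP):f}"

print(add_numbers(["1.25", "2.5"]))      # prints: 3.75
print(average_numbers(["1", "2", "2"]))  # prints: 1.667
```

The sketch leaves out thousands separators, time formats and the Suppress trailing zeros option.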
Calculating across rows
The Calculate... command from the Columns++ menu inserts the results of a calculation into each line of a rectangular selection. The command opens a dialog which lets you supply the formula for the calculation. Formulas are described in a separate section; they are mostly ordinary mathematical expressions, with special variables and functions to represent numbers found in the selection; for example, if a single column of numbers is selected, this + 20 would add a column with each of the original numbers increased by 20. Here are the options on the Calculate dialog:
Formula | Enter the formula for the calculation, as described in the Formulas section. | ||||||
---|---|---|---|---|---|---|---|
Regex | If you enter a regular expression, the first occurrence of the expression within the selection in each row of the rectangular selection is matched. Use the match case box to indicate whether to use case-sensitive matching. If Skip unmatched lines is checked and the regular expression box is not empty, rows in which the regular expression does not match are ignored; the formula is not evaluated, nor is a tab or any space padding added to the selection in the row to account for the new column. | ||||||
Thousands separator | Select a Thousands separator option (None, Comma/Period, Apostrophe or Blank) to control how numeric results are formatted. | ||||||
Decimal places | Choose the number of decimal places (0-16) to which to round results. If Suppress trailing zeros is checked, zeros at the end of the decimal portion of numbers are omitted; if this box is not checked, exactly the number of decimal places selected are included. | ||||||
Format as time | Check the Enabled box to use the formats enabled in the bottom section of the Time formats dialog to show the results. The Formats... button lets you open that dialog without closing this one. | ||||||
New column |
|
Formulas
Formulas are representations of mathematical computations. Columns++ uses the ExprTk (Expression Toolkit) package to implement formulas used by the Calculate... command and formulas in regular expression replacements for the Search functions. Following are descriptions of the variables and functions defined by Columns++, along with some general features of the syntax of ExprTk expressions. See Number formats for details about how Columns++ recognizes numbers, how numbers are represented internally in calculations, and some limitations on the accuracy of calculations.
Wherever a numeric value is used, it is also possible for the value to be Not-a-Number, an indication that something which was expected to produce a number failed to do so. This can happen because you tried to get a number from the document, but the associated text could not be unambiguously interpreted as a number. It can also be the result of an undefined mathematical operation, such as dividing by zero. In most cases, if any of the inputs to an operation or function are Not-a-Number, the result is also Not-a-Number. When the result of a formula is Not-a-Number, Columns++ does not insert any text (aside from a tab and/or spaces needed to keep columns aligned).
Variables and functions implemented by Columns++
When Calculate is applied to a rectangular selection, the formula given is evaluated once for each row of the selection (or, if a regular expression is given and Skip unmatched lines is checked, for each row in which the regular expression matches). The term row refers to the part of a single line in the document which is included in the rectangular selection, and the term current row is the row for which the formula is being evaluated, and into which the results of the formula will be inserted. Rows are processed in sequence, from top to bottom.
Variables and functions in Calculate command formulas | |||||||
---|---|---|---|---|---|---|---|
count | the total number of rows (lines) in the selection | ||||||
index | counting from one, the row number within the selection of the current row | ||||||
match | zero if no regular expression was given or if the regular expression did not match in the current row; otherwise, counting from one and including the current row, the number of rows on which the regular expression has matched | ||||||
line | the line number of the line containing the current row (within the entire document, counting from one; that is, the same as the line number shown in the left margin if line numbers are enabled in Notepad++) | ||||||
this | if a regular expression was given, the numeric value of the text matched by the regular expression; otherwise, the numeric value of the current row | ||||||
col(n) col(n,p) col(n,p,v) tab(n) tab(n,p) tab(n,p,v) reg(n) reg(n,p) reg(n,p,v) | These functions retrieve the numeric value in the specified segment within a row (the selected part of a line). The col function divides the segments by white space (any run of blanks and/or tabs). The tab function divides by tab characters. The reg function retrieves regular expression capture groups. | ||||||
last last(p) last(p,v) | The last function with no arguments represents the last result of a calculation that was not Not-a-Number; if the current row is the first row, or if no previous calculations have resulted in anything other than Not-a-Number, last is zero. When p, or p and v, are specified, they are interpreted as for col/tab/reg, except that p cannot be zero; the function retrieves the result of the calculation on the row indicated, substituting v, if specified, for Not-a-Number. |
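The row-by-row evaluation, and the way last carries forward past Not-a-Number results, can be modelled in a few lines of Python. This is a simplified sketch, not how Columns++ is implemented: to_number is a crude stand-in for the number recognition described in Number formats, the formula is a Python callable rather than an ExprTk expression, and only this, index, count and last are modelled.

```python
import math

def to_number(text):
    """Very rough stand-in for number recognition: NaN when parsing fails."""
    try:
        return float(text.strip())
    except ValueError:
        return math.nan

def calculate(rows, formula):
    """Evaluate the formula once per row, top to bottom."""
    results = []
    last = 0.0  # "last" is zero until some row yields a real number
    for index, row in enumerate(rows, start=1):
        value = formula(this=to_number(row), index=index,
                        count=len(rows), last=last)
        if math.isnan(value):
            results.append("")  # nothing is inserted for Not-a-Number
        else:
            results.append(f"{value:g}")
            last = value
    return results

# Equivalent in spirit to the formula: this + last (a running total)
print(calculate(["10", "n/a", "5"],
                lambda this, index, count, last: this + last))
# prints: ['10', '', '15']
```

The second row produces Not-a-Number, so nothing is inserted there and the third row still sees 10 as last.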
When formulas are used in regular expression replacements for the Search dialog, the formula given is evaluated once for each match.
Formulas in regular expression replacements are specified as:
(?=formula) or (?=format:formula)
Variables and functions in Search regular expression replacement formulas | |||||||
---|---|---|---|---|---|---|---|
match | counting from one and including the current match, the number of times the regular expression has been matched and replaced (When doing stepwise find and replace, matches which were found but not replaced are not counted.) | ||||||
line | the line number in which the current match begins (within the entire document, counting from one; that is, the same as the line number shown in the left margin if line numbers are enabled in Notepad++) | ||||||
this | the numeric value of the text matched by the regular expression | ||||||
reg(n) reg(n,p) reg(n,p,v) | These functions retrieve the numeric value of a regular expression capture group. | ||||||
sub(n) sub(n,p) sub(n,p,v) | These functions retrieve the numeric value previously calculated for a substitution. | ||||||
last last(p) last(p,v) | equivalent to sub(0), sub(0,p) and sub(0,p,v) |
Format specifications for Search regular expression replacement formulas | |
---|---|
When a format is not specified, the default is 1.-6 (up to six decimals, suppress trailing zeros, suppress decimal separator if nothing follows, no leading zeros except that a digit is required before the decimal point). | |
n | One or two digits specify the minimum number of integer digits to be shown (i.e., shorter values will be left-padded with zeros); 0 indicates that a leading zero is not required for decimals. If omitted, the default is 1. |
t | The letter t (or T) specifies that the result of the formula will be shown in time format. If n is used it must appear before t; n then applies to the leftmost time segment in each result, regardless of what time unit that represents. |
. , | A period or a comma indicates that decimal places can be shown, using the current decimal separator (see Options — it doesn’t matter whether you use a period or a comma in the format). When a format is specified without a decimal indicator, decimals are rounded and not shown. If a decimal indicator is present but no additional specification follows, the default is .-6 (up to six decimal places, suppress trailing zeros, suppress decimal separator if nothing follows). |
One of the following can follow the decimal indicator if it is specified: | |
d | one or two digits specifying the exact number of decimal places to be shown |
-d | one or two digits specifying the maximum number of decimal places to be shown, omitting any trailing zeros and decimal separator |
m-d | Up to d decimal places will be shown, but no fewer than m. If m is 0, all decimal places will be omitted if they are zeros, but the decimal separator will still be shown. |
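To make the format rules concrete, here is a Python sketch that applies a format specification to a number. It is an approximation written from the table above, not the Columns++ implementation: format_value is a hypothetical name, the t (time) flag is not handled, round-half-up is an assumed rounding mode, and the decimal separator is passed in as a parameter rather than read from Options.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

SPEC = re.compile(
    r"(?P<n>\d{1,2})?"
    r"(?P<dec>[.,](?:(?P<m>\d{1,2})?-(?P<d>\d{1,2})|(?P<exact>\d{1,2}))?)?"
)

def format_value(value, spec="1.-6", sep="."):
    parsed = SPEC.fullmatch(spec)
    if parsed is None:
        raise ValueError(f"unsupported format: {spec!r}")
    min_int = int(parsed["n"]) if parsed["n"] else 1
    if parsed["dec"] is None:            # no decimal indicator: round to integer
        lo, hi, keep_sep = 0, 0, False
    elif parsed["exact"] is not None:    # .d  -> exactly d places
        lo = hi = int(parsed["exact"])
        keep_sep = False
    elif parsed["d"] is not None:        # .-d or .m-d
        hi = int(parsed["d"])
        lo = int(parsed["m"]) if parsed["m"] else 0
        keep_sep = parsed["m"] is not None
    else:                                # bare indicator: same as .-6
        lo, hi, keep_sep = 0, 6, False

    number = Decimal(value)
    rounded = abs(number).quantize(Decimal(1).scaleb(-hi), rounding=ROUND_HALF_UP)
    whole, _, frac = f"{rounded:f}".partition(".")
    while len(frac) > lo and frac.endswith("0"):
        frac = frac[:-1]                 # drop trailing zeros down to the minimum
    whole = whole.zfill(min_int)         # left-pad the integer part with zeros
    if min_int == 0 and whole == "0" and frac:
        whole = ""                       # n = 0: no leading zero before decimals
    sign = "-" if number < 0 and rounded != 0 else ""
    tail = sep + frac if frac or keep_sep else ""
    return sign + whole + tail

print(format_value("2.5"))           # prints: 2.5
print(format_value("7", "3"))        # prints: 007
print(format_value("4", "1.0-3"))    # prints: 4.
```

In a replacement string, these correspond to (?=formula), (?=3:formula) and (?=1.0-3:formula) respectively.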
Syntax of formulas
Formulas are written using most of the common conventions for writing mathematical expressions in typical programming languages: numbers are written with an optional minus sign, digits and an optional decimal point (no commas); +, -, *, /, % and ^ indicate addition, subtraction, multiplication, division, remainder and exponentiation; parentheses are used to indicate order of operations. You can also use logical expressions built from common operators, including = or ==, != or <>, <, <=, >, >=, & and |, in a conditional expression:
test ? option1 : option2 | yields option1 if test is true, option2 if test is false |
---|
so col(1)>10?col(2):col(3) gives the content of column 2 if column 1 is greater than 10, otherwise the content of column 3.
Formulas can use the many functions built into ExprTk, including these common mathematical functions:
abs | absolute value |
---|---|
avg | average of any number of values |
ceil | smallest integer greater than or equal to |
erf | error function |
erfc | complementary error function |
exp | e to the power of the given value |
floor | largest integer less than or equal to |
frac | fractional (decimal) part |
hypot | hypotenuse of a right triangle from two sides (e.g., hypot(x,y) = sqrt(x*x + y*y)) |
log | natural logarithm |
log10 | base 10 logarithm |
log2 | base 2 logarithm |
max | largest of any number of values |
min | smallest of any number of values |
ncdf | normal cumulative distribution function |
round | round to the nearest integer |
roundn | round the first argument to the number of decimal places specified by the second argument |
sqrt | square root |
trunc | integer part (round toward zero) |
and trigonometric functions (in all cases, angles are expressed in radians):
acos | arc cosine; argument in the interval [-1,+1] |
---|---|
acosh | inverse hyperbolic cosine |
asin | arc sine; argument in the interval [-1,+1] |
asinh | inverse hyperbolic sine |
atan | arc tangent; result in the interval (-pi/2,+pi/2) |
atan2 | two-argument arc tangent; interval [-pi,+pi] |
atanh | inverse hyperbolic tangent |
cos | cosine |
cosh | hyperbolic cosine |
cot | cotangent |
csc | cosecant |
deg2grad | convert from degrees to gradians |
deg2rad | convert from degrees to radians |
grad2deg | convert from gradians to degrees |
rad2deg | convert from radians to degrees |
sec | secant |
sin | sine |
sinc | sine cardinal |
sinh | hyperbolic sine |
tan | tangent |
tanh | hyperbolic tangent |
ExprTk expressions have many more features which are described in Sections 8, 12, 13 and 20 of the documentation for ExprTk. Columns++ supports the return call (section 20 of the ExprTk documentation). The returned strings and scalar values will be concatenated and inserted in the new column; if two or more scalar values are specified without intervening strings they will be separated by a tab character (if Tabbed is checked) or a single space. The concatenated string will be left aligned, regardless of whether Numeric aligned is checked.
Alignment
Align left, Align right, Align numeric and Align... process rectangular selections. The selection can be a single column, or multiple columns separated by elastic tabs. (These commands can be used on selections that include traditional fixed tabs; but the results may not be as expected, since they treat tabs as logical separators, ignoring physical positioning.) Alignment is accomplished by adding and/or removing ASCII spaces at the beginning and/or end of a column; therefore, precise alignment is not always possible when using proportionally-spaced fonts.
Align numeric
Details about how numbers are recognized and interpreted are given in the section on Number formats. The alignment of items which are not recognized as numbers is unchanged. The Decimal separator is comma item near the bottom of the Columns++ menu determines whether the comma or the period is the decimal separator.
The settings in the Time formats dialog determine how numeric alignment proceeds when there are numbers with colons. The Numbers with one or two colons represent setting identifies which colon (days:hours, hours:minutes or minutes:seconds) is present in all numbers with colons; that colon is aligned across all lines. Numbers without colons are aligned according to the Time units: numbers with no colons represent setting, such that they line up with the position that same unit would occupy in a time-formatted number with four segments, all of which except days have two integer digits (e.g., “1:00:00:00”).
Custom alignment
The Align... command opens a dialog which allows you to specify a string of one or more characters, or a regular expression, to be aligned in the column or columns within a rectangular selection:
Align by | Specify a character, a string of characters, or a regular expression to be aligned in each column. Items in which this character string or regular expression does not match are not changed. Regular expression matches are aligned by the start (leftmost position) of the match. To align by some other part of the match, rewrite it using a lookbehind assertion or the \K directive. It does not matter what characters are included in the match, only the position at which it starts. |
---|---|
First Last Regular expression | Choose whether to align using the first or the last occurrence of the Align by character or string in each row, or whether to interpret the Align by specification as a regular expression. |
Match case | Check to distinguish upper and lower case when the string to be matched includes letters. |
The following settings can be useful when the column to be processed includes some lines which will not be aligned (because the Align by character, string or expression does not match). Since unmatched items are not changed, you can use the margin settings to control how the set of aligned items, taken together, is placed relative to the unmatched lines. In other situations, the default of 0 Left is usually best. | |
Margin | Specify number of space characters, if any, to be used as a margin between the edge of the column and the aligned items. |
Left Right | Choose whether aligned items should be positioned relative to the left side or the right side of the column in which they occur. The margin is relative to this side of the column. |
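The core idea of aligning on a character or string can be sketched in Python as follows. This is a deliberately reduced illustration, not the plugin's algorithm: align_on is a hypothetical name, only the first or last occurrence of a literal string is handled (no regular expressions, margins or right-side positioning), and alignment is done purely by adding leading spaces, which assumes a fixed-width font.

```python
def align_on(items, needle, last=False):
    """Pad items with leading spaces so that `needle` starts at one position."""
    find = str.rfind if last else str.find
    positions = [find(item, needle) for item in items]
    target = max((p for p in positions if p >= 0), default=0)
    return [
        item if p < 0 else " " * (target - p) + item  # unmatched items unchanged
        for item, p in zip(items, positions)
    ]

for line in align_on(["a=1", "long=2", "none"], "="):
    print(repr(line))
# prints: '   a=1', then 'long=2', then 'none'
```

The item without a match is left alone, which is the situation the Margin and Left/Right settings are meant to help with.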
Timestamps
The Timestamps... command opens a dialog that lets you convert between common timestamp formats. The command requires a rectangular selection, which can be a single column or multiple columns separated by tabs. The dialog consists of five sections: From counter, From date and time, Time zones and locale, To counter and To date and time.
The Overwrite selection checkbox at the bottom left of the dialog controls whether the results of the conversion replace the selected timestamps (checked) or whether one or more new columns are added to the right of the selection to hold the results (default, unchecked).
Note: The Timestamps... command does not work properly on older versions of Windows. Columns++ attempts to detect this and removes the menu item on those systems. Based on Microsoft documentation, the author expects that it will be fully functional on Windows 10 version 1903/19H1 and later, and on Windows Server 2022 and later; it may work on some older systems with the exception of time zone functionality.
From counter
From counter processing may be enabled or disabled using the check box at the top left of the section. When enabled, this section controls the interpretation of any items in the selection which consist of a single number. The number is treated as a counter representing elapsed time. The starting point is called the epoch and the length of time represented by an increase of one in the counter is called the unit.
Note: As with other Columns++ commands, numbers to be recognized as counters can include spaces, apostrophes and either commas or periods (whichever is not selected as the decimal separator). If your data contains date and time formats that are separated only by these characters (for example, 12.10.1980 if Decimal separator is comma is selected), be sure to disable the From counter section. You can leave it enabled if all your date and time formatted data uses separators that would disqualify the entries as numbers.
Unix time | a common timestamp counter, for which the epoch is January 1st, 1970 and the unit is one second. Unix time does not count leap seconds. |
---|---|
Windows filetime | the timestamp format used to store creation, modification and access dates of files in Windows. Its epoch is January 1st, 1601 and its unit is 100 nanoseconds. Windows filetime counts leap seconds. |
Excel 1900 | the default date system for Excel for Windows, and for Excel for Mac beginning with Excel 2016. It counts in days, beginning with 1 for January 1st, 1900 and counting February 29th, 1900 (even though there was no such date, since 1900 was not a leap year). The count does not include leap seconds. |
Excel 1904 | the default date system for Excel for Mac prior to Excel 2016. The epoch is January 1st, 1904, the unit is 1 day, and leap seconds are not counted. |
Custom | enables the Epoch, Unit and Include leap seconds in count controls to define counter specifications. |
Epoch | specifies the date and time corresponding to a counter value of zero. Enter this as a four-digit year, a dash, a one- or two-digit month, a dash, and a one- or two-digit day, optionally followed by a space and a time in hours, minutes and seconds (including up to seven decimal places), separated by colons. Note: This field is always specified in Coordinated Universal Time (UTC). If you have a counter which counts relative to a local date and time, enter the corresponding UTC date and time here. |
Unit | the number of seconds corresponding to an increase of one in the counter. The smallest possible value is 1e-7 (or .0000001), corresponding to 100 nanoseconds. Use .001 to represent one millisecond, 1 to represent one second or 86400 to represent one day. |
Include leap seconds in count | when checked, the count includes leap seconds. Leap seconds are occasional one-second adjustments made to keep clock time synchronized to astronomical time. Ignoring leap seconds makes the correspondence between timestamp counters and clock time straightforward, but if a leap second occurs between two such timestamps, the difference between the timestamps will not accurately represent the elapsed time. Most timestamps ignore leap seconds; Windows filetime is a notable exception. For a more comprehensive explanation, see Leap second on Wikipedia. |
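The relationship between a counter, its epoch and its unit can be shown with a small Python sketch. It is an illustration under simplifying assumptions, not the conversion code used by Columns++: the function names are hypothetical, leap seconds are ignored throughout (so it applies only to counters that do not count them), and the naive datetime values stand for UTC.

```python
from datetime import datetime, timedelta

def from_counter(counter, epoch, unit_seconds):
    """Convert a counter to a UTC date and time, ignoring leap seconds."""
    return epoch + timedelta(seconds=counter * unit_seconds)

def from_unix(counter):
    return from_counter(counter, datetime(1970, 1, 1), 1)

def from_excel_1904(serial):
    return from_counter(serial, datetime(1904, 1, 1), 86400)

def from_excel_1900(serial):
    # Serial 1 is January 1st, 1900, and serial 60 is the nonexistent
    # February 29th, 1900, so later serials are shifted by one day.
    if int(serial) == 60:
        raise ValueError("serial 60 is February 29th, 1900, which never existed")
    epoch = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    return from_counter(serial, epoch, 86400)

print(from_unix(86400))        # prints: 1970-01-02 00:00:00
print(from_excel_1900(61))     # prints: 1900-03-01 00:00:00
```

A Custom counter corresponds to calling from_counter directly with your own epoch and unit, for example a unit of 0.001 for a millisecond counter.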
From date and time
From date and time processing may be enabled or disabled using the check box at the top left of the section. When enabled, this section controls the interpretation of any items in the selection which were not processed by the From counter section.
year-month-day month-day-year day-month-year | When one of these options is selected, Columns++ tries to interpret each item as a date and an optional time following the date. The parsing algorithm recognizes names and abbreviations for months in the current locale. When the order of month, day and year is ambiguous (e.g., 12-08-04), the selected order is used (e.g., August 4th, 2012; December 8th, 2004; or August 12th, 2004). | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
yy limit | specifies the latest year represented by a two-digit year; two-digit years are interpreted as years from 99 years before this year through this year. | ||||||||||||||||||
Parse | Selecting this option allows you to specify a regular expression; Columns++ attempts to match the expression to each item and, when the match succeeds, named capture groups are used to designate the elements of the date and time. The name of each capture group is a single, case-sensitive letter:
The groups y and either D or both M and d must match and provide valid values for the parse to succeed. Example: Suppose you have entries of the form 2018.113t14.08 representing 14:08 on the 113th day of 2018; you could use (?<y>\d+)\.(?<D>\d+)t(?<H>\d+)\.(?<m>\d+) to parse them. |
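The Parse example and the yy limit rule can both be tried out in Python. The sketch below is illustrative only: the function names are hypothetical, Python's re module spells named groups (?P<name>...) rather than (?<name>...), and requiring the expression to match the whole item is an assumption made here for simplicity.

```python
import re
from datetime import datetime, timedelta

# Same expression as in the example above, in Python's named-group spelling.
PATTERN = re.compile(r"(?P<y>\d+)\.(?P<D>\d+)t(?P<H>\d+)\.(?P<m>\d+)")

def parse_day_of_year(item):
    """Parse entries such as 2018.113t14.08 (year, day of year, hour, minute)."""
    match = PATTERN.fullmatch(item)
    if match is None:
        return None
    year, day = int(match["y"]), int(match["D"])
    start = datetime(year, 1, 1, int(match["H"]), int(match["m"]))
    return start + timedelta(days=day - 1)

def expand_two_digit_year(yy, yy_limit):
    """Pick the year in [yy_limit - 99, yy_limit] that ends in yy."""
    return yy_limit - (yy_limit - yy) % 100

print(parse_day_of_year("2018.113t14.08"))  # prints: 2018-04-23 14:08:00
print(expand_two_digit_year(50, 2049))      # prints: 1950
```

With a yy limit of 2049, two-digit years 50 through 99 fall in the 1900s and 00 through 49 fall in the 2000s.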
Time zones and locale
Time zones and locale specification may be enabled or disabled using the check box at the top left of the section. When enabled, this section lets you specify the time zones and the locale to be used. When this section is disabled (the default condition), Coordinated Universal Time (UTC) is assumed and the locale of your Windows system is used. Only enable this section when you need to use specified time zones instead of UTC, or to interpret or format dates and times in another language.
The top row in this section lets you choose the region and time zone assumed when parsing dates and times which do not include a time offset, and the region and time zone used to format generated dates and times. The bottom row lets you choose the language and locale used. Neither time zones nor locales apply to counters.
To counter
The To Counter button closes the dialog and converts timestamps to the specified counter. The other settings in this section have the same meanings as the corresponding settings in the From counter section.
To date and time
The To Date/Time button closes the dialog and converts timestamps to the specified date and time format. An example of a date and time formatted as specified is shown just above the button.
ISO 8601 | Format the date and time according to the ISO 8601 standard; equivalent to specifying yyyy-MM-dd'T'HH:mm:ss.sssZ as a custom picture. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Locale short | Use the short date and short time formats from Windows regional settings, separated by a space. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Locale long | Use the long date and long time formats from Windows regional settings, separated by a space. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Custom | Use the specified picture to format the date and time. The picture format is similar to that used for specifying date and time formats in Windows settings:
All other characters are copied verbatim to the result. |
Sorting
Notepad++ supports sorting lines using a rectangular selection to define the sort keys, but this does not work as expected when tabs (whether elastic or traditional fixed) are used. The sort commands in Columns++ use a rectangular selection to identify the sort keys and work as expected when tabs are present. These are “stable” sorts, meaning the order of lines with equal sort keys is unchanged. There are three variants of ascending and descending sorts:
binary | The raw byte values of the internal representations of the selected sort strings are used as sort keys. For most purposes, this matches what you would expect from a “case sensitive” sort, with the sort order dependent on the active code page. Unicode files sort by code point. |
---|---|
locale | The sort order is defined by the current Windows locale. For most purposes, this matches what you would expect from a “case insensitive” sort. |
numeric | The selections on each line are interpreted as tab-separated numbers. The Number formats section describes in detail how Columns++ recognizes numbers. Items which can’t be interpreted as numbers sort first (whether the sort is ascending or descending). |
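Two properties of the numeric sort, stability and placing unrecognized items first in either direction, can be demonstrated with a Python sketch. This is a model of the described behavior rather than the plugin's code: numeric_sort and key_of are hypothetical names, float() stands in for the number recognition described in Number formats, and each line has a single sort key.

```python
def parse_number(text):
    """Crude stand-in for number recognition: None when parsing fails."""
    try:
        return float(text)
    except ValueError:
        return None

def numeric_sort(lines, key_of, descending=False):
    def sort_key(line):
        value = parse_number(key_of(line))
        if value is None:
            return (0, 0.0)  # non-numbers first, in their original order
        return (1, -value if descending else value)
    return sorted(lines, key=sort_key)  # sorted() is a stable sort

lines = ["b\t2", "x\tn/a", "a\t10", "c\t2"]
second_field = lambda line: line.split("\t")[1]
print(numeric_sort(lines, second_field))
# prints: ['x\tn/a', 'b\t2', 'c\t2', 'a\t10']
print(numeric_sort(lines, second_field, descending=True))
# prints: ['x\tn/a', 'a\t10', 'b\t2', 'c\t2']
```

Negating the value for a descending sort, instead of reversing the result, is what keeps the two lines with key 2 in their original order in both directions.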
Custom sorts
In addition to the six immediate sort commands on the Columns++ menu, you can use the Sort... command to open a dialog giving you more control over the details of the sort:
What to sort | |
---|---|
Whole lines | Individual lines remain intact and are sorted using the column selection to define the sort keys. |
Selected text only | Only the selected portions of lines are sorted; the surrounding text on each line remains in place. Note: This will result in blank-padding lines in the selection which do not extend to or past the right boundary of the column selection. If elastic tabstops are enabled and the number of tabs included in the column selection is different on different lines (for example, because some lines are short), results using Selected text only are unlikely to be as expected. |
Sort direction | |
---|---|
Ascending | Smaller numbers, narrower text, or characters earlier in the collating sequence, come first. |
Descending | Larger numbers, wider text, or characters later in the collating sequence, come first. |
Sort type | |
---|---|
Binary | The raw byte values of the internal representations of the selected sort strings are used as sort keys. For most purposes, this matches what you would expect from a “case sensitive” sort, with the sort order dependent on the active code page. Unicode files sort by code point. |
Locale | The sort order is defined by a Windows locale, as specified in the Locale sort details section. |
Numeric | Sort strings are interpreted as numbers, as described in the Number formats section. Strings which can’t be interpreted as numbers sort first (whether the sort is ascending or descending). When Regular expression is selected, the regular expression is used to parse the selected text on each line; in all other cases, the text is interpreted as a sequence of tab-separated values. |
Width | The visible widths of the selected sort strings are used as keys. |
Sort key | |
---|---|
Entire column | The selected text on each line is used as the sort key. |
Ignore surrounding blanks/tabs | For Binary and Locale sorts, leading and trailing blanks and tabs in the text selected on each line are ignored, and the remaining text is used as the sort key. For Numeric sorts, this option behaves the same as Entire column (the text is treated as tab-separated values regardless of which option is selected, which is the same as the immediate numeric sorts on the Columns++ menu). |
Tabbed | The selected text is tab-separated; sort keys must be specified in the Keys box. |
Regular expression | A regular expression is used to parse the selected text on each line. |
Find what | Specifies a regular expression. The first match of the expression within the selected text in each line will be used to determine the sort key. |
Match case | When checked, the regular expression match is case sensitive; otherwise, the case of the text is ignored. |
Specify keys using capture groups | When checked, the Keys box specifies the sort sequence in terms of capture groups. When unchecked, the text matched by the regular expression is used as the sort key. |
Keys | A list of keys, separated by spaces, commas and/or semicolons, to be used for sorting. The major sort key is listed first, with subsequent keys having lower precedence. Each key is designated with a number. If Tabbed is selected, the number indicates a tab-separated field, numbered left to right counting from 1; 0 represents the entire selected text in the line. If Regular expression is selected, the number is the number of a capture group; 0 represents the entire match. Each sort key number may be followed (without intervening spaces) by one of the letters a or d, and/or one of the letters b, l, n or w. These specify ascending, descending, binary, locale, numeric and width, overriding the selections in the Sort direction and/or Sort type boxes for the capture group or tab field to which they are appended. |
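
As an illustration of the Keys syntax, the following sketch splits a key list and reads the optional letters after each number. The names `parse_keys` and `KEY_PATTERN` are hypothetical. The sketch accepts the direction and type letters in either order, which is an assumption, and it may differ from the parsing Columns++ actually performs.

```python
import re

# One key: a number, then an optional direction letter (a/d) and/or type letter (b/l/n/w).
# Assumption: the two letters may appear in either order.
KEY_PATTERN = re.compile(r"(\d+)(?:([ad])([blnw])?|([blnw])([ad])?)?")

DIRECTIONS = {"a": "ascending", "d": "descending"}
TYPES = {"b": "binary", "l": "locale", "n": "numeric", "w": "width"}


def parse_keys(spec, default_direction="ascending", default_type="binary"):
    """Turn a Keys string such as '2dn, 1; 0w' into (number, direction, type) tuples."""
    keys = []
    for token in re.split(r"[ ,;]+", spec.strip()):
        if not token:
            continue
        match = KEY_PATTERN.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid sort key: {token!r}")
        number, d1, t1, t2, d2 = match.groups()
        # Letters that are absent fall back to the dialog-wide defaults.
        direction = DIRECTIONS.get(d1 or d2, default_direction)
        sort_type = TYPES.get(t1 or t2, default_type)
        keys.append((int(number), direction, sort_type))
    return keys


print(parse_keys("2dn, 1; 0w"))
```

Here the major key is field or capture group 2, sorted descending and numerically; key 1 uses the defaults, and key 0 (the entire selected text or match) is sorted by width.
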
Locale sort details | |
---|---|
Locale sorting uses the Windows API function LCMapStringEx, and the exact behavior of the sort depends on the behavior of that function; the following describes the most important points. | |
Language | Selects the language for which a selection of locales will be offered. |
Locale | Selects a Windows locale from those available for the selected language. |
Case sensitive | Case sensitivity in a linguistic sort is not applied character by character; instead, when and only when two strings match completely except for case, case is applied to further sort them. Consequently, when this box is checked, the result will still not resemble what most users expect of a “case sensitive” sort. When this box is unchecked, the LINGUISTIC_IGNORECASE flag is passed to LCMapStringEx. |
Sort digits as numbers | This causes the sort to attempt to recognize strings including digits — like “data5” and “data10” — in such a way that “data5” will sort before “data10” in an ascending sort instead of after. The same algorithm is used to sort file names in Windows File Explorer. When this box is checked, the SORT_DIGITSASNUMBERS flag is passed to LCMapStringEx. |
Ignore diacritics | Windows API documentation says: Ignore nonspacing characters, as linguistically appropriate. Note: This flag does not always produce predictable results when used with decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. When this box is checked, the LINGUISTIC_IGNOREDIACRITIC flag is passed to LCMapStringEx. |
Ignore symbols and punctuation | This causes spaces, punctuation and “symbols” (the documentation is not more specific) to be ignored. Strings are sorted as if all the letters and numbers were run together, ignoring spaces, hyphens, periods and so on. When this box is checked, the NORM_IGNORESYMBOLS flag is passed to LCMapStringEx. |
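
The effect of Sort digits as numbers can be imitated with a small key function. This is a sketch of the general idea only; it does not call LCMapStringEx and will not match the Windows algorithm in every case. The name `digits_as_numbers_key` is hypothetical.

```python
import re


def digits_as_numbers_key(text):
    """Build a sort key in which runs of digits compare by numeric value."""
    # re.split with a capture group alternates text and digit runs:
    # even positions hold text, odd positions hold digits.
    parts = re.split(r"([0-9]+)", text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


names = ["data10", "data5", "data1"]
print(sorted(names))                             # plain character-by-character order
print(sorted(names, key=digits_as_numbers_key))  # digit runs compared as numbers
```
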
Conversion
Convert tabs to spaces
Use Convert tabs to spaces on any selection to replace tabs in the selection with equivalent spaces, taking elastic tabstops into account if enabled. If nothing is selected, the entire file is converted.
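
For traditional, fixed tab stops, “equivalent spaces” means that each tab becomes as many spaces as are needed to reach the next tab stop. The sketch below illustrates only that fixed-tab case; it assumes every character is one column wide and does not model elastic tabstops, where column widths are determined differently. The function name and the default tab width of 4 are illustrative.

```python
def tabs_to_spaces(line, tab_width=4):
    """Replace each tab with the spaces needed to reach the next fixed tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            # Assumption: every other character occupies exactly one column.
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(tabs_to_spaces("a\tbc\td")))
```
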
Convert separated values to tabs...
Convert tabs to separated values...
These commands convert the selection, or the entire file if nothing is selected, between delimiter-separated values (typically *.csv, comma-separated values) and tabbed presentation (typically *.tsv or tab-separated values).
Both delimiter-separated values and tab-separated values use a structure composed of records (rows) containing fields (which are interpreted as being arranged in columns). In tabbed documents, each line of the file is a record, and fields within a record are separated by tabs. Fields cannot contain tabs or line-ending characters as such, but these can be encoded, typically using backslash notation (\t, \n, \r for tab, new line and return). Consistency requires that the encoding character itself also be encoded (e.g., two backslashes in the file to represent a single backslash in the field’s value).
In delimiter-separated files, records are divided by line breaks and fields are divided by a separator character, typically a comma. However, when a field contains the separator character or line-ending characters, the problematic characters are escaped rather than encoded, meaning that the original character is still used in the file, but context indicates that it is not to be interpreted as a field or record separator. Typically, quote marks surround a field which contains line-ending or separator characters, and quotes within the field are doubled.
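The difference between escaping and encoding can be seen in a minimal sketch that reads quoted comma-separated text with Python's standard `csv` module and writes tabbed lines using backslash notation. This shows the general technique, not the conversion code in Columns++; the function names are hypothetical, and the dialog offers variations that this sketch ignores.

```python
import csv
import io

# The backslash is encoded first so that the sequences added afterwards stay unambiguous.
ENCODINGS = [("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")]


def encode_field(value):
    """Encode backslash, tab, new line and return using backslash notation."""
    for raw, encoded in ENCODINGS:
        value = value.replace(raw, encoded)
    return value


def csv_to_tabbed(text, delimiter=","):
    """Convert quoted, delimiter-separated text to one tabbed line per record."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return "\n".join("\t".join(encode_field(f) for f in row) for row in reader)


source = 'a,"b,c","say ""hi"""\n1,"two\nlines",3\n'
print(csv_to_tabbed(source))
```

In the sample, the quoted comma and the doubled quotes are unescaped into plain field values, while the line break inside the second record is encoded as \n so that the record still occupies a single line.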
There are many variations in the details of data representation in delimiter-separated and tab-separated values files. When you select Convert separated values to tabs... or Convert tabs to separated values..., Columns++ displays a dialog in which you can adjust the conversion accordingly:
Column separator | |
---|---|
Separated values syntax | |
Tab, new line and return characters in tabbed documents | Fields in tabbed presentation cannot contain tabs or line-ending characters; if there are any of these characters in separated values fields, they must be encoded or replaced when converting to tabs. |
Number formats
Columns++ interprets characters in a document as numbers in many contexts, including the Calculation commands, the Align numeric command, numeric sort fields and formula substitutions in regular expression searches.
Numbers can include thousands separators and decimals. The menu item Decimal separator is comma, near the bottom of the Columns++ menu, determines whether the comma or the period is the decimal separator; thousands separators may be a space, an apostrophe, or whichever of comma or period is not the decimal separator. Numbers can also be times, using colons to separate days, hours, minutes and seconds.
Internal representation of numeric values and accuracy of calculations
When Columns++ performs calculations, it represents numeric values internally as double precision floating point numbers. Any number up to 9,007,199,254,740,992 without a fraction or decimal, positive or negative, is represented exactly. Most fractions and decimals cannot be represented exactly, but in ordinary use, rounding to a reasonable number of decimal places (so that the total number of digits before and after the decimal is under 15) will make discrepancies irrelevant. Numeric results which exceed 9,007,199,254,740,992 in absolute value are represented in scientific notation.
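The limits described above are properties of double precision floating point itself, so they can be checked in any language that uses it. The following short Python sketch (Python's `float` is a double) demonstrates them; it is independent of Columns++, and the name `LIMIT` is illustrative.

```python
LIMIT = 2 ** 53  # 9,007,199,254,740,992

# Whole numbers up to the limit convert to double precision and back unchanged.
assert int(float(LIMIT)) == LIMIT
assert int(float(LIMIT - 1)) == LIMIT - 1

# Just past the limit, neighbouring whole numbers collapse onto the same double.
assert float(LIMIT + 1) == float(LIMIT)

# Most decimals are approximations, so an exact comparison can fail ...
assert 0.1 + 0.2 != 0.3

# ... but rounding to a modest number of places hides the discrepancy.
assert round(0.1 + 0.2, 10) == round(0.3, 10)
```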
Recognizing numbers in documents
There is some flexibility in what can be included along with a number in a column or a regular expression match. Common currency signs can precede the number with or without a space, and a minus sign can precede or follow a currency sign. Non-numeric characters (such as units, like “mg” or “ft”) can follow the number. (These are not interpreted, though; Columns++ will add 5 yards and 5 inches to get 10 without complaint.) Non-numeric characters can precede the number if they are separated from the number by at least one space.
- Add numbers and Average numbers skip items that have no digits; but if an item which includes one or more digits cannot be unambiguously interpreted as a number, Columns++ will select the item and will not perform the calculation.
- In formulas, variables and functions which represent numbers in the document are set to Not-a-Number if the associated document text cannot be unambiguously interpreted as a number.
- When sorting numerically, fields which cannot be unambiguously interpreted as numbers sort to the beginning.
- Align numeric uses slightly more lenient rules for recognizing numbers; the alignment of items which are not recognized as numbers is unchanged. The limitations in the Internal representation of numeric values and accuracy of calculations section above do not apply to Align numeric, as this command does not convert numbers to internal representation.
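
A simplified recognizer for part of these rules might look like the sketch below. It is an assumption-laden illustration, not the parser Columns++ uses: the name `parse_number` and the list of currency signs are hypothetical, thousands separators are accepted only between groups of three digits, and leading words and times with colons are not handled.

```python
import re


def parse_number(text, decimal_is_comma=False):
    """Return the number in text as a float, or None if it is not clearly a number."""
    decimal, other = (",", ".") if decimal_is_comma else (".", ",")
    separator = "[ '" + re.escape(other) + "]"
    pattern = (
        r"\s*(?P<m1>-)?\s*[$€£¥]?\s*(?P<m2>-)?\s*"                # minus before or after a currency sign
        r"(?P<int>\d{1,3}(?:" + separator + r"\d{3})+|\d+)"   # digits, optionally grouped in thousands
        r"(?:" + re.escape(decimal) + r"(?P<frac>\d+))?"      # optional decimal part
        r"(?:\s*[^\d\s.,'].*)?\s*"                            # optional trailing text such as units
    )
    match = re.fullmatch(pattern, text)
    if match is None or (match["m1"] and match["m2"]):
        return None
    digits = re.sub(r"\D", "", match["int"])
    value = float(digits + "." + (match["frac"] or "0"))
    return -value if (match["m1"] or match["m2"]) else value


print(parse_number("1,234.50"))
print(parse_number("1.234,50", decimal_is_comma=True))
print(parse_number("12,34"))  # not a valid grouping, so not recognized
```

As in the document's example, units are ignored rather than interpreted: `parse_number("5 yards") + parse_number("5 inches")` gives 10.0.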
Decimal separator is comma
Decimal separator is comma may be checked or unchecked to control how Columns++ interprets numbers. This setting is maintained per document (while the document remains open), so it can be different in different tabs.
Time formats
Time formats... opens a dialog that allows you to control how Columns++ interprets and shows numbers represented as times.
Time units: numbers with no colons represent | |
---|---|
days, hours, minutes, seconds | Select the unit for calculations involving times. In calculations involving times, times specified without colons are interpreted as being in this unit, and times specified with colons are converted to this unit. |
Numbers with one or two colons represent | Select the way times with one or two colons will be interpreted. (Times with three colons are always days:hours:minutes:seconds.) |
Results of calculations can use these formats for times | Check the box before each format to enable results to be shown in that format when times are displayed. The Calculation commands and formula substitutions in regular expression search replacements can show their results as times; when time display is enabled, these settings determine which formats can be used for results. |
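
The conversion of times to a single unit can be sketched as follows. The function name, the parameter names and the default interpretations of one and two colons are assumptions made for illustration; in Columns++ these choices come from the Time formats dialog, and negative times are not considered here.

```python
SECONDS_PER = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def time_to_unit(text, unit="hours",
                 one_colon=("hours", "minutes"),
                 two_colons=("hours", "minutes", "seconds")):
    """Convert a time such as '1:30' to a single number expressed in unit."""
    parts = text.strip().split(":")
    if len(parts) == 1:
        # No colons: the number is already in the selected unit.
        return float(parts[0])
    layouts = {
        2: one_colon,
        3: two_colons,
        4: ("days", "hours", "minutes", "seconds"),  # three colons are always d:h:m:s
    }
    if len(parts) not in layouts:
        raise ValueError(f"too many colons in {text!r}")
    seconds = sum(float(p) * SECONDS_PER[u] for p, u in zip(parts, layouts[len(parts)]))
    return seconds / SECONDS_PER[unit]


print(time_to_unit("1:30"))                 # 1 hour 30 minutes, in hours
print(time_to_unit("1:0:0:0", unit="hours"))  # 1 day, in hours
```
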
Options
Options... opens a dialog that allows you to control some aspects of Columns++:
Show Columns++ on the main menu bar. | lets you choose whether to add an entry for Columns++ to the main menu bar, just to the left of the Plugins menu, or leave it as an entry on the Plugins menu. |
---|---|
Replace: Don't move to the following occurrence. | has the same effect as the option of the same name on the Searching panel of the Preferences dialog in Notepad++, but for the Search in indicated region dialog in Columns++. When checked, the Replace button in the search dialog does not immediately perform another find after replacing text; in effect, the button alternates between finding and replacing, giving you a chance to see the effect of the replace before moving to the next occurrence of the search string. |
Show Elastic tabstops progress dialog when seconds remaining exceeds about: | lets you choose the minimum estimated remaining time, from 1 to 20 seconds, that will cause Columns++ to display a progress dialog during a long-running Elastic tabstops operation. The default is 2. |
Automatically extend selections to form rectangles | |
---|---|
You can enable “implicit” selections for Columns++ commands that require rectangular selections, bypassing the dialogs that ask you if you want to make a rectangular selection. | |
Selections on one line extend downward to the last line. | A selection of one or more characters on a single line is “projected” downward to the last line of the file. This allows you to select full columns (skipping headers, if desired) without scrolling all the way to the end of the file. If the last line of the file is completely empty (that is, the file ends with an end-of-line sequence) that line will not be included in the selection. |
Full row selections are replaced by the enclosing rectangle. | A single selection of complete lines is replaced by a rectangular selection wide enough to include all the text on all lines in the selection. Usually you get this kind of selection by dragging in the left margin. If the selection ends at the beginning of a line (as when dragging downward in the left margin), that line is not included in the rectangular selection. The selection made by Edit|Select All will be converted to a rectangular selection that encompasses the entire file (excluding the last line, if it consists of only a line ending). |
Zero-width selections extend to the right to the end of the longest line. | A “thin” selection, or a rectangular selection containing no characters or virtual space, is extended to the right far enough to enclose the end of the longest line in the selection. When selecting a rectangular region meant to extend from some column far enough to the right to include the ends of all lines, this avoids the need to scroll through and figure out how wide the selection needs to be. |
Custom style for Search in indicated region | |
---|---|
By default, Columns++ defines a custom style to indicate the area to be searched by shading the background. This uses a Scintilla resource called an indicator; in some cases this might conflict with other plugins, so some options to control it are available here. You can also choose the color and transparency of the background. These options are disabled if the Search in indicated region dialog is open when the Options dialog is opened. | |
Enable custom style | When checked, a custom style will be available. |
Alpha, Red, Green, Blue | Specify the transparency and color of the background for the custom style. |
Override Notepad++ indicator allocation | Notepad++ version 8.5.6 introduced a mechanism to avoid conflicts in indicator numbers used by plugins. If this mechanism is available, Columns++ will set the indicator accordingly unless this box is checked. If you have plugins installed which have not been updated to use the new mechanism, it is possible that there will still be a conflict with the indicator number Notepad++ assigns; in that case, you can check this box to choose the indicator number manually. If this box is disabled, either the version of Notepad++ you are running does not support the indicator allocation mechanism, or there were no available indicators remaining when this plugin was loaded. |
Indicator number | When Override Notepad++ indicator allocation is unchecked and a Notepad++ indicator allocation is available, this box shows the Scintilla indicator number allocated to Columns++. Otherwise it allows you to choose the indicator number Columns++ will use. |
Check for updates and show a notification at the bottom of the Columns++ menu | |
---|---|
Columns++ can check for updates. It does this by connecting to the Internet and requesting release information from GitHub. No information about your installation is sent to GitHub. This check is done when Notepad++ loads Columns++, no more often than once every twelve hours. If Columns++ finds a release newer than the one currently installed, it adds the notice Update Available to the Help/About... entry at the bottom of the Columns++ menu. | |
Show for any new release | Show a notice if a release newer than the one you have installed is found. |
Show for stable releases only | Show a notice only if a stable release newer than the one you have installed is found. Releases are generally marked stable (production-ready) after they have been released for long enough that the author believes any serious problems would have been reported. |
Do not check | Select this option if you do not want Columns++ to connect to GitHub to check for new releases. |
Help/About
Help/About... provides access to release/version identification, this help file, and changelog, license and source information. It also allows you to check GitHub for the newest release and the latest stable release of Columns++.