Introduction
Columns++ is a plugin for Notepad++ which offers features for working with text and data arranged in columns, including an implementation of elastic tabstops, enhanced searching and sorting, column alignment and numeric calculations. Like Notepad++, Columns++ is released under the GNU General Public License (either version 3 of the License, or, at your option, any later version). Columns++ was first released by Randall Joseph Fellmy in 2023; you can find the source code on GitHub.
Columns++ uses the C++ Mathematical Expression Toolkit Library (ExprTk) by Arash Partow, which is released under the MIT license; JSON for Modern C++ by Niels Lohmann, which is released under the MIT license; and the Boost.Regex library, which is released under the Boost Software License, Version 1.0.
Purpose and limitations
Columns++ is designed to provide some helpful functions for editing text or data that is lined up visually in columns, so that you can make a rectangular selection of the column(s) you want to process.
The integrated implementation of Elastic tabstops works to line up columns when tabs are used as logical separators, including tab-separated values data files as well as any ordinary text or code document containing sections in which you want to line up columns easily using tabs. You can use this feature on its own or with the other functions in Columns++.
Columns++ is optimized for use with Elastic tabstops. It also works with files that use traditional, fixed tabs for alignment, or no tabs at all; however, you should ordinarily select only one column at a time in files that don’t use Elastic tabstops.
Columns++ is generally not helpful when columns do not line up visually, such as in comma-separated values files. However, Columns++ can convert between delimiter-separated values and tabbed presentation; and there are some features, particularly Search using numeric formulas in regular expression replacement strings and Sorting with custom criteria, which may be useful in documents that are not column-oriented.
Elastic tabstops can cause loading and editing to be slow for large files. By default, Elastic tabstops is automatically turned off for files over 1000 KB or 5000 lines. You can change these limits.
Elastic tabstops
Columns++ includes a new implementation of Nick Gravgaard’s Elastic tabstops. (Please note that as of this writing I have not communicated with Mr. Gravgaard about my implementation of his proposal, and no endorsement on his part is implied. — RJF)
The first item of the Columns++ menu enables or disables Elastic tabstops. Elastic tabstops stretches tabs so that columns line up to fit their content, using only a single tab to separate one column from the next.
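The idea of stretching tabs to fit content can be sketched in a few lines of Python. This is only an illustration of the general elastic tabstops proposal, not the code Columns++ uses: the function names and the `min_gap` default are hypothetical, and widths are measured by counting characters, which assumes a monospaced font.

```python
def elastic_column_widths(lines, min_gap=2):
    """Return, for each line, the display width of each tab-terminated cell.

    Cells in the same column on adjacent lines share one width: the longest
    cell text in that run plus min_gap (hypothetical default). Text after the
    last tab on a line is not a cell and does not influence any width.
    """
    # Split each line into cells; drop the text after the final tab.
    cells = [line.split("\t")[:-1] for line in lines]
    widths = [[0] * len(row) for row in cells]
    max_cols = max((len(row) for row in cells), default=0)

    for col in range(max_cols):
        start = None  # first line of the current run of adjacent cells
        for i in range(len(cells) + 1):
            has_cell = i < len(cells) and len(cells[i]) > col
            if has_cell and start is None:
                start = i
            elif not has_cell and start is not None:
                # The run ended on the previous line: size it as a unit.
                width = max(len(cells[j][col]) for j in range(start, i)) + min_gap
                for j in range(start, i):
                    widths[j][col] = width
                start = None
    return widths


def render(lines, min_gap=2):
    """Expand tabs to spaces using the computed elastic widths."""
    out = []
    for line, row_widths in zip(lines, elastic_column_widths(lines, min_gap)):
        parts = line.split("\t")
        padded = [text.ljust(w) for text, w in zip(parts, row_widths)]
        out.append("".join(padded) + parts[-1])
    return out
```

In this sketch a line without tabs interrupts every column, so the lines below it are sized independently of the lines above it.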
This implementation of Elastic tabstops includes some options that were not part of the original proposal. These options can be accessed by using the Profile... menu option. There are three “built-in” profiles:
Classic | endeavors to reproduce precisely the behavior described in the proposal linked above. |
---|---|
General | ensures that leading tabs are always used for indentation, and are not lined up with elastic tabstops. |
Tabular | is suitable for tab-separated values files, in which the entire file is a single table with the values in each row separated by single tabs. |
You can select a profile from the drop-down box in the Elastic tabstops profile dialog. You can also change individual settings; choose some options to automatically enable a profile or disable elastic tabstops for different types of files; and save, rename or delete profiles.
Settings in an elastic tabstops profile
Along with the enabled or disabled status of elastic tabstops, the settings in an elastic tabstops profile are kept independently for each document you have open. These settings, which are available in the Elastic tabstops profile dialog, are:
Use leading tabs for indentation only; don't make them elastic. | When checked, this option treats tabs which occur at the beginning of a line, before the first non-tab character, as ordinary fixed-width tabs instead of elastic tabs. Without this option, a line with a tab used to line up a column of data cannot be followed by a line that uses tabs for indentation without an intervening blank line; otherwise, the first leading tab will expand to line up with the tab on the previous line. The disadvantage is that if you want an empty column at the beginning of a line, you must place a space before the first tab to make it line up with the next column. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Line up elastic tabstops throughout the entire document. | Normally elastic tabstops are positioned independently whenever a column is interrupted; that is, tabstops created by tabs that appear on adjacent lines are lined up, but they don’t “project through” lines with fewer (or no) tabs. This option indicates that a single set of tabstops is to be used for the entire document, so that columns line up even when intervening lines have fewer columns. | ||||||||
Do not allow text following the last tab on a line to span columns. | Normally, text following the last tab on a line is not treated as belonging to a “column” at all. This makes sense for documents that mix text and tables. However, for documents that are entirely tabular but have omitted tabs at the end of lines where the final columns are blank, this option (along with the one above) is needed to keep things lined up properly. | ||||||||
Override default/language tab size (used for indent or minimum): | Elastic tabstops uses the “tab size” in different ways depending on whether Use leading tabs for indentation only is checked: when checked, the tab size represents the number of spaces each leading tab indents, and it is otherwise ignored; when unchecked, it is the minimum space between any two tabstops (that is, the width of the intervening column plus the space between columns). When the Override tab size box is unchecked, Columns++ uses the tab size set in Notepad++; when checked, the spin box to the right specifies the size (in spaces) to be used. | ||||||||
Minimum space between elastic columns: | This spin box specifies the size (in spaces) occupied by the tab following the longest span of text in a column. | ||||||||
Apply monospaced font optimizations: | Responsiveness with elastic tabstops enabled is greatly improved if it is possible to calculate the width of text by counting characters rather than by measuring. This only works if the fonts in use are monospaced (typewriter-like fonts in which every character has the same width; also called fixed pitch fonts), and all the fonts used by the styles in the current language have the same width. Usually it’s best to let Columns++ determine whether to use monospaced font optimizations, but there can be exceptional cases. Columns++ checks the width of a space and a capital letter W in each font assigned to a style in the current language; if these are all the same, it uses monospace optimizations. In some cases, a language might define styles which inhibit optimization but are never applied in a particular file; for large files, the performance gain from forcing monospaced font optimizations may be considerable. Conversely, a font might use monospaced characters in the ASCII range but wider characters outside that range; in this case, monospaced font optimizations can cause processing to be much slower than necessary, since each line in which text overflows the expected width in any column forces additional measurement and layout of text. If you want to use elastic tabstops with a large file, but response is sluggish and the best estimate chosen by Columns++ seems wrong, it’s worth trying the opposite setting. |
These settings are only applied when you click the OK button near the bottom right of the dialog.
Saving, renaming and deleting profiles
You can save the settings in a profile by clicking the Save As... button to the right of the profile selection drop-down box. You can give the profile any name that does not begin with an asterisk or an open parenthesis and is not one of the three built-in profiles (“Classic,” “General” and “Tabular”). You can use the additional options from the drop-down menu at the right of the Save As... button to rename or delete a profile. If you have made changes to an existing profile that is not a built-in profile, you can save the changes without having to type the profile name again by using the Save option.
Automatically enabling or disabling elastic tabstops
By default, Columns++ uses whatever settings were in effect for the last active tab when you open a file or a new tab. You can change this behavior with the remaining options on the Elastic tabstops profile dialog.
The checkbox under the profile selection dropdown labeled Automatically enable this profile when opening type files. is available when a built-in or saved profile is selected (and Disable... when opening type files. in the bottom section of the dialog, which will be explained later, is not checked). Checking this box assigns the selected built-in or saved profile to be enabled whenever you open a file of the same type as the one you are currently viewing. The Type can be existing files with the same extension, existing files with no extension, or new files. This option is only applied when you click the OK button near the bottom right of the dialog.
The options in the box labeled When opening an existing file without an explicit rule for its extension allow you to choose what happens when opening existing files for which you haven’t set either Automatically enable this profile... or Disable... when opening...:
keep the same settings as the last viewed tab. | This is the default behavior: each existing file you open begins with the same elastic tabstops settings you had previously. Note that this setting does not affect the default for new files; if you want a profile enabled, or elastic tabstops disabled, whenever you open a new tab with File|New you must set that behavior specifically using one of the when opening new files options in the Elastic tabstops profile dialog opened when viewing a new file. |
---|---|
disable elastic tabstops. | Elastic tabstops will be turned off when opening any existing file unless you’ve specifically set a rule to turn it on for that file’s extension. |
enable this profile: | You can select any built-in or saved profile, which will be enabled when opening any existing file unless you’ve set a different rule for that file’s extension. |
The options in the box labeled Disable elastic tabstops (applies to all profiles) allow you to choose specific conditions under which elastic tabstops should always be disabled:
when opening type files. | If you always want elastic tabstops disabled when you open the type of file in the current tab, check this box. |
---|---|
when opening files over ____ KB. | Elastic tabstops can cause loading and editing to be slow for large files. These options disable elastic tabstops when loading files over the specified limits, regardless of any other settings. The default values disable elastic tabstops for files over 1000 KB or 5000 lines. |
when opening files over ____ lines. |
Note that although the options for automatically enabling or disabling elastic tabstops do not affect the tab you have open, they are only applied when you click the OK button near the bottom right of the dialog.
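The way these rules combine when a file is opened can be summarized as a small decision function. The sketch below is one reading of the descriptions above, not the plugin's code: the `Rules` class, the `NEW` marker for new files and the `resolve` function are all hypothetical names, and the order of the checks is an assumption drawn from the statement that the size limits apply regardless of any other settings.

```python
from dataclasses import dataclass, field

KEEP = "keep"        # keep the settings of the last viewed tab
DISABLE = "disable"  # turn elastic tabstops off
NEW = "<new>"        # hypothetical marker for the "new files" type


@dataclass
class Rules:
    max_kb: int = 1000                 # default size limit from the text
    max_lines: int = 5000              # default line limit from the text
    disabled_types: set = field(default_factory=set)
    profile_for_type: dict = field(default_factory=dict)
    default_for_existing: str = KEEP   # KEEP, DISABLE, or a profile name


def resolve(rules, file_type, size_kb, line_count):
    """Return KEEP, DISABLE, or the name of the profile to enable."""
    # Size limits win over everything else.
    if size_kb > rules.max_kb or line_count > rules.max_lines:
        return DISABLE
    # Explicit per-type rules come next; the dialog does not allow both
    # a disable rule and an enable rule for the same type.
    if file_type in rules.disabled_types:
        return DISABLE
    if file_type in rules.profile_for_type:
        return rules.profile_for_type[file_type]
    # New files without a rule keep the last settings; the fallback
    # option applies to existing files only.
    if file_type == NEW:
        return KEEP
    return rules.default_for_existing
```

For example, with a rule enabling the Tabular profile for `.tsv` files, a 2000 KB `.tsv` file would still open with elastic tabstops disabled under the default limits.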
Rectangular selections
Most of the commands from the Columns++ menu operate on rectangular selections.
You can select a single column or multiple columns separated by tabs. Since each tab is interpreted as a column separator, this works as expected when elastic tabstops are used. The results with traditional fixed tabs are not likely to be obvious or expected when sequences of multiple fixed tabs are included in the selection, since Columns++ interprets each tab as starting a new “logical” column without regard to physical placement.
When selecting one or more columns in a document using tabs, you should generally include the tab that ends the rightmost selected column in your selection. Unless all the entries in the last column are the same width, it is often difficult or impossible to get a complete selection without including the final tabs; in any case, Columns++ will process the trailing tabs intelligently.
When you invoke a command that requires a rectangular selection and the current selection is not a rectangular selection of non-zero width, Columns++ will inform you of this and, if possible, offer reasonable options to create a rectangular selection based on the current selection or cursor position.
You can enable specific “implicit” rectangular selections in the Options dialog if you would prefer that Columns++ make those selections without prompting you.
Regular expressions
Several commands in Columns++ can use regular expressions for matching character strings. Columns++ uses the same regular expression engine, Boost.Regex, used in Notepad++, so the syntax and behavior are mostly the same. Some considerations unique to Columns++ are described below.
Within a rectangular selection, the selection in each row is matched independently of the surrounding text. The ^ assertion matches the beginning of the selection within a row, the $ assertion matches the end of the selection, and lookahead and lookbehind assertions cannot examine text past the boundaries of the selection. (Lookbehind assertions in Notepad++ can examine all text back to the beginning of the document, even when counting or replacing in a selection.) When using the Search in indicated region dialog, the region to be searched can be made up of one or more separate segments, each of which is searched independently. When a rectangular selection initializes the search region, each row of the selection becomes a separate segment of the search region.
A rectangular selection enclosing a series of lines, or the entire document, is not the same as an ordinary selection encompassing the same series of lines, or the entire document. Rectangular selections do not include line endings, and each line is a separate selection when matching. This applies to search regions created from such selections for use in the Search in indicated region dialog, where it can be important to distinguish these two cases (which, unfortunately, appear the same visually).
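The effect of matching each row independently can be illustrated with Python's `re` module. This is a sketch under simplifying assumptions, not a model of Boost.Regex: the function name `search_rows` is hypothetical, and the selection is described by character offsets rather than visual columns.

```python
import re


def search_rows(lines, left, right, pattern):
    """Match pattern independently in characters [left, right) of each line."""
    regex = re.compile(pattern)
    hits = []
    for line_no, line in enumerate(lines, start=1):
        row = line[left:right]      # only the selected part of this line
        m = regex.search(row)       # ^, $ and lookarounds see nothing else
        if m:
            hits.append((line_no, left + m.start(), m.group()))
    return hits
```

Because the regular expression only ever sees the text of one row, `^qty` can match at the left edge of a selection that begins in the middle of a line, while a lookbehind such as `(?<= )qty` fails there even though a space precedes the selection in the document.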
Matches for regular expressions using the \K directive are never replaced when performing stepwise Find and Replace in Notepad++. In the Search in indicated region dialog in Columns++, such matches can be replaced if you do not click outside the dialog between finding the match and replacing it.
Regular expressions are matched as UTF-16 sequences for Unicode documents, and as byte sequences for other documents. (This is the same as in Notepad++.) Scintilla (the display control used in Notepad++) handles Unicode as UTF-8. When displaying Unicode documents that contain invalid UTF-8, Scintilla displays each byte that cannot be decoded as a hexadecimal code in reversed colors. When matching a regular expression, Columns++ processes each of these bytes as if it were the Unicode replacement character, U+FFFD (�). Notepad++ ignores errors in Unicode text when matching regular expressions.
Search
Columns++ offers the ability to find and replace within a region marked by an indicator. Notepad++ uses several indicators, including the 1st to 5th Styles from the Search menu and the Find Mark Style used by the Mark function. Columns++ lets you use any one of those six indicators, or a custom indicator (the default), to define the region for searching.
By default, if there is a rectangular selection or a multiple selection when you begin a search, Columns++ will use it to set the search region. If a search region is not already set:
- if nothing is selected, the search region will be set to the entire document;
- if an ordinary selection covering more than a single line is present, the search region will be set to that selection;
- if a selection on a single line is present, Columns++ will show a dialog offering to create a rectangular selection based on the current selection, from which it will then set the search region.
The Search... item on the Columns++ menu opens the Search in indicated region dialog. Many of the options on this dialog are similar to the corresponding Notepad++ search options. You can drag the left or right edge of this dialog to make it wider, and you can leave it open while you edit the document in other ways.
Find what
The Find what box specifies the text, extended text or regular expression for which to search. Contiguous segments of the indicated region are searched sequentially (forward or backward), one at a time. It is not possible for a single match to span multiple segments. (E.g., for a region derived from a rectangular selection, each match must be contained in a single row.) When no more occurrences of the search string can be found, Columns++ gives a status message to that effect; if focus remains on the search dialog, the next search will resume from the beginning of the region.
When using regular expressions, the circumflex (^) and the dollar sign ($) match the beginning and end (respectively) of contiguous segments of the search region as well as their usual match to the beginning and ending of a line. This is particularly useful when the search region comes from a column selection, since ^ and $ will match the left and right ends of the selection in each row.
Replace with
The Replace with box specifies the text, extended text or regular expression replacement for search and replace operations.
When using regular expressions, the replacement can contain formulas, specified as:
(?=formula) or (?=format:formula)
within the replacement string; the Formulas section describes them in detail.
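As a rough analogy, a replacement such as `(?=this * 2)` behaves like a substitution callback that is run once per match. The Python sketch below imitates that idea only; `replace_with_formula` is a hypothetical helper, the formula is passed as a Python function rather than as ExprTk text, and the number formatting is Python's `g` format rather than the formats Columns++ supports.

```python
import re


def replace_with_formula(text, pattern, formula):
    """Replace each match with formula(this, match), rendered as text.

    `this` is the numeric value of the matched text and `match` counts
    replacements from one, loosely mirroring the variables of the same names.
    """
    count = 0

    def substitute(m):
        nonlocal count
        count += 1
        value = formula(float(m.group()), count)
        return f"{value:g}"     # Python formatting, used only for brevity

    return re.sub(pattern, substitute, text)
```

For instance, `replace_with_formula("a 1.5 b 2 c 10", r"\d+(?:\.\d+)?", lambda this, match: this * 2)` doubles each number found in the text.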
Technical note: The results of formulas are enclosed in parentheses and substituted into the regular expression replacement string before it is processed by the regular expression engine. Ordinarily this results in expected behavior and needs no consideration by the user. However, formulas in Columns++ support the return call described in section 20 of the ExprTk documentation. Using this call, it is possible to substitute an arbitrary character string; since the substitution occurs before regular expression replacement processing, characters substituted in this way will be interpreted as part of regular expression replacement. For example, (?=return ['$', reg(1)]) would be replaced by the capture group the number of which is given by the first capture group.
Extended search syntax
The syntax for extended find and replace strings is almost the same as in Notepad++ searches, with these exceptions:
- \U (note the capital U) can be followed by one to six hexadecimal digits to specify any valid Unicode character.
- \b, \d, \o and \x specify Unicode code points in Unicode files and bytes in non-Unicode files. In Notepad++ search they always specify Unicode code points.
- \b, \d, \o, \u and \x can be followed by any number of digits of the appropriate type up to the maximum for each (8 binary, 3 decimal, 3 octal, 4 hexadecimal and 2 hexadecimal, respectively); the first non-digit, or the maximum number of digits, terminates the sequence. In Notepad++ search these sequences must contain exactly the required number of digits.
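The "up to the maximum number of digits" rule can be sketched as a small parser. This is an illustration for Unicode documents only, with hypothetical names; it covers \d, \o, \x, \u and \U, omits \b and the non-numeric escapes for brevity, and leaves an escape letter with no digits after it unchanged, which is an assumption rather than documented behavior.

```python
# Maximum digit count and numeric base for each escape letter handled here.
NUMERIC_ESCAPES = {"d": (3, 10), "o": (3, 8), "x": (2, 16), "u": (4, 16), "U": (6, 16)}
DIGITS = "0123456789abcdef"


def expand_extended(text):
    """Expand numeric escapes, stopping at the first non-digit or the maximum length."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in NUMERIC_ESCAPES:
            max_digits, base = NUMERIC_ESCAPES[text[i + 1]]
            allowed = DIGITS[:base]
            j = i + 2
            while j < len(text) and j - (i + 2) < max_digits and text[j].lower() in allowed:
                j += 1
            if j > i + 2:  # at least one digit was present
                # chr() raises ValueError for values above U+10FFFF.
                out.append(chr(int(text[i + 2:j], base)))
                i = j
                continue
        out.append(ch)
        i += 1
    return "".join(out)
```

With this reading, `\x9z` yields a tab followed by `z`, and `\x414` yields `A` followed by the digit `4`, because \x accepts at most two hexadecimal digits.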
Search actions
The Find, Count, Replace and Replace All buttons work like those in Notepad++ search, except that they search in the indicated region, and there is no Wrap around option. Instead, Find and Replace show a message at the bottom of the dialog when the end of the search region is reached and subsequently restart from the beginning of the search region, while Count and Replace All always make use of the entire search region. The downward-pointing arrowheads on the right side of the Count and Replace All buttons open menus of additional options.
From the Count button menu:
- Select All creates a multiple selection including one selection for each match in the search region.
- Count Before counts matches in the search region preceding the selection or caret. Matches which overlap the selection or caret are not counted.
- Count After counts matches in the search region following the selection or caret. Matches which overlap the selection or caret are not counted.
- Select Before creates a multiple selection including one selection for each match in the search region preceding the selection or caret. Matches which overlap the selection or caret are not selected.
- Select After creates a multiple selection including one selection for each match in the search region following the selection or caret. Matches which overlap the selection or caret are not selected.
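The Before and After variants amount to partitioning the matches around the selection or caret. The sketch below shows one way to express that in Python; `split_matches` is a hypothetical name, the whole string stands in for the search region, and treating a match that merely touches the caret as non-overlapping is an assumption.

```python
import re


def split_matches(text, pattern, sel_start, sel_end):
    """Partition matches into those wholly before and wholly after [sel_start, sel_end)."""
    before, after = [], []
    for m in re.finditer(pattern, text):
        if m.end() <= sel_start:
            before.append(m.span())
        elif m.start() >= sel_end:
            after.append(m.span())
        # Anything else overlaps the selection or caret and is skipped.
    return before, after
```

Count Before and Count After would then report the lengths of the two lists, while Select Before and Select After would turn each span into one selection.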
From the Replace All button menu:
- Replace Before replaces matches in the search region preceding the selection or caret. Matches which overlap the selection or caret are not replaced.
- Replace After replaces matches in the search region following the selection or caret. Matches which overlap the selection or caret are not replaced.
- Clear History is shown when the Replace button has been used to make replacements using formula substitutions. Ordinarily the Replace functions will continue incrementing counters and referencing capture group and calculation history until after a Replace All/Before/After action is performed or the Replace with string is changed. This menu item clears the counters and the capture group and calculation history immediately.
Other controls in the Search dialog
When you initiate a search action, the region used for the search is determined by whether a region is already defined using the selected indicator, what is currently selected, and the Auto set check box in the Selection -> Region group box:
no search region defined | search region defined | |||
---|---|---|---|---|
Auto set checked | Auto set not checked | Auto set checked | Auto set not checked | |
nothing selected | The search region is set to the entire document. | Columns++ will prompt you to make a rectangular selection based on the cursor position. | The search region is not changed. Stepwise searching (Find or Replace) begins or continues from the position of the cursor or past the selection. | The search region is not changed. Stepwise searching begins or continues from the position of the cursor or past the selection. For rectangular and multiple selections, the primary selection (the one containing the cursor) determines where stepwise searching begins or continues. |
selection within a single line | Columns++ will prompt you to make a rectangular selection based on the current selection. | |||
multi-line selection | The search region is set to the selection. | |||
rectangular or multiple selection | The search region is set to the selection. |
Backward direction, Match whole word only, Match case and the Search Mode options (Normal, Extended and Regular expression) have the same meanings as in the Notepad++ search dialogs.
Selection -> Region | |
---|---|
Set | sets the indicated region to the current selection. If nothing is selected, sets the indicated region to the entire document. |
Add | adds the current selection to the indicated region. |
Remove | removes from the indicated region any text that is within the current selection. |
Auto set | Automatically sets the search region if a rectangular or multiple selection is present, or if nothing is selected and no search region is set, when one of the search buttons is clicked, as described above. |
Auto clear | De-selects the current selection after it is used to set or modify the indicated region. Even if this box is not checked, if Auto set is checked, a rectangular or multiple selection will be de-selected after adding it to or removing it from an existing region to prevent the region from being set to the selection when you perform a search. |
Indicator | |
drop-down | selects which indicator is used to identify the search region. |
Clear on close | determines whether an existing indicated region is cleared when the dialog is closed. This option is set to checked when you choose the Custom Style indicator and to unchecked when you choose any other indicator; however, you may change it after you select an indicator and it will stick until you change the indicator again. |
Clear | clears the current search indicator, so that there is no search region indicated. |
Calculation
Columns++ can add or average columns of numbers or perform calculations across rows; these are explained below. See Number formats for details about how Columns++ recognizes numbers.
Calculating in columns
The Add numbers... and Average numbers... items on the Columns++ menu perform calculations on a rectangular selection of a column of numbers, or of multiple columns separated by elastic tabstops. (These commands can be used on selections that include traditional fixed tabs; but the results may not be as expected, since they treat tabs as logical separators, ignoring physical positioning.)
Columns++ shows a dialog to present the results of the calculation, which offers the following options:
Thousands separator | Select a Thousands separator option (None, Comma/Period, Apostrophe or Blank) to control how numeric results are formatted. |
---|---|
Decimal places | If Automatic is checked, Columns++ chooses the number of decimal places in the result based on the data. For Add numbers, the result uses the fewest decimal places needed to avoid losing precision; for Average numbers, the result shows three more than the greatest number of decimal places in any input values. If Automatic is not checked, choose the number of decimal places (0-16) to which to round results. If Suppress trailing zeros is checked, zeros at the end of the decimal portion of numbers are omitted; if this box is not checked, exactly the number of decimal places selected are included. |
Format as time | If Automatic is checked, Columns++ uses the time formatting rules set in the Time formats dialog if at least one of the input numbers is expressed as a time; otherwise, the result is formatted as a simple number. If Automatic is not checked, choose the format (1-4 segments) to be used for the results. |
Insert... | Check to insert the results into the document, at the end of the rectangular selection, when you close the dialog. If the last line of the rectangular selection is empty (spaces, tabs and/or virtual space), the option will be Insert these results in the last line of the selection; otherwise, it will be Insert a line containing these results following the last line of the selection. |
Copy these results to the clipboard? | Close the dialog with the Yes button to copy the results to the clipboard, or use No to leave the clipboard unchanged. |
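The Automatic rule for decimal places can be sketched as follows. This is an interpretation, not the plugin's code: the function names are hypothetical, inputs are plain decimal strings, and "the fewest decimal places needed to avoid losing precision" is taken to mean the greatest number of decimal places found in any input.

```python
from decimal import Decimal


def decimals(text):
    """Number of digits after the decimal point in a plain decimal string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def auto_sum(values):
    """Sum shown with as many decimals as the most precise input (assumed rule)."""
    places = max(decimals(v) for v in values)
    total = sum(Decimal(v) for v in values)
    return f"{total:.{places}f}"


def auto_average(values):
    """Average shown with three more decimals than the most precise input."""
    places = max(decimals(v) for v in values) + 3
    mean = sum(Decimal(v) for v in values) / len(values)
    return f"{mean:.{places}f}"
```

For the inputs 1.5, 2.25 and 3, the sum would be shown with two decimal places and the average with five.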
Calculating across rows
The Calculate... command from the Columns++ menu inserts the results of a calculation into each line of a rectangular selection. The command opens a dialog which lets you supply the formula for the calculation. Formulas are described in a separate section; they are mostly ordinary mathematical expressions, with special variables and functions to represent numbers found in the selection; for example, if a single column of numbers is selected, this + 20 would add a column with each of the original numbers increased by 20. Here are the options on the Calculate dialog:
Formula | Enter the formula for the calculation, as described in the Formulas section. | ||||||
---|---|---|---|---|---|---|---|
Regex | If you enter a regular expression, the first occurrence of the expression within the selection in each row of the rectangular selection is matched. Use the match case box to indicate whether to use case-sensitive matching. If Skip unmatched lines is checked and the regular expression box is not empty, rows in which the regular expression does not match are ignored; the formula is not evaluated, nor is a tab or any space padding added to the selection in the row to account for the new column. | ||||||
Thousands separator | Select a Thousands separator option (None, Comma/Period, Apostrophe or Blank) to control how numeric results are formatted. | ||||||
Decimal places | Choose the number of decimal places (0-16) to which to round results. If Suppress trailing zeros is checked, zeros at the end of the decimal portion of numbers are omitted; if this box is not checked, exactly the number of decimal places selected are included. | ||||||
Format as time | Check the Enabled box to use the formats enabled in the bottom section of the Time formats dialog to show the results. The Formats... button lets you open that dialog without closing this one. | ||||||
New column |
|
Formulas
Formulas are representations of mathematical computations. Columns++ uses the ExprTk (Expression Toolkit) package to implement formulas used by the Calculate... command and formulas in regular expression replacements for the Search functions. Following are descriptions of the variables and functions defined by Columns++, along with some general features of the syntax of ExprTk expressions.
Numeric values in formulas
Numeric values are represented internally as double precision floating point numbers. Any number up to 9,007,199,254,740,992 without a fraction or decimal, positive or negative, is represented exactly. Most fractions and decimals cannot be represented exactly, but in ordinary use, rounding to a reasonable number of decimal places (so that the total number of digits before and after the decimal is under 15) will make discrepancies irrelevant.
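Python floats are also double precision on mainstream platforms, so the limits described above can be checked directly. The short sketch below is an illustration only; `LIMIT` and `is_exact` are hypothetical names.

```python
LIMIT = 2 ** 53  # 9,007,199,254,740,992


def is_exact(n):
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


# Whole numbers up to the limit are exact; the next one is not.
exact_at_limit = is_exact(LIMIT)
exact_past_limit = is_exact(LIMIT + 1)

# Most decimals are only approximated, but rounding hides the discrepancy.
raw_equal = (0.1 + 0.2) == 0.3
rounded_equal = round(0.1 + 0.2, 10) == round(0.3, 10)
```

Here `exact_at_limit` and `rounded_equal` are true, while `exact_past_limit` and `raw_equal` are false.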
Wherever a numeric value is used, it is also possible for the value to be Not-a-Number, an indication that something which was expected to produce a number failed to do so. This can happen because you tried to get a number from the document, but the associated text could not be unambiguously interpreted as a number. It can also be the result of an undefined mathematical operation, such as dividing by zero. In most cases, if any of the inputs to an operation or function are Not-a-Number, the result is also Not-a-Number. When the result of a formula is Not-a-Number, Columns++ does not insert any text (aside from a tab and/or spaces needed to keep columns aligned).
Variables and functions implemented by Columns++
When Calculate is applied to a rectangular selection, the formula given is evaluated once for each row of the selection (or, if a regular expression is given and Skip unmatched lines is checked, for each row in which the regular expression matches). The term row refers to the part of a single line in the document which is included in the rectangular selection, and the term current row is the row for which the formula is being evaluated, and into which the results of the formula will be inserted. Rows are processed in sequence, from top to bottom.
Variables and functions in Calculate command formulas | |||||||
---|---|---|---|---|---|---|---|
count | the total number of rows (lines) in the selection | ||||||
index | counting from one, the row number within the selection of the current row | ||||||
match | zero if no regular expression was given or if the regular expression did not match in the current row; otherwise, counting from one and including the current row, the number of rows on which the regular expression has matched | ||||||
line | the line number of the line containing the current row (within the entire document, counting from one; that is, the same as the line number shown in the left margin if line numbers are enabled in Notepad++) | ||||||
this | if a regular expression was given, the numeric value of the text matched by the regular expression; otherwise, the numeric value of the current row | ||||||
col(n), col(n,p), col(n,p,v), tab(n), tab(n,p), tab(n,p,v), reg(n), reg(n,p), reg(n,p,v) | These functions retrieve the numeric value in the specified segment within a row (the selected part of a line). The col function divides the segments by white space (any run of blanks and/or tabs). The tab function divides by tab characters. The reg function retrieves regular expression capture groups. | ||||||
last
last(p) last(p,v) |
The last function with no arguments represents the last result of a calculation that was not Not-a-Number; if the current row is the first row, or if no previous calculations have resulted in anything other than Not-a-Number, last is zero.
When p or p and v are specified, they are interpreted as for col/tab/reg, except that p cannot be zero; the function retrieves the result of the calculation on the row indicated, substituting v, if specified, for Not-a-Number. |
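The row-by-row evaluation and the behavior of last can be sketched in Python. This is a simplified model, not the plugin's code: `to_number` is a hypothetical stand-in for the number recognition in Columns++, and the formula is passed as an ordinary Python function.

```python
import math

def to_number(text):
    """Simplified stand-in for Columns++ number recognition (assumption)."""
    try:
        return float(text)
    except ValueError:
        return math.nan

def calculate(rows, formula):
    """Evaluate formula(this, index, count, last) once per row, top to bottom."""
    results = []
    last = 0.0                      # last is zero until some result is not NaN
    count = len(rows)
    for index, row in enumerate(rows, start=1):
        value = formula(to_number(row), index, count, last)
        results.append(value)
        if not math.isnan(value):   # NaN results never become "last"
            last = value
    return results

# Running total, analogous to the formula: this + last
running = calculate(["10", "n/a", "5", "2.5"],
                    lambda this, index, count, last: this + last)
```

The second row cannot be read as a number, so its result is Not-a-Number, and the running total continues from the last valid result (10) on the third row.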
When formulas are used in regular expression replacements for the Search dialog, the formula given is evaluated once for each match.
Formulas in regular expression replacements are specified as:
(?=formula) or (?=format:formula)
Variables and functions in Search regular expression replacement formulas | |||||||
---|---|---|---|---|---|---|---|
match | counting from one and including the current match, the number of times the regular expression has been matched and replaced (When doing stepwise find and replace, matches which were found but not replaced are not counted.) | ||||||
line | the line number in which the current match begins (within the entire document, counting from one; that is, the same as the line number shown in the left margin if line numbers are enabled in Notepad++) | ||||||
this | the numeric value of the text matched by the regular expression | ||||||
reg(n) reg(n,p) reg(n,p,v) | These functions retrieve the numeric value of a regular expression capture group. | ||||||
sub(n) sub(n,p) sub(n,p,v) | These functions retrieve the numeric value previously calculated for a substitution. | ||||||
last last(p) last(p,v) | equivalent to sub(0), sub(0,p) and sub(0,p,v) | ||||||
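As a rough analogy for how match and this behave during a replacement, the following Python sketch replaces each match with the value of a function of the matched number and a running match counter. The function name `replace_with_formula` is hypothetical, and the `g` output format only approximates the default format of Columns++.

```python
import re

def replace_with_formula(text, pattern, formula):
    """Replace each match with formula(this, match); match counts from one."""
    counter = 0

    def substitute(m):
        nonlocal counter
        counter += 1
        this = float(m.group(0))        # numeric value of the matched text
        return f"{formula(this, counter):g}"

    return re.sub(pattern, substitute, text)

# Analogous to finding \d+ and replacing with (?=this*2+match)
result = replace_with_formula("a 10 b 20 c 30", r"\d+",
                              lambda this, match: this * 2 + match)
```

With this input, the three matches are replaced by 21, 42 and 63.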
Format specifications for Search regular expression replacement formulas | |
---|---|
When a format is not specified, the default is 1.-6 (up to six decimals, suppress trailing zeros, suppress decimal separator if nothing follows, no leading zeros except that a digit is required before the decimal point). | |
n | One or two digits specify the minimum number of integer digits to be shown (i.e., shorter values will be left-padded with zeros); 0 indicates that a leading zero is not required for decimals. If omitted, the default is 1. |
t | The letter t (or T) specifies that the result of the formula will be shown in time format. If n is used it must appear before t; n then applies to the leftmost time segment in each result, regardless of what time unit that represents. |
. , | A period or a comma indicates that decimal places can be shown, using the current decimal separator (see Options — it doesn’t matter whether you use a period or a comma in the format). When a format is specified without a decimal indicator, decimals are rounded and not shown. If a decimal indicator is present but no additional specification follows, the default is .-6 (up to six decimal places, suppress trailing zeros, suppress decimal separator if nothing follows). |
One of the following can follow the decimal indicator if it is specified: | |
d | one or two digits specifying the exact number of decimal places to be shown |
-d | one or two digits specifying the maximum number of decimal places to be shown, omitting any trailing zeros and decimal separator |
m-d | Up to d decimal places will be shown, but no fewer than m. If m is 0, all decimal places will be omitted if they are zeros, but the decimal separator will still be shown. |
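The decimal rules can be modeled in a few lines of Python. This is a sketch under stated assumptions: it always uses a period as the decimal separator, it ignores time formats, and the function name and parameters are hypothetical rather than part of Columns++.

```python
def format_decimal(value, max_places, min_places=None, int_digits=1):
    """Format value with up to max_places decimals.

    min_places=None models "-d" (drop trailing zeros and a bare separator);
    an integer models "m-d" (keep at least m places and always the separator).
    """
    digits = f"{abs(value):.{max_places}f}"
    whole, _, frac = digits.partition(".")
    keep = 0 if min_places is None else min_places
    while len(frac) > keep and frac.endswith("0"):
        frac = frac[:-1]
    separator = "." if (frac or min_places is not None) else ""
    if int_digits == 0 and whole == "0" and frac:
        whole = ""                       # no leading zero required for decimals
    else:
        whole = whole.zfill(int_digits)  # left-pad the integer part with zeros
    # Show a minus sign only if the rounded result is not zero.
    sign = "-" if value < 0 and any(c in "123456789" for c in digits) else ""
    return sign + whole + separator + frac
```

For example, `format_decimal(2.5, 6)` models the default 1.-6 and gives `2.5`, `format_decimal(2.5, 4, min_places=2)` models .2-4 and gives `2.50`, and `format_decimal(7.0, 4, min_places=0)` models .0-4 and gives `7.` with the separator retained.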
Syntax of formulas
Formulas are written using most of the common conventions for writing mathematical expressions in typical programming languages: numbers are written with an optional minus sign, digits and an optional decimal point (no commas); +, -, *, /, % and ^ indicate addition, subtraction, multiplication, division, remainder and exponentiation; parentheses are used to indicate order of operations. You can also use logical expressions built from common operators, including = or ==, != or <>, <, <=, >, >=, & and |, in a conditional expression:
test ? option1 : option2 | yields option1 if test is true, option2 if test is false |
---|---|
so col(1)>10?col(2):col(3) gives the content of column 2 if column 1 is greater than 10, otherwise the content of column 3.
Formulas can use the many functions built into ExprTk, including these common mathematical functions:
abs | absolute value |
---|---|
avg | average of any number of values |
ceil | smallest integer greater than or equal to |
erf | error function |
erfc | complementary error function |
exp | e to the power of the given value |
floor | largest integer less than or equal to |
frac | fractional (decimal) part |
hypot | hypotenuse of a right triangle from two sides (e.g., hypot(x,y) = sqrt(x*x + y*y)) |
log | natural logarithm |
log10 | base 10 logarithm |
log2 | base 2 logarithm |
max | largest of any number of values |
min | smallest of any number of values |
ncdf | normal cumulative distribution function |
round | round to the nearest integer |
roundn | round the first argument to the number of decimal places specified by the second argument |
sqrt | square root |
trunc | integer part (round toward zero) |
and trigonometric functions (in all cases, angles are expressed in radians):
acos | arc cosine; interval [-1,+1] |
---|---|
acosh | inverse hyperbolic cosine |
asin | arc sine; interval [-1,+1] |
asinh | inverse hyperbolic sine |
atan | arc tangent; any real value |
atan2 | two-argument arc tangent; interval [-pi,+pi] |
atanh | inverse hyperbolic tangent |
cos | cosine |
cosh | hyperbolic cosine |
cot | cotangent |
csc | cosecant |
deg2grad | convert from degrees to gradians |
deg2rad | convert from degrees to radians |
grad2deg | convert from gradians to degrees |
rad2deg | convert from radians to degrees |
sec | secant |
sin | sine |
sinc | sine cardinal |
sinh | hyperbolic sine |
tan | tangent |
tanh | hyperbolic tangent |
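Because angles are in radians, values in degrees need converting first; in a formula that could look like sin(deg2rad(col(1))). The same arithmetic in Python, shown only for comparison:

```python
import math

degrees = 30.0
radians = math.radians(degrees)   # same conversion as deg2rad(30)
sine = math.sin(radians)          # same value as sin(deg2rad(30)), about 0.5
```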
ExprTk expressions have many more features which are described in Sections 8, 12, 13 and 20 of the documentation for ExprTk. Columns++ supports the return call (section 20 of the ExprTk documentation). The returned strings and scalar values will be concatenated and inserted in the new column; if two or more scalar values are specified without intervening strings they will be separated by a tab character (if Tabbed is checked) or a single space. The concatenated string will be left aligned, regardless of whether Numeric aligned is checked.
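The concatenation rule for returned values can be sketched as follows. This Python model is an illustration, not the plugin's implementation: the function name `join_returned` is hypothetical, and the `g` number format is an assumption.

```python
def join_returned(values, tabbed):
    """Concatenate returned strings and numbers; separate adjacent numbers."""
    separator = "\t" if tabbed else " "
    pieces = []
    previous_was_number = False
    for value in values:
        is_number = isinstance(value, (int, float))
        if is_number and previous_was_number:
            pieces.append(separator)    # only between two adjacent numbers
        pieces.append(f"{value:g}" if is_number else value)
        previous_was_number = is_number
    return "".join(pieces)
```

With Tabbed checked, the values 1, 2, " kg " and 3.5 would be joined as `1`, a tab, `2 kg 3.5`: the separator appears only between the two adjacent numbers.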
Alignment
Align left, Align right, Align numeric and Align... process rectangular selections. The selection can be a single column, or multiple columns separated by elastic tabs. (These commands can be used on selections that include traditional fixed tabs; but the results may not be as expected, since they treat tabs as logical separators, ignoring physical positioning.) Alignment is accomplished by adding and/or removing ASCII spaces at the beginning and/or end of a column; therefore, precise alignment is not always possible when using proportionally-spaced fonts.
Align numeric
Details about how numbers are recognized and interpreted are given in the section on Number formats. The alignment of items which are not recognized as numbers is unchanged. The Decimal separator is comma item near the bottom of the Columns++ menu determines whether the comma or the period is the decimal separator.
The settings in the Time formats dialog determine how numeric alignment proceeds when there are numbers with colons. The Numbers with one or two colons represent setting identifies which colon (days:hours, hours:minutes or minutes:seconds) is present in all numbers with colons; that colon is aligned across all lines. Numbers without colons are aligned according to the Time units: numbers with no colons represent setting, such that they line up with the position that same unit would occupy in a time-formatted number with four segments, all of which except days have two integer digits (e.g., “1:00:00:00”).
Custom alignment
The Align... command opens a dialog which allows you to specify a string of one or more characters, or a regular expression, to be aligned in the column or columns within a rectangular selection:
Align by | Specify a character, a string of characters, or a regular expression to be aligned in each column. Items in which this character string or regular expression does not match are not changed. Regular expression matches are aligned by the start (leftmost position) of the match. To align by some other part of the match, rewrite it using a lookbehind assertion or the \K directive. It does not matter what characters are included in the match, only the position at which it starts. |
---|---|
First Last Regular expression | Choose whether to align using the first or the last occurrence of the Align by character or string in each row, or whether to interpret the Align by specification as a regular expression. |
Match case | Check to distinguish upper and lower case when the string to be matched includes letters. |
The following settings can be useful when the column to be processed includes some lines which will not be aligned (because the Align by character, string or expression does not match). Since unmatched items are not changed, you can use the margin settings to control how the set of aligned items, taken together, is placed relative to the unmatched lines. In other situations, the default of 0 Left is usually best. | |
Margin | Specify number of space characters, if any, to be used as a margin between the edge of the column and the aligned items. |
Left Right | Choose whether aligned items should be positioned relative to the left side or the right side of the column in which they occur. The margin is relative to this side of the column. |
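The basic idea of aligning by the first occurrence of a string, with the default margin of 0 Left, can be sketched in Python. This is a simplified model: it only adds leading spaces to items in a single column, whereas Columns++ can also remove existing spaces and handles multiple columns. The function name `align_by` is hypothetical.

```python
def align_by(items, marker):
    """Pad matching items on the left so the first marker lines up."""
    positions = [item.find(marker) for item in items]
    target = max(positions, default=-1)
    return [
        item if pos < 0 else " " * (target - pos) + item   # unmatched: unchanged
        for item, pos in zip(items, positions)
    ]

aligned = align_by(["x = 1", "total = 22", "note"], "=")
```

The item without an equals sign is left as it was, while the other two are padded so that their equals signs share a position.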
Sorting
Notepad++ supports sorting lines using a rectangular selection to define the sort keys, but this does not work as expected when tabs (whether elastic or traditional fixed) are used. The sort commands in Columns++ use a rectangular selection to identify the sort keys and work as expected when tabs are present. These are “stable” sorts, meaning the order of lines with equal sort keys is unchanged. There are three variants of ascending and descending sorts:
binary | The raw byte values of the internal representations of the selected sort strings are used as sort keys. For most purposes, this matches what you would expect from a “case sensitive” sort, with the sort order dependent on the active code page. Unicode files sort by code point. |
---|---|
locale | The sort order is defined by the current Windows locale. For most purposes, this matches what you would expect from a “case insensitive” sort. |
numeric | The selections on each line are interpreted as tab-separated numbers. The Number formats section describes in detail how Columns++ recognizes numbers. Items which can’t be interpreted as numbers sort first (whether the sort is ascending or descending). |
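Two of the properties described above, stability and placing non-numeric items first in either direction, can be illustrated with a Python sort key. This is a sketch: `float` is only a stand-in for the number recognition in Columns++, and `numeric_key` is a hypothetical name.

```python
import math

def numeric_key(text, descending=False):
    """Sort key: non-numbers first, then numbers in the requested direction."""
    try:
        value = float(text)
    except ValueError:
        return (0, 0.0)
    if math.isnan(value):
        return (0, 0.0)
    return (1, -value if descending else value)

lines = ["10", "abc", "2", "xyz", "2.0"]
ascending = sorted(lines, key=numeric_key)        # Python's sort is stable
descending = sorted(lines, key=lambda t: numeric_key(t, descending=True))
```

In both results, "abc" and "xyz" come first in their original order, and the equal keys "2" and "2.0" keep their original relative order.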
Custom sorts
In addition to the six immediate sort commands on the Columns++ menu, you can use the Sort... command to open a dialog giving you more control over the details of the sort:
What to sort | |
---|---|
Whole lines | Individual lines remain intact and are sorted using the column selection to define the sort keys. |
Selected text only | Only the selected portions of lines are sorted; the surrounding text on each line remains in place. Note: This will result in blank-padding lines in the selection which do not extend to or past the right boundary of the column selection. If elastic tabstops are enabled and the number of tabs included in the column selection is different on different lines (for example, because some lines are short), results using Selected text only are unlikely to be as expected. |
Sort direction | |
---|---|
Ascending | Smaller numbers, narrower text, or characters earlier in the collating sequence, come first. |
Descending | Larger numbers, wider text, or characters later in the collating sequence, come first. |
Sort type | |
---|---|
Binary | The raw byte values of the internal representations of the selected sort strings are used as sort keys. For most purposes, this matches what you would expect from a “case sensitive” sort, with the sort order dependent on the active code page. Unicode files sort by code point. |
Locale | The sort order is defined by a Windows locale, as specified in the Locale sort details section. |
Numeric | Sort strings are interpreted as numbers, as described in the Number formats section. Strings which can’t be interpreted as numbers sort first (whether the sort is ascending or descending). When Regular expression is selected, the regular expression is used to parse the selected text on each line; in all other cases, the text is interpreted as a sequence of tab-separated values. |
Width | The visible widths of the selected sort strings are used as keys. |
Sort key | |
---|---|
Entire column | The selected text on each line is used as the sort key. |
Ignore surrounding blanks/tabs | For Binary and Locale sorts, leading and trailing blanks and tabs in the text selected on each line are ignored, and the remaining text is used as the sort key. For Numeric sorts, this option behaves the same as Entire column (the text is treated as tab-separated values regardless of which option is selected, which is the same as the immediate numeric sorts on the Columns++ menu). |
Tabbed | The selected text is tab-separated; sort keys must be specified in the Keys box. |
Regular expression | A regular expression is used to parse the selected text on each line. |
Find what | Specifies a regular expression. The first match of the expression within the selected text in each line will be used to determine the sort key. |
Match case | When checked, the regular expression match is case sensitive; otherwise, the case of the text is ignored. |
Specify keys using capture groups | When checked, the Keys box specifies the sort sequence in terms of capture groups. When unchecked, the text matched by the regular expression is used as the sort key. |
Keys | A list of keys, separated by spaces, commas and/or semicolons, to be used for sorting. The major sort key is listed first, with subsequent keys having lower precedence. Each key is designated with a number. If Tabbed is selected, the number indicates a tab-separated field, numbered left to right counting from 1; 0 represents the entire selected text in the line. If Regular expression is selected, the number is the number of a capture group; 0 represents the entire match. Each sort key number may be followed (without intervening spaces) by one of the letters a or d, and/or one of the letters b, l, n or w. These specify ascending, descending, binary, locale, numeric and width, overriding the selections in the Sort direction and/or Sort type boxes for the capture group or tab field to which they are appended. |
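The structure of a Keys specification can be made concrete with a small parser. This Python sketch is an assumption-laden illustration: the names are hypothetical, and it accepts the direction and type letters in either order, which the description above does not specify.

```python
import re

DIRECTIONS = {"a": "ascending", "d": "descending"}
TYPES = {"b": "binary", "l": "locale", "n": "numeric", "w": "width"}

def parse_keys(spec):
    """Parse a Keys string into (number, direction, type) tuples.

    None means "use the selection in the Sort direction or Sort type box".
    """
    keys = []
    for token in re.split(r"[ ,;]+", spec.strip()):
        if not token:
            continue
        m = re.fullmatch(r"(\d+)([abdlnw]*)", token)
        if m is None:
            raise ValueError(f"invalid key: {token!r}")
        direction = sort_type = None
        for letter in m.group(2):
            if letter in DIRECTIONS and direction is None:
                direction = DIRECTIONS[letter]
            elif letter in TYPES and sort_type is None:
                sort_type = TYPES[letter]
            else:
                raise ValueError(f"invalid key: {token!r}")
        keys.append((int(m.group(1)), direction, sort_type))
    return keys
```

For example, `parse_keys("2dn, 1; 0w")` describes a major key on field 2 (descending, numeric), then field 1 with the dialog defaults, then the entire selected text by width.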
Locale sort details | |
---|---|
Locale sorting makes use of the Windows API function LCMapStringEx. The exact behavior of the sort is dependent on the exact behavior of this function; the following attempts to describe the most important points. | |
Language | Selects the language for which a selection of locales will be offered. |
Locale | Selects a Windows locale from those available for the selected language. |
Case sensitive | Case sensitivity in a linguistic sort is not applied character by character; instead, when and only when two strings match completely except for case, case is applied to further sort them. Consequently, when this box is checked, the result will still not resemble what most users expect of a “case sensitive” sort. When this box is unchecked, the LINGUISTIC_IGNORECASE flag is passed to LCMapStringEx. |
Sort digits as numbers | This causes the sort to attempt to recognize strings including digits — like “data5” and “data10” — in such a way that “data5” will sort before “data10” in an ascending sort instead of after. The same algorithm is used to sort file names in Windows File Explorer. When this box is checked, the SORT_DIGITSASNUMBERS flag is passed to LCMapStringEx. |
Ignore diacritics | Windows API documentation says: Ignore nonspacing characters, as linguistically appropriate. Note: This flag does not always produce predictable results when used with decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. When this box is checked, the LINGUISTIC_IGNOREDIACRITIC flag is passed to LCMapStringEx. |
Ignore symbols and punctuation | This causes spaces, punctuation and “symbols” (the documentation is not more specific) to be ignored. Strings are sorted as if all the letters and numbers were run together, ignoring spaces, hyphens, periods and so on. When this box is checked, the NORM_IGNORESYMBOLS flag is passed to LCMapStringEx. |
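The effect of Sort digits as numbers can be approximated with a sort key that compares runs of digits as integers. This Python sketch illustrates the idea only; it is not the algorithm Windows uses, and `digits_as_numbers_key` is a hypothetical name.

```python
import re

def digits_as_numbers_key(text):
    """Split into alternating non-digit and digit runs; digit runs become ints."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

names = ["data10", "data5", "data1"]
plain = sorted(names)
natural = sorted(names, key=digits_as_numbers_key)
```

A plain character-by-character sort places "data10" before "data5"; with the key above, "data5" comes before "data10".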
Conversion
Convert tabs to spaces
Use Convert tabs to spaces on any selection to replace tabs in the selection with equivalent spaces, taking elastic tabstops into account if enabled. If nothing is selected, the entire file is converted.
Convert separated values to tabs...
Convert tabs to separated values...
These commands convert the selection, or the entire file if nothing is selected, between delimiter-separated values (typically *.csv, comma-separated values) and tabbed presentation (typically *.tsv or tab-separated values).
Both delimiter-separated values and tab-separated values use a structure composed of records (rows) containing fields (which are interpreted as being arranged in columns). In tabbed documents, each line of the file is a record, and fields within a record are separated by tabs. Fields cannot contain tabs or line-ending characters as such, but these can be encoded, typically using backslash notation (\t, \n, \r for tab, new line and return). Consistency requires that the encoding character must also be encoded (e.g., two backslashes in the file to represent a single backslash in the field’s value).
In delimiter-separated files, records are divided by line breaks and fields are divided by a separator character, typically a comma. However, when a field contains the separator character or line-ending characters, the problematic characters are escaped rather than encoded, meaning that the original character is still used in the file, but context indicates that it is not to be interpreted as a field or record separator. Typically, quote marks surround a field which contains line-ending or separator characters, and quotes within the field are doubled.
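The difference between escaping (in delimiter-separated files) and encoding (in tabbed files) can be seen in a short conversion sketch. This Python example uses the standard `csv` module to read quoted, comma-separated text and writes tabbed lines with backslash notation; it illustrates one common variant only, not the conversion code in Columns++, and the function names are hypothetical.

```python
import csv
import io

ENCODINGS = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}

def encode_field(field):
    """Encode backslash, tab, new line and return using backslash notation."""
    return "".join(ENCODINGS.get(ch, ch) for ch in field)

def csv_to_tabbed(text):
    """Convert comma-separated text with quoted fields to tabbed lines."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return "\n".join(
        "\t".join(encode_field(field) for field in record) for record in reader
    )

tabbed = csv_to_tabbed('id,"Smith, J.","said ""hi"""\n2,"two\nlines",x\n')
```

The quoted comma and doubled quotes are unescaped into plain field text, while the line break inside the second record becomes the two characters `\n` so that the record stays on one line.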
There are many variations in the details of data representation in delimiter-separated and tab-separated values files. When you select Convert separated values to tabs... or Convert tabs to separated values..., Columns++ displays a dialog in which you can adjust the conversion accordingly:
Column separator | |
---|---|
Separated values syntax | |
Tab, new line and return characters in tabbed documents | Fields in tabbed presentation cannot contain tabs or line-ending characters; if there are any of these characters in separated values fields, they must be encoded or replaced when converting to tabs. |
Number formats
Columns++ interprets characters in a document as numbers in many contexts, including the Calculation commands, the Align numeric command, numeric sort fields and formula substitutions in regular expression searches.
Numbers can include thousands separators and decimals. The Decimal separator is comma item near the bottom of the Columns++ menu determines whether the comma or the period is the decimal separator; thousands separators may be a space, an apostrophe, or whichever of comma or period is not the decimal separator. Numbers can also be times, using colons to separate days, hours, minutes and seconds.
There is some flexibility in what can be included along with a number in a column or a regular expression match. Common currency signs can precede the number with or without a space, and a minus sign can precede or follow a currency sign. Non-numeric characters (such as units, like “mg” or “ft”) can follow the number. (These are not interpreted, though; Columns++ will add 5 yards and 5 inches to get 10 without complaint.) Non-numeric characters can precede the number if they are separated from the number by at least one space.
- Add numbers and Average numbers skip items that have no digits; but if an item which includes one or more digits cannot be unambiguously interpreted as a number, Columns++ will select the item and will not perform the calculation.
- In formulas, variables and functions which represent numbers in the document are set to Not-a-Number if the associated document text cannot be unambiguously interpreted as a number.
- When sorting numerically, fields which cannot be unambiguously interpreted as numbers sort to the beginning.
- Align numeric uses slightly more lenient rules for recognizing numbers; the alignment of items which are not recognized as numbers is unchanged.
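The role of the decimal separator setting and the thousands separators can be sketched in Python. This is a deliberately simplified model: it ignores currency signs, trailing units and times, it does not check that separators are plausibly placed, and `parse_number` is a hypothetical name rather than a Columns++ function.

```python
import math

def parse_number(text, decimal_is_comma=False):
    """Parse text as a number, or return NaN if it cannot be interpreted."""
    decimal = "," if decimal_is_comma else "."
    thousands = "." if decimal_is_comma else ","
    cleaned = text.strip()
    for separator in (" ", "'", thousands):   # allowed thousands separators
        cleaned = cleaned.replace(separator, "")
    cleaned = cleaned.replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        return math.nan
```

With these rules, `1,234.5` is read as 1234.5 when the decimal separator is a period, and `1.234,5` gives the same value when Decimal separator is comma is checked.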
Decimal separator is comma
Decimal separator is comma may be checked or unchecked to control how Columns++ interprets numbers. This setting is maintained per document (while the document remains open), so it can be different in different tabs.
Time formats
Time formats... opens a dialog that allows you to control how Columns++ interprets and shows numbers represented as times.
Time units: numbers with no colons represent | |||||||||
---|---|---|---|---|---|---|---|---|---|
days hours minutes seconds | Select the unit for calculations involving times. In calculations involving times, times specified without colons are interpreted as being in this unit, and times specified with colons are converted to this unit. | ||||||||
Numbers with one or two colons represent | |||||||||
| Select the way times with one or two colons will be interpreted. (Times with three colons are always days:hours:minutes:seconds.) | ||||||||
Results of calculations can use these formats for times | |||||||||
| Check the box before each format to enable results to be shown in that format when times are displayed. The Calculation commands and formula substitutions in regular expression search replacements can show their results as times; when time display is enabled, these settings determine which formats can be used for results. |
Options
Options... opens a dialog that allows you to control some aspects of Columns++:
Show Columns++ on the main menu bar. | lets you choose whether to add an entry for Columns++ to the main menu bar, just to the left of the Plugins menu, or leave it as an entry on the Plugins menu. |
---|---|
Replace: Don't move to the following occurrence. | has the same effect as the option of the same name on the Searching panel of the Preferences dialog in Notepad++, but for the Search in indicated region dialog in Columns++. When checked, the Replace button in the search dialog does not immediately perform another find after replacing text; in effect, the button alternates between finding and replacing, giving you a chance to see the effect of the replace before moving to the next occurrence of the search string. |
Show Elastic tabstops progress dialog when seconds remaining exceeds about: | lets you choose the minimum estimated remaining time, from 1 to 20 seconds, that will cause Columns++ to display a progress dialog during a long-running Elastic tabstops operation. The default is 2. |
Automatically extend selections to form rectangles | |
---|---|
You can enable “implicit” selections for Columns++ commands that require rectangular selections, bypassing the dialogs that ask you if you want to make a rectangular selection. | |
Selections on one line extend downward to the last line. | A selection of one or more characters on a single line is “projected” downward to the last line of the file. This allows you to select full columns (skipping headers, if desired) without scrolling all the way to the end of the file. If the last line of the file is completely empty (that is, the file ends with an end-of-line sequence) that line will not be included in the selection. |
Full row selections are replaced by the enclosing rectangle. | A single selection of complete lines is replaced by a rectangular selection wide enough to include all the text on all lines in the selection. Usually you get this kind of selection by dragging in the left margin. If the selection ends at the beginning of a line (as when dragging downward in the left margin), that line is not included in the rectangular selection. The selection made by Edit|Select All will be converted to a rectangular selection that encompasses the entire file (excluding the last line, if it consists of only a line ending). |
Zero-width selections extend to the right to the end of the longest line. | A “thin” selection, or a rectangular selection containing no characters or virtual space, is extended to the right far enough to enclose the end of the longest line in the selection. When selecting a rectangular region meant to extend from some column far enough to the right to include the ends of all lines, this avoids the need to scroll through and figure out how wide the selection needs to be. |
Custom style for Search in indicated region | |
---|---|
By default, Columns++ defines a custom style to indicate the area to be searched by shading the background. This uses a Scintilla resource called an indicator; in some cases this might conflict with other plugins, so some options to control it are available here. You can also choose the color and transparency of the background. These options are disabled if the Search in indicated region dialog is open when the Options dialog is opened. | |
Enable custom style | When checked, a custom style will be available. |
Alpha, Red, Green, Blue | Specify the transparency and color of the background for the custom style. |
Override Notepad++ indicator allocation | Notepad++ version 8.5.6 introduced a mechanism to avoid conflicts in indicator numbers used by plugins. If this mechanism is available, Columns++ will set the indicator accordingly unless this box is checked. If you have plugins installed which have not been updated to use the new mechanism, it is possible that there will still be a conflict with the indicator number Notepad++ assigns; in that case, you can check this box to choose the indicator number manually. If this box is disabled, either the version of Notepad++ you are running does not support the indicator allocation mechanism, or there were no available indicators remaining when this plugin was loaded. |
Indicator number | When Override Notepad++ indicator allocation is unchecked and a Notepad++ indicator allocation is available, this box shows the Scintilla indicator number allocated to Columns++. Otherwise it allows you to choose the indicator number Columns++ will use. |
Check for updates and show a notification at the bottom of the Columns++ menu | |
---|---|
Columns++ can check for updates. It does this by connecting to the Internet and requesting release information from GitHub. No information about your installation is sent to GitHub. This check is done when Notepad++ loads Columns++, no more often than once every twelve hours. If Columns++ finds a release newer than the one currently installed, it adds the notice Update Available to the Help/About... entry at the bottom of the Columns++ menu. | |
Show for any new release | Show a notice if a release newer than the one you have installed is found. |
Show for stable releases only | Show a notice only if a stable release newer than the one you have installed is found. Releases are generally marked stable (production-ready) after they have been released for long enough that the author believes any serious problems would have been reported. |
Do not check | Select this option if you do not want Columns++ to connect to GitHub to check for new releases. |
Help/About
Help/About... provides access to release/version identification, this help file, and changelog, license and source information. It also allows you to check GitHub for the newest release and the latest stable release of Columns++.