namealign follows the same idea of character classification as the tool pftdbns, but does not sort files into directories. Instead, it renames files that share the same filename macrostructure so that they are formatted uniformly (e.g. by inserting a leading 0 in a numbered part of the name).
In short: character classes are used to detect the
microstructure of a string.
The microstructure of a string is therefore an abstraction
of the string, based on character classes.
The macrostructure of a string is, in turn, an abstraction of its microstructure.
For this tool I use only the most common character classes. If you are used to regular expressions, you may know other character classes as well (for example "lowercase" or "alnum"). Because the special purpose of name alignment is to reformat strings into similarly formatted strings, only the character classes shown below are distinguished.
character(s) | character class | character class short-name |
---|---|---|
a..z, A..Z | letter | l |
0..9 | integer | i |
space, tab, newline | blank | b |
dot (".") | dot | d |
slash ("/") | slash | s |
other characters | other | o |
Example string | corresponding microstructure | alternative notation of microstructure |
---|---|---|
hallo_this_is_an_example.txt | lllllollllollollollllllldlll | 5l1o4l1o2l1o2l1o7l1d3l |
my_holiday-pictures-1.jpg | llolllllllolllllllloidlll | 2l1o7l1o8l1o1i1d3l |
my_holiday-pictures-2.jpg | llolllllllolllllllloidlll | 2l1o7l1o8l1o1i1d3l |
my_holiday-pictures-3.jpg | llolllllllolllllllloidlll | 2l1o7l1o8l1o1i1d3l |
my_holiday-pictures-23.jpg | llolllllllolllllllloiidlll | 2l1o7l1o8l1o2i1d3l |
my_holiday-pictures-183.jpg | llolllllllolllllllloiiidlll | 2l1o7l1o8l1o3i1d3l |
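The two tables above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not the code of namealign or pftdbns; the function names `char_class`, `microstructure` and `compact` are made up for this illustration, and I assume that only ASCII letters and digits count as "letter" and "integer".

```python
from itertools import groupby
import string

def char_class(ch: str) -> str:
    """Map one character to the short name of its character class."""
    if ch in string.ascii_letters:
        return "l"  # letter
    if ch in string.digits:
        return "i"  # integer
    if ch in " \t\n":
        return "b"  # blank
    if ch == ".":
        return "d"  # dot
    if ch == "/":
        return "s"  # slash
    return "o"      # other

def microstructure(name: str) -> str:
    """Replace every character by the short name of its class."""
    return "".join(char_class(ch) for ch in name)

def compact(micro: str) -> str:
    """Alternative notation: each run of equal letters becomes '<length><class>'."""
    return "".join(f"{len(list(run))}{cls}" for cls, run in groupby(micro))

print(compact(microstructure("my_holiday-pictures-23.jpg")))
# prints 2l1o7l1o8l1o2i1d3l
```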
Because the tool pftdbns looks at the microstructure, it would group only the following files together:
my_holiday-pictures-1.jpg,
my_holiday-pictures-2.jpg,
my_holiday-pictures-3.jpg
As the holiday-picture example above shows,
the microstructure does not help in grouping all of these files together:
the pictures were not numbered with a fixed number of digits,
so the microstructure changes as the number of digits grows.
You might argue that this is a reason why the
pftdbns
tool does stupid things.
But when you use a picture viewer to view your files,
the order will not be what you would expect from the numbers.
The order will be:
my_holiday-pictures-1.jpg,
my_holiday-pictures-183.jpg,
my_holiday-pictures-2.jpg,
my_holiday-pictures-23.jpg,
my_holiday-pictures-3.jpg
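You can check this order yourself. The sketch below assumes that the viewer sorts names by plain character-by-character string comparison (many do, but not all), and shows that the same comparison gives the expected order once the numbers have a fixed width.

```python
names = [
    "my_holiday-pictures-1.jpg",
    "my_holiday-pictures-2.jpg",
    "my_holiday-pictures-3.jpg",
    "my_holiday-pictures-23.jpg",
    "my_holiday-pictures-183.jpg",
]

# Plain string comparison works character by character, so the "." of
# "1.jpg" is compared with the "8" of "183.jpg"; the numeric value of
# the whole number plays no role.
for name in sorted(names):
    print(name)

# With a fixed number of digits the same comparison yields numeric order.
padded = [f"my_holiday-pictures-{n:03d}.jpg" for n in (1, 2, 3, 23, 183)]
print(sorted(padded) == padded)  # prints True
```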
This might disturb you when you want to
show your pictures to your audience,
especially if you discover this chaos
at the moment you start your presentation and
didn't look at the mess before (so you couldn't
fix it by renaming your hundreds of files by hand ;-)).
So what we now introduce is the macrostructure of the string. We again represent each character class by a letter, but now by an uppercase letter: the macrostructure uses the same letters as the microstructure, only in uppercase.
The macrostructure of a string is like the microstructure of a string,
except that the number of successive occurrences of the same character class is ignored.
So it abstracts away the numbers that we saw in the
alternative notation of the microstructure (which we show in the table above).
Or we could say: it's a filter on the microstructure that ignores the repetition counts of the microstructure.
The next table shows the string of the filename,
the microstructure of that string and the macrostructure.
Example string | corresponding microstructure | representation of macrostructure |
---|---|---|
hallo_this_is_an_example.txt | lllllollllollollollllllldlll | LOLOLOLOLDL |
my_holiday-pictures-1.jpg | llolllllllolllllllloidlll | LOLOLOIDL |
my_holiday-pictures-2.jpg | llolllllllolllllllloidlll | LOLOLOIDL |
my_holiday-pictures-3.jpg | llolllllllolllllllloidlll | LOLOLOIDL |
my_holiday-pictures-23.jpg | llolllllllolllllllloiidlll | LOLOLOIDL |
my_holiday-pictures-183.jpg | llolllllllolllllllloiiidlll | LOLOLOIDL |
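Going from the microstructure to the macrostructure is a small step: collapse every run of equal letters into one uppercase letter. The function name `macrostructure` is my own choice for this sketch; it takes a microstructure string as written in the table.

```python
from itertools import groupby

def macrostructure(micro: str) -> str:
    """Collapse each run of equal class letters into one uppercase letter."""
    return "".join(cls.upper() for cls, _run in groupby(micro))

# Different run lengths in the integer part, same macrostructure.
print(macrostructure("llolllllllolllllllloidlll"))
print(macrostructure("llolllllllolllllllloiiidlll"))
# both lines print LOLOLOIDL
```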
If you now grouped your files and put them into directories,
one directory for each of the macrostructures,
then your grouping would be fine.
But when using your picture viewer,
you would again have the trouble of the wrongly ordered sequence.
So what we should do now is rename the files
in such a way that they all have the same microstructure.
But before I forget to mention it: we should only rename files to get the same
microstructure if they have the same macrostructure.
Otherwise we might run into bigger trouble than before: think about the above examples
and what would happen if we (tried to) force all of the above files,
independently of their macrostructure, to get the same microstructure.
(No, we shouldn't (!) do this, and I think this is obvious! But it's good that we asked this question.)
As you can see, it only makes sense to rename files that belong to the same macrostructure.
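As a sketch of this first step, the following Python groups filenames by their macrostructure, so that any renaming can then be restricted to one group at a time. It is an illustration under my own assumptions, not the code of namealign: the names `macro_class`, `macrostructure` and `group_by_macrostructure` are hypothetical, and only ASCII letters and digits are treated as "letter" and "integer".

```python
from collections import defaultdict
from itertools import groupby

def macro_class(ch: str) -> str:
    """Uppercase class letter for one character (classes as in the table)."""
    if ch.isascii() and ch.isalpha():
        return "L"
    if ch.isascii() and ch.isdigit():
        return "I"
    if ch in " \t\n":
        return "B"
    if ch == ".":
        return "D"
    if ch == "/":
        return "S"
    return "O"

def macrostructure(name: str) -> str:
    """One uppercase letter per run of characters of the same class."""
    return "".join(cls for cls, _run in groupby(name, key=macro_class))

def group_by_macrostructure(names):
    """Collect names into lists keyed by their macrostructure."""
    groups = defaultdict(list)
    for name in names:
        groups[macrostructure(name)].append(name)
    return dict(groups)

names = [
    "hallo_this_is_an_example.txt",
    "my_holiday-pictures-1.jpg",
    "my_holiday-pictures-23.jpg",
    "my_holiday-pictures-183.jpg",
]
for macro, members in group_by_macrostructure(names).items():
    print(macro, members)
```

The example file and the holiday pictures end up in two different groups, so a renaming step that works per group never mixes them.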
An automatism that makes it unnecessary for the user to think about the renaming is a fine thing. But it seems best to me to give the user options to change the behaviour of the tool, because an automatism that fits all needs is obviously not possible.
Version 0.7 of namealign has the following options:
option | description |
---|---|
-letter | letter will also be changed |
-na | no action: show only what would be done if this flag is NOT set |
-minint | integers: minimal size (remove leading 0's) |
-s | single repetition: for all non-alnum chars only replace by one char |
-show_all | show all: even files that will not be renamed will be shown |
-verbose | verbose: same as -show_all: even files that will not be renamed will be shown |
-version | show program version and exit |
-help | Display this list of options |
--help | Display this list of options |
The option -minint does NOT shrink each integer
to the minimum size needed for that individual file.
It uses the minimum integer size that can be used
for all files of the same macrostructure.
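To make the difference concrete, here is a small sketch of how such a common width could be computed for the integer parts of one group of files. The helper names `minint_width` and `realign` are made up for this example, and the code only illustrates the rule described above; it is not taken from namealign.

```python
def minint_width(int_strings):
    """Smallest width that fits every integer once leading zeros are dropped."""
    return max(len(s.lstrip("0") or "0") for s in int_strings)

def realign(int_strings):
    """Drop leading zeros, then pad every integer to the common width."""
    width = minint_width(int_strings)
    return [(s.lstrip("0") or "0").zfill(width) for s in int_strings]

# "0001" alone could shrink to "1", but the group needs three digits.
print(realign(["0001", "0023", "0183"]))
# prints ['001', '023', '183']
```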
Character Class | Default behaviour | Behaviour changed by options |
---|---|---|
Blank | EACH blank will be substituted by one "_" | with option -s ALL successive blanks will be replaced by ONE "_" |
Slash | EACH slash will be substituted by one "_" | with option -s ALL successive slashes will be replaced by ONE "_" |
Letter | left as they are | with option -letter add "z" for each missing letter on the right side |
Integer | add missing leading "0" (length of int-string is used) | with option -minint throw away as many leading zeros as possible (first throw away leading zeros in general, then add missing zeros; the length of the int-string after removing leading zeros is used) |
Dot | multiple successive dots will be replaced by one dot | with option -s ALL successive dots will be replaced by ONE dot only |
Other | EACH other character will be replaced by one "_" | with option -s ALL successive other chars will be replaced by ONE "_" |
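The default column of this table, plus the -s variant, can be sketched in Python as follows. Please read it as an illustration of the table and not as the implementation of namealign: the names `char_class`, `runs` and `align` and the parameter `single` are hypothetical, the options -letter and -minint are left out, and the function assumes that all names passed in share one macrostructure. Note that, taking the table literally, the "-" in the example names is an "other" character and therefore becomes "_".

```python
from itertools import groupby
import string

def char_class(ch: str) -> str:
    """Short class name of one character (classes as in the first table)."""
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "i"
    if ch in " \t\n":
        return "b"
    if ch == ".":
        return "d"
    if ch == "/":
        return "s"
    return "o"

def runs(name: str):
    """Split a name into (class, text) runs of equal character class."""
    return [(cls, "".join(chars)) for cls, chars in groupby(name, key=char_class)]

def align(names, single=False):
    """Reformat names of ONE macrostructure; single=True mimics option -s."""
    split = [runs(name) for name in names]
    # Longest text per run position; only used for the integer runs below.
    widths = [max(len(text) for _cls, text in column) for column in zip(*split)]
    result = []
    for parts in split:
        out = []
        for (cls, text), width in zip(parts, widths):
            if cls == "i":
                out.append(text.zfill(width))  # add missing leading zeros
            elif cls == "l":
                out.append(text)               # letters stay as they are
            elif cls == "d":
                out.append(".")                # successive dots become one dot
            else:                              # blank, slash, other
                out.append("_" if single else "_" * len(text))
        result.append("".join(out))
    return result

print(align(["my_holiday-pictures-1.jpg", "my_holiday-pictures-183.jpg"]))
# prints ['my_holiday_pictures_001.jpg', 'my_holiday_pictures_183.jpg']
```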