How to use Bash associative arrays
Command interpreters and scripting languages like the Bash shell are essential tools of any operating system. Here’s how to use some very powerful Bash data structures: associative arrays, also known as hashes.
In Bash, a hash is a data structure that can contain many sub-variables but, instead of indexing them with consecutive numbers, indexes them with user-defined text strings, or keys. Besides being extremely flexible, hashes also make scripts more readable. If you need to process the areas of certain countries, for example, a syntax like:
echo "${area_of[Germany]}"
would be as self-documenting as it can be, right?
How to create and fill Bash hashes
Bash hashes must be declared with the uppercase -A option of the declare builtin (A stands for Associative array), and can then be filled by listing all their key/value pairs with this syntax:
# Country areas, in square miles
declare -A area_of
area_of=( [Italy]="116347" [Germany]="137998" [France]="213011" [Poland]="120728" [Spain]="192476" )
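Listing every pair at once is not the only option. As a minimal sketch (assuming Bash 4 or later, the version that introduced associative arrays), elements can also be assigned one at a time, and the += operator appends new pairs to an existing hash without erasing the old ones:

```shell
#!/usr/bin/env bash
# Sketch: filling a hash element by element, then appending more pairs.
declare -A area_of                 # without -A, Bash would create an indexed array

area_of[Italy]=116347              # assign a single key
area_of[Germany]=137998
area_of+=( [France]=213011 [Spain]=192476 )   # += adds pairs, keeping existing ones

echo "Italy: ${area_of[Italy]}"
echo "Keys stored: ${#area_of[@]}"
```

With += the earlier Italy and Germany entries survive; a plain = assignment would have replaced the whole hash with only France and Spain.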
The first thing to notice here is that the order in which the elements are declared is irrelevant: the shell ignores it and stores everything according to its own internal hashing. As proof, this is what you get when you retrieve the data as they were stored:
echo "${area_of[*]}"
213011 120728 137998 192476 116347
echo "${!area_of[*]}"
France Poland Germany Spain Italy
The asterisk inside the square brackets expands to all and only the values of a hash, while prefixing the hash name with an exclamation mark retrieves its keys instead. In both cases, however, there is no easily recognizable order, and it is not something a script should rely on.
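The choice between * and @, and whether the expansion is quoted, starts to matter as soon as a key contains spaces. The following sketch uses a hypothetical capital_of hash, not part of the original examples, to show the difference:

```shell
#!/usr/bin/env bash
# Sketch: why the quoted "${!hash[@]}" form is the safe way to list keys.
# capital_of is a hypothetical hash used only for this illustration.
declare -A capital_of=( ["United Kingdom"]="London" ["Czech Republic"]="Prague" [Italy]="Rome" )

# Quoted "@" form: each key becomes exactly one word, spaces included
count=0
for country in "${!capital_of[@]}"; do
    count=$((count + 1))
    echo "key #$count: $country -> ${capital_of[$country]}"
done

# Unquoted "*" form (intentionally unquoted here): keys are re-split on
# whitespace, so "United Kingdom" turns into two separate words
split_words=( ${!capital_of[*]} )
echo "quoted @ gives $count keys; unquoted * gives ${#split_words[@]} words"
```

In loops, "${!hash[@]}" for keys and "${hash[@]}" for values are therefore the forms to reach for.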
You can also populate a hash dynamically, by calling other programs. Suppose, for example, that you had another shell script called hash-generator that outputs all the pairs as one properly formatted string:
#! /bin/bash
printf '[Italy]="116347" [Germany]="137998" [France]="213011" [Poland]="120728" [Spain]="192476"'
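Calling hash-generator this way from the script that actually uses the area_of hash:
VALS=$( hash-generator )
# eval re-parses the whole string, so the [key]="value" pairs become real assignments.
# eval executes whatever it receives: only use it with generators you trust.
eval "declare -A area_of=( $VALS )"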
would fill that hash with exactly the same keys and values. Of course, the point here is that “hash-generator” can be any program, possibly much more powerful than Bash, as long as it outputs data in that format. To fill a hash with the content of an already existing plain text file, instead, follow these suggestions from Stack Overflow.
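As a rough alternative to eval, here is a minimal sketch of one common approach: a while read loop over a file. It assumes a hypothetical file format with one key=value pair per line (the temporary file is created by the script itself, just for the demonstration), and unlike eval it never executes the file's content:

```shell
#!/usr/bin/env bash
# Sketch: fill a hash from a plain text file with one "key=value" pair per line.
# The file and its format are assumptions made for this example.
datafile=$(mktemp)
printf '%s\n' 'Italy=116347' 'Germany=137998' 'France=213011' > "$datafile"

declare -A area_of
# IFS='=' splits each line at the first '='; -r keeps backslashes literal.
# "|| [[ -n $key ]]" also processes a last line that lacks a trailing newline.
while IFS='=' read -r key value || [[ -n $key ]]; do
    [[ -z $key || $key == \#* ]] && continue   # skip blank lines and # comments
    area_of[$key]=$value
done < "$datafile"
rm -f "$datafile"

for country in "${!area_of[@]}"; do
    echo "$country: ${area_of[$country]}"
done
```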
How to process hashes
The exact syntax to refer to a specific element of a hash, or to delete it, is this:
echo "${area_of[Germany]}"
unset 'area_of[Germany]'   # quoted, so the shell does not treat [..] as a glob pattern
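Reading a key that does not exist simply yields an empty string, which looks exactly like an empty value. A minimal sketch of how to tell the two apart, supply a default, and delete an element (Atlantis is, of course, a deliberately missing key):

```shell
#!/usr/bin/env bash
# Sketch: checking whether a key exists before reading or deleting it.
declare -A area_of=( [Italy]=116347 [Germany]=137998 )

# ${hash[key]+x} expands to "x" only if the key exists (even with an empty value)
if [[ -n ${area_of[Germany]+x} ]]; then
    echo "Germany is stored: ${area_of[Germany]}"
fi

# ${hash[key]:-default} supplies a fallback for missing (or empty) elements
echo "Atlantis: ${area_of[Atlantis]:-unknown}"

# unset takes the element's name, not its value
unset 'area_of[Germany]'
[[ -z ${area_of[Germany]+x} ]] && echo "Germany removed"
```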
To erase a whole hash, pass just its name to unset, and then re-declare it:
unset area_of
declare -A area_of
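Re-declaring matters because unset also drops the associative attribute: afterwards, a plain assignment such as area_of[Spain]=192476 would create an ordinary indexed array, in which the subscript Spain is evaluated as an arithmetic expression (an unset variable, hence 0). If the goal is only to empty the hash, assigning an empty list is a shorter alternative that keeps the attribute, as this sketch shows:

```shell
#!/usr/bin/env bash
# Sketch: emptying a hash without unset, so it stays associative.
declare -A area_of=( [Italy]=116347 [Germany]=137998 )

area_of=()                    # removes every element but keeps the -A attribute
echo "elements left: ${#area_of[@]}"

area_of[Spain]=192476         # still a hash: string keys work as before
echo "Spain: ${area_of[Spain]}"
declare -p area_of            # prints the declaration, including the -A flag
```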
The number of key/value pairs stored in a hash is returned by the expansion “${#HASHNAME[@]}” (don’t look at me, I did not invent this syntax). But if all you need is to process all the elements of a hash, regardless of their number or internal order, just follow this example:
for country in "${!area_of[@]}"
do
echo "Area of $country: ${area_of[$country]} square miles"
done
whose output looks like this (the order may differ on your system):
Area of France: 213011 square miles
Area of Poland: 120728 square miles
Area of Germany: 137998 square miles
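To put the ${#area_of[@]} count to work, here is a short sketch that reports it together with the total of all values. Bash stores every value as a string, but arithmetic expansion $(( )) treats strings of digits as integers:

```shell
#!/usr/bin/env bash
# Sketch: counting the pairs in a hash and summing its numeric values.
declare -A area_of=( [Italy]=116347 [Germany]=137998 [France]=213011 [Poland]=120728 [Spain]=192476 )

total=0
for country in "${!area_of[@]}"; do
    total=$(( total + ${area_of[$country]} ))   # string value used as an integer
done

echo "${#area_of[@]} countries, $total square miles in total"
# prints: 5 countries, 780560 square miles in total
```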
You can use basically the same procedure to create a “mirror” hash, with keys and values inverted:
declare -A country_whose_area_is
for country in "${!area_of[@]}"; do
country_whose_area_is[${area_of[$country]}]=$country
done
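Among other things, this “mirroring” may be the easiest way to search the original hash by value instead of by key. It only works cleanly when the values are unique, though: if two keys share the same value, the last one processed silently overwrites the others in the mirror.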
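Here is a minimal sketch of a mirroring loop that detects such collisions instead of losing data silently. The score_of hash and its values are hypothetical, chosen only so that two keys share a value:

```shell
#!/usr/bin/env bash
# Sketch: building a mirror hash while detecting duplicate values.
# score_of is a hypothetical hash; alice and carol deliberately share a value.
declare -A score_of=( [alice]=10 [bob]=20 [carol]=10 )
declare -A who_scored
collisions=0

for name in "${!score_of[@]}"; do
    value=${score_of[$name]}
    if [[ -n ${who_scored[$value]+x} ]]; then
        # This value is already a key in the mirror: report it and keep the first one
        echo "value $value already maps to ${who_scored[$value]}; skipping $name"
        collisions=$(( collisions + 1 ))
        continue
    fi
    who_scored[$value]=$name
done

echo "20 -> ${who_scored[20]}"
```

Which of the two colliding names is kept depends on the hash's internal order, which is exactly why the loop reports the collision instead of hiding it.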
How to sort hashes
If hash elements are stored in semi-random order, what is the most efficient way to handle them in alphanumerical order? The answer is that it depends on what exactly should be sorted, and when. In the many cases where only the final output of a loop needs sorting, all you need is a sort command right after the closing done statement:
for country in "${!area_of[@]}"
do
echo "$country: ${area_of[$country]} square miles"
done | sort
to sort the output by key (even though the keys were not retrieved in that order!):
France: 213011 square miles
Germany: 137998 square miles
Italy: 116347 square miles
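Piping to sort only orders what the loop prints. When the loop itself must visit the elements in order, one common approach, sketched here assuming Bash 4 or later (for mapfile) and keys without embedded newlines, is to sort the keys first and store them in an ordinary indexed array:

```shell
#!/usr/bin/env bash
# Sketch: sort the keys once, then loop over them in that order.
declare -A area_of=( [Italy]=116347 [Germany]=137998 [France]=213011 [Poland]=120728 [Spain]=192476 )

# printf puts each key on its own line; mapfile -t reads one array element per line
mapfile -t sorted_keys < <(printf '%s\n' "${!area_of[@]}" | sort)

for country in "${sorted_keys[@]}"; do
    echo "$country: ${area_of[$country]} square miles"
done
```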
Sorting the same lines numerically, by country area, is almost as easy. First, prepend each area to the beginning of its line:
for aa in "${!area_of[@]}"
do
printf "%s|%s = %s square miles\n" "${area_of[$aa]}" "$aa" "${area_of[$aa]}"
done
This yields lines like these:
213011|France = 213011 square miles
120728|Poland = 120728 square miles
137998|Germany = 137998 square miles
that, while still unsorted, now start with exactly the strings we want to sort on. Therefore, piping the loop output to sort -n (numeric sort) and then to the cut command, with “|” as the field delimiter:
for aa in "${!area_of[@]}"
do
printf "%s|%s = %s square miles\n" "${area_of[$aa]}" "$aa" "${area_of[$aa]}"
done | sort -n | cut '-d|' -f2-
will sort the lines by area, strip the leading area field, and finally produce the desired result:
Italy = 116347 square miles
Poland = 120728 square miles
Germany = 137998 square miles
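The prefix-and-cut trick works well. As a sketch of an alternative, sort can also be told which field to compare through its -t (field separator) and -k (sort key) options, so the area does not need to be duplicated at the front of each line:

```shell
#!/usr/bin/env bash
# Sketch: numeric sort on the second '|'-separated field, no prefix needed.
declare -A area_of=( [Italy]=116347 [Germany]=137998 [France]=213011 [Poland]=120728 [Spain]=192476 )

# Print "country|area", then sort numerically on field 2 only
by_area=$(
    for country in "${!area_of[@]}"; do
        printf '%s|%s\n' "$country" "${area_of[$country]}"
    done | sort -t '|' -k2,2n
)

# Reformat each sorted line for display
while IFS='|' read -r country area; do
    printf '%s = %s square miles\n' "$country" "$area"
done <<< "$by_area"
```

Replacing -k2,2n with -k2,2nr would list the largest countries first.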
Multi-level hashes
While Bash does not support nested, multi-level hashes, it is possible to emulate them with some auxiliary arrays. Consider this code, which stores the areas of European regions while also cataloging them by country:
1 declare -a european_regions=('Bavaria' 'Lazio' 'Saxony' 'Tuscany')
2 declare -a european_countries=('Italy' 'Germany')
3 declare -A area_of_country_regions
4 area_of_country_regions=( [Lazio in Italy]="5000" [Tuscany in Italy]="6000" [Bavaria in Germany]="9500" [Saxony in Germany]="7200" )
5
6 for country in "${european_countries[@]}"
7 do
8 for region in "${european_regions[@]}"
9 do
10 cr="$region in $country"
11 if test "${area_of_country_regions[$cr]+isset}"
12 then
13 printf "Area of %-20.20s: %s\n" "$cr" "${area_of_country_regions[$cr]}"
14 fi
15 done
16 done
The code creates two normal arrays, one for countries and one for regions, plus one hash with composite keys that associate each region with its country, emulating a two-level hash. The code then generates all possible combinations of regions and countries, but only processes those that exist in area_of_country_regions, recognizing them with the +isset test of line 11: "${area_of_country_regions[$cr]+isset}" expands to the string “isset” if that key exists and to nothing otherwise, so test succeeds only for stored elements. Rough, but effective, isn’t it?
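Generating every combination gets wasteful as the lists grow. A different approach, sketched here under the assumption that no region or country name contains the string " in ", loops only over the keys that actually exist and splits each composite key back into its two levels with Bash parameter expansion:

```shell
#!/usr/bin/env bash
# Sketch: iterate only existing composite keys and split them into two levels.
# Assumes no region or country name itself contains " in ".
declare -A area_of_country_regions=( ["Lazio in Italy"]="5000" ["Tuscany in Italy"]="6000" ["Bavaria in Germany"]="9500" ["Saxony in Germany"]="7200" )
declare -A total_of_country

for cr in "${!area_of_country_regions[@]}"; do
    region=${cr% in *}      # strip the shortest " in ..." suffix: the region
    country=${cr##* in }    # strip the longest "... in " prefix: the country
    printf 'Area of %-10s (%s): %s\n' "$region" "$country" "${area_of_country_regions[$cr]}"
    # Aggregate per country, starting from 0 the first time a country is seen
    total_of_country[$country]=$(( ${total_of_country[$country]:-0} + ${area_of_country_regions[$cr]} ))
done

echo "Italy total: ${total_of_country[Italy]}"
```

The per-country totals show a side benefit of splitting the keys: the second level becomes available as a key of its own.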