
Sort NumPy Arrays By Columns or Rows

NumPy is a fundamental module used in many Python programs and analyses because it conducts numerical computations very efficiently. However, for those new to NumPy, it can be difficult to grasp at first. Specifically, understanding array indexing and sorting quickly becomes complex. Fortunately, NumPy has some built-in functions that make performing basic sorting operations quite simple.

NumPy arrays can be sorted by a single column or row, or by multiple columns or rows, using the argsort() function. The argsort function returns an array of indices that would sort the values in ascending order. Setting the kind argument of argsort to a stable algorithm such as 'mergesort' makes it possible to sort on multiple rows or columns through successive sorts. This article walks through sorting on a single column, a single row, and multiple columns using simple examples with code.

Create a NumPy Array of Random Integers

To start, import the numpy module under its conventional alias, np, which the rest of the code in this article uses.

import numpy as np

Now create an array of random integers. Here, I’m creating an array with 5 rows and 4 columns. The values in the array are random, so you will get different values than those shown here if you use the same code. If you’re not familiar with the basics of creating numpy arrays and numpy array shapes, check out this article.

a = np.random.randint(100, size=(5, 4))  # random integers from 0 to 99

output:
[[44 47 64 67]
 [67  9 83 21]
 [36 87 70 88]
 [88 12 58 65]
 [39 87 46 88]]
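If you would rather get the same values on every run, one option is to seed a random generator. The sketch below uses np.random.default_rng with a seed of 42; both the seed and the name a_seeded are arbitrary choices for illustration, and the values produced will not match the output shown above.

```python
import numpy as np

# The seed is arbitrary; any fixed integer gives repeatable results.
rng = np.random.default_rng(42)

# Integers from 0 (inclusive) to 100 (exclusive), 5 rows by 4 columns.
a_seeded = rng.integers(0, 100, size=(5, 4))
print(a_seeded.shape)  # (5, 4)
```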

Sort NumPy Array by Column

To sort an array by a specific column we’re going to use the numpy argsort() function. argsort() returns the array indices that would result in a sorted array. Let’s take a quick look at what argsort returns and then at how to use it to get a sorted array.

To start, call argsort on the first column of the array we created. You should get a result something like this, where the argsort results give the indices of the values from smallest to largest.

a[:, 0].argsort()
output:
[2 4 0 1 3]

The output shows that the smallest value in the first column is at position 2 (the third value). That is correct: the lowest value in the first column of the array is 36, which is the third value (position 2) in that column.
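To make the relationship between the indices and the sorted values concrete, here is a small sketch that uses the first column of the example array as a standalone 1-D array. The names col and order are only for illustration.

```python
import numpy as np

# First column of the example array shown above.
col = np.array([44, 67, 36, 88, 39])

# Indices that would put col in ascending order.
order = col.argsort()
print(order)       # [2 4 0 1 3]

# Indexing with those indices yields the sorted values.
print(col[order])  # [36 39 44 67 88]
```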

To sort the array, we now need to use the indices from the argsort result to reorder the rows in the array. This is a simple procedure that can be done in one line of code. We’ll simply use the argsort result as the row indices and assign the resulting array back to a, as follows.

a = a[a[:, 0].argsort()]
output:
[[36 87 70 88]
 [39 87 46 88]
 [44 47 64 67]
 [67  9 83 21]
 [88 12 58 65]]

As you can see, the rows are now ordered least to greatest according to the first column. To sort on a different column, simply change the column index. For example, we can sort on the second column (index 1) with a[a[:, 1].argsort()].
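If you sort on columns often, the pattern can be wrapped in a small helper. The function below is a hypothetical convenience (sort_by_column is not part of NumPy), and the descending option is an extra I’ve added by reversing the argsort result. It is a minimal sketch that uses the example array from earlier in the article.

```python
import numpy as np

a = np.array([[44, 47, 64, 67],
              [67,  9, 83, 21],
              [36, 87, 70, 88],
              [88, 12, 58, 65],
              [39, 87, 46, 88]])

def sort_by_column(arr, col, descending=False):
    """Return a copy of arr with its rows reordered by the values in column col."""
    order = arr[:, col].argsort(kind='stable')
    if descending:
        order = order[::-1]  # note: reversing also reverses the order of tied rows
    return arr[order]

print(sort_by_column(a, 2)[:, 2])                   # [46 58 64 70 83]
print(sort_by_column(a, 2, descending=True)[:, 2])  # [83 70 64 58 46]
```

Because indexing with an array of indices returns a copy, the original array a is left unchanged.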

Sort NumPy Array by Row

A NumPy array can also be sorted by row values. This is accomplished in the same way as sorting with columns. We just need to change the indexing positions. For example, let’s take the array we created that’s sorted on the first column, then sort the columns by values in the first row.

To do this, simply move the index (0) to the row position and move the argsort result to the column position. The code below shows the demonstration and result.

a = a[:, a[0, :].argsort()]
output:
[[36 70 87 88]
 [39 46 87 88]
 [44 64 47 67]
 [67 83  9 21]
 [88 58 12 65]]
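It can help to look at the intermediate index array before applying it. The sketch below starts from the array that was already sorted on the first column (typed in by hand from the output above) and shows which columns get swapped. The names col_order and sorted_cols are only for illustration.

```python
import numpy as np

# The array after sorting on the first column, copied from the earlier output.
a = np.array([[36, 87, 70, 88],
              [39, 87, 46, 88],
              [44, 47, 64, 67],
              [67,  9, 83, 21],
              [88, 12, 58, 65]])

# Indices that would sort the first row in ascending order.
col_order = a[0, :].argsort()
print(col_order)       # [0 2 1 3]

# Use the indices in the column position to reorder whole columns.
sorted_cols = a[:, col_order]
print(sorted_cols[0])  # [36 70 87 88]
```

Only the second and third columns change places here, and each column moves as a unit, so the values within a column stay together.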

Sorting on Multiple Columns

Sometimes it’s necessary to sort on more than one column. One example of this would be data that have year, month, day, and value in separate columns. NumPy’s argsort can handle this through successive sorts that use a stable algorithm, which is selected with the kind argument.

Let’s start by creating an array with 4 columns that represent year, month, day, and a value.

b = np.array([[2020, 1, 1, 98],
              [2020, 2, 1, 99],
              [2021, 3, 6, 43],
              [2020, 2, 1, 54],
              [2021, 1, 1, 54],
              [2020, 1, 2, 74],
              [2021, 1, 3, 87],
              [2021, 3, 9, 23]])

Now we’ll use argsort to sort the rows on each column in turn, beginning with the lowest priority. That means if we want to sort on year, then month, then day, we need to sort by day first, then month, then year. For all but the first sort we need to specify kind as 'mergesort', which is a stable sort: rows with equal values keep the relative order established by the previous sort. This is demonstrated in the code block below.

b = b[b[:, 2].argsort()]  # sort by day
b = b[b[:, 1].argsort(kind='mergesort')]  # sort by month
b = b[b[:, 0].argsort(kind='mergesort')]  # sort by year
output:
[[2020    1    1   98]
 [2020    1    2   74]
 [2020    2    1   99]
 [2020    2    1   54]
 [2021    1    1   54]
 [2021    1    3   87]
 [2021    3    6   43]
 [2021    3    9   23]]

As you can see, this procedure successfully sorted based on the three columns.
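As an alternative to chaining stable sorts, NumPy also provides np.lexsort, which sorts on several keys in one call. This is a swapped-in technique, not the argsort approach described above, and the sketch below only assumes the same array b. Note that lexsort treats the last key as the primary one, so the keys are passed in the order day, month, year.

```python
import numpy as np

b = np.array([[2020, 1, 1, 98],
              [2020, 2, 1, 99],
              [2021, 3, 6, 43],
              [2020, 2, 1, 54],
              [2021, 1, 1, 54],
              [2020, 1, 2, 74],
              [2021, 1, 3, 87],
              [2021, 3, 9, 23]])

# Keys go from lowest to highest priority: day, then month, then year.
order = np.lexsort((b[:, 2], b[:, 1], b[:, 0]))
print(order)  # [0 5 1 3 4 6 2 7]

b_sorted = b[order]
print(b_sorted)
```

lexsort is a stable sort, so the two rows that share the date 2020-2-1 keep their original relative order, just as in the three-step version.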

Conclusion

NumPy is a fundamental Python package. Because its core is written in a compiled language (C), it can greatly increase the speed of Python programs compared with Python lists and tuples (where appropriate). The downside is that NumPy can have a steep learning curve. Once you understand array shaping, indexing, and sorting, you are well on your way to being a proficient NumPy user.


