Sort NumPy Arrays By Columns or Rows
NumPy is a fundamental module used in many Python programs and analyses because it conducts numerical computations very efficiently. However, for those new to NumPy, it can be difficult to grasp at first. Specifically, understanding array indexing and sorting quickly becomes complex. Fortunately, NumPy has some built-in functions that make performing basic sorting operations quite simple.
NumPy arrays can be sorted by a single column or row, or by multiple columns or rows, using the argsort() function. The argsort function returns an array of indices that would sort the values in an array in ascending order. Because the kind argument of argsort lets you request a stable sort, repeated calls can be combined to sort arrays on multiple rows or columns. This article will go through sorting single columns and rows and sorting multiple columns using simple examples with code.
Create a NumPy Array of Random Integers
To start, import the numpy module under its conventional alias, np.
import numpy as np
Now create an array of random integers. Here, I’m creating an array with 5 rows and 4 columns. The values in the array are random, so you will get different values than those shown here even if you use the same code. If you’re not familiar with the basics of creating numpy arrays and numpy array shapes, check out this article.
a = np.random.randint(100, size=(5, 4))
output:
[[44 47 64 67]
[67 9 83 21]
[36 87 70 88]
[88 12 58 65]
[39 87 46 88]]
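Because the values are random, your array will differ from the one shown above on every run. If you want repeatable values, one option is a seeded generator. The sketch below assumes NumPy 1.17 or later, where np.random.default_rng is available. The seed value 42 is an arbitrary choice for illustration and is not part of the original example.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same "random" array.
# The seed (42) is an arbitrary value chosen for illustration.
rng = np.random.default_rng(42)

# Integers in the range [0, 100), arranged in 5 rows and 4 columns.
a = rng.integers(0, 100, size=(5, 4))
print(a.shape)  # (5, 4)
```

Rerunning this code with the same seed should give the same array each time, which makes it easier to compare your results with someone else's.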
Sort NumPy Array by Column
To sort an array by a specific column, we’re going to use the numpy argsort() function. argsort() returns the array indices that would result in a sorted array. Let’s take a quick look at what argsort returns and then at how to use argsort to get a sorted array.
To start, call argsort on the first column of the array we created. You should get a result like the one below, where the argsort result gives the indices of the values from smallest to largest.
a[:, 0].argsort()
output:
[2 4 0 1 3]
The output shows that the smallest value in the first column is at position 2 (the third value). That is correct: the lowest value in the first column of the array is 36, which is the third value (position 2) in the first column.
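To check this by hand, the first column from the example array can be typed in directly. This is a small sketch; the variable names first_col and order are only illustrative.

```python
import numpy as np

# First column of the example array shown earlier in the article.
first_col = np.array([44, 67, 36, 88, 39])

# argsort returns the positions of the values from smallest to largest.
order = first_col.argsort()
print(order.tolist())             # [2, 4, 0, 1, 3]

# Indexing with those positions yields the sorted values.
print(first_col[order].tolist())  # [36, 39, 44, 67, 88]
```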
To sort the array, we now need to use the indices from the argsort result to reorder the rows in the array. This is a simple procedure that can be done in one line of code. We’ll simply use the argsort result as the row indices and assign the resulting array back to a, as follows.
a = a[a[:, 0].argsort()]
output:
[[36 87 70 88]
[39 87 46 88]
[44 47 64 67]
[67 9 83 21]
[88 12 58 65]]
As you can see, the rows are now ordered from least to greatest according to the first column. To sort on a different column, simply change the column index. For example, we can sort on the second column (index 1) with a[a[:, 1].argsort()].
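The pattern can be wrapped in a small helper. The function below is a hypothetical convenience (sort_by_column and its descending flag are not part of NumPy), sketched here with the example array typed in by hand so the result is reproducible.

```python
import numpy as np

# The example array from this article, entered literally.
a = np.array([[44, 47, 64, 67],
              [67,  9, 83, 21],
              [36, 87, 70, 88],
              [88, 12, 58, 65],
              [39, 87, 46, 88]])

def sort_by_column(arr, col, descending=False):
    """Return a copy of arr with its rows reordered by the values in column col."""
    order = arr[:, col].argsort()
    if descending:
        # Reversing the ascending order gives largest-to-smallest.
        order = order[::-1]
    return arr[order]

by_first = sort_by_column(a, 0)
by_second_desc = sort_by_column(a, 1, descending=True)
print(by_first[:, 0].tolist())        # [36, 39, 44, 67, 88]
print(by_second_desc[:, 1].tolist())  # [87, 87, 47, 12, 9]
```

Indexing with an array of row positions returns a new array, so the original a is left unchanged unless you assign the result back to it.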
Sort NumPy Array by Row
A NumPy array can also be sorted by row values. This is accomplished in the same way as sorting with columns. We just need to change the indexing positions. For example, let’s take the array we created that’s sorted on the first column, then sort the columns by values in the first row.
To do this, simply move the index (0) to the row position and move the argsort result to the column position. The code below shows the operation and its result.
a = a[:, a[0, :].argsort()]
output:
[[36 70 87 88]
[39 46 87 88]
[44 64 47 67]
[67 83 9 21]
[88 58 12 65]]
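It can help to contrast this with np.sort, which sorts each row independently instead of moving whole columns together. The sketch below uses the column-sorted array from the previous section, typed in literally. The names col_order, by_row, and independent are only illustrative.

```python
import numpy as np

# The array after it was sorted on its first column.
a = np.array([[36, 87, 70, 88],
              [39, 87, 46, 88],
              [44, 47, 64, 67],
              [67,  9, 83, 21],
              [88, 12, 58, 65]])

# Reorder whole columns by the values in the first row.
col_order = a[0, :].argsort()
by_row = a[:, col_order]
print(col_order.tolist())  # [0, 2, 1, 3]
print(by_row[3].tolist())  # [67, 83, 9, 21]

# By contrast, np.sort with axis=1 sorts every row on its own,
# so values no longer stay together in their original columns.
independent = np.sort(a, axis=1)
print(independent[3].tolist())  # [9, 21, 67, 83]
```

Use the argsort approach when each column is a record that must stay intact, and np.sort when every row should simply be put in order by itself.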
Sorting on Multiple Columns
Sometimes it’s necessary to sort on more than one column. One example of this would be data that have year, month, day, and value in separate columns. NumPy’s argsort can handle sorting on multiple columns when it is applied once per column with a stable sorting algorithm, which is selected through the kind argument.
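To see what a stable sort means in practice, consider a tiny array with tied values. The data below is made up for illustration. The sketch uses kind='stable', which, as far as I know, NumPy treats the same as 'mergesort' in that both request a stable algorithm.

```python
import numpy as np

# Hypothetical data: column 0 contains ties, column 1 is already ascending.
data = np.array([[2, 10],
                 [1, 20],
                 [2, 30],
                 [1, 40]])

# A stable sort keeps tied rows in their existing relative order.
order = data[:, 0].argsort(kind='stable')
print(order.tolist())           # [1, 3, 0, 2]
print(data[order, 1].tolist())  # [20, 40, 10, 30]
```

Within each group of equal keys (the two 1s and the two 2s), the second column is still ascending. That preserved order is what lets one sort build on an earlier one.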
Let’s start by creating an array with 4 columns that represent year, month, day, and a value.
b = np.array([[2020, 1, 1, 98], [2020, 2, 1, 99], [2021, 3, 6, 43], [2020, 2, 1, 54], [2021, 1, 1, 54], [2020, 1, 2, 74], [2021, 1, 3, 87], [2021, 3, 9, 23]])
Now we’ll use argsort to sort on each column in turn, beginning with the lowest priority. That means if we want to sort on year, then month, then day, we need to sort by day first, then month, then year. For all but the first sort we need to specify kind as 'mergesort', a stable sort that keeps rows with equal values in the order produced by the previous sort. This is demonstrated in the code block below.
b = b[b[:, 2].argsort()] # sort by day
b = b[b[:, 1].argsort(kind='mergesort')] # sort by month
b = b[b[:, 0].argsort(kind='mergesort')] # sort by year
output:
[[2020 1 1 98]
[2020 1 2 74]
[2020 2 1 99]
[2020 2 1 54]
[2021 1 1 54]
[2021 1 3 87]
[2021 3 6 43]
[2021 3 9 23]]
As you can see, this procedure successfully sorted based on the three columns.
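As an alternative to chaining stable sorts, NumPy also provides np.lexsort, which sorts on several keys in one call. This is a different technique from the one used above, sketched here on the same data. Note that lexsort expects the keys from lowest to highest priority, so the last key passed is the primary one.

```python
import numpy as np

b = np.array([[2020, 1, 1, 98], [2020, 2, 1, 99], [2021, 3, 6, 43],
              [2020, 2, 1, 54], [2021, 1, 1, 54], [2020, 1, 2, 74],
              [2021, 1, 3, 87], [2021, 3, 9, 23]])

# Keys go from lowest to highest priority: day, then month, then year.
order = np.lexsort((b[:, 2], b[:, 1], b[:, 0]))
b_sorted = b[order]

# Show only the year, month, and day columns of the sorted result.
print(b_sorted[:, :3].tolist())
```

The key order mirrors the day, month, year sequence used in the multi-pass approach, which makes the two versions easy to compare.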
Conclusion
NumPy is a fundamental Python package. Because it is built in a compiled language, it can greatly increase the speed of Python programs compared with Python lists and tuples (where appropriate). The downside is that NumPy can have a steep learning curve. Once you understand array shaping, indexing, and sorting, you are well on your way to being a proficient NumPy user.