tulika  goyal tulika goyal

Sort and Filter: The basics of spreadsheets

Introduction

The most basic tool used for data wrangling is a spreadsheet. Data contained in a spreadsheet is in a structured, machine-readable format and hence can quickly be sorted and filtered. In other recipes in the handbook, you’ll learn how to use the humble spreadsheet as a power tool for carrying out simple sums (finding the total, the average etc.), applying bulk processes, or pulling out different graphs and charts.

By the end of the module, you will have learned how to download data, how to import it into a spreadsheet, and how to begin cleaning and interpreting it using the ‘sort’ and ‘filter’ functions.

Spreadsheets: An Overview

Nowadays spreadsheets are widespread so a lot of people are familiar with them already. A variety of spreadsheet programs and applications exist. For example Microsoft’s Office package comes with Excel, the OpenOffice package comes with Calc and so on. Not surprisingly, Google decided to add spreadsheets to their documents package. Since it does not require you to purchase or install any additional software, we will be using Google Spreadsheets for this course.

Depending on what you want to do you might consider using different spreadsheet software. Here are some of the considerations you might make when picking your weapon of choice:

SpreadsheetGoogle SpreadsheetsOpen(Libre)OfficeMicrosoft Excel

UsageFree (as in Beer)Free (as in Freedom)Commercial

Data StorageGoogle DriveYour hard diskYour hard disk

Needs InternetYesNoNo

Installation requiredNoYesYes

CollaborationYesNoNo

Sharing resultsEasyHarderHarder

VisualizationsLarge rangeBasic chartsBasic charts

Creating a spreadsheet and uploading data

In this course we will use Google docs for our data-wrangling – it allows you to start right away without need of installing software. Since the data we are working with is already public we also don’t need to worry about the fact that it is not stored on our local drive.

Walktrough: Creating a Spreadsheet and uploading data.

Head over to Google docs.

If you are not yet logged in to Google docs, you need to login.

The first step is going to be creating a new spreadsheet.

Do this by clicking the create button to the left and select spreadsheet.

https://i1.wp.com/farm9.staticflickr.com/8448/7871786616_ef5892fe33_o_d.jpg

Doing so will create a new spreadsheet for you.

Let’s upload some data.

You will need the file we downloaded from the World Bank in the last tutorial. If you haven’t done the tutorial or lost the file: download it here .

In your spreadsheet select import from the file menu. This will open a dialog for you.

Select the file you downloaded.

Don’t forget to select insert new sheets, and click import

https://i0.wp.com/farm9.staticflickr.com/8306/7872679284_c321614681_b_d.jpg

Navigating and using the Spreadsheet

Now we loaded some data let’s deal with the basics of spreadsheets. A spreadsheet is basically a table of “cells” in which you can input data. The cells are organized in “rows” and “columns”. Typically rows are labeled by numbers, columns by letters. This also means cells can be addressed by their “column” and “row” coordinates. The cell A1 denotes the cell in the first row in the first column, A2 the one in the second row, B1 the one in the second column and so on.

To enter or change data in a cell click on it and start typing – this will change the contents of the cell. Basic navigation can be done this way or via keyboard. Find a list of keyboard shortcuts good to know below:

Key or CombinationWhat it does

TabEnd input on the current cell and jump to the cell right to the current one

EnterEnd input and jump to the next row (This will try to be intelligent, so if you’re entering multiple columns, it will jump to the first column you are entering

UpMove to the cell one row up

DownMove to the cell one row down

LeftMove to the cell left

RightMove to the cell on the Right

Ctrl+<direction>Move to the outermost cell in the direction given

Shift+<direction>Select the current cell and the cell in <direction>

Ctrl+Shift+<direction>Select all cells from the current to the outermost cell in <direction>

Ctrl+cCopy – copies the selected cells into the clipboard

Ctrl+vPaste – pastes the clipboard

Ctrl+xCut – copies the selected cells into the clipboard and removes them from their original position

Ctrl+zUndo – undoes the last change you made

Ctrl+yRedo – undoes an undo

Tip: Practice a bit, and you will find that you will become a lot faster using the keyboard than the mouse!

Locking Rows and Columns

The spreadsheet we are working on is quite large. You will notice, that while scrolling the column with the column labels will frequently disappear, leaving you quite lost. The same with the country names. To avoid this you can “lock” rows and columns so they don’t disappear.

Walkthrough: Locking the top row

Go to the Spreadsheet with our data and scroll to the top.

On the top left, where the column and row labels are you’ll see a small striped area.

https://i2.wp.com/farm9.staticflickr.com/8322/8070104022_e233a65687_o_d.png

Hover over the striped bar on top of box showing row “1”. A hand shaped cursor should appear, click and drag it down one row.

Your result should look like this:

https://i1.wp.com/farm9.staticflickr.com/8176/8070115059_d960b3d09e_o_d.png

Try scrolling – notice how the top row remains fixed?

Sorting Data

The first thing to do when looking at a new dataset is to orient yourself. This involves at looking at maximum/minimum values and sorting the data so it makes sense. Let’s look at the columns. We have data about the GDP, healthcare expenditure and life expectancy. Now let’s explore the range of data by simply sorting.

Walkthrough: Sorting a dataset

Select the whole sheet you want to sort. Do this by clicking on the right upper grey field, between the row and column names.

https://i2.wp.com/farm9.staticflickr.com/8322/8070104022_e233a65687_o_d.png

Select “Sort Range…” from the “Data” menu – this will open an additional Selection

Check the “Data has header row” checkbox

https://i2.wp.com/farm9.staticflickr.com/8437/7872826062_017d1bfe19_o_d.jpg

Select the column you want to sort by in the dropdown menu

Try to sort by GDP – Which country has the lowest?

Try again with different values, can you sort ascending and descending?

Tip: Be careful! A common mistake is to forget to select all the data. If you sort without selecting all the data, the rows will no longer match up.

A version of this recipe can also be found in the Handbook.

Filtering Data

The next thing commonly done with datasets is to filter out the values you don’t want to see. Did you notice that some “Country Names” are actually not countries? You’ll find things like “World”, “North America” and “Arab World”. Let’s filter them out.

Walkthrough: Filtering Data

Select the whole table.

Select “Filter” from the “Data” menu.

You now should see triangles next to the column names in the first row.

Click on the triangle next to country name.

you should see a long list of country names in the box.

https://i2.wp.com/farm9.staticflickr.com/8316/8070573150_2cf29b914f_o_d.png

Find those that are not a country and click on them (the green check mark will disappear).

Now you have successfully filtered your dataset.

Go ahead and play with it – the data will not be deleted, it’s just not displayed.

tulika  goyal

tulika goyal Creator

B-tech 2nd year student of polymer science.

Suggested Creators

tulika  goyal