A Data Analysis Journey: Microsoft Excel, Google Sheets, Jupyter Notebooks Part 1
Recently, my wife and I decided to learn more about Excel. I wanted to learn Excel as I mostly use Google Sheets for spreadsheet problems, and my wife has been using Excel at work, but wants to learn more data analysis for her job.
This seemed like the right opportunity to learn together and help each other out as we learned. We chose the book, Microsoft Excel 2019 Data Analysis and Business Modeling (6th Edition) (Business Skills) 6th Edition by Wayne Winston. This booked appeared to have the right balance of what we were looking for. We wanted something that provided hands-on exercises and had a data analysis focus.
In my job, I help build online learning content at Udacity, so this was going to be a learning opportunity for me on learning from a book with hands-on exercises that go along with it. We quickly learned after the first chapter that learning from a book with exercises was going to be more challenging then we had hoped. There simply was not a lot of support when you needed it, which is unfortunate for book of this length.
As we started our journey I also thought about how many of the things I would be learning in to do in Excel I already knew how to do in both Google Sheets and in Jupyter Notebooks. This led me to think about what the different strengths and weaknesses of each are. There are so many people in the world that never get exposed to other tools and therefore don’t know what’s possible. I decided I would right a series of posts about our experience with this book and how others might want to try new tools as well. I hope that these posts help in learning new ways to work with data and allow others to explore.
I have been using Google Sheets as my primary spreadsheet tool for about a decade now. Over that time, I have become much more comfortable with Sheets and more capable in using some of it’s more advanced features. Here are a coupe of important things to note about Google Sheets
- It’s free to use with a Google account
- It’s a fantastic collaborative spreadsheet tool
- Google Sheets can do many very advanced spreadsheet and database functions
- It’s easy to get started with.
- Google Sheets is a cloud based tool, but you do some things offline.
I stopped using Microsoft tools around 2010 and as I learned more and more about how amazing data analysis and spreadsheet could be, it was not in Excel. However, I do believe that in general spreadsheet skills should be mostly transferable between tools and I have been interested in how to use Excel more effectively for a while now. Here are a few things to note about Excel.
- There are two versions the cloud Office 365 version and the desktop version.
- The Office 365 version has generally less features overall, but a few tricks up it’s sleeve. It’s also semi collaborative.
- Excel is very powerful, but fairly complicated to get started with.
- While there is a free version of Excel at office.com with a Microsoft account, you will likely need the paid version to what you need to.
- I purchased the personal version for $69 per year to get the office 365 and desktop versions
- Excel is used by more people
I was introduced to Jupyter Notebooks and Pandas about 3 years ago and I have since taken a few courses that use them as well as played around and completed some data analysis using Python, Pandas and Jupyter Notebooks. Jupyter Notebooks more code based and most often use the Python or R programming languages to complete their analysis along with other libraries. You can think of a library as a program someone else created to more easily complete tasks in code. They are kind of like a macro in Excel or an add-on in Google Sheets. One of those great libraries is Pandas. There are some advantages and some disadvantages to using them.
- All of these are open source software which means that they are free to use
- You can install and use Jupyter Notebooks ocally or use them in cloud based environments like Google’s Colabs. In fact, if you have never used a Notebook before Colab is a fantastic way to try it out without having to install anything on your computer.
- Jupyter Notebooks rely on other programming languages like Python or R which means you also would need to learn at least a little about how these languages works.
- Notebooks are easily shareable and can be saved into many different formats
- Notebooks are often used for machine learning and AI meaning they can be used for basic data analysis and much more.
- They are less beginner friendly, but very powerful.
- There are lots of great tutorials and guides out there to help you learn how to use Jupyter, Python and Pandas.
- You can get pretty much everything you need by installing a program called Anaconda. But again if you are new, it would be best to try something like Google Colab where you won’t need to install anything.
Wrapping Up Part 1
The tools above are certainly not exhaustive and there are many other pluses and minuses about each of the tools listed here. However, these are the tools that I’m most familiar with.
Throughout my career as an Educator and now as a Curriculum Manager, I have learned that many people do not have even basic skills when it comes to analyzing data or even just working in a spreadsheet. This series is about showing people a few tools that are available and what you can do with them.
This series is not meant to teach people how to do data analysis in these tools, but rather to pickup a few tricks here and there and to get new people interested in learning more about data analysis as data has become critical in today’s world. It is also meant to help compare these tools to see which tools might be best for different types of tasks in data analysis as well. I hope people find this series helpful and/or interesting.