It's no big secret that I have struggled mightily to find the right fit or focus for my community-taught developer journey. Front-end work has its fun points, but I am ultimately not an amazing designer, and learning JavaScript felt a wee bit like low-level torture. I am more than content to lean into my favorite no-code tools when I need to put a pretty face on something I've built. Python seems to suit me much better, and I am still working away at building my Pythonista ninja skills. For me, one of the more exciting aspects of working with Python is how widely the language is used in the data science space. I started to dig a bit into the data scientist career path and quickly became rather discouraged. The door seems narrow for someone self-taught compared with those holding degrees in mathematics and/or computer science, and the barrier to entry is higher than I can overcome in the immediate future. So what is a wannabe data nerd to do?
Enter the Data Analyst role! Once I learned more about what a data analyst does, I realized this is exactly the type of entry point I'm seeking. I can leverage and build upon my former experience with Excel, keep growing my Python skills, and bring in new tools like SQL to round out a starter stack. This feels doable and actually exciting (for me at least!). With all of that in mind, what does the road ahead look like?
Data Analyst Road Map
Last night I started the Data Analyst Career Path through Codecademy Pro. They don't pay me, but I can't say enough good things about their Pro membership. I signed up for the annual membership at $20/month, and it has allowed me to venture into different areas of tech and take them for a test drive without feeling like I wasted money on a course that ends up not being for me. My Pro membership lets me take as many courses as I like, so I can move on to something new if I start a topic and don't find it interesting. OK, that's the end of my Codecademy Pro pitch; now let's talk about what I'm learning!
What does data analytics mean, anyway? A data analyst takes data and converts it into a form that can be used to help make decisions. A design team may not be able to look at a spreadsheet full of rows and columns of user data and readily find patterns to inform their design strategy. A data analyst can gather, clean, and present that same data in a way that tells the design team something like "50% of visitors never make it past the site's home page." Now the design team has information they can work with: they can analyze the home page and try to determine how to change its UX and UI so users click through to subsequent pages.
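To make the home-page example a little more concrete, here is a minimal Python sketch of how a figure like that could be computed. It assumes each visitor's session is recorded as a list of page names in the order viewed; the page names and sample sessions are hypothetical, not real data.

```python
# Hypothetical session data: each inner list holds the pages one visitor viewed, in order.
sessions = [
    ["home"],
    ["home", "product"],
    ["home"],
    ["home", "product", "cart", "checkout"],
]

def share_stuck_on_home(sessions):
    """Return the fraction of visitors whose session never left the home page."""
    if not sessions:
        return 0.0
    # A visitor "never made it past" home if every page they viewed was "home".
    stuck = sum(1 for pages in sessions if all(page == "home" for page in pages))
    return stuck / len(sessions)

print(f"{share_stuck_on_home(sessions):.0%} of visitors never left the home page")
```

With these four made-up sessions, two visitors never leave the home page, so the sketch reports 50%.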
How exactly does a data analyst perform this sorcery of wrangling piles of data into tidy, digestible, and, most importantly, actionable pieces of information? Sources disagree on exactly how many steps are involved and what they should be called, but most agree that the data analytics process follows a series of steps that may be non-linear and may need several iterations before the process is complete. Here is my current (admittedly limited) understanding of those steps:
1. Form the question you want to answer
It sounds fairly simple, but asking the right question is key to making sure your next steps go in the proper direction, and it is not always easy to do. In the design team example above, the team may have started with a broad question like "How can we change the site design to increase conversion from visits to sales?" While this question addresses the end goal, it is too broad and does not include anything that can be specifically measured. If we think about the path from site visit to sale, we can see there is a series of clicks and pages to interact with along the way. We can think of these interactions as stepping stones a visitor crosses to complete the conversion journey. If we can identify the stepping stone where the most visitors leave the path (aka drop-off), we can focus on that step and how to improve it. A good question in this case might be "Over the past 60 days, which step/interaction in the path from site visit to conversion saw the highest rate of drop-off?"
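Once the question is that specific, answering it becomes simple arithmetic. Here is a minimal sketch, assuming we already know how many visitors reached each step of a hypothetical four-step path (the step names and counts are made up for illustration). Drop-off at a step is the share of visitors who reached that step but not the next one: 1 minus (visitors at the next step / visitors at this step).

```python
# Hypothetical counts of visitors reaching each step over a 60-day window.
funnel = [
    ("home", 10_000),
    ("product", 4_000),
    ("cart", 1_200),
    ("checkout", 900),
]

def drop_off_rates(funnel):
    """Return (from_step, to_step, rate) for each consecutive pair of steps."""
    rates = []
    for (step, reached), (next_step, next_reached) in zip(funnel, funnel[1:]):
        # Share of visitors at `step` who did not continue to `next_step`.
        rate = 1 - next_reached / reached if reached else 0.0
        rates.append((step, next_step, rate))
    return rates

rates = drop_off_rates(funnel)
for step, next_step, rate in rates:
    print(f"{step} -> {next_step}: {rate:.0%} drop-off")

# The answer to our question is the transition with the largest drop-off rate.
worst = max(rates, key=lambda r: r[2])
print(f"Highest drop-off: {worst[0]} -> {worst[1]}")
```

With these invented numbers the biggest leak is between the product page and the cart (70%), even though more visitors in absolute terms are lost right after the home page.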
2. Determine what data you need and collect it
Depending on the question you need to answer, you may be able to gather all the data you need from internal (aka primary) sources. In our conversion drop-off case, we would want to know how many visitors each page sees. Our question also requires data on how many visitors to each page moved on to the next page in the path to conversion. Other relevant data could include how many sales were completed during the 60-day period and revenue figures such as total sales or average revenue per conversion. Some use cases benefit from external (aka secondary) data sources such as public databases, focus group results, APIs, and search engine trends.
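Here is a minimal pandas sketch of turning raw internal data into the numbers this question needs. It assumes a hypothetical page-view log with `visitor_id`, `page`, and `viewed_at` columns and a hypothetical orders table with `amount` and `ordered_at` columns; real analytics exports will look different.

```python
import pandas as pd

# Hypothetical page-view log exported from an internal (primary) source.
page_views = pd.DataFrame({
    "visitor_id": [1, 1, 2, 2, 3, 4],
    "page": ["home", "product", "home", "product", "home", "home"],
    "viewed_at": pd.to_datetime([
        "2024-05-10", "2024-05-10", "2024-05-20",
        "2024-05-20", "2024-06-10", "2024-01-15",  # last view falls outside the window
    ]),
})
# Hypothetical completed orders from the same source.
orders = pd.DataFrame({
    "visitor_id": [1, 2],
    "amount": [50.0, 30.0],
    "ordered_at": pd.to_datetime(["2024-05-10", "2024-05-21"]),
})

# The 60-day window ending on a chosen (hypothetical) report date.
end = pd.Timestamp("2024-06-30")
start = end - pd.Timedelta(days=60)

def in_window(df, column):
    """Keep rows whose timestamp falls in (start, end]."""
    return df[(df[column] > start) & (df[column] <= end)]

window = in_window(page_views, "viewed_at")
sales = in_window(orders, "ordered_at")

# Count distinct visitors per page, so repeat views by one visitor count once.
visitors_per_page = window.groupby("page")["visitor_id"].nunique()
total_revenue = sales["amount"].sum()
avg_per_conversion = sales["amount"].mean()

print(visitors_per_page)
print(f"Sales: {len(sales)}, total: {total_revenue:.2f}, average: {avg_per_conversion:.2f}")
```

Counting distinct visitors rather than raw rows is a deliberate choice here: one visitor refreshing a page five times should not look like five visitors.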
3. Clean, organize, and prepare the data for analysis
This is a part of the process I don't know much about yet. Data needs cleaning to filter out bad data and ensure your analysis is built on the most accurate and complete information possible. During this process, duplicate and incomplete entries are removed and outliers are addressed. Much of this data wrangling can be done in Python using libraries such as pandas and NumPy, and Python can also automate significant portions of the process to make cleaning more efficient. I am really looking forward to learning more about automating some of the more tedious parts of data cleaning. Saving valuable time for analysis is a big win!
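Here is a minimal pandas sketch of the three cleaning tasks named above, run on a hypothetical order table. `drop_duplicates` removes repeated rows, `dropna` removes incomplete ones, and outliers are flagged (not deleted) using the common 1.5 × IQR rule. The column names and the choice of the IQR rule are my assumptions; the right outlier treatment depends on the data and the question.

```python
import pandas as pd

# Hypothetical raw order data with a duplicate, a missing value, and an outlier.
raw = pd.DataFrame({
    "order_id": [101, 102, 102, 103, 104, 105, 106],
    "amount":   [25.0, 30.0, 30.0, None, 28.0, 27.0, 900.0],
})

# 1. Remove exact duplicate rows (order 102 was recorded twice).
clean = raw.drop_duplicates()

# 2. Remove incomplete rows (order 103 is missing its amount).
clean = clean.dropna(subset=["amount"])

# 3. Flag outliers with the 1.5 * IQR rule instead of silently dropping them.
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = clean.assign(is_outlier=~clean["amount"].between(low, high))

print(clean)
```

Flagging instead of deleting keeps the decision visible: a 900.00 order might be a typo, or it might be your best customer.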
4. Investigate, explore, analyze, and interpret
I know even less about the analysis phase than the cleaning phase, but there are some things I can share. Different data mining methods can help identify patterns in your data that you might not recognize on first review. I'll have more to say about this as I get deeper into my learning. Once patterns are identified, you can draw conclusions about how the pieces of information relate and how they contribute to answering your question.
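Real data mining methods go well beyond this, but one very simple way to look for a pattern is to split a metric by segment. Here is a minimal pandas sketch, assuming a hypothetical per-visitor table that records device type and whether the visitor continued from the product page to the cart; grouping shows whether drop-off differs by device.

```python
import pandas as pd

# Hypothetical per-visitor data: device type and whether they reached the cart.
visits = pd.DataFrame({
    "device": ["mobile", "mobile", "mobile", "mobile",
               "desktop", "desktop", "desktop", "desktop"],
    "continued_to_cart": [False, False, False, True,
                          True, True, False, True],
})

# Drop-off rate per segment = share of visitors who did NOT continue.
drop_off_by_device = 1 - visits.groupby("device")["continued_to_cart"].mean()
print(drop_off_by_device.sort_values(ascending=False))
```

A gap like this (75% on mobile vs. 25% on desktop in the made-up data) would be a pattern worth investigating, not a conclusion on its own; with only eight rows it could easily be noise.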
5. Share your findings
Now that you have your pieces of information, it's time to share them with your team. Odds are good your audience won't be a room full of data scientists, so it's important to present your conclusions in ways that make them accessible to all stakeholders. These presentations can take the form of visuals, such as graphs and charts, or storytelling, where you lay out your findings in sentences or a written report. Most cases will use a combination of these methods. There are even Python libraries, such as matplotlib, to help with data visualization.
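Here is a minimal sketch using matplotlib, a widely used third-party Python plotting library, to turn hypothetical drop-off rates into a bar chart a non-technical audience can read at a glance. The numbers, labels, and output filename are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. in a script or on a server
import matplotlib.pyplot as plt

# Hypothetical drop-off rates between consecutive steps of the path to purchase.
transitions = ["home -> product", "product -> cart", "cart -> checkout"]
drop_off = [0.60, 0.70, 0.25]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(transitions, drop_off, color="steelblue")
ax.set_ylabel("Share of visitors lost")
ax.set_ylim(0, 1)
ax.set_title("Where visitors leave the path to purchase")
# Label each bar with its percentage so readers don't have to decode the axis.
ax.bar_label(bars, labels=[f"{d:.0%}" for d in drop_off])
fig.tight_layout()
fig.savefig("funnel_drop_off.png")
```

Putting the percentages directly on the bars is the storytelling part: the one number that matters is visible without any chart-reading skills.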
This is the way
When I look back on the projects I most enjoyed over the course of my admittedly varied career, they mostly share a common thread: data-driven decision-making. I never had a formal data science education, but it instinctively made sense to me to try to prove a possible solution was the best one. What better way to prove your hypothesis than with data, right? I knew nothing about coding or machine learning, but I understood Excel and how to use it to build a spreadsheet of data. Because I had developed the dataset myself, I had some idea of how to interpret it, so the analysis was doable even without the "real tools" a data analyst would use. I was able to cobble together productivity data to demonstrate that an email team I managed was just as productive (or even more so) when working part time from home. The data supported my hypothesis that these employees would be more productive with the added flexibility of remote work days, and it allowed us to take the experiment from pilot to policy. I found the whole process challenging, educational, and very satisfying once the problem was solved.
It's been over ten years since the scenario I describe above. Data is a whole different beast now in terms of how much is available, how readily it can be accessed, and the tools available to take it through the data analytics process. As a true believer in data-driven decision-making in business, I'm excited to learn the tools of the trade and see how I can shape the industry. Next up in my Data Analyst Career Path: Python Fundamentals. This section seems to fold in items I already completed in the other Python course, since some items are already showing completion percentages. So nice not to lose my progress! I also learned a tiny bit about funnel analysis, which is the process of identifying where users drop off in the journey to conversion (similar to the example used in this article). Funnel analysis is a fascinating concept to me, and I hope to find time to take a deeper dive into it soon.
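For my own future reference, here is a minimal sketch of the core idea behind funnel analysis, assuming a hypothetical event log of (visitor, step) pairs and a fixed step order. Unlike a plain count of visitors per page, a visitor is counted at a step only if they also reached every earlier step. That is one common way to define a funnel; real analytics tools differ on details such as time windows and whether event order must be respected.

```python
# Hypothetical ordered funnel and raw events as (visitor_id, step) pairs.
STEPS = ["home", "product", "cart", "checkout"]
events = [
    (1, "home"), (1, "product"), (1, "cart"), (1, "checkout"),
    (2, "home"), (2, "product"),
    (3, "home"),
    (4, "cart"),  # reached the cart without the earlier steps (e.g. via a direct link)
]

def funnel_counts(events, steps):
    """Count visitors who reached each step AND every step before it.

    Simplification: the order of events within a visit is ignored; only
    which steps each visitor reached at some point matters.
    """
    reached = {}
    for visitor, step in events:
        reached.setdefault(visitor, set()).add(step)
    counts = []
    for i in range(len(steps)):
        required = set(steps[: i + 1])
        counts.append(sum(1 for seen in reached.values() if required <= seen))
    return list(zip(steps, counts))

for step, count in funnel_counts(events, STEPS):
    print(f"{step:<9} {count}")
```

Visitor 4 is deliberately excluded from the cart count: they never went through the earlier steps, so they don't belong in this particular funnel.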