A Month of Visualizations

In February I decided to make one data visualization a day, both to practice with the tools available and to learn which plots work well in different situations. I learned a fair bit from the exercise, but I also had some thoughts about the process itself, so I decided to write it all up.

The month was divided between ggplot2 and Tableau. During my job search I found that most Data Analyst positions required Tableau, which was an area I was lacking in, and ggplot2 is probably the best R package for visualization. Since I was already familiar with R, I decided to spend the first half of the month on ggplot2 and other R packages. I spent a good deal of time learning the functions and formatting requirements for different plots. Then I spent the latter half of the month playing with Tableau, and in my opinion I barely scratched the surface there. Looking at dashboards on /r/dataisbeautiful, I can see that Tableau can create some powerful dashboards, but a lot of that power is hidden behind layers of “easy to use” interfaces, and I often found myself not knowing how to describe what I wanted to do well enough to search for a solution. This, together with the lack of documentation in a central location, made Tableau feel mostly like a powerful pivot table that could make some pretty plots.


Positives and Pitfalls

The positives of this experience were great, however. I now feel much more comfortable with ggplot2 and other R graphing packages. I learned how to make interactive plots in R, as well as better-looking maps using OpenStreetMap in conjunction with R. I loved the granularity of control I had using ggplot2. I also feel much more comfortable in Tableau than I did before, and I now know where I need to focus my practice. I think my attention to design and my ability to communicate the ideas in the data have vastly improved.

There were, however, some negatives to this experience, or perhaps not negatives so much as things I didn’t realize before starting. The main one is that it’s a bit weird to do a visualization about a random data set every day. I felt like I never really got to know enough about the data before I started designing a viz. If I were to do something like this again, I would rather spend a week or more on a single data set, perhaps with a weekly write up. Being familiar with the data set as a whole is important when deciding how to make the plot. Domain knowledge is critical when making a visualization; otherwise, what are you communicating? Understanding more about the data you have is also what lets you make good decisions about it.

For example, in my UFO plot I had to consolidate the shape variable a fair bit because there were almost 30 shapes, many of which overlapped. I reduced these to 9 shape categories, which could probably be reduced further based on the descriptions. This understanding of the data set is important when trying to make something that communicates an idea, tells a story, or even starts a conversation.
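A consolidation like the one described can be sketched as a lookup table from raw labels to broader groups. This is a minimal Python illustration, not the code used for the UFO plot: the shape labels, the group names, and the `collapse_shape` helper are all hypothetical, and the real data set had almost 30 shapes reduced to 9 categories.

```python
from collections import Counter

# Hypothetical mapping from raw shape labels to broader categories.
# The actual labels and groupings in the UFO data may differ.
SHAPE_GROUPS = {
    "circle": "round", "disk": "round", "sphere": "round", "oval": "round",
    "light": "light", "flash": "light", "fireball": "light",
    "triangle": "angular", "delta": "angular", "chevron": "angular",
}

def collapse_shape(raw, default="other"):
    """Normalize a raw label and map it to its category (or `default`)."""
    key = (raw or "").strip().lower()
    return SHAPE_GROUPS.get(key, default)

# Made-up sample values, including messy casing, whitespace, and a missing value.
sightings = ["Circle", "disk", "Fireball", "triangle", "cigar", None, " light "]
counts = Counter(collapse_shape(s) for s in sightings)
print(counts)
```

Keeping the mapping in one place makes it easy to revisit the groupings after reading more of the descriptions, and the "other" count shows how much of the data the mapping does not yet cover.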

However, in the rush to get a viz out for the day, and with other competing priorities (applying for grad schools, my actual job, personal life, and more), the domain knowledge was sort of thrown to the side. My decision to try a new data set basically every day meant I either had to spend a lot of time getting acquainted with it, or I would do a very surface-level look at the data. The quantity meant my quality went way down. I think this is a pitfall to avoid in future projects. I vastly underestimated the amount of time making a visualization would take.

Description of Process


I would describe my process for making a viz this month as a five-step process.
1.     Choose something to visualize (Pick a problem or chart type to practice)
2.     Download or scrape the data (Where is it, how do I get it?)
3.     Familiarize myself with the data and do cleaning. (What do I have, what can it say?)
4.     Learn how to make the style of plot I want. (How do I make this?)
5.     Design the chart and fiddle with settings (How do I make this appealing?)
This let me make a visualization fairly quickly every day. I had planned out a month of 28 different visualizations I wanted to do. I planned some plots around holidays, and tried to do maps on Mondays. This was only a rough outline, however, and some changes ended up happening. There were some visualizations I didn’t end up doing for various reasons, such as lack of time to put the data together, lack of visualization ideas for the data, and other roadblocks. You can see a list of those and some excuses at the end.

Stats on the Process

As I mentioned, I sometimes had to abandon a visualization. This happened at various stages of my process, and in total I spent just over 8 hours on work that produced no results to share. I did, however, learn things in those 8 hours and exercise my problem-solving skills.
In total, I spent about 51.83 hours this month working on this project. The following table shows how much time I spent on each step. (I counted step 1 as 0, as I did all of that last month in about 15 minutes.)
Step              Time (in Hours)    % of Time
Data Gathering    12.75              24.6%
Data Cleaning     11.17              21.54%
Researching       13.92              26.85%
Designing         14                 27.01%
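The percentage column is each step's hours divided by the total. A short Python sketch of that arithmetic, using the hours from the table, is below; it is an illustration rather than the original calculation, and because the table's hours are rounded, the last digit of a share may differ slightly from the figures above.

```python
# Hours per step, copied from the table above.
hours = {
    "Data Gathering": 12.75,
    "Data Cleaning": 11.17,
    "Researching": 13.92,
    "Designing": 14.0,
}

def time_shares(hours_by_step):
    """Return each step's share of the total time, as a percentage."""
    total = sum(hours_by_step.values())
    return {step: 100 * h / total for step, h in hours_by_step.items()}

shares = time_shares(hours)
for step, pct in shares.items():
    print(f"{step:<15} {hours[step]:>6.2f} h  {pct:5.1f}%")
```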

I was surprised that I spent the most time designing, as that always felt like the tail end of my time on a viz. I knew that researching how to make each plot took a good amount of time. I also spent more time cleaning data sets than I expected, considering that for 18 of the 28 visualizations I downloaded the data set from someplace like Kaggle.


I wondered if there was a difference between the time spent on R and on Tableau. There was, but it is important to remember that I did Tableau in the second half of the month and R in the first, so the comparison doesn’t account for me being worn down or for other life factors. Tableau was usually faster overall, but I also did a lot less data scraping and researching for the Tableau projects, which isn’t because of the program.






I wanted to look at how the process proportions changed daily, so I made this area plot. It was useful, but not as useful as the bar plot I also made, which I think fits better since this is a discrete number of projects.




I spent about 1.8 hours a day working on visualizations. I wondered how that compared to the rest of my time, so I made daily waffle charts and arranged them into a calendar.
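The data behind one day's waffle chart is just a grid of cells with the worked portion filled in. Here is a rough text-only sketch in Python, assuming half-hour cells in a 6 by 8 grid; the cell size, the layout, and the `waffle_grid` helper are my assumptions, not how the original charts were built.

```python
def waffle_grid(hours_spent, cells_per_hour=2, rows=6):
    """Return a text waffle for one 24-hour day.

    '#' marks time spent on the project, '.' marks the rest of the day.
    The defaults (half-hour cells, 6 rows) are illustrative choices.
    """
    total_cells = 24 * cells_per_hour
    filled = min(total_cells, round(hours_spent * cells_per_hour))
    cells = ["#"] * filled + ["."] * (total_cells - filled)
    width = total_cells // rows
    return ["".join(cells[r * width:(r + 1) * width]) for r in range(rows)]

# One day at the monthly average of about 1.8 hours.
for row in waffle_grid(1.8):
    print(row)
```

Seen this way, the project takes up only a handful of cells out of a whole day, which is the comparison the calendar of waffle charts makes for every day of the month.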




Takeaways


Data visualization is a long and time-consuming process. Instead of doing one a day, I would recommend one a week, or one every other week, if you want a good quantity along with some quality analysis. Without some analysis you can’t get much depth out of your viz, and to have a good viz, you need some knowledge of the data set.
However, this rapid-fire method did help me improve my familiarity with various data visualization software, in particular ggplot2. It also exposed me to more kinds of visualizations and helped me learn how to format data to make producing them easier. In that respect, I think this project was a success.

Unrealized Visualizations

·         Proportional cost of a MTG deck over a few years
o   Deck lists change with the meta
o   Scraping prices could take a while
o   Not a very interesting plot imo
·         Emigration Viz
o   Didn’t have a clear vision
o   Probably a map, but would I do lines, or colors based on settling?
o   Didn’t have a data set or clear place to scrape data from
·         Myers-Briggs viz
o   Thought I had a data set from Kaggle
o   Wanted to do word clouds for each type
o   Didn’t have time to clean the data set to provide a list for each type
·         Global temp viz
o   I couldn’t think of a creative way to show climate change that hasn’t been done
o   Saw this viz and didn’t want to copy
·         Common letters in bad words
o   Getting data felt a lot like my Pi code
o   Didn’t actually show anything particularly interesting.
·         Popular names by state
o   Wanted to do a slider in Tableau by year
o   Ran out of time to scrape the data
·         Tic tac toe
o   Didn’t really know how to visualize the best moves
o   XKCD already gave a wonderful example
·         Dice rolls
o   I wanted to do this in R, with the code letting you select k n-sided dice, and just couldn’t make the code generalize properly.
o   Ran out of time that day
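For the dice rolls idea, one way to handle any number of dice is to build the distribution of totals one die at a time, instead of enumerating every combination. The sketch below is in Python rather than R and is only an illustration of that approach, not the abandoned code; the `dice_sum_distribution` name is made up for the example.

```python
from fractions import Fraction

def dice_sum_distribution(k, n):
    """Exact probability of each total when rolling k fair n-sided dice."""
    if k < 1 or n < 1:
        raise ValueError("k and n must both be at least 1")
    face_prob = Fraction(1, n)
    dist = {0: Fraction(1)}  # before any die is rolled, the total is 0
    for _ in range(k):
        nxt = {}
        # Add one more die: every existing total can grow by 1..n.
        for total, p in dist.items():
            for face in range(1, n + 1):
                nxt[total + face] = nxt.get(total + face, 0) + p * face_prob
        dist = nxt
    return dist

# Two six-sided dice, drawn as a quick text bar chart (one '#' per 1/36).
two_d6 = dice_sum_distribution(2, 6)
for total in sorted(two_d6):
    print(f"{total:>2} {'#' * int(two_d6[total] * 36)}")
```

Because each pass only adds one die to the running totals, the same function works for any k and n, and the resulting dictionary can be handed straight to a plotting library.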
