A Month of Visualizations
In February I decided to do a data visualization a day, to practice with the various tools available and to learn which plots work well in different situations. I think I learned a fair bit doing this exercise, but I also had some thoughts about the process, so I decided to do a write-up about the whole thing.
The month was divided between learning ggplot2 and Tableau. Previously, in my job search, I found that most Data Analyst positions required knowledge of Tableau, which was an area I was lacking in, and ggplot2 is probably the best R package for visualization. Since I was already familiar with R, I decided to spend the first half of the month on ggplot2 and other R packages. I spent a good deal of time learning various functions and formatting requirements for different plots. Then I spent the latter half of the month playing with Tableau, and in my personal opinion, I barely scratched the surface there. I look at some dashboards on /r/dataisbeautiful and can see that Tableau can create some powerful dashboards, but a lot of that power is hidden behind layers of “easy to use” interfaces, and I often found myself not knowing how to describe what I wanted to do in order to find a solution. This, in addition to the lack of documentation in a central location, made Tableau feel mostly like a power pivot table that could make some pretty plots.
Positives and Pitfalls
The positives of this experience were great, however. I now feel much more comfortable with ggplot2 and other R graphing packages. I learned how to make interactive plots in R, as well as better-looking maps using OpenStreetMap in conjunction with R. I loved the granularity of control I had using ggplot2. I feel much more comfortable in Tableau than I did before, and now know where I need to focus my practice. I think my attention to design and to communicating the ideas in the data has vastly improved.
There were, however, some negatives to this experience. Maybe not negatives, but things I didn’t realize before starting this endeavor. The main one is that it's a bit weird to do a visualization of a random data set every day. I felt like I never really got to know enough about the data before I started designing a viz. If I were to do something like this again, I would rather spend a week or more on a single data set, perhaps with a weekly write-up on it. Being familiar with the data set as a whole is important when deciding how to make the plot. Domain knowledge is critical when making a visualization; otherwise, what are you communicating? Understanding more about the data you have is also very important for making decisions about it.
For example, in my UFO plot I consolidated the shapes a fair bit because there were almost 30 of them, many of which could overlap. I reduced these to 9 shape categories, which could really be reduced further based on the descriptions. This understanding of the data set is important when trying to make something that communicates an idea, tells a story, or even starts a conversation.
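As a rough illustration of that kind of consolidation, here is a minimal Python sketch (not the R code behind the actual plot). The raw shape labels, the category names, and the `group_shape` helper are all hypothetical; the real data set had almost 30 shapes and 9 categories.

```python
from collections import Counter

# Hypothetical mapping from raw reported shapes to broader categories.
# Both the labels and the groups are illustrative assumptions, not the
# ones used for the real UFO plot.
SHAPE_GROUPS = {
    "circle": "round",
    "disk": "round",
    "sphere": "round",
    "oval": "round",
    "light": "light",
    "flash": "light",
    "fireball": "light",
    "triangle": "angular",
    "chevron": "angular",
    "cigar": "elongated",
    "cylinder": "elongated",
}


def group_shape(raw):
    """Normalize a raw shape label and map it to a category.

    Missing or unrecognized labels fall into the catch-all "other".
    """
    if raw is None:
        return "other"
    return SHAPE_GROUPS.get(raw.strip().lower(), "other")


# A few made-up sightings, including messy casing, whitespace, and a gap.
sightings = ["Disk", "circle ", "fireball", "cigar", "unknown", None, "Triangle"]
counts = Counter(group_shape(s) for s in sightings)

for category, n in counts.most_common():
    print(f"{category:<10} {n}")
```

Keeping the mapping in one dictionary makes the judgment calls visible: anyone who disagrees with putting "oval" next to "disk" can see and change it in one place.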
However, in the rush to get a viz done each day amid other competing priorities (applying for grad schools, my actual job, personal life, and more), the domain knowledge was sort of thrown to the side. My decision to try a new data set basically every day meant I either had to spend a lot of time getting acquainted with it or take a very surface-level look. The quantity meant my quality went way down. I think this is a pitfall to avoid in future projects. I vastly underestimated the amount of time making visualizations would take.
Description of Process
I would describe my process for making a viz this month as a five-step process.
1. Choose something to visualize (Pick a problem or chart type to practice)
2. Download it or scrape the data (Where is it, how do I get it?)
3. Familiarize myself with the data and do cleaning. (What do I have, what can it say?)
4. Learn how to make the style of plot I want. (How do I make this?)
5. Design the chart and fiddle with settings (How do I make this appealing?)
This let me make a visualization fairly quickly every day. I had planned out a month of 28 different visualizations I wanted to do. I loosely planned some plots around holidays and tried to do maps on Mondays. This was a rough outline, however, and some changes ended up happening. There were some visualizations I didn’t end up doing for various reasons, such as lack of time to put the data together, lack of visualization ideas for the data, and other roadblocks. You can see a list of those and some excuses at the end.
Stats on the Process
As I mentioned, sometimes I had to abandon a visualization. I did this at various stages in my process, and in total I spent just over 8 hours on work that amounted to no results to share. I did, however, learn in those 8 hours and work on my problem-solving skills.
In total, I spent about 51.83 hours this month working on this project. The following table shows how much time I spent on each step. (I counted step 1 as 0, as I did all of that last month in about 15 minutes.)
Step | Data Gathering | Data Cleaning | Researching | Designing
Time (in Hours) | 12.75 | 11.17 | 13.92 | 14
% of Time | 24.6% | 21.54% | 26.85% | 27.01%
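The percentages in the table can be reproduced from the hours with a few lines of Python. This is only a sketch of the arithmetic: it uses the rounded hours from the table, so the last digit can differ slightly from the figures above, and the 28-day divisor for the daily average assumes one visualization per day in February.

```python
# Hours per step, copied from the table (already rounded to two decimals).
hours = {
    "Data Gathering": 12.75,
    "Data Cleaning": 11.17,
    "Researching": 13.92,
    "Designing": 14.0,
}
DAYS = 28  # assumption: one planned visualization per day in February

total = sum(hours.values())
# Each step's share of the total, as a percentage.
shares = {step: 100 * h / total for step, h in hours.items()}

for step, pct in shares.items():
    print(f"{step:<15} {hours[step]:>6.2f} h  {pct:5.1f}%")
print(f"Total: {total:.2f} h, about {total / DAYS:.2f} h per day")
```

Designing comes out as the largest share, but only narrowly: all four steps land between roughly a fifth and just over a quarter of the total.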
I was surprised that designing took the largest share of my time, as it always felt like the final step. I knew that researching how to make each plot took a good deal of time. I spent more time cleaning data sets than I expected, considering that for 18 of the 28 visualizations I downloaded the data set from someplace like Kaggle.
I wondered if there was a difference between the time spent on R and Tableau. There was, but it is important to remember that I did R in the first half of the month and Tableau in the second, so the comparison doesn’t account for me being worn down or for other life factors. Tableau was usually faster overall, but I also did a lot less data scraping and researching for the Tableau projects, which isn’t because of the program.
I wanted to look at how the process proportions changed daily, so I made an area plot. It was useful, but not as useful as the bar plot I also made, which I think fits better since this is a discrete number of projects.
I spent about 1.8 hours a day working on visualizations. I wondered how that compared to the rest of my time, so I made daily waffle charts and arranged them into a calendar.
Takeaways
Data visualization is a long and time-consuming process. Instead of doing one a day, I would recommend maybe one a week, or one every other week, if you’re trying to balance a good quantity with some quality analysis. Without some analysis you can’t get much depth out of your viz, and to have a good viz, you need some knowledge of the data set.
However, this sort of rapid fire method did help me improve
my familiarity with various data visualization software, in particular ggplot2.
It also exposed me to more kinds of visualizations and helped me learn how to
format data to make production of those easier. For that aspect, I think this
project was a success.
Unrealized Visualizations
· Proportional cost of a MTG deck over a few years
  o Deck lists change with the meta
  o Scraping prices could take a while
  o Not a very interesting plot imo
· Emigration Viz
  o Didn’t have a clear vision
  o Probably a map, but would I do lines, colors based on settling?
  o Didn’t have a data set or clear place to scrape data from
· Myers-Briggs viz
  o Thought I had a data set from Kaggle
  o Wanted to do word clouds for each type
  o Didn’t have time to clean the data set to provide a list for each type
· Global temp viz
  o I couldn’t think of a creative way to show climate change that hasn’t been done
· Common letters in bad words
  o Getting data felt a lot like my Pi code
  o Didn’t actually show anything particularly interesting.
· Popular names by state
  o Wanted to do a slider in Tableau by year
  o Ran out of time to scrape the data
· Tic tac toe
  o Didn’t really know how to visualize the best moves
  o XKCD already gave a wonderful example
· Dice rolls
  o I wanted to do this in R, with the code allowing you to select k n-sided dice, and just couldn’t make the code expand properly.
  o Ran out of time that day
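For the dice idea, one way to make the code "expand" to any k and n is to build the distribution one die at a time. The sketch below is in Python rather than R, assumes fair dice, and is not the abandoned code; the function name `dice_sum_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def dice_sum_distribution(k, n):
    """Return {total: probability} for the sum of k fair n-sided dice."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive integers")
    dist = {0: Fraction(1)}  # with zero dice, the sum is 0 with certainty
    for _ in range(k):
        nxt = Counter()
        # Add one more die: every existing total can grow by 1..n,
        # each face with probability 1/n.
        for total, p in dist.items():
            for face in range(1, n + 1):
                nxt[total + face] += p / n
        dist = dict(nxt)
    return dist


# Text histogram for two six-sided dice; each '#' is one of the 36 outcomes.
k, n = 2, 6
dist = dice_sum_distribution(k, n)
for total in sorted(dist):
    ways = int(dist[total] * n**k)
    print(f"{total:>2} {'#' * ways}")
```

Using exact fractions avoids floating-point drift, so the probabilities always sum to exactly 1. The resulting dictionary could then be handed to any plotting tool as a bar chart.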