Author Archives: randyzwitch - Articles

JuliaCon 2015: Everyday Analytics and Visualization (video)

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/juliacon-2015-everyday-analytics-and-visualization-video/

At long last, here’s the video of my presentation from JuliaCon 2015, discussion common analytics tasks and visualization. This is really two talks, the first being an example of using the citibike NYC API to analyze ridership of their public bike program, and the second a discussion of the Vega.jl package.

Speaking at JuliaCon 2015 at MIT CSAIL is the professional highlight of my year; hopefully even more of you will attend next year.

Enjoy!

Edit: For those of you who would like to follow-along using the actual presentation code, it is available on GitHub.

Vega.jl Rebooted – Now with 100% More Pie and Donut Charts!

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/vega-jl-julia/

piedonut

 

 

 

 

 

Mmmmm, chartjunk!

Rebooting Vega.jl

Recently, I’ve found myself without a project to hack on, and I’ve always been interested in learning more about browser-based visualization. So I decided to revive the work that John Myles White had done in building Vega.jl nearly two years ago. And since I’ll be giving an analytics & visualization workshop at JuliaCon 2015, I figure I better study the topic in a bit more depth.

Back In Working Order!

The first thing I tackled here was to upgrade the syntax to target v0.4 of Julia. This is just my developer preference, to avoid using Compat.jl when there are so many more visualizations I’d like to support. So if you’re using v0.4, you shouldn’t see any deprecation errors; if you’re using v0.3, well, eventually you’ll use v0.4!

Additionally, I modified the package to recognize the traction that Jupyter Notebook has gained in the community. Whereas the original version of Vega.jl only displayed output in a tab in a browser, I’ve overloaded the writemime method to display :VegaVisualization inline for any environment that can display HTML. If you use Vega.jl from the REPL, you’ll still get the same default browser-opening behavior as existed before.

The First Visualization You Added Was A Pie Chart…

…And Followed With a Donut Chart?

Yup. I’m a troll like that. Besides, being loudly against pie charts is blowhardy (even if studies have shown that people are too stupid to evaluate them).

Adding these two charts (besides trolling) was a proof-of-concept that I understood the codebase sufficiently in order to extend the package. Now that the syntax is working for Julia v0.4, I understand how the package works (important!), and have improved the workflow by supporting Jupyter Notebook, I plan to create all of the visualizations featured in the Trifacta Vega Editor and other standard visualizations such as boxplots. If the community has requests for the order of implementation, I’ll try and accommodate them. Just add a feature request on Vega.jl GitHub issues.

Why Not Gadfly? You’re Not Starting A Language War, Are You?

No, I’m not that big of a troll. Besides, I don’t think we’ve squeezed all the juice (blood?!) out of the R vs. Python infographic yet, we don’t need another pointless debate.

My sole reason for not improving Gadfly is just that I plain don’t understand how the codebase works! There are many amazing computer scientists & developers in the Julia community, and I’m not really one of them. I do, however, understand how to generate JSON strings and in that sense, Vega is the perfect platform for me to contribute.

Collaborators Wanted!

If you’re interested in visualization, as well as learning Julia and/or contributing to a package, Vega.jl might be a good place to start. I’m always up for collaborating with people, and creating new visualizations isn’t that difficult (especially with the Trifacta examples). So hopefully some of you will be interested in enough to join me to adding one more great visualization library to the Julia community.

Introducing Twitter.jl

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/twitter-api-julia/

This is possibly the latest “announcement” of a package ever, given that Twitter.jl has actually existed on METADATA for nearly a year now, but that’s how things go sometimes. Here’s how to get started with Twitter.jl and some of the highlights.

Hello, World!

If ‘Hello, World!’ is the canonical example of getting started with a programming language, the Twitter API is becoming the first place to start for people wanting to learn about APIs. Authenticating with the Twitter API using Julia is similar to using the R or Python packages, except that rather than doing the OAuth “dance”, Twitter.jl takes all four authentication values in one function:All four of these values can be found after registering at the Twitter Developer page and creating an application. Having all four values in your script is less secure than just providing the api key and api secret, but in the future, I’ll likely implement the full OAuth “handshake”. One thing to keep in mind with this function as it currently works is that no validation of your credentials is performed; the only thing this function does is define a global variable twittercred for later use by the various functions that create the OAuth headers. To shout “Hello, World!” to all of your Twitter followers, you can use the following code:

General Package/Function Structure

From the example above, you can see that the function naming follows the Twitter REST API naming convention, with the HTTP verb first and the endpoint as the remainder of the function name. As such, it’s a good idea at this early package state to have the Twitter documentation open while using this package, so that you can quickly find the methods you are looking for.

For each function/API endpoint, I’ve gone through and determined which parameters are required; these are required arguments in the Julia functions. For all other options, each function takes a second optional Dict{String, String} for any option shown in the Twitter documentation. While this Dict structure allows for ultimate flexibility (and quick definition of functions!), I do realize that it’s less than optimal that you don’t know what optional arguments each Twitter endpoint allows.

As an example, suppose you wanted to search for tweets containing the hashtag #julialang. The minimum function call is as follows:By default, the API will return the 15 most recent tweets containing the #julialang hashtag. To return the most recent 100 tweets (the maximum per API ‘page’), you can pass the “count” parameter via the Options Dict:

Composite Types and DataFrames definitions

The Twitter API is structured into 4 return data types (Places, Users, Tweets, and Entities), and I’ve mimicked these types using Julia Composite Types. As such, most functions in Twitter.jl return an array of specific type, such as Array{TWEETS,1} from the prior #julialang search example. The benefit to defining custom types for the returned Twitter data is that rudimentary DataFrame methods have also been defined:

I describe these DataFrames as ‘rudimentary’ as they parse the top level of JSON into columns, which results in some DataFrame columns having complex data types such as Dict() (and within the Dict(), nested Dicts!). As a running theme in this post, this is something I hope to get around to improving in the future.

Want to Get Started Developing Julia? Start Here!

One of the common questions I get asked is how to get started with Julia, both from a learning perspective and from a package development perspective. Hacking away on the core Julia codebase is great if you have the ability, but the code can certainly be intimidating (the people are quite friendly though). Creating a package isn’t necessarily hard, but you have to think about an idea you want to implement. The third alternative is…

…improve the Twitter package! If you go to the GitHub page for Twitter.jl, you’ll see a long list of TODO items that need to be worked on. The hardest part (building the OAuth headers) has already been taken care of. What’s left is re-factoring the code for simplification, factoring out the OAuth code in general into a new Julia library (also partially started), then building the Streaming API functions, cleaning up the DataFrame methods to remove the Dict column types, paging through API results…and so-on.

So if any of you are on the sidelines wanting to get some practice on developing packages, without needing to worry about learning Astrophysics first, I’d love to collaborate. And if any Julia programming masters want to collaborate, well that’s great too. All help and pull requests are welcomed.

In the meantime, hopefully some of you will find this package useful for natural language processing, social networking analysis or even creating bots 😉