I'm not the parent author, but here's why I haven't tried the Python ecosystem:
I've heard of NumPy, PyPy, SciPy, Pandas, matplotlib, and now Numba. I don't particularly know what these do or how they overlap or interact. Which is kind of the point: to a complete outsider, the world of Python scientific computing feels like a wild west where everybody is happily proclaiming that their setup is just right.
Additionally, Python is slow; Julia is fast. I've heard things like "well, Python is only slow at certain things, and makes it easy to write C code when needed." I don't want to write any C code. Same goes for "typically only a small portion of your Python code is a bottleneck, and it's easy to port that to C."
A huge part of the appeal of Julia is that you don't have to worry about language interoperability, calling C, etc. Everything can be written efficiently and readably in Julia, and because the base language is designed around scientific computing, there's little worry about add-on scientific computing packages not playing nice together.
Now, I fully believe that I could get the right Python environment and set of packages set up and be productive. But to get going with Julia, I just download Julia and go.
> I've heard of NumPy, PyPy, SciPy, Pandas, matplotlib, and now Numba. I don't particularly know what these do or how they overlap or interact. Which is kind of the point: to a complete outsider, the world of Python scientific computing feels like a wild west where everybody is happily proclaiming that their setup is just right.
It's not terribly complicated once you spend a little time working with the libraries. Working with arrays? import numpy. Machine learning? import sklearn. Plotting? import matplotlib. Need to do some interpolation, integration, or work with some strange orthogonal polynomials (etc.)? import scipy. Sure, there's some minor overlap between scipy and numpy, but nothing that causes any problems in my experience.
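To make that division of labor concrete, here's a minimal sketch of the kind of script I mean (assuming numpy, scipy, and matplotlib are installed; the function and grid are just placeholders):

import numpy as np               # arrays and vectorized math
from scipy import integrate      # integration, interpolation, etc. live in scipy
import matplotlib.pyplot as plt  # plotting

x = np.linspace(0, np.pi, 100)                # numpy: build a grid
y = np.sin(x)                                 # numpy: vectorized evaluation
area, err = integrate.quad(np.sin, 0, np.pi)  # scipy: adaptive quadrature, ~2.0

plt.plot(x, y)                                # matplotlib: draw it
plt.title("area under sin on [0, pi] = %.3f" % area)
plt.show()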
What I love about using Python is that, in addition to all the great math and science libraries, you have all the other Python tools at your disposal. Working with XML? import xml. Web scraping? import urllib2 (or whatever people use now), or PyQuery. Additionally, there's all the file-system busywork that's a joy to do in Python using the os, sys, etc. libraries.
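For example, a quick standard-library sketch along those lines (the URL is hypothetical, and in Python 3 urllib2 became urllib.request):

import os
import urllib.request
import xml.etree.ElementTree as ET

# Fetch a (hypothetical) feed with nothing but the standard library.
data = urllib.request.urlopen("https://example.com/feed.xml").read()

# Parse the XML and pull out the item titles.
root = ET.fromstring(data)
titles = [item.findtext("title") for item in root.iter("item")]

# The file-system chores are just as painless.
csvs = [f for f in os.listdir(".") if f.endswith(".csv")]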
> NumPy, PyPy, SciPy, Pandas, matplotlib, and now Numba
I'm not sure I buy this argument. Except for PyPy, these are all complementary packages and work together. If you break up anything into its constituent parts, you can make it seem complicated if you want to.
> But to get going with Julia, I just download Julia and go.
Until you need to do something very straightforward like web scraping or interacting with AWS and find out that nobody's released a package for that yet, so you're going to have to reinvent the wheel before you can get to the scientific question that is of actual interest.
Julia looks very interesting as a language; let's just hope it doesn't end up a ghetto like R (lots of awesome statistical tools, a dearth of libraries and tools for everything else).
Definitely, Julia doesn't have as many libraries as Python, but most common needs are covered. And for the rest, you can actually call Python through PyCall.jl.
This just proves my point, though. Gumbo is an HTML parser, not a scraper. The AWS library "will" support EC2 and S3. Again, this is all perfectly normal, and it'd be absurd to expect such a young project to have a rich ecosystem. I think the Moore Foundation grant is awesome and hope to use Julia someday. It's just that (1) as a scientist, you really just want to get shit done, and (2) because it's marketed solely to scientists, I'm concerned Julia might never develop an ecosystem of general-purpose tools. R certainly hasn't.
Not sure what you mean by "web scraper" then, but if you just need something to download web pages (in addition to parsing HTML), then Requests.jl covers it pretty well.
IIUC from the GitHub page, AWS.jl has most of the API implemented, just not thoroughly tested. So it's not really mature or robust, but from my experience with packages at the same stage of development, it should be pretty usable for a scientist who "just wants to get shit done".
I totally share your concern regarding marketing solely for scientists, though.
My experience with bridges is that they always suck, and the ways in which they suck are extremely difficult to debug and never obvious until after you've made a significant investment.
Shit I've seen: silent memory corruption, silently dropping the last element of an array if and only if it is the last declared field in the structure, killing the debugger every time you hit the bridge, inaccessibility of critical code because it uses X unsupported language feature, event-loop integration nightmares (IO suppression / null-routing / deadlock), exception incompatibility, unconfigurable signal/interrupt stealing. And that's all without counting the "usual suspects" of documentation, performance, and testing issues.
Maybe the Julia-Python bridge doesn't have any of these problems. But I'm not going to be the one to find out.
The PyCall Julia-Python bridge is shockingly good. It was written by Steven Johnson, also the author of FFTW – the world's state-of-the-art Fast Fourier Transform library for two decades now. He is one of the best, most thorough programmers I know. When he writes something, it works.
I usually use Cython for optimizing Python, which is a lot easier than working directly with the CPython C API. You do need to know C to get the most out of it, but you can go a long way just by adding type declarations to Python to speed it up.
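As a rough sketch of that workflow, here's a toy function written in Cython's "pure Python" mode (assuming the cython package is installed; the annotations are no-ops under plain CPython, but when the file is compiled with Cython they turn the loop into C):

import cython

def fib(n: cython.int) -> cython.double:
    # Type declarations: ignored by the interpreter, used by the
    # Cython compiler to generate a loop over machine ints/doubles.
    a: cython.double = 0.0
    b: cython.double = 1.0
    i: cython.int
    for i in range(n):
        a, b = a + b, a
    return a

The same file runs unchanged under the interpreter, so you only pay the compile step when you actually need the speed.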
Here, I'll use logical indexing to pull out all the positive even numbers from this array of random integers:
Numpy:
a = np.random.random_integers(low = -100, high = 100, size = (100,))
a[np.logical_and(a % 2 == 0, a > 0)]
(or a[(a % 2 == 0) * (a > 0)])
Logical indexing is used everywhere in numerical computing, and using functions like logical_and() (or boolean multiplication or addition) is more difficult to follow than the equivalent Matlab or Julia.
Julia:
a = rand(-100:100, 100)
a[(a % 2 .== 0) & (a .> 0)]
Well, that's beautiful. Element-wise operations are prefixed with "."
You have made a very verbose numpy version :) Here is a slightly better one:
a = np.random.randint(-100, 101, size=100)  # randint's upper bound is exclusive
a[(a % 2 == 0) & (a > 0)]
NumPy does indeed have element-wise boolean operators (&, |, ~), so logical_and() is rarely necessary.
I think this is clearer than your Julia example because:
- The numpy version makes it clear that you are using random integers, not floating point values.
- The keyword argument for "size" makes it clear what that second 100 is for. (The use of the first two numbers, -100 and 100, is pretty clear from context.)
- In numpy, most operations are element-wise by default, because the result would be ambiguous or not useful otherwise. This removes the line noise of the extra "." before operations.
Don't get me wrong, I think Julia is awesome. I just think you've constructed a very poor example for numpy.
> That's also completely clear in the Julia version, if you learn a little Julia.
In every language I've used, the default is for a "rand" function to return random floats between 0 and 1, and given arguments it returns floats between the arguments. I don't think it has to do with learning Julia; it is just that including "integer" in the function name makes it clear the function returns integers.
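For what it's worth, numpy follows exactly that convention, which is the point being made here:

import numpy as np

np.random.rand(3)                # floats in [0, 1) -- the conventional "rand" default
np.random.uniform(-100, 100, 3)  # floats between the given bounds
np.random.randint(-100, 101, 3)  # integers; "int" is right in the name (upper bound exclusive)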
> This is why I think the Julia approach is better. If I write `a == 7`, am I testing whether `a` is 7 or whether any of the elements of `a` are 7?
I think this is more of a comment about mixing arrays and scalars in a dynamic language. I made my comment assuming you are performing operations on arrays. If you are comparing two arrays, I think the default of element-wise operations makes more sense.
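Concretely, numpy resolves that ambiguity by making comparisons element-wise and requiring an explicit reduction when you want a single answer:

import numpy as np

a = np.array([7, 3, 7])
mask = (a == 7)   # element-wise: array([ True, False,  True])
mask.any()        # True  -- "is any element 7?"
mask.all()        # False -- "are all elements 7?"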
In Julia, the rand function is more general in that it samples from whatever you pass as the first argument (defaulting to uniform on [0,1)). Since the first argument here is an integer range, you get integer values.