
Which Is A Better Programming Language For Data Science? Python Or R?

Python vs. R is a hotly debated topic in the data science community. Both languages are used for data science and analysis, and each has advantages and disadvantages depending on the work you are doing.

To help data scientists choose the right language, Norm Matloff, a computer science professor at the University of California, Davis, has published a detailed comparison of Python and R across various factors.

Professor Matloff compared both languages across the following 11 aspects to determine which language is better suited for which tasks:

R vs. Python for data science

1. Elegance

Clear win for Python.

Python wins on elegance because its code needs fewer parentheses and braces, which makes it “more sleek.”

2. Learning curve

Huge win for R.

Newcomers have an easy time learning R, which has data analysis features built in and is well suited to statistical computing.

Python, by contrast, requires extra work up front: before doing data analysis, a beginner has to learn additional libraries such as NumPy, Pandas, and matplotlib.

3. Available libraries

Slight edge to R

The Python Package Index (PyPI) has over 183,000 packages, whereas the Comprehensive R Archive Network (CRAN) has over 12,000. Despite the smaller count, Matloff gives R the edge: “The fact that R has a canonical package structure is a big advantage,” he says.

4. Machine learning

Slight edge to Python here

Python’s growth in recent years can be attributed to the rise of ML and AI. While Python offers several finely tuned libraries for image recognition, with AlexNet cited as an example, their R versions can easily be developed, says Matloff.

5. Statistical correctness

Big win for R

According to Matloff, professionals working on ML, who mostly advocate Python, sometimes have an inadequate understanding of the statistical issues involved. R, by contrast, is a programming language for data science that was written by statisticians, for statisticians.

6. Parallel computation

Let’s call it a tie

Matloff writes that the base versions of R and Python don’t have strong support for multicore computation. Python’s multiprocessing package is not a good workaround for the language’s other limitations, and R’s parallel package isn’t that great either, so it’s a tie.

7. C/C++ interface

Slight win for R

R has powerful tools like Rcpp for interfacing R to C/C++. Python has tools like SWIG for the same purpose, but they are not as powerful as R’s, and the pybind11 package is still being developed.

8. Object orientation, metaprogramming

Slight win for R

Although functions are treated as objects in both R and Python, R takes it more seriously. For instance, in Python one cannot print a function to the terminal, which is possible in R. Also, R’s metaprogramming features (code that generates code) make it more attractive.
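The point about printing functions can be illustrated on the Python side. The sketch below is not from Matloff's comparison; `square` and `describe` are hypothetical names chosen for illustration. It shows that printing a Python function yields only a short representation rather than its code, and that the standard-library `inspect.getsource` call can sometimes recover the source, though it may fail when the code was not loaded from a file.

```python
import inspect


def square(x):
    """Return x squared."""
    return x * x


def describe(func):
    """Collect what Python exposes about a function object."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Source text is not always available, e.g. in some interactive sessions.
        source = None
    return {
        "name": func.__name__,
        "doc": func.__doc__,
        "repr": repr(func),
        "source": source,
    }


info = describe(square)
print(info["repr"])    # a short "<function ...>" representation, not the function body
print(info["source"])  # the source text when Python can locate it, otherwise None
```

Functions are still ordinary objects here: `square` is passed to `describe` like any other value, and its name and docstring are plain attributes.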

9. Language unity

Horrible loss for R

Python is transitioning from version 2.7 to 3.x, but that won’t cause much disruption. R, however, is forking into two different dialects due to RStudio: ordinary R and the Tidyverse.

It would have helped if the Tidyverse were superior to ordinary R, but in Matloff’s opinion it is not, which “makes things more difficult for beginners.”

10. Linked data structures

Win for Python

It is easier to implement classical computer science data structures such as binary trees in Python. The same can be achieved in R using its ‘list’ class, but it is much slower.
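As a rough illustration of the kind of structure meant here, the following is a minimal sketch of a binary search tree in plain Python. The names `Node`, `insert`, `in_order`, and `build_tree` are hypothetical and not taken from Matloff's comparison. The sketch makes no claim about how it performs relative to R.

```python
class Node:
    """One node of a binary search tree (hypothetical illustration)."""

    def __init__(self, key):
        self.key = key
        self.left = None   # subtree with keys smaller than self.key
        self.right = None  # subtree with keys greater than or equal to self.key


def insert(root, key):
    """Insert key into the tree rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def in_order(root):
    """Yield the keys in ascending order via an in-order traversal."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


def build_tree(keys):
    """Build a tree by inserting the keys one at a time."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


tree = build_tree([5, 3, 8, 1, 4])
print(list(in_order(tree)))  # [1, 3, 4, 5, 8]
```

Each node simply holds references to its children, so the linked structure maps directly onto ordinary Python objects.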

11. Online help

Big win for R

The basic help() function in R is much more informative than Python’s, and it is well supported by example(), making R an undisputed winner in this aspect.
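For context on the Python side, the built-in help() displays a function's docstring, so the quality of the help depends on what the author wrote there. The sketch below uses only the standard library; `mean` is a hypothetical function, not something from Matloff's comparison. Putting a usage example inside the docstring is a common Python convention, though treating it as a loose counterpart to R's example() is an assumption of this sketch.

```python
import inspect


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([1, 2, 3])
    2.0
    """
    return sum(values) / len(values)


# inspect.getdoc returns the docstring with its indentation cleaned up;
# this is the text that help(mean) would display under the signature.
doc_text = inspect.getdoc(mean)
print(doc_text)
```

Calling `help(mean)` in an interactive session shows the same text in a pager.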
