What are the disadvantages/limitations of Python?
Although Python has many advantages, no language is a perfect solution for every need. Though Python has strengths that most other programming languages lack, it also has some disadvantages and limitations. Over the last five years, several developments have made these limitations less severe, but they are still noticeable.
Speed of execution
The main drawback of Python is its speed of execution. It isn't a fully compiled language: source code is first compiled to an internal bytecode form, which is then executed by the Python interpreter, so Python is considerably slower in execution than C. Part of this limitation comes from its default implementation, CPython, which uses a Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time, no matter how many cores are available. And while you can use the threading module built into Python to speed things up, threading only gives you concurrency, not parallelism. It's good for running multiple tasks that aren't CPU-bound, but does nothing to speed up multiple tasks that each require a full CPU. This limitation is less noticeable now than a few years ago, as hardware, including GPU-based servers, has become many times more powerful over the last decade.
Concurrency
The standard implementation of Python isn't designed to use multiple cores simultaneously. Python does, however, include a native way to run a workload across multiple CPUs: the multiprocessing module spins up multiple copies of the Python interpreter, each on a separate core, and provides primitives for splitting tasks across them. But sometimes even multiprocessing isn't enough. Sometimes the job calls for distributing work not only across multiple cores, but also across multiple machines.
This limitation is less noticeable now than a few years ago, as the open-source community has developed several libraries and frameworks to overcome it. These Python toolkits can take an existing Python application and spread its work across multiple cores, multiple machines, or both. Some of the most popular frameworks are Ray, Dask, Dispy, Pandaral·lel, ipyparallel, and Joblib.
Ray
Developed by a team of researchers at the University of California, Berkeley, Ray underpins a number of distributed machine learning libraries. But Ray isn't limited to machine learning tasks alone, even if that was its original use case: any Python task can be broken up and distributed across systems with Ray.
Ray's syntax is minimal, so you don't need to rework existing apps extensively to parallelize them. The @ray.remote decorator distributes the decorated function across any available nodes in a Ray cluster, with optionally specified parameters for how many CPUs or GPUs to use. The results of each distributed function are returned as Python objects, so they're easy to manage and store, and the amount of copying across or within nodes is kept to a minimum. This last feature comes in handy when dealing with NumPy arrays, for instance.
Ray even includes its own built-in cluster manager, which can automatically spin up nodes as needed on local hardware or popular cloud computing platforms.
Dask
From the outside, Dask looks a lot like Ray. It, too, is a library for distributed parallel computing in Python, with its own task scheduling system, awareness of Python data frameworks like NumPy, and the ability to scale from one machine to many.
Dask works in two basic ways. The first is through parallelized data structures: essentially, Dask's own versions of NumPy arrays, lists, or Pandas DataFrames. Swap the Dask versions of these structures in for their defaults, and Dask will automatically spread their execution across your cluster. This typically involves little more than changing the name of an import, though it may sometimes require a partial rewrite to work completely.
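A small sketch of the first approach, assuming Dask is installed (`pip install dask`). The Dask array mirrors the NumPy API, but splits the data into chunks that can be processed in parallel:

```python
import dask.array as da

# A drop-in analogue of np.arange, split into four 250-element
# chunks that can be processed in parallel (or across a cluster).
x = da.arange(1000, chunks=250)

# Operations build a lazy task graph; .compute() executes it.
total = (x ** 2).sum().compute()
print(total)  # sum of squares 0..999 = 332833500
```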
The second way is through Dask’s low-level parallelization mechanisms, including function decorators, that parcel out jobs across nodes and return results synchronously (“immediate” mode) or asynchronously (“lazy”). Both modes can be mixed as needed, too.
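The low-level mechanism can be sketched with Dask's `delayed` decorator, again assuming Dask is installed. Nothing executes until `.compute()` is called, which hands the accumulated task graph to Dask's scheduler:

```python
from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(a, b):
    return a + b

# Nothing runs yet: each delayed call just adds a node to a task graph.
graph = add(inc(1), inc(2))

# .compute() executes the graph; the two independent inc() calls
# are free to run in parallel.
total = graph.compute()
print(total)  # 5
```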
One key difference between Dask and Ray is the scheduling mechanism. Dask uses a centralized scheduler that handles all tasks for a cluster. Ray is decentralized, meaning each machine runs its own scheduler, so any issues with a scheduled task are handled at the level of the individual machine, not the whole cluster.