What does the yield keyword do in Python?

To understand what yield does, you must understand what generators are. And before generators come iterables.

Everything you can use “for... in...” on is an iterable: lists, strings, files…

These iterables are handy because you can read them as many times as you wish, but all the values are stored in memory, and that is not always what you want when you have a lot of values.
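
For example, a minimal sketch (the name mylist is just for illustration): a list built with a list comprehension is an iterable, and you can walk over it as many times as you like:

# A list comprehension builds every value in memory up front.
mylist = [x * x for x in range(3)]

for i in mylist:
    print(i)   # 0, 1, 4

for i in mylist:
    print(i)   # 0, 1, 4 again -- the values are still in memory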

Generators are iterators, but you can only iterate over them once, because they do not store all the values in memory: they generate the values on the fly.
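
For instance, a minimal sketch using a generator expression (the same syntax as a list comprehension, but with parentheses):

# A generator expression computes each value only when asked for it.
mygenerator = (x * x for x in range(3))

for i in mygenerator:
    print(i)   # 0, 1, 4

for i in mygenerator:
    print(i)   # prints nothing: the generator is already exhausted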

yield is a keyword that is used like return, except the function will return a generator.
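
A minimal sketch (the function name create_generator is just for illustration):

def create_generator():
    # Nothing here runs when you call create_generator(); the call just
    # returns a generator object. The body runs as the loop below asks
    # for values, pausing at each yield.
    for i in range(3):
        yield i * i

mygenerator = create_generator()
for i in mygenerator:
    print(i)   # 0, 1, 4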

And this works because Python does not care whether the argument of a function is a list or not. Python expects an iterable, so it will work with strings, lists, tuples, and generators! This is called duck typing, and it is one of the reasons why Python is so cool.
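
For instance, a hypothetical helper show_items works unchanged on all of them:

def show_items(iterable):
    # One loop, no type checks: anything iterable will do.
    for item in iterable:
        print(item)

show_items("ab")                      # a string
show_items([1, 2])                    # a list
show_items((3, 4))                    # a tuple
show_items(x * x for x in range(3))   # a generator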

In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.

memory_profiler 0.41

memory_profiler is a Python module for monitoring the memory consumption of a process on a line-by-line basis. Decorate a function with @profile and run the script under the module to get a report:

$ python -m memory_profiler example.py

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
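
For reference, the example.py behind that output can be read off the Line Contents column above; only the __main__ guard below is an assumption (the @profile decorator is injected when the script runs under python -m memory_profiler):

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()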

simple-requests

The goal of this library is to allow you to get the performance benefit of asynchronous requests, without needing to use any asynchronous coding paradigms. It is built on gevent and requests.

Usage

from simple_requests import Requests

# Creates a session and thread pool
requests = Requests()

# Sends one simple request; the response is returned synchronously.
login_response = requests.one('http://cat-videos.net/login?user=fanatic&password=c4tl0v3r')

# Cookies are maintained in this instance of Requests, so subsequent requests
# will still be logged-in.
profile_urls = [
    'http://cat-videos.net/profile/mookie',
    'http://cat-videos.net/profile/kenneth',
    'http://cat-videos.net/profile/itchy' ]

# Asynchronously send all the requests for profile pages
for profile_response in requests.swarm(profile_urls):

    # Asynchronously send requests for each link found on the profile pages
    # These requests take precedence over those in the outer loop to minimize overall waiting
    # Order doesn't matter this time either, so turn that off for a slight performance gain
    for friends_response in requests.swarm(profile_response.links, maintainOrder=False):

        # Do something intelligent with the responses, like using
        # regex to parse the HTML (see http://stackoverflow.com/a/1732454)
        friends_response.html

Celery: Distributed Task Queue

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
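
As a minimal sketch of that model (the module name tasks, the broker URL, and the rpc:// result backend are assumptions; adapt them to your setup):

from celery import Celery

# Assumed broker URL -- point this at your own RabbitMQ (or Redis) instance.
# The result backend is only needed so that .get() below can fetch results.
app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')

@app.task
def add(x, y):
    return x + y

# Asynchronous call: returns an AsyncResult immediately, while a worker
# process (started with: celery -A tasks worker) runs the task in the background.
result = add.delay(4, 4)

# Synchronous wait: block until the worker has finished the task.
print(result.get(timeout=10))   # 8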