python memory_profiler 0.41

$ python -m memory_profiler example.py

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
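The example.py behind this report is essentially the function from the output itself; a minimal reconstruction (the __main__ guard is assumed, and running under python -m memory_profiler makes the @profile decorator available without an import):

@profile
def my_func():
    a = [1] * (10 ** 6)         # ~8 MB for a million-element list
    b = [2] * (2 * 10 ** 7)     # ~150 MB, allocated here...
    del b                       # ...and released on this line
    return a

if __name__ == '__main__':
    my_func()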

How to use Fabric in a development environment

Fabric is a Python (2.5 or higher) library and command-line tool for streamlining
the use of SSH for application deployment or systems administration tasks.
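For a taste of what that looks like, here is a minimal fabfile.py sketch (the host, path, and commands are hypothetical):

# fabfile.py -- run with: fab deploy
from fabric.api import cd, env, run

env.hosts = ['deploy@example.com']    # hypothetical host

def deploy():
    """Pull the latest code and reload the app over SSH."""
    with cd('/srv/myapp'):            # hypothetical project path
        run('git pull')
        run('pip install -r requirements.txt')
        run('touch app.wsgi')         # hypothetical reload trigger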

.. Instead of running something like pip install MyApp and getting whatever libraries
come along, you can create a requirements file that includes all dependencies.

Example of a requirements.txt file
BeautifulSoup==3.2.0
python-dateutil==1.4.1
django==1.3.0
django-debug-toolbar==0.8.5
django-tagging==0.3
Markdown==2.0.1
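Pinned this way, the environment is reproducible; pip can both generate and consume the file:

$ pip freeze > requirements.txt      # record exactly what's installed
$ pip install -r requirements.txt    # recreate it on another machine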


Reddit: Lessons Learned From Mistakes Made Scaling To 1 Billion Pageviews A Month

Some of the lessons that stood out most for me:

  • Think of SSDs as cheap RAM, not expensive disk. When reddit moved the database from spinning disks to SSDs, the number of servers dropped from 12 to 1, with a ton of headroom. SSDs are 4x more expensive but you get 16x the performance. Worth the cost.
  • Give users a little bit of power, see what they do with it, and turn the good stuff into features. One of the biggest revelations for me was how much reddit learns from its users and how much it relies on users to make the site run smoothly. Users are going to tell you a lot of things you don’t know. For example, reddit gold started as a joke in the community. They made it a product and users love it.
  • It’s not necessary to build a scalable architecture from the start. You don’t know what your feature set will be when you start out, so you won’t know what your scaling problems will be. Wait until your site grows so you can learn where your scaling problems are actually going to be.
  • Treat non-logged-in users as second-class citizens. By always giving logged-out users cached content, Akamai bears the brunt of reddit’s traffic. Huge performance improvement (see the middleware sketch after this list).
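What that last point looks like in code varies; as a rough illustration only (hypothetical, not reddit’s code), a Django-style middleware could mark logged-out responses as publicly cacheable so the CDN serves them:

# Hypothetical sketch: let the CDN cache pages for anonymous users.
class AnonymousCacheMiddleware(object):
    def process_response(self, request, response):
        if request.user.is_authenticated():
            # Personalized pages must never come from a shared cache.
            response['Cache-Control'] = 'private, no-cache'
        else:
            # Logged-out users all see the same page; let Akamai keep it.
            response['Cache-Control'] = 'public, max-age=60'
        return response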

.. As of 2012 they had 240 servers supporting 2 billion pageviews a month and 2 TB of data in Postgres. All high-traffic data was moved off of EBS and onto local ephemeral disks.

.. Did not account for increased latency after moving to EC2. In the datacenter they had submillisecond access between machines, so it was possible to make 1,000 calls to memcache for one page load. Not so on EC2: memcache access times increased 10x, to a millisecond, which made their old approach unusable. The fix was to batch calls to memcache so that a large number of gets go out in one request.
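Most memcached clients expose exactly that batched operation; with the python-memcached client the change is roughly this (the key names are made up):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])
ids = range(1000)

# Before: 1,000 round trips, each paying ~1 ms of EC2 network latency.
# items = [mc.get('thing:%d' % i) for i in ids]

# After: one request fetches every key at once.
keys = ['thing:%d' % i for i in ids]
items = mc.get_multi(keys)   # dict of key -> value for the keys found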

.. Did not expire data. At reddit you can see comments all the way back to the beginning of time. They’ve started to put limits in place so you can’t vote on old comments or add comments to old threads, but keeping everything means the data grows and grows, which makes it more and more difficult to keep the hot data in the database.

.. Put a limit on everything. For anything that can happen repeatedly, set a high limit, raise or lower it as needed, and block users who exceed it. This protects the service. An example is uploading logo files for subreddits: users figured out they could upload really big files and harm the system. Don’t accept huge text blobs either; someone will figure out how to send you 5 GB of text.
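The mechanics can be very simple; here is a sketch of a fixed-window counter (the window and limit values are made up, and a real service would keep the counts in memcached or similar rather than in-process):

import time

WINDOW = 60     # seconds per window
LIMIT = 100     # hypothetical: max actions per user per window
counts = {}     # (user, window) -> count

def allow(user):
    """Return True while the user is under the limit; block past it."""
    window = int(time.time()) // WINDOW
    key = (user, window)
    counts[key] = counts.get(key, 0) + 1
    return counts[key] <= LIMIT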

.. Put everything into a queue. Votes, comments, thumbnail creation, precomputed queries, spam processing, and corrections. Queues let you know when there’s a problem just by monitoring queue lengths. A side benefit is that queues hide problems from users: things like vote requests sit in the queue, and if they aren’t applied immediately nobody notices.
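A toy version of the pattern with the standard-library queue (reddit’s real setup used dedicated message-queue infrastructure rather than an in-process queue):

import queue
import threading

votes = queue.Queue()

def apply_vote(vote):
    # Stand-in for the real work (a database write, in reddit's case).
    pass

def worker():
    while True:
        vote = votes.get()       # blocks until something arrives
        apply_vote(vote)
        votes.task_done()

threading.Thread(target=worker, daemon=True).start()

# The request handler just enqueues and returns; if processing lags a
# little, the user never notices.
votes.put({'user': 'alice', 'thing': 42, 'dir': +1})

# Monitoring: a steadily growing qsize() is the early-warning sign.
print(votes.qsize())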

.. Be active in your own community. Reddit users love that reddit admins are actually on the site and interacting with them.

.. Let users do the work for you. On a site with user input, one problem is always cheating, spam, and fraud. Most of the moderation work is done by thousands of volunteers who take care of most of the spam problem. It works amazingly well and is one of the reasons the reddit team can stay small.