scalability

Maximum (usable) number of rows in a PostgreSQL table

The rule is: try to find the most common working set and see if it fits in RAM. Optimize hardware, PG/OS buffer settings and PG indexes/clustering for it. Otherwise look for aggregates or, if that is not acceptable and you need fully random access, think about what hardware could scan the whole table for you in reasonable time.

.. Are there any columns which are commonly used for filtering, such as state or date? Can you identify the working set that is most commonly used (like only the last month)? If so, consider partitioning or clustering on these columns, and definitely index them. Basically, you’re trying to make sure that as much of the working set as possible fits in RAM.

Avoid scanning the table at all costs if it does not fit in RAM.
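
As a rough illustration of the working-set rule, the arithmetic can be sketched in Python. The row counts, average row size, index overhead factor and the share of RAM assumed usable for caching are all made-up assumptions, not measurements from a real system.

```python
def working_set_bytes(rows, avg_row_bytes, index_overhead=0.3):
    """Estimate a working set: row data plus an assumed index overhead."""
    return int(rows * avg_row_bytes * (1 + index_overhead))


def fits_in_ram(working_set, ram_bytes, cache_fraction=0.75):
    """True if the working set fits in the share of RAM assumed usable for caching."""
    return working_set <= ram_bytes * cache_fraction


GIB = 1024 ** 3

# Hypothetical table: 2 billion rows in total, 50 million of them from the last month.
whole_table = working_set_bytes(2_000_000_000, 200)
last_month = working_set_bytes(50_000_000, 200)

print(fits_in_ram(whole_table, 64 * GIB))  # whole table against 64 GiB of RAM
print(fits_in_ram(last_month, 64 * GIB))   # last month only
```

With these assumed numbers the whole table does not fit but the last month does, which is the case where partitioning or clustering on the date column is worth considering.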

How to use both Django & NodeJS as backend for your application

The server part of the library runs on top of NodeJS, which provides a high-performance, event-driven framework to manage the message exchange with the client. All you need is a way to connect the socket.io server running on NodeJS with the Django app. This can easily be done using Redis, which is basically a key-value store but also provides a way to subscribe and publish to keys, so it becomes a message bus. With this architecture, the socket.io server subscribes to user-specific keys, to which Django writes notifications. Once a message is received, the server sends it to the connected client.
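
The Django side of this message flow might look roughly like the sketch below. The channel naming scheme, the payload shape and the helper names are assumptions made for illustration. `FakeBus` is an in-memory stand-in so the example runs without a Redis server; in a real setup the client would be a redis-py `Redis` instance, whose `publish(channel, message)` method has the same shape.

```python
import json


def notification_channel(user_id):
    """Build the per-user channel name (the naming scheme is an assumption)."""
    return f"notifications:{user_id}"


def publish_notification(client, user_id, text):
    """Publish a JSON notification for one user and return the channel used.

    `client` is anything with a publish(channel, message) method.
    """
    channel = notification_channel(user_id)
    client.publish(channel, json.dumps({"user_id": user_id, "text": text}))
    return channel


class FakeBus:
    """In-memory stand-in for Redis pub/sub so the sketch runs without a server."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        callbacks = self.subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = FakeBus()
received = []
# Role of the socket.io server: subscribe to the user's channel and forward messages.
bus.subscribe(notification_channel(42), received.append)
# Role of Django: publish when something happens.
publish_notification(bus, 42, "You have a new follower")
print(received)
```

Only user 42's subscriber sees the message; a notification published for another user goes to a different channel and is not delivered here.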

.. Big names like Instagram and Pinterest are using both Django and NodeJS as a backend.

.. Instagram is a great example: it runs Django on Amazon High-CPU Extra-Large machines, and as usage grew they went from just a few of these machines to over 25 of them, while a NodeJS server is used for sending push notifications. It aims at providing a complete asynchronous client library for the API, including the REST, search and streaming endpoints.

7 Lessons Learned While Building Reddit To 270 Million Page Views A Month

By far the most surprising feature of their architecture is in Lesson Six, whose essential idea is: the key to speed is to precompute everything and cache it. They turn the precompute knob up to 11. It sounds like nearly everything you see on Reddit has been precomputed and cached, regardless of the number of versions they need to create. For example, they precompute all 15 different sort orders (hot, new, top, old, this week, etc.) for listings when someone submits a link. Normally developers would be afraid of going to this extreme, of being this wasteful. But they thought it’s better to be wasteful upfront than slow. Wasting disk and memory is better than keeping users waiting.
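
The precompute-on-write idea can be sketched as follows. This is not Reddit's code: the three sort orders, the key functions and the dict standing in for memcached are simplified assumptions.

```python
# A subset of sort orders; the key functions are simplified placeholders.
SORT_KEYS = {
    "new": lambda link: link["created"],
    "top": lambda link: link["score"],
    "old": lambda link: -link["created"],
}

cache = {}  # stand-in for memcached: listing key -> link ids, already sorted


def submit_link(links, link):
    """Store a link, then recompute and cache every listing up front."""
    links[link["id"]] = link
    for name, key in SORT_KEYS.items():
        ordered = sorted(links.values(), key=key, reverse=True)
        cache[f"listing:{name}"] = [item["id"] for item in ordered]


def get_listing(name):
    """Serve a listing straight from the cache; no sorting at read time."""
    return cache[f"listing:{name}"]


links = {}
submit_link(links, {"id": "a", "created": 1, "score": 10})
submit_link(links, {"id": "b", "created": 2, "score": 5})
print(get_listing("new"))
print(get_listing("top"))
```

All the sorting cost is paid at submit time, so every read is a single cache lookup, at the price of storing one list per sort order.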

.. One way to mitigate this problem is to restart processes that have died or become cancerous. Reddit uses Supervise to automatically restart applications. Special monitoring programs kill processes that use too much memory, use too much CPU, or aren’t responsive. Instead of worrying, just restart and the system is up. Of course you have to read the logs and find a root cause, but until then it keeps you sane.
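
A minimal restart loop in the same spirit, using only Python's standard library, might look like this. It is not how Supervise works internally; the function name and the restart cap are assumptions, and the sketch covers only restarting a process that has died, not killing one that uses too much memory or CPU.

```python
import subprocess
import sys


def run_supervised(cmd, max_restarts=3):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts performed. A real supervisor would loop
    forever; the cap keeps this sketch finite.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1


# Hypothetical worker that always crashes, so it is restarted up to the cap.
crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(run_supervised(crashing, max_restarts=2))
```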

.. They now store more data in Memcachedb than in Postgres. All queries are generated by the same piece of control and are cached in memcached. Change-password links and associated state are cached for 20 minutes or so. Same for captchas.

.. Rate-limit everything using memcache + expiration dates. It is a good way to protect your system from attacks. Without a rate-limiting subsystem a single malicious user could take down the system. Not good. So for users and crawlers they keep a lot of it in memcache. If the user comes again within a second they get bounced. Regular users don’t click that fast so they won’t notice.
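
The "key plus expiration" trick can be sketched like this. `ExpiringStore` is a tiny in-memory stand-in for memcache written for this example; its `add` mirrors the memcached behaviour of only setting a key that is not already present. The key prefix, the one-second interval and the injectable clock are assumptions.

```python
import time


class ExpiringStore:
    """Tiny in-memory stand-in for memcache: keys vanish after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}

    def add(self, key, ttl_seconds):
        """Set key only if it is absent or expired; return True if it was set."""
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires[key] = now + ttl_seconds
        return True


def allow_request(store, client_id, min_interval=1.0):
    """Allow one request per client per min_interval seconds; bounce the rest."""
    return store.add(f"ratelimit:{client_id}", min_interval)


now = [0.0]  # fake clock so the example does not need to sleep
store = ExpiringStore(clock=lambda: now[0])
print(allow_request(store, "user-1"))  # first request
print(allow_request(store, "user-1"))  # repeated immediately
now[0] = 1.5
print(allow_request(store, "user-1"))  # after the key has expired
```

Because the key expires on its own, there is no cleanup job to run: a client that stops hammering the site is allowed back in automatically.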

..

The essence of this lesson is: do the minimal amount of work on the backend and tell the user you are done.
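
One common way to do this is to put the slow work on a queue and acknowledge the request immediately. The sketch below uses an in-process `queue.Queue` and a worker thread purely for illustration; the handler name, the job shape and the placeholder "work" are assumptions, and a production system would more likely use an external message queue.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def handle_vote(user_id, link_id):
    """Request handler: enqueue the real work and acknowledge right away."""
    jobs.put((user_id, link_id))
    return {"status": "ok"}


def worker():
    """Background worker: drain the queue and do the slow part."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel used to stop the worker
            jobs.task_done()
            break
        results.append(job)  # placeholder for the expensive update
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

print(handle_vote(1, "abc"))  # returns before the work is done
jobs.put(None)                # ask the worker to stop after draining the queue
jobs.join()
thread.join()
print(results)
```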

About Docker

Containers are pretty similar to virtual machines. The main difference is that Docker allows containers to share the same Linux kernel as the host system that they’re running on. Containers only need to provide the packages and dependencies that are not already available on the host system. This greatly reduces the size of an application and provides a significant boost to performance, allowing containers to be booted up in mere milliseconds.

.. It allows developers to work in an environment that perfectly mirrors production, reducing the introduction of bugs to live sites.

.. It provides the flexibility to allow containers that run completely different frameworks, programming languages, or operating systems to seamlessly work and communicate with each other on the same host system.