.. The Industrial Revolution changed all that. All of a sudden, the hours you spend on a task matter less than the smarts you bring to it. If you can invent a machine that works as fast as five humans, the machine does the task more efficiently, and it no longer matters who has more hours.
.. So all this said, what is the *true secret* to productivity?
A huge part of it is exposing yourself to alternative ways of thinking.
.. The literal secret to productivity is someone saying, “Oh, I never thought about it that way.”
So 22 seconds versus 12 minutes is a huge difference in performance (roughly 33 times faster), and it should be noted that the two programs produce exactly the same output.
Hi guys, I’ve been working on a project for large-scale, high-profile scraping. I have around 2-3k URLs (eventually around 100k) under the same host.
I took the list of URLs and split it by the number of processes; each chunk of URLs went to a new process with a gevent pool. The results are good, but I want better.
I’m using multiprocessing, requests.Session(), and gevent pool.
code structure: http://pastebin.com/Xu7Xy41i
The parser is lxml, which I found to be the fastest. requests.Session() handles the requests to the same host, and multiprocessing + gevent.pool handles the multi-process async work.
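The split described above (one chunk of URLs per process) could look roughly like the sketch below. This is not the code from the pastebin link; split_urls is a made-up helper name, and it only covers the chunking step, using the standard library.

```python
def split_urls(urls, n_parts):
    """Split urls into n_parts contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration; each returned chunk would then be
    handed to its own worker process.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(len(urls), n_parts)
    chunks = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional URL each.
        size = base + (1 if i < extra else 0)
        chunks.append(urls[start:start + size])
        start += size
    return chunks
```

Keeping the chunk sizes within one URL of each other avoids one process finishing long before the others, at least when the pages take similar time to fetch.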
- I believe the SSL handshake slows things down; maybe there is a good solution for a faster handshake, or a way to avoid multiple handshakes.
- I’m up for any solution to get better performance.
- I thought about maybe keeping a number of sockets open and having a queue of URLs for each socket.
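The last idea (a few sockets kept open, each fed from a queue) can be sketched with the standard library alone. This is a minimal sketch under assumptions, not a tuned scraper: fetch_all and its parameters are invented names, it uses threads and http.client instead of gevent and requests, and it uses one shared queue rather than one queue per socket so that a slow connection does not leave URLs stranded. Each worker opens a single connection and reuses it, so the TLS handshake happens once per worker instead of once per URL, as long as the server keeps the connection alive. Error handling and retries are left out.

```python
import http.client
import queue
import threading


def fetch_all(host, paths, n_connections=4, connect=http.client.HTTPSConnection):
    """Fetch every path from one host over a fixed number of reused connections.

    `connect` is any callable that takes a host and returns an object with the
    http.client connection interface (request, getresponse, close).
    """
    tasks = queue.Queue()
    for path in paths:
        tasks.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        conn = connect(host)  # one connection (and one handshake) per worker
        try:
            while True:
                try:
                    path = tasks.get_nowait()
                except queue.Empty:
                    return  # nothing left to fetch
                conn.request("GET", path)
                resp = conn.getresponse()
                body = resp.read()  # drain the body before reusing the connection
                with lock:
                    results[path] = (resp.status, body)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker) for _ in range(n_connections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call would look like fetch_all("example.com", ["/page/1", "/page/2"], n_connections=8), with the host and paths replaced by real ones. With requests, a single requests.Session() shared by the workers of one process gives similar connection reuse through its built-in pooling.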
Explicit and implicit concurrency in Python.
.. With asyncio, an event loop runs and is in charge of switching between various coroutines. The main difference between asyncio and eventlet is that switching is performed explicitly. I have to use yield from in my coroutine if I wish to indicate that I’m ready to be switched out.
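To make the explicit switch points concrete, here is a minimal sketch. It uses await, which is the current spelling of what the text calls yield from (generator-based coroutines written with yield from were removed from asyncio in Python 3.11). The names worker and run_demo are made up for illustration.

```python
import asyncio


def run_demo():
    """Run two coroutines on one event loop and return the order of their steps."""
    log = []

    async def worker(name, steps):
        for i in range(steps):
            log.append(f"{name}{i}")
            # Explicit switch point: control returns to the event loop only here.
            await asyncio.sleep(0)

    async def main():
        # Both workers are scheduled on the same loop and take turns.
        await asyncio.gather(worker("a", 2), worker("b", 2))

    asyncio.run(main())
    return log
```

Calling run_demo() returns ['a0', 'b0', 'a1', 'b1']: each worker runs until it reaches its await and then hands control over. Without the await inside the loop, worker "a" would finish all of its steps before "b" ever started, because nothing else can force a switch.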