Wednesday, May 20, 2015

Installing Scrapy + Scrapy Tutorial baby steps

Installing Scrapy was somewhat of a challenge for me. *Shakes fist at the gods*

I couldn't get it to work through pip install Scrapy.

It kept saying that Twisted was not installed.

But I did get it to work with sudo easy_install Scrapy instead.

Go figure.

After Scrapy was finally installed, when I tried to start the tutorial project, it gave me the following error message:


UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.

You don't need to visit the link. Just enter this into your command line:
pip install service_identity
and you should be good to go!
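
If you want to check ahead of time which of these modules Python can actually see, something like the sketch below might help. It is only an illustration: `missing_modules` is a name I made up, it assumes Python 3.4 or newer, and it only tests whether a top-level module can be found, not whether its version is recent enough.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the errors described in this post;
    # OpenSSL is the import name of the pyOpenSSL package.
    missing = missing_modules(["twisted", "service_identity", "OpenSSL"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules found.")
```

Anything it lists as not importable is a candidate for pip install.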

Of course, after that, I hit another roadblock right away. I couldn't get my spider to crawl because I was in the wrong directory. Your "project directory" is the outer "tutorial" folder, the one that contains scrapy.cfg (the top-level tutorial/ in the listing below), NOT the folder you created to store your project.
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
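
If you're ever unsure whether you're inside the project directory, a small helper like this can tell you. It's a rough sketch, not part of Scrapy: `find_project_root` is a hypothetical name, and all it does is walk upward from a starting folder until it finds one that contains scrapy.cfg.

```python
from pathlib import Path


def find_project_root(start="."):
    """Return the nearest directory at or above `start` that holds scrapy.cfg, else None."""
    current = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [current, *current.parents]:
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None


if __name__ == "__main__":
    root = find_project_root()
    if root is None:
        print("No scrapy.cfg found here or in any parent directory.")
    else:
        print("Project directory:", root)
```

If it prints nothing useful, you're probably sitting one level too high, in the folder that holds the project rather than in the project itself.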
More notes from a newbie, just in case this helps someone out there:
-To uninstall something (replace something with the name of whatever it is you want to uninstall), simply run pip uninstall something

-If that doesn't work, try inserting sudo in front of pip.

-To see what packages you have installed on your computer already, enter pip freeze and your Terminal will spit out a list of packages along with versions. Very helpful.

-To exit a Scrapy shell, press Ctrl+D
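
If you'd like a pip freeze style list from inside Python, here is a minimal sketch. It assumes Python 3.8 or newer (for importlib.metadata), `installed_packages` is just a name I picked, and the output will not necessarily match pip freeze line for line.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted 'name==version' strings for packages this interpreter can see."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no usable name
            lines.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the list is easy to scan.
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in installed_packages():
        print(line)
```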
