
I think the main misconception is the package path vs. the settings module path. To use Django's models from an external script you need to set DJANGO_SETTINGS_MODULE, and that module has to be importable (i.e. if the settings path is myproject.settings, then the statement from myproject import settings should work in a Python shell).

As most Django projects are created in a path outside the default PYTHONPATH, you must add the project's path to the PYTHONPATH environment variable.
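
Both conditions can be checked from Python before running anything else. The following is only a minimal sketch: the helper name settings_importable is hypothetical (it is not part of Django or Scrapy), and it relies solely on the standard library's importlib.util.find_spec. Note that find_spec imports the parent package while looking for the module.

```python
import importlib.util
import os
import sys


def settings_importable(settings_module, project_path=None):
    """Return True if `settings_module` can be located on the import path.

    `project_path` (optional) is prepended to sys.path first, which has the
    same effect as adding it to the PYTHONPATH environment variable.
    """
    if project_path and project_path not in sys.path:
        sys.path.insert(0, project_path)
    try:
        return importlib.util.find_spec(settings_module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "myweb") cannot be found.
        return False


if __name__ == "__main__":
    # Falls back to the example name used in this answer (an assumption).
    name = os.environ.get("DJANGO_SETTINGS_MODULE", "myweb.settings")
    print(name, "importable:", settings_importable(name))
```

If this reports False for your settings module, the directory that contains the project package is most likely missing from the path.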

Here is a step-by-step guide to create a fully working (and minimal) Django models integration into a Scrapy project:

Note: These instructions work as of the date of the last edit. If they don't work for you, please add a comment describing your issue and your scrapy/django versions.

  • The examples assume that both projects live in /home/rolando/projects.

  • Start the django project.

        $ cd ~/projects
        $ django-admin startproject myweb
        $ cd myweb
        $ ./manage.py startapp myapp

  • Create a model in myapp/models.py.

        from django.db import models

        class Person(models.Model):
            name = models.CharField(max_length=32)

  • Add myapp to INSTALLED_APPS.

        # at the end of settings.py
        INSTALLED_APPS += ('myapp',)

  • Set the database settings in myweb/settings.py.

        # at the end of settings.py
        DATABASES['default']['ENGINE'] = 'django.db.backends.sqlite3'
        DATABASES['default']['NAME'] = '/tmp/myweb.db'

  • Create the database.

        $ ./manage.py syncdb --noinput
        Creating tables ...
        Installing custom SQL ...
        Installing indexes ...
        Installed 0 object(s) from 0 fixture(s)

  • Create the scrapy project.

        $ cd ~/projects
        $ scrapy startproject mybot
        $ cd mybot

  • Create an item in mybot/items.py using DjangoItem (also available as the separate scrapy_djangoitem package).

        from scrapy.contrib.djangoitem import DjangoItem
        from scrapy.item import Field

        from myapp.models import Person


        class PersonItem(DjangoItem):
            # fields for this item are automatically created from the django model
            django_model = Person
The final directory structure is this:

/home/rolando/projects
    mybot
        mybot
            __init__.py
            items.py
            pipelines.py
            settings.py
            spiders
                __init__.py
        scrapy.cfg
    myweb
        manage.py
        myapp
            __init__.py
            models.py
            tests.py
            views.py
        myweb
            __init__.py
            settings.py
            urls.py
            wsgi.py

At this point we have basically all the code required to use the django models in a scrapy project. We can test it right away with the scrapy shell command, but be aware of the required environment variables:

$ cd ~/projects/mybot
$ PYTHONPATH=~/projects/myweb DJANGO_SETTINGS_MODULE=myweb.settings scrapy shell

# ... scrapy banner, debug messages, python banner, etc.

In [1]: from mybot.items import PersonItem

In [2]: i = PersonItem(name='rolando')

In [3]: i.save()
Out[3]: <Person: Person object>

In [4]: PersonItem.django_model.objects.get(name='rolando')
Out[4]: <Person: Person object>

So, it is working as intended.

Finally, you might not want to set the environment variables each time you run your bot. There are many ways to address this, although the best one is to have the projects' packages actually installed in a path that is on PYTHONPATH. A simple alternative is to set everything from Python before any model is imported, for example at the top of mybot/settings.py:

# Setting up django's project full path.
import sys
sys.path.insert(0, '/home/rolando/projects/myweb')

# Setting up django's settings module name.
# This module is located at /home/rolando/projects/myweb/myweb/settings.py.
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings'

# Since Django 1.7, setup() call is required to populate the apps registry.
import django; django.setup()

Note: A better approach than the path hacking is to have setuptools-based setup.py files in both projects and run python setup.py develop, which links your project path into Python's path (I'm assuming you use virtualenv).

  • Create the spider.

        $ cd ~/projects/mybot
        $ scrapy genspider -t basic example example.com

    The spider code:

        # file: mybot/spiders/example.py
        from scrapy.spider import BaseSpider
        from mybot.items import PersonItem

        class ExampleSpider(BaseSpider):
            name = "example"
            allowed_domains = ["example.com"]
            start_urls = ['http://www.example.com/']

            def parse(self, response):
                # do stuff
                return PersonItem(name='rolando')

Create a pipeline in mybot/pipelines.py to save the item.

class MybotPipeline(object):
    def process_item(self, item, spider):
        item.save()
        return item

Here you can either use item.save(), if you are using the DjangoItem class, or import the django model directly and create the object manually. Either way, the main issue is defining the environment variables so that the django models can be used.
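
The second option (creating the object manually) could look roughly like the sketch below. It is an illustration under assumptions, not the code from this answer: the class name DjangoModelPipeline is hypothetical, the model attribute has to be set to a real model such as Person from myapp.models, and the item's keys are assumed to match the model's field names. The only Django call used is the manager's create() method.

```python
class DjangoModelPipeline(object):
    """Save each scraped item by creating a Django model instance by hand."""

    # Assumption: set this to a Django model class, e.g.
    #     from myapp.models import Person
    #     model = Person
    model = None

    def process_item(self, item, spider):
        # dict(item) works for Scrapy items and plain dicts alike; the keys
        # are assumed to match the model's field names.
        self.model.objects.create(**dict(item))
        return item
```

With this variant the item class does not need to be a DjangoItem, but the environment variables described above are still required because the model has to be imported.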

Add the pipeline to mybot/settings.py.

ITEM_PIPELINES = {
    'mybot.pipelines.MybotPipeline': 1000,
}

Run the spider.

$ scrapy crawl example

Incredibly detailed description, thank you. Worked like a charm. I only ran into one problem: I had to change os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings' to os.environ['DJANGO_SETTINGS_MODULE'] = 'settings', dropping the myweb prefix; otherwise it would not recognize the module.

Right. The value of DJANGO_SETTINGS_MODULE depends heavily on how you have set up your python path. That can be quite confusing, as django and scrapy by default use the same name for the project directory and the project package. The path added to sys.path should be the parent of the directory containing the settings.py file. Anyway, I'm glad this solved your issue.
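
To make that relationship concrete, here is a small sketch that derives the dotted module name from a sys.path entry and the location of settings.py. The helper settings_module_name is hypothetical and only manipulates path strings; nothing is imported.

```python
from pathlib import PurePosixPath


def settings_module_name(sys_path_entry, settings_file):
    """Dotted module name of `settings_file` as seen from `sys_path_entry`."""
    relative = PurePosixPath(settings_file).relative_to(sys_path_entry)
    return ".".join(relative.with_suffix("").parts)


settings_file = "/home/rolando/projects/myweb/myweb/settings.py"

# Project directory on the path (the layout used in the answer).
print(settings_module_name("/home/rolando/projects/myweb", settings_file))
# -> myweb.settings

# Inner package directory on the path (the layout from the comment above).
print(settings_module_name("/home/rolando/projects/myweb/myweb", settings_file))
# -> settings
```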

This is an amazingly complete answer. The only problem I had was in the scrapy shell step, where I had to add import django; django.setup()

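DjangoItem is also distributed as the separate scrapy_djangoitem package. Install it with pip install scrapy_djangoitem and, in mybot/items.py, import it with from scrapy_djangoitem import DjangoItem.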

@DanSandland Thanks for your comment, I've added a note about that.

python - Access Django models with scrapy: defining path to Django pro...

python django django-models scrapy

If you stick to what structs are intended for (in C#, Visual Basic 6, Pascal/Delphi, or the C++ struct type (or classes) when not used through pointers), you will find that a structure is nothing more than a compound variable. This means you treat it as a packed set of variables under a common name (a record variable whose members you reference).

I know that would confuse a lot of people deeply used to OOP, but that's not enough reason to say such things are inherently evil if used correctly. Some structures are immutable by design (this is the case of Python's namedtuple), but that is another paradigm to consider.

Yes, structs involve a lot of memory, but mutating one in place, as in

point.x = point.x + 1

will not use more memory than replacing it with a new value, as in

point = Point(point.x + 1, point.y)

The memory consumption will be at least the same, or even higher in the immutable case (although that extra cost would be temporary, for the current stack, depending on the language).
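
The two styles can be put side by side in Python. This is a minimal sketch for illustration: MutablePoint is a hypothetical class standing in for a mutable struct, and Point is a namedtuple standing in for the immutable case.

```python
from collections import namedtuple


class MutablePoint(object):
    """A struct-like record whose fields are updated in place."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Immutable record: "changing" a field means building a new value.
Point = namedtuple("Point", ["x", "y"])

mutable = MutablePoint(1, 2)
mutable.x = mutable.x + 1                # same object, field changed in place

frozen = Point(1, 2)
moved = Point(frozen.x + 1, frozen.y)    # a second Point now exists
same = frozen._replace(x=frozen.x + 1)   # equivalent shorthand

print(mutable.x, moved, frozen)
```

In the immutable case the original value and the new one coexist until the old one is released, which is the temporary extra memory mentioned above.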

But, finally, structures are structures, not objects. In OOP, the main property of an object is its identity, which most of the time is nothing more than its memory address. Struct stands for data structure (not a proper object, so it has no identity), and data can be modified. In other languages the word is record instead of struct (as in Pascal), and it serves the same purpose: just a data record variable, intended to be read from files, modified, and dumped back into files (that is the main use and, in many languages, you can even define data alignment in the record, which is not necessarily the case for proper objects).

Want a good example? Structs are used to read files easily. Python has the struct library because, being object-oriented and having no built-in support for structs, it had to implement them another way, which is somewhat ugly. Languages that implement structs have that feature built in. Try reading a bitmap header with an appropriate struct in languages like Pascal or C. It will be easy (if the struct is properly built and aligned; in Pascal you would not use record-based access but functions to read arbitrary binary data). So, for files and direct (local) memory access, structs are better than objects. Today we are used to JSON and XML, and so we forget about binary files (and, as a side effect, about structs). But yes: they exist, and they have a purpose.
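
For comparison, this is roughly what reading a bitmap file header looks like with Python's struct module. It is a minimal sketch: the helper name read_bmp_file_header and the in-memory sample bytes are illustrative, and only the 14-byte file header is parsed, not the image information that follows it.

```python
import struct

# BITMAPFILEHEADER: little-endian, no padding, 14 bytes in total.
#   2s  signature ("BM")
#   I   file size in bytes
#   H   reserved
#   H   reserved
#   I   offset of the pixel data
BMP_FILE_HEADER = struct.Struct("<2sIHHI")


def read_bmp_file_header(data):
    """Unpack the first 14 bytes of a BMP file into a dict."""
    signature, size, _res1, _res2, offset = BMP_FILE_HEADER.unpack_from(data)
    if signature != b"BM":
        raise ValueError("not a BMP file")
    return {"file_size": size, "pixel_offset": offset}


# Build a fake header in memory instead of reading a real file.
sample = BMP_FILE_HEADER.pack(b"BM", 70, 0, 0, 54)
print(read_bmp_file_header(sample))
```

The format string plays the role that the struct declaration plays in C: it fixes field order, sizes, and byte order in one place.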

They are not evil. Just use them for the right purpose.

If you think in terms of hammers, you will treat screws as nails, find that screws are harder to drive into the wall, decide it is the screws' fault, and call them evil.

c# - Why are mutable structs “evil”? - Stack Overflow

c# struct immutability mutable

If you are just learning object-oriented programming, I suggest you start with Java, because if you don't understand the ideas behind object-oriented programming well, you will certainly lag behind. But if you already have good experience with the underlying paradigms (i.e. structured or object-oriented programming), then it doesn't matter whether you go with Java or Python. The basic concepts are the main thing you need to learn.

Which to choose Python or java - Stack Overflow

java python

Python - Basic Main Structure

Extremely simple snippet showing the basic structure of a Python program.
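
A minimal sketch of that structure, assuming a single main function (the function name, its argv parameter, and the printed message are illustrative choices, not taken from the snippet):

```python
import sys


def main(argv=None):
    """Entry point: do the program's work and return an exit status."""
    if argv is None:
        argv = sys.argv[1:]
    print("Hello from main, arguments:", argv)
    return 0


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when imported.
    main()
```

Keeping the work inside main() means the module can be imported, for example by tests, without running the program as a side effect.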

Python