Django Import - Export: IntegrityError when trying to insert duplicate record in field(s) with unique or unique_together constraints
UPDATE
I have submitted a feature request. The idea is to have the importer skip rows that would raise an IntegrityError on duplicates instead of aborting the whole import.
I have the following model:
```python
from django.db import models


class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
    )

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
```
Using django-import-export in the admin interface, without providing an id for each entry in the csv file, the import fails with:
```
duplicate key value violates unique constraint "data_compositions_composer_key_id_12f91ce7dbac16bf_uniq"
DETAIL:  Key (composer_key_id, composition)=(2, Star Wars) already exists.
```
The csv file is as follows:
```
id    composer_key    composition
      1               Hot Stuff
      2               Star Wars
```
The idea was to use skip_unchanged and report_skipped, configured on the resource in admin.py:
```python
from django.contrib import admin
from import_export import resources
from import_export.admin import ImportExportModelAdmin

from data.models import Compositions


class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)
```
However, this does not solve the problem: skip_unchanged only compares a row against an existing instance looked up by the import id field, so with no id in the csv every row is treated as a new object and the insert still hits the unique constraint.
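As an aside, that lookup could be made to work without an id column by declaring the natural key itself as the import id. This is only a sketch of the idea, assuming the csv's composer_key column holds Composer primary keys:

```python
from import_export import fields, resources
from import_export.widgets import ForeignKeyWidget

from data.models import Composer, Compositions


class CompositionsResource(resources.ModelResource):
    # Resolve the csv's composer_key column to a Composer row by pk.
    composer_key = fields.Field(
        column_name='composer_key',
        attribute='composer_key',
        widget=ForeignKeyWidget(Composer, 'pk'),
    )

    class Meta:
        model = Compositions
        # Match existing rows on the unique_together pair instead of id,
        # so skip_unchanged can recognise duplicates of stored records.
        import_id_fields = ('composer_key', 'composition')
        skip_unchanged = True
        report_skipped = True
```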
Considering the above, only one change is needed, and you can keep using django-import-export as is.
models.py:
```python
from django.db import models
from django.utils import timezone


class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
```
Override save_instance with a try block, and ignore the error when it fails. admin.py:
```python
class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def save_instance(self, instance, using_transactions=True, dry_run=False):
        try:
            super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
        except IntegrityError:
            pass


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)
```
and add this import:
```python
from django.db import IntegrityError
```
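For what it's worth, the override can be exercised outside the admin by feeding the resource a small dataset directly. A minimal sketch, with made-up row values; note that on PostgreSQL, swallowing an IntegrityError inside an open transaction leaves that transaction unusable, so this is best tried with transactions disabled:

```python
import tablib

# Two identical rows: the second would normally raise IntegrityError.
dataset = tablib.Dataset(headers=['id', 'composer_key', 'composition'])
dataset.append(['', '2', 'Star Wars'])
dataset.append(['', '2', 'Star Wars'])

result = CompositionsResource().import_data(dataset, dry_run=False)
print(result.has_errors())  # False: the duplicate was swallowed, not raised
```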
One note on the accepted answer: it will give the desired result, but with large files it will hammer disk usage and time.
A more efficient approach I have been using (after a lot of time spent going through the docs) is to override skip_row, tracking each row's unique-constraint values in a set as the import runs:
```python
class CompositionsResource(resources.ModelResource):
    set_unique = set()

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Clear out anything that may be there from a dry_run,
        # such as the admin mixin preview
        self.set_unique = set()

    def skip_row(self, instance, original):
        composer_key = instance.composer_key  # Could also use composer_key_id
        composition = instance.composition
        tuple_unique = (composer_key, composition)

        if tuple_unique in self.set_unique:
            return True
        self.set_unique.add(tuple_unique)

        return super(CompositionsResource, self).skip_row(instance, original)

    # save_instance override should still go here to pass on IntegrityError
```
This approach will at least cut down on the duplicates encountered within the same dataset. I used it to deal with multiple flat files of roughly 60,000 lines each, full of repetitive/nested foreign keys, and it made the initial data import much faster.
The model:
```python
from django.db import models
from django.utils import timezone


class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
```
This is a script I wrote "on the fly" for the model above to discard duplicate entries automatically. I saved it as csv.py inside the project package and run it from the Django shell:
```
$ ./manage.py shell
>>> from project_name import csv
```
csv.py:
```python
from data.models import Composer, Compositions
import csv
import sys, traceback
from django.utils import timezone

filename = '/path/to/duc.csv'

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        print(each_line)
        try:
            instance = Compositions(
                id=None,
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3],
            )
            instance.save()
            print("Saved composition: {0}".format(each_line[3]))
        except:  # exception type must be inserted here
            # debugging mostly
            exc_type, exc_value, exc_traceback = sys.exc_info()
            print(exc_value)
```
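A hedged aside: the same deduplication can be had without a bare except by letting get_or_create treat the unique_together pair as the lookup. A sketch of just the loop body, keeping the script's assumed column layout:

```python
# Replaces the try/except body above: create the row only if the
# (composer_key, composition) pair is not already stored.
# Composer.objects.get can still raise DoesNotExist on bad input.
obj, created = Compositions.objects.get_or_create(
    composer_key=Composer.objects.get(id=each_line[2]),
    composition=each_line[3],
    defaults={'date_created': timezone.now()},
)
if created:
    print("Saved composition: {0}".format(each_line[3]))
```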