From 79ef29ede226c31896c9ef19a6c6be868c2ae533 Mon Sep 17 00:00:00 2001
From: Roberta Takenaka <505143+robertatakenaka@users.noreply.github.com>
Date: Sun, 31 Mar 2024 15:23:59 -0300
Subject: [PATCH] Change package intake through upload and add new XML validations (#426)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Fix Institution.__str__, add autocomplete attributes, and change InstitutionHistory.panels from FieldPanel to Autocomplete (#401)
* Fix the journal app: add Journal.title, wagtail_hooks.JournalCreateView, etc (#402)
* Add Journal.title
* Change the attributes of journal.models.Owner and Publisher
* Create journal.wagtail.JournalCreateView to set the user as creator
* Add journal-related database migrations
* Add journal_acron and publication_year filters for migrating article data (#403)
* Add journal_acron and publication_year filters for migrating article data, creating a migration sample
* Add the journal_acron and publication_year parameters
* Ensure the migrated XML (whether native or generated from HTML) has the PID v2 and the order (article-id other) (#405)
* Fix or add the pid-v2 element in the XML, using the article pid from the classic website as its value
* Update packtools to version 3.4.0 to have XMLWithPre.order. Fix or add the article-id (other/order) element in the XML, using the last 5 digits of the article pid from the classic website as its value
* Update the scielo_classic_website library to version 1.6.4 to fix the retrieval of article records in serial xml
* Avoid keeping previous versions of the files
* Create the procedure to fix the Pid v2 value (#410)
* Create PidProviderXML.fix_pid_v2
* Create FixPidV2 to track what was fixed in upload and in core
* Create FixPidV2ModelAdmin
* Add PidProviderAPIClient.fix_pid_v2, fix_pid_v2_url. Refactor PidProviderAPIClient.enabled
* Create APIPidProviderFixPidV2Error
* Create provider.requester.PidRequester.fix_pid_v2
* Create SPSPkg.fix_pid_v2
* Create ArticleProc.fix_pid_v2 and add the call to the generate_sps_package procedure
* Create tasks to fix the pid v2 value in PidProviderXML from ArticleProc.pid
* Create provider.provider.PidProvider with the methods fix_pid_v2, get_sps_pkg_name, get_xmltree
* Add the migration for the FixPidV2 model
* Fix missing pid v3 in the XML submitted from upload to core (#411)
* Update packtools to version 4.1.1 to use XMLWithPre.data and .files
* Change PidProviderXML.is_registered to update the pids of xml_with_pre with the registered values; it also needed to return whether the XML is registered and equal, registered and different, or not registered
* Distinguish the registration request status from the registration status
* Change PidProviderAPIClient._process_post_xml_response to update or not the pid values of xml_with_pre with the values provided by Core
* Add registered_in_core as a filter of PidProviderXMLModelAdmin
* Update dependencies base.txt and production.txt (#409)
* Comment out the captcha app
* Update dependencies

---------

Co-authored-by: Roberta Takenaka <505143+robertatakenaka@users.noreply.github.com>

* Change Pid provider behavior so that it now accepts pid changes (#415)
* Create PidProviderXML.complete_pids, which fills in pids with registered or new ones
* Create PidProviderXML._check_pids, which validates whether the XML pid is new and/or registered and/or belongs to another document
* Create PidProviderXML.get_pids, which returns all current pids and other pids
* Fix PidProviderXML._is_registered_pid, adding the check in OtherPid
* Fix PidProviderXML._get_unique_v3, which uses _is_registered_pid and no longer needs to check OtherPid
* Adjust PidProviderXML._add_other_pid
* Remove the redundant PidProviderXML._complete_pids
* Fix PidProvider._add_pid_v3 and _add_pid_v2
* Fix PidProviderXML.is_registered
* Adjust PidProviderXML._save, removing _add_other_pid and removing change_pids
* Change PidProviderXML.register
* Improve XMLVersion.__str__, showing file name + date instead of pid v3
* Improve _process_post_xml_response
* For PidProvider.provide_pid_for_xml_with_pre, add the caller parameter, complete the XML with registered pids if they are missing from the XML, add xml_changed to the return value
* Add a command to complete the XML with registered pids before requesting a pid from Core
* Create a way to configure / enable / disable Core's fix_pid_v2 (#416)
* Create the PidProviderEndpoint class, an inline of PidProviderConfig
* Change how fix_pid_v2_url is obtained
* Add the PidProviderEndpoint model
* Add 'fixed_in_core': False to the return value of fix_pid_v2 (#417)
* Prevent SPSPkg from storing excess files (#418)
* Check whether the registered XML and the received XML are equal only after completing the XML with the registered pids (#419)
* Compare whether xml_with_pre equals the registered one only after adding the registered pids, if applicable
* Add the ability to force registration in Core even when the record indicates it is already synchronized
* Improve the order of menu items (#408)
* Refactor the menu ordering functionality
* Reorder the default Wagtail menu items and remove some of them
* Insert the get_menu_order function in menu_order
* Change the order of the apps
* Move previous operations of ArticleProc, IssueProc, JournalProc to a file (#420)
* Create the ArticleProcReport model and ArticleProcReportModelAdmin
* Create the ProcReport model to store previous processing runs, keeping only the current one in the respective ArticleProc, IssueProc, JournalProc
* Add the database migrations
* Improve the logging of operations for migration and publication tasks (#422)
* Improve labels, make all fields non-editable, show events from most recent to oldest
* Add Article.data, Issue.data, Journal.data
* Add return values to the functions that create Article, Issue, and Journal instances
* Add Article.data, Issue.data, Journal.data to the details of data-input operations
* Apply black
* Add
* Add more detail to the log of the task that generates XML from HTML
* Add more detail to the log of the task that generates the SPS package
* Fix the 'completed' value in the results of pid v3 request operations
* Add the compression parameter to ZipFile
* Set sps_pkg_status to PENDING if the package does not have all texts
* Set sps_pkg_status to DONE if the package does not have all texts
* Set sps_pkg_status to PENDING if the package does not have all texts
* Fix missing import of ZIP_DEFLATED
* Add the order attribute for listing items in the admin area
* Add the database migrations
* Add details about the processing of adding files to minio
* Refactor upload part 3 - group the validations into one task: assets, renditions, XML content (#398)
* Create the task upload.tasks.task_validate_original_zip_file
* Create upload.tasks.task_validate_xml_content
* Create upload.xml_validation
* Note a TODO to add parameters for the validations
* Update packtools to version 3.3.4, which covers more validations
* Remove package.tasks
* Add missing imports
* Refactor upload part 3 - group the validations into one task: assets, renditions, XML content (#399)
* Create the task upload.tasks.task_validate_original_zip_file
* Create upload.tasks.task_validate_xml_content
* Create upload.xml_validation
* Note a TODO to add parameters for the validations
* Update packtools to version 3.3.4, which covers more validations
* Remove package.tasks
* Add missing imports
* Refactor upload part 2 - Add functions to upload.controller to evaluate the newly received package (#400)
* Create upload.choices.VE_UNEXPECTED_ERROR and VE_FORBIDDEN_UPDATE_ERROR
* Create/Edit Package.get, create_or_update, _add_validation_result
* Create functions to evaluate the newly received XML (is it expected? are the journal and issue data correct?)
* Create tests for upload.controller.*
* Add the database migration required by the new choices values
* Fix missing variable definitions
* Refactor upload part 3 - group the validations into one task: assets, renditions, XML content (#399)
* Create the task upload.tasks.task_validate_original_zip_file
* Create upload.tasks.task_validate_xml_content
* Create upload.xml_validation
* Note a TODO to add parameters for the validations
* Update packtools to version 3.3.4, which covers more validations
* Remove package.tasks
* Add missing imports
* Apply black
* Create a function to associate error types with reports and adjust the error types
* Associate, by inference, the impact type of each error type
* Refactor Package.check_opinions and check_resolutions; remove article and issue from the form
* Fix defects in the initial validations on package receipt and adjust the XML content validation
* Remove the article and issue check from the form
* Change the task that runs the validations

---------

Co-authored-by: Samuel Veiga Rangel <82840278+samuelveigarangel@users.noreply.github.com>
---
 upload/choices.py | 61 ++++++----------
 upload/controller.py | 154 +++++++++++++++++++++++++++++++++++----
 upload/forms.py | 8 +-
 upload/models.py | 60 ++++++++-------
 upload/tasks.py | 79 ++++++++++----------
 upload/tests.py | 69 +++++++++++++-----
 upload/wagtail_hooks.py | 97 ++++++++++--------------
 upload/xml_validation.py | 55 ++++++++------
 8 files changed, 358 insertions(+), 225 deletions(-)

diff --git a/upload/choices.py b/upload/choices.py
index 0a6ce5b4..edc669e3 100644
--- a/upload/choices.py
+++ b/upload/choices.py
@@ -47,15 +47,17 @@
VE_PACKAGE_FILE_ERROR = "package-file-error" VE_UNEXPECTED_ERROR = "unexpected-error" VE_FORBIDDEN_UPDATE_ERROR = "forbidden-update-error" -VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR = "article-journal-incompatibility-error" +VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR = "journal-incompatibility-error" VE_ARTICLE_IS_NOT_NEW_ERROR = "article-is-not-new-error" VE_XML_FORMAT_ERROR = "xml-format-error" +VE_XML_CONTENT_ERROR = "xml-content-error" VE_BIBLIOMETRICS_DATA_ERROR = "bibliometrics-data-error" VE_SERVICES_DATA_ERROR = "services-data-error" VE_DATA_CONSISTENCY_ERROR = "data-consistency-error" VE_CRITERIA_ISSUES_ERROR = "criteria-issues-error" VE_ASSET_ERROR = "asset-error" VE_RENDITION_ERROR = "rendition-error" +VE_GROUP_DATA_ERROR = "group-error" VALIDATION_ERROR_CATEGORY = ( (VE_UNEXPECTED_ERROR, "UNEXPECTED_ERROR"), @@ -63,6 +65,8 @@ (VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR, "ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR"), (VE_ARTICLE_IS_NOT_NEW_ERROR, "ARTICLE_IS_NOT_NEW_ERROR"), (VE_XML_FORMAT_ERROR, "XML_FORMAT_ERROR"), + (VE_XML_CONTENT_ERROR, "VE_XML_CONTENT_ERROR"), + (VE_GROUP_DATA_ERROR, "VE_GROUP_DATA_ERROR"), (VE_BIBLIOMETRICS_DATA_ERROR, "BIBLIOMETRICS_DATA_ERROR"), (VE_SERVICES_DATA_ERROR, "SERVICES_DATA_ERROR"), (VE_DATA_CONSISTENCY_ERROR, "DATA_CONSISTENCY_ERROR"), @@ -75,56 +79,39 @@ VR_XML_OR_DTD = "xml_or_dtd" VR_ASSET_AND_RENDITION = "asset_and_rendition" VR_INDIVIDUAL_CONTENT = "individual_content" -VR_GROUPED_CONTENT = "grouped_content" +VR_GROUP_CONTENT = "group_content" VR_STYLESHEET = "stylesheet" VR_PACKAGE_FILE = "package_file" -VALIDATION_REPORT_ITEMS = { - VR_XML_OR_DTD: set( - [ - VE_XML_FORMAT_ERROR, - ] - ), - VR_ASSET_AND_RENDITION: set( - [ - VE_ASSET_ERROR, - VE_RENDITION_ERROR, - ] - ), - VR_INDIVIDUAL_CONTENT: set( - [ - VE_ARTICLE_IS_NOT_NEW_ERROR, - VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR, - VE_BIBLIOMETRICS_DATA_ERROR, - VE_DATA_CONSISTENCY_ERROR, - ] - ), - VR_GROUPED_CONTENT: set( - [ - VE_CRITERIA_ISSUES_ERROR, - 
VE_SERVICES_DATA_ERROR, - ] - ), - VR_PACKAGE_FILE: set( - [ - VE_PACKAGE_FILE_ERROR, - ] - ), -} - VALIDATION_DICT_ERROR_CATEGORY_TO_REPORT = { VE_XML_FORMAT_ERROR: VR_XML_OR_DTD, VE_ASSET_ERROR: VR_ASSET_AND_RENDITION, VE_RENDITION_ERROR: VR_ASSET_AND_RENDITION, VE_ARTICLE_IS_NOT_NEW_ERROR: VR_INDIVIDUAL_CONTENT, VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR: VR_INDIVIDUAL_CONTENT, + VE_XML_CONTENT_ERROR: VR_INDIVIDUAL_CONTENT, VE_BIBLIOMETRICS_DATA_ERROR: VR_INDIVIDUAL_CONTENT, VE_DATA_CONSISTENCY_ERROR: VR_INDIVIDUAL_CONTENT, - VE_CRITERIA_ISSUES_ERROR: VR_GROUPED_CONTENT, - VE_SERVICES_DATA_ERROR: VR_GROUPED_CONTENT, + VE_CRITERIA_ISSUES_ERROR: VR_INDIVIDUAL_CONTENT, + VE_SERVICES_DATA_ERROR: VR_INDIVIDUAL_CONTENT, + VE_GROUP_DATA_ERROR: VR_GROUP_CONTENT, VE_PACKAGE_FILE_ERROR: VR_PACKAGE_FILE, + VE_UNEXPECTED_ERROR: VR_PACKAGE_FILE, + VE_FORBIDDEN_UPDATE_ERROR: VR_PACKAGE_FILE, + } + +def _get_categories(): + d = {} + for k, v in VALIDATION_DICT_ERROR_CATEGORY_TO_REPORT.items(): + d.setdefault(v, []) + d[v].append(k) + return d + + +VALIDATION_REPORT_ITEMS = _get_categories() + # Model ValidationResult, Field status VS_CREATED = "created" VS_DISAPPROVED = "disapproved" diff --git a/upload/controller.py b/upload/controller.py index a6455da4..013e9d36 100644 --- a/upload/controller.py +++ b/upload/controller.py @@ -2,6 +2,7 @@ import sys from datetime import datetime +from django.utils.translation import gettext as _ from packtools.sps.models.journal_meta import Title, ISSN from packtools.sps.pid_provider.xml_sps_lib import XMLWithPre, GetXMLItemsError from packtools.sps.models.front_articlemeta_issue import ArticleMetaIssue @@ -21,11 +22,19 @@ choices, ) from .utils import file_utils, package_utils, xml_utils + +from upload import xml_validation from pid_provider.requester import PidRequester from article.models import Article from issue.models import Issue from journal.models import OfficialJournal, Journal from tracker.models import UnexpectedEvent +from 
upload.xml_validation import ( + validate_xml_content, + add_app_data, + add_sps_data, + add_journal_data, +) pp = PidRequester() @@ -122,7 +131,7 @@ def receive_package(package): data={}, ) # falhou, retorna response - return package + return response # sucesso, retorna package package._add_validation_result( error_category=choices.VE_XML_FORMAT_ERROR, @@ -132,11 +141,10 @@ def receive_package(package): "xml_path": package.file.path, }, ) - return package + return response except GetXMLItemsError as exc: # identifica os erros do arquivo Zip / XML - _identify_file_error(package) - return package + return _identify_file_error(package) def _identify_file_error(package): @@ -145,13 +153,18 @@ def _identify_file_error(package): xml_path = None xml_str = file_utils.get_xml_content_from_zip(package.file.path, xml_path) xml_utils.get_etree_from_xml_content(xml_str) - except (file_utils.BadPackageFileError, file_utils.PackageWithoutXMLFileError) as exc: + return {} + except ( + file_utils.BadPackageFileError, + file_utils.PackageWithoutXMLFileError, + ) as exc: package._add_validation_result( error_category=choices.VE_PACKAGE_FILE_ERROR, message=exc.message, status=choices.VS_DISAPPROVED, data={"exception": str(exc), "exception_type": str(type(exc))}, ) + return {"error": str(exc), "error_type": choices.VE_PACKAGE_FILE_ERROR} except xml_utils.XMLFormatError as e: data = { @@ -166,6 +179,7 @@ def _identify_file_error(package): data=data, status=choices.VS_DISAPPROVED, ) + return {"error": str(e), "error_type": choices.VE_XML_FORMAT_ERROR} def _check_article_and_journal(xml_with_pre): @@ -198,7 +212,9 @@ def _check_article_and_journal(xml_with_pre): if article: # verifica a consistência dos dados de journal e issue # no XML e na base de dados - _compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response) + _compare_journal_and_issue_from_xml_to_journal_and_issue_from_article( + article, response + ) if response.get("error"): # inconsistências 
encontradas return _handle_error(response, article, article_previous_status) @@ -241,7 +257,9 @@ def _get_article_previous_status(article, response): response["package_category"] = choices.PC_ERRATUM return article_previos_status else: - response["error"] = f"Unexpected package. Article has no need to be updated / corrected. Article status: {article_previos_status}" + response[ + "error" + ] = f"Unexpected package. Article has no need to be updated / corrected. Article status: {article_previos_status}" response["error_type"] = choices.VE_FORBIDDEN_UPDATE_ERROR response["package_category"] = choices.PC_UPDATE @@ -284,12 +302,12 @@ def _get_journal(journal_title, issn_electronic, issn_print): if not j and journal_title: try: - j = OfficialJournal.objects.get(journal_title=journal_title) + j = OfficialJournal.objects.get(title=journal_title) except OfficialJournal.DoesNotExist: pass if j: - return Journal.objects.get(official=j) + return Journal.objects.get(official_journal=j) raise Journal.DoesNotExist(f"{journal_title} {issn_electronic} {issn_print}") @@ -301,11 +319,11 @@ def _check_journal(origin, xmltree): xml = ISSN(xmltree) issn_electronic = xml.epub issn_print = xml.ppub - return dict(journal=_get_journal(journal_title, issn_electronic, issn_print)) - except Journal.DoesNotExist: + except Journal.DoesNotExist as exc: + logging.exception(exc) return dict( - error=f"Journal in XML is not registered in Upload: {journal_title} {issn_electronic} (electronic) {issn_print} (print)", + error=f"Journal in XML is not registered in Upload: {journal_title} (electronic: {issn_electronic}, print: {issn_print})", error_type=choices.VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR, ) except Exception as e: @@ -347,7 +365,9 @@ def _check_issue(origin, xmltree, journal): return {"error": str(e), "error_type": choices.VE_UNEXPECTED_ERROR} -def _compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response): +def 
_compare_journal_and_issue_from_xml_to_journal_and_issue_from_article( + article, response +): issue = response["issue"] journal = response["journal"] if article.issue is issue and article.journal is journal: @@ -366,3 +386,111 @@ def _compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(articl error_type=choices.VE_DATA_CONSISTENCY_ERROR, ) ) + + +def validate_xml_content(package, journal, issue): + # VE_BIBLIOMETRICS_DATA_ERROR = "bibliometrics-data-error" + # VE_SERVICES_DATA_ERROR = "services-data-error" + # VE_DATA_CONSISTENCY_ERROR = "data-consistency-error" + # VE_CRITERIA_ISSUES_ERROR = "criteria-issues-error" + + # TODO completar data + data = {} + # add_app_data(data, app_data) + # add_journal_data(data, journal, issue) + # add_sps_data(data, sps_data) + + try: + for xml_with_pre in XMLWithPre.create(path=package.file.path): + _validate_xml_content(package, xml_with_pre, data) + except Exception as e: + exc_type, exc_value, exc_traceback = sys.exc_info() + UnexpectedEvent.create( + exception=e, + exc_traceback=exc_traceback, + detail={ + "operation": "upload.controller.validate_xml_content", + "detail": dict(file_path=package.file.path), + }, + ) + + +def _validate_xml_content(package, xml_with_pre, data): + # TODO completar data + data = {} + # xml_validation.add_app_data(data, app_data) + # xml_validation.add_journal_data(data, journal, issue) + # xml_validation.add_sps_data(data, sps_data) + + try: + results = xml_validation.validate_xml_content( + xml_with_pre.sps_pkg_name, xml_with_pre.xmltree, data + ) + for result in results: + _handle_xml_content_validation_result(package, xml_with_pre.sps_pkg_name, result) + try: + error = ValidationResult.objects.filter( + package=package, + status=choices.VS_DISAPPROVED, + category__in=choices.VALIDATION_REPORT_ITEMS[choices.VR_INDIVIDUAL_CONTENT], + )[0] + package.status = choices.PS_VALIDATED_WITH_ERRORS + except IndexError: + # nenhum erro + package.status = 
choices.PS_VALIDATED_WITHOUT_ERRORS + package.save() + except Exception as e: + exc_type, exc_value, exc_traceback = sys.exc_info() + UnexpectedEvent.create( + exception=e, + exc_traceback=exc_traceback, + detail={ + "operation": "upload.controller._validate_xml_content", + "detail": { + "file": package.file.path, + "item": xml_with_pre.sps_pkg_name, + "exception": str(e), + "exception_type": str(type(e)), + }, + }, + ) + + +def _handle_xml_content_validation_result(package, sps_pkg_name, result): + # ['xpath', 'advice', 'title', 'expected_value', 'got_value', 'message', 'validation_type', 'response'] + + try: + if result["response"] == "OK": + status = choices.VS_APPROVED + else: + status = choices.VS_DISAPPROVED + + # VE_BIBLIOMETRICS_DATA_ERROR, VE_SERVICES_DATA_ERROR, + # VE_DATA_CONSISTENCY_ERROR, VE_CRITERIA_ISSUES_ERROR, + error_category = result.get("error_category") or choices.VE_XML_CONTENT_ERROR + + message = result["message"] + advice = result["advice"] or "" + message = ". ".join([_(message), _(advice)]) + package._add_validation_result( + error_category=error_category, + status=status, + message=message, + data=result, + ) + except Exception as e: + exc_type, exc_value, exc_traceback = sys.exc_info() + UnexpectedEvent.create( + exception=e, + exc_traceback=exc_traceback, + detail={ + "operation": "upload.controller._handle_xml_content_validation_result", + "detail": { + "file": package.file.path, + "item": sps_pkg_name, + "result": result, + "exception": str(e), + "exception_type": str(type(e)), + }, + }, + ) diff --git a/upload/forms.py b/upload/forms.py index 5cdd8f64..cf208e5c 100644 --- a/upload/forms.py +++ b/upload/forms.py @@ -3,18 +3,12 @@ class UploadPackageForm(WagtailAdminModelForm): - def save_all(self, user, article, issue): + def save_all(self, user): upload_package = super().save(commit=False) if self.instance.pk is None: upload_package.creator = user - if article is not None: - upload_package.article = article - - if issue is not None: 
- upload_package.issue = issue - self.save() return upload_package diff --git a/upload/models.py b/upload/models.py index 5bdbc5c3..f1365400 100644 --- a/upload/models.py +++ b/upload/models.py @@ -41,7 +41,10 @@ class Package(CommonControlField): default=choices.PS_ENQUEUED_FOR_VALIDATION, ) article = models.ForeignKey( - Article, blank=True, null=True, on_delete=models.SET_NULL, + Article, + blank=True, + null=True, + on_delete=models.SET_NULL, ) issue = models.ForeignKey(Issue, blank=True, null=True, on_delete=models.SET_NULL) assignee = models.ForeignKey(User, blank=True, null=True, on_delete=models.SET_NULL) @@ -54,9 +57,6 @@ def autocomplete_label(self): panels = [ FieldPanel("file"), - FieldPanel("category"), - AutocompletePanel("article"), - AutocompletePanel("issue"), ] def __str__(self): @@ -93,16 +93,13 @@ def add_validation_result( cls, package_id, error_category=None, status=None, message=None, data=None ): package = cls.objects.get(pk=package_id) - val_res = package._add_validation_result( - error_category, status, message, data) + val_res = package._add_validation_result(error_category, status, message, data) return val_res def _add_validation_result( self, error_category=None, status=None, message=None, data=None ): - val_res = ValidationResult.create( - error_category, self, status, message, data - ) + val_res = ValidationResult.create(error_category, self, status, message, data) self.update_status(val_res) return val_res @@ -145,26 +142,27 @@ def create_or_update(cls, user_id, file, article=None, category=None, status=Non user_id, file, article_id=article.id, category=category, status=status ) - def check_errors(self): - for vr in self.validationresult_set.filter(status=choices.VS_DISAPPROVED): - if vr.resolution.action in (choices.ER_ACTION_TO_FIX, ""): - self.status = choices.PS_PENDING_CORRECTION - self.save() - return self.status - - self.status = choices.PS_READY_TO_BE_FINISHED + def check_resolutions(self): + try: + item = 
self.validationresult_set.filter( + status=choices.VS_DISAPPROVED, + resolution__action__in=[choices.ER_ACTION_TO_FIX, ""], + )[0] + self.status = choices.PS_PENDING_CORRECTION + except IndexError: + self.status = choices.PS_READY_TO_BE_FINISHED self.save() return self.status def check_opinions(self): - for vr in self.validationresult_set.filter(status=choices.VS_DISAPPROVED): - opinion = vr.analysis.opinion - if opinion in (choices.ER_OPINION_FIX_DEMANDED, ""): - self.status = choices.PS_PENDING_CORRECTION - self.save() - return self.status - - self.status = choices.PS_ACCEPTED + try: + item = self.validationresult_set.filter( + status=choices.VS_DISAPPROVED, + analysis__opinion__in=[choices.ER_OPINION_FIX_DEMANDED, ""], + )[0] + self.status = choices.PS_PENDING_CORRECTION + except IndexError: + self.status = choices.PS_ACCEPTED self.save() return self.status @@ -186,7 +184,7 @@ class ValidationResult(models.Model): id = models.AutoField(primary_key=True) category = models.CharField( _("Error category"), - max_length=64, + max_length=32, choices=choices.VALIDATION_ERROR_CATEGORY, null=False, blank=False, @@ -247,9 +245,7 @@ class Meta: base_form_class = ValidationResultForm @classmethod - def create( - cls, error_category, package, status=None, message=None, data=None - ): + def create(cls, error_category, package, status=None, message=None, data=None): val_res = ValidationResult() val_res.category = error_category val_res.package = package @@ -270,8 +266,7 @@ def update(self, error_category, status=None, message=None, data=None): @classmethod def add_resolution(cls, user, data): - validation_result = cls.objects.get( - pk=data["validation_result_id"].value()) + validation_result = cls.objects.get(pk=data["validation_result_id"].value()) try: opinion = data["opinion"].value() @@ -302,6 +297,7 @@ class ErrorResolution(CommonControlField): _("Action"), max_length=32, choices=choices.ERROR_RESOLUTION_ACTION, + default=choices.ER_ACTION_TO_FIX, null=True, blank=True, 
) @@ -333,6 +329,8 @@ def create_or_update(cls, user, validation_result, action, rationale): obj = cls.get(validation_result) obj.updated = datetime.now() obj.updated_by = user + obj.action = action + obj.rationale = rationale obj.save() except cls.DoesNotExist: obj = cls.create(user, validation_result, action, rationale) diff --git a/upload/tasks.py b/upload/tasks.py index 5ed8f808..01e7839a 100644 --- a/upload/tasks.py +++ b/upload/tasks.py @@ -1,4 +1,6 @@ import json +import sys +import logging from celery.result import AsyncResult from django.contrib.auth import get_user_model @@ -16,18 +18,20 @@ from article.models import Article from config import celery_app from issue.models import Issue +from journal.models import Journal from journal.controller import get_journal_dict_for_validation from libs.dsm.publication.documents import get_document, get_similar_documents +from tracker.models import UnexpectedEvent from . import choices, controller, exceptions from .utils import file_utils, package_utils, xml_utils from upload.models import Package -from upload.xml_validation import validate_xml_content, add_app_data, add_sps_data, add_journal_data User = get_user_model() +# TODO REMOVE def run_validations( filename, package_id, package_category, article_id=None, issue_id=None ): @@ -442,6 +446,7 @@ def task_validate_renditions(file_path, xml_path, package_id): return True +# TODO REMOVE @celery_app.task(name="Validate XML") def task_validate_content_xml(file_path, xml_path, package_id): xml_str = file_utils.get_xml_content_from_zip(file_path) @@ -544,17 +549,20 @@ def task_request_pid_for_accepted_packages(self, user_id): @celery_app.task(bind=True) -def task_validate_original_zip_file(self, package_id, file_path, journal_id, issue_id, article_id): +def task_validate_original_zip_file( + self, package_id, file_path, journal_id, issue_id, article_id +): - for xml_with_pre in XMLWithPre.create(file_path=file_path): + for xml_with_pre in 
XMLWithPre.create(path=file_path): xml_path = xml_with_pre.filename - break - if xml_path: + # FIXME nao usar o otimizado neste momento + optimised_filepath = task_optimise_package(file_path) + # Aciona validação de Assets task_validate_assets.apply_async( kwargs={ - "file_path": file_path, + "file_path": optimised_filepath, "xml_path": xml_path, "package_id": package_id, }, @@ -563,7 +571,7 @@ def task_validate_original_zip_file(self, package_id, file_path, journal_id, iss # Aciona validação de Renditions task_validate_renditions.apply_async( kwargs={ - "file_path": file_path, + "file_path": optimised_filepath, "xml_path": xml_path, "package_id": package_id, }, @@ -583,33 +591,30 @@ def task_validate_original_zip_file(self, package_id, file_path, journal_id, iss @celery_app.task(bind=True) -def task_validate_xml_content(self, file_path, xml_path, package_id, journal_id, issue_id, article_id): - # VE_BIBLIOMETRICS_DATA_ERROR = "bibliometrics-data-error" - # VE_SERVICES_DATA_ERROR = "services-data-error" - # VE_DATA_CONSISTENCY_ERROR = "data-consistency-error" - # VE_CRITERIA_ISSUES_ERROR = "criteria-issues-error" - - # TODO completar data - data = {} - # add_app_data(data, app_data) - # add_journal_data(data, journal, issue) - # add_sps_data(data, sps_data) - - package = Package.objects.get(pk=package_id) - for xml_with_pre in XMLWithPre.create(file_path=file_path): - results = validate_xml_content(xml_with_pre.sps_pkg_name, xml_with_pre.xmltree, data) - - for result in results: - # ['xpath', 'advice', 'title', 'expected_value', 'got_value', 'message', 'validation_type', 'response'] - if not result["response"] == "ERROR": - continue - - message = result["message"] - advice = result["advice"] or '' - message = ". 
".join(_(message), _(advice)) - package._add_validation_result( - error_category=choices.VE_DATA_CONSISTENCY_ERROR, - status=choices.VS_DISAPPROVED, - message=message, - data=result, - ) +def task_validate_xml_content( + self, file_path, xml_path, package_id, journal_id, issue_id, article_id +): + try: + package = Package.objects.get(pk=package_id) + if journal_id: + journal = Journal.objects.get(pk=journal_id) + else: + journal = None + + if issue_id: + issue = Issue.objects.get(pk=issue_id) + else: + issue = None + + controller.validate_xml_content(package, journal, issue) + + except Exception as e: + exc_type, exc_value, exc_traceback = sys.exc_info() + UnexpectedEvent.create( + exception=e, + exc_traceback=exc_traceback, + detail={ + "operation": "upload.tasks.task_validate_xml_content", + "detail": dict(file_path=file_path, xml_path=xml_path), + }, + ) diff --git a/upload/tests.py b/upload/tests.py index bfd1b3a2..eb798cb2 100644 --- a/upload/tests.py +++ b/upload/tests.py @@ -12,7 +12,9 @@ # Create your tests here. 
 class ControllerTest(TestCase):
-    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_and_issue_differ(self):
+    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_and_issue_differ(
+        self,
+    ):
         response = {"journal": "not journal", "issue": "not issue"}
         article = Mock(spec=Article)
         article.issue = "issue"
@@ -23,11 +25,15 @@ def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_j
             "error": f"{article.journal} {article.issue} (registered) differs from {journal} {issue} (XML)",
             "error_type": choices.VE_DATA_CONSISTENCY_ERROR,
         }
-        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response)
+        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(
+            article, response
+        )
         self.assertEqual(expected["error"], response["error"])
         self.assertEqual(expected["error_type"], response["error_type"])
 
-    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_issue_differs(self):
+    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_issue_differs(
+        self,
+    ):
         response = {"journal": "Journal", "issue": "Not same issue"}
         article = Mock(spec=Article)
         article.issue = "Issue"
@@ -38,11 +44,15 @@ def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_i
             "error": f"{article.journal} {article.issue} (registered) differs from {journal} {issue} (XML)",
             "error_type": choices.VE_DATA_CONSISTENCY_ERROR,
         }
-        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response)
+        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(
+            article, response
+        )
         self.assertEqual(expected["error"], response["error"])
         self.assertEqual(expected["error_type"], response["error_type"])
 
-    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_differs(self):
+    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_differs(
+        self,
+    ):
         response = {"journal": "not journal", "issue": "issue"}
         article = Mock(spec=Article)
         article.issue = "issue"
@@ -53,11 +63,15 @@ def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_j
             "error": f"{article.journal} (registered) differs from {journal} (XML)",
             "error_type": choices.VE_ARTICLE_JOURNAL_INCOMPATIBILITY_ERROR,
         }
-        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response)
+        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(
+            article, response
+        )
         self.assertEqual(expected["error"], response["error"])
         self.assertEqual(expected["error_type"], response["error_type"])
 
-    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_and_issue_compatible(self):
+    def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_journal_and_issue_compatible(
+        self,
+    ):
         response = {"journal": "journal", "issue": "issue"}
         article = Mock(spec=Article)
         article.issue = "issue"
@@ -67,7 +81,9 @@ def test__compare_journal_and_issue_from_xml_to_journal_and_issue_from_article_j
         expected = {
             "package_status": choices.PS_ENQUEUED_FOR_VALIDATION,
         }
-        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(article, response)
+        controller._compare_journal_and_issue_from_xml_to_journal_and_issue_from_article(
+            article, response
+        )
         self.assertIsNone(response.get("error"))
         self.assertEqual(expected["package_status"], response["package_status"])
 
@@ -274,7 +290,9 @@ def test__get_journal_with_issn_print(self, mock_journal_get, mock_official_j_ge
 
     @patch("upload.controller.OfficialJournal.objects.get")
     @patch("upload.controller.Journal.objects.get")
-    def test__get_journal_with_journal_title(self, mock_journal_get, mock_official_j_get):
+    def test__get_journal_with_journal_title(
+        self, mock_journal_get, mock_official_j_get
+    ):
         journal = Journal()
         official_j = OfficialJournal()
         mock_journal_get.return_value = journal
@@ -289,7 +307,9 @@ def test__get_journal_with_journal_title(self, mock_journal_get, mock_official_j
 
     @patch("upload.controller.OfficialJournal.objects.get")
     @patch("upload.controller.Journal.objects.get")
-    def test__get_journal_with_issn_print_after_raise_exception_does_not_exist_for_issn_electronic(self, mock_journal_get, mock_official_j_get):
+    def test__get_journal_with_issn_print_after_raise_exception_does_not_exist_for_issn_electronic(
+        self, mock_journal_get, mock_official_j_get
+    ):
         journal = Journal()
         official_j = OfficialJournal()
         mock_journal_get.return_value = journal
@@ -307,13 +327,15 @@ def test__get_journal_with_issn_print_after_raise_exception_does_not_exist_for_i
             [
                 call(issn_electronic="EEEEEEE"),
                 call(issn_print="XXXXXXX"),
-            ]
+            ],
         )
         mock_journal_get.assert_called_with(official=official_j)
 
     @patch("upload.controller.OfficialJournal.objects.get")
     @patch("upload.controller.Journal.objects.get")
-    def test__get_journal_raises_multiple_object_returned(self, mock_journal_get, mock_official_j_get):
+    def test__get_journal_raises_multiple_object_returned(
+        self, mock_journal_get, mock_official_j_get
+    ):
         journal = Journal()
         official_j = OfficialJournal()
         mock_journal_get.return_value = journal
@@ -328,14 +350,13 @@ def test__get_journal_raises_multiple_object_returned(self, mock_journal_get, mo
             mock_official_j_get.mock_calls,
             [
                 call(issn_electronic="EEEEEEE"),
-            ]
+            ],
         )
         mock_journal_get.assert_not_called()
 
 
 @patch("upload.controller.Article")
 class GetArticlePreviousStatusTest(TestCase):
-
     def test_get_article_previous_status_require_update(self, mock_article):
         response = {}
         article = Mock(spec=Article)
@@ -354,7 +375,9 @@ def test_get_article_previous_status_required_erratum(self, mock_article):
         self.assertEqual(article.status, article_choices.AS_CHANGE_SUBMITTED)
         self.assertEqual(response["package_category"], choices.PC_ERRATUM)
 
-    def test_get_article_previous_status_not_required_erratum_and_not_require_update(self, mock_article):
+    def test_get_article_previous_status_not_required_erratum_and_not_require_update(
+        self, mock_article
+    ):
         response = {}
         article = Mock(spec=Article)
         article.status = "no matter what"
@@ -362,7 +385,10 @@ def test_get_article_previous_status_not_required_erratum_and_not_require_update
         self.assertIsNone(result)
         self.assertEqual("no matter what", article.status)
         self.assertEqual(response["package_category"], choices.PC_UPDATE)
-        self.assertEqual(f"Unexpected package. Article has no need to be updated / corrected. Article status: no matter what", response["error"])
+        self.assertEqual(
+            f"Unexpected package. Article has no need to be updated / corrected. Article status: no matter what",
+            response["error"],
+        )
         self.assertEqual(choices.VE_FORBIDDEN_UPDATE_ERROR, response["error_type"])
 
 
@@ -371,8 +397,9 @@ def test_get_article_previous_status_not_required_erratum_and_not_require_update
 @patch("upload.controller.Article.objects.get")
 @patch("upload.controller.PidRequester.is_registered_xml_with_pre")
 class CheckArticleAndJournalTest(TestCase):
-
-    def test__check_article_and_journal__registered_and_allowed_to_be_updated(self, mock_xml_with_pre, mock_article_get, mock_issue_get, mock_journal_get):
+    def test__check_article_and_journal__registered_and_allowed_to_be_updated(
+        self, mock_xml_with_pre, mock_article_get, mock_issue_get, mock_journal_get
+    ):
         mock_xml_with_pre.return_value = {"v3": "yjukillojhk"}
 
 
@@ -417,7 +444,9 @@ def test__check_article_and_journal__registered_and_allowed_to_be_updated(self,
         self.assertEqual(choices.PS_ENQUEUED_FOR_VALIDATION, result["package_status"])
         self.assertEqual(choices.PC_UPDATE, result["package_category"])
 
-    def test__check_article_and_journal__new_document(self, mock_xml_with_pre, mock_article_get, mock_issue_get, mock_journal_get):
+    def test__check_article_and_journal__new_document(
+        self, mock_xml_with_pre, mock_article_get, mock_issue_get, mock_journal_get
+    ):
         mock_xml_with_pre.return_value = {}
 
 
@@ -498,4 +527,4 @@ def test__check_article_and_journal__new_document(self, mock_xml_with_pre, mock_
 # return response
 # # new document
 # response["package_status"] = choices.PS_ENQUEUED_FOR_VALIDATION
-# return response
\ No newline at end of file
+# return response
diff --git a/upload/wagtail_hooks.py b/upload/wagtail_hooks.py
index 87b3e750..dbc2e424 100644
--- a/upload/wagtail_hooks.py
+++ b/upload/wagtail_hooks.py
@@ -2,6 +2,7 @@
 
 from django.contrib import messages
 from django.http import HttpResponseRedirect
+from django.shortcuts import get_object_or_404, redirect, render
 from django.urls import include, path
 from django.utils.translation import gettext as _
 from wagtail import hooks
@@ -27,77 +28,55 @@
     choices,
 )
 from .permission_helper import UploadPermissionHelper
-from .tasks import run_validations
+from .controller import receive_package
 from .utils import package_utils
+from upload.tasks import task_validate_original_zip_file
 
 
 class PackageCreateView(CreateView):
-    def get_instance(self):
-        package_obj = super().get_instance()
-
-        pkg_category = self.request.GET.get("package_category")
-        if pkg_category:
-            package_obj.category = pkg_category
-
-        article_id = self.request.GET.get("article_id")
-        if article_id:
-            try:
-                package_obj.article = Article.objects.get(pk=article_id)
-            except Article.DoesNotExist:
-                ...
-
-        return package_obj
 
     def form_valid(self, form):
-        article_data = self.request.POST.get("article")
-        article_json = json.loads(article_data) or {}
-        article_id = article_json.get("pk")
-        try:
-            article = Article.objects.get(pk=article_id)
-        except (Article.DoesNotExist, ValueError):
-            article = None
-        issue_data = self.request.POST.get("issue")
-        issue_json = json.loads(issue_data) or {}
-        issue_id = issue_json.get("pk")
-        try:
-            issue = Issue.objects.get(pk=issue_id)
-        except (Issue.DoesNotExist, ValueError):
-            issue = None
+        package = form.save_all(self.request.user)
 
-        self.object = form.save_all(self.request.user, article, issue)
+        response = receive_package(package)
 
-        if self.object.category in (choices.PC_UPDATE, choices.PC_ERRATUM):
-            if self.object.article is None:
-                messages.error(
-                    self.request,
-                    _("It is necessary to select an Article."),
-                )
-                return HttpResponseRedirect(self.request.META["HTTP_REFERER"])
-            else:
-                messages.success(
-                    self.request,
-                    _("Package to change article has been successfully submitted."),
-                )
+        if response.get("error_type") == choices.VE_PACKAGE_FILE_ERROR:
+            # error in the file
+            messages.error(self.request, response.get("error"))
+            return HttpResponseRedirect(self.request.META["HTTP_REFERER"])
 
-        if self.object.category == choices.PC_NEW_DOCUMENT:
-            if self.object.issue is None:
-                messages.error(self.request, _("It is necessary to select an Issue."))
-                return HttpResponseRedirect(self.request.META["HTTP_REFERER"])
-            else:
-                messages.success(
-                    self.request,
-                    _("Package to create article has been successfully submitted."),
-                )
+        if response.get("error"):
+            # error
+            messages.error(self.request, response.get("error"))
+            return redirect(f"/admin/upload/package/inspect/{package.id}")
 
-        run_validations(
-            self.object.file.name,
-            self.object.id,
-            self.object.category,
-            article_id,
-            issue_id,
+        messages.success(
+            self.request,
+            _("Package has been successfully submitted and will be analyzed"),
         )
+        # triggers the task that performs the validations of
+        # assets, renditions, XML content, etc.
+
+        try:
+            journal_id = response["journal"].id
+        except (KeyError, AttributeError):
+            journal_id = None
+        try:
+            issue_id = response["issue"].id
+        except (KeyError, AttributeError):
+            issue_id = None
+
+        task_validate_original_zip_file.apply_async(
+            kwargs=dict(
+                package_id=package.id,
+                file_path=package.file.path,
+                journal_id=journal_id,
+                issue_id=issue_id,
+                article_id=package.article and package.article.id or None,
+            )
+        )
 
         return HttpResponseRedirect(self.get_success_url())
 
@@ -378,7 +357,7 @@ class UploadModelAdminGroup(ModelAdminGroup):
     menu_order = get_menu_order("upload")
 
 
-# modeladmin_register(UploadModelAdminGroup)
+modeladmin_register(UploadModelAdminGroup)
 
 
 @hooks.register("register_admin_urls")
diff --git a/upload/xml_validation.py b/upload/xml_validation.py
index 304d0b63..0af4c99d 100644
--- a/upload/xml_validation.py
+++ b/upload/xml_validation.py
@@ -22,6 +22,9 @@
 from packtools.sps.validation.journal_meta import JournalMetaValidation
 from packtools.sps.validation.preprint import PreprintValidation
 from packtools.sps.validation.related_articles import RelatedArticlesValidation
+
+from upload import choices
+from upload.models import ValidationResult
 from tracker.models import UnexpectedEvent
 
 
@@ -84,28 +87,38 @@ def add_sps_data(data, sps_data):
 
 
 def validate_xml_content(sps_pkg_name, xmltree, data):
-
-    functions = (
-        validate_affiliations,
-        validate_languages,
-        validate_article_attributes,
-        validate_article_id_other,
-        validate_subjects,
-        validate_article_type,
-        validate_authors,
-        validate_data_availability,
-        validate_doi,
-        validate_article_languages,
-        validate_licenses,
-        validate_toc_sections,
-        validate_xref,
-        validate_dates,
-        validate_journal,
-        validate_preprint,
-        validate_related_articles,
+    # TODO add error_category
+    # VE_XML_CONTENT_ERROR: generic usage
+    # VE_BIBLIOMETRICS_DATA_ERROR: used in metrics
+    # VE_SERVICES_DATA_ERROR: used in reports
+    # VE_DATA_CONSISTENCY_ERROR: data consistency
+    # VE_CRITERIA_ISSUES_ERROR: required by the criteria document
+
+    error_category_and_function_items = (
+        (choices.VE_BIBLIOMETRICS_DATA_ERROR, validate_affiliations),
+        (choices.VE_BIBLIOMETRICS_DATA_ERROR, validate_authors),
+        (choices.VE_BIBLIOMETRICS_DATA_ERROR, validate_languages),
+        (choices.VE_CRITERIA_ISSUES_ERROR, validate_article_attributes),
+        (choices.VE_CRITERIA_ISSUES_ERROR, validate_data_availability),
+        (choices.VE_CRITERIA_ISSUES_ERROR, validate_licenses),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_article_id_other),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_article_languages),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_article_type),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_dates),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_doi),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_journal),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_preprint),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_related_articles),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_subjects),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_toc_sections),
+        (choices.VE_DATA_CONSISTENCY_ERROR, validate_xref),
     )
-    for f in functions:
-        yield from f(sps_pkg_name, xmltree, data)
+    for error_category, f in error_category_and_function_items:
+        for item in f(sps_pkg_name, xmltree, data):
+            if item["validation_type"] in ("value in list", "value", "match"):
+                error_category = choices.VE_DATA_CONSISTENCY_ERROR
+            item["error_category"] = item.get("error_category") or error_category
+            yield item
 
 
 def validate_affiliations(sps_pkg_name, xmltree, data):