Add tests for encoding #19

dieghernan · 2022-02-17T19:18:26Z

santiagomota · 2022-02-17T19:19:53Z

You're welcome

santiagomota · 2022-02-17T19:24:47Z

Some of the municipalities with problems (in the 2021-09-13 version):
23078 line 829761 column 49
03050 line 203847 column 49
23051 line 117301 column 48
A (\xd1) or (\xdf) character appears. When the municipality is read with st_read(), it gives a warning and correctly reads everything before that character, but nothing after it. I found this in the cadastral parcels.
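A minimal sketch of how such positions could be located with base R, assuming the file is otherwise valid UTF-8 and that the reported column is a byte offset (neither is confirmed in this thread). The demo file and variable names are hypothetical.

```r
# Hypothetical demo file: line 2 carries a stray 0xd1 byte at byte offset 8.
demo <- tempfile()
con <- file(demo, open = "wb")
writeBin(c(charToRaw("<a>ok</a>\n"),
           charToRaw("<b>bad "), as.raw(0xd1), charToRaw(" tail</b>\n")), con)
close(con)

lines <- readLines(demo, warn = FALSE)

# Lines whose bytes do not form valid UTF-8.
bad_lines <- which(!validUTF8(lines))

# Byte offset of the first 0xd1 or 0xbf in each bad line (NA if neither occurs).
stray <- as.raw(c(0xd1, 0xbf))
bad_cols <- vapply(lines[bad_lines],
                   function(l) which(charToRaw(l) %in% stray)[1],
                   integer(1), USE.NAMES = FALSE)

print(data.frame(line = bad_lines, column = bad_cols))
```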

santiagomota · 2022-02-17T19:26:17Z

My solution is to read the file and replace those characters:
file_municipio_bu_temp <- readLines(file_municipio_bu, encoding = "ISO-8859-1")
file_municipio_bu_temp <- gsub("�", "", file_municipio_bu_temp)
file_municipio_bu_temp <- gsub("\xd1", "", file_municipio_bu_temp)
file_municipio_bu_temp <- gsub("\xbf", "_", file_municipio_bu_temp)
writeLines(file_municipio_bu_temp, file_municipio_bu, useBytes = TRUE)
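A more conservative variant of this workaround, as a sketch only: instead of replacing fixed byte values everywhere, it replaces just the bytes that do not belong to a valid UTF-8 sequence, and it writes to a temporary copy rather than overwriting the download. This is a different technique from the gsub() calls above, not the code used in CatastRo. The helpers replace_invalid_bytes() and sanitize_copy() and the demo file are hypothetical, and the sketch assumes the file is otherwise UTF-8.

```r
# Hypothetical helper: swap every byte that is not part of a valid UTF-8
# character for `replacement` (a single ASCII character).
replace_invalid_bytes <- function(line, replacement = "_") {
  bytes <- charToRaw(line)
  n <- length(bytes)
  if (n == 0L) return(line)
  out <- vector("list", n)
  i <- 1L
  while (i <= n) {
    len <- 0L
    # A UTF-8 character is 1 to 4 bytes long; take the shortest valid one.
    for (k in 1:4) {
      if (i + k - 1L <= n && validUTF8(rawToChar(bytes[i:(i + k - 1L)]))) {
        len <- k
        break
      }
    }
    if (len == 0L) {
      out[[i]] <- charToRaw(replacement)   # stray byte: replace it
      i <- i + 1L
    } else {
      out[[i]] <- bytes[i:(i + len - 1L)]  # valid character: keep it
      i <- i + len
    }
  }
  rawToChar(unlist(out))
}

# Hypothetical helper: clean only the invalid lines and write a temporary copy.
sanitize_copy <- function(path) {
  lines <- readLines(path, warn = FALSE)
  bad <- !validUTF8(lines)
  lines[bad] <- vapply(lines[bad], replace_invalid_bytes, character(1),
                       USE.NAMES = FALSE)
  out <- tempfile()
  writeLines(lines, out, useBytes = TRUE)  # pass the bytes through unchanged
  out
}

# Demo: a correctly encoded "ñ" (0xc3 0xb1) next to a stray 0xd1 byte.
demo <- tempfile()
con <- file(demo, open = "wb")
writeBin(c(charToRaw("<a>Espa"), as.raw(c(0xc3, 0xb1)), charToRaw("a "),
           as.raw(0xd1), charToRaw(" x</a>\n")), con)
close(con)

clean <- sanitize_copy(demo)
cleaned <- readLines(clean, warn = FALSE, encoding = "UTF-8")
print(cleaned)
```

The byte-by-byte scan is slow, but it only runs on the lines that fail validUTF8(), which in the cases reported here is a handful per file.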

dieghernan · 2022-02-17T22:15:45Z

https://github.com/dieghernan/CatastRo/blob/4b1dc6ceb3565e06a2b67c01688fa369c6bebf7c/R/utils_read.R#L12-L24

Thanks!

santiagomota · 2022-02-17T22:21:16Z

Try it out, because you will quite possibly need to include useBytes = TRUE in the writeLines call
https://stackoverflow.com/questions/31432560/readlines-and-writelines-r
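A small sketch of what useBytes = TRUE changes, using a hypothetical one-character string. Without it, writeLines() converts strings with a declared encoding to the session's native encoding before writing; with it, the bytes go to the file as they are. What the converted file contains depends on the locale, so no fixed output is claimed for it here.

```r
# A one-character string holding byte 0xd1, declared as Latin-1 ("Ñ").
x <- rawToChar(as.raw(0xd1))
Encoding(x) <- "latin1"

converted <- tempfile()
verbatim <- tempfile()
writeLines(x, converted)                  # re-encoded to the native encoding
writeLines(x, verbatim, useBytes = TRUE)  # bytes passed through unchanged

bytes_converted <- readBin(converted, "raw", n = 10)
bytes_verbatim <- readBin(verbatim, "raw", n = 10)
print(bytes_converted)  # locale dependent; in a UTF-8 session it starts with c3 91
print(bytes_verbatim)   # starts with d1
```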

codecov · 2022-02-17T22:24:22Z

Codecov Report

Merging #19 (6a3040f) into master (9cdfb42) will increase coverage by 0.01%.
The diff coverage is 90.32%.

❗ Current head 6a3040f differs from pull request most recent head 4b1dc6c. Consider uploading reports for the commit 4b1dc6c to get more accurate results

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   98.22%   98.23%   +0.01%     
==========================================
  Files          18       19       +1     
  Lines         788      794       +6     
==========================================
+ Hits          774      780       +6     
  Misses         14       14

Impacted Files	Coverage Δ
R/atom_ad_db.R	`100.00% <ø> (ø)`
R/atom_bu_db.R	`100.00% <ø> (ø)`
R/atom_cp_db.R	`100.00% <ø> (ø)`
R/utils_read.R	`88.88% <88.88%> (ø)`
R/atom_ad.R	`100.00% <100.00%> (ø)`
R/atom_bu.R	`100.00% <100.00%> (ø)`
R/atom_cp.R	`100.00% <100.00%> (ø)`
R/utils_wfs.R	`97.72% <100.00%> (+2.53%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2d76051...4b1dc6c.

dieghernan · 2022-02-17T22:25:49Z

I added some tests and it works perfectly, thanks. If it causes problems later on, we'll fix them then:

https://github.com/dieghernan/CatastRo/blob/4b1dc6ceb3565e06a2b67c01688fa369c6bebf7c/tests/testthat/test-catr_atom_cp.R#L25-L27

santiagomota · 2022-02-17T22:31:13Z

Great. Best regards

santiagomota · 2022-02-18T10:39:17Z

I keep thinking about this and came across: https://stackoverflow.com/questions/9934856/removing-non-ascii-characters-from-data-files
It is probably better, on line 19 of utils_read.R, to change
newlines <- gsub("\xd1|\xbf", "_", newlines)
to
newlines <- stringi::stri_trans_general(newlines, "latin-ascii")
In this case the erroneous values get a "�" instead of a "_" character, but the data is read correctly.
The advantage is that it replaces not only (\xd1) or (\xdf), but all special characters.

santiagomota · 2022-03-02T05:52:40Z

I think the stringi::stri_trans_general solution creates a problem I had not noticed: it strips the ñ, accents, and so on:
stringi::stri_trans_general("España á ç", "latin-ascii")
[1] "Espana a c"
I need to find a transformation that works, or go back to
newlines <- gsub("\xd1|\xbf", "_", newlines)

santiagomota · 2022-03-02T06:28:09Z

... This one seems to work:
stringi::stri_trans_general("España á ç", "Any-Latn")
[1] "España á ç"
although I'm not 100% sure

dieghernan and others added 2 commits February 17, 2022 19:58

Test encoding

213720d

Update docs with pkgdev

dc0b540

Fix encoding issue

4b1dc6c

dieghernan merged commit 53c7bb4 into rOpenSpain:master Feb 17, 2022

dieghernan added a commit that referenced this pull request Feb 21, 2022

Fixes #19, thanks @santiagomota

38c832e
