Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional popularity column to pelias csv importer #80

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,10 @@ The contents of the `name_json` field must be a JSON array. As a reminder, in CS

Aliases and languages can _both_ be specified. For example, the `name_json_es` field allows setting multiple aliases in Spanish.

## Popularity

Popularity values can be specified to mark records as more important than others. This value should be an integer greater than zero, in the `popularity` column.

## Categories

Category values can be added to a record. For a single category, use the `category` field. For multiple categories, use `category_json`, with the same formatting as for alias names.
Expand Down
16 changes: 15 additions & 1 deletion lib/streams/documentStream.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,16 @@ const through = require( 'through2' );
const peliasModel = require( 'pelias-model' );
const NAME_REGEX = /^name(_json)?_([a-z]{2})$/i;

function getPopularity(record) {
const popularityString = getCaseInsensitive('popularity', record);
const popularity = parseInt(popularityString);
if (popularityString && (isNaN(popularity) || !/^\d+$/.test(popularityString))) {
throw new Error("Popularity must be an int, got " + popularityString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should throw errors for popularity.... This is not a required field, so a default value is enough when the data is incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recall, does throw-ing here kill the whole importer or does it print a message and skip just this one row?

In general, we've had this problem across all the importers, where different people have one of two needs:

  • they want the import to finish if at all possible, even if one file/record/etc fails
  • they want the import to be as complete as possible, and would rather the entire import fails than some invalid data makes its way in, or some data is missed

Where needed, we've introduced flags like missingFilesAreFatal in the OA importer to toggle between the two preferences.

So, if this throw causes the whole importer to fail, then we need to provide a way to allow invalid records to be skipped. If it prints a warning/error and continues on importing other records, then it's probably fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, looks like it does just skip that one record. Now you all know the background anyway :)

} else {
return popularity;
}
}

function getPostalCode(record) {
return getCaseInsensitive('zipcode', record)
|| getCaseInsensitive('postcode', record)
Expand Down Expand Up @@ -155,12 +165,16 @@ function processRecord(record, next_uid, stats) {
pelias_document.setAddress('number', housenumber);
}


const postcode = getPostalCode(record);
if (postcode) {
pelias_document.setAddress('zip', postcode);
}

const popularity = getPopularity(record);
if (popularity) {
pelias_document.setPopularity(popularity);
}

const addendumFields = getPrefixedFields('addendum_json_', record);
Object.keys(addendumFields).forEach(function(namespace) {
pelias_document.setAddendum(namespace, JSON.parse(addendumFields[namespace]));
Expand Down
40 changes: 40 additions & 0 deletions test/streams/documentStream.js
Original file line number Diff line number Diff line change
Expand Up @@ -324,4 +324,44 @@ tape('documentStream parses JSON from category_json field', function(test) {
test.equal(stats.badRecordCount, 0, 'bad record count unchanged');
test.end();
});
});

tape( 'documentStream accepts popularity', function(test) {
const input = {
NUMBER: '5',
STREET: '101st Avenue',
LAT: 5,
LON: 6,
postalcode: '10010',
popularity: '5000'
};
const stats = { badRecordCount: 0 };
const documentStream = DocumentStream.create('prefix', stats);

test_stream([input], documentStream, function(err, actual) {
test.equal(actual.length, 1, 'the document should be pushed' );
test.equal(stats.badRecordCount, 0, 'bad record count unchanged');
test.equal(actual[0].getPopularity(), 5000);
test.end();
});
});


tape( 'documentStream rejects invalid popularity', function(test) {
const input = {
NUMBER: '5',
STREET: '101st Avenue',
LAT: 5,
LON: 6,
postalcode: '10010',
popularity: '500a0'
};
const stats = { badRecordCount: 0 };
const documentStream = DocumentStream.create('prefix', stats);

test_stream([input], documentStream, function(err, actual) {
test.equal(actual.length, 0, 'the document should be skipped' );
test.equal(stats.badRecordCount, 1, 'bad record count went up by 1');
test.end();
});
});