Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and correctness of cached/static string ids #44261

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 34 additions & 10 deletions src/generic_factory.h
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,11 @@ class generic_factory

private:
DynamicDataLoader::deferred_json deferred;
// generation or "modification count" of this factory
// it's incremented when any changes to the inner id containers occur
// version value corresponds to the string_id::_version,
// so incrementing the version here effectively invalidates all cached string_id::_cid
int version = 0;

protected:
std::vector<T> list;
Expand All @@ -134,16 +139,20 @@ class generic_factory
const std::string legacy_id_member_name = "ident";

bool find_id( const string_id<T> &id, int_id<T> &result ) const {
result = id.get_cid();
if( is_valid( result ) && list[result.to_i()].id == id ) {
return true;
if( id._version == version ) {
result = int_id<T>( id._cid );
return is_valid( result );
}
const auto iter = map.find( id );
// map lookup happens at most once per string_id instance per generic_factory::version
id._version = version;
// id was not found, explicitly marking it as "invalid"
if( iter == map.end() ) {
id._cid = -1;
return false;
}
result = iter->second;
id.set_cid( result );
id._cid = result.to_i();
return true;
}

Expand All @@ -154,7 +163,7 @@ class generic_factory
}
auto iter = map.begin();
const auto end = map.end();
for( ; iter != end; ) {
while( iter != end ) {
if( iter->second == i_id && iter->first != id ) {
map.erase( iter++ );
} else {
Expand Down Expand Up @@ -316,19 +325,27 @@ class generic_factory
* The function returns the actual object reference.
*/
T &insert( const T &obj ) {
// this invalidates `_cid` cache for all previously added string_ids,
// but! it's necessary to invalidate cache for all possibly cached "missed" lookups
// (lookups for not-yet-inserted elements)
// in the common scenario there is no loss of performance, as `finalize` will make cache
// for all ids valid again
version++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By incrementing here we're potentially incrementing version much more often than I was envisioning when I suggested this approach, so I'm a little concerned that we might actually hit integer overflow here. I guess it would take maybe a million itypes and a thousand reloads to get in the ballpark, so it's still pretty unlikely, but it no longer seems completely impossible...

But I'm probably just worrying about nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can wrap, but the probability of a collision is negligible, the only thing I'm slightly worried about is wrapping around and landing on the magic number -1. We can avoid that with

do {
    version++;
} while( version == -1 );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just make version 64 bit, we don't have any tables of string_id or int_id such that we're concerned about their size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of two minds regarding the possibility of the int overflow. Is it possible in theory? Sure. Will it happen in practice? Almost certainly not.

I'll implement Kevin's idea of skipping -1 and optionally make it 64 bit if profiling doesn't show any noticeable slowdown.

const auto iter = map.find( obj.id );
if( iter != map.end() ) {
T &result = list[iter->second.to_i()];
result = obj;
result.id.set_cid( iter->second );
result.id._cid = iter->second.to_i();
result.id._version = version;
return result;
}

const int_id<T> cid( list.size() );
list.push_back( obj );

T &result = list.back();
result.id.set_cid( cid );
result.id._cid = cid.to_i();
result.id._version = version;
map[result.id] = cid;
return result;
}
Expand All @@ -337,6 +354,12 @@ class generic_factory
virtual void finalize() {
DynamicDataLoader::get_instance().load_deferred( deferred );
abstracts.clear();

version++;
for( size_t i = 0; i < list.size(); i++ ) {
list[i].id._cid = static_cast<int>( i );
list[i].id._version = version;
}
}

/**
Expand Down Expand Up @@ -366,6 +389,7 @@ class generic_factory
void reset() {
list.clear();
map.clear();
version++;
}
/**
* Returns all the loaded objects. It can be used to iterate over them.
Expand Down Expand Up @@ -415,7 +439,7 @@ class generic_factory
* This function can be used to implement @ref int_id::is_valid().
*/
bool is_valid( const int_id<T> &id ) const {
return static_cast<size_t>( id.to_i() ) < list.size();
return id.to_i() >= 0 && static_cast<size_t>( id.to_i() ) < list.size();
}
/**
* Checks whether the factory contains an object with the given id.
Expand Down Expand Up @@ -457,8 +481,8 @@ Loading (inside a `T::load(JsonObject &jo)` function) can be done with two funct
value that will be used if the JSON data does not contain the requested data. It may
throw an error if the existing data is not valid (e.g. string instead of requested int).

The functions are designed to work with the `generic_factory` and therefor support the
`was_loaded` parameter (set be `generic_factory::load`). If that parameter is `true`, it
The functions are designed to work with the `generic_factory` and therefore support the
`was_loaded` parameter (set by `generic_factory::load`). If that parameter is `true`, it
is assumed the object has already been loaded and missing JSON data is simply ignored
(the default value is not applied and no error is thrown upon missing mandatory data).

Expand Down
2 changes: 1 addition & 1 deletion src/int_id.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ class string_id;
* Just like the @ref string_id, this is a wrapper for int based identifiers.
* The template parameter T specifies what kind of object it identifies (e.g. a trap type, monster
* type, ...)
*
* See `string_id` for documentation on usage.
*/
template<typename T>
class int_id
Expand Down
7 changes: 3 additions & 4 deletions src/newcharacter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2427,12 +2427,11 @@ tab_direction set_description( avatar &you, const bool allow_reroll,
loc_name, loc.targets_count() );
}

uilist_entry entry( loc.id.get_cid().to_i(), true, -1, loc_name );
uilist_entry entry( loc.id.id().to_i(), true, -1, loc_name );

select_location.entries.emplace_back( entry );

if( !you.random_start_location &&
loc.id.get_cid() == you.start_location.get_cid() ) {
if( !you.random_start_location && loc.id.id() == you.start_location.id() ) {
select_location.selected = offset;
}
offset++;
Expand Down Expand Up @@ -2809,7 +2808,7 @@ tab_direction set_description( avatar &you, const bool allow_reroll,
you.random_start_location = true;
} else if( select_location.ret >= 0 ) {
for( const auto &loc : start_locations::get_all() ) {
if( loc.id.get_cid().to_i() == select_location.ret ) {
if( loc.id.id().to_i() == select_location.ret ) {
you.random_start_location = false;
you.start_location = loc.id;
break;
Expand Down
9 changes: 7 additions & 2 deletions src/start_location.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@
#include "character.h"
#include "coordinates.h"
#include "debug.h"
#include "enum_conversions.h"
#include "field_type.h"
#include "game_constants.h"
#include "generic_factory.h"
#include "int_id.h"
Expand Down Expand Up @@ -40,6 +38,13 @@ const start_location &string_id<start_location>::obj() const
return all_start_locations.obj( *this );
}

/** @relates int_id */
template<>
int_id<start_location> string_id<start_location>::id() const
{
return all_start_locations.convert( *this, int_id<start_location>( -1 ) );
}

/** @relates string_id */
template<>
bool string_id<start_location>::is_valid() const
Expand Down
47 changes: 29 additions & 18 deletions src/string_id.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
template<typename T>
class int_id;

template<typename T>
class generic_factory;

/**
* This represents an identifier (implemented as std::string) of some object.
* It can be used for all type of objects, one just needs to specify a type as
Expand Down Expand Up @@ -39,6 +42,25 @@ class int_id;
* please define it in type_id.h, a central light-weight header that defines all ids
* people might want to use. This prevents duplicate definitions in many
* files.
*
* Notes on usage and performance (comparison between ids, ::obj lookup,
* assuming default implementation of generic_factory is used):
*
* `int_id` is fastest, but it can't be reused after game reload. Depending on the loaded
* combination of mods `int_id` might point to a different entity and there is no way to tell.
* Because of this NEVER define static int ids.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do currently have some static int_ids (with special handling so that they get reset on game load). Now we can probably get rid of those, which is cool, and create a clang-tidy check to ensure no static int_ids are created. (not in this PR, though).

* But, it's safe to cache and use int_id within the game session.
*
* `string_id` is a bit slower than int_id, but it's safe to use the same instance between game reloads.
* That means that string_id can be static (and should be for maximal performance).
* Comparison of string ids is relatively slow (same as std::string comparison).
* for newly created string_id (i.e. inline constant or local variable), first method invocation:
* `::id` call is relatively slow (string hash map lookup)
* `::obj` lookup is slow (string hash map lookup + array read)
* note, after the first invocation, all subsequent calls on the same string_id instance are fast
* for old (or static) string_id, second and subsequent method invocations:
* conversion to int_id is extremely fast (just returns int field)
* `::obj` call is relatively fast (array read by int index), a bit slower than on int_id
*/
template<typename T>
class string_id
Expand All @@ -55,14 +77,13 @@ class string_id
// a std::string, otherwise a "no matching function to call..." error is generated.
template<typename S, class = typename
std::enable_if< std::is_convertible<S, std::string >::value>::type >
explicit string_id( S && id, int cid = -1 ) : _id( std::forward<S>( id ) ), _cid( cid ) {
}
explicit string_id( S && id ) : _id( std::forward<S>( id ) ), _cid( -1 ), _version( -1 ) {}
/**
* Default constructor constructs an empty id string.
* Note that this id class does not enforce empty id strings (or any specific string at all)
* to be special. Every string (including the empty one) may be a valid id.
*/
string_id() : _cid( -1 ) {}
string_id() : _cid( -1 ), _version( -1 ) {}
/**
* Comparison, only useful when the id is used in std::map or std::set as key. Compares
* the string id as with the strings comparison.
Expand Down Expand Up @@ -171,24 +192,14 @@ class string_id
return !is_null();
}

// TODO: Exposed for now. Hide these and make them accessible to the generic_factory only

/**
* Assigns a new value for the cached int id.
*/
void set_cid( const int_id<T> &cid ) const {
_cid = cid.to_i();
}
Comment on lines -179 to -181
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should keep set_cid as a private function that takes two arguments (id and version)? The two values are always set together.

/**
* Returns the current value of cached id
*/
int_id<T> get_cid() const {
return int_id<T>( _cid );
}

private:
std::string _id;
// cached int_id counterpart of this string_id
mutable int _cid;
// generic_factory version that corresponds to the _cid
mutable int _version;

friend class generic_factory<T>;
};

// Support hashing of string based ids by forwarding the hash of the string.
Expand Down
34 changes: 19 additions & 15 deletions src/string_id_null_ids.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,20 @@
return id; \
}

MAKE_NULL_ID( activity_type, "ACT_NULL", -1 )
// If used with default implementation of the generic_factory,
// these ids must appear in corresponding json file, or "sentinels.json"
// (or have some alternative way to be inserted into the generic_factory)

MAKE_NULL_ID( activity_type, "ACT_NULL" )
MAKE_NULL_ID( harvest_list, "null" )
MAKE_NULL_ID( effect_type, "null" )
MAKE_NULL_ID( material_type, "null", 0 )
MAKE_NULL_ID( material_type, "null" )

MAKE_NULL_ID( overmap_land_use_code, "", 0 )
MAKE_NULL_ID( overmap_special, "", 0 )
MAKE_NULL_ID( overmap_connection, "", 0 )
MAKE_NULL_ID( profession, "null", 0 )
MAKE_NULL_ID( map_extra, "", 0 )
MAKE_NULL_ID( overmap_land_use_code, "" )
MAKE_NULL_ID( overmap_special, "" )
MAKE_NULL_ID( overmap_connection, "" )
MAKE_NULL_ID( profession, "null" )
MAKE_NULL_ID( map_extra, "" )
MAKE_NULL_ID( Skill, "none" )
MAKE_NULL_ID( SkillDisplayType, "none" )
MAKE_NULL_ID( npc_class, "NC_NONE" )
Expand All @@ -41,20 +45,20 @@ MAKE_NULL_ID( translation, "null" )

MAKE_NULL_ID2( itype, "null" )
MAKE_NULL_ID2( mtype, "mon_null" )
MAKE_NULL_ID2( oter_t, "", 0 )
MAKE_NULL_ID2( oter_type_t, "", 0 )
MAKE_NULL_ID2( ter_t, "t_null", 0 )
MAKE_NULL_ID2( oter_t, "" )
MAKE_NULL_ID2( oter_type_t, "" )
MAKE_NULL_ID2( ter_t, "t_null" )
MAKE_NULL_ID2( trap, "tr_null" )
MAKE_NULL_ID2( construction_category, "NULL", -1 )
MAKE_NULL_ID2( ammo_effect, "AE_NULL", 0 )
MAKE_NULL_ID2( field_type, "fd_null", 0 )
MAKE_NULL_ID2( furn_t, "f_null", 0 )
MAKE_NULL_ID2( construction_category, "NULL" )
MAKE_NULL_ID2( ammo_effect, "AE_NULL" )
MAKE_NULL_ID2( field_type, "fd_null" )
MAKE_NULL_ID2( furn_t, "f_null" )
MAKE_NULL_ID2( MonsterGroup, "GROUP_NULL" )
MAKE_NULL_ID2( mission_type, "MISSION_NULL" )
MAKE_NULL_ID2( species_type, "spec_null" )
MAKE_NULL_ID2( mutation_branch, "" )
MAKE_NULL_ID2( requirement_data, "null" )
MAKE_NULL_ID2( body_part_type, "bp_null" )
MAKE_NULL_ID2( bionic_data, "" )
MAKE_NULL_ID2( construction, "constr_null", -1 )
MAKE_NULL_ID2( construction, "constr_null" )
MAKE_NULL_ID2( vehicle_prototype, "null" )
Loading