- Introduction
- Naming Conventions
- Project and Library Structure
- File Structure
- Class Structure
- Language Fundamentals
- Language Extensions
- Documentation
- Testing
- General Comments
The intent of this document is to present rules and guidelines detailing best practices for developers writing C++ code for Machine Learning at Elastic. It is not intended to be an exhaustive, definitive set of coding standards covering style, format etc. nor is it intended to replace careful study of "Best Practice" handbooks such as the "Effective C++" series of books by Scott Meyers.
The rules contained in this document have deliberately been made as generic as possible. They should be agnostic to development platform, compiler type and version etc. with the caveat that the compiler supports the C++11 standard as a minimum. If a rule applies to a C++ standard version higher than this it will be specified as e.g. From C++14.
The guiding principle should be that consistency with existing code is paramount, therefore in the interests of brevity, detailed justification for the rules has been omitted.
The use of the words SHOULD, MUST etc., comply with RFC 2119.
-
Code SHOULD be consistent with its surrounding context.
-
Names MUST be meaningful and concise
-
Variable names MUST reflect use not type
-
Local variables MUST be named as per
variableName
-
Classes, structs and unions MUST be named as per
CClassName
,SStructName
,UUnionName
respectively- Implementation files MUST have a
.cc
extension - Header files MUST have a
.h
extension
- Implementation files MUST have a
-
Member variables MUST be named as per
m_ClassMember
,s_StructMember
,u_UnionMember
For class, structure and union member variables respectively -
Static members MUST be named as per
ms_ClassStatic
,ss_StructStatic
-
Methods SHOULD be named as per
methodName
-
Template classes MUST be named as per
CClassName
,<typename TYPE_NAME>
-
Enumerations MUST be named as per
EEnumName
,E_MemberName
-
Type aliases MUST be named as per
TTypeName
-
Type aliases referring to a template SHOULD identify the template instantiation concisely (we have a consistent naming convention in such cases which SHOULD be followed) e.g.
using TDoubleVec = std::vector<double> using TSizeDoubleMap = std::map<std::size_t, double>
-
Constants MUST be named as per
CONSTANT_NAME
-
Macros MUST be named as per
MACRO_NAME(...)
. However... -
Macros SHOULD NOT be used unless unavoidable
-
Files MUST be named as per
CClassName.cc
,CClassName.h
if they contain a single or principle class namedCClassName
-
Files containing primarily global typedefs SHOULD be named as per
<Identifier>Types.h
where<Identifier>
pertains to the file contents. -
Files containing primarily constants SHOULD be named
Constants.h
-
Files containing the function
main()
MUST be calledMain[Xxx].cc
. WhereXxx
is only used if necessary to distinguish multiple such files in the same directory -
Non-boolean accessor functions MUST be named as
clientId
NOTgetClientId
-
Boolean accessor functions MUST be named as
isComplete
NOTcomplete
-
Parameter names in function declaration and definition MUST be identical
-
Parameters in constructor initialiser lists SHOULD be as per
classMember
- Existing namespace usage MUST be followed:
Production code in a subdirectory
foo
resides in namespacefoo
orfoo_t
(or a nested namespace) - Namespaces MUST be used for logical groupings of code
- Namespaces MUST NOT span libraries
- Namespaces SHOULD NOT be imported with
using
directives (likeusing namespace ml
;), except in unit test implementation files. In particular,using namespace std;
andusing namespace boost;
MUST NOT be used anywhere - Files SHOULD be kept short
- Multiple classes defined in a single file SHOULD be avoided, except where they pertain to closely related functionality
- Shared constants and typedefs MUST be in a 'Types' namespace of the form
<library namespace>_t
- Libraries MUST NOT have circular dependencies
- Platform specific code SHOULD be in a separate file of the form
<class name>_<platform name>.cc
-
All source files MUST be formatted with the
clang-format
tool prior to check in.- This procedure can be simplified by use of the dev-tools/clang-format.sh script.
- The same specific version of
clang-format
used by the clang-format.sh script MUST be used. It is recommended that this be obtained from the pre-built binary packages of LLVM available from http://releases.llvm.org/download.html
-
The standard header file layout MUST be observed
Header files MUST contain the following items in the order defined below
- Copyright statement
- Include guard of the form
Note that test files are in the global namespace and hence the part in square bracket should be omitted in this case.
#ifdef INCLUDED_[<namespace>_]<class name>_h
- Include files SHOULD be avoided in header files.
- Include files SHOULD be in the recommended order (see below) if present
- Forward declarations. These MUST be used wherever possible to reduce include file requirements
- Class declarations. These MUST be in the recommended order (see below)
- End of include guard. This SHOULD be followed by a comment indicating the name of the guard to which this pertains.
- Judicious use of blank lines SHOULD be used to separate each of the above items.
-
The standard implementation file layout MUST be observed. Implementation files MUST contain the following items in the order defined below
- Elastic commercial code file header
- Class include file
- Other include files, in the recommended order (see below)
- Unnamed namespace local declarations (use of this is preferred to private declarations in the header file)
- Beginning of namespace for this library/application
- Constructor Implementation
- Destructor Implementation
- Copy/Move constructors (if present)
- Class operators (if present)
- Other method implementations
- End of namespace for this library/application
- Judicious use of blank lines SHOULD be used to separate each of the above items.
-
Standard ordering of
#include
statements SHOULD be followed- Own include file - for
.cc
files including their own.h
- Other ML include files
- 3rd party library include files (including Boost)
- Standard C++ include files
- Standard C include files. However C++ header wrappers SHOULD be included in preference to the equivalent C header, e.g include
cstdlib
in preference tostdlib.h
- Own include file - for
-
Include files SHOULD be grouped by library/subdirectory, with a blank line between each grouping.
clang-format
is then able to sort in alphabetical order within each grouped section. It is best practice to list the ML include files in the build order of the libraries they relate to, as this helps to catch accidental circular dependencies.
-
Class headers SHOULD be broken into sections
Elements SHOULD be placed in sections according to a number of criteria:
- Scope: public, protected, private or hidden (private + unimplemented functions)
- Type: constants, typedefs, methods, variables, or nested classes
- Static or non static
-
Header sections MUST be ordered according to scope - public, protected, private
-
Each section in a class MUST be prefixed with its scope keyword (public/protected/private) even if this repeats the scope already in effect
-
Structs and Unions MUST NOT be used for encapsulation
- Access specifiers MUST NOT appear
- Constructors (if present) MUST be trivial
- Other methods SHOULD NOT be used. This includes any explicit destructor.
-
Declarations SHOULD be given minimal scope
- This is a general rule, that should guide the scoping of classes, data, enums and functions
- It has a number of corollaries:
- Class variable data MUST NOT be public - this rule does not apply to static constants
- A class which is used by only one other class SHOULD be nested inside it
- A class which is instantiated in only one function MAY be nested inside the function definition. A lambda SHOULD be used in preference in this case.
- Typedefs, enums and static constants SHOULD be given minimal scope
-
Copy constructor and assignment operator SHOULD be hidden unless appropriate - new classes SHOULD use
delete
for this purpose -
Functions SHOULD NOT use default arguments
-
Functions and variable data not belonging to a class SHOULD be defined in a detail or unnamed namespace within implementation files
-
Destructor of a polymorphic class MUST be virtual
-
Destructor of a non-polymorphic class MUST NOT be virtual
-
Interdependencies between static objects MUST be avoided
-
Headers MUST NOT define static variables of non-built-in types
-
Implementation details SHOULD reside in the
.cc
file -
Template code SHOULD either be in implementation file, or inside class declaration
-
Function parameters MUST be ordered as [in] [in,out] [out]
-
Classes SHOULD be forward declared rather than included
-
Logically const methods SHOULD be const
-
Headers SHOULD NOT define static variables of non-built-in types
-
nullptr
MUST be used in preference to0
orNULL
-
Unreachable or ineffectual code MUST NOT be included (this also applies to 'commented out' code)
-
Functions returning a value MUST return a default value at the end
-
Exceptions SHOULD NOT be thrown - use return codes to explicitly handle error conditions
-
Exceptions thrown from 3rd party code MUST be caught in the smallest enclosing scope - such exceptions MUST be converted to an appropriate error code
-
Multi-value error codes SHOULD be returned as an enumeration
-
Switch statements SHOULD switch over enums and MUST cover all the cases
- When switching over a non-enum a default case SHOULD be used
- When switching over an enum a default case SHOULD NOT be used
-
Error conditions SHOULD always be returned early
-
The declaration of all variables SHOULD be given the minimal possible scope
-
Return codes SHOULD preserve the abstraction and encapsulation of the class - functions should not return an error specific to the internal implementation
-
assert()
MUST NOT be used -
C-style casts MUST NOT be used
-
Macros SHOULD NOT be used
-
Objects and references SHOULD be used in preference to pointers
-
Smart pointers SHOULD be used in preference to raw pointers
-
Dynamically acquired resources SHOULD be released in the same scope
-
A class wrapper SHOULD be used to avoid dangling resources
-
Resource ownership MUST be indicated by parameter type
Argument, Passed By Memory Ownership Value N/A Pointer Called function Const pointer (not pointer to const object) Calling function Reference Calling function Const reference Calling function -
Explicit integer definitions, specifying size, SHOULD be used
-
Member functions MUST be scoped with
this->
when called -
Floating point variables SHOULD be
double
-
Lambdas SHOULD be used in preference to any form of
bind
-
Default lambda capture modes SHOULD be avoided
-
The
override
keyword SHOULD be used consistently within a source file -
The
virtual
keyword SHOULD NOT be used in conjunction withoverride
-
Type aliases MUST be used in preference to typedefs - use
using
to create a type alias nottypedef
-
Rvalue references SHOULD only be used in the following cases
- For implementing move semantics
- For forwarding references in template code
-
Emplace operations SHOULD be used to add items to containers wherever applicable i.e. prefer
emplace_back
overpush_back
for vectors andemplace
overinsert
for maps -
Containers SHOULD have their capacity reserved in advance if applicable
-
Range based for loops SHOULD be preferred over their explicit counterparts
-
Uniform (braced) initialisers SHOULD be preferred in new code units
-
The
auto
keyword SHOULD be used liberally for variable assignments when the resulting code is less verbose -
auto
SHOULD NOT be used where the assigned type is not clear in the context e.g.auto obj = doSomething(someVariable); // bad
-
auto
types should always be assigned withoperator=
- Language features MUST NOT be used until supported by all compilers used for a given ML build version
- The current lowest common denominator compiler version is Visual Studio 2013
- C++14 features may be used as of version 7.0
- Where both Boost and the standard library contain equivalent implementations of the same feature the same provider SHOULD be used consistently across the codebase
make_shared
&make_unique (From C++14)
SHOULD be used to create the corresponding smart pointer types- Use of 3rd party libraries (including Boost) MUST be approved - this applies to new features used for the first time.
- STL algorithms SHOULD be used wherever appropriate
- Source code MUST be readable. This is of primary importance
- Doxygen MUST be used to comment all source files
- Header files MUST be commented for Doxygen in standard format.
The following MUST be included for all top-level classes:
- Brief summary
- Detailed description - what the class does.
- Implementation decisions - what has been done and why.
- In addition, the following MAY be used if required:
- Future enhancements - what we should do to this class in the future.
- Resource ownership - should be used where this class manages any resources (e.g. objects on the heap)
- Example - usage example etc
- Exclamation mark style MUST be used for Doxygen
- Implementation files SHOULD be commented in C++ style (not C, and not Doxygen)
- All class members SHOULD be documented
- All public and protected methods MUST be documented with
\param, \return
- Parameters MUST be documented by name
- Non-trivial resource ownership SHOULD be documented
The CPPUNIT framework is used for testing the Elastic Machine Learning C++ code.
- Every class SHOULD have a corresponding unit test suite
- Test classes SHOULD belong to the parent or global namespace
- Unit tests SHOULD exist for every public method in the corresponding class
- The number of lines in a method SHOULD be kept to a minimum
- C++ functions SHOULD be used in preference to C
reinterpret_cast
MUST only be used when interfacing with 3rd party codedynamic_cast
SHOULD be used judiciouslyconst_cast
SHOULD be used judiciously- Classes SHOULD NOT inherit from more than one base class
- All code SHOULD compile with no warnings
- Overloaded function parameters SHOULD NOT be implicitly convertible