Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Per-language subclasses of CodeGenerator in generators/ #7085

Closed
cpcallen opened this issue May 12, 2023 · 4 comments · Fixed by #7166, #7156, #7160, #7161 or #7162
Closed

Proposal: Per-language subclasses of CodeGenerator in generators/ #7085

cpcallen opened this issue May 12, 2023 · 4 comments · Fixed by #7166, #7156, #7160, #7161 or #7162
Labels
issue: feature request Describes a new feature and why it should be added

Comments

@cpcallen
Copy link
Contributor

cpcallen commented May 12, 2023

This is the second of three issues created to discuss proposed changes to generators (and perhaps track implementation of those proposals, if agreed). This issue concerns proposed changes to the generator implementations in generators/.

In code examples in these bugs, the word "language" / "Language" / "LANGUAGE" should be read as a metasyntactic variable standing in for some particular language like JavaScript.

Background

The way code generators are currently defined involves creating a CodeGenerator instance for the particular language to generate code for, and then adding a large number of properties to that object. These can broadly be divided into three categories:

  1. Constants and similarly static (but not static) properties.
  2. Helper methods and overrides of methods declared on CodeGenerator.
  3. Per-block-type generator functions.

A concrete example:

// CURRENT CODE:

// Create CodeGenerator instance:
const languageGenerator = new CodeGenerator('Language');

// Some initialisation via methods:
languageGenerator.addReservedWords('foo,bar,baz');

// Add some constants and methods (possibly overriding ones on prototype):
languageGenerator.ORDER_ATOMIC = 0;
languageGenerator.init = function(/* ... */) {/* ... */};


// Add per-block methods as properties, treating using CodeGenerator
// object as a dictionary:
languageGenerator['block_type'] = function(block) {/* ... */};

This is a perfectly good approach in JavaScript, but migrating this style of code to TypeScript poses some challenges because tsc generally complains about adding arbitrary properties to an object unless that object is declared as a dictionary.

The main proposal below addresses the first two categories; the third category is dealt with in the first proposal, #7084.

Main Proposal: Create per-language subclasses of CodeGenerator

By creating a subclass of CodeGenerator for each language, all of these additional properties (excluding the block generator functions, dealt with previously) can be created in a way that is compatible with the normal type-checking of classes. This can be done in a way that should not create any breaking changes:

class LanguageGenerator extends CodeGenerator {
  // Constants remain non-static for backwards compatibility:
  readonly ORDER_ATOMIC = 0;

  constructor() {
    super('Language');
    this.addReservedWords('foo,bar,baz');
  }

  // Methods declared in usual way, possibly overriding ones on
  // CodeGenerator.prototype:
  init(/* ... */) {/* ... */};
}

// Create instance of subclass:
const languageGenerator = new LanguageGenerator();

The resulting languageGenerator object would be essentially identical to the one created by current code; the only observable difference that there would be an intermediate LanguageGenerator.prototype object in the prototype chain between languageGenerator and CodeGenerator.prototype, and methods declared on LanguageGenerator would be on that intermediate prototype object rather than directly on the languageGenerator instance object.

Supplementary Proposal: Remove static data from CodeGenerator instances

Some of the data properties currently created on CodeGenerator instances—notably, the operator precedence tables—will always be the same for any given language, and would not vary even if multiple code generators for the same language were instantiated (e.g. to generate different code for the same blocks, or to use different indentation or loop traps).

It would make sense to:

  • Use enums where appropriate.
  • Either export such data separately from the LanguageGenerator module (option A):
    export enum Order {  // Or ORDER?
      ATOMIC = 0;
      // ...
    }
    
    export class LanguageGenerator { /* ... */ }
    or make it a static property on the (exported) LanguageGenerator constructor (option B):
    export class LanguageGenerator { /* ... */ }
    export namespace LanguageGenerator {
      export enum Order {  // Or ORDER?
        ATOMIC = 0;
        // ...
      }
    }
    (The use of namespace violates the Google styleguide, but this usage would be consistent with how we use it elsewhere.)

Breaking Changes

Either alternative will require us to export not only the languageGenerator instance as we do currently but also the Order enum (we'd probably want to export the LanguageGenerator class either way). (See also proposal issue #7086 for more on changes to how generators are published.)

Developer code adding new blocks to an existing generator will need to be updated slightly. If we export the Order enum separately then it would change from:

languageGenerator.blockGenerators['atomic_expression_block_type'] = function(block: Block) {
  // ...
  return [code, languageGenerator.ORDER_ATOMIC];
};

to

  return [code, Order.ATOMIC];

while if we choose the static (namespace) export, it would change to:

  return [code, LanguageGenerator.Order.ATOMIC];
@cpcallen
Copy link
Contributor Author

cpcallen commented May 15, 2023

I recommend we adopt both the main and supplementary proposals.

Broadly I prefer option A because it is more in line with our styleguide.

@BeksOmega
Copy link
Collaborator

Wrt the main proposal: This sounds good to me! Non-breaking and a much more modern code reuse structure.

Wrt the supplementary proposal: You state "notably, the operator precedence tables—will always be the same for any given language" and I'm not sure that that's true. Aren't there some languages (e.g. Haskell) that introduce other operators, and as such other precedences?

@cpcallen
Copy link
Contributor Author

Wrt the supplementary proposal: You state "notably, the operator precedence tables—will always be the same for any given language" and I'm not sure that that's true. Aren't there some languages (e.g. Haskell) that introduce other operators, and as such other precedences?

Hmm. It is true you can define new operators in Haskell, but can you do so in a way that modifies the operator precedence table? I.e., do they not just end up having the same precedence as some other existing category of operator? I am just a simple Blub programmer so must plead ignorance.

@cpcallen
Copy link
Contributor Author

cpcallen commented Jun 6, 2023

At our weekly team meeting on 2023-05-16, we decided:

  • Main Proposal: Create per-language subclasses of CodeGenerator: accepted.
  • Supplementary Proposal: Remove static data from CodeGenerator instances: TBD, but let's avoid namespaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment