Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transformer: a helper for inserting new statements above or below the current statement. #6641

Open
Dunqing opened this issue Oct 17, 2024 · 6 comments
Assignees
Labels
C-enhancement Category - New feature or request

Comments

@Dunqing
Copy link
Member

Dunqing commented Oct 17, 2024

Unified usage like #4767 (comment) does

@Dunqing Dunqing added the C-enhancement Category - New feature or request label Oct 17, 2024
@Dunqing Dunqing self-assigned this Oct 17, 2024
@overlookmotel
Copy link
Collaborator

overlookmotel commented Oct 17, 2024

Good idea.

As #6644 is problematic, I suggest using Address as the unique ID.

pub struct Address(usize);
impl<'a, T> Box<'a, T> {
/// Get the memory address of a value allocated in the arena.
#[inline]
pub fn address(&self) -> Address {
Address(ptr::addr_of!(**self) as usize)
}
}

Currently we only support getting an Address for a Box, but we can also support getting Address for a &Statement:

pub trait GetAddress {
    fn address(&self) -> Address;
}

impl<'a> GetAddress for Statement<'a> {
    #[inline]
    fn address(&self) -> Address {
        match self {
            Statement::BlockStatement(s) => s.address(),
            Statement::BreakStatement(s) => s.address(),
            Statement::ContinueStatement(s) => s.address(),
            // ... all the rest
        }
    }
}

The match will be massive, but compiler should boil it all down to a single assembly instruction - it's very efficient.

We could generate this with oxc_ast_tools. But if that's too much faff, could also just write it out manually.

I know everyone but me sees Address as a hack, but it's really not. It's a fundamental of things in memory that they have a memory address! Why not utilize that?

@Dunqing
Copy link
Member Author

Dunqing commented Oct 17, 2024

I have no preference for either way. But this way I have a concern, we have many places where mutate directly the statement or call drain(). I think this could change the memory address, don't you? If not, I can try this way.

@overlookmotel
Copy link
Collaborator

overlookmotel commented Oct 17, 2024

Address represents the memory address of the Statement's contents, not the Statement itself. i.e. for Statement::BlockStatement, stmt.address() gives you memory address of the BlockStatement, not the Statement.

Statement consists of:

  1. Enum discriminant - 1 byte, which says what type of statement it is.
  2. A Box<T> - the "payload" e.g. Box<BlockStatement>.

Box is just a pointer. So you can move a Box, and its contents don't move - the pointer has moved, but it's still pointing to the same place.

Therefore, you can also move a Statement, and its "contents" (e.g the BlockStatement) doesn't move. So the Address doesn't change, because the BlockStatement is still in the same place.

So, how does this apply to Vec<Statement>?

Vec<Statement> is an array of Statements. As discussed above, you can move Statements around, and stmt.address() will still give you the same result. Vec::drain, Vec::retain, Vec::insert etc shuffle the Statements around in memory, but they don't move the contents of the statements. So those methods will not alter Addresses.

let mut stmts = Vec::new_in(allocator);

stmts.push(Statement::EmptyStatement(Box::new_in(
    EmptyStatement { span: Span::new(0, 0) },
    allocator,
)));

// Get address of the statement at index 0
let address = stmts[0].address();

// Insert another statement before it
stmts.insert(
    0,
    Statement::EmptyStatement(Box::new_in(EmptyStatement { span: Span::new(1, 1) }, allocator)),
);

// The original statement is now at index 1, but its `Address` has not changed
assert_eq!(stmts[1].address(), address);

Mutating a BlockStatement, Function etc in place will also not change its Address.

The only thing that will affect Address is if you replace a Statement with a new Statement.

If you take ownership of a Statement, unbox it, and then reallocate it into the arena again, that will also affect Address, because essentially it's the same as replacing the Statement with a new one. But I don't think we do that anyway, and if we do, we shouldn't because that's inefficient - why create a new Statement when you can just mutate the existing one?

The one tricky situation is if you replace a statement e.g. convert function f() {} statement to var f = function f() {};. The var statement is a new statement, so obviously will not have same address as the original. But that'd be the same with node IDs - the var statement would have a different ID.

So... TLDR... I think Address should broadly give us what we need for this.

@Dunqing
Copy link
Member Author

Dunqing commented Oct 18, 2024

@overlookmotel I got your point, I am trying it

@Dunqing
Copy link
Member Author

Dunqing commented Oct 18, 2024

Here a case can be replaced with StatementInjector (#6653). But I'm stuck on how to get address by Ancestor

if let Ancestor::VariableDeclaratorInit(declarator) = ctx.parent() {
// Special case when a function would get an inferred name:
// let Foo = () => {}
// let Foo = function() {}
// We'll add signature it on next line so that
// we don't mess up the inferred 'Foo' function name.
// Result: let Foo = () => {}; __signature(Foo, ...);
let id = declarator.id().get_binding_identifier().unwrap();
let symbol_id = id.symbol_id.get().unwrap();
let first_argument = Argument::from(ctx.ast.expression_from_identifier_reference(
ctx.create_bound_reference_id(
SPAN,
id.name.clone(),
symbol_id,
ReferenceFlags::Read,
),
));
arguments.insert(0, first_argument);
let statement = ctx.ast.statement_expression(
SPAN,
ctx.ast.expression_call(
SPAN,
Self::create_identifier_reference_from_binding_identifier(
&binding_identifier,
ctx,
),
NONE,
arguments,
false,
),
);
self.extra_statements.entry(symbol_id).or_insert(ctx.ast.vec()).push(statement);
return;
}

The above logic handles the following code, and determines whether the parent of the current node is ⁠Ancestor::VariableDeclaratorInit, if yes, transform and add a new statement to self.extra_statements

const statement = function b() {}

I want to use StatementInjector to replace self.extra_statements, so I need to get the Address of Statement::VariableDeclaration by Ancestor

Do you think it is possible?

@overlookmotel
Copy link
Collaborator

I think it should be possible to add support for getting Address from an Ancestor without UB. I'll try and figure that out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category - New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants