Unused types, constants, vars #232

Open
Konstantin8105 opened this issue Oct 3, 2017 · 19 comments

Comments

@Konstantin8105
Contributor

Konstantin8105 commented Oct 3, 2017

Problem

After transpiling this code:

#include <stdio.h>
int main(void)
{
	printf("Hello World!\n");
	return 0;
}

We see many unused types in the output, for example:

type __int32_t int
type __uint32_t uint32
type __int64_t int32
type __uint64_t uint32
type __quad_t int32
type __u_quad_t uint32
type __dev_t uint32
type __uid_t uint32
type __gid_t uint32
type __ino_t uint32
type __ino64_t uint32
type __mode_t uint32
...

Solution

  1. Install https://github.com/alecthomas/gometalinter
  2. Run the unused tool: ./unused ./demo/hello.go
  3. Read the output:
demo/hello.go:14:6: type __int128_t is unused (U1000)
demo/hello.go:15:6: type __uint128_t is unused (U1000)
demo/hello.go:16:6: type __builtin_ms_va_list is unused (U1000)
demo/hello.go:17:6: type size_t is unused (U1000)
demo/hello.go:18:6: type __u_char is unused (U1000)
demo/hello.go:19:6: type __u_short is unused (U1000)
demo/hello.go:20:6: type __u_int is unused (U1000)
demo/hello.go:21:6: type __u_long is unused (U1000)
demo/hello.go:22:6: type __int8_t is unused (U1000)
...
  4. Parse the output and remove the unused elements.
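
The parsing step could be sketched roughly as follows. This is only an illustration that assumes the output keeps the `file:line:col: kind name is unused (U1000)` shape shown above; `unusedEntry` and `parseUnused` are hypothetical names, not part of c2go or of the linter.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// unusedEntry is one finding reported by the linter (hypothetical type).
type unusedEntry struct {
	File string
	Line int
	Kind string // "type", "const", "var", ...
	Name string
}

// unusedLine matches lines such as:
//   demo/hello.go:14:6: type __int128_t is unused (U1000)
var unusedLine = regexp.MustCompile(`^(.+?):(\d+):\d+: (\w+) (\S+) is unused \(U1000\)$`)

// parseUnused extracts the findings and silently skips lines that do not match.
func parseUnused(lines []string) []unusedEntry {
	var entries []unusedEntry
	for _, l := range lines {
		m := unusedLine.FindStringSubmatch(l)
		if m == nil {
			continue
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		entries = append(entries, unusedEntry{File: m[1], Line: n, Kind: m[3], Name: m[4]})
	}
	return entries
}

func main() {
	out := []string{
		"demo/hello.go:14:6: type __int128_t is unused (U1000)",
		"demo/hello.go:17:6: type size_t is unused (U1000)",
		"...",
	}
	for _, e := range parseUnused(out) {
		fmt.Printf("%s line %d: %s %s\n", e.File, e.Line, e.Kind, e.Name)
	}
}
```

The entries only say where the unused declarations are; removing them from the Go file is a separate step.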

I tried doing this by hand, and the result looks like this:

// Warning (TypedefDecl): %!s(int=333): function pointers are not supported
// Warning (TypedefDecl): %!s(int=341): function pointers are not supported
// Warning (TypedefDecl): %!s(int=350): function pointers are not supported
// Warning (TypedefDecl): %!s(int=353): function pointers are not supported
// Warning (VarDecl): %!s(int=27): probably an incorrect type translation 2

package main

import "github.com/elliotchance/c2go/noarch"

var stdin *noarch.File
var stdout *noarch.File
var stderr *noarch.File

func main() {
	__init()
	noarch.Printf([]byte("Hello World!\n\x00"))
	return
}
func __init() {
	stdin = noarch.Stdin
	stdout = noarch.Stdout
	stderr = noarch.Stderr
}
@elliotchance
Owner

I have thought about this issue. There are not only many built-in types but also many functions shipped with the standard headers that add a lot of bloat to the output.

I would prefer to build this into the tool (rather than using a third-party command) because it's very possible that the logic and exclusions will become more complex over time.

I'm happy to strip all of this out by default and use a CLI argument like -keep-unused if users do want to retain the full output.

@Konstantin8105
Contributor Author

Example of a terminal command for keeping unused types, etc.: c2go transpile -keep-unused hello.c
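
A minimal sketch of how such a flag might be parsed with the standard `flag` package. The names `transpileOptions` and `parseTranspileArgs` are assumptions made for illustration, and this is not the actual c2go argument handling.

```go
package main

import (
	"flag"
	"fmt"
)

// transpileOptions holds the options of a hypothetical "transpile" subcommand.
type transpileOptions struct {
	keepUnused bool
	inputFiles []string
}

// parseTranspileArgs parses the arguments that follow "c2go transpile".
func parseTranspileArgs(args []string) (transpileOptions, error) {
	var opts transpileOptions
	fs := flag.NewFlagSet("transpile", flag.ContinueOnError)
	// Unused elements are stripped by default; the flag opts back in.
	fs.BoolVar(&opts.keepUnused, "keep-unused", false,
		"keep unused types, constants and variables in the output")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	opts.inputFiles = fs.Args()
	return opts, nil
}

func main() {
	opts, err := parseTranspileArgs([]string{"-keep-unused", "hello.c"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.keepUnused, opts.inputFiles)
}
```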

@Konstantin8105
Contributor Author

After transpiling prime.c and removing the unused variables, types, etc., the result looks different from the example in README.md:

package main

import "unsafe"

import "github.com/elliotchance/c2go/noarch"

var stdin *noarch.File
var stdout *noarch.File
var stderr *noarch.File

func main() {
	__init()
	var n int
	var c int
	noarch.Printf([]byte("Enter a number\n\x00"))
	noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
	noarch.Printf([]byte("The number is: %d\n\x00"), n)
	if n == 2 {
		noarch.Printf([]byte("Prime number.\n\x00"))
	} else {
		for c = 2; c <= n-1; func() int {
			c += 1
			return c
		}() {
			if n%c == 0 {
				break
			}
		}
		if c != n {
			noarch.Printf([]byte("Not prime.\n\x00"))
		} else {
			noarch.Printf([]byte("Prime number.\n\x00"))
		}
	}
	return
}

func __init() {
	stdin = noarch.Stdin
	stdout = noarch.Stdout
	stderr = noarch.Stderr
}

May I change the README.md?

@elliotchance
Owner

Yes, please update the README.

@Konstantin8105
Contributor Author

We cannot use the unused tool, because for this Go code:

package main

import "fmt"

type number int

const (
	zero  number = 0
	one          = 1
	two          = 2
	three        = 3
)

func main() {
	for i := int(zero); i < int(three); i++ {
		fmt.Printf("%d.\t%#v\n", i, number(i))
	}
}

the tool reports:

○ → ../unused main.go 
main.go:9:2: const one is unused (U1000)
main.go:10:2: const two is unused (U1000)

But this result is wrong.

@Konstantin8105
Contributor Author

Konstantin8105 commented Oct 20, 2017

The main point of this issue is clean resulting Go code.
Now I understand that the 'unused' tool is the wrong way.
So we can choose another way, following these steps:

  • create a copy of the input C code
  • rename all system headers in the copy, as in the experiment below
  • transpile as usual
  • add to the output Go code an import of the Go implementation of the C system headers

Experiment:
We have some simple C code:
File file.c

#include<stdio.h>
int main(){
        int a = 42;
        printf("We have number : %d", 42);
        return 0;
}

Let's change it a little bit:
File file2.c

#include<stdio_fake.h> // We changed the name of the system header
int main(){
        int a = 42;
        printf("We have number : %d", 42);
        return 0;
}

Create a file stdio_fake.h:

void printf(const char * format, ...){
}

Run clang like this: clang -E file2.c -I"./" and we get a clean result:

# 1 "file2.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 317 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "file2.c" 2
# 1 "./stdio_fake.h" 1

void printf(const char * format, ...){
}
# 2 "file2.c" 2

int main(){
 int a = 42;
 printf("We have number : %d", 42);
 return 0;
}

One advantage of this solution is that the system header files become platform independent.

Another solution:
in the file main.go we have the following line:

		cmd := exec.Command("clang", "-E", args.inputFile)

We also know one simple thing about the result:

# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4    <------ IMPORTANT
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
# 36 "/usr/include/stdio.h" 2 3 4                             <------ IMPORTANT
struct _IO_FILE;

We know which file gives us each function, type, struct, etc. According to the last example, these are types.h and stdio.h.
So we can ignore those entities while transpiling, and in the end we will have clean Go code.
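
A rough sketch of reading such a marker line in Go. The names `lineMarker` and `parseLineMarker` are hypothetical and this is not the parser in the c2go preprocessor package. It relies on the documented GCC/clang convention that flag 3 after the file name marks text coming from a system header.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lineMarker is a parsed preprocessor line such as:
//   # 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4
type lineMarker struct {
	Line  int
	File  string
	Flags []int
}

// parseLineMarker returns false for any line that is not a marker.
func parseLineMarker(s string) (lineMarker, bool) {
	var m lineMarker
	if !strings.HasPrefix(s, "# ") {
		return m, false
	}
	first := strings.Index(s, `"`)
	last := strings.LastIndex(s, `"`)
	if first < 0 || last <= first {
		return m, false
	}
	n, err := strconv.Atoi(strings.TrimSpace(s[2:first]))
	if err != nil {
		return m, false
	}
	m.Line = n
	m.File = s[first+1 : last]
	for _, f := range strings.Fields(s[last+1:]) {
		v, err := strconv.Atoi(f)
		if err != nil {
			return m, false
		}
		m.Flags = append(m.Flags, v)
	}
	return m, true
}

// isSystemHeader reports whether flag 3 (system header) is set.
func (m lineMarker) isSystemHeader() bool {
	for _, f := range m.Flags {
		if f == 3 {
			return true
		}
	}
	return false
}

func main() {
	m, ok := parseLineMarker(`# 36 "/usr/include/stdio.h" 2 3 4`)
	fmt.Println(ok, m.Line, m.File, m.isSystemHeader())
}
```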

If we use either of these solutions, or something similar, then maybe we can also solve issue #237.

Or we can take some time to think about another solution.

@Konstantin8105
Contributor Author

I will prepare the prototype.

@Konstantin8105
Contributor Author

Example of prime.c after changing the preprocessor, without any hand editing:

package main

import "os"
import "io/ioutil"
import "testing"
import "unsafe"
import "github.com/elliotchance/c2go/noarch"

type __int128_t int64
type __uint128_t uint64
type __builtin_ms_va_list []byte

func main() {
	__init()
	var n int
	var c int
	noarch.Printf([]byte("Enter a number\n\x00"))
	noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
	noarch.Printf([]byte("The number is: %d\n\x00"), n)
	if n == 2 {
		noarch.Printf([]byte("Prime number.\n\x00"))
	} else {
		for c = 2; c <= n-1; func() int {
			c += 1
			return c
		}() {
			if n%c == 0 {
				break
			}
		}
		if c != n {
			noarch.Printf([]byte("Not prime.\n\x00"))
		} else {
			noarch.Printf([]byte("Prime number.\n\x00"))
		}
	}
	return
}
func TestApp(t *testing.T) {
	os.Chdir("../../..")
	ioutil.WriteFile("build/stdin", []byte{'7'}, 0777)
	stdin, _ := os.Open("build/stdin")
	noarch.Stdin = noarch.NewFile(stdin)
	main()
}
func __init() {
}

@Konstantin8105
Contributor Author

Algorithm of the preprocessor:

  1. Take the file pp.c
  2. Split pp.c into parts. Example of one part:
# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4    <------ HEAD
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;

In HEAD we see 2 important elements:

  • Number 28 - the position in the source of the system header
  • "/.../types.h" - the system header
  3. Get the list of user files - see STD lib identification wrong #237
  4. Calculate a big number, for example MAX(position in source from each HEAD) + number of lines in pp.c. Call that number UserPosition.
  5. Change the position in source for the user source
  6. Write the preprocessor file pp.c with the new position in source for the user source
  7. Transpile with one simple rule: if an element of the clang AST (Decl, ...) has a position in source less than UserPosition, don't transpile it.
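
The UserPosition calculation and the renumbering of user HEAD lines might look like the sketch below. The helper names `userPosition` and `shiftUserHeads` are made up for illustration, and the list of user files is assumed to be already known.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// head matches a HEAD line and captures the position and the file name.
var head = regexp.MustCompile(`^# (\d+) "([^"]*)"`)

// userPosition returns MAX(position in every HEAD) + number of lines in pp.
func userPosition(pp string) int {
	lines := strings.Split(pp, "\n")
	highest := 0
	for _, l := range lines {
		m := head.FindStringSubmatch(l)
		if m == nil {
			continue
		}
		if n, err := strconv.Atoi(m[1]); err == nil && n > highest {
			highest = n
		}
	}
	return highest + len(lines)
}

// shiftUserHeads adds offset to the position of every HEAD that names a
// user file, so all user positions end up above the offset.
func shiftUserHeads(pp string, userFiles map[string]bool, offset int) string {
	lines := strings.Split(pp, "\n")
	for i, l := range lines {
		m := head.FindStringSubmatch(l)
		if m == nil || !userFiles[m[2]] {
			continue
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		// Keep whatever follows the file name (the flags) unchanged.
		lines[i] = fmt.Sprintf("# %d \"%s\"", n+offset, m[2]) + l[len(m[0]):]
	}
	return strings.Join(lines, "\n")
}

func main() {
	pp := "# 1 \"f.c\"\n# 28 \"/usr/include/stdio.h\" 1 3 4\ntypedef int x;\n# 2 \"f.c\" 2\nint main(){}"
	up := userPosition(pp)
	fmt.Println("UserPosition:", up)
	fmt.Println(shiftUserHeads(pp, map[string]bool{"f.c": true}, up))
}
```

After this rewrite, any declaration whose position is below UserPosition can be treated as coming from a system header.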

@Konstantin8105
Contributor Author

Konstantin8105 commented Oct 25, 2017

Problem

@Konstantin8105
Contributor Author

Maybe another solution:
for example, the user runs c2go transpile file.c
but at the end c2go creates 2 files:
file.go - the transpiled user code
system.go - the transpiled code of the system C headers

This needs approval before I act.

@Konstantin8105
Contributor Author

@elliotchance I need your approval or a comment on my last message.

@elliotchance
Owner

You will not be able to split up user and system code. In simple examples this makes sense but in more complicated examples the same header file can be different when used in different ways.

The safest course of action is to deal with the duplicate logic after the transpile is complete. For example, let's say you run:

c2go transpile foo.c bar.c

Will produce foo.go and bar.go. If they both included the same header files (which they probably did) you will see a lot of duplicate code between the files. At this stage you need to identify the functions and types that appear in more than one output file and extract them to a common file.

This solution would take in one or more Go files and produce new files, with an extra common file:

some_command foo.go bar.go

Produces a common.go and new foo.go and bar.go that do not include the elements in common.go.

It's not going to be possible to handle this duplicate code in the preprocessing stage because there are many decisions made during and after the transpiling that affect how the code is generated. It also won't be possible to split files by their include path/name. You should only rely on the input files to produce output Go files with the same name.

Fundamentally this is not a difficult task (to extract the duplicates to a common file). There are already tools to parse and traverse the Go code easily (you only need to pay attention to the global types and names of the top level functions) and reliably extract parts of the AST to be written to another file.
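
As a rough illustration of that idea, the standard go/parser and go/ast packages can list the top-level names of each output file and intersect them. The helpers `topLevelNames` and `commonNames` are hypothetical, and this sketch compares names only; a real tool would presumably also need to check that the declarations are identical before moving them to a common file.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// topLevelNames returns the names of top-level functions and types in src.
func topLevelNames(src string) (map[string]bool, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		return nil, err
	}
	names := map[string]bool{}
	for _, d := range f.Decls {
		switch d := d.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil { // skip methods
				names[d.Name.Name] = true
			}
		case *ast.GenDecl:
			for _, s := range d.Specs {
				if ts, ok := s.(*ast.TypeSpec); ok {
					names[ts.Name.Name] = true
				}
			}
		}
	}
	return names, nil
}

// commonNames returns the sorted names that appear in both sets.
func commonNames(a, b map[string]bool) []string {
	var out []string
	for n := range a {
		if b[n] {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	foo := "package main\ntype size_t uint32\nfunc foo() {}\n"
	bar := "package main\ntype size_t uint32\nfunc bar() {}\n"
	a, errA := topLevelNames(foo)
	b, errB := topLevelNames(bar)
	if errA != nil || errB != nil {
		fmt.Println("parse error:", errA, errB)
		return
	}
	fmt.Println("candidates for common.go:", commonNames(a, b))
}
```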

I am trying to think of a scenario where the macros in separate files will resolve to different Go code. I can't think of any immediate examples but I have a feeling there will be some, and these will be tricky to deal with. That is a challenge for another day.

As of v0.17.0 (thanks to your awesome code) we can support multiple input files that get preprocessed and transpiled into a single output file. This is a great first step. This solution would work the same way except each input file would generate its own output file (much like input C files for clang produce a one-for-one .o file). Then we add on this stage and we have a much more robust way of dealing with multiple files.

@Konstantin8105
Contributor Author

Konstantin8105 commented Nov 23, 2017

One more feature of the preprocessor design: it can easily resolve duplicates of system include files, for example:
https://github.com/Konstantin8105/c2go/blob/c121213007e93e8baa745e3903ee2b9ab1f207b2/main_test.go#L342-L349
Here we see a duplicate of the ./tests/multi/case1/four.c file. During one of the review steps, we removed that test to minimize testing.
As I remember, we can now transpile C code like this without any duplicates in the Go code:

#include<stdio.h>
#include<stdio.h> // <--- Duplicate
#include<stdio.h> // <--- Duplicate
int main(){
    printf("All is OK");
    return 0;
}

One more thing: after the clang -E command we have a single preprocessed C file, and inside it we can see tags (https://github.com/elliotchance/c2go/blob/master/preprocessor/parse_include_preprocessor_line_test.go#L21):

...
# 26 "/usr/include/x86_64-linux-gnu/bits/sys_errlist.h" 3 4
...
# 2 "f.c" 2
...

According to that, we can easily separate them. In this case f.c is a user file, so it is transpiled to f.go, and /usr/include/x86_64-linux-gnu/bits/sys_errlist.h goes to common.go.

@elliotchance
Owner

This would only work in cases where the headers are guaranteed to be exactly the same, which you can't guarantee or check for. System header files and regular header files need to be treated the same way, there is nothing special about a header file other than its name is common across some platforms.

Here is a concrete example of why this will not work:

errors.h:

void ERROR_FUNC() {
    printf("ERROR!");
}

main.c:

#define ERROR_FUNC error
#include "errors.h"

#undef ERROR_FUNC
#define ERROR_FUNC error2
#include "errors.h"

// We now have two different functions from a header file that is "dynamic".

This may seem like a silly example but it shows how the same header can be included to resolve to different code. You cannot deal with the duplicates at the preprocess stage, it's impossible. No compilers work like this for these reasons.

You must transpile each input C file independently, then deal with the duplicates as a Go AST problem, not as a C/preprocessor problem.

@Konstantin8105
Contributor Author

#120

@Konstantin8105
Contributor Author

Konstantin8105 commented Dec 19, 2017

Now a new idea: cleaning in a postprocessing step.

0) At the end of transpiling we have Go code.
1) Find all function names in the Go code and save them in a list. For example: "freeMatrix(), freeVactor() ..."
2) Print the Go code without comments to a temp file.
3) If a name from the function list is found more than once, the function is used and is removed from the list.
4) Remove the unused functions from the Go code.
5) Save the Go code without the unused functions.
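
Steps 1 to 3 could be sketched like this. Instead of printing the code without comments to a temp file, the sketch counts identifiers in the parsed AST, which ignores comments as well; that is a substitution for the text search, not the method described above. `unusedFuncs` is a hypothetical helper, and like the text search it can be fooled by a variable that shares a name with a function.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// unusedFuncs returns the top-level functions whose name occurs only once
// in the file, i.e. only in the declaration itself.
func unusedFuncs(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		return nil, err
	}
	// Step 1: collect the names of all top-level functions.
	count := map[string]int{}
	for _, d := range f.Decls {
		if fd, ok := d.(*ast.FuncDecl); ok && fd.Recv == nil {
			count[fd.Name.Name] = 0
		}
	}
	// Steps 2-3: count every identifier with one of those names.
	ast.Inspect(f, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok {
			if _, tracked := count[id.Name]; tracked {
				count[id.Name]++
			}
		}
		return true
	})
	var unused []string
	for name, c := range count {
		// main and init are entry points and are never called by name.
		if c <= 1 && name != "main" && name != "init" {
			unused = append(unused, name)
		}
	}
	sort.Strings(unused)
	return unused, nil
}

func main() {
	src := "package main\nfunc freeMatrix() {}\nfunc freeVactor() {}\nfunc main() { freeMatrix() }\n"
	names, err := unusedFuncs(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("unused:", names)
}
```

Removing one function can make another one unused, so the pass may need to be repeated until the list is empty.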

@elliotchance, please comment.

@elliotchance
Owner

@Konstantin8105 yes that sounds good.

@Konstantin8105
Contributor Author

Now we can identify the location of a struct, variable, etc. in the C source.
So we can create an ignore list of C headers, such as time.h, and if a struct comes from one of those headers, we ignore it.
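
A very small sketch of such an ignore list, assuming the location is available as a file path. `ignoredHeaders` and `shouldIgnore` are hypothetical names, and matching on the base name alone would also catch sys/time.h, so a real list might need full paths.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ignoredHeaders is a hypothetical ignore list keyed by header base name.
var ignoredHeaders = map[string]bool{
	"time.h": true,
}

// shouldIgnore reports whether a declaration located in the given C source
// file should be skipped during transpiling.
func shouldIgnore(location string) bool {
	return ignoredHeaders[filepath.Base(location)]
}

func main() {
	fmt.Println(shouldIgnore("/usr/include/time.h"))
	fmt.Println(shouldIgnore("main.c"))
}
```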
