Newline in fixed length COBOL source files #59
Comments
PROBLEM 1 : |
PROBLEM 2 : |
FIX ?
I implemented and tested this fix with success on our sample files. |
… I don't know if it is a good idea to merge this fix into master
Dropping support of UNIX-style files without correcting the bug would indeed be disappointing. As I understand the problem,
|
RDZ has the same problem and can't interpret such a file correctly. But ... So instead of looking for line ending chars, I think it's better to parse 80 chars and consider this as a whole line. Of course this should be one behavior of the parser and must be configurable (use fixed line length or use line ending chars).
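To make the two behaviors discussed above concrete, here is a minimal Python sketch. It is not the TypeCobol implementation: the function name `split_lines` and its `fixed_length` parameter are hypothetical, and the 80-char record length is simply the value mentioned in this thread.

```python
import re
from typing import List, Optional

# Line endings recognized in "delimited" mode: \r\n, single \r, single \n.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text: str, fixed_length: Optional[int] = None) -> List[str]:
    """Split source text into lines (hypothetical helper, for illustration).

    fixed_length=None -> split on line ending chars.
    fixed_length=N    -> cut the text every N chars; line ending chars
                         found inside a record are kept as plain data.
    """
    if fixed_length is None:
        return LINE_ENDING.split(text)
    return [text[i:i + fixed_length] for i in range(0, len(text), fixed_length)]


if __name__ == "__main__":
    # Two 80-char records; the first one contains a stray newline.
    rec1 = "       MOVE 'A\nB' TO X.".ljust(80)
    rec2 = "       DISPLAY X.".ljust(80)
    text = rec1 + rec2
    print(split_lines(text))      # first record is cut in two at the newline
    print(split_lines(text, 80))  # both records come back intact
```

In delimited mode the stray newline cuts the first record in two, which is the situation reported in this issue. In fixed-length mode the record boundaries come from the length alone, so the newline stays inside the line.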
Yup, but this 80-char limit makes no sense in free format, and the sexay thing is |
Maybe one solution is to have 2 implementations of ITextDocument:
|
Our friend Regis is right here : the idea was to restrict the knowledge of the text storage format (encoding and line endings) to the File namespace, i.e. for now the CobolFile class. The CobolFile implementation normalizes the input as a stream of Unicode chars with \r, \n, or \r\n line endings. The later phases of the compiler, notably the Text namespace / TextDocument class, don't need to worry about the storage format anymore.
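Document clearly two restrictions of our compiler :
• because of the internal conversion of the program text to Unicode characters in .Net or Java, we do not support alphanumeric literals containing non printable EBCDIC characters
• because of the feature allowing free text format and variable line length, we do not support alphanumeric literals containing line ending characters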
NB : when we say we do not support these two cases, it will only have an impact if we generate Cobol from a TypeCobol program and then compile it with the IBM compiler. For Cobol code analysis in memory, it has no impact.
|
NB : this new fix won't resolve the problems found on our sample files in ASCII format, because it corrects the EBCDIC to Unicode conversion process, which has already been executed beforehand in this case.
Sorry, I cannot push the new commit today, because it appears that I can't reach the GitHub server while using the VPN -> I will push it Monday.
…sue #59 :
• restore support for single \r and \n characters as line endings in TextDocument (revert to the previous version of the file)
• update CobolFile class : when reading a fixed length line, if we encounter a line ending character after Unicode conversion of an original EBCDIC character, replace it on the fly with a question mark '?' char
Document clearly two restrictions of our compiler :
• because of the internal conversion of the program text to Unicode characters in .Net or Java, we do not support alphanumeric literals containing non printable EBCDIC characters
• because of the feature allowing free text format and variable line length, we do not support alphanumeric literals containing line ending characters
NB : when we say we do not support these two cases, it will only have an impact if we generate Cobol from a TypeCobol program and then compile it with the IBM compiler. For Cobol code analysis in memory, it has no impact.
In the two cases above, the solution is to modify the original EBCDIC program text before using our tool :
• initialize numeric tables directly with numbers instead of their corresponding chars
• set line ending chars individually inside alphanumeric literals, for example with reference modification
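The CobolFile change described in this commit message can be sketched roughly as follows. This is a Python illustration, not the actual CobolFile code: the function name `read_fixed_length_lines` is hypothetical, and `cp037` (one of Python's built-in EBCDIC codecs) is an assumed code page, since the thread does not say which one the sample files use.

```python
from typing import List

# After conversion to Unicode, these chars would be mistaken for line endings
# by later phases, so they are replaced with '?' while reading.
_LINE_ENDING_TO_QUESTION_MARK = {ord("\r"): "?", ord("\n"): "?"}


def read_fixed_length_lines(
    data: bytes,
    line_length: int = 80,      # record length mentioned in the thread
    encoding: str = "cp037",    # assumption: actual code page is not stated
) -> List[str]:
    """Decode fixed-length EBCDIC records into Unicode lines (illustrative)."""
    lines = []
    for start in range(0, len(data), line_length):
        record = data[start:start + line_length]
        text = record.decode(encoding)
        # Replace line ending chars that came from original EBCDIC bytes.
        lines.append(text.translate(_LINE_ENDING_TO_QUESTION_MARK))
    return lines


if __name__ == "__main__":
    # Build two EBCDIC records; the first one embeds a newline in a literal.
    rec1 = "       MOVE 'A\nB' TO X.".ljust(80).encode("cp037")
    rec2 = "       DISPLAY X.".ljust(80).encode("cp037")
    for line in read_fixed_length_lines(rec1 + rec2):
        print(repr(line))
```

The replacement is lossy: the original byte cannot be recovered from the '?', which is why the restrictions above only matter when Cobol is generated from the converted text and compiled again.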
@prudholu largely solved the problem, and I added the identified restrictions to the appropriate wiki page. |
Seen in a production source file : some lines contain newline characters (although normally these cannot exist in such sources, it seems this one was manually/hexadecimally edited).
This ruins the indexing, and results in missing PeriodSeparator errors on the first "half" of the line (the second "half" being after the newline character).