139 – GDC writes all-zero initialisers in the rodata section

Status:	NEW

Alias:	None

Product:	GDC
Classification:	Unclassified
Component:	gdc
Version:	development
Hardware:	All All

Importance:	--- enhancement
Assignee:	Iain Buclaw

URL:

Depends on:
Blocks:

Reported:	2014-07-12 14:58 CEST by safety0ff
Modified:	2017-03-02 22:01 CET
CC List:	2 users

See Also:

Description safety0ff 2014-07-12 14:58:23 CEST

This causes 64 KiB of zero data to be needlessly written to the object file in the following test:
struct Buffer { ubyte[64 * 1024] buffer; }
$ gdc test.d -c && du test.o
68        test.o    #  Instead of expected 4

Comment 1 Iain Buclaw 2014-07-12 15:07:50 CEST

FYI, the backend does *almost* the right thing.


        .globl  _D4test6Buffer6__initZ
        .section        .rodata
        .align 64
        .type   _D4test6Buffer6__initZ, @object
        .size   _D4test6Buffer6__initZ, 65536
_D4test6Buffer6__initZ:
        .zero   65536


It correctly recognises that the initialiser is all zeros and reduces it to .zero 65536.  Unfortunately, because there *is* an initialiser in the first place, it is put in .section .rodata instead of being optimised for size and put in .bss.

What we should be doing is checking initializer_zerop on the result and, if it is all zeros, undoing our work.

This has a run-time cost (it would be nice to have a test for all zeros before building and discarding trees) but for the size reduction, it would be worth it.
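
The check described in this comment can be sketched outside the compiler. The C fragment below is only an illustration under stated assumptions: `is_all_zero`, `pick_placement` and `enum placement` are hypothetical names operating on a raw byte image. They are not GCC's initializer_zerop (which works on trees) and not the code GDC actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where an initialiser image could be emitted. */
enum placement { PLACE_RODATA, PLACE_BSS };

/* Hypothetical helper: returns 1 if every byte of the image is zero. */
int is_all_zero(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (data[i] != 0)
            return 0;
    return 1;
}

/* Size-oriented policy: an all-zero image needs no stored bytes,
   so it can live in .bss; anything else stays in .rodata. */
enum placement pick_placement(const unsigned char *data, size_t len)
{
    return is_all_zero(data, len) ? PLACE_BSS : PLACE_RODATA;
}
```

Passing a 65536-byte all-zero buffer to `pick_placement` selects `PLACE_BSS`, and setting a single byte to a non-zero value switches the result to `PLACE_RODATA`. The scan is linear in the size of the image, which is the run-time cost mentioned above.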

Comment 2 Iain Buclaw 2014-07-12 15:55:39 CEST

https://github.com/D-Programming-GDC/GDC/commit/0fcf8babc0d0af85a9a04aaa23b5856237fbdb9f

Comment 3 Iain Buclaw 2014-07-13 08:27:47 CEST

Before:
52976	libgphobos2.a
5980	libgdruntime.a

After:
52824	libgphobos2.a
5976	libgdruntime.a

I guess this means that phobos doesn't have many 0-inited symbols. ;)

Comment 4 Johannes Pfau 2014-08-16 08:52:38 CEST

Timo Sintonen noted that this is actually counterproductive, at least on embedded systems: .bss is read-write storage, whereas .rodata can be in read-only storage.
http://forum.dlang.org/post/nadodelkzuwtrnquoove@forum.dlang.org

I wonder whether the reason you don't see a difference in phobos is a string pooling optimization: I'd expect a clever linker to combine all .zero blocks in .rodata into one block with the size of the largest single block, then use 'slices' to that block.

Comment 5 safety0ff 2014-08-17 04:24:40 CEST

Context of this report: it was filed because GDC was failing a dmd test (I can't remember which now.)

Comment 6 Iain Buclaw 2014-08-20 14:35:02 CEST

(In reply to safety0ff from comment #5)
> Context of this report: it was filed because GDC was failing a dmd test (I
> can't remember which now.)

It wasn't failing the test per se.  The test in DMD requires a post-script to be run, something that does not happen for the GDC testsuite.

Comment 7 Johannes Pfau 2014-11-20 18:15:30 CET

I've been doing some experiments with D on microcontrollers lately (an AVR 8-bit hello-world (blinking LED) is working: https://github.com/jpf91/GDC/tree/microD ) and I came across this again.

This bugfix leads to the strange situation that zero initializers are a performance penalty on these systems, as RW memory is scarce, while initializers with one member not set to zero are put into .rodata and are therefore a better option.

GCC puts all zero-initialized const objects into .rodata as well:
------------------------------------------------------------
struct Test
{
    int a;
    int b;
};

const struct Test tb = {0,0};
------------------------------------------------------------
	.globl	tb
	.section	.rodata
	.align 4
	.type	tb, @object
	.size	tb, 8
tb:
	.zero	8
------------------------------------------------------------


So are there any objections against reverting this commit?
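
For contrast with the const object in the C example of this comment, here is a small sketch of how the writable cases usually differ. The names `tc`, `td` and `test_sum` are made up for illustration, and the section placement noted in the comments is what GCC typically does on ELF targets with default options, not a guarantee. It can be checked with `gcc -S` on the target in question.

```c
struct Test
{
    int a;
    int b;
};

/* const and all zero: emitted in .rodata, as in the listing above */
const struct Test tb = {0, 0};

/* hypothetical: writable and all zero, typically placed in .bss
   (unless -fno-zero-initialized-in-bss is given) */
struct Test tc = {0, 0};

/* hypothetical: writable with a non-zero member, typically placed in .data */
struct Test td = {1, 0};

/* Sums the members so the three objects can be checked at run time. */
int test_sum(const struct Test *t)
{
    return t->a + t->b;
}
```

All three objects behave the same at run time. Only the section differs, and on a microcontroller that decides whether the bytes occupy ROM or RAM.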

Comment 8 Iain Buclaw 2014-11-21 17:10:47 CET

I have no problems, but maybe we should make a switch for those who want smaller binaries over speed?

Comment 9 Johannes Pfau 2014-11-23 10:49:17 CET

Sure. Should the default still be rodata? Rodata is also used for normal immutable x = Struct(0,0,0) style variables, so that may make sense.

Comment 10 Johannes Pfau 2014-11-26 18:56:39 CET

Reverting this change exposes a test failure in phobos. However, every seemingly unrelated change hides the error, and gcc-4.9 works fine. So I wonder whether this is actually a bug in the GCC-5 snapshot?

Here's the reduced test case:
----------------
import core.stdc.string;

void test()
{
    struct S { @disable this();}
    S s = void;
    emplaceInitializer(&s);
}

T* emplaceInitializer(T)(T* chunk)
{
    static immutable init = T.init;
    memcpy(chunk, &init, T.sizeof);
    return chunk;
}
----------------
gdc conv.d -c

(compiler messages translated from German)
----------------
conv.d: In function 'emplaceInitializer':
conv.d:11: error: non-trivial conversion at assignment
ulong
void *
MEM[(unsigned char * {ref-all})chunk] = D.2533;
conv.d:11: internal compiler error: verify_gimple failed
0xb2dc8f verify_gimple_in_seq(gimple_statement_base*)
	../../gcc-5-20140831/gcc/tree-cfg.c:4672
0x930759 gimplify_body(tree_node*, bool)
	../../gcc-5-20140831/gcc/gimplify.c:8847
0x930b16 gimplify_function_tree(tree_node*)
	../../gcc-5-20140831/gcc/gimplify.c:8932
0x7a94b7 cgraph_node::analyze()
	../../gcc-5-20140831/gcc/cgraphunit.c:612
0x7abdad analyze_functions
	../../gcc-5-20140831/gcc/cgraphunit.c:988
0x7ac515 symbol_table::finalize_compilation_unit()
	../../gcc-5-20140831/gcc/cgraphunit.c:2277
0x6f487e d_finish_compilation(tree_node**, int)
	../../gcc-5-20140831/gcc/d/d-objfile.cc:1947
----------------

----------------
{
  void * D.2533;
  struct S * D.2534;

  D.2533 = {};
  MEM[(unsigned char * {ref-all})chunk] = D.2533;
  D.2534 = chunk;
  return D.2534;
}
----------------

Comment 11 Iain Buclaw 2017-02-18 10:29:21 CET

Is this still problematic?  I ask because now the old data generation pass is gone completely.

Comment 12 Timo Sintonen 2017-02-19 08:00:29 CET

(In reply to Iain Buclaw from comment #11)
> Is this still problematic?  I ask because now the old data generation pass
> is gone completely.

This bug was originally about failing tests, but while on the topic I want to add my point of view.

Anything in .rodata goes into the loadable file and into the code memory (ROM in microcontrollers).
Anything in .data goes into the file and into the code memory, from where it is copied into the data memory (RAM in microcontrollers).
Anything in .bss takes space only in data memory (except that the gold linker may put it into the file, while ld does not).

This means anything in .data consumes more resources than anything in the other segments. Then there are different goals: in desktop programs it may be desirable to have a smaller file, while in microcontrollers it is important to minimize the usage of RAM.
It seems that in typical controllers the ROM/RAM ratio is good for applications written in C, but applications written in D require more RAM. This means I would like to move as much as possible to the .rodata section. There is other data too, like classinfo.

Maybe there should be a switch that selects between desktop mode and microcontroller mode. This switch might also remove all of typeinfo or other unnecessary things.
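
The three placement rules listed in this comment can be written down as a small cost model. This is only a sketch of that description: `enum section`, `struct footprint` and `section_cost` are hypothetical names, and the model assumes the ld behaviour described above (nothing stored in the file for .bss).

```c
#include <assert.h>
#include <stddef.h>

enum section { SEC_RODATA, SEC_DATA, SEC_BSS };

struct footprint
{
    size_t file; /* bytes in the loadable file */
    size_t rom;  /* bytes of code memory (ROM) */
    size_t ram;  /* bytes of data memory (RAM) */
};

/* Cost of placing an object of `size` bytes in the given section. */
struct footprint section_cost(enum section sec, size_t size)
{
    struct footprint f = {0, 0, 0};

    switch (sec) {
    case SEC_RODATA:            /* stored in the file and ROM, read in place */
        f.file = size;
        f.rom = size;
        break;
    case SEC_DATA:              /* stored in the file and ROM, copied to RAM */
        f.file = size;
        f.rom = size;
        f.ram = size;
        break;
    case SEC_BSS:               /* only reserved in RAM */
        f.ram = size;
        break;
    default:
        assert(!"unknown section");
    }
    return f;
}
```

For the 65536-byte Buffer initialiser, this model gives 65536 bytes of file and ROM and no RAM for .rodata, but 65536 bytes of RAM and nothing else for .bss, which is the trade-off between the desktop and microcontroller goals.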

Comment 13 Iain Buclaw 2017-03-02 22:01:03 CET

What if all static vars were kept in .bss?  Does that come with the same cost?

If not, then this is probably a valid argument for implementing bug 246.