This causes 64 KiB of zero data to be needlessly written to the object file in the following test:

struct Buffer
{
    ubyte[64 * 1024] buffer;
}

$ gdc test.d -c && du test.o
68      test.o
# Instead of expected 4
FYI, the backend does *almost* the right thing.

        .globl  _D4test6Buffer6__initZ
        .section        .rodata
        .align 64
        .type   _D4test6Buffer6__initZ, @object
        .size   _D4test6Buffer6__initZ, 65536
_D4test6Buffer6__initZ:
        .zero   65536

It is able to correctly recognise that the initialiser is all zeros, and reduces it to .zero 65536. Unfortunately, the fact that there *is* an initialiser in the first place means that it puts the symbol in .section .rodata instead of optimising for size and putting it in .bss.

What we should be doing is checking initializer_zerop, then undoing our work. This has a run-time cost (it would be nice to have a test for all zeros before building and discarding trees), but for the size reduction it would be worth it.
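A cheap up-front test of that kind could look roughly like the C sketch below. It assumes the frontend can see the raw byte image of an initialiser before it builds any trees; is_all_zero is a hypothetical helper, not a GDC or GCC function (GCC's initializer_zerop works on trees, not on byte buffers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GDC or GCC): returns true if every
   byte of a static initialiser's image is zero. An empty image counts
   as all-zero. */
bool is_all_zero(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (data[i] != 0)
            return false;  /* one non-zero byte is enough to keep it */
    }
    return true;
}
```

If the check succeeds, the symbol could be emitted without an initialiser at all, so no constructor tree has to be built only to be discarded again after initializer_zerop.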
https://github.com/D-Programming-GDC/GDC/commit/0fcf8babc0d0af85a9a04aaa23b5856237fbdb9f
Before:
52976   libgphobos2.a
5980    libgdruntime.a

After:
52824   libgphobos2.a
5976    libgdruntime.a

I guess this means that phobos doesn't have many 0-inited symbols. ;)
Timo Sintonen noted that this is actually counter-productive, at least on embedded systems: .bss is read-write storage, whereas .rodata can be placed in read-only storage. http://forum.dlang.org/post/nadodelkzuwtrnquoove@forum.dlang.org

I wonder whether the reason you don't see a difference in phobos is a string pooling optimization: I'd expect a clever linker to combine all .zero blocks in .rodata into one block with the size of the largest single block, then use 'slices' into that block.
Context of this report: it was filed because GDC was failing a dmd test (I can't remember which now.)
(In reply to safety0ff from comment #5)
> Context of this report: it was filed because GDC was failing a dmd test (I
> can't remember which now.)

It wasn't failing the test per se. The test in DMD requires a post-script to be run, something that does not happen in the GDC testsuite.
I'm doing some experiments with D on microcontrollers lately (an AVR 8-bit hello world (blinking LED) is working: https://github.com/jpf91/GDC/tree/microD) and I came across this again. This bugfix leads to the strange situation that zero initializers are a penalty on these systems, as RW memory (RAM) is scarce, while initializers with even one member not set to zero are put into .rodata and are therefore the better option.

GCC puts zero-initialized const objects into .rodata as well:
------------------------------------------------------------
struct Test
{
    int a;
    int b;
};

const struct Test tb = {0,0};
------------------------------------------------------------
        .globl  tb
        .section        .rodata
        .align 4
        .type   tb, @object
        .size   tb, 8
tb:
        .zero   8
------------------------------------------------------------

So are there any objections against reverting this commit?
I have no problems, but maybe we should make a switch for those who want smaller binaries over speed?
Sure. Should the default still be .rodata? .rodata is also used for normal immutable x = Struct(0,0,0) style variables, so that may make sense.
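The switch discussed in the last two comments could be modelled as a simple placement policy. This is a minimal C sketch under stated assumptions: the enum names and pick_section are hypothetical (not an existing GDC option or GCC interface), and the compiler is assumed to already know whether an initializer is all zeros. It keeps .rodata as the default and only moves all-zero read-only initializers into .bss when the size-optimising mode is requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Output sections relevant to this discussion. */
enum out_section { SEC_RODATA, SEC_DATA, SEC_BSS };

/* Hypothetical switch (not an existing GDC option). */
enum zero_init_policy {
    ZERO_INIT_IN_RODATA, /* proposed default: read-only, can stay in ROM */
    ZERO_INIT_IN_BSS     /* opt-in: smaller object files, but costs RAM */
};

/* Choose an output section for a static object. */
enum out_section pick_section(bool is_readonly, bool init_all_zero,
                              enum zero_init_policy policy)
{
    assert(policy == ZERO_INIT_IN_RODATA || policy == ZERO_INIT_IN_BSS);

    if (!is_readonly)
        /* Writable data: zero images go to .bss, everything else to .data. */
        return init_all_zero ? SEC_BSS : SEC_DATA;

    /* Read-only data (e.g. a struct's __init symbol) is only moved to
       .bss when the size-optimising policy is selected. */
    if (init_all_zero && policy == ZERO_INIT_IN_BSS)
        return SEC_BSS;
    return SEC_RODATA;
}
```

Writable variables follow GCC's usual convention of zero-initialized data in .bss and everything else in .data; only read-only objects such as _D4test6Buffer6__initZ are affected by the policy.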
Reverting this change exposes a test failure in phobos. However, every seemingly unrelated change hides the error, and gcc-4.9 works fine. So I wonder whether this is actually a bug in the GCC-5 snapshot?

Here's the reduced test case:
----------------
import core.stdc.string;

void test()
{
    struct S { @disable this(); }
    S s = void;
    emplaceInitializer(&s);
}

T* emplaceInitializer(T)(T* chunk)
{
    static immutable init = T.init;
    memcpy(chunk, &init, T.sizeof);
    return chunk;
}
----------------
gdc conv.d -c
----------------
conv.d: In function 'emplaceInitializer':
conv.d:11: error: non-trivial conversion at assignment
ulong
void *
MEM[(unsigned char * {ref-all})chunk] = D.2533;
conv.d:11: internal compiler error: verify_gimple failed
0xb2dc8f verify_gimple_in_seq(gimple_statement_base*)
        ../../gcc-5-20140831/gcc/tree-cfg.c:4672
0x930759 gimplify_body(tree_node*, bool)
        ../../gcc-5-20140831/gcc/gimplify.c:8847
0x930b16 gimplify_function_tree(tree_node*)
        ../../gcc-5-20140831/gcc/gimplify.c:8932
0x7a94b7 cgraph_node::analyze()
        ../../gcc-5-20140831/gcc/cgraphunit.c:612
0x7abdad analyze_functions
        ../../gcc-5-20140831/gcc/cgraphunit.c:988
0x7ac515 symbol_table::finalize_compilation_unit()
        ../../gcc-5-20140831/gcc/cgraphunit.c:2277
0x6f487e d_finish_compilation(tree_node**, int)
        ../../gcc-5-20140831/gcc/d/d-objfile.cc:1947
----------------
{
  void * D.2533;
  struct S * D.2534;

  D.2533 = {};
  MEM[(unsigned char * {ref-all})chunk] = D.2533;
  D.2534 = chunk;
  return D.2534;
}
----------------
Is this still problematic? I ask because now the old data generation pass is gone completely.
(In reply to Iain Buclaw from comment #11)
> Is this still problematic? I ask because now the old data generation pass
> is gone completely.

This bug is originally about failing tests, but while on the topic I want to add my point of view.

Anything in .rodata goes into the loadable file and into the code memory (ROM in microcontrollers).
Anything in .data goes into the file and into the code memory, from where it is copied into the data memory (RAM in microcontrollers).
Anything in .bss takes space only in data memory (except that the gold linker may put it into the file, while ld does not).

This means anything in .data consumes more resources than anything in the other sections. Then there are different goals: in desktop programs it may be desirable to have a smaller file, while in microcontrollers it is important to minimize the usage of RAM. It seems that in typical controllers the ROM/RAM ratio is good for applications written in C, but applications written in D require more RAM. This means I would like to move as much as possible to the .rodata section. There is other data too, like classinfo.

Maybe there should be a switch that selects between desktop mode and microcontroller mode. This switch might also remove all of typeinfo or other unnecessary things.
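The per-section costs listed in the previous comment can be summarised as a small accounting model. This C sketch is illustrative only: the type and function names are made up for this example, and the model ignores alignment padding, the startup copy code, and the gold linker behaviour mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Output sections discussed in this thread. */
enum out_section { SEC_RODATA, SEC_DATA, SEC_BSS };

/* Bytes of code memory (ROM/flash) and data memory (RAM) used. */
struct mem_cost {
    size_t rom;
    size_t ram;
};

/* Hypothetical description of one static object. */
struct static_object {
    enum out_section sec;
    size_t size;
};

/* Model from the comment:
   .rodata -> ROM only
   .data   -> ROM (stored image) + RAM (runtime copy)
   .bss    -> RAM only */
struct mem_cost section_cost(enum out_section sec, size_t size)
{
    struct mem_cost c = {0, 0};
    switch (sec) {
    case SEC_RODATA:
        c.rom = size;
        break;
    case SEC_DATA:
        c.rom = size;
        c.ram = size;
        break;
    case SEC_BSS:
        c.ram = size;
        break;
    }
    return c;
}

/* Sum the ROM and RAM cost of a set of static objects. */
struct mem_cost total_cost(const struct static_object *objs, size_t n)
{
    struct mem_cost total = {0, 0};
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct mem_cost c = section_cost(objs[i].sec, objs[i].size);
        total.rom += c.rom;
        total.ram += c.ram;
    }
    return total;
}
```

Under this model the 64 KiB Buffer.init from the original report costs 64 KiB of ROM in .rodata or 64 KiB of RAM in .bss, which is exactly the trade-off between desktop and microcontroller targets being debated here.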
What if all static vars were to be kept in .bss then? Does that come with the same cost? If no, then this is probably a valid argument for implementing bug 246 then.