I am starting this new post to write about my experiences at the internship at LexisNexis.
Week 1:
This week was mostly spent setting up the platform for the main work. The main completed tasks were:
- Cloned the HPCC system onto my local machine, built it and installed it. Ran the regression and compiler tests on it.
- Signed up for Jira, GitHub and Gitter. Forked the repo on GitHub.
- Went through the GitHub documentation to learn the essential operations. Also learnt how to clean up commits.
- Sent a pull request for the six examples to master.
- Started exploring the main issue. Ran gsoc5 and observed that the filter operation is performed twice. Now going through the source code mentioned in the email: ecl/hqlcpp/hqlinline.cpp and ecl/hqlcpp/hqlhtcpp.
Week 2:
Week 3:
I went closely through the generated XML and C++ code for gsoc6.ecl. The XML code looked pretty self-explanatory. Here are the two observations I made while going through it:
Week 4:
Week 5:
Week 6:
https://github.com/hpcc-systems/HPCC-Platform/compare/master…aranjan1002:childquery-2.1
Week 7:
- Enabling or disabling the optimizeInlineOperations option makes little difference to the generated C++ and XML code for gsoc5min.ecl.
- None of the gsoc queries run with the new option. gsoc5 gives the following error, while the others go into infinite recursion that ends in a segmentation fault:
- gsoc5.ecl(53,8): error C9999: Internal Error at /home/aranjan/HPCC/HPCC-Platform/ecl/hql/hqlattr.cpp(3658
- I made some changes to the code, which can be seen here: https://github.com/aranjan1002/HPCC-Platform/compare/childquery-2.1…aranjan1002:childquery-2.1.1. No splitter is generated for gsoc5 or gsoc6, which confused me. (It was later found that adding a case for no_createrow in the mustAssignInline function resolves the issue.)
- I made some more changes in this new branch: https://github.com/aranjan1002/HPCC-Platform/compare/childquery-2.1…aranjan1002:childquery-2.2
- In this case splitters are created with the minimalOperationsInline option, while infinite recursion occurs with the optimizeInlineOperations option.
In general, the aim of the experimentation was to figure out how to make split operations inline. Since no splitter was generated, that had to be investigated first. I made some changes to make the splitter inline, which can be found here:
https://github.com/aranjan1002/HPCC-Platform/compare/childquery-2.1…aranjan1002:childquery-2.3
The generated C++ file seems to do the job of the splitter with this change. To test it I tried to run the ECL file, but I got this error:
aranjan@aranjan-GX776AA-ABA-a6342p:~/HPCC/ECLQueries$ ecl run gsoc5.ecl -t=thor
Program was terminated by signal 11
Error creating archive
I rebuilt and reinstalled the system, but I still get the same error.
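To make the motivation for the splitter concrete, here is a small self-contained sketch of the general idea: when two consumers need the same filtered rows, the filter can either be run once per consumer or run once and shared. This is not code from HPCC-Platform or from my branches; Row, filterRows, withoutSplitter, withSplitter and splitterExample are names I made up for the illustration, and a real splitter activity streams rows instead of copying them into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for illustration only; not HPCC-Platform types.
using Row = int;
using Rows = std::vector<Row>;

// Applies the predicate to every row and counts how often it is evaluated.
Rows filterRows(const Rows& input, const std::function<bool(Row)>& keep, int& evaluations) {
    Rows out;
    for (Row r : input) {
        ++evaluations;
        if (keep(r)) out.push_back(r);
    }
    return out;
}

long sumRows(const Rows& rows) {
    return std::accumulate(rows.begin(), rows.end(), 0L);
}

struct Totals {
    long sum;
    std::size_t count;
    int evaluations;  // number of predicate calls needed to produce sum and count
};

// Without a splitter: each consumer re-runs the filter on the source.
Totals withoutSplitter(const Rows& source, const std::function<bool(Row)>& keep) {
    int evals = 0;
    long sum = sumRows(filterRows(source, keep, evals));
    std::size_t count = filterRows(source, keep, evals).size();
    return {sum, count, evals};
}

// With a splitter: the filter runs once and both consumers read the shared result.
Totals withSplitter(const Rows& source, const std::function<bool(Row)>& keep) {
    int evals = 0;
    const Rows shared = filterRows(source, keep, evals);
    return {sumRows(shared), shared.size(), evals};
}

// Usage example: same answers, half the predicate evaluations.
void splitterExample() {
    const Rows source{1, 2, 3, 4, 5, 6};
    auto isEven = [](Row r) { return r % 2 == 0; };
    Totals a = withoutSplitter(source, isEven);
    Totals b = withSplitter(source, isEven);
    assert(a.sum == b.sum && a.count == b.count);
    assert(a.evaluations == 2 * b.evaluations);
}
```

Both versions produce the same results; the only difference is how many times the filter work is done, which is the duplication I observed in gsoc5.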
Week 8:
The first thing that I did was to make subgraphs inline selectively based upon the kind of activities that they have. Specifically, a subgraph is inlined only if all of its activities can be inlined. The code can be seen here:
https://github.com/hpcc-systems/HPCC-Platform/compare/master…aranjan1002:childquery-2.4
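The all-or-nothing rule described above can be sketched roughly as follows. This is only an illustration and not the code in the branch linked above: Activity, Subgraph, canInlineSubgraph and inlineExample are hypothetical names, and whether a given activity can be inlined is reduced here to a plain flag.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not HPCC-Platform types.
struct Activity {
    std::string kind;  // e.g. "filter", "project", "sort"
    bool canInline;    // assumed to be decided elsewhere for each activity
};

struct Subgraph {
    std::vector<Activity> activities;
};

// A subgraph qualifies for inlining only if every one of its activities does.
// Treating an empty subgraph as not inlinable is an assumption of this sketch.
bool canInlineSubgraph(const Subgraph& graph) {
    if (graph.activities.empty())
        return false;
    return std::all_of(graph.activities.begin(), graph.activities.end(),
                       [](const Activity& a) { return a.canInline; });
}

// Usage example: one non-inlinable activity keeps the whole subgraph out of line.
void inlineExample() {
    Subgraph allInline{{Activity{"filter", true}, Activity{"project", true}}};
    Subgraph mixed{{Activity{"filter", true}, Activity{"sort", false}}};
    assert(canInlineSubgraph(allInline));
    assert(!canInlineSubgraph(mixed));
}
```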
Inlining subgraphs selectively in this way caused the following error in gsoc6:
Graph[13], workunitwrite[24]: MP link closed. Master exception : Error aborting job, will cause thor restart
But it worked fine for roxie and showed a different error for hthor. The source of the error was identified by comparing the XML generated for thor and roxie: an attribute was missing in the child graph of the generated XML. Adding it removed the error and also gave the correct output. However, it was not working for gsoc1 to 4, each of which gave this error:
error C4821: INTERNAL: Graph context not found
Gsoc1 to 4 are different because they have consecutive child graphs where the result of one is the source for the other. So I had to figure out how the result can be passed on if one of them is inlined. I observed that the code threw the error when it was trying to inline the getGraphResult activity.
After quite a bit of debugging and going through the flow of control, I came to the conclusion that the error was caused by an IHqlExpression missing this IAtom: externalAtom. With quite a few hacks here and there, I was able to make it compile for gsoc1, and the code can be seen here:
https://github.com/hpcc-systems/HPCC-Platform/compare/master…aranjan1002:childquery-2.6.3
In essence, the changes save the getgraphresult expression with the attribute externalAtom in a global variable, graphResult, and then use this variable in the function buildGetLocalResult instead of the passed parameter. Also, instead of inlining the second child graph in the function optimizeInlineGraph, it is inlined at the end of the function generateGraph. I did that because I wanted the inlining to happen after the first child graph has been properly generated.
The next step is to make it compile for gsoc 1 to 6 properly and make it run at least for gsoc 5 and 6 (as it was doing before).
Week 9:
https://github.com/aranjan1002/code4life/tree/master/HPCC
Here are some details about the files:
Diary.md – Covers all the work from start to finish.
Experiences.md – A summary of my experience and recommendations.
CodeGenerator.md – Some questions I have regarding the code generator.
DocumentationSuggestions.md – Some suggestions on how to improve the documentation for the system.
The remaining files are referred to in the four docs above.
I plan to continue working on the project in my free time and hopefully finish it before I graduate in December.