User 677b9c22ff
10-11-2008 21:32:00
Peter,
I have some general comments on (very) large files, just for discussion.
Most mol viewers and text editors try to read in the whole file and then
run into out-of-memory errors.
Assume you have a 50 GB file of molecules (as in PubChem, or generated
with the Markush generator) and at any given time you look at 10 molecules
in the spreadsheet view or 10x10 in the matrix view.
That means only 100 molecules plus their SDF fields need to be cached at
any given time. That would be at most, let's say, 100 MB of memory
(1 MB per molecule including its text).
If you also assume that in most cases the files are homogeneous
(as with SMILES, or SDF files with a constant set of fields), there is no
reason to read in or count the whole file in the first place, and there is
also no reason to cache views of all the molecules.
For real-time scrolling this also assumes fast storage: a hard-disk RAID
array, SSDs or a RAM disk.
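The caching argument above can be sketched with a small bounded cache, so that memory stays proportional to the visible window (10x10 = 100 molecules) rather than to the file size. This is only a minimal sketch: it assumes records are keyed by their byte offset in the file, the class name MoleculeWindowCache is hypothetical, and it uses plain java.util.LinkedHashMap in access order rather than any Marvin API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache for the molecules currently on screen.
// Keys: byte offset of each record in the file (assumption).
// Values: raw record text or, in a real viewer, the parsed molecule.
final class MoleculeWindowCache<V> extends LinkedHashMap<Long, V> {
    private final int capacity;

    MoleculeWindowCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
        // Evict the least recently used record once the window size is exceeded
        return size() > capacity;
    }
}
```

For a 10x10 matrix, `new MoleculeWindowCache<String>(100)` keeps at most 100 records; at roughly 1 MB each that matches the ~100 MB budget estimated above, no matter how large the file is.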
1) The program reads the length of the SDF file.
2) The program sets the size of the scrollbar on the right from that length.
3) If I have a 100 GB molecule file and move the scrollbar to 50%,
it moves the file pointer to 50% of the file size (50 GB).
4) If I move the scrollbar to the bottom, it moves to the end of the file.
5) With page-up and page-down it reads exactly the number of molecules
shown in the matrix, backward or forward, by determining an overlap
molecule and starting from there.
6) Possible import errors (the file pointer lands in the middle of a
molecule) are caught by an exception handler, which skips ahead to the next
record delimiter: for SMILES this would be the end of the line (EOL), for
SDF "$$$$" or "M  END".
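Steps 1 to 6 can be sketched for SD files with the JDK alone (java.io.RandomAccessFile), without the Marvin/MolImporter API. The class and method names (SdfWindow, offsetFor, resync, readPage) are hypothetical. Instead of waiting for an importer exception, this sketch resynchronizes proactively by scanning for the next "$$$$" line, which is a swapped-in technique. It assumes "$$$$" appears on its own line as the record delimiter.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Marvin): jump to a position in an SD file
// and read one page of complete records from there.
final class SdfWindow {
    // SDF record delimiter; assumed to appear on its own line.
    private static final String DELIMITER = "$$$$";

    // One page of records plus the byte offset where the next page starts.
    static final class Page {
        final List<String> records;
        final long nextOffset;

        Page(List<String> records, long nextOffset) {
            this.records = records;
            this.nextOffset = nextOffset;
        }
    }

    // Steps 1-3: map a scrollbar position (0.0 = top, 1.0 = bottom) to a byte offset.
    static long offsetFor(long fileLength, double fraction) {
        double clamped = Math.max(0.0, Math.min(1.0, fraction));
        return (long) (clamped * fileLength);
    }

    // Step 6: an arbitrary offset usually lands inside a record, so skip past
    // the next "$$$$" line and return the start of the following record (or EOF).
    // An offset exactly on a record start (or inside the "$$$$" line) skips one
    // extra record; acceptable because the scroll position is approximate anyway.
    static long resync(Path file, long offset) throws IOException {
        if (offset <= 0) {
            return 0L;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                if (DELIMITER.equals(line.trim())) {
                    break;
                }
            }
            return raf.getFilePointer();
        }
    }

    // Step 5 (forward only): read up to `count` complete records from a known
    // record start. Page-down is readPage(file, previousPage.nextOffset, count).
    static Page readPage(Path file, long recordStart, int count) throws IOException {
        List<String> records = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(recordStart);
            StringBuilder current = new StringBuilder();
            String line;
            while (records.size() < count && (line = raf.readLine()) != null) {
                current.append(line).append('\n');
                if (DELIMITER.equals(line.trim())) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
            // A trailing record without a "$$$$" line is not returned.
            return new Page(records, raf.getFilePointer());
        }
    }
}
```

Jumping to 50% would then be `resync(file, offsetFor(Files.size(file), 0.5))` followed by `readPage(...)` with a count of 100 for a 10x10 matrix. For SMILES the resync step would simply discard the rest of the current line, since each record is one line. RandomAccessFile.readLine is unbuffered and treats each byte as one character, which is fine for 100 ASCII records per page; a real viewer would likely use a buffered channel. Page-up (reading backward) and filling a whole page at the very end of the file are not covered here.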
This way there are no real ID numbers or molecule numbers, but the viewer
could be very fast. Currently the viewer reads in all molecules and counts
them one by one, which makes it very slow for large molecule sets. I am
currently looking into the API examples, but they don't show directly how
the molecules are read into the viewer (with MolImporter, I guess); the
example SimpleViewer.java takes its molecules from SMILES.
Admittedly, this implementation would be very static and not as flexible
as Mview, but it would be faster.
The reason not to load the file into a database here is that this saves the
time of fingerprint generation; the program would really just serve as a
molecule viewer.
Cheers
Tobias