"TableToArray" method could be improved #130
-
Hi @GHRyunosuke, as this doesn't seem to be an objective bug (right?), and calling GetText for each cell would likely be slower, I'm not sure it would be a good idea. What do you think?
-
If speed is not an issue for your specific case, I'd use a simple For Each loop with FindElements, iterating over all the rows and cells and calling GetText on each cell.
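A minimal sketch of that loop, for illustration only. It assumes the SeleniumVBA `FindElements(By.TagName, ...)` and `GetText` calls behave as described in this thread; the table id `mytable` and the local file `.\snippet.html` are hypothetical placeholders, not taken from the original page.

```vba
Sub test_gettext_per_cell()
    'Sketch only: "#mytable" and ".\snippet.html" are hypothetical placeholders
    Dim driver As WebDriver
    Dim tableElem As WebElement
    Dim rowElem As WebElement
    Dim cellElem As WebElement
    Dim rowNum As Long
    Dim colNum As Long

    Set driver = New WebDriver
    driver.StartChrome
    driver.OpenBrowser
    driver.NavigateToFile ".\snippet.html"

    Set tableElem = driver.FindElement(By.ID, "mytable")

    'one WebDriver round trip per cell: slow on big tables, but GetText
    'returns the rendered text, including any line breaks inside a cell
    rowNum = 0
    For Each rowElem In tableElem.FindElements(By.TagName, "tr")
        rowNum = rowNum + 1
        colNum = 0
        For Each cellElem In rowElem.FindElements(By.TagName, "td")
            colNum = colNum + 1
            Debug.Print rowNum, colNum, cellElem.GetText
        Next cellElem
    Next rowElem

    driver.CloseBrowser
    driver.Shutdown
End Sub
```

Header rows that use "th" cells are skipped by the inner loop, since it only collects "td" elements; adjust the tag name if the header text is needed.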
-
I agree with @6DiegoDiego9: TableToArray was designed that way for speed (>25x faster on my system). What is interesting is that in this particular case, there must be some CSS formatting, honored by GetText, that puts a line break between the two text parts. I did a quick test to see if there is an easy way to improve this, but I cannot replicate the behavior with a simple table without CSS:
Sub test_multi_elements_in_cell()
Dim driver As WebDriver
Dim htmldoc As HTMLDocument
Dim htmlStr As String
Set driver = New WebDriver
driver.StartChrome
driver.OpenBrowser
'here we place anchor and superscript elements inside a table cell
htmlStr = "<html><body><table border='1' id='mytable'><tr><td><a>Part1</a><sup>Part2</sup></td></tr></table></body></html>"
driver.SaveStringToFile htmlStr, ".\snippet.html"
driver.NavigateToFile ".\snippet.html"
driver.Wait 1000
'load the html into an HTMLDocument object
Set htmldoc = driver.PageToHTMLDoc
'these produce same results with no linebreak
Debug.Print htmldoc.QuerySelector("#mytable > tbody > tr > td").innerText
Debug.Print driver.QuerySelector("#mytable > tbody > tr > td").GetText
'here we place anchor and paragraph elements inside a table cell
htmlStr = "<html><body><table border='1' id='mytable'><tr><td><a>Part1</a><p>Part2</p></td></tr></table></body></html>"
driver.SaveStringToFile htmlStr, ".\snippet.html"
driver.NavigateToFile ".\snippet.html"
driver.Wait 1000
'load the html into an HTMLDocument object
Set htmldoc = driver.PageToHTMLDoc
'these produce same results with a line break between parts
Debug.Print htmldoc.QuerySelector("#mytable > tbody > tr > td").innerText
Debug.Print driver.QuerySelector("#mytable > tbody > tr > td").GetText
driver.CloseBrowser
driver.Shutdown
End Sub
-
That's a big table. If looping over rows and columns with GetText, as @6DiegoDiego9 suggested, is too slow for you, then you could write your own custom version of table-to-array that works on the HTML document directly, which is much faster than going through the WebDriver:
Sub test_stock_table()
Dim driver As WebDriver
Dim htmlDoc As HTMLDocument
Dim tableElem As HTMLTable
Dim row As HTMLTableRow
Dim elemList As IHTMLElementCollection
Dim v() As Variant
Dim i As Long
Set driver = New WebDriver
driver.StartChrome
driver.OpenBrowser
driver.NavigateTo "https://www.xxxxxxxx.com/xxxxxx/"
driver.Wait
'load the html into an HTMLDocument object
Set htmlDoc = driver.PageToHTMLDoc
'now we can do everything on client side without calling WebDriver
Set tableElem = htmlDoc.QuerySelector(".table-Ngq2xrcG")
'dimension the size of array that we need
ReDim v(1 To tableElem.Rows.Length - 1, 1 To 5)
'skip row 0, which is header
For i = 1 To tableElem.Rows.Length - 1
Set row = tableElem.Rows(i)
'handle symbol and description
Set elemList = row.Cells(0).getElementsByTagName("a")
v(i, 1) = elemList(0).innerText
Set elemList = row.Cells(0).getElementsByTagName("sup")
v(i, 2) = elemList(0).innerText
'handle price
v(i, 3) = row.Cells(1).innerText
'handle percent change
v(i, 4) = row.Cells(2).innerText
'handle volume
v(i, 5) = row.Cells(3).innerText
Debug.Print v(i, 1), v(i, 2), v(i, 3), v(i, 4), v(i, 5)
Next i
driver.CloseBrowser
driver.Shutdown
End Sub
-
Hello @6DiegoDiego9 or @GCuser99
When using the ".TableToArray" method, the content of the "td" element becomes:
However, calling the ".GetText" method on the same "td" element gives the following output (i.e. there is a line break):
Given the ".GetText" result, one can easily distinguish the two different parts in the "td" element.
So, could the ".TableToArray" method be updated to produce the same output as ".GetText", as shown above?