Yonke le mizekelo ingasentla imele ingxaki yokufumana eyona ndlela imfutshane phakathi kwee-vertex ezimbini kwigrafu (indlela okanye indlela phakathi kweendawo ezimbini, inani lezenzo okanye ubunzima bokufumana iphepha ukusuka kwindawo enye okanye kwenye). Olu didi lweengxaki zendlela ezimfutshane lubizwa ngokuba yi-SSSP (Umthombo omnye oMfutshane weNdlela) kunye nesiseko se-algorithm yokusombulula ezi ngxaki yi , ene- O(n^2)
yobunzima bokubala.
Kodwa, ngamanye amaxesha kufuneka sifumane zonke iindlela ezimfutshane phakathi kwazo zonke ii-vertex. Qwalasela lo mzekelo ulandelayo: wenzela imephu yakho iintshukumo rhoqo phakathi kwekhaya , umsebenzi kunye nethiyetha . Kulo mzekelo uya kuphelela ngeendlela ezi-6: work ⭢ home
, home ⭢ work
, work ⭢ theatre
, theatre ⭢ work
, home ⭢ theatre
kunye theatre ⭢ home
(iindlela ezibuyela umva zinokwahluka ngenxa yeendlela zendlela enye umzekelo) .
A(k, n) = n! / (n - m)! // where n - is a number of elements, // k - is a number of elements in permutation (in our case k = 2)
Okusinika iindlela ezili-12 zeendawo ezi-4 kunye neendlela ezingama-90 zeendawo ezili-10 (ezichukumisayo). Qaphela… oku ngaphandle kokuthathela ingqalelo iindawo eziphakathi phakathi kweendawo okt, ukusuka ekhaya ukuya emsebenzini kufuneka uwele izitalato ezi-4, uhambe ngomlambo kwaye uwele ibhulorho. Ukuba siyaqikelela, ezinye iindlela zinokuba neendawo eziqhelekileyo eziphakathi… kuhle ... ngenxa yoko siya kuba negrafu enkulu kakhulu, eneentloko ezininzi, apho i-vertex nganye iyakumela nokuba yindawo okanye indawo ephakathi ebalulekileyo. Udidi lweengxaki, apho kufuneka sifumane zonke iindlela ezimfutshane phakathi kwazo zonke izibini ze-vertexes kwigrafu, ibizwa ngokuba yi -APSP (Zonke Izibini Ezifutshane) kunye nesiseko se-algorithm yokusombulula ezi ngxaki yi , ene- O(n^3)
ukuntsokotha kokubala.
I-algorithm ye-Floyd-Warshall ifumana zonke iindlela ezimfutshane phakathi kweperi nganye yeevertex kwigrafu. I-algorithms yapapashwa nguRobert Floyd kwi- [1] (jonga icandelo elithi "References" ukuze ufumane iinkcukacha ezingaphezulu). Kwangalo nyaka, uPeter Ingerman kwi- [2] uchaze ukuphunyezwa kwe-algorithm yangoku ngohlobo lwezintlu ezintathu ezibekwe for
:
algorithm FloydWarshall(W) do for k = 0 to N - 1 do for i = 0 to N - 1 do for j = 0 to N - 1 do W[i, j] = min(W[i, j], W[i, k] + W[k, j]) end for end for end for end algorithm // where W - is a weight matrix of N x N size, // min() - is a function which returns lesser of it's arguments
Ukuba awuzange ube notshintsho ekusebenzeni ngegrafu emelwe ngohlobo lwe-matrix ngoko kunokuba nzima ukuqonda ukuba i-algorithm ingentla yenza ntoni. Ke, ukuqinisekisa ukuba sikwiphepha elinye, makhe sijonge ukuba igrafu inokumelwa njani ngohlobo lwe-matrix kwaye kutheni umelo olunjalo luluncedo ukusombulula eyona ngxaki imfutshane yendlela.
Umfanekiso ongezantsi ubonisa igrafu bee-vertexes ezi-5. Ngasekhohlo, igrafu iboniswa ngendlela ebonwayo, eyenziwe ngezangqa kunye neentolo, apho isangqa ngasinye simele i-vertex kwaye utolo lumele isiphelo esinesalathiso. Inani elingaphakathi kwesangqa lihambelana nenombolo yevertex kunye nenani elingentla komphetho liyahambelana nobunzima bomphetho. Ngasekunene, igrafu efanayo inikezelwa ngendlela ye-matrix yobunzima. I-matrix yobunzima luhlobo lwe apho iseli nganye ye-matrix iqulethe "ubunzima" - umgama phakathi kwe-vertex i
(umqolo) kunye ne-vertex j
(ikholamu). I-matrix yobunzima ayibandakanyi ulwazi malunga “nendlela” phakathi kwee-vertex (uluhlu lwee-vertex ofumana ngalo ukusuka ku i
ukuya ku j
) – nje ubunzima (umgama) phakathi kwezi vesi.
Kwi-matrix yobunzima sinokubona ukuba ixabiso leseli lilingana nobunzima phakathi kwee-vertex . Yiyo loo nto, ukuba sihlola iindlela ukusuka kwi-vertex 0
(umqolo we 0
), siya kubona ukuba ... kukho indlela enye kuphela - ukusuka ku 0
ukuya ku 1
. Kodwa, kumboniso obonakalayo sinokubona ngokucacileyo iindlela ukusuka kwi-vertex 0
ukuya kwi-vertexes yesi 2
kunye ne 3
(nge-vertex 1
). Isizathu salokhu silula - kwimeko yokuqala, i-matrix yobunzima iqulethe umgama phakathi kwee-vertex ezikufutshane kuphela. Nangona kunjalo, olu lwazi lulodwa lwanele ukufumana ukuphumla.
Makhe sibone ukuba isebenza njani. Nika ingqalelo kwiseli W[0, 1]
. Ixabiso libonisa, kukho indlela esuka kwivertex 0
ukuya kwivertex 1
enobunzima obulingana no 1
(ngokufutshane singabhala njenge: 0 ⭢ 1: 1
). Ngolu lwazi, ngoku sinokuskena zonke iiseli zomqolo woku 1
(oqulethe zonke izisindo zazo zonke iindlela ukusuka kwi-vertex 1
) kunye ne-port yangasemva le ngcaciso kumqolo 0
, ukwandisa ubunzima ngo 1
(ixabiso le- W[0, 1]
).
Ngokusebenzisa amanyathelo afanayo sinokufumana iindlela ukusuka kwi-vertex 0
ukuya kwezinye ii-vertex. Ngexesha lokukhangela, kunokwenzeka ukuba kukho iindlela ezingaphezu kwenye ezikhokelela kwi-vertex enye kwaye yintoni ebaluleke kakhulu ubunzima bezi ndlela zinokwahluka. Umzekelo wemeko enjalo ubonisiwe kumfanekiso ongezantsi, apho ukukhangela ukusuka kwi-vertex 0
ukuya kwi-vertex 2
kutyhile enye indlela eya kwi-vertex yesi 3
yobunzima obuncinane.
Sineendlela ezimbini: indlela yoqobo – 0 ⭢ 3: 4
kunye nendlela entsha esisandula ukuyifumanisa – 0 ⭢ 2 ⭢ 3: 3
(khumbula, ubunzima bematrix ayiqulathanga iindlela, ngoko asazi ukuba yeyiphi. iivertex zibandakanyiwe kwindlela yokuqala). Eli lixesha xa sithatha isigqibo sokugcina eyona imfutshane kwaye sibhale 3
kwiseli W[0, 3]
.
algorithm FloydWarshall(W) do for k = 0 to N - 1 do for i = 0 to N - 1 do for j = 0 to N - 1 do W[i, j] = min(W[i, j], W[i, k] + W[k, j]) end for end for end for end algorithm
Umjikelo ongaphandle for
- k
iiterates phezu kwazo zonke iivertex kwigrafu kwaye kuphindaphindo ngalunye, ulwahlulo u k
umele ivertex esizingela iindlela kuyo . Umjikelo wangaphakathi for
i
iphinda iphindaphinde ngaphezu kwazo zonke ii-vertex kwigrafu kwaye kwi-iteration nganye, i
i-vertex esikhangela iindlela ukusuka kuyo . Kwaye okokugqibela umjikelo ongaphakathi for
j
iterates phezu kwazo zonke iivertex kwigrafu nakuphinda-phindo ngalunye, j
imele ivertex esikhangela indlela eya kuyo. Xa kudityanisiwe isinika oku kulandelayo: kwi-iteration nganye k
sikhangela umendo kuzo zonke iivertex i
ukuya kuzo zonke iivertex j
ukugqitha kwivertex k
. Ngaphakathi kumjikelo sithelekisa indlela i ⭢ j
(emelwe ngu W[i, j]
) kunye nendlela i ⭢ k ⭢ j
(emelwe sisimbuku se- W[I, k]
kunye ne W[k, j]
) kunye nokubhala eyona imfutshane enye ibuyele kwi W[i, j]
.
Ikhowudi yomthombo kunye neegrafu zokulinga ziyafumaneka kwi-GitHub. Iigrafu zovavanyo zinokufunyanwa kuvimba Data/Sparse-Graphs.zip
. Zonke iibenchmarks kwesi sithuba ziphunyezwe kwifayile ye .
NO_EDGE = (int.MaxValue / 2) - 1
.
Malunga ne-#1. Xa sithetha ngee-matrixes sichaza iiseli ngokwe “miqolo” kunye “nezikholamu”. Ngenxa yoko kubonakala kungokwemvelo ukucinga i-matrix ngendlela "yesikwere" okanye "uxande" (Umfanekiso 4a).
i = row * row_size + col; // where row - cell row index, // col - cell column index, // row_size - number of cells in a row.
Uluhlu lomgca wobunzima be-matrix ngumqathango wangaphambili wokuphunyezwa okusebenzayo kwe-algorithm ye-Floyd-Warshall. Isizathu soko asilula kwaye ingcaciso eneenkcukacha ifanelwe yiposti eyahlukileyo… okanye izithuba ezimbalwa. Nangona kunjalo, okwangoku, kubalulekile ukukhankanya ukuba ukumelwa okunjalo kuphucula kakhulu , eneneni enempembelelo enkulu ekusebenzeni kwe-algorithm.
- Inqaku lombhali
Ngokuphathelele #2. Ukuba ujonga ngokusondeleyo kwi-algorithm yepseudocode, awuzukufumana naziphi iitshekhi eziyelelene nobukho bendlela phakathi kweeverteksi ezimbini. Endaweni yoko, pseudocode sebenzisa nje min()
umsebenzi. Isizathu silula - ekuqaleni, ukuba akukho mendo phakathi kweeverteksi, ixabiso leseli limiselwe kwi infinity
kwaye kuzo zonke iilwimi, ngaphandle kokuba iJavaScript, onke amaxabiso angaphantsi kune infinity
. Kwimeko yeenombolo ezipheleleyo kunokuhenda ukusebenzisa int.MaxValue
njengexabiso elithi "akukho ndlela". Nangona kunjalo oku kuyakukhokelela ekuphuphumeni okupheleleyo kwiimeko xa amaxabiso azo zombini i ⭢ k
no k ⭢ j
iindlela ziyakulingana ne int.MaxValue
. Yiyo loo nto sisebenzisa ixabiso elingaphantsi kwesiqingatha se- int.MaxValue
.
- Umfundi ocingayo
Inokwenzeka ngokwenene kodwa ngelishwa iya kukhokelela kwisohlwayo esibalulekileyo sokusebenza. Ngokufutshane, i-CPU igcina iinkcukacha-manani zeziphumo zovavanyo lwesebe. xa if
zengxelo zivavanya ukuba true
okanye false
. Isebenzisa olu balo-manani ukwenza ikhowudi "yesebe eliqikelelweyo ngokweenkcukacha-manani" ngaphambili phambi kokuba if
ivavanywe (oku kubizwa ngokuba ) kwaye ke ngoko iphumeze ikhowudi ngokufanelekileyo ngakumbi. Nangona kunjalo, xa ukubikezelwa kwe-CPU kungachanekanga, kubangela ilahleko enkulu yokusebenza xa kuthelekiswa nokuqikelelwa okuchanekileyo kunye nokuphunyezwa okungenamiqathango kuba i-CPU kufuneka ime kwaye ibale isebe elichanekileyo.
Kuba kwi-iteration k
nganye sihlaziya inxalenye ebalulekileyo ye-matrix yobunzima, izibalo ze-CPU zesebe ziba lilize kuba akukho pateni yekhowudi, onke amasebe asekelwe ngokukodwa kwidatha. Ngoko ke oko kuhlolwa kuya kukhokelela kwisixa esibalulekileyo .
Uhh, kujongeka ngathi sigqibile ngethiyori inxalenye - masiphumeze i-algorithm (sichaza oku kuphunyezwa Baseline
):
public void Baseline(int[] matrix, int sz) { for (var k = 0; k < sz; ++k) { for (var i = 0; i < sz; ++i) { for (var j = 0; j < sz; ++j) { var distance = matrix[i * sz + k] + matrix[k * sz + j]; if (matrix[i * sz + j] > distance) { matrix[i * sz + j] = distance; } } } } }
Le khowudi ingentla iphantse ifane ikopi yepseudocode ekhankanywe ngaphambili ngaphandle kokunye - endaweni Math.Min()
sisebenzisa if
sihlaziya iseli kuphela xa kuyimfuneko.
- Umfundi ocingayo
Isizathu silula. Ngomzuzu wokubhala i-JIT ikhupha ikhowudi ephantse ilingane kuzo zombini if
kunye nokuphunyezwa Math.Min
. Ungayijonga kwiinkcukacha kodwa nantsi i-snippets yemizimba ephambili ye-loop:
// // == Assembly code of implementation of innermost loop for of j using if. // 53: movsxd r14, r14d // // compare matrix[i * sz + j] and distance (Condition) // 56: cmp [rdx+r14*4+0x10], ebx 5b: jle short 64 // // if matrix[i * sz + j] greater than distance write distance to matrix // 5d: movsxd rbp, ebp 60: mov [rdx+rbp*4+0x10], ebx 64: // end of loop // // == Assembly code of implementation of innermost loop for of j using Math.Min. // 4f: movsxd rbp, ebp 52: mov r14d, [rdx+rbp*4+0x10] // // compare matrix[i * sz + j] and distance (Condition) // 57: cmp r14d, ebx. // // if matrix[i * sz + j] less than distance write value to matrix // 5a: jle short 5e // // otherwise write distance to matrix // 5c: jmp short 61 5e: mov ebx, r14d 61: mov [rdx+rbp*4+0x10], ebx 65: // end of loop
Ungabona, nokuba sisebenzisa if
okanye Math.Min
kusekho ujongo olunemiqathango. Nangona kunjalo, kwimeko if
akukho bhala ngokungeyomfuneko.
var max = v * (v - 1)) / 2; // where v - is a number of vertexes in a graph.
Masishukume!
Indlela | Ubungakanani | Ithetha | Impazamo | StdDev |
---|---|---|---|---|
Isiseko | 300 | 27.525 ms | 0.1937 ms | 0.1617 ms |
Isiseko | 600 | 217.897 ms | 1.6415 ms | 1.5355 ms |
Isiseko | 1200 | 1,763.335 ms | 7.4561 ms | 6.2262 ms |
Isiseko | 2400 | 14,533.335 ms | 63.3518 ms | 52.9016 ms |
Isiseko | 4800 | 119,768.219 ms | 181.5514 ms | 160.9406 ms |
Kwiziphumo esizibonayo, ixesha lokubala likhula ngokumangalisayo xa lithelekiswa nobukhulu begrafu - kwigrafu ye-300 vertexes ithathe i-27 milliseconds, igrafu ye-2400 vertex - imizuzwana ye-14.5 kunye negrafu ye-4800 - 119 imizuzwana phantse 2 imizuzu !
Nangona kunjalo, kwiimvavanyo zethu sisebenzisa i-directed, iigrafu ze-acyclic , ezinepropati emangalisayo - ukuba kukho indlela esuka kwi-vertex 1
ukuya kwi- 2
, ngoko akukho ndlela esuka kwi-vertex 2
ukuya kwi- 1
. Kithina, kuthetha ukuba, zininzi ii-vertex ezingadibananga esinokutsiba ukuba akukho ndlela esuka ku i
ukuya ku k
(sichaza oku kuphunyezwa njenge SpartialOptimisation
).
public void SpartialOptimisation(int[] matrix, int sz) { for (var k = 0; k < sz; ++k) { for (var i = 0; i < sz; ++i) { if (matrix[i * sz + k] == NO_EDGE) { continue; } for (var j = 0; j < sz; ++j) { var distance = matrix[i * sz + k] + matrix[k * sz + j]; if (matrix[i * sz + j] > distance) { matrix[i * sz + j] = distance; } } } } }
Nazi iziphumo zangaphambili ( Baseline
) kunye nangoku ( SpartialOptimisation
) yangoku kwiseti yeegrafu (iziphumo ezikhawulezayo ziphawulwe ngokungqindilili ):
Indlela | Ubungakanani | Ithetha | Impazamo | StdDev | Umlinganiselo |
---|---|---|---|---|---|
Isiseko | 300 | 27.525 ms | 0.1937 ms | 0.1617 ms | 1.00 |
SpartialOptimisation | 300 | 12.399 ms | 0.0943 ms | 0.0882 ms | 0.45 |
Isiseko | 600 | 217.897 ms | 1.6415 ms | 1.5355 ms | 1.00 |
SpartialOptimisation | 600 | 99.122 ms | 0.8230 ms | 0.7698 ms | 0.45 |
Isiseko | 1200 | 1,763.335 ms | 7.4561 ms | 6.2262 ms | 1.00 |
SpartialOptimisation | 1200 | 766.675 ms | 6.1147 ms | 5.7197 ms | 0.43 |
Isiseko | 2400 | 14,533.335 ms | 63.3518 ms | 52.9016 ms | 1.00 |
SpartialOptimisation | 2400 | 6,507.878 ms | 28.2317 ms | 26.4079 ms | 0.45 |
Isiseko | 4800 | 119,768.219 ms | 181.5514 ms | 160.9406 ms | 1.00 |
SpartialOptimisation | 4800 | 55,590.374 ms | 414.6051 ms | 387.8218 ms | 0.46 |
I-PC yam ixhotyiswe nge Inter Core i7-7700 CPU 3.60GHz (Kaby Lake)
iprosesa ene-8 logical processors ( ) okanye ii-cores ze-4 ezinobuchwepheshe . Ukuba nondoqo ongaphezulu kwesinye kufana nokuba “nezandla ezizezinye” esinokuzisebenzisa. Ke, masibone ukuba leliphi icandelo lomsebenzi elinokwahlulwa ngokufanelekileyo phakathi kwabasebenzi abaninzi kwaye emva koko, lingqamanise.
Masiqale for of k
loop. Kuluphu ngalunye lokuphindaphinda ubala iindlela ukusuka kwivertex nganye ukuya kwivertex nganye kwivertex k
. Iindlela ezintsha kunye nezihlaziyiweyo zigcinwa kwi-matrix yobunzima. Ukujonga uphindaphindo onokuthi uqaphele - banokubulawa ngalo naluphi na ulandelelwano: 0, 1, 2, 3 okanye 3, 1, 2, 0 ngaphandle kokuchaphazela isiphumo. Nangona kunjalo, kusafuneka ziphunyezwe ngokulandelelana, kungenjalo ezinye iiterations aziyi kukwazi ukusebenzisa iindlela ezintsha okanye ezihlaziyiweyo kuba aziyi kubhalwa kwi-matrix yobunzima, okwangoku. Ukungahambelani okunjalo ngokuqinisekileyo kuya kutyumza iziphumo. Ngoko kufuneka siqhubeke sikhangela.
Umgqatswa olandelayo for of i
loop. Kuluphu ngalunye lokuphinda lufundeka indlela esuka kwivertex i
ukuya kwivertex k
(iseli: W[i, k]
), indlela esuka kwivertex k
ukuya kwivertex j
(iseli: W[k, j
]) kwaye emva koko ijonga ukuba ngaba indlela eyaziwayo evela i
ukuya ku j
(iseli: W[i, j]
) mfutshane kuno i ⭢ k ⭢ j
indlela (isimbuku se: W[i, k]
+ W[k, j]
). Ukuba ujonga ngokusondeleyo kwipateni yokufikelela, unokuqaphela - kwi-iteration nganye i
loop ifunda idatha ukusuka k
kumqolo kwaye ihlaziywe i
rowu ethetha ngokusisiseko - uphindaphindo luzimele kwaye lunokuphunyezwa kungekuphela kulo naluphi na ulandelelwano kodwa nangokuhambelana!
Oku kubonakala kuthembisa, ngoko ke masikuphumeze (sichaza olu phumezo njenge SpartialParallelOptimisations
).
for of j
loop nayo inokudityaniswa. Nangona kunjalo, ukuhambelana komjikelo ongaphakathi kulo mzekelo akuphumelelanga kakhulu. Unokuziqinisekisa ngokwenza utshintsho olulula kwikhowudi .
- Inqaku lombhali
public void SpartialParallelOptimisations(int[] matrix, int sz) { for (var k = 0; k < sz; ++k) { Parallel.For(0, sz, i => { if (matrix[i * sz + k] == NO_EDGE) { return; } for (var j = 0; j < sz; ++j) { var distance = matrix[i * sz + k] + matrix[k * sz + j]; if (matrix[i * sz + j] > distance) { matrix[i * sz + j] = distance; } } }); } }
Nazi iziphumo ze Baseline
, SpartialOptimisation
kunye SpartialParallelOptimisations
uzalisekiso kwiseti enye yeegrafu (ukulinganisa kwenziwa kusetyenziswa class):
Indlela | Ubungakanani | Ithetha | Impazamo | StdDev | Umlinganiselo |
---|---|---|---|---|---|
Isiseko | 300 | 27.525 ms | 0.1937 ms | 0.1617 ms | 1.00 |
SpartialOptimisation | 300 | 12.399 ms | 0.0943 ms | 0.0882 ms | 0.45 |
SpartialParallelOptimisations | 300 | 6.252 ms | 0.0211 ms | 0.0187 ms | 0.23 |
Isiseko | 600 | 217.897 ms | 1.6415 ms | 1.5355 ms | 1.00 |
SpartialOptimisation | 600 | 99.122 ms | 0.8230 ms | 0.7698 ms | 0.45 |
SpartialParallelOptimisations | 600 | 35.825 ms | 0.1003 ms | 0.0837 ms | 0.16 |
Isiseko | 1200 | 1,763.335 ms | 7.4561 ms | 6.2262 ms | 1.00 |
SpartialOptimisation | 1200 | 766.675 ms | 6.1147 ms | 5.7197 ms | 0.43 |
SpartialParallelOptimisations | 1200 | 248.801 ms | 0.6040 ms | 0.5043 ms | 0.14 |
Isiseko | 2400 | 14,533.335 ms | 63.3518 ms | 52.9016 ms | 1.00 |
SpartialOptimisation | 2400 | 6,507.878 ms | 28.2317 ms | 26.4079 ms | 0.45 |
SpartialParallelOptimisations | 2400 | 2,076.403 ms | 20.8320 ms | 19.4863 ms | 0.14 |
Isiseko | 4800 | 119,768.219 ms | 181.5514 ms | 160.9406 ms | 1.00 |
SpartialOptimisation | 4800 | 55,590.374 ms | 414.6051 ms | 387.8218 ms | 0.46 |
SpartialParallelOptimisations | 4800 | 15,614.506 ms | 115.6996 ms | 102.5647 ms | 0.13 |
Kwiziphumo sinokubona ukuba ukunxulunyaniswa kwe for of i
loop kunciphise ixesha lokubala ngama-2-4 amaxesha xa kuthelekiswa nokuphunyezwa kwangaphambili ( SpartialOptimisation
)! Oku kuchukumisa kakhulu, nangona kunjalo, kubalulekile ukukhumbula, ukuhambelana kobalo olusulungekileyo kudla zonke izixhobo zekhompyutha ezikhoyo ezinokukhokelela ekulambeni kobutyebi kwezinye izicelo kwinkqubo.
var a = new [] { 5, 7, 8, 1 }; var b = new [] { 4, 2, 2, 6 }; var c = new [] { 0, 0, 0, 0 }; for (var i = 0; i < 4; ++i) c[i] = a[i] + b[i];
Ngendlela eyenziwe lula kakhulu , ukuphunyezwa kokuphinda-phinda oku- 0
koku kungasentla for
kwinqanaba le-CPU kunokucaciswa ngolu hlobo lulandelayo:
Nantsi into eyenzekayo. I-CPU ilayisha amaxabiso oluhlu a
kunye no b
ukusuka kwimemori ukuya kwiirejista ze-CPU (irejista enye inokubamba ngqo ixabiso elinye ). Emva koko i-CPU yenza umsebenzi wokongeza kwi-scalar kwezi rejista kwaye ibhale isiphumo sibuyele kwimemori ephambili - ngqo kuluhlu c
. Le nkqubo iphinda iphindwe kane ngaphambi kokuba i-loop iphele.
Nantsi into eyenzekayo. I-CPU ilayisha amaxabiso oluhlu a
kunye b
ukusuka kwimemori ukuya kwiirejista ze-CPU (nangona ngeli xesha, irejista yevektha enye inokubamba ixabiso loluhlu olubini ). Emva koko i-CPU yenza umsebenzi wokongeza i-vector kwezi rejista kwaye ibhale isiphumo sibuyele kwimemori ephambili - ngqo kuluhlu c
. Ngenxa yokuba sisebenza kumaxabiso amabini kanye, le nkqubo iphindwa kabini endaweni yesine.
public void SpartialVectorOptimisations(int[] matrix, int sz) { for (var k = 0; k < sz; ++k) { for (var i = 0; i < sz; ++i) { if (matrix[i * sz + k] == NO_EDGE) { continue; } var ik_vec = new Vector<int>(matrix[i * sz + k]); var j = 0; for (; j < sz - Vector<int>.Count; j += Vector<int>.Count) { var ij_vec = new Vector<int>(matrix, i * sz + j); var ikj_vec = new Vector<int>(matrix, k * sz + j) + ik_vec; var lt_vec = Vector.LessThan(ij_vec, ikj_vec); if (lt_vec == new Vector<int>(-1)) { continue; } var r_vec = Vector.ConditionalSelect(lt_vec, ij_vec, ikj_vec); r_vec.CopyTo(matrix, i * sz + j); } for (; j < sz; ++j) { var distance = matrix[i * sz + k] + matrix[k * sz + j]; if (matrix[i * sz + j] > distance) { matrix[i * sz + j] = distance; } } } } }
Ikhowudi yeVectorised inokujongeka ingaqhelekanga, ke masihambe ngayo inyathelo ngenyathelo.
Senza i-vector ye- i ⭢ k
indlela: var ik_vec = new Vector<int>(matrix[i * sz + k])
. Njengomphumo, ukuba i-vector inokubamba amaxabiso amane ohlobo int
kunye nobunzima be i ⭢ k
indlela ilingana no-5, siya kudala i-vector yesine 5's - [5, 5, 5, 5]. Okulandelayo, kuphindaphindo ngalunye, ngaxeshanye sibala iindlela ukusuka kwi-vertex i
ukuya kwi-vertexes j, j + 1, ..., j + Vector<int>.Count
.
Vector<int>.Count
lubuyisela inani leezakhi zodidi int
olungena kwiirejista ze-vector. Ubungakanani beerejista zeVektha buxhomekeke kwimodeli ye-CPU, ke le propati inokubuyisela amaxabiso awohlukeneyo kwii-CPU's.
- Inqaku lombhali
ij_vec
kunye ikj_vec
.ij_vec
kunye ikj_vec
iivektha kwaye ukhethe amaxabiso amancinci kwivektha entsha r_vec
.r_vec
umva ubunzima matrix.
Ngelixa i-#1 kunye ne #3 zithe ngqo, #2 ifuna ingcaciso. Njengoko bekutshiwo ngaphambili, ngee-vectors sisebenzisa amaxabiso amaninzi ngexesha elinye. Ngoko ke, isiphumo semisebenzi ethile ayinakuba sisinye, oko kukuthi, umsebenzi wothelekiso Vector.LessThan(ij_vec, ikj_vec)
ayinakubuya true
okanye false
kuba ithelekisa amaxabiso amaninzi. Ngoko endaweni yoko ibuyisela i-vector entsha equlathe iziphumo zothelekiso phakathi kwamaxabiso ahambelanayo asuka kwivektha ij_vec
kunye ne ikj_vec
( -1
, ukuba ixabiso elisuka kwi ij_vec
lingaphantsi kwexabiso kwi ikj_vec
kunye no 0
ukuba kungenjalo). IVektha ebuyiselweyo (ngokwayo) ayiloncedo ngenene, nangona kunjalo, sinokusebenzisa Vector.ConditionalSelect(lt_vec, ij_vec, ikj_vec)
ukusebenza kwevektha ukukhupha amaxabiso afunekayo kwi ij_vec
kunye ne ikj_vec
vector kwi-vector entsha – r_vec
. Lo msebenzi ubuyisela ivektha entsha apho amaxabiso alingana namancinane amabini ahambelanayo amaxabiso eevektha zongeniso okt, ukuba ixabiso le vector lt_vec
lilingana no -1
, ngoko umsebenzi ukhetha ixabiso kwi ij_vec
, kungenjalo ikhetha ixabiso kwi ikj_vec
.
Ngaphandle kwe- #2 , kukho enye inxalenye, ifuna inkcazo-yesibini, i-loop ye-non-vectorised. Kufuneka xa ubungakanani bobunzima be-matrix bungeyomveliso Vector<int>.Count
ixabiso. Kwimeko apho i-loop engundoqo ayikwazi ukucubungula onke amaxabiso (kuba awukwazi ukulayisha inxalenye ye-vector) kwaye okwesibini, i-non-vectored, i-loop iya kubala ukuphindaphinda okuseleyo.
Indlela | sz | Ithetha | Impazamo | StdDev | Umlinganiselo |
---|---|---|---|---|---|
Isiseko | 300 | 27.525 ms | 0.1937 ms | 0.1617 ms | 1.00 |
SpartialOptimisation | 300 | 12.399 ms | 0.0943 ms | 0.0882 ms | 0.45 |
SpartialParallelOptimisations | 300 | 6.252 ms | 0.0211 ms | 0.0187 ms | 0.23 |
SpartialVectorOptimisations | 300 | 3.056 ms | 0.0301 ms | 0.0281 ms | 0.11 |
Isiseko | 600 | 217.897 ms | 1.6415 ms | 1.5355 ms | 1.00 |
SpartialOptimisation | 600 | 99.122 ms | 0.8230 ms | 0.7698 ms | 0.45 |
SpartialParallelOptimisations | 600 | 35.825 ms | 0.1003 ms | 0.0837 ms | 0.16 |
SpartialVectorOptimisations | 600 | 24.378 ms | 0.4320 ms | 0.4041 ms | 0.11 |
Isiseko | 1200 | 1,763.335 ms | 7.4561 ms | 6.2262 ms | 1.00 |
SpartialOptimisation | 1200 | 766.675 ms | 6.1147 ms | 5.7197 ms | 0.43 |
SpartialParallelOptimisations | 1200 | 248.801 ms | 0.6040 ms | 0.5043 ms | 0.14 |
SpartialVectorOptimisations | 1200 | 185.628 ms | 2.1240 ms | 1.9868 ms | 0.11 |
Isiseko | 2400 | 14,533.335 ms | 63.3518 ms | 52.9016 ms | 1.00 |
SpartialOptimisation | 2400 | 6,507.878 ms | 28.2317 ms | 26.4079 ms | 0.45 |
SpartialParallelOptimisations | 2400 | 2,076.403 ms | 20.8320 ms | 19.4863 ms | 0.14 |
SpartialVectorOptimisations | 2400 | 2,568.676 ms | 31.7359 ms | 29.6858 ms | 0.18 |
Isiseko | 4800 | 119,768.219 ms | 181.5514 ms | 160.9406 ms | 1.00 |
SpartialOptimisation | 4800 | 55,590.374 ms | 414.6051 ms | 387.8218 ms | 0.46 |
SpartialParallelOptimisations | 4800 | 15,614.506 ms | 115.6996 ms | 102.5647 ms | 0.13 |
SpartialVectorOptimisations | 4800 | 18,257.991 ms | 84.5978 ms | 79.1329 ms | 0.15 |
Ukususela kwisiphumo kuyacaca, i-vectorisation iye yanciphisa kakhulu ixesha lokubala - ukusuka kwi-3 ukuya kwi-4 amaxesha xa kuthelekiswa ne-non-parallelised version ( SpartialOptimisation
). Umzuzu onomdla apha, kukuba i-vectorised version iphinda idlulele kwi-parallel version ( SpartialParallelOptimisations
) kwiigrafu ezincinci (ngaphantsi kweeverteksi ze-2400).
Ukuba unomdla kwisicelo esisebenzayo se-vectorisation, ndincoma ukuba ufunde uchungechunge lwezithuba zikaDan apho wafaka khona Array.Sort
(ezi ziphumo kamva zifumene uhlaziyo lwe-Garbage Collector kwi-.NET ).
Uzalisekiso lokugqibela ludibanisa iinzame zongqamaniso kunye novetho kunye ... ukunyaniseka alunamntu 🙂 Kuba… ke, sisanda kutshintsha umzimba Parallel.For
kwi SpartialParallelOptimisations
kunye ne-vectorised loop evela kwi SpartialVectorOptimisations
:
public void SpartialParallelVectorOptimisations(int[] matrix, int sz) { for (var k = 0; k < sz; ++k) { Parallel.For(0, sz, i => { if (matrix[i * sz + k] == NO_EDGE) { return; } var ik_vec = new Vector<int>(matrix[i * sz + k]); var j = 0; for (; j < sz - Vector<int>.Count; j += Vector<int>.Count) { var ij_vec = new Vector<int>(matrix, i * sz + j); var ikj_vec = new Vector<int>(matrix, k * sz + j) + ik_vec; var lt_vec = Vector.LessThan(ij_vec, ikj_vec); if (lt_vec == new Vector<int>(-1)) { continue; } var r_vec = Vector.ConditionalSelect(lt_vec, ij_vec, ikj_vec); r_vec.CopyTo(matrix, i * sz + j); } for (; j < sz; ++j) { var distance = matrix[i * sz + k] + matrix[k * sz + j]; if (matrix[i * sz + j] > distance) { matrix[i * sz + j] = distance; } } }); } }
Zonke iziphumo zesi sithuba zithiwe thaca apha ngezantsi. Njengoko kulindelekile, ukusetyenziswa ngaxeshanye ukuhambelana kunye ne-vectorisation kubonise iziphumo ezilungileyo, ukunciphisa ixesha lokubala ukuya kumaxesha angama-25 (kwiigrafu ze-1200 vertexes) xa kuthelekiswa nokuphunyezwa kokuqala.
Indlela | sz | Ithetha | Impazamo | StdDev | Umlinganiselo |
---|---|---|---|---|---|
Isiseko | 300 | 27.525 ms | 0.1937 ms | 0.1617 ms | 1.00 |
SpartialOptimisation | 300 | 12.399 ms | 0.0943 ms | 0.0882 ms | 0.45 |
SpartialParallelOptimisations | 300 | 6.252 ms | 0.0211 ms | 0.0187 ms | 0.23 |
SpartialVectorOptimisations | 300 | 3.056 ms | 0.0301 ms | 0.0281 ms | 0.11 |
SpartialParallelVectorOptimisations | 300 | 3.008 ms | 0.0075 ms | 0.0066 ms | 0.11 |
Isiseko | 600 | 217.897 ms | 1.6415 ms | 1.5355 ms | 1.00 |
SpartialOptimisation | 600 | 99.122 ms | 0.8230 ms | 0.7698 ms | 0.45 |
SpartialParallelOptimisations | 600 | 35.825 ms | 0.1003 ms | 0.0837 ms | 0.16 |
SpartialVectorOptimisations | 600 | 24.378 ms | 0.4320 ms | 0.4041 ms | 0.11 |
SpartialParallelVectorOptimisations | 600 | 13.425 ms | 0.0319 ms | 0.0283 ms | 0.06 |
Isiseko | 1200 | 1,763.335 ms | 7.4561 ms | 6.2262 ms | 1.00 |
SpartialOptimisation | 1200 | 766.675 ms | 6.1147 ms | 5.7197 ms | 0.43 |
SpartialParallelOptimisations | 1200 | 248.801 ms | 0.6040 ms | 0.5043 ms | 0.14 |
SpartialVectorOptimisations | 1200 | 185.628 ms | 2.1240 ms | 1.9868 ms | 0.11 |
SpartialParallelVectorOptimisations | 1200 | 76.770 ms | 0.3021 ms | 0.2522 ms | 0.04 |
Isiseko | 2400 | 14,533.335 ms | 63.3518 ms | 52.9016 ms | 1.00 |
SpartialOptimisation | 2400 | 6,507.878 ms | 28.2317 ms | 26.4079 ms | 0.45 |
SpartialParallelOptimisations | 2400 | 2,076.403 ms | 20.8320 ms | 19.4863 ms | 0.14 |
SpartialVectorOptimisations | 2400 | 2,568.676 ms | 31.7359 ms | 29.6858 ms | 0.18 |
SpartialParallelVectorOptimisations | 2400 | 1,281.877 ms | 25.1127 ms | 64.8239 ms | 0.09 |
Isiseko | 4800 | 119,768.219 ms | 181.5514 ms | 160.9406 ms | 1.00 |
SpartialOptimisation | 4800 | 55,590.374 ms | 414.6051 ms | 387.8218 ms | 0.46 |
SpartialParallelOptimisations | 4800 | 15,614.506 ms | 115.6996 ms | 102.5647 ms | 0.13 |
SpartialVectorOptimisations | 4800 | 18,257.991 ms | 84.5978 ms | 79.1329 ms | 0.15 |
SpartialParallelVectorOptimisations | 4800 | 12,785.824 ms | 98.6947 ms | 87.4903 ms | 0.11 |